Hierarchical Cluster Analysis: Definition, Dendrograms, and Electrofacies
What Is Hierarchical Cluster Analysis?
Hierarchical cluster analysis groups well log measurements by computing the distance between every pair of data points and arranging the relationships on a dendrogram. Petrophysicists apply the method to define electrofacies, often after reducing the inputs through principal component analysis, then validate the clusters against core analysis.
Key Takeaways
- Hierarchical clustering computes the distance between every pair of data points and visualises the resulting merge sequence on a dendrogram.
- The method delivers exact distance calculations but scales poorly above several thousand depth samples.
- Petrophysicists prefer it for small intervals and cored wells where accuracy outweighs speed.
- K-means clustering replaces the hierarchical approach for full-field datasets with millions of samples.
- Electrofacies derived this way feed permeability prediction and reservoir zonation across international basins.
How Hierarchical Cluster Analysis Works
The algorithm begins by treating each depth sample as its own cluster, then iteratively merges the two closest clusters using a chosen distance metric, typically Euclidean distance applied to standardised log curves such as gamma ray, neutron porosity, bulk density, and resistivity. The output dendrogram shows merge heights on the vertical axis, allowing the analyst to cut the tree at a chosen distance threshold to fix the number of facies.
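The merge-and-cut procedure described above can be sketched with SciPy's `linkage` and `fcluster`. This is a minimal illustration, not a production workflow: the log values are synthetic, the three facies centres are invented for the example, resistivity is assumed to be log-transformed before standardisation, and the threshold is simply placed so that three clusters remain.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Hypothetical stand-in for real logs: GR (API), NPHI (v/v), RHOB (g/cm3) and
# log10 resistivity, 100 depth samples around each of three invented facies.
centres = np.array([
    [30.0, 0.08, 2.65, 2.0],
    [60.0, 0.22, 2.35, 1.0],
    [120.0, 0.35, 2.50, 0.3],
])
spread = np.array([5.0, 0.015, 0.03, 0.1])
logs = np.vstack([c + spread * rng.standard_normal((100, 4)) for c in centres])

# Standardise every curve to zero mean and unit variance.
z = (logs - logs.mean(axis=0)) / logs.std(axis=0)

# Each sample starts as its own cluster and the two closest clusters merge
# repeatedly. Each row of `merges` records one merge:
# (cluster index a, cluster index b, merge height, samples in new cluster).
merges = linkage(z, method="ward", metric="euclidean")
heights = merges[:, 2]

# Cut the tree at a distance threshold placed just below the last two merges,
# which leaves three flat clusters.
threshold = 0.5 * (heights[-3] + heights[-2])
facies = fcluster(merges, t=threshold, criterion="distance")

print("merges recorded:", merges.shape[0])
print("facies found:", np.unique(facies).size)
```

With n depth samples the tree always contains n - 1 merges. Raising the threshold lets more merges fall below the cut and yields fewer, broader facies.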
Linkage rules govern how cluster-to-cluster distance is computed. Single linkage uses the minimum pairwise distance between members of two clusters, complete linkage uses the maximum, and Ward's method merges the pair of clusters whose union gives the smallest increase in total within-cluster variance. Ward's linkage typically produces compact, balanced electrofacies in carbonate and clastic reservoirs and is the default in commercial petrophysics platforms such as Techlog and IP. The pairwise distance matrix needs O(n^2) memory, and efficient general-purpose implementations run in O(n^2 log n) time, while a naive implementation takes O(n^3). This constrains practical datasets to roughly 5,000 to 10,000 samples per run on a standard workstation.
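A small worked example may make the linkage rules and the memory cost concrete. The sketch below uses only the Python standard library; the two four-point clusters are invented, the Ward quantity shown is the increase in within-cluster sum of squares (libraries may report it on a different scale), and the memory estimate assumes one 8-byte float per pair.

```python
import math
from itertools import product


def pair_distances(a, b):
    """Euclidean distance between every point in a and every point in b."""
    return [math.dist(p, q) for p, q in product(a, b)]


def single_linkage(a, b):
    return min(pair_distances(a, b))


def complete_linkage(a, b):
    return max(pair_distances(a, b))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def ward_increase(a, b):
    """Increase in within-cluster sum of squares caused by merging a and b."""
    gap = math.dist(centroid(a), centroid(b))
    return len(a) * len(b) / (len(a) + len(b)) * gap ** 2


def condensed_matrix_bytes(n, bytes_per_value=8):
    """Memory for the n*(n-1)/2 pairwise distances (assumed float64)."""
    return n * (n - 1) // 2 * bytes_per_value


cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 2.0)]

print(single_linkage(cluster_a, cluster_b))    # 3.0
print(complete_linkage(cluster_a, cluster_b))  # sqrt(13), about 3.606
print(ward_increase(cluster_a, cluster_b))     # 9.0

for n in (10_000, 1_000_000):
    print(n, "samples:", condensed_matrix_bytes(n) / 1e9, "GB")
```

Under these assumptions the distance matrix for 10,000 samples needs about 0.4 GB, while one million samples would need roughly 4,000 GB, which is why the pairwise approach stops being practical long before full-field sample counts.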
Hierarchical Cluster Analysis Across International Jurisdictions
In Canada, operators in the Montney and Duvernay rely on hierarchical clustering of triple-combo logs to define rock types prior to hydraulic fracturing design, with workflows reviewed under AER Directive 083 for unconventional completions. United States operators in the Permian and Eagle Ford apply the same techniques for stacked-pay zonation, with disclosures filed under SEC modernised reserves rules requiring auditable petrophysical methods.
Norwegian operators on Johan Sverdrup and Troll integrate clustering into log analysis reports submitted to Sodir, with documentation aligned to NORSOK D-010 well integrity expectations. Australia's NOPSEMA accepts hierarchical clustering for facies-based volumetrics in Carnarvon Basin field development plans. Saudi Aramco and ADNOC apply the method on Ghawar and Upper Zakum carbonates where thin-bedded heterogeneity defeats simpler cutoff workflows.
Fast Facts
Equinor reported that hierarchical clustering of 14 log curves across 27 Johan Sverdrup wells produced eight stable electrofacies that matched 92% of the cored intervals, supporting the field's plateau plan of about 755,000 bbl/d (roughly 120,000 m³/d) submitted to Sodir.
Dendrogram Interpretation and Cluster Validation
The dendrogram is read top down. A horizontal cut at a high merge distance yields few broad facies; a lower cut yields many narrow facies. Analysts choose the cut by inspecting the largest jumps in merge height, which signal natural breaks in the data. Cophenetic correlation, typically targeted above 0.75, measures how faithfully the dendrogram preserves the original pairwise distances.
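Both checks described above, the largest jump in merge height and the cophenetic correlation, can be sketched with SciPy. The data here are synthetic and assumed to be already standardised, and taking the single largest jump is only one simple heuristic; an analyst would normally inspect several candidate cuts.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)

# Hypothetical standardised features: three tight groups of 50 samples each.
centres = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
x = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centres])

tree = linkage(x, method="ward")
heights = tree[:, 2]

# The largest jump between consecutive merge heights marks a natural break.
gaps = np.diff(heights)
i = int(np.argmax(gaps))
threshold = 0.5 * (heights[i] + heights[i + 1])

# Merges 0..i fall below the cut, so n - (i + 1) clusters remain.
n_facies = x.shape[0] - (i + 1)
labels = fcluster(tree, t=threshold, criterion="distance")

# Cophenetic correlation: agreement between the dendrogram's merge heights
# and the original pairwise distances.
coph_corr, _ = cophenet(tree, pdist(x))

print("facies at largest jump:", n_facies)
print("cophenetic correlation:", round(float(coph_corr), 3))
```

On real logs the largest jump is often the final merge, which would suggest only two clusters, so it is common to restrict the search to a geologically plausible range of cluster counts.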
Cluster quality is validated against core photographs, thin sections, and routine core analysis at intervals where ground truth exists. Permeability, capillary pressure, and net-to-gross within each cluster should show tight distributions. Outliers often flag bad hole intervals or tool malfunctions rather than genuine geology.
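One simple way to check whether permeability is tightly distributed within each cluster is to summarise core permeability per electrofacies on a log scale. The sketch below uses only the standard library; the plug values and facies labels are invented for illustration.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical core plugs: (electrofacies label, core permeability in mD).
core_plugs = [
    (1, 120.0), (1, 150.0), (1, 95.0), (1, 180.0),
    (2, 2.0), (2, 3.5), (2, 1.2), (2, 450.0),  # last plug is a suspect outlier
    (3, 0.05), (3, 0.08), (3, 0.03),
]

by_facies = defaultdict(list)
for label, perm_md in core_plugs:
    by_facies[label].append(perm_md)

summary = {}
for label, perms in sorted(by_facies.items()):
    log_perms = [math.log10(k) for k in perms]
    summary[label] = {
        "n": len(perms),
        "geo_mean_md": statistics.geometric_mean(perms),
        # Spread in log10 units: 1.0 means roughly one order of magnitude.
        "log10_std": statistics.stdev(log_perms),
    }

for label, stats in summary.items():
    print(label, stats)
```

In this invented dataset facies 2 shows a spread of more than one order of magnitude because of a single plug, which is the kind of outlier worth checking against hole condition and tool quality flags before it is treated as geology.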
Tip: Standardise log curves to zero mean and unit variance before clustering, otherwise curves with large numerical ranges such as gamma ray (in API units) or resistivity dominate the distance metric, and narrow-range curves such as bulk density contribute almost nothing. For datasets above 10,000 samples, run hierarchical clustering on a representative core-calibrated subset, then propagate the labels to the full well using nearest-centroid assignment (the assignment step of k-means) or a supervised classifier.
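The subset-then-propagate approach in the tip can be sketched as follows. This is one possible reading, assuming nearest-centroid assignment for the propagation step; the well data are synthetic, the subset is every tenth sample rather than a real cored interval, and the cluster count of three is chosen for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(2)

# Hypothetical full well: 3,000 samples of three raw curves around three
# invented facies centres (GR in API, RHOB in g/cm3, NPHI in v/v).
centres = np.array([[30.0, 2.65, 0.08], [60.0, 2.35, 0.22], [120.0, 2.50, 0.35]])
spread = np.array([5.0, 0.03, 0.015])
truth = rng.integers(0, 3, size=3000)
full_well = centres[truth] + spread * rng.standard_normal((3000, 3))

# Subset to cluster hierarchically (every 10th sample here; in practice this
# would be the core-calibrated interval).
subset_idx = np.arange(0, 3000, 10)
subset = full_well[subset_idx]

# Standardise with the subset's statistics and reuse them for the full well,
# so both live in the same scaled space.
mean, std = subset.mean(axis=0), subset.std(axis=0)
z_subset = (subset - mean) / std
z_full = (full_well - mean) / std

tree = linkage(z_subset, method="ward")
subset_labels = fcluster(tree, t=3, criterion="maxclust")

# One centroid per facies, then assign every sample to the nearest centroid.
facies_ids = np.unique(subset_labels)
centroids = np.array([z_subset[subset_labels == f].mean(axis=0) for f in facies_ids])
dists = np.linalg.norm(z_full[:, None, :] - centroids[None, :, :], axis=2)
full_labels = facies_ids[np.argmin(dists, axis=1)]

print("labelled samples:", full_labels.size)
```

Only the 300-sample subset goes through the pairwise distance computation; the propagation step costs one distance per sample per centroid, so it scales linearly with well length.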
Hierarchical Cluster Analysis Synonyms and Related Terminology
Hierarchical cluster analysis is also known as:
- HCA — common abbreviation in petrophysics literature
- Agglomerative clustering — the bottom-up algorithm variant
- Dendrogram clustering — informal term referencing the output diagram
Related terms: electrofacies, principal component analysis, log analysis, core analysis
Frequently Asked Questions
When should petrophysicists prefer hierarchical clustering over k-means?
Choose hierarchical clustering when the dataset is small, fewer than about 10,000 samples, and when the number of natural facies is unknown. The dendrogram lets the analyst inspect the data structure and pick the cluster count after the fact. K-means requires the cluster count up front and scales to millions of samples but cannot reveal hierarchy or merge sequence.
How many electrofacies should a hierarchical analysis produce?
Most reservoir studies converge on between four and ten electrofacies. Fewer than four typically loses geological detail; more than ten introduces clusters that are statistical artefacts rather than physical rock types. The optimum is set by core-calibrated permeability ranges and the resolution of the logging suite, then audited against geological core descriptions.
Does hierarchical clustering require principal component analysis first?
It is not required but is widely used. Principal component analysis reduces correlated log curves to a smaller set of orthogonal components, which removes redundancy and accelerates the clustering algorithm. On a triple-combo dataset, two or three principal components often capture more than 90% of the variance, simplifying both computation and interpretation.
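The reduction step can be illustrated with a PCA computed from a singular value decomposition in NumPy. This is a sketch on synthetic data: the three curves are generated from one shared latent property plus noise, so the variance split shown here says nothing about any real well.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical triple-combo style data: three curves that all respond to one
# shared latent property (for example shale volume) plus independent noise.
latent = rng.standard_normal(500)
curves = np.column_stack([
    latent + 0.2 * rng.standard_normal(500),
    -latent + 0.2 * rng.standard_normal(500),
    0.8 * latent + 0.2 * rng.standard_normal(500),
])

# Standardise, then obtain principal components from the SVD.
z = (curves - curves.mean(axis=0)) / curves.std(axis=0)
u, s, vt = np.linalg.svd(z, full_matrices=False)

# Fraction of total variance carried by each component.
explained = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained)

# Keep the smallest number of components reaching 90% of the variance.
k = int(np.searchsorted(cumulative, 0.90) + 1)
scores = z @ vt[:k].T  # orthogonal inputs for the clustering step

print("explained variance ratios:", np.round(explained, 3))
print("components kept:", k)
```

The `scores` array would then replace the standardised curves as the input to the hierarchical clustering.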
Why Hierarchical Cluster Analysis Matters in Oil and Gas
Reliable electrofacies underpin permeability prediction, net-pay calculation, and completion targeting across every producing basin. Hierarchical cluster analysis remains the reference workflow for cored wells because the dendrogram is auditable and the distances are exact. Operators in the Montney, Permian, North Sea, Carnarvon, and Ghawar all rely on the method to convert raw log curves into rock-type frameworks that govern multibillion-dollar development decisions.