cluster analysis

Cluster analysis in petroleum geoscience and petrophysics is a multivariate statistical technique that partitions a dataset of observations (wireline log readings, core measurements, or production parameters) into groups (clusters) such that observations within each group are more similar to each other than to observations in other groups. It identifies natural groupings in high-dimensional formation evaluation data that would not be apparent from inspecting individual log curves in isolation. In Western Canada Sedimentary Basin (WCSB) formation evaluation and reservoir characterization, cluster analysis is applied primarily to generate electrofacies classifications from multi-curve wireline log suites (gamma-ray, neutron porosity, bulk density, photoelectric factor, deep resistivity, and sonic) across Cardium, Viking, Montney, and Duvernay intervals. Each depth level in the log is assigned to an electrofacies cluster that represents a distinct combination of lithological, mineralogical, and porosity properties observed in the wireline response. The result is a continuous vertical facies classification that can be correlated between wells across a WCSB reservoir to construct a 3D reservoir model framework for volumetrics and flow simulation.
The two principal cluster analysis algorithms used in WCSB petrophysical workflows are hierarchical cluster analysis and k-means cluster analysis, each with a distinct computational approach and practical strengths. Hierarchical analysis computes a pairwise distance matrix between all data points (using Euclidean distance in the normalized log space, or Mahalanobis distance when log curves are correlated) and builds a dendrogram (tree diagram) that shows the successive merging of data points and clusters, from individual points at the base to a single cluster at the top. The petrophysicist then selects the number of clusters by cutting the dendrogram at the level that yields a geologically interpretable number of electrofacies (typically 4 to 8 in WCSB Cardium and Viking log suites). K-means analysis requires the analyst to specify the number of clusters k in advance; it then iteratively assigns each data point to the nearest cluster centroid and recomputes the centroids until cluster assignments stabilize. This makes it computationally efficient for large WCSB multi-well datasets (10,000 to 100,000 depth levels), where hierarchical analysis of the full distance matrix would be computationally prohibitive. Principal component analysis (PCA) is frequently applied before cluster analysis in WCSB multi-well log datasets to transform correlated log curves into orthogonal principal components that capture maximum variance in a reduced number of dimensions, reducing computational burden while preserving geological information.
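
The k-means loop described above (assign each depth level to its nearest centroid, recompute the centroids, repeat until assignments stop changing) can be sketched in plain Python. This is a minimal illustration rather than a production workflow: the function names (zscore, kmeans, best_of), the three synthetic facies, and their GR, bulk density, and neutron values are all assumptions made up for the example, and a real study would more likely rely on an established library. Each curve is standardized first so that no single log dominates the Euclidean distance, and the run with the lowest within-cluster sum of squares (inertia) is kept from several random starts.

```python
import random
from statistics import mean, pstdev

def zscore(rows):
    """Scale every log curve (column) to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # a constant curve keeps a divisor of 1
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def sqdist(a, b):
    """Squared Euclidean distance between two depth levels in log space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    """One k-means run from a random start; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct depth levels as starting centroids
    labels = []
    for _ in range(max_iter):
        # Assignment step: each depth level joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sqdist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments have stabilized
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [mean(col) for col in zip(*members)]
    inertia = sum(sqdist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

def best_of(points, k, n_init=10):
    """Repeat k-means from n_init random starts; return (best run, all runs)."""
    runs = [kmeans(points, k, seed=s) for s in range(n_init)]
    return min(runs, key=lambda run: run[2]), runs

# Synthetic depth levels for three hypothetical facies:
# (GR in API units, bulk density in g/cm3, neutron porosity in v/v).
rng = random.Random(0)
facies_means = [(30, 2.35, 0.18), (80, 2.50, 0.24), (120, 2.60, 0.33)]
logs = [[rng.gauss(gr, 5), rng.gauss(rhob, 0.02), rng.gauss(nphi, 0.01)]
        for gr, rhob, nphi in facies_means for _ in range(50)]

(best_labels, best_centroids, best_inertia), runs = best_of(zscore(logs), k=3)
agreeing = sum(abs(run[2] - best_inertia) < 1e-9 for run in runs)
print(f"{agreeing} of {len(runs)} starts reached the best inertia {best_inertia:.2f}")
```

Counting how many random starts reach the same lowest inertia gives a rough feel for how stable the solution is; a low count suggests that k or the input scaling deserves a second look.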

  • Electrofacies generation from wireline log cluster analysis in WCSB Cardium and Viking reservoir characterization: Electrofacies classification using k-means cluster analysis in WCSB Cardium and Viking log suites typically uses 4 to 8 clusters to represent the principal lithofacies encountered in these shallow Cretaceous clastic reservoirs: clean porous sandstone (low GR, high neutron-density separation, high resistivity in the hydrocarbon zone), argillaceous sandstone (intermediate GR, reduced porosity, lower resistivity from clay conductance), silty shale (moderate GR, compressed neutron-density, low resistivity), calcareous sandstone (low GR, high PE from calcite cement, reduced porosity), and marine shale (high GR, high neutron, low resistivity). Each electrofacies cluster is labeled by the petrophysicist based on its centroid position in the multi-dimensional log space and validated against core lithological descriptions from cored WCSB Cardium wells; the labeled electrofacies classification is then propagated to uncored wells across the WCSB pool to provide a continuous sedimentological interpretation of the reservoir interval from wireline data alone. The stability of k-means electrofacies classifications in WCSB log datasets is tested by running the algorithm with multiple random starting centroid positions (typically 10 to 20 random initializations) and checking that the cluster assignments converge to the same solution. Instability (different assignments on different runs) indicates that the number of clusters k is too large for the data or that the cluster boundaries are poorly defined in the log space; the remedy is either to reduce k or to apply a pre-clustering normalization that improves separation.
  • Hierarchical cluster analysis and dendrogram interpretation in WCSB Duvernay and Montney shale log suites: Hierarchical cluster analysis is preferred over k-means for WCSB Duvernay and Montney shale log suites where the number of electrofacies is not known in advance and must be determined from the data structure itself; the dendrogram generated by agglomerative hierarchical clustering (using Ward's minimum variance linkage method, which minimizes within-cluster variance at each merging step) reveals the natural cluster hierarchy in the WCSB shale log data by showing at what distance threshold different lithofacies groups merge. In WCSB Duvernay shale log suites where GR, PE, bulk density, neutron porosity, deep resistivity, and U (uranium concentration from spectral GR) are all available, hierarchical cluster analysis of a type well typically identifies 4 to 6 natural clusters representing: carbonate-rich marl (high PE, low neutron, high resistivity, low U); silica-rich chert (low PE, low neutron, low density); organic-rich black shale (high U from spectral GR, low density from kerogen, low resistivity from pyrite); mixed siliciclastic shale (moderate GR, moderate density); and calcareous shale (intermediate PE, moderate GR). The optimal cut point in the dendrogram for WCSB Duvernay electrofacies is identified where the between-cluster distance (the height of the next dendrogram merge) shows the largest relative increase, indicating that the next merge combines fundamentally different geological facies that should remain separated for reservoir characterization purposes.
  • PCA pre-processing and dimensionality reduction for WCSB multi-well cluster analysis workflows: Principal component analysis pre-processing before k-means cluster analysis in WCSB multi-well log datasets (50 to 200 wells covering a WCSB Cardium or Viking pool) reduces the 5 to 7 input log curves to 2 to 4 principal components that explain 80 to 95 percent of total log variance, improving cluster separation by eliminating noise dimensions while retaining geologically significant variation. In WCSB Cardium log suites, PC1 typically explains 45 to 65 percent of variance and represents the shale-to-sand gradient, while PC2 explains 15 to 25 percent and represents the carbonate-to-siliciclastic gradient (high PE, low neutron at the calcite-cemented end; low PE, higher neutron in porous sandstone). Running k-means on 3 to 4 principal components rather than 6 to 7 original log curves reduces computation time for a 100-well WCSB Cardium dataset from 30 to 60 minutes to 2 to 5 minutes, and produces more robust electrofacies assignments because each dimension contributes independent geological information rather than duplicating variance from correlated log curves.
  • Completion cluster analysis and hydraulic fracture interval selection in WCSB Montney horizontal wells: Cluster analysis in WCSB horizontal well completion engineering refers to the selection and grouping of perforation clusters within each hydraulic fracture stage, where log-based rock quality clustering (using brittleness index, mineralogy from spectral GR, and closure pressure from geomechanical modeling) identifies intervals along the horizontal lateral that are mechanically similar and can be grouped into stages that will fracture uniformly. In WCSB Montney horizontal completions, the brittleness index (computed from Young's modulus and Poisson's ratio derived from dipole sonic logs) varies continuously along the 2,000 to 3,500 m lateral length; cluster analysis of the brittleness index log, together with closure pressure and clay volume, groups the lateral into 3 to 6 rock quality categories that inform stage placement, cluster spacing, and proppant loading for each stage. Stages placed in high-brittleness clusters (Young's modulus above 45 GPa, clay volume below 0.20, closure pressure 40 to 45 MPa) receive slickwater with higher proppant concentration (300 to 500 kg/m3 of 100-mesh sand) to generate complex fracture networks in the brittle rock; stages in lower-brittleness clusters receive crosslinked gel with coarser proppant to maintain fracture width in more ductile, clay-rich intervals.
  • Production cluster analysis and well performance grouping in WCSB Cardium and Montney development programs: Production cluster analysis groups WCSB wells into performance classes based on multiple production metrics (IP30, IP90, 12-month cumulative gas or oil, GOR, WOR, and production decline rate) using k-means or hierarchical methods to identify natural performance categories that correlate with completion design, landing zone, or geological factors. In WCSB Cardium waterflood performance analysis, cluster analysis of injection and production well data (injection rate, injectivity index, producer oil rate, WOR, and areal position relative to injection pattern) groups wells into performance clusters that reveal the dominant flood front geometry and identify bypassed pattern areas where additional infill drilling or injection profile correction could improve recovery. In WCSB Montney horizontal development programs, production cluster analysis of 50 to 200 wells identifies geological or completion design factors correlated with the high-performance cluster, providing a data-driven basis for landing zone and completion parameter selection for subsequent drilling phases.
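
The dendrogram cut rule described in the hierarchical bullet above (cut where the next merge shows the largest relative increase) can be sketched as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the names ward_cost, ward, and pick_k are hypothetical, the merge height is taken as the increase in within-cluster sum of squares that Ward's method minimizes, the search is limited to at most 8 clusters (the max_k default is an assumption), and the points are synthetic two-dimensional stand-ins for normalized log data. The brute-force pair search is only practical for small samples such as a subsampled type well.

```python
import random
from itertools import combinations

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    (members_a, centroid_a), (members_b, centroid_b) = a, b
    na, nb = len(members_a), len(members_b)
    gap = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return na * nb / (na + nb) * gap

def ward(points, stop_at=1):
    """Agglomerate until stop_at clusters remain; returns (member lists, merge costs)."""
    # Each cluster is (list of point indices, centroid); start with one per point.
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    costs = []
    while len(clusters) > stop_at:
        # Merge the pair whose union adds the least within-cluster variance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        costs.append(ward_cost(a, b))
        n = len(a[0]) + len(b[0])
        centroid = [(len(a[0]) * x + len(b[0]) * y) / n for x, y in zip(a[1], b[1])]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((a[0] + b[0], centroid))
    return [members for members, _ in clusters], costs

def pick_k(costs, max_k=8):
    """Cluster count that sits just below the largest relative jump in merge cost."""
    n = len(costs) + 1  # number of original points
    best_k, best_ratio = 1, 0.0
    for k in range(2, min(max_k, n - 1) + 1):
        prev_cost = costs[n - k - 1]  # merge that produced k clusters
        next_cost = costs[n - k]      # merge that would reduce k to k - 1
        if prev_cost > 0 and next_cost / prev_cost > best_ratio:
            best_k, best_ratio = k, next_cost / prev_cost
    return best_k

# Synthetic, already-normalized data: three tight groups of 15 points each.
rng = random.Random(1)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
points = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
          for cx, cy in centers for _ in range(15)]

_, costs = ward(points)              # full merge history (dendrogram heights)
k = pick_k(costs)                    # cut below the largest relative jump
groups, _ = ward(points, stop_at=k)  # rerun and stop at the chosen cluster count
print(k, [len(g) for g in groups])
```

Rerunning the merge loop with stop_at=k is a simplification that avoids storing the full tree; a library dendrogram would normally be cut directly.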

Electrofacies Cluster Analysis Improving WCSB Viking Net Pay Correlation

A WCSB Viking pool study in central Alberta applied k-means cluster analysis (k = 6) to the GR, neutron porosity, bulk density, PE, and deep resistivity logs from 34 wells across the pool, using PCA pre-processing to reduce the 5 log curves to 3 principal components before clustering. The 6 electrofacies clusters were labeled from core in 5 cored wells: clean shoreface sand, argillaceous sand, silty sandstone, calcareous-cemented sand, shaly sand, and marine shale. Electrofacies-based net pay (clean shoreface sand plus argillaceous sand above the resistivity cutoff) was compared against the prior manual interpretation in all 34 wells; electrofacies net pay differed from the manual interpretation by 0.3 to 2.8 m (average 1.2 m) in 22 wells. The electrofacies interpretation reclassified 6 wells as having thicker net pay (argillaceous sand missed by manual log picks) and 3 wells as having thinner net pay (calcareous-cemented sand misidentified as clean reservoir). Pool OOIP was revised from 3.1 million m3 to 3.4 million m3 using the electrofacies-based net pay, with a reserves rebooking of 120,000 m3 of proved developed producing oil.
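
The PCA step in a study of this kind (reducing a five-curve suite to a few components before clustering) can be sketched with the Python standard library alone. Everything specific here is an assumption for illustration: the data are synthetic and are not the pool data, the two latent factors (shale and carb) and their coefficients are invented, the 90 percent variance target is arbitrary, and power iteration with deflation stands in for the eigen-solver a numerical library would provide. The function names correlation_matrix, leading_eigenpair, and explained_variance are likewise hypothetical.

```python
import random
from statistics import mean, pstdev

def correlation_matrix(rows):
    """Correlation matrix of the curves (covariance of the z-scored columns)."""
    z = []
    for col in zip(*rows):
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])
    n = len(rows)
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def leading_eigenpair(m, iters=200, seed=0):
    """Power iteration: largest eigenvalue and eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.0) for _ in m]
    for _ in range(iters):
        w = matvec(m, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
    return lam, v

def explained_variance(rows, target=0.90):
    """Variance fraction of each leading component until the target is reached."""
    m = correlation_matrix(rows)
    size = len(m)
    total = sum(m[i][i] for i in range(size))  # trace equals the number of curves
    ratios = []
    while sum(ratios) < target and len(ratios) < size:
        lam, v = leading_eigenpair(m)
        ratios.append(lam / total)
        # Deflation: remove the component just found before extracting the next.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(size)] for i in range(size)]
    return ratios

# Synthetic five-curve suite driven by two invented latent factors.
rng = random.Random(42)
rows = []
for _ in range(300):
    shale = rng.gauss(0, 1)  # hypothetical shale-to-sand factor
    carb = rng.gauss(0, 1)   # hypothetical carbonate cement factor
    rows.append([
        75 + 30 * shale + rng.gauss(0, 5),           # GR, API units
        0.24 + 0.06 * shale + rng.gauss(0, 0.01),    # neutron porosity, v/v
        2.50 + 0.05 * shale + rng.gauss(0, 0.015),   # bulk density, g/cm3
        3.0 + 0.8 * carb + rng.gauss(0, 0.2),        # PE, barns/electron
        1.2 - 0.4 * shale + rng.gauss(0, 0.1),       # log10 of deep resistivity
    ])

ratios = explained_variance(rows, target=0.90)
print(len(ratios), [round(r, 3) for r in ratios])
```

Because the synthetic curves are driven by only two factors, fewer components are needed here than the three retained in the study; the component scores, not the raw curves, would then be passed to the clustering step.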

Fast Facts: Cluster Analysis
  • Definition: Multivariate statistical technique partitioning observations into similar groups; generates electrofacies from WCSB wireline log suites (GR, neutron, density, PE, resistivity, sonic) for reservoir characterization
  • Hierarchical: Builds dendrogram showing merge distances; Ward's linkage preferred for WCSB log data; optimal k identified where between-cluster distance shows largest relative increase; preferred when number of facies is unknown
  • K-means: Requires pre-specified k (typically 4-8 for WCSB Cardium/Viking); 10-20 random initializations for stability; computationally efficient for 10,000-100,000 depth level multi-well WCSB datasets
  • PCA pre-processing: Reduces 5-7 correlated logs to 2-4 orthogonal components explaining 80-95% of variance; first PC typically represents shale-to-sand gradient in WCSB Cardium/Viking log suites
  • Completion application: Brittleness index clustering along WCSB Montney laterals groups 2,000-3,500 m horizontal into 3-6 rock quality categories for stage placement and proppant loading optimization

Electrofacies are the primary output of cluster analysis applied to WCSB wireline log suites; each cluster represents a distinct GR, porosity, density, and resistivity combination calibrated to core lithofacies in cored WCSB Cardium and Viking wells. Principal component analysis decorrelates multi-curve WCSB log suites before cluster analysis, reducing 5-7 correlated input curves to 2-4 orthogonal components that improve cluster separation and reduce computation time. Wireline logging provides the multi-curve input for WCSB electrofacies cluster analysis; GR, neutron-density, PE, deep resistivity, and dipole sonic are the standard suite for Cardium, Viking, Montney, and Duvernay formation evaluation. Reservoir characterization in WCSB development programs uses electrofacies from cluster analysis to construct continuous vertical and lateral facies models for 3D simulation and net pay estimation. Petrophysics workflows in WCSB pool studies apply cluster analysis to achieve consistent electrofacies classification across all wells, replacing inconsistent manual log picks with a reproducible statistical classification tied to core lithology.