Autocorrelation: Definition, Seismic Deconvolution, and Wavelet Extraction

Autocorrelation is a mathematical operation that measures the similarity of a signal with a time-shifted version of itself, revealing the periodic structure, embedded wavelets, and repetitive patterns within the signal without requiring any external reference. In continuous form, the autocorrelation function R(τ) is defined as the integral of the product of a signal f(t) and its delayed copy f(t+τ) over all time: R(τ) = ∫f(t)f(t+τ)dt. For discrete seismic data sampled at interval Δt, the summation form R(k) = Σx(n)x(n+k) is applied over all sample indices n, where k is the lag index expressed in samples or milliseconds. The resulting autocorrelation function is symmetric about zero lag, peaks at zero lag where the signal is perfectly correlated with itself (the zero-lag value equals the signal's total energy), and decays toward zero at long lags for signals with no long-range periodicity. In petroleum geophysics, autocorrelation is the foundation of statistical wavelet extraction: the autocorrelation of a seismic trace is used to estimate the embedded source wavelet through the Wiener-Levinson algorithm, enabling spiking or predictive deconvolution that compresses the wavelet and improves temporal resolution. Autocorrelation also identifies seismic multiples by revealing secondary peaks in the autocorrelogram at the two-way travel time of the reverberating interface. Beyond seismic processing, autocorrelation is applied in production engineering to identify periodicities in pressure-transient responses, seasonal patterns in production time series, and memory effects in wellbore inflow performance data.

Key Takeaways

  • Mathematical properties of the autocorrelation function: The autocorrelation function has several mathematically important properties that govern its interpretation in signal processing. First, the zero-lag value R(0) equals the total signal energy (the sum of squared amplitudes), making it always the maximum value of the autocorrelation function. Second, the function is symmetric: R(τ) = R(-τ) for all lags, so only the positive-lag side need be plotted and analyzed. Third, a perfectly periodic signal (a pure sine wave at frequency f) produces an autocorrelation that is itself a cosine function at the same frequency, never decaying to zero; this is how autocorrelation identifies hidden periodicity in noise-contaminated signals. Fourth, white noise, which has no correlation structure at any lag other than zero, produces an autocorrelation that is an ideal spike at zero lag and zero at all other lags. The shape of a seismic wavelet's autocorrelation reflects its frequency content and duration: a narrow-bandwidth, long-duration wavelet produces a broad, slowly decaying autocorrelation, while a broadband, short-duration wavelet produces a narrow spike-like autocorrelation. The goal of deconvolution is to transform the seismic trace's autocorrelation toward the ideal spike shape, effectively compressing the embedded wavelet and increasing the resolvable bandwidth of the data.
  • Statistical wavelet extraction and the Wiener-Levinson algorithm: In practice, the true seismic source wavelet is rarely known with precision: it varies with depth, offset, and azimuth due to earth attenuation, and it is convolved with the instrument response and near-surface effects. Statistical wavelet extraction uses the autocorrelation of the seismic trace itself (or an ensemble of traces in a time window) as a proxy for the wavelet autocorrelation, based on the assumption that the reflectivity series is statistically white (random, uncorrelated between lags). Under this assumption, all the correlation structure in the seismic trace autocorrelogram is attributable to the wavelet, not the reflectivity. The Wiener-Levinson algorithm then solves a Toeplitz matrix equation in which the autocorrelation coefficients at each lag form the matrix rows, and the solution vector gives the spiking deconvolution filter coefficients that will compress the estimated wavelet to an approximate spike. The algorithm is computationally efficient because the Toeplitz symmetry allows the solution to be computed recursively in O(n2) operations rather than the O(n3) required for a general matrix inversion. The resulting deconvolution filter is applied trace-by-trace to the seismic data, producing a deconvolved section with improved resolution and better-separated reflections from closely spaced beds.
  • Multiple identification and periodicity analysis via autocorrelogram: Seismic multiples are the most practically significant application of autocorrelation in reflection processing. A water-bottom multiple has a two-way travel time equal to twice the one-way water depth divided by water velocity: for a 150-metre water column at 1,500 m/s, the primary multiple arrives at 200 ms after the primary reflection. This periodicity shows up as a secondary peak in the autocorrelogram at 200 ms lag, exactly corresponding to the multiple period. Interbed multiples, which bounce between two subsurface interfaces, also produce secondary peaks at lags equal to their round-trip travel time between the two bounding reflectors. A key practical diagnostic for the Montney play in northeastern BC is the presence of an interbed multiple from the Pardonet carbonate above the Montney Formation: this multiple has a period of approximately 40-60 ms depending on the local Pardonet-to-Montney interval, and it appears clearly as a secondary peak in the autocorrelogram of the seismic trace window that includes the Montney interval. Identifying the multiple period from the autocorrelogram directly informs the design of the predictive deconvolution operator, specifically the prediction lag setting, to suppress that specific multiple without compromising primary reflections.
  • Predictive deconvolution design from autocorrelation: Predictive deconvolution extends the spiking deconvolution concept by designing a filter that predicts and subtracts periodic components from the seismic trace rather than compressing the full wavelet to a spike. The prediction lag α is set equal to the dominant multiple period identified in the autocorrelogram: a prediction lag of 44 ms targets a multiple with a 44 ms period, and the filter subtracts the predicted periodic energy from the trace at each sample. The operator length L (the number of filter coefficients) controls how many lags of autocorrelation information enter the filter design: shorter operators (24-48 ms) are more adaptive and target higher-frequency multiples, while longer operators (100-200 ms) target lower-frequency, longer-period reverberations. Adaptive predictive deconvolution, which designs the operator using autocorrelation computed in sliding time windows, accounts for the fact that the dominant multiple period changes with depth as the water column or interbed interval changes in two-way time. The percentage of white noise added to the autocorrelogram diagonal (the prewhitening parameter, typically 0.1-1.0 percent) stabilizes the Toeplitz matrix inversion and prevents the filter from amplifying noise in frequency ranges where the signal-to-noise ratio is low.
  • Autocorrelation in production analysis and pressure transient data: Beyond seismic processing, autocorrelation is a standard tool in time-series analysis of production and reservoir engineering data. In production decline analysis, the autocorrelation of monthly production rate residuals (observed minus forecast) detects whether the forecast model has captured all the systematic structure in the data: if significant autocorrelation remains in the residuals at lags of 1-12 months, the model is under-fitting and must be extended with additional parameters. ARIMA (AutoRegressive Integrated Moving Average) models, which are widely used to forecast production from individual wells and fields, include autoregressive terms that explicitly model the autocorrelation structure of the production time series. In pressure transient analysis, the autocorrelation of pressure fluctuations recorded at high sample rate during buildup or drawdown tests can detect wellbore storage effects, which produce strongly autocorrelated pressure signals at short lags, and help identify the transition from wellbore storage to radial flow. The partial autocorrelation function (PACF), which measures correlation at lag k after removing the effects of all shorter lags, is used to determine the order of the autoregressive component in ARIMA model fitting, a step that precedes rate forecasting for Montney and Duvernay horizontal wells with complex multi-segment production histories.

Autocorrelation in Seismic Processing: Wavelet Estimation and Multiple Suppression

The seismic processing workflow in which autocorrelation plays its most consequential role is statistical deconvolution, a step applied to raw field records before stack and migration to improve the temporal resolution of the data and reduce the masking effect of wavelet side lobes on closely spaced reflections. The premise of statistical deconvolution is the convolutional model: every seismic trace is assumed to be the convolution of the earth's reflectivity series r(t) with the embedded seismic wavelet w(t), plus additive noise n(t): x(t) = r(t)*w(t) + n(t). If the wavelet is known, it can be deconvolved from the trace to recover r(t). In practice, the wavelet is estimated from the autocorrelation of x(t) under the assumption that r(t) is white (random), so that the autocorrelation of x(t) is approximately equal to the autocorrelation of w(t). This assumption is well justified statistically for long time windows containing many reflections from a geologically diverse section, but it breaks down over short windows dominated by a few strong reflectors (such as a coal sequence) where the reflectivity is decidedly non-white. Processing geophysicists account for this by testing multiple window lengths and comparing the resulting autocorrelograms for stability, selecting windows that produce smooth, physically interpretable autocorrelation estimates before computing the Wiener filter.

The autocorrelogram's practical appearance on a seismic dataset reveals much about the quality of the recorded data. An ideal autocorrelogram for a processed, noise-free trace shows a dominant central spike at zero lag with symmetric, rapidly decaying side lobes. Long-duration side lobes indicate a ringing wavelet with poor temporal resolution, which is characteristic of raw marine data or onshore data contaminated by surface waves (ground roll). A secondary peak at 200-400 ms in marine data indicates strong water-bottom multiples with that period. Ghost notches, which are zeros in the amplitude spectrum caused by the time delay between the primary downgoing wave and its reflection from the sea surface, appear in the autocorrelogram as nulls at specific lag values corresponding to twice the receiver depth divided by water velocity. Identifying ghost notches in the autocorrelogram before deconvolution allows the processor to design filters that account for the ghost, preventing the deconvolution operator from attempting to boost energy at the notch frequencies where no signal exists. On land data, the autocorrelogram commonly shows a complex early-lag region dominated by near-surface reverberations and air-wave coupling, which must be excluded from the window used to estimate the deep-target wavelet for deconvolution.

The relationship between autocorrelation and the power spectrum of a seismic trace is formalized by the Wiener-Khinchin theorem, which states that the power spectral density of a stationary random process is the Fourier transform of its autocorrelation function. This theorem means that analyzing the autocorrelogram is mathematically equivalent to analyzing the trace's amplitude spectrum: broad-bandwidth, high-frequency content corresponds to a sharp, spiky autocorrelation at zero lag, while narrow-bandwidth, low-frequency content corresponds to a broad, smooth autocorrelation. In practice, processing geophysicists use both representations: the autocorrelogram for identifying periodicity and multiple periods in the time domain, and the amplitude spectrum for QC-ing the frequency content and confirming that deconvolution has achieved the desired bandwidth broadening without excessive noise amplification at high frequencies. The prewhitening parameter in Wiener-Levinson deconvolution directly controls the trade-off between bandwidth extension (benefits of better wavelet compression) and noise amplification (risk of amplifying high-frequency noise where the signal-to-noise ratio is low), and choosing the correct prewhitening level requires examining both the autocorrelogram shape and the post-deconvolution amplitude spectrum.