Electronic Supplement to
A Waveform Detector That Targets Template-Decorrelated Signals and Achieves Its Predicted Performance, Part I: Demonstration with IMS Data

by Joshua D. Carmichael

This electronic supplement develops several theoretical and practical aspects of implementing the correlation and cone detector on geophysical waveform data. The Effective Degrees of Freedom section documents an estimator for the number of statistically independent samples NE in a data stream and describes how it shapes the correlation statistic’s probability density function (PDF). In The Maximum-Likelihood Correlation Statistic section, we derive this PDF, which quantifies our correlation and cone detector’s ability to identify their respective target waveforms. The Cone Detector Statistic section then presents an alternative to the correlation detector hypothesis test and introduces additional, deterministic uncertainty into the target signal model. Therein, we derive the cone detector PDF, illustrate its geometric properties, and model its capability to detect template-decorrelated target waveforms. We support this derivation with Monte Carlo simulations (Fig. S1) and illustrate how deterministic template-target waveform decorrelation inflates thresholds relative to the correlation case (Fig. S2). The Relative Magnitude Parameterization of the Cone Detector Statistic section then relates the probability of detecting a template-decorrelated target waveform to the magnitude of the seismic source that produces this target waveform. Finally, we provide an analysis of three ostensible sources of error from our data processing implementation in the Error Analyses section and describe their influence on our threshold and capability analyses. We find that only data-variance pooling is likely to appreciably affect detector performance.

Effective Degrees of Freedom

This section outlines a method to estimate the effective degrees of freedom NE of a data stream x as N^E. This estimate provides an ostensible alternative to a full covariance matrix Σσ2I for x that is generally required for temporally or spatially correlated data. Density functions for detection statistics that process x are then easily parameterized by the effective number of independent data-stream samples NE. This scalar theoretically equates to twice the time (T) bandwidth (B) product (2TB) of the data stream over the temporal duration of the correlation template. Real data often show that NE2TB, however. This occurs both naturally and through processing operations like band-pass filtering, which replace each sample with itself plus a weighted sum of its neighbors and thereby introduce intersample statistical dependence. To quantify the influence such correlation exerts on the shape of our detector’s PDF curves, we implement an empirical estimator for NE, denoted N^E, to continuously update such PDF parameterizations (e.g., the correlation statistic PDF fR(r;Hk)). This estimator computes the sample correlation between the multichannel template waveform w and several hundred pseudorandom, commensurate data vectors drawn from nonintersecting segments of preprocessed, signal sparse data within x (see Weichecki-Vergara et al., 2001, their section 2.4). We compute the sample variance σ^R2 of the resultant correlation time series using 99.9% of the data by excluding 0.01% of the extreme left and right tails of its histogram. This provides the needed statistic to estimate NE as

N^E=1+1σ^R2.

(S1)

We use N^E to parameterize fR(r;Hk) (equation S7), compute detector thresholds η^, and quantify detector performance.

The Maximum-Likelihood Correlation Statistic

This section develops the PDF and detection performance of the sample correlation detection statistic r(x) (equation 3 of the main article). Because the form of the sample correlation PDF differs from that reported elsewhere, we derive the general form here.

The relative square error in approximating a data stream x=u+n (the H1 model of equation 15 of the main article), with a waveform template w, is the ratio of the least-squares error e2 to measured signal energy x2. This error may be rewritten as a ratio of quadratic forms

e2x2=xA^w2x2=xx,ww2w2x2=1x,w2w2x2=1r2(x),

(S2)

in which A^ is the maximum-likelihood estimate (MLE) for template waveform amplitude. We now rewrite the right side of the second equality in equation (S2) as a ratio of subspace projections

xx,ww2w2x2=PW(x)2PW(x)2+PW(x)2,

(S3)

in which W is the subspace span (w), W is the orthogonal complement to W, PW is the projector onto W, and PW is the projector onto W. The denominator follows from the Pythagorean identity for Hilbert spaces. We define two noncentrality parameters from these terms

λ=PW(E{x})2σ2=PW(u)2σ2=ρ2u2σ2λ=PW(E{x})2σ2=PW(u)2σ2=(1ρ2)u2σ2,

(S4)

in which the expected value and linear-projection operators commute. We combine the previous three equations to rewrite r2(x)

1(1r2(x))=PW(x)2+PW(x)2PW(x)2+PW(x)2PW(x)2PW(x)2+PW(x)2=PW(x)2PW(x)2+PW(x)2=dχ12(λ)χ12(λ)+χNE12(λ),

(S5)

in which =d is distributional equality, χNE12(λ) is the noncentral chi-square distribution with NE  −1 degrees of freedom and noncentrality parameter λ, and χ12(λ) is the noncentral chi-square distribution with one degree of freedom and noncentrality parameter λ. From the definition of the beta distribution

r2(x)B(t,12,12(NE1);λ,λ),

(S6)

(Kay, 1998), in which B(t,N1,N2,α,β) is the doubly noncentral beta distribution function. It is evaluated at t (with the same domain as r2), has N1 and N2 degrees of freedom and noncentrality parameters α and β. The presence or absence of a target signal is indexed by the hypothesis Hk on the data. Hypothesis H0 symbolizes that the data consist of only noise, whereas H1 signifies that the data consist of a signal plus noise. The scalar NE denotes the effective number of independent samples within x. When the data stream contains only noise, the hypothesis H0 is satisfied, and r2 has a central beta distribution, in which λ=λ=0. In the presence of signal, the data stream x generally has nonzero projections PW(x) and PW(x) that are respectively onto and orthogonal to the noise-contaminated template data vector w. In this case, λ, λ0, and r2 has doubly noncentral beta distribution. If the target signal is an amplitude-scaled copy of the template waveform, then x=Aw+n, λ=0, and r2 has a noncentral beta distribution. The singly noncentral beta distribution therefore provides an absolute upper bound on the detection performance of a correlation detector consistent with assumptions underlying H1.

We derive the PDF for r from the density of r2 using a variable transformation; we additionally consider values r < 0, so that

fR(r;Hk)=|r(x)|[B(r2(x);12,12(NE1),λ,λ)+B(r2(x);12,12(NE1),λ,λ)].

(S7)

The form of this distribution function differs from that derived in similar applications (Weichecki-Vergara et al., 2001). In the alternative case, the signal-present data stream was assumed to correlate sample-by-sample with the template waveform, and the test statistic had a Pearson moment-product distribution.

The Cone Detector Statistic

This section develops the cone detection statistic (equation 17 of the main article), its PDF, and illustrates its performance; it was originally introduced in the context of icequake detection (Carmichael, 2013). We first reference the two competing hypotheses of equation (15) of the main article

H0:xN(0,σ2I)(noise present,σunknown)H1:xN(u,σ2I)(noisy target present,uC,σunknown).

(S8)

We further assume that preprocessing operations (like filtering) and naturally occuring temporal data correlation reduces the number of statistically independent samples in x to NE<N. Given this parameterization, the PDF under H0 is denoted as f0(x;H0) and the PDF under H1 as f1(x;H1), in which

f0(x;H0)=1(2πσ2)12·NEexp[x22σ2]f1(x;H1)=1(2πσ2)12·NEexp[xu22σ2],uC.

(S9)

We estimate the unknown parameters under each model listed in equation (15) of the main article and select the applicable hypothesis using a log-generalized likelihood ratio test (log-GLRT). This test evaluates the PDFs for the competing hypotheses at the MLEs of their unknown parameter values and then compares their log ratio to a threshold value η. The test’s decision rule is a conditional, scalar inequality

Λ(x)=ln[maxσ,u{f1(x;H1)}maxσ{f0(x;H0)}]H1H0η.

(S10)

The MLE of the variance under each hypothesis is

σ^12=argmaxσ{ln[f1(x;H1)]}=xu2Nσ^02=argmaxσ{ln[f0(x;H0)]}=x2N,

(S11)

(Kay, 1993; Scharf and Friedlander, 1994), in which the subscripts on each sample variance estimate in equation (S11) indicate the applicable hypothesis, and N is the number of samples in x, not to be confused with NE. To estimate the distribution mean under H1, we evaluate f1(x;H1) at the MLE for σ1 and perform a constrained maximization of uC

u^=argmaxuC{ln[f1(x;H1)]}|σ=σ^1=argminuC{xu2}.

(S12)

The solution to this equation

u^=PC(x)

(S13)

is the MLE for u. That is, u^ is the vector that minimizes the distance between the observed data x and all points that constitute the multiplet set C. This defines the projection of x onto C as the unique signal that is either interior to, or on the boundary C of C (Stark and Yang, 1998). The sample variance and cone-element MLEs from equation (S11) reduce Λ(x) to

2NEΛ(x)=ln(σ^02)ln(σ^12)=ln[xPC(x)2x2].

(S14)

We obtained equation (S14) without specifying C aside from its convexity. This result therefore applies to any normally distributed data confined in a convex set. In the case that C is described by the correlation constraint of equation (14) of the main article, the projection energy has a conic decomposition

xPC(x)2=x2PC(x)2,

(S15)

as given by Moreau’s decomposition theorem (Moreau, 1962), which is analogous to the orthogonal subspace decomposition from linear analysis (Stark and Yang, 1998). The log ratio is then expressible as

2NEΛ(x)=ln[1PC(x)2x2].

(S16)

The right side of equation (S14) is of the form ln(1x2), which is monotonically increasing for 0x1. Because PC(x)2/x21, equation (S16) may be inverted for its argument to provide an equivalent test statistic s(x) for the decision rule introduced in equation (S10)

PC(x)x=s(x)H1H0η.

(S17)

Equation (S17) demonstrates that s(x) compares the ratio of the projected signal energy to the original signal energy. The exact projection PC(x) of x, from a general Hilbert space and onto C, is derived in Stark and Yang (1998, pp. 111–113). Vector PC(x) is a nonlinear projection that depends on the value of x and is either in C, on its boundary C, or zero. We document an equivalent form of that projection here

PC(x)={0:r1r2c,PC(x)=0γzz:r1r2[c,1c],PC(x)Cx:r1r2>1c,PC(x)C.

(S18)

The constants and vectors in equation (S18) are

rw^,xxc1ρ2ρ2γρx(r+c1r2)zzρw^+1ρ2xw^,xw^xw^,xw^

(S19)

The vector w^ in equation (S19) is the normalized multichannel template waveform defined by equation (14) of the main article, not to be confused with a parameter estimate. Using the definitions in equation (S19), the test statistic s(x) of equation (16) of the main article is then expressible as

s(x)={0:r1r2c,PC(x)=0γx:r1r2[c,1c],PC(x)C1:r1r2>1c,PC(x)C.

(S20)

The detection statistic s(x) must be equivalent to the correlation coefficient as ρ approaches a limiting value of 1. To demonstrate this, we first note that any projected signal has a decreasing probability of lying inside the cone as ρ decreases (equation S20). Similarly, any projected signal has a decreasing probability zero of lying on the cone vertex. It follows that only the projection onto the limiting cone boundary C is nontrivial, so that

s(x)=limρ1γx=limρ1ρx(r+c1r2)x=r,

(S21)

in which limρ1 is the limit of ρ approaching from values less than 1, whereby c0.

We derive the requisite PDF for s(x) from its cumulative distribution function (CDF) FS(s;Hk) using the law of total probability. In words, this law states that the probability that the cone statistic random variable S takes on a value as large as s(x) is the sum of three conditional probabilities: the probability that (1) S<s(x), given Pr{PC(x)=0}, plus (2) the probability S<s(x), given Pr{PC(x)C}, plus (3) the probability that S<s(x), given Pr{PC(x)C°}

FS(s(x);Hk)=Pr{Ss(x)|PC(x)C°}Pr{PC(x)C°}+Pr{Ss(x)|PC(x)C}Pr{PC(x)C}+Pr{Ss(x)|PC(x)=0}Pr{PC(x)=0},

(S22)

in which Pr{A|B} is the conditional probability of A, given B is true and statement PC(x)C° is equivalent to PC(x)=(xC). To express FS(s;Hk) in a computationally evaluable form, we write its three conditioning factors in terms of r/1r2 through equation (S20). Two of these terms are trivial to evaluate from the definition for s(x)=PC(x)/x

Pr{S<s(x)|PC(x)=0}=0,sinces(x)=0Pr{Ss(x)|PC(x)C°}=1,sinces(x)=1.

(S23)

The other terms in equation (S22) are expressible using r/1r2 through equation (S20)

Pr{PC(x)C}=Pr{c<r1r2<1c}Pr{PC(x)C°}=Pr{r1r21c}.

(S24)

We evaluate these probabilities by developing the CDF FQ(q;Hk) for the ratio q=r/1r2 (k=0,1). To do so, we make a change of variables with FR(r;Hk)

FQ(q;Hk)=FR(q1+q2;Hk)+FR(q1+q2;Hk),

(S25)

in which FR(r;Hk) is the CDF for correlation statistic r. We then use equation (S25) to write the identities from equation (S24) in terms of FQ(q;Hk):

Pr{PC(x)C}=FQ(1c;Hk)FQ(c;Hk)Pr{PC(x)C°}=1FQ(1c;Hk).

(S26)

Last, we evaluate the derivative of Pr{PC(x)C} in equation (S22) through a variable change on r, in which c<r/1r2<1/c. To do so, we first note that s(x)=ρ(rc1r2) is invertible over this domain and express r as a function of s

r(s)=ρs1ρ21s2.

(S27)

The PDF of s(x) over c<r/1r2<1/c is therefore

fS(s;Hk|PC(x)C)=fR(r(s);Hk)|dr(s)ds|,

(S28)

in which |PC(x)C indicates the restricted domain of r. To obtain the PDF for s(x) over 1r1, we differentiate equation (S22) and substitute equations (S26) and (S28) into the results. This produces a density function that depends on only s, c, and fR(·;Hk)

fS(s;Hk)=[FQ(1c;Hk)FQ(c;Hk)]fR(r(s);Hk)|dr(s)ds|,

(S29)

because the top line of equation (S22) is constant and differentiates to zero.

We assessed the validity of our derivation using a Monte Carlo simulation, whereby we projected random noise and noise-contaminated signal vectors onto several convex cones of increasing aperture and computed the statistic s(x). This simulation demonstrates that equation (S29) agrees with our empirical histograms (Fig. S1).

We additionally compared our cone detector thresholds against constant false-alarm-on-noise constraints. We thereby inverted for cone detector thresholds using a fixed value for the effective degrees of freedom (NE=400) over a grid of decorrelation parameters (1ρ) that ranged from 0 to 0.25. We repeated this process for several false alarm rates (Fig. S2). The resulting detection thresholds increase most rapidly for small changes near ρ=1, at which the signal space of the cone geometrically collapses to the 1D subspace used by the associated correlation detector. The slope of the curves here may become undefined. We will explore this result more quantitatively in future work.

Relative Magnitude Parameterization of the Cone Detector Statistic

An unbiased estimator for the relative magnitude Δm between an explosion that generates a noisy waveform x=u+n and an explosion that generates a detector template w is

Δm^=12log10(x2w2σ^12Nw2),

(S30)

(Carmichael and Hartse, 2016, their equation B.8), in which σ12 denotes an estimate for noise variance in x (subscript denotes hypothesis H1 is true) and N is the number of samples in x, not to be confused with NE. To relate x2 to the noncentrality parameters λ and λ that shape the PDF fS(s;H0), we again use the standard Pythagorean identity for Hilbert spaces

x2=PW(x)2+PW(x)2,

(S31)

in which the subspace projection terms are defined in equation (S3). Next, we use the definitions of u, λ, and λ to rewrite x2 as

x2=σ12(λ+λ)+(noise term),

(S32)

in which “noise term” is an inner product expression that includes n, and σ12 is the true variance of the target data. The noncentrality parameters in equation (S32) are related by Carmichael and Hartse (2016, their equation A.11)

λ=(1ρ2ρ2)λ,

(S33)

and the expected value of “noise term” in equation (S32) is

E{noise term}=E{2u,n}+E{n2}=σ12N,

(S34)

in which E{u,n}=0, because the noise is zero mean. Finally, we combine the preceding equations (this section), remove the noise-bias term σ12N, and write λ in terms of relative magnitude

λ=ρ2w2σ12·102Δm.

(S35)

Equation (S35) thereby expresses the noncentrality parameter for both the correlation and cone detection statistic in terms of the four fundamental scalars describing the wavefield: the deterministic correlation ρ between the template and target waveforms, the signal energy w2 of the template waveform, the noise variance σ12 of the target data, and the relative magnitude Δm between the template and target source.

For our purposes, equation (S35) parameterizes λ by relative magnitude. In such cases, the term ρ is estimable from equation (12) of the main article, σ12 is estimable from the top term of equation (S11), and w2 is effectively deterministic because correlation detectors generally implement high-signal-to-noise ratio (SNR) templates.

Error Analyses

We identified three potentially significant sources (risks) of error in our detection capability assessments and empirical comparison. The first risk of error is attributable to certain details of template selection. Specifically, waveforms recorded on International Monitoring System (IMS) arrays with large differences in source–receiver separation distance show temporal gaps in start time of the high-SNR portions of the recorded signals and therefore require sample imputation. Zero padding such data to equalize length can lead to several biases, however. For example, supplementing data with a large number of zeros causes the empirical null PDF (histogram) for the correlation detection statistic to become more concentrated around its mean and thereby artificially lowers the detector’s threshold. The statistic histogram then fits the predicted PDF poorly and biases estimates of the degree of freedom parameter N^E, because the correlation variance is reduced by the presence of zeros (equation S1). To mitigate these problems and facilitate template scanning, we therefore abstained from zero-padding the intermediate, noisy portion of our waveform template. Instead, we shifted each seismogram channel by an amount equal to its start time, minus the earliest start time among all template channels (template and target data). We thereby maintained the signal-only length of our template, avoided padding, and kept all data aligned to the same origin time. In addition to preventing estimation bias, this shifting mitigated needless computation of padded data and prevented divide-by-zero errors. We therefore consider our template selection to present a low (direct) risk of error.

Estimation of ρ presents a second potential source of error. This arises largely from ambiguous estimation schemes for waveform SNR, which influence the variability of ρ0 (equation 11 of the main article), which, in turn, inversely scales ρ^ (equation 12 of the main article). We mitigated ambiguity problems by carefully selecting an associated low-variance estimate for SNR, which normalizes ρ. One “common-sense” estimate for SNR is the ratio of an N-point sample variance estimate of the data proceeding the detected signal, divided by an N-point sample variance estimate of data preceding the detected signal (a renormalized short-term average/long-term average [STA/LTA]). It is obvious, however, that any such estimate will be biased by background signals contaminating the data steam, which reduce the resultant SNR estimate for u. It follows that ρ^ will be a biased estimator and give lower-than-true deterministic correlation values. A better approach requires preprocessing target data with a power (STA/LTA) detector, removing samples that exceed a threshold for signal declaration, and then computing the noise variance σ^02 from this remaining data. The quotient

SNR^=u2(N1)σ^02,

(S36)

then gives a reduced-bias SNR estimate. We used this estimator to compute ρ0 as:

ρ^0=SNR^(w)(SNR^(w)+1)SNR^(u)(SNR^(u)+1).

(S37)

We therefore consider our estimates of ρ0 to be robust and unlikely to induce significant uncertainty.

Last, noise variance estimation presents an additional risk of error. As we noted before (see the Estimating Deterministic Decorrelation section in the main article), background seismicity adds signal to the time series and can thereby bias such estimates. We therefore processed our target data in parallel with an STA/LTA detector that operated at a 106 false-alarm-on-noise probability, removed data that exceeded the associated threshold, and used the remaining (almost) signal-free data to estimate noise variance on each IMS channel. This scheme thereby avoided bias from signal. A second form of bias was also present, however. This latter bias originated from the nonuniform number of channel samples processed by the detector. Specifically, we noted above that our template included large temporal gaps over portions of the waveform, and consequently, records from MJAR contribute less to each detection statistic than do longer duration records at MKAR. An alternative noise-variance estimate that accounts for such disparate channel contribution employs the pooled variance σ^P2, given by

σ^P2=k=1L(Nk1)σ^k2NL,

(S38)

in which σ^k2 is the noise-variance estimate from channel k. We performed a subsequent analysis using this estimator and found that it was often smaller than our naive estimation that used equally weighted data. This would have increased the effective size of the noncentrality parameter λ that controls predicted detection power. It may explain why, at times, our observed detection capability exceeded the concurrent predicted detection capability. We therefore suggest that performance discrepancy may generally result from an inconsistency in noise variance estimation.


Figures

Figure S1. Null hypothesis PDFs for three cases of deterministic template-target waveform correlation uncertainty: ρ=1, ρ=0.99, ρ=0.90 (in which NE = 400). The PDFs for r(x) and s(x) equate for identically shaped waveforms as shown by the purple and black curves. The shaded region shows a Monte Carlo simulation of the PDF for s(x) when ρ=0.9

Figure S2. Cone detector thresholds η (equation 21 of the main article) for constant values of PrFA compared against deterministic uncertainty in the template-target waveform cross correlation 1ρ, in which NE=400. The leftmost tangent line to the 106 curve shows a rapid increase in η for small uncertainties in deterministic template-to-target waveform correlation relative to near-linear increases in η for larger uncertainties. ρ=0 corresponds to the correlation detector.


References

Carmichael, J. D. (2013). Melt-triggered seismic response in hydraulically-active polar ice: Observations and methods, Ph.D. Thesis, University of Washington, Seattle, Washington.

Carmichael, J. D., and H. Hartse (2016). Threshold magnitudes for a multichannel correlation detector in background seismicity, Bull. Seismol. Soc. Am. 106, no. 2, 478–498.

Kay, S. M. (1993). Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall PTR, Englewood Cliffs, New Jersey.

Kay, S. M. (1998). Fundamentals of Statistical Signal Processing: Detection Theory, First Ed., Prentice-Hall Inc., Upper Saddle River, New Jersey.

Moreau, J.-J. (1962). Décomposition orthogonale dun espace hilbertien selon deux cônes mutuellement polaires, C.R. Acad. Sci. Paris 255, 238–240 (in French).

Scharf, L. L., and B. Friedlander (1994). Matched subspace detectors, IEEE Trans. Signal Process. 42, no. 8, 2146–2156.

Stark, H., and Y. Yang (1998). Vector Space Projections: A Numerical Approach to Signal and Image Processing, Neural Nets, and Optics, Wiley John and Sons, Incorporated, New York, New York.

Weichecki-Vergara, S., H. L. Gray, and W. A. Woodward (2001). Statistical development in support of CTBT monitoring, Technical Rept. DTRA-TR-00-22, Southern Methodist University.

[ Back ]