
Adaptive STA–LTA with Outlier Statistics

by Joshua P. Jones and Mirko van der Baan

Abstract The most common approach to seismic triggering is to compare short-term averages (STA) with long-term averages (LTA) of transformed amplitudes. In recording environments where this technique is of limited use, hidden Markov models (HMMs) are increasingly used for statistical event detection and classification, but these require training data and are often susceptible to false positive detection errors. In this work, we introduce an adaptive STA–LTA triggering algorithm that uses STA and LTA of state probabilities defined by restricting an HMM to a two-population model of outliers in background noise. Monte Carlo simulations of noise and synthetic events are used to investigate detector sensitivity using statistical properties of latent states. We compare our method with traditional STA–LTA triggering on real data recorded by a 12-station vertical borehole array near Hoadley gas field, Alberta, Canada. These tests suggest that our method is more accurate when dealing with closely spaced events and is less susceptible to false positive detection errors. When existing picking algorithms are adapted for HMM STA–LTA, the result is improvement in total picks, accuracy, and consistency. A narrow range of detection thresholds is optimal for a wide range of signal-to-noise ratios; this suggests HMM STA–LTA may be less sensitive to analyst parameter choices than even traditional STA–LTA.

Seismic Detection Problems

The goal of any seismic detection problem is to determine when signals begin and end using fluctuations in measurable properties of earth tremor (Withers et al., 1998; Tselentis et al., 2012). Adopting nomenclature based on Allen (1982), Withers et al. (1998), and Leonard (2000), and expanding the framework of Beyreuther et al. (2012), seismic detection algorithms can be loosely grouped into four overlapping categories.

1. Event detection: Searching unprocessed records for new events (e.g., Tselentis et al., 2012; Carmichael, 2013; Hammer et al., 2013). We consider triggering algorithms a subclass of this category designed for (possibly real-time) transient detection (Swindell and Snell, 1977; McEvilly and Majer, 1982; Withers et al., 1998).

2. Picking: Estimation of phase arrival times (e.g., Allen, 1978; Baer and Kradolfer, 1987; Earle and Shearer, 1994; Leonard and Kennett, 1999; Tselentis et al., 2012).

3. Classification: Categorization of detected events, typically using seismic attributes (e.g., Ohrnberger, 2001; Scarpetta et al., 2005; Benítez et al., 2007; Beyreuther and Wassermann, 2008; Beyreuther et al., 2008, 2012).

4. Transition tracking: Determining when quasicontinuous signal content changes (e.g., Carniel and Di Cecca, 1999; Carniel et al., 2003; Cabras et al., 2012; Jones et al., 2012a,b).

The most common type of event detection algorithm transforms raw data with a characteristic function (CF) and then compares short-term averages (STA) of current values with long-term averages (LTA) of prior values (Allen, 1978; Withers et al., 1998; Trnkoczy, 2009). Most autopicking algorithms determine onsets either from STA–LTA ratios or STA–LTA differences (e.g., Allen, 1982; Baer and Kradolfer, 1987; Earle and Shearer, 1994) or by autoregressive methods (e.g., Takanami and Kitagawa, 1988; Leonard and Kennett, 1999; Leonard, 2000). Tselentis et al. (2012) give a thorough literature review of common methods for both classes of seismic detection problem.
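As a rough illustration of the STA–LTA ratio described above, the following Python sketch averages a characteristic function over a short window of current values and a long window of prior values. The helper name `sta_lta_ratio`, the window alignment, the toy record, and the threshold of 3.0 are assumptions made for this example; they are not taken from any of the cited detectors.

```python
def sta_lta_ratio(y, n_sta, n_lta):
    """Return STA/LTA for every sample t that has n_lta prior values
    and n_sta current values available (hypothetical helper)."""
    ratios = []
    for t in range(n_lta, len(y) - n_sta + 1):
        sta = sum(y[t:t + n_sta]) / n_sta   # short-term average of current values
        lta = sum(y[t - n_lta:t]) / n_lta   # long-term average of prior values
        ratios.append(sta / lta if lta > 0 else 0.0)
    return ratios


# Toy record: quiet background, a short burst, then quiet again.
x = [0.1, -0.1] * 10 + [1.0, -1.0] * 3 + [0.1, -0.1] * 10
cf = [abs(v) for v in x]                    # characteristic function y_t = |x_t|
ratios = sta_lta_ratio(cf, n_sta=4, n_lta=16)
threshold = 3.0                             # illustrative detection threshold
# ratios[i] belongs to sample t = i + n_lta, hence the offset of 16.
triggered = [i + 16 for i, r in enumerate(ratios) if r > threshold]
```

In this toy case the ratio peaks when the short window first sits entirely inside the burst, and it decays once burst samples enter the long window, which is the "loading" behavior that makes window and threshold choices matter.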

The performance of any seismic detection algorithm depends on suitable choices for a number of variables, which are typically passed as free parameters to the algorithm itself. STA–LTA ratio triggering, for example, is sensitive to detection threshold and requires choosing an STA window sensitive enough to detect transients without multiple triggers from later phase arrivals (Earle and Shearer, 1994; Withers et al., 1998; Trnkoczy, 2009). For a new detector to be of practical use, its improvement on standard STA–LTA processing should compensate for any increased sensitivity to new or existing analyst parameters.

Seismic Detection with Hidden Markov Models

As data sets increase in size, seismic detection increasingly turns to statistical models to automatically identify and characterize events. In the event detection and classification literature, an increasingly common statistical technique uses hidden Markov models (HMMs) to determine likely membership states for data (Baum and Petrie, 1966; Baum et al., 1970). In any HMM, data are assumed to come from latent (hidden) states for which parameters are not precisely known (Rabiner, 1989). In a Bayesian HMM with Gaussian observations, the Baum–Welch algorithm, an example of an expectation-maximization (EM) algorithm, efficiently determines the parameters of each latent state from a training data set (Baum et al., 1970; Sundberg, 1974; Dempster et al., 1977). This Gaussian classifier approach has seen great success in volcano seismology and has been applied to geothermal and microseismic monitoring (Ohrnberger, 2001; Benítez et al., 2007; Beyreuther and Wassermann, 2008, 2011; Beyreuther et al., 2008, 2012; Gutiérrez et al., 2009; Hammer et al., 2012).

Bulletin of the Seismological Society of America, Vol. 105, No. 3, pp. 1606–1618, June 2015, doi: 10.1785/0120140203

Motivation for a New Detector

HMM detectors have the advantage of quantifying event and pick confidence without extra postprocessing (Beyreuther et al., 2008, 2012), and the idea of simultaneous detection and classification is important for real-time hazard assessment. Yet, several key issues prevent the widespread adoption of HMM detectors. First, classifiers that rely on computed seismic attributes (e.g., polarization, sonogram; Kanasewich, 1981; Joswig, 1994) are computationally demanding because the probability density functions (PDFs) for the latent states have dimensionality equal to the number of attributes measured. Second, EM algorithms strongly depend on starting parameters, because they do not necessarily converge to a global maximum in the log-likelihood solution space (Wu, 1983). In geodesy, mathematical and spatial constraints on EM convergence compensate for this shortcoming when classifying Global Positioning System transients (Hudnut et al., 1999; Donnellan et al., 2007; Granat et al., 2007, 2013). Third, algorithms that require training data have limited appeal for temporary experiments, in which it is prohibitively difficult to accumulate a representative data set before in situ analysis becomes critical.

Methods that require model training can be poorly suited for hydraulic fracture monitoring because injection opens many new microcracks, yielding a significant number of uncorrelated event waveforms. It follows from the discussion in, for example, Beyreuther et al. (2012, their section 3.2) that dissimilar waveforms pose a risk of overtraining a sophisticated classification model. The recent work by Bekara and van der Baan (2010) suggests that a two-state detector can circumvent these difficulties with a simplified approach of seeking outliers in a null state of background noise. Using time series of time-averaged power spectral coefficients in this framework led to a robust detector for swell noise (Elboth et al., 2009). Such an approach requires only initial parameter guesses, rather than training data collected in situ. It follows that this framework might be a viable approach for transformed earthquake amplitudes (for which signals lack the spectral peaks of swell noise). However, averaging membership state probabilities for (transformed) amplitudes is exactly STA–LTA recast as a problem of outlier statistics.

Theory

Let $X$ be a seismogram with $L$ data points. Let $Y$ denote $X$ preprocessed with CF $y_t = g(x_t)$. Our HMM posits that $Y$ is described by two PDF states $f_0(y \mid \theta_0)$, $f_1(y \mid \theta_1)$ controlled by parameters $\theta_0$, $\theta_1$. Let $Z$ denote the latent variables that determine the parent state of each observation. Let $f_0(y \mid \theta_0)$ denote the noise population and $f_1(y \mid \theta_1)$ the statistical outliers. Let $\epsilon$ be the fraction of data from the outlier state. Then, the mixture model that describes our HMM is

$$f(y) = \epsilon f_1(y \mid \theta_1) + (1 - \epsilon) f_0(y \mid \theta_0). \tag{1}$$

For simplicity, we will assume states 0 and 1 are the same PDF family but $\theta_0 \neq \theta_1$. We can solve for $\theta_0$, $\theta_1$, $\epsilon$ with an EM algorithm similar to the model learning approach of existing seismic event classifiers (Dempster et al., 1977; Manning and Schütze, 1999; Beyreuther and Wassermann, 2008; Beyreuther et al., 2008, 2012; Gutiérrez et al., 2009; Bekara and van der Baan, 2010; Hammer et al., 2012). The exact algorithm, described in Appendix A, is adapted from Bekara and van der Baan (2010) to work on $Y$ rather than a time series of power spectral coefficients.
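To make the fitting step concrete, the sketch below applies a textbook EM iteration to the two-state mixture of equation (1). It is not the algorithm of Appendix A. It assumes exponential states with density $f_i(y \mid \theta_i) = \theta_i^{-1} e^{-y/\theta_i}$, starts $\theta_1$ from the mean of the largest fraction $\epsilon^{(0)}$ of values, and starts $\theta_0$ from the mean of the remainder. The names `exp_pdf` and `fit_two_state`, the fixed iteration count, and the synthetic data are all hypothetical.

```python
import math
import random


def exp_pdf(y, theta):
    """Exponential density with scale theta (assumed parameterization)."""
    return math.exp(-y / theta) / theta


def fit_two_state(y, eps0=0.1, n_iter=50):
    """Generic EM fit of f(y) = eps*f1(y|theta1) + (1 - eps)*f0(y|theta0)."""
    ys = sorted(y)
    k = max(1, int(round(eps0 * len(ys))))
    theta1 = sum(ys[-k:]) / k                 # start from the largest values
    theta0 = sum(ys[:-k]) / (len(ys) - k)     # assumed start: mean of the rest
    eps = eps0
    p = []
    for _ in range(n_iter):
        # E-step: posterior probability that each y_t belongs to the outlier state.
        p = []
        for v in y:
            a = eps * exp_pdf(v, theta1)
            b = (1.0 - eps) * exp_pdf(v, theta0)
            p.append(a / (a + b))
        # M-step: probability-weighted maximum likelihood updates.
        total = sum(p)
        eps = total / len(y)
        theta1 = sum(w * v for w, v in zip(p, y)) / total
        theta0 = sum((1.0 - w) * v for w, v in zip(p, y)) / (len(y) - total)
    # p holds the probabilities from the last E-step.
    return theta0, theta1, eps, p


# Synthetic test data: 900 "noise" values (scale 1) and 100 "outliers" (scale 10).
random.seed(1)
y = [random.expovariate(1.0) for _ in range(900)]
y += [random.expovariate(0.1) for _ in range(100)]
theta0, theta1, eps, p = fit_two_state(y)
```

Only initial guesses are needed, no training set, which is the property the text relies on. Like any EM fit, the result can depend on those guesses.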

Once model parameters are learned, Bayes' theorem states that the conditional distribution of $Z$ is proportional to the height of the normal density, weighted by the fraction of $Y$ in each state (Papoulis and Pillai, 2002). This gives an explicit expression for the probability that each $y_t$ is an outlier (i.e., from state 1):

$$p_t = \frac{\epsilon f_1(y_t \mid \theta_1)}{\epsilon f_1(y_t \mid \theta_1) + (1 - \epsilon) f_0(y_t \mid \theta_0)}. \tag{2}$$

We will use STA and LTA of $p_t$ to form an STA–LTA-type event detector and autopicker.
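Equation (2) can be evaluated directly once $\theta_0$, $\theta_1$, and $\epsilon$ are known. The short sketch below does so for exponential states; the density parameterization (scale $\theta$), the function names, and the numerical values are assumptions chosen for illustration.

```python
import math


def exp_pdf(y, theta):
    """Exponential density with scale theta (assumed parameterization)."""
    return math.exp(-y / theta) / theta


def outlier_probability(y_t, theta0, theta1, eps):
    """Probability that y_t came from the outlier state, as in equation (2)."""
    a = eps * exp_pdf(y_t, theta1)            # weighted outlier density
    b = (1.0 - eps) * exp_pdf(y_t, theta0)    # weighted noise density
    return a / (a + b)


theta0, theta1, eps = 1.0, 4.0, 0.1           # illustrative parameter values
# Value of y at which the two exponential densities are equal.
y_cross = math.log(theta1 / theta0) / (1.0 / theta0 - 1.0 / theta1)
p_cross = outlier_probability(y_cross, theta0, theta1, eps)
p_small = outlier_probability(0.0, theta0, theta1, eps)
p_large = outlier_probability(20.0, theta0, theta1, eps)
```

Where the two densities are equal they cancel in equation (2), so `p_cross` equals `eps`. Small amplitudes map to probabilities near 0 and large amplitudes to probabilities near 1.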

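The LTA of outlier probabilities is $\epsilon$, which follows from equations (1) and (2): at the $y_t$ value in which $f_0(y \mid \theta_0) = f_1(y \mid \theta_1)$, $p_t = \epsilon$. The STA is $p_t$ averaged over an $N < L$ point window. The variable substitutions that replace unmodified (superscript $(0)$) STA and LTA with HMM (superscript $(H)$) STA and LTA are then, explicitly,

$$\mathrm{STA}^{(0)} = \frac{1}{N}\sum_{n=t}^{t+N} y_n \;\rightarrow\; \mathrm{STA}^{(H)} = \frac{1}{N}\sum_{n=t}^{t+N} p_n;$$

$$\mathrm{LTA}^{(0)} = \frac{1}{L}\sum_{n=t-L}^{t} y_n \;\rightarrow\; \mathrm{LTA}^{(H)} = \frac{1}{L}\sum_{n=0}^{L-1} p_n = \epsilon. \tag{3}$$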

Formulas for HMM STA–LTA event detection follow from the above substitutions. For example, the most generic (subtractive) STA–LTA formula

$$\mathrm{Tr}_t = \frac{1}{N}\sum_{n=1}^{N} y_n - \frac{\tau}{L}\sum_{n=-L}^{0} y_n \tag{4}$$

is equivalent to the HMM STA–LTA formula

$$\rho_t = \frac{1}{N}\sum_{n=t}^{t+N} p_n - \tau\epsilon. \tag{5}$$

In both equations, $N$ is the short-term window length, the first term is the STA, and the second is the LTA multiplied by the detection threshold ($\tau$). Other formulas can be adapted easily by making the appropriate parameter substitutions.
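A minimal sketch of equation (5) follows. It averages outlier probabilities over an $N$-point window and subtracts $\tau\epsilon$, treating positive values as triggers. The function name `hmm_sta_lta`, the exact window indexing, and the toy probability series are assumptions for this example.

```python
def hmm_sta_lta(p, n_sta, eps, tau):
    """Detection quantity rho_t: STA of outlier probabilities minus tau * eps."""
    return [sum(p[t:t + n_sta]) / n_sta - tau * eps
            for t in range(len(p) - n_sta + 1)]


# Toy outlier probabilities: background near 0.05 with a 5-sample burst of 0.9.
p = [0.05] * 20 + [0.9] * 5 + [0.05] * 20
eps = sum(p) / len(p)             # LTA of outlier probabilities over the record
tau = 1.2                         # illustrative detection threshold
rho = hmm_sta_lta(p, n_sta=5, eps=eps, tau=tau)
triggered = [t for t, r in enumerate(rho) if r > 0]
```

Because every $p_n$ lies between 0 and 1, $\rho_t$ is bounded between $-\tau\epsilon$ and $1 - \tau\epsilon$ regardless of signal amplitude, unlike an amplitude-based trigger.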

Detector Sensitivity

We will abbreviate $f_i(y \mid \theta_i)$ as $f_i(y)$ hereafter for compactness.

An advantage of HMMs is that we can quantify detector susceptibility to spurious triggers using Monte Carlo simulations of (single-state and signal-free) noise by evaluating it with the (intentionally bad) two-state model in equation (1). This will establish empirical values for threshold $\tau$ and (differences in) model parameters $\Delta\theta$, above which we expect no spurious triggers in background noise using CF $y_t = g(x_t)$ and PDF $f_i(y)$ at confidence level $c\sigma$.

We will use weighted percent error measurements to quantify these notions of detector sensitivity. For a noise-free seismogram $X$, we expect $\rho_t = 0$. In addition, we expect $\theta_1 = \theta_0 = \theta_c$, in which the latter is calculated directly from $Y$. So, weighting each $\theta_i$ by its corresponding state population ($\epsilon$ or $1 - \epsilon$), we define the decimal percent error in $\theta$ for one trial, and the root mean square error for all $M$ trials, as

$$\delta\theta^2 \equiv \frac{\epsilon(\theta_1 - \theta_c)^2 + (1 - \epsilon)(\theta_0 - \theta_c)^2}{\theta_c^2}, \qquad \Delta\theta = \left(\sum_{m=1}^{M} \frac{\delta\theta_m^2}{M}\right)^{1/2}. \tag{6}$$

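Estimating error in $\rho$ (equation 5) is more straightforward because $-\epsilon \leq \rho \leq 1 - \epsilon$:

$$\delta\rho^2 = \frac{1}{T}\sum_{t=1}^{T} (\rho_t + \epsilon)^2, \qquad \Delta\rho = \left(\sum_{m=1}^{M} \frac{\delta\rho^2_{m,t}}{MT}\right)^{1/2}. \tag{7}$$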

Using these values, we can establish empirical minimum detector thresholds from statistics of $\delta\rho$ and $\delta\theta$. Assuming $\delta\rho$ is normally distributed, we expect no spurious detections of pure noise at the $c\sigma$ level (for any integer $c$) when $\tau > 1 + \mu(\delta\rho) + c\sigma(\delta\rho)$. Similar reasoning establishes a minimum $\delta\theta_i$ below which we say the state parameters do not differ at the $c\sigma$ level.
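The threshold rule above is simple arithmetic on per-trial errors. A minimal sketch, assuming the population standard deviation is the intended $\sigma$ and using made-up $\delta\rho$ values rather than results from the paper:

```python
import statistics


def min_threshold(delta_rho, c=3):
    """Smallest tau satisfying tau > 1 + mean(delta_rho) + c * std(delta_rho)."""
    mu = statistics.mean(delta_rho)
    sigma = statistics.pstdev(delta_rho)   # population std; an assumption here
    return 1.0 + mu + c * sigma


# Hypothetical per-trial errors from a handful of noise-only simulations.
delta_rho = [0.08, 0.10, 0.09, 0.11, 0.07]
tau_min = min_threshold(delta_rho, c=3)
```

With these illustrative numbers the mean is 0.09 and the standard deviation is about 0.014, so `tau_min` is roughly 1.13.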

For each CF + PDF combination, we compute $M = 10^4$ Monte Carlo simulations of $K = 15$ channels of $L = 2.0 \times 10^3$ noise coefficients. The model in equation (1) is applied to each channel independently for each trial, using $\epsilon^{(0)} = 0.1$ and $\theta_1^{(0)}$ computed from the largest $\epsilon^{(0)}$ coefficients in $Y$. Figure 1 illustrates how the testing process generates noise and outlier probabilities $p_t$, and creates the detection quantity $\rho_t$ from equation (5). Because not all seismic noise is Gaussian distributed (e.g., Peterson, 1993; Carmichael, 2013), we test each CF + PDF combination on Gaussian, Laplace, and red Laplace noise.

Because existing STA–LTA methods use a number of different CFs, we test a number of CF + PDF combinations for detector feasibility (Table 1). Latent state PDFs are chosen so that a single parameter $\theta_i$ controls each state and the range of $f_i(y)$ matches the domain of $y_t$. Appendix B gives exact expressions for each PDF and maximum likelihood estimators (MLEs) of $\theta_i$.

Figure 1. Illustrative sample from Monte Carlo simulations for sensitivity tests: Laplace noise with characteristic function (CF) $y_t = |x_t|$ and exponential $f_i(y)$. The left plots show $x_t$, $p_t$, $\rho_{k,t}$, $\bar{\rho}_t$, whereas the right plots show normalized histograms of $\rho_{k,t}$, $\bar{\rho}_t$, $\epsilon_k$, $\Delta\theta$.

The combination $y_t = E(x_t)$ (signal envelope) with exponential $f_i(y)$ returns the lowest values for both $\Delta\rho$ and $\Delta\theta$, making it the best choice for the noise types evaluated. The CF $y_t = |x_t|$ also performs well with exponential states, and $y_t = E(x_t)$ performs comparably with Rayleigh distributed $f_i(y)$. The only other comparable performance uses CF $y_t = x_t$ with zero-mean Gaussian states $f_i(y)$ on Gaussian noise. We remark that all four of these choices appear viable; different combinations might be better suited for different recording environments.

Detection Algorithm

Figure 2 illustrates the workflow of adaptive STA–LTA with a two-state HMM. Once we evaluate the model in step 5, we measure differences between $\theta_i$ for the latent states using

$$\delta\theta = \sum_{i=1}^{2} \frac{(\theta_i - \bar{\theta}_w)^2}{\bar{\theta}_w^2}, \tag{8}$$

in which $\bar{\theta}_w = \epsilon\theta_1 + (1 - \epsilon)\theta_0$, the weighted mean. We only scan for events in channels in which $\delta\theta$ exceeds the highest $3\sigma$ threshold in Table 2 for the CF and PDF used.

In keeping with previous triggering algorithms, we declare an event in progress when the detection quantity exceeds 0. For single-channel data, the detection quantity is $\rho_t$ in equation (5). For a $K$-channel array, $j$ denotes the index of a triggered channel and $J$ denotes the total number of triggered channels ($J \leq K$). If $\delta\theta$ meets the conditions described above, then our detection quantity is the multichannel average of $\rho_{j,t}$, that is,

$$\bar{\rho}_t = \sum_{j=1}^{J} \frac{\rho_{j,t}}{J}. \tag{9}$$

Multichannel averaging of STA–LTA with (transformed) amplitudes is extremely difficult due to susceptibility to false triggers via single-channel spikes. Multichannel averaging is possible here, however, because each $p_{j,t} \leq 1$. Each channel thus contributes at most $(1 - \tau\epsilon_j)/J$ to $\bar{\rho}_t$, making compensatory threshold adjustment a matter of simple arithmetic. To minimize susceptibility to single-station transients and local glitches, we only need to impose the requirement that the total triggered channels $J$ exceeds the number of channels per station (typically 3).
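The averaging in equation (9) and the channel-count requirement can be sketched as follows. This example interprets a "triggered channel" as one whose $\delta\theta$ exceeds the tabulated minimum, which is an assumption about the workflow. The function name, argument layout, and toy values are hypothetical.

```python
def multichannel_average(rho, delta_theta, delta_theta_min, channels_per_station=3):
    """Average rho over channels that pass the delta-theta test (equation 9).

    rho:          list of per-channel rho series, all of equal length
    delta_theta:  one delta-theta value per channel
    Returns None when too few channels qualify to rule out a local glitch.
    """
    keep = [j for j, d in enumerate(delta_theta) if d > delta_theta_min]
    if len(keep) <= channels_per_station:
        return None
    n = len(rho[0])
    return [sum(rho[j][t] for j in keep) / len(keep) for t in range(n)]


# Five channels; the last fails the delta-theta test and is excluded.
rho = [[-0.1, 0.6, -0.1]] * 4 + [[0.9, 0.9, 0.9]]
delta_theta = [0.5, 0.4, 0.3, 0.2, 0.01]
rho_bar = multichannel_average(rho, delta_theta, delta_theta_min=0.06)
too_few = multichannel_average(rho[:3], delta_theta[:3], delta_theta_min=0.06)
```

The bounded range of each channel's contribution is what keeps a single spiky channel from dominating the average.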

Two postprocessing steps are possible without additional calculations. First, $\epsilon_j$ and $\theta_{i,j}$ become estimates of $\theta^{(0)}_{i,j}$, $\epsilon^{(0)}_j$ for the next length-$L$ data buffer to be examined. This requires setting a small absolute minimum (e.g., $\epsilon^{(0)}_j = \max(\epsilon_j, 0.01)$) to prevent $\epsilon^{(0)}_j \rightarrow 0$ during long periods with no signal. These a posteriori parameter updates are similar to how STA–LTA for amplitude data updates LTA.

Second, we can assign a detection quality $q_j$ $(0 \leq q_j \leq 1)$ to each triggered channel $j$ using only arithmetic and parameters already computed:

$$q_j = \max_t \left( \frac{\rho_{j,t}}{1 - \tau\epsilon_j} \right). \tag{10}$$

From this, we can finally assign an event quality $q_e$ to each triggered event (multichannel detection) by simply taking

$$q_e = \max_{j \in J} (q_j). \tag{11}$$
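Equations (10) and (11) amount to rescaling each channel's peak $\rho_{j,t}$ by its largest attainable value. A small sketch, with hypothetical function names and illustrative inputs:

```python
def channel_quality(rho_j, eps_j, tau):
    """Detection quality q_j of one triggered channel (equation 10)."""
    return max(rho_j) / (1.0 - tau * eps_j)


def event_quality(qualities):
    """Event quality q_e: the best channel quality (equation 11)."""
    return max(qualities)


tau = 1.2
# Two triggered channels: (rho series, final outlier fraction eps_j).
channels = [([-0.12, 0.44, 0.20], 0.10),
            ([-0.06, 0.20, 0.10], 0.05)]
q = [channel_quality(rho_j, eps_j, tau) for rho_j, eps_j in channels]
q_e = event_quality(q)
```

For the first channel the denominator is $1 - 1.2 \times 0.1 = 0.88$, so a peak of 0.44 gives a quality of 0.5.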

Picking with State Probabilities

Table 1. Characteristic Functions (CFs) and Probability Density Functions (PDFs) Used in Detector Sensitivity Tests

| CF, $y_t$ | PDF (1), $f_i(y)$ | Example Detectors Using this CF |
|---|---|---|
| $x_t$ | zga | Freiberger (1963) |
| $\lvert x_t \rvert$ | exp, ray | Allen (1978) and Swindell and Snell (1977) |
| $x_t^2$ | exp, ray | McEvilly and Majer (1982) |
| $E(x_t)$ | exp, ray | Earle and Shearer (1994) and Withers et al. (1998) |

$y_t$ is the CF used to transform raw seismic data, $f_i(y)$ is the latent state PDF, zga is the zero-mean Gaussian distribution, exp is exponential, ray is Rayleigh. $E(\,)$ denotes the seismic envelope (Kanasewich, 1981).

Figure 2. Event detection workflow.

Existing pickers that use STA–LTA ratios are easily adapted to work with HMM STA–LTA. As an example, applying the variable substitutions in equation (3) to the picker of Earle and Shearer (1994) yields the workflow of Figure 3. After an event trigger ends, $\rho_{j,t}$ is calculated at every point in the trigger window. Channels with $\max(\rho_{j,t}) > 0$ are smoothed with a 30-point Hanning filter and assigned a pick quality $q_p$ from the first local maximum in $\rho_{j,t}$. Because a Hanning window of half-width $w$ has implicit minimum pick uncertainty of $\pm 0.5w$, and because our detector fits the model in equation (1) with exponentially distributed $f_i(y)$, we assign pick uncertainties with the empirical formula

$$\delta t = \max\left(0.5w, \exp(q_p^{-1})\right). \tag{12}$$

Setting the parenthetical terms equal, picks with $q \geq (\log(0.5w) + 1)^{-1}$ have $\delta t = 0.5w$, giving minimum $\delta t \approx 7$ samples for $q_p \geq 0.5$.
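Equation (12) is easy to evaluate directly. The sketch below assumes $w$ is the Hanning half-width in samples and takes $w = 15$ (half of a 30-point filter) purely for illustration; the function name is hypothetical.

```python
import math


def pick_uncertainty(q_p, w):
    """Pick uncertainty in samples from equation (12); q_p must be positive."""
    return max(0.5 * w, math.exp(1.0 / q_p))


w = 15                                   # assumed half-width, in samples
dt_good = pick_uncertainty(0.5, w)       # exp(2) is below 0.5 * w, so the floor applies
dt_poor = pick_uncertainty(0.25, w)      # exp(4) dominates for a low-quality pick
```

Under these assumptions a pick with $q_p = 0.5$ sits at the floor of $0.5w = 7.5$ samples, while lower qualities grow the uncertainty exponentially.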

Dependence on Initial Conditions

We have already remarked that the solution to any EM algorithm depends on its initial values and is not guaranteed to reach a global maximum in the solution space (Wu, 1983; Granat et al., 2007, 2013). Because one aim of this work is to avoid the need for large training data sets, we must investigate how initial parameter guesses $\theta^{(0)}_i$, $\epsilon^{(0)}$ affect solution quality.

For this test, we will track event triggers in a controlled setting by applying the workflow in Figure 2 to single-channel synthetic seismograms generated with a signal plus additive Gaussian noise. Testing parameters are given in Table 3. Each test is repeated for each of two synthetic events: an impulsive event created from a db64 scaling filter (Daubechies, 1992) and an emergent event created from a Tukey windowed chirp function (Figs. 4 and 5). For each pair of signal-to-noise ratio (SNR) and $\tau$, each trial uses a new realization of additive noise, on which we use our method to check for triggers by evaluating equation (1) once per $\epsilon^{(0)}$ value. Drawing on the results of the last section, we use CF $y_t = E(x_t)$ with exponential $f_i(y)$, with $\theta^{(0)}_1$ computed from the largest $\epsilon^{(0)}$ of $Y$. The first trigger containing the synthetic event's center of energy is the only permitted success. All other triggers are false positive ($f_+$) failures. Figures 4 and 5 describe the process for each signal type.

As a first illustration of how initial conditions affect our detector, we use formulas (4) and (5) to compare triggering via our method to traditional STA–LTA. Figures 6 and 7 show the resultant trade-off curves between false triggers and missed events, using the synthetic signals of Figures 4a and 5a, respectively. At our method's optimal value, it outperforms traditional STA–LTA at any $\epsilon^{(0)}$ tested, for both synthetic signal types, with fewer $f_+$ errors.

Table 2. Intrinsic Error and Minimum Threshold for Hidden Markov Model (HMM) Short-Term Average (STA)–Long-Term Average (LTA) Detectors

| $y_t$ | $f_i(y)$ | $X$ | $\Delta\rho$ | $\Delta\theta$ | $\tau_{\min}$ | $\delta\theta_{\min}$ |
|---|---|---|---|---|---|---|
| $\lvert x_t \rvert$ | exp | G | 0.09 | 0.16 | 1.11 | 0.19 |
| $\lvert x_t \rvert$ | exp | L | 0.10 | 0.42 | 1.13 | 0.50 |
| $\lvert x_t \rvert$ | exp | R | 0.09 | 0.18 | 1.15 | 0.26 |
| $\lvert x_t \rvert$ | ray | G | 0.13 | 0.47 | 1.20 | 0.53 |
| $\lvert x_t \rvert$ | ray | L | 0.16 | 0.75 | 1.25 | 0.81 |
| $\lvert x_t \rvert$ | ray | R | 0.15 | 0.50 | 1.31 | 0.62 |
| $E(x_t)$ | exp | G | 0.09 | 0.05 | 1.10 | 0.06 |
| $E(x_t)$ | exp | L | 0.09 | 0.12 | 1.11 | 0.15 |
| $E(x_t)$ | exp | R | 0.09 | 0.06 | 1.12 | 0.09 |
| $E(x_t)$ | ray | G | 0.09 | 0.19 | 1.13 | 0.22 |
| $E(x_t)$ | ray | L | 0.11 | 0.37 | 1.19 | 0.43 |
| $E(x_t)$ | ray | R | 0.10 | 0.22 | 1.20 | 0.33 |
| $x_t$ | zga | G | 0.07 | 0.17 | 1.08 | 0.28 |
| $x_t$ | zga | L | 0.26 | 1.16 | 1.35 | 1.27 |
| $x_t$ | zga | R | 0.12 | 0.29 | 1.34 | 1.03 |
| $x_t^2$ | exp | G | 0.13 | 0.91 | 1.19 | 1.02 |
| $x_t^2$ | exp | L | 0.14 | 1.46 | 1.22 | 1.60 |
| $x_t^2$ | exp | R | 0.14 | 0.99 | 1.31 | 1.24 |
| $x_t^2$ | ray | G | 0.20 | 1.11 | 1.29 | 1.18 |
| $x_t^2$ | ray | L | 0.20 | 1.46 | 1.27 | 1.54 |
| $x_t^2$ | ray | R | 0.21 | 1.15 | 1.43 | 1.31 |

$y_t$ is the CF used to transform raw seismic data, $f_i(y)$ is the latent state PDF fitted to $y_t$. $E(\,)$ denotes the signal envelope, exp denotes the exponential distribution, zga denotes the zero-mean Gaussian distribution, ray denotes the Rayleigh distribution. The third column gives the noise type: G is Gaussian, L is Laplace, and R is red Laplace noise. In columns 4 and 5, $\Delta\rho$ and $\Delta\theta$ are the root mean square error in $\rho$ and $\theta$, respectively, for all $M$ trials. $\tau_{\min}$ and $\delta\theta_{\min}$ are the minimum detection threshold and minimum decimal percent difference in $\theta_i$, respectively, that prevent spurious detections at the $3\sigma$ level.

Figure 3. Autopicking workflow, adapted from Earle and Shearer (1994) for the event detection workflow of Figure 2.

As a second illustration of how initial conditions affect our detector, Figures 8 and 9 show the final outlier state fraction $\epsilon^{(f)}$ for both synthetic events, for all $\epsilon^{(0)}$ values tested. Averaging $p_t$ via equation (5) compensates for poor $\epsilon^{(0)}$ guesses at any SNR in which these signals are detectable, far lower than where $\epsilon^{(f)}$ approaches the true value of 0.12.

Application to Borehole Array Data

Microseismic monitoring is a challenging environment for STA–LTA triggering, because events can be closely spaced with site noise and tremor superposed on transients (St-Onge et al., 2013; Tary et al., 2014). In 2012–2013, the Microseismic Research Consortium deployed a vertical borehole array to monitor an open-hole multistage hydraulic fracture treatment at Hoadley gas field, central Alberta, Canada (see Data and Resources). Here, we present detection tests using raw data records from the early postinjection phase of the Hoadley experiment.

Figure 4. Impulsive synthetic event and illustrative example of hidden Markov model (HMM) and traditional short-term averages (STA)–long-term averages (LTA), snapshotted at signal-to-noise ratio (SNR) = 4 dB, $\tau = 1.2$. The topmost plot shows noise-free signal constructed from a db64 scaling filter (Daubechies, 1992). Second through fourth plots show detection on the signal with additive Gaussian noise. The light box corresponds to the HMM STA–LTA trigger window at $\epsilon^{(0)} = 0.5$, and the dark boxes are classical STA–LTA trigger windows. The bottom two plots show triggering calculations using $\rho$ in equation (5) and $\mathrm{Tr}_t$ in equation (4). The plot times are shifted to match the center of each detection window, rather than the left edge (as in text formulas).

Figure 5. As in Figure 4, using an emergent synthetic signal generated by a Tukey windowed chirp function. Snapshotting is identically done at SNR = 4 dB and $\tau = 1.2$.

Table 3. Parameters for Controlled Detection Tests Using Synthetic Data

| Parameter | HMM | Traditional |
|---|---|---|
| LTA window length | 2000 | 400 |
| $\epsilon^{(0)}$ | 0.05: 0.05: 0.25 | – |
| Signal length | 240 | 240 |
| Record length | 2000 | 2000 |
| STA window length | 120 | 120 |
| STA refresh | 60 | 60 |
| SNR (dB) | −10: 1.0: 10 | −10: 1.0: 10 |
| Threshold ($\tau$) | 1.0: 0.1: 3.0 | 1.0: 0.1: 3.0 |
| Number of simulations | 100 per SNR, $\tau$ pair | 100 per SNR, $\tau$ pair |

All lengths are in samples. HMM denotes adaptive HMM STA–LTA with our method. $\epsilon^{(0)}$ is the initial outlier fraction. SNR is the signal-to-noise ratio.

Benchmark Detection

To compare our method with traditional STA–LTA event detection, we use formulas (4) and (5) to compare each detector's ability to trigger around benchmark times in the raw records. Detector parameters are given in Table 4. Parameters for traditional STA–LTA match the original processing and are typical for microseismic monitoring (Eaton et al., 2014). Our test uses a total of 129 benchmarks determined by analyst inspection. Benchmark times correspond to the peak energy in each event (typically the S phase). As shown by the illustrative example in Figure 10, our method performs similarly to traditional STA–LTA but more accurately triggers when events are closely spaced in time. This is presumably because of what Earle and Shearer (1994) term loading up of LTA by energetic transients, which results in lowered sensitivity to weak arrivals following strong, impulsive ones. This effect can be seen in the lowermost plots of Figures 4 and 5.

By repeating the detection test over the tabulated range of $\tau$ and $\epsilon^{(0)}$ values, we establish empirical $f_+/f_-$ trade-off curves for this data set in Figure 11. Our method's robustness to $\epsilon^{(0)}$ and low optimum threshold (1.2–1.4) are consistent with the detector sensitivity tests of Table 2. Quite remarkably, in the $1.2 \leq \tau \leq 1.6$ range, for all our $\epsilon^{(0)}$ values tested (0.03–0.15), the proposed method outperforms traditional STA–LTA with the latter set to any detection threshold.

Figure 6. Failure rate versus threshold ($\tau$) for detection trials over all SNRs tested (−10 to 10 dB in 1 dB increments) with the impulsive signal in Figure 4, using HMM (solid lines) and classical (dashed lines) detectors. The thick lines show total failures, and the thin lines show false triggers.

Figure 7. As in Figure 6, for detection trials with the emergent signal in Figure 5.

Figure 8. Our expectation-maximization (EM) algorithm's sensitivity to initial conditions is seen in the final outlier fraction $\epsilon$ as a function of SNR for various initial values $\epsilon^{(0)}$. Computations use results of Monte Carlo simulations using the impulsive synthetic of Figure 4. The bars correspond to $1\sigma$ error.

Figure 9. As in Figure 8, for detection trials with the emergent signal in Figure 5.

Automatic Onset Picks

As stated by Leonard (2000), reliable and accurate automatic onset picks are extremely important for data analysis, particularly when unprocessed microseismic data sets now routinely reach terabytes to tens of terabytes in size. We now investigate whether our adapted autopicker improves pick accuracy and consistency by testing against manually picked P arrivals. The benchmark data for this test are 50 high-SNR microearthquakes picked from single-component traces on 25 records, a subset of the data used in the event detection test of the last section. Picks are chosen for low analyst uncertainty ($\pm 10$ ms). Full testing parameters appear in Table 4; minimum interevent offset is higher than for triggering benchmark tests, because the detection benchmarks are more closely spaced.

Figure 10. A 5 s data sample with seven benchmarks is scanned with adaptive HMM (solid lines) and normal (dashed lines) STA–LTA. The upper plots show normalized seismograms from the station with the highest average quality q_j for all HMM triggers. The first plot overlay shows trigger windows with our method; the second plot overlay shows trigger windows for traditional STA–LTA. Benchmark times appear as vertical lines on the third plot. The fourth plot shows rescaled channel count for STA/LTA (τ = 1.5): c ≥ 0 indicates an event and rescaling gives c = 1 when all channels trigger. The fifth axis shows HMM triggering parameter ρ̄_t and event quality q_e = max(q_j). Full detector parameters are given in Table 4.

Table 4. Parameters for Benchmark Tests and Picking

                                      HMM                   Traditional
Parameter Name             Variable   Benchmarks  Picking   Benchmarks  Picking
Pretrigger window, ms      -          200         500       200         500
Post-trigger window, ms    -          200         1000      200         1000
Minimum event offset, ms   -          200         500       200         500
Latent state PDF           f_i(y)     Exp                   -
Channels to trigger        N_c        -                     10
LTA window, ms             L          500                   100
CF                         y_t        E(x) (both detectors)
Threshold                  τ          1.0:0.1:4.0 (both detectors)
STA window (refresh), ms   N          30 (15) (both detectors)

HMM denotes adaptive HMM STA–LTA with our method. All window lengths are in milliseconds, and all data are sampled at 4000 Hz. E( ) denotes the signal envelope (Kanasewich, 1981). The second column denotes the variable used for each parameter in text.

In this test, we use the autopicker of Earle and Shearer (1994) on two groups of time series: time-averaged state probabilities computed by our method from equation (5) and traditional STA–LTA ratios. Figure 12 shows the time difference δt_p between manual and automatic picks in the threshold 1.0–4.0 range. This yields a remarkable result. If we consider automated picks accurate when |μ(δt_p)| ≤ 0.015 s (half a short window), and consistent when σ(δt_p) ≤ 0.03 s (one short window), then our method produces accurate, consistent picks in the τ = 1.7–3.7 range. The original picking method only does so at τ ≥ 3.6. Our method yields more picks for τ = 1.7–2.0 than the original method at τ ≥ 3.6. Thus, we see considerable improvement in the trade-offs among accuracy, consistency, and total picks.

An illustrative example of the spread in automatic picks is shown in Figure 13. This comparison sets automatic picks using the lowest thresholds from Figure 12 that meet our criteria for accuracy and consistency (τ = 3.6 with traditional STA–LTA, τ = 1.7 with our method). Picking with ordinary STA–LTA yields 172 automatic picks with μ(δt_p) = −7.5 and σ(δt_p) = 21.0 ms. Using the modified autopicker with outlier probabilities yields 223 picks with q_e ≥ 0.1, μ(δt_p) = −11.4, and σ(δt_p) = 23.1 ms. Thus, with no statistically significant loss in accuracy, our method yields ∼30% more automated picks than when the picker is applied to traditional STA–LTA.
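The quoted summary statistics can be checked with a short calculation. The sketch below applies the accuracy and consistency criteria (half and one 30 ms short window) and the pick-yield ratio to the numbers in the text. The Welch two-sample statistic is our assumption for illustration only, because the text does not say which significance test was used; it further assumes independent, roughly normal pick errors. The function names are hypothetical.

```python
import math


def meets_criteria(mean_ms, std_ms, short_window_ms=30.0):
    """Return (accurate, consistent) for one set of pick errors in ms."""
    accurate = abs(mean_ms) <= 0.5 * short_window_ms  # |mu| <= half a short window
    consistent = std_ms <= short_window_ms            # sigma <= one short window
    return accurate, consistent


def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Welch t statistic for the difference of two means (assumed test)."""
    standard_error = math.sqrt(std_a ** 2 / n_a + std_b ** 2 / n_b)
    return (mean_b - mean_a) / standard_error


# Summary statistics quoted in the text (times in ms).
traditional = {"mean": -7.5, "std": 21.0, "n": 172}
hmm = {"mean": -11.4, "std": 23.1, "n": 223}

yield_gain = hmm["n"] / traditional["n"] - 1.0
t_stat = welch_t(traditional["mean"], traditional["std"], traditional["n"],
                 hmm["mean"], hmm["std"], hmm["n"])

print(meets_criteria(traditional["mean"], traditional["std"]))
print(meets_criteria(hmm["mean"], hmm["std"]))
print(f"yield gain = {yield_gain:.3f}, Welch t = {t_stat:.2f}")
```

Under these assumptions both pick sets satisfy the two criteria, the yield gain is close to 30%, and the magnitude of the t statistic stays below the usual two-sided 5% critical value of about 1.96.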

Full Example: Detection and Picking

An example of detection for a 1 min window of unprocessed data appears in Figure 14. Events are detected by scanning with the detection test parameters in Table 4. The closely spaced events of Figure 14a are commonly seen in microseismic monitoring; traditional STA–LTA is often modified to improve triggering accuracy when interevent times are low (Trnkoczy, 2009). We use ϵ^(0) = 0.10, τ = 1.5 for initial event detection and τ = 2.0 for picking. Traditional STA–LTA yields 15 triggers with 0 f+ errors with τ = 2.0 for detection and τ = 3.6 for picking. Our method yields 19 triggers with 0 f+ errors. Three small events are missed with both techniques. The original processing (at τ = 3.0) had only 12 triggers in this time window (D. Eaton, personal comm., 2014).

Figure 11. Empirical f+/f− error trade-off curves as a function of τ, using HMM (black) and classical (gray) STA–LTA; 149 total benchmarks are used in this test. The thin solid lines denote f+ failures. In legend, τ_0 denotes the optimal threshold value for each curve.

Figure 12. Comparison of picking error and number of picks versus detection threshold. ES in legend denotes the method of Earle and Shearer (1994) and HM the modified HMM version of this work (Fig. 3). The solid lines correspond to differences between automatic and analyst picks (scale at left), circles indicate mean, and bars indicate 1σ error. The dashed lines are total number of automatic phase picks at each threshold (scale at right).

Figure 13. Histogram showing errors in onset pick times relative to analyst picks. Detector parameters match Table 4 with ϵ^(0) = 0.1, τ = 1.7 for our method and τ = 3.0 for traditional STA–LTA. The dark gray bars indicate picks with STA–LTA event detections using Earle and Shearer (1994). The light gray bars indicate picks with event detections using our method and the adapted workflow (Fig. 3).

Sample picks using the modified Earle and Shearer method (Fig. 3) are shown in Figure 15 for a high-quality event. Pick times are consistent with one another, and visual inspection suggests high accuracy. Consistent with Figure 13, accuracy appears higher with our method. This result is quite remarkable considering that SNR for unprocessed microseismic data is generally much lower than for (regional to global) network data (e.g., Earle and Shearer, 1994; Fig. 1).

Discussion

Although HMM event detection is not new, our work is the first HMM event detector developed primarily as a triggering algorithm. Compared with Hammer et al. (2013) and Beyreuther et al. (2012), our method requires far fewer analyst parameters and far less analyst interaction. Before even considering analyst-determined feature descriptors (e.g., corner frequencies), an N-state Gaussian classifier has a minimum of 2N + 2 analyst parameters: value for N, window length, and initial values for mean and variance of each state. On the other hand, our proposed method has essentially three free parameters: short window length, long window length, and outlier fraction ϵ^(0). Although threshold, CF, and PDF shape can change, our tests suggest that this is rarely necessary.

We can assess our method’s real-time applicability by examining scan times of data samples used in the triggering test: using MATLAB test code (see Data and Resources) on an Intel 3.4 GHz quad-core CPU with 8 GB RAM running Microsoft Windows 7 Home Premium, processing times for envelope data range from 0.42 to 0.69 s with mean 0.59 s. The largest contributing factor to scan time is threshold τ. In the optimal τ = 1.2–1.4 range, the mean scan time is 0.52 s for 5.0 s records sampled at 4000 Hz. This is an order of magnitude faster than the minimum for real-time acquisition and ∼30 times the speed of fully optimized STA–LTA code in the same environment. We anticipate speed will improve further once HMM STA–LTA code is fully optimized and converted to FORTRAN.
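The order-of-magnitude statement follows from the quoted figures. As a rough worked check, with T_record and T_scan introduced here only as labels for the record length and the mean scan time, and assuming the 0.52 figure is in seconds:

```latex
\frac{T_{\mathrm{record}}}{T_{\mathrm{scan}}}
  = \frac{5.0\ \mathrm{s}}{0.52\ \mathrm{s}} \approx 9.6,
\qquad
\frac{5.0\ \mathrm{s} \times 4000\ \mathrm{Hz}}{0.52\ \mathrm{s}}
  \approx 3.8 \times 10^{4}\ \text{samples per second of scanning}.
```

A ratio near 10 means a record is scanned in roughly one tenth of the time it takes to acquire, which is the sense in which the scan is an order of magnitude faster than real time.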

Figure 14. Event triggers in a 1 min window of continuous data recorded by a 12 geophone vertical borehole array sampling at 4000 Hz. The top plot shows trigger windows overlain on raw data from channel 34. Our method triggers 19 times at τ = 1.5, ϵ^(0) = 0.1 (light gray boxes). Traditional STA–LTA triggers 14 times at τ = 2.0 (dark gray boxes). The middle plot gives corresponding p_k,t values for channel 34 from (2), superposed on a plot of τϵ. The bottom plot gives corresponding ρ_k,t from (5). Vertical stagger in boxes is for improved visibility only.

Figure 15. Automatic picks are plotted on normalized vertical-component seismograms taken from data in Figure 14. Numeric subscript here indicates station number. q_p is computed from the first local maximum in ρ_j,t. Error bars use the empirical formula (12). Picks in gray result from applying Earle and Shearer (1994) to traditional STA–LTA ratios at τ = 3.6. Picks in black apply our modified version of the picker (Fig. 3) to HMM STA–LTA with ϵ^(0) = 0.1, τ = 2.0. Full detection parameters are given in Table 4.

Our detector sensitivity tests may explain why previous HMM detection works consider amplitudes nondiagnostic, as remarked in, for example, Benítez et al. (2007). Our noise trials suggest that detectors using CF y = x² or a Gaussian PDF (f ∝ x²) are too sensitive to typical noise amplitude variations for HMM triggering (see Table 2). We know of no other HMM detector methods that use non-Gaussian latent states. We feel this is an oversight of prior works because there is no guarantee that features are Gaussian distributed; for example, envelope amplitudes of a zero-mean Gaussian distributed variable are Rayleigh distributed (Kanasewich, 1981). Envelope PDF shape becomes even more complicated for a Gaussian distributed variable with a nonzero mean (Dharmawansa et al., 2009). We anticipate that future work will more thoroughly investigate how misfit between assumed and actual PDF shape affects statistical event detection.
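The Rayleigh behavior of the envelope can be illustrated numerically. The sketch below is an assumption-laden toy, not the paper's test code: it draws white zero-mean Gaussian noise, forms the envelope as the magnitude of an FFT-based analytic signal, and compares the Rayleigh scale recovered from the envelope mean and variance (using the relations s² = 2μ²/π and s² = 2σ²/(4 − π) quoted in Appendix B) with the variance of the input noise. The helper name `envelope` and the chosen noise level are hypothetical.

```python
import numpy as np


def envelope(x):
    """Envelope of a real trace: magnitude of its FFT-based analytic signal."""
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                    # keep the DC term
    if n % 2 == 0:
        weights[n // 2] = 1.0           # keep the Nyquist term
        weights[1:n // 2] = 2.0         # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))


rng = np.random.default_rng(0)
sigma = 2.0                                  # assumed noise standard deviation
x = rng.normal(0.0, sigma, size=2 ** 16)     # zero-mean Gaussian noise trace
env = envelope(x)

# Rayleigh scale s^2 recovered two ways from envelope statistics.
s2_from_mean = 2.0 * env.mean() ** 2 / np.pi
s2_from_var = 2.0 * env.var() / (4.0 - np.pi)

print(sigma ** 2, s2_from_mean, s2_from_var)
```

For white noise, both estimates should land close to the input variance, whereas a Gaussian model fitted to the same envelope values would misrepresent their one-sided, skewed distribution.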

The feasibility of developing a multistate amplitude-based triggering and classification algorithm is a subject of ongoing investigation. This is not as straightforward as replacing the single outlier state in equation (1) with a sum and rederiving from equation (2). A significant complication is that one or more states might be needed for each seismic phase of each nonrepeating event; Beyreuther et al. (2012) used this approach with some success. Yet, this introduces a need for model training and adaptive reevaluation, as in Hammer et al. (2012, 2013). Although the prospect is appealing, we know of no algorithm to date that simultaneously detects, picks, and classifies events using only HMM state probabilities.

Conclusions

The key improvements of using a two-state HMM for STA–LTA triggering are increased accuracy and fewer false positive detection errors. The key improvement in picking time series of state probabilities (rather than ratios) is greater pick yield with high accuracy and consistency, even when some low-quality picks are retained. The principal improvement over existing HMM detectors is that in situ data samples from a new experiment are not needed to train a statistical model, facilitating real-time processing of both permanent and temporary deployments with minimal user intervention.

A secondary but notable advantage of our method is that it requires little in situ threshold adjustment, whereas detection threshold for traditional STA–LTA necessarily varies by experiment and network type. We consider this a significant improvement to algorithm robustness. Also significant is that HMM STA–LTA appears sensitive to fewer analyst parameters than traditional STA–LTA; for example, our tests show little dependence on initial outlier fraction and no need for an arbitrary channels-to-trigger parameter. Although CF and state PDF could merit reconsideration for other experiments with extremely unusual background noise, our detector proved effective for a wide range of tests on synthetic and real data, even in the challenging environment of microseismic monitoring.

Data and Resources

Data used in this article were recorded by the Microseismic Research Consortium, a joint venture of the University of Alberta, University of Calgary, and numerous industry sponsors. For a nine month period in 2012–2013, we deployed a vertical borehole array to monitor an open-hole multistage hydraulic fracture treatment at Hoadley gas field, Alberta, Canada. The array recorded continuous data on 12 passive triaxial geophones (f_c = 15 Hz) at 4000 Hz. Full experiment details appear in Eaton et al. (2014). Requests for data samples may be directed to Mirko van der Baan ([email protected]) or David Eaton ([email protected]).

Prior to analysis all data were converted to SEG-Y format in ∼5 s (20,001 sample) segments using ESG HNAS software. Data analysis used MATLAB r2012a–2014a installed on x86-64 personal computers with Microsoft Windows 7 Home Premium and Canonical Ubuntu 12.04–14.04 long-term support operating systems. Programs, scripts, and supplemental material related to this work are available by request from the corresponding author.

Acknowledgments

We thank the Microseismic Industry Consortium for financial support of this work and ConocoPhillips Canada and the Natural Sciences and Engineering Research Council of Canada for financial support of the Hoadley flowback microseismic experiment. We extend special thanks to ConocoPhillips Canada and the Microseismic Industry Consortium for providing test data for this work and D. Eaton (University of Calgary) for microseismic processing information and discussions. Joshua Jones thanks S. D. Malone (University of Washington) for the insightful discussions of earthquake triggering, J. D. Carmichael (Los Alamos National Laboratory, New Mexico) for discussions of statistical detector theory, and R. Carniel (Università di Udine, Italy) for manuscript suggestions. Finally, we thank the anonymous reviewers for their suggestions and insights, which we hope have led to improvement of this work.

References

Allen, R. V. (1978). Automatic earthquake recognition and timing from single traces, Bull. Seismol. Soc. Am. 68, no. 5, 1521–1532.

Allen, R. (1982). Automatic phase pickers: Their present use and future prospects, Bull. Seismol. Soc. Am. 72, no. 6B, S225–S242.

Baer, M., and U. Kradolfer (1987). An automatic phase picker for local and teleseismic events, Bull. Seismol. Soc. Am. 77, 1437–1445.

Baum, L. E., and T. Petrie (1966). Statistical inference for probabilistic functions of finite state Markov chains, Ann. Math. Stat. 37, no. 6, 1554–1563.

Baum, L. E., T. Petrie, G. Soules, and N. Weiss (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, Ann. Math. Stat. 41, no. 1, 164–171.

Bekara, M., and M. van der Baan (2010). High-amplitude noise detection by the expectation–maximization algorithm with application to swell-noise attenuation, Geophysics 75, no. 3, 39–49.

Benítez, M. C., J. Ramírez, J. C. Segura, J. M. Ibáñez, J. Almendros, A. García-Yehuas, and G. Cortés (2007). Continuous HMM-based seismic-event classification at Deception Island, Antarctica, IEEE Trans. Geosci. Remote Sens. 45, no. 1, 138–146.

Beyreuther, M., and J. Wassermann (2008). Continuous earthquake detection and classification using discrete hidden Markov models, Geophys. J. Int. 175, no. 3, 1055–1066.

Beyreuther, M., and J. Wassermann (2011). Hidden semi-Markov model based earthquake classification system using weighted finite-state transducers, Nonlinear Process. Geophys. 18, 81–89, doi: 10.5194/npg-18-81-2011.


Beyreuther, M., R. Carniel, and J. Wassermann (2008). Continuous hidden Markov models: Application to automatic earthquake detection and classification at Las Canadas caldera, J. Volcanol. Geoth. Res. 176, 513–518.

Beyreuther, M., C. Hammer, J. Wassermann, M. Ohrnberger, and T. Megies (2012). Constructing a hidden Markov model based earthquake detector: Application to induced seismicity, Geophys. J. Int. 189, 602–610.

Cabras, G., R. Carniel, and J. Jones (2012). Non-negative matrix factorization: An application to Erta ‘Ale volcano, Ethiopia, Boll. Geof. Teor. Appl. 53, no. 2, 231–242.

Carmichael, J. D. (2013). Melt-triggered seismic response in hydraulically-active polar ice: Observations and methods, Ph.D. Thesis, University of Washington, Seattle, Washington.

Carniel, R., and M. Di Cecca (1999). Dynamical tools for the analysis of long term evolution of volcanic tremor at Stromboli, Ann. Geofisc. 42, no. 3, 483–495.

Carniel, R., M. Di Cecca, and D. Rouland (2003). Ambrym, Vanuatu (July–August 2000): Spectral and dynamical transitions on the hours-to-days timescale, J. Volcanol. Geoth. Res. 128, 1–13.

Daubechies, I. (1992). Ten lectures on wavelets, in CBMS-NSF Regional Conference Series in Applied Mathematics, Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania, 377 pp.

Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Stat. Soc. Ser. B 39, no. 1, 1–38.

Dharmawansa, P., N. Rajatheva, and C. Tellambura (2009). Envelope and phase distribution of two correlated Gaussian variables, IEEE Trans. Commun. 57, no. 4, 915–921.

Donnellan, A., J. Rundle, G. Fox, D. McLeod, L. Grant, T. Tullis, M. Pierce, J. Parker, G. Lyzenga, R. Granat, and M. Glasscoe (2007). QuakeSim and the solid earth research virtual observatory, in Computational Earthquake Physics: Simulations, Analysis and Infrastructure (II), Springer, Basel, Switzerland, 2263–2279.

Earle, P., and P. Shearer (1994). Characterization of global seismograms using an automatic picking algorithm, Bull. Seismol. Soc. Am. 84, no. 2, 366–376.

Eaton, D., E. Caffagni, A. Rafiq, M. van der Baan, V. Roche, and L. Matthews (2014). Passive seismic monitoring and integrated geomechanical analysis of a tight-sand reservoir during hydraulic-fracture treatment, flowback and production, Unconventional Resources Technology Conference (URTEC), Denver, Colorado, 25–27 August 2014.

Elboth, T., B. A. P. Reif, and O. Andreassen (2009). Flow and swell noise in marine seismic data, Geophysics 74, no. 2, Q17–Q25.

Freiberger, W. F. (1963). An approximate method in signal detection, J. Appl. Math. 20, 373–378.

Granat, R., G. Aydin, M. Pierce, Z. Qi, and Y. Bock (2007). Analysis of streaming GPS measurements of surface displacement through a web services environment, in Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2007, part of the IEEE Symposium Series on Computational Intelligence 2007, Honolulu, Hawaii, 1–5 April 2007, 750–757.

Granat, R., J. Parker, S. Kedar, D. A. Dong, B. Y. Tang, and Y. Bock (2013). Statistical approaches to detecting transient signals in GPS: Results from the 2009–2011 transient detection exercise, Seismol. Res. Lett. 84, 444–454.

Gutiérrez, L., J. Ibáñez, G. Cortés, J. Ramírez, C. Benítez, V. Tenorio, and A. Isaac (2009). Volcano-seismic signal detection and classification processing using hidden Markov models. Application to San Cristóbal volcano, Nicaragua, in Geoscience and Remote Sensing Symposium, 2009 IEEE International, IGARSS 2009, Vol. 4, Cape Town, South Africa, 12–17 July 2009, IV-522

Hammer, C., M. Beyreuther, and M. Ohrnberger (2012). A seismic event spotting system for volcano fast response systems, Bull. Seismol. Soc. Am. 102, no. 3, 948–960.

Hammer, C., M. Ohrnberger, and D. Fäh (2013). Classifying seismic waveforms from scratch: A case study in the alpine environment, Geophys. J. Int. 192, 425–439.

Hudnut, K. W., Y. Bock, J. E. Galetzka, F. H. Webb, and W. H. Young (1999). The Southern California integrated GPS network (SCIGN), in Y. Fujinawa (Editor), Proc. Int. Workshop Seismotectonics Subduction Zone, NIED, Tsukuba, Japan, December 1999, 175–196.

Jones, J. P., R. Carniel, and S. D. Malone (2012a). Sub-band decomposition of continuous volcanic tremor, J. Volcanol. Geoth. Res. 213, 98–115.

Jones, J. P., R. Carniel, and S. D. Malone (2012b). Decomposition, location, and persistence of seismic signals recovered from continuous tremor at Erta’Ale, Ethiopia, J. Volcanol. Geoth. Res. 213, 116–129.

Joswig, M. (1994). Knowledge-based seismogram processing by mental images, IEEE Trans. Syst. Man Cybern. 24, no. 3, 429–439.

Kanasewich, E. R. (1981). Time Sequence Analysis in Geophysics, University of Alberta Press, Edmonton, Alberta.

Leonard, M. (2000). Comparison of manual and automatic onset time picking, Bull. Seismol. Soc. Am. 90, no. 6, 1384–1390.

Leonard, M., and B. L. N. Kennett (1999). Multi-component autoregressive techniques for the analysis of seismograms, Phys. Earth Planet. In. 113, no. 2, 247–264.

Manning, C. D., and H. Schütze (1999). Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, Massachusetts, 388–402.

McEvilly, T. V., and E. L. Majer (1982). ASP: An automated seismic processor for microearthquake networks, Bull. Seismol. Soc. Am. 72, 303–325.

Papoulis, A., and S. U. Pillai (2002). Probability, Random Variables and Stochastic Processes, Fourth Ed., McGraw-Hill Book Co., New York, New York.

Peterson, J. (1993). Observations and modeling of seismic background noise, U.S. Geol. Surv. Open-File Rept. 93-322, 94 pp.

Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE 77, no. 2, 257–286.

Scarpetta, S., F. Giudicepietro, E. Ezin, S. Petrosino, E. Del Pezzo, M. Martini, and M. Marinaro (2005). Automatic classification of seismic signals at Mt. Vesuvius volcano, Italy, using neural networks, Bull. Seismol. Soc. Am. 95, no. 1, 185–196.

St-Onge, A., D. W. Eaton, and A. Pidlisecky (2013). Borehole vibration response to hydraulic fracture pressure, paper presented at CSEG Geoconvention, Calgary, Alberta, 6–12 May 2013.

Sundberg, R. (1974). Maximum likelihood theory for incomplete data from an exponential family, Scand. J. Stat. 1, no. 2, 49–58.

Swindell, S. W., and N. S. Snell (1977). Station processor automatic signal detection system, phase I: Final report, station processor software development, Texas Instruments Final Report No. ALEX (01)-FR-77-01, AFTAC Contract Number F08606-76-C-0025, Texas Instruments Inc., Dallas, Texas.

Takanami, T., and G. Kitagawa (1988). A new efficient procedure for the estimation of onset times of seismic waves, J. Phys. Earth 36, 267–290.

Tary, J. B., M. Baan, and D. W. Eaton (2014). Interpretation of resonance frequencies recorded during hydraulic fracturing treatments, J. Geophys. Res. 119, no. 2, 1295–1315.

Trnkoczy, A. (2009). Understanding and parameter setting of STA/LTA trigger algorithm, in New Manual of Seismological Observatory Practice (NMSOP), P. Bormann (Editor), Deutsches GeoForschungsZentrum GFZ, Potsdam, Germany, 1–20.

Tselentis, G., N. Martakis, P. Paraskevopoulos, A. Lois, and E. Sokos (2012). Strategy for automated analysis of passive microseismic data based on S-transform, Otsu’s thresholding, and higher order statistics, Geophysics 77, no. 6, KS43–KS54.

Withers, M., R. Aster, C. Young, J. Beiriger, M. Harris, S. Moore, and J. Trujillo (1998). A comparison of selected trigger algorithms for automated global seismic phase and event detection, Bull. Seismol. Soc. Am. 88, 95–106.

Wu, C. F. J. (1983). On the convergence properties of the EM algorithm, Ann. Stat. 11, no. 1, 95–103.

Appendix A

Expectation Maximization

Let L be the number of observations in a transformed single-channel seismogram Y described by equation (1). Our expectation-maximization (EM) algorithm computes the maximum-likelihood estimators (MLEs) of the parameters θ_i that control each probability density function (PDF) (state) f_1(y), f_0(y) in the statistical model of equation (1). Let the probability that each y_t is an outlier be p_t in equation (2). Given observed data Y and latent (unobserved or missing) data Z, we estimate initial fraction ϵ^(0) of Y ∈ f_1(y) and initial parameters θ_i^(0) for each state.

We obtain MLEs of θ_i, ϵ at each iteration j using the following iterative process. For each data channel:

Expectation Step

1. Compute p_t^(j) for each y_t (t = 1, 2, …, L).

2. Compute log-likelihood Q(θ|θ^(j)) = E[log ℒ(θ, Y, Z)].

3. Declare convergence if Q(θ|θ^(j)) and Q(θ|θ^(j−1)) are sufficiently close (defined here as ΔQ/L < 0.001). If this condition is not met by j = 100, we declare failure to converge.

Maximization Step

If the algorithm has not yet converged:

1. compute ϵ^(j+1) for the next iteration from

\[
\epsilon = \frac{1}{L} \sum_{t=1}^{L} p_t, \tag{A1}
\]

2. update model parameters θ_i^(j+1) to maximize Q(θ|θ^(j+1)).
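The two steps above can be sketched for the exponential states of Appendix B. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: equation (2) is not reproduced in this section, so the outlier probability p_t is assumed here to be the usual two-component posterior ϵ f_1 / (ϵ f_1 + (1 − ϵ) f_0), and the initial means are set from a quantile split, which is our own choice. The function name and its arguments are hypothetical.

```python
import numpy as np


def em_two_state_exponential(y, eps0=0.1, max_iter=100, tol=1e-3):
    """EM sketch for a two-state exponential mixture (outliers in noise).

    Returns (eps, mu0, mu1, p, converged).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    eps = eps0
    # Assumed initialisation: split the data at the (1 - eps0) quantile.
    cut = np.quantile(y, 1.0 - eps0)
    mu0 = y[y <= cut].mean()
    mu1 = y[y > cut].mean()
    q_prev = None
    p = np.full(n, eps)
    for _ in range(max_iter):
        # Expectation step: assumed posterior probability of the outlier state.
        f1 = eps * np.exp(-y / mu1) / mu1
        f0 = (1.0 - eps) * np.exp(-y / mu0) / mu0
        p = f1 / (f1 + f0)
        # Log-likelihood in the form of equation (B3).
        q = np.sum(
            p * (np.log(eps) - np.log(mu1) - y / mu1)
            + (1.0 - p) * (np.log(1.0 - eps) - np.log(mu0) - y / mu0)
        )
        # Convergence test: change in Q per observation below tol.
        if q_prev is not None and abs(q - q_prev) / n < tol:
            return eps, mu0, mu1, p, True
        q_prev = q
        # Maximization step: equation (A1), then the weighted means of (B2).
        eps = p.mean()
        mu1 = np.sum(p * y) / np.sum(p)
        mu0 = np.sum((1.0 - p) * y) / np.sum(1.0 - p)
    return eps, mu0, mu1, p, False


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    outlier = rng.random(20000) < 0.1
    data = np.where(outlier, rng.exponential(10.0, 20000), rng.exponential(1.0, 20000))
    eps, mu0, mu1, p, ok = em_two_state_exponential(data, eps0=0.1)
    print(ok, eps, mu0, mu1)
```

On synthetic data with a 10% outlier fraction and well-separated means, a sketch like this should recover an outlier fraction near the true value within a handful of iterations; real envelope data may behave differently.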

Appendix B

PDFs and Parameter MLEs

Here, we give reference expressions for the latent state PDFs in this work, along with MLEs for their controlling parameters θ_i and log-likelihood Q = Q(θ|θ_i, y) for the two-state model in equation (1).

The exponential distribution

\[
f_i(y) = \frac{1}{\mu_i} e^{-y/\mu_i} \tag{B1}
\]

is controlled by the mean θ_i = μ_i. The MLE forms of μ_i for the outlier (1) and null (0) populations are

\[
\mu_1 = \frac{\sum_{t=1}^{L} p_t y_t}{\sum_{t=1}^{L} p_t}, \qquad
\mu_0 = \frac{\sum_{t=1}^{L} (1 - p_t) y_t}{\sum_{t=1}^{L} (1 - p_t)}, \tag{B2}
\]

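respectively, and the log-likelihood in terms of computed quantities is

\[
Q = \sum_{t=1}^{L} \left[ p_t \left( \log\epsilon - \log\mu_1 - \frac{y_t}{\mu_1} \right)
  + (1 - p_t) \left( \log(1 - \epsilon) - \log\mu_0 - \frac{y_t}{\mu_0} \right) \right]. \tag{B3}
\]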
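As a check that (B2) and (A1) are the maximizers of (B3), a short derivation (our sketch, treating the probabilities p_t as fixed from the expectation step) sets the partial derivatives of Q to zero:

```latex
\frac{\partial Q}{\partial \mu_1}
  = \sum_{t=1}^{L} p_t \left( -\frac{1}{\mu_1} + \frac{y_t}{\mu_1^{2}} \right) = 0
  \;\Longrightarrow\;
  \mu_1 = \frac{\sum_{t=1}^{L} p_t y_t}{\sum_{t=1}^{L} p_t},
\qquad
\frac{\partial Q}{\partial \epsilon}
  = \sum_{t=1}^{L} \left( \frac{p_t}{\epsilon} - \frac{1 - p_t}{1 - \epsilon} \right) = 0
  \;\Longrightarrow\;
  \epsilon = \frac{1}{L} \sum_{t=1}^{L} p_t .
```

The expression for μ_0 follows in the same way with p_t replaced by 1 − p_t.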

For a Gaussian distribution with mean fixed at zero,

\[
f_i(y) = \frac{1}{\sigma_i \sqrt{2\pi}} e^{-y^2 / 2\sigma_i^2}. \tag{B4}
\]

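The controlling parameter is the variance θ_i = σ_i², for which MLE forms are

\[
\sigma_1^2 = \frac{\sum_{t=1}^{L} p_t y_t^2}{\sum_{t=1}^{L} p_t}, \qquad
\sigma_0^2 = \frac{\sum_{t=1}^{L} (1 - p_t) y_t^2}{\sum_{t=1}^{L} (1 - p_t)}. \tag{B5}
\]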

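The log-likelihood is then given by

\[
Q = \sum_{t=1}^{L} \left[ p_t \left( \log\epsilon - \log\sigma_1 - \frac{1}{2}\log(2\pi) - \frac{y_t^2}{2\sigma_1^2} \right)
  + (1 - p_t) \left( \log(1 - \epsilon) - \log\sigma_0 - \frac{1}{2}\log(2\pi) - \frac{y_t^2}{2\sigma_0^2} \right) \right]. \tag{B6}
\]

Finally, for the Rayleigh distribution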

\[
f_i(y) = \frac{y}{s_i^2} e^{-y^2 / 2 s_i^2}, \tag{B7}
\]

the scale parameter (mode) θ_i = s_i² controls each state. s_i relates to mean and variance by s_i² = 2μ_i²/π and s_i² = 2σ_i²/(4 − π), respectively (Rabiner, 1989). The MLE forms of s_i² for the model in equation (1) are

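\[
s_1^2 = \frac{1}{2} \frac{\sum_{t=1}^{L} p_t y_t^2}{\sum_{t=1}^{L} p_t}, \qquad
s_0^2 = \frac{1}{2} \frac{\sum_{t=1}^{L} (1 - p_t) y_t^2}{\sum_{t=1}^{L} (1 - p_t)}. \tag{B8}
\]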

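The log-likelihood is

\[
Q = \sum_{t=1}^{L} \left[ p_t \left( \log\epsilon + \log y_t - 2\log s_1 - \frac{y_t^2}{2 s_1^2} \right)
  + (1 - p_t) \left( \log(1 - \epsilon) + \log y_t - 2\log s_0 - \frac{y_t^2}{2 s_0^2} \right) \right]. \tag{B9}
\]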

Department of Physics
University of Alberta
Edmonton, Alberta T6G 2E1, Canada

Manuscript received 10 July 2014; Published Online 12 May 2015
