On some days the Electronic Seismologist (ES) is feeling long of tooth and wonders if he is up-to-date enough on the latest nerdy things to write a dazzling, you've-gotta-know-this-information type of column. On other days he is just lazy and hopes someone else will write his column for him. In looking around for a sucke--, errr, brilliant and with-it seismologist with experience and understanding of modern "Information Technology" (translation: computer geek), the ES thought of many possible candidates. Of course, picking one and convincing him to write a column is another matter. After procrastinating almost until deadline he started the process of asking, begging, pleading, flattering, and promising riches to the person on the top of his list. Tom Owens accepted! Those who know Tom realize that he not only has and will state interesting opinions, but he actually knows something about modern computer science, or at least he has convinced the ES that he knows something.
Many seismologists dealing with data acquisition and processing have been grappling with the problems of how to deal with a large increase in the number of waveforms arriving at data centers. The simple solution (at the top of the ES's list) is bigger, faster computers with more memory and ever-larger dimension statements to create bigger arrays. (Oops, the ES's Fortran is showing.) Such brute-force, old-fashioned solutions may work under some conditions for some data sets, but one needs to think at a higher level to solve the real problem. That problem is not just acquiring and archiving large volumes of data, but rather bringing those data to bear on interesting scientific problems. The ES too often gets stuck down in the bits and bytes and loses track of the real problem to solve. He is very happy to host a column containing cogitations on the big picture. In this guest column Tom Owens (in the guise of Noah) tosses out a few design considerations for an ark to prepare for the coming flood of data.
ARE WE READY FOR THE GREAT FLOOD?
Thomas J. Owens
When Steve Malone asked if I would like to write a guest column for the Electronic Seismologist, it seemed like a good idea. I had a few ideas for a column, and I distinctly remember Steve saying that I had two weeks. No Problem. But it turns out that two weeks, in Steve's mind, is actually ten days. So, I whined and got an eleventh day. Now it is early morning on the twelfth day, and I am taking advantage of the time zones to finish (OK, start and finish) this column before Steve settles in for his day's work. My saving grace may be Seattle's Morning Latte Law, which buys me an additional 30-45 minutes relative to an electronic seismologist working out of, say, San Diego.
Where did these last eleven days go? What have I done, other than gain a greater appreciation for what Dave Barry faces on a regular basis? Basically, I did my day job in the Internet world. I taught my classes, but I'll spare you those stories. I also worked with Richard Borst, a high-school physics teacher at Silver Bluff High School south of Aiken, South Carolina, after he alerted me to an earthquake that opened a sinkhole on a highway near the Savannah River Site. Using data that flow on the Internet in real time from his school and twenty other high schools in South Carolina, I analyzed this event and an earthquake in the area last October. Using data from Richard's school and another high school 60 km away, I was able to show that the most recent event was likely a sonic boom. We were both impressed that we obtained an estimate of the velocity of sound in air of 1,085 ft/sec. For his part, Richard investigated the sinkhole and concluded that it had been merely a "pothole" as recently as the previous day and that the shaking that evening likely only enhanced its reputation rather than its size. A mystery was solved, and a good learning opportunity resulted for Richard's physics students.
In addition, I reviewed two draft manuscripts that summarize some of the results of a year-long PASSCAL-style experiment in Scotland. We analyzed a couple of dozen earthquakes from a half-dozen stations for both crust/upper mantle structure and transition zone structure. The data collection took over a year. The data analysis took over a year, and we currently have another two-year deployment in the same area. To round out my late February efforts, I put energy into various issues related to EarthScope (http://www.earthscope.org), the major research equipment proposal that, when funded, will change the way we do Earth sciences in the coming decade. I wrote my congressional delegation, discussed the USArray component proposal with colleagues, registered for the EarthScope Information Technology Workshop, and reviewed materials from the recent EarthScope Education and Outreach Workshop.
Pretty satisfying 10-ish days, eh? A little teaching, a little outreach, a little research, a little service. So, am I happy? Nope. I'm concerned, not about how I used my time, but about what was going on in our electronic seismology world during the same time period. What happened while I was merrily doing my job? Another 84 Gbytes of seismic data from ~350 stations around the world were archived at the IRIS Data Management Center (DMC)! Even more data flowed into various national, international, and regional data centers around the globe during this period. Yup, while I worked with two or three seismic traces from a sonic boom, and my colleagues put the finishing touches on a year-long analysis of a couple of hundred more seismograms in our Scotland study, no one stopped collecting even more data. If that is not sobering enough, ponder this: If EarthScope/USArray were fully deployed, during the last ten days about 350 Gbytes from a total of over 800 stations would have flowed into the IRIS DMC alone! That's at least 35 Gbytes of seismic data every day for the next decade, more data in a month than was archived at the DMC in all of 1995. Heck, I'm still analyzing some of the data that I collected in 1995.
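For readers who like to check the arithmetic, the rates quoted above can be reproduced in a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are illustrative, the "10-ish days" are taken as exactly ten, "over 800 stations" is taken as 800, and a 30-day month and 1 Tbyte = 1,000 Gbytes are assumed.

```python
# Figures quoted in the column (period assumed to be exactly 10 days).
DAYS = 10
current_gb, current_stations = 84, 350    # archived at the IRIS DMC
usarray_gb, usarray_stations = 350, 800   # projected; "over 800" taken as 800

def daily_rate(total_gb: float, days: float) -> float:
    """Average archive rate in Gbytes per day."""
    return total_gb / days

current_rate = daily_rate(current_gb, DAYS)
usarray_rate = daily_rate(usarray_gb, DAYS)

# Extrapolations under the stated assumptions (30-day month, 365-day year).
usarray_per_month_gb = usarray_rate * 30
usarray_per_decade_tb = usarray_rate * 365 * 10 / 1000

print(f"Current rate: {current_rate} Gbytes/day")
print(f"USArray rate: {usarray_rate} Gbytes/day")
print(f"USArray per month: {usarray_per_month_gb} Gbytes")
print(f"USArray per decade: {usarray_per_decade_tb} Tbytes")
```

The projected rate is roughly four times the current one, and at that pace a decade of USArray alone would exceed 100 Tbytes.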
Are We Ready for the Great Flood of USArray?
On the up side, in some ways, we are. Real-time systems exist to transport these data from the field to data archive centers. IRIS, the U.S. and other national networks, and various regional networks do have experience doing quick quality control on substantial volumes of data. Advances in data storage will allow us to archive these data relatively painlessly. Thus, we are ready to receive these data, check them quickly, and file them away. Then what?
One only needs to read past ES columns to realize that we are a community that loves to get up close and personal with individual seismograms. And we love to write software to do these tasks. The ES column has many examples of people doing interesting things with small volumes of data using their own software. Can we continue in this mode of operation? If we continue to painstakingly analyze individual seismograms then we may severely underutilize this rich new data source. On the other hand, if we bounce from seismogram to seismogram and never have time to study any of them in great detail (call it the waveform slut approach), we also underutilize our data. Ideally, we would be able to analyze every seismogram in great detail and greatly multiply its interpretative value by analyzing it with its nearest neighbors in equally great detail. Are we ready to do this?
My contention is that, in order to handle the USArray flood, a large part of what we do as seismologists today must be automated in the very near future. We must develop high-quality automated preliminary analysis tools so that we can more effortlessly reach the point where we are looking at these new data in ways that do justice to the effort expended to collect them. One necessary element in reaching this goal is to avoid duplicating individual efforts. Recent ES columns (Malone, 1997; Lomax, 2000) outline ways in which software development efforts might be made more efficient. IRIS is also supporting efforts to improve delivery of data to the seismologist's desktop through its Data Handling Infrastructure project (Ahern, 2001).
The seismological community needs to examine similar approaches to streamlining initial analysis. This would involve identifying "sufficiently acceptable" analysis methods that could be set to receive data directly from IRIS and strip away the first few layers of analysis automatically with the intention of saving interested seismologists considerable time. This idea has been discussed in various forums recently, and the conversation always seems to bog down in attempting to define "sufficiently acceptable" analysis methods. Everyone likes his own methods and, in my opinion, is a little concerned about someone else's method being adopted as a "standard" approach. This is false logic. Everyone will always have favorite techniques. However, we, as a community, have a problem. If we can agree on some standard approaches then we can address the problem and allow all of us to start doing the really interesting science sooner.
An example of such an approach is right in our backyard. For twenty years the group at Harvard University has produced Centroid Moment Tensor (CMT) solutions for all significant global earthquakes. They are produced quickly, they are of high quality, and they are the initial reference for everyone's follow-up analysis. If the CMT solutions were the final word, there would be no need for special journal volumes on major earthquakes. What the CMT solutions provide is an essential service to the source-modeling community: a rapidly available frame of reference which is a starting point. Members of that community can start from the CMT solution and more quickly identify and analyze aspects of a particular earthquake source that are unusual or unexpected or simply of interest to them.
One can think of several other key parameters that might be determined automatically to provide the community with a frame of reference from which to undertake further research. Phase arrival times are obvious, but I think even higher-level analysis such as receiver functions and shear-wave splitting measurements could be automated. My conceptual example of this type of approach is the Receiver Reference Model (RRM). In receiver function analysis different researchers use different deconvolutions and different modeling methods, have different targets of interest, etc. But the basic scheme of this method has not changed for about twenty years. It should be automated. A simple flow chart for this is as follows: Using IRIS's SOD (Standing Order for Data) mechanism, I file a request saying that I am interested in all events 30°-90° from any USArray station. When a qualifying event occurs, data would arrive on my machine automatically. Arrival of these data would trigger a sequence of standard analysis codes that could deconvolve the teleseismic earthquake, apply some standard waveform matching methods, and produce (at a minimum) a first-order estimate of crustal thickness and Poisson's ratio using this event and potentially stacks of previously recorded events. The results, including waveform matches, could be (at a minimum) posted on a Web site. Ideally, using DHI, they could be delivered to other seismologists who register their interest in these types of results. Those seismologists may not agree completely with the deconvolution used. They may not think that the waveform matching is the best possible approach, etc. Most importantly, however, they would be able to immediately look at the RRM results and identify features of the waveforms and the models that they find interesting and worthy of further analysis. The RRM, like the CMT solutions, would not be the end of the analysis, but the beginning.
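To make the flow chart a little more concrete, here is a minimal Python sketch of the first two boxes: deciding which stations fall 30°-90° from an event, and triggering a chain of analysis steps for each one. It is not the SOD or DHI interface. Every name in it (Point, distance_deg, qualifying_stations, on_event, the station codes) is hypothetical, the Earth is assumed spherical, and the analysis steps are left as functions the user supplies, since the choice of deconvolution and waveform matching is exactly what the community would have to agree on.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Point:
    lat: float  # degrees north
    lon: float  # degrees east

def distance_deg(a: Point, b: Point) -> float:
    """Great-circle distance in degrees (haversine, spherical Earth assumed)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def qualifying_stations(event: Point, stations: Dict[str, Point],
                        min_deg: float = 30.0, max_deg: float = 90.0) -> List[str]:
    """Names of stations whose distance from the event lies in the window."""
    return [name for name, loc in stations.items()
            if min_deg <= distance_deg(event, loc) <= max_deg]

Step = Callable[[dict], dict]  # hypothetical: one stage of the standard analysis

def on_event(event: Point, stations: Dict[str, Point],
             steps: List[Step]) -> Dict[str, dict]:
    """Run each analysis step, in order, for every qualifying station."""
    results = {}
    for name in qualifying_stations(event, stations):
        product = {"station": name,
                   "distance_deg": distance_deg(event, stations[name])}
        for step in steps:
            product = step(product)
        results[name] = product
    return results

# Usage with made-up station codes and a placeholder step.
stations = {"STA1": Point(0.0, 60.0), "STA2": Point(0.0, 10.0),
            "STA3": Point(0.0, 120.0)}
flag = lambda p: {**p, "analyzed": True}  # stands in for deconvolution, etc.
results = on_event(Point(0.0, 0.0), stations, [flag])
print(sorted(results))
```

Keeping the steps as interchangeable functions reflects the point of the RRM: the framework can run with whatever "sufficiently acceptable" methods are agreed upon, and anyone who prefers another deconvolution can swap it in.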
If we can move the beginning of the analysis closer to the end of analysis then we just might survive this great flood!
Speaking of the end, and introducing a painful phase change in my hydrological analogy, I close by pointing out that USArray is really only the tip of the iceberg. My focus on it simply reveals my bias as a U.S.-based seismologist interested in lithospheric and upper-mantle structure. Within the U.S. the Advanced National Seismic System (http://www.anss.org) will produce further instrumentation and merge strong-motion and far-field instrumentation (and could produce nearly twice the data volume that is estimated for USArray!). Merging USArray observations with other components of EarthScope produces yet another set of challenges. Making all of this data available in useful forms to a broad spectrum of potential users from Richard Borst and his high-school students to emergency planners to research seismologists represents yet another looming challenge. Internationally, broadband national networks are growing quickly. Other nations have programs similar to USArray/ANSS underway or planned (e.g., http://www.k-net.bosai.go.jp or http://bats.earth.sinica.edu.tw/). In short, the flood is a global flood with ample opportunities to work together to analyze and synthesize a diverse and rapidly growing body of data. I know, a RIVER of opportunities!! Now, how can I work water vapor and glaciers into this article?
Ahern, T. (2001). DHI: Data Handling Interface for FISSURES, IRIS, IRIS DMC Newsletter 3 (http://www.iris.edu/news/newsletter/about.htm).
Lomax, A. (2000). The ORFEUS Java Workshop: Distributed computing in earthquake seismology, Seism. Res. Lett. 71, 589-592.
Malone, S. (1997). The Electronic Seismologist goes to FISSURES, Seism. Res. Lett. 68, 489-492.
SRL encourages guest columnists to contribute to the "Electronic Seismologist." Please contact Steve Malone with your ideas. His e-mail address is firstname.lastname@example.org.
Posted: 15 November 2002