REAL-TIME DATA EXCHANGE (revisited)
The Electronic Seismologist (ES) began hyping the Internet over six years ago: how it was going to help seismologists acquire, analyze, and present seismic data in new and wondrous ways. It was touted as the next really great whiz-bang thing that everyone had better get familiar with or get left behind. Boy, was the ES right! He clearly is a cool dude with a crystal ball who can see the really obvious when it whacks him on the head. After a few columns he even stuck his neck out and "envisioned a time in the future where data from almost any seismograph in the U.S. could be available in a continuous stream to almost anyone" (Malone, 1996). A data rate of 2 Mb/sec was estimated to be enough capacity to get the whole National Seismic Network. Well, we are close to being there.
Currently there are a number of institutions that collect real-time seismic data from stations outside their own networks and process or provide these data as a "virtual seismic network." The route that a particular seismic station's data take from the seismometer to a researcher can be complex and can vary with time. One can get composites of waveform segments from many stations from one event or a continuous feed from a subselection of stations in different ways. It gets confusing enough that the ES (who is, after all, an expert in such things) can't tell what's going on and if the data he is viewing are coming or going. It's time to try to make a little sense of all of this. The ES would like to review all the different ways real-time data are exchanged between different groups and provide a cogent summary for his enthralled fans. However, after getting started with the summary he became more confused than ever. For now he will list only some of the organizations of which he is aware that acquire significant amounts of data from stations they do not run. One of these will be described in some detail. With the help of Sandy Stromme of the IRIS Data Management Center, the real-time data collection facilities available through the new IRIS DMC BUD (Buffer of Uniform Data) will be outlined.
Organizations Collecting/Processing Multinetwork Real-time Seismic Data
Through a quick surf of the Web and direct contact with a few system managers, the ES found quite a few places where data from stations of multiple networks are being collected. In some cases information about the system could be obtained from Web documents or other direct Internet queries. For example, by using the network information provided by the AutoDRM summary system Waves4U at http://seismo.ethz.ch/waves4u/ (Kradolfer, 2000), the ES was able to get some information about currently available real-time data at several places. (One should also see the article in this issue by Takeuchi et al. [page 166] which describes a very clever Java [Remote Methods Invocation] application that allows one to retrieve data from a variety of data centers, although not in real time.) The following is a partial list of some of the sites found, listed in order of number of stations being acquired.
IGPP Broadband Seismic Data Collection Center, UC San Diego
IRIS Data Management Center (Buffer of Uniform Data)
CTBTO International Data Center, Vienna, Austria
U.S. National Seismic Network/NEIC
EC-Project MERIDIAN at ORFEUS Data Center
USNSS regional networks
IRIS Buffer of Uniform Data (BUD)
The IRIS Data Management Center (DMC) runs no seismic stations directly, nor does it process seismic data for event detection, location, or analysis. The DMC's primary mission is to collect, archive, and distribute seismic waveform data. In having a well-defined and limited mission, it is different from most institutions in seismology. The result is that the DMC does a bang-up job of collecting data from many different sources (25 different networks), storing it all in a reliable high-capacity archive system (30 Tb), and providing the data quickly to a huge number of users (50,000 shipments totaling more than 0.5 Tb during 2001).
Because of good network connectivity and inherent efficiencies, the DMC has been transitioning from a tape-based data-collection system to an Internet-based system. Individual data volumes have been shipped from various data collection centers via FTP for some time. However, increasing amounts of data are being streamed over the Internet into the DMC, where they reside in temporary storage before being permanently archived. Because there are many different techniques and protocols used to retrieve continuous data feeds, IRIS has developed the concept of a Buffer of/for Uniform Data or BUD. Data in this buffer are in a uniform format no matter what the source and are stored in a strictly defined directory structure. BUD serves as (1) a staging area where data can be quality-controlled (QC) before archiving, (2) an online buffer of near-real-time data immediately available to data users, (3) a single point from which to monitor incoming data flow, and (4) a single point from which to archive near-real-time data to offline storage.
At the end of 2001 the BUD system seems to be fairly stable, with data flowing into it reliably from many different sources. A quick look through its holdings shows a total of 476 stations (2,236 channels) from 16 different networks, but these numbers are increasing almost daily and may be double that number in a few months. The durations of data streams in BUD seem to vary from a few days to about two months. There are already several very useful Web-based tools to help one examine what stations/channels are there, how up to date they are, and even to look at waveform segments.
BUD Web Interface
From the user's perspective a good way to get a feel for BUD is to poke around in it. A Web interface to BUD can be found at http://www.iris.washington.edu/bud_stuff/dmc. Besides providing some documentation, it has entry points to several interface tools. The tools available now are primarily ones developed for monitoring data flow and quality through BUD. There is a simple display showing latency values for incoming stations in a color-coded chart that is useful to both the IRIS staff and the staff of networks providing continuous data to BUD. Some tools developed for other purposes have been adapted for use with BUD. GOAT is a Gap/Overlap Analysis Tool that can graphically, or within a table, show where missing or duplicated data segments for a channel exist. BUD stations can be selected in several different ways and their locations mapped using a GMT interface. The Wiggles waveform applet has been integrated into BUD so that a Web user can select and view waveform segments in near-real time. Finally, there is an FTP interface for direct access to BUD data files.
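To make the idea behind a gap/overlap report concrete, here is a minimal sketch in Python. It is not GOAT's implementation, which is not described here. It assumes, purely for illustration, that the data segments of one channel are available as (start, end) pairs in seconds; the function name and the tolerance parameter are likewise hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: one data segment as (start, end) in seconds.
Segment = Tuple[float, float]


def find_gaps_and_overlaps(
    segments: List[Segment], tolerance: float = 0.0
) -> Tuple[List[Segment], List[Segment]]:
    """Return (gaps, overlaps), each a list of (start, end) intervals."""
    gaps: List[Segment] = []
    overlaps: List[Segment] = []
    ordered = sorted(segments)
    if not ordered:
        return gaps, overlaps

    # Latest end time covered by the segments seen so far.
    covered_until = ordered[0][1]
    for start, end in ordered[1:]:
        if start > covered_until + tolerance:
            # Nothing covers the span between the two segments.
            gaps.append((covered_until, start))
        elif start < covered_until - tolerance:
            # This segment begins before the covered span ends.
            overlaps.append((start, min(end, covered_until)))
        covered_until = max(covered_until, end)
    return gaps, overlaps
```

Calling it with segments covering 0-10, 10-20, 25-30, and 28-40 seconds reports one gap (20-25) and one overlap (28-30). In practice the tolerance would probably be tied to the sample interval, so that ordinary timing jitter is not flagged.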
The Web interface is particularly useful to network operators who want to initiate and maintain a high-quality data stream into BUD. It has also proved useful to the ES for detecting certain types of problems generated in the original data-acquisition part of the seismic network he tries to help run.
To keep the data uniform, BUD's internal format is miniSEED, organized in individual flat files by channel-day. All incoming data are converted to miniSEED before the files are written. The files are organized in a simple directory structure by network and then station. This simple organization is easily understood, and the directories and files can be easily reviewed, modified, or added to with standard Unix utilities. For example, an inventory of networks and stations in a BUD directory is immediately visible by inspection of the BUD directory structure. An estimate of whether there are data for a particular channel, day, and time in the BUD directory can be derived from file modification times, and an expected latency (typically a function of logical record length, sample rate, and input source) can be associated with a channel. The channel-day miniSEED files are fairly small and so can be read quickly. These features of the BUD directory and file structure make the construction of the Web-based tools fairly easy.
Keeping many channels for many days can generate a lot of files, but BUD is intended as a flow-through, near-real-time staging area, so the total number should be limited. At the end of 2001 it contained 323 Gb of data in about 140,000 files. BUD hardware consists of a 1 terabyte RAID on an 8-processor E4000 Sun server.
Input to BUD is from two primary sources, an Antelope™ Object Ring Buffer (ORB) system and an Earthworm Waveserver client. The BUD Antelope streams originate from a number of special data-exchange protocols, each of which has its own peculiarities, adherents, and characteristics. These include LISS (Live Internet Seismic Server from ASL, http://devo.liss.org/), NRTS (Near Real Time System from IRIS IDA Data Collection Center, http://quakeinfo.ucsd.edu/idaweb/Telemetry/), seedlink (based on SeisComP from GEOFON, http://www.gfz-potsdam.de/geofon/seiscomp/), and direct Orb2Orb Antelope (from Boulder Real-Time Technologies, Inc., http://www.brt.com/) sources. The Antelope system combines data from these and writes miniSEED files directly into the BUD directory structure. An Earthworm Waveserver client, ew2mseed, which writes out continuous miniSEED data files to a BUD directory structure, was developed by ISTI (http://www.isti.com/) under contract to the DMC. These clients connect directly to Earthworm Waveservers at regional networks, keeping track of what data segments are needed to maintain a continuous feed. Thus, data sources using at least five different seismic data-exchange protocols are combined in BUD, resulting in all data, no matter what their source, ending up in the same format in near-real time.
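The claim that the network/station directory layout makes tools easy to build can be illustrated with a short Python sketch. It assumes only what is described above: a root directory holding one subdirectory per network, each holding one subdirectory per station, with channel-day files inside. The function names, the root path argument, and the per-channel expected latency value are hypothetical and do not come from the DMC's own tools.

```python
import os
import time
from typing import Dict, List, Optional


def inventory(bud_root: str) -> Dict[str, List[str]]:
    """Map each network directory name to its sorted station directory names."""
    result: Dict[str, List[str]] = {}
    for network in sorted(os.listdir(bud_root)):
        net_path = os.path.join(bud_root, network)
        if os.path.isdir(net_path):
            result[network] = sorted(
                station
                for station in os.listdir(net_path)
                if os.path.isdir(os.path.join(net_path, station))
            )
    return result


def is_stale(
    file_path: str, expected_latency_s: float, now: Optional[float] = None
) -> bool:
    """True if the file was last modified longer ago than the expected latency.

    expected_latency_s is an assumed per-channel value; the text notes it
    would depend on record length, sample rate, and input source.
    """
    now = time.time() if now is None else now
    return (now - os.path.getmtime(file_path)) > expected_latency_s
```

Because both functions rely only on directory listings and file modification times, no miniSEED parsing is needed for a first-pass inventory or latency display. A modification time is only an estimate of data currency, as the text says, so a real monitor would presumably confirm against the record times inside the file.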
The ES is really impressed with all of these data from many different sources flowing into a uniform buffer, but from the above descriptions it seems to be like a WOM (Write Only Memory). What good is it? It turns out that BUD does have uses other than checking on its own operational status. Anyone using the new WILBER II (http://www.iris.washington.edu/cgi-bin/wilberII_page1.pl) Web interface will be indirectly using BUD. This event-oriented interface accesses BUD for the equivalent of SPYDER™ data. By the time an earthquake event message is received from NEIC, BUD already has lots of continuous data for that event, and using those data WILBER II can quickly make an event-based collection of waveforms. This more than doubles the average number of stations from which data are retrieved for an event. The dial-up SPYDER system is also being modified to feed waveform segments from dial-up stations into BUD. Once completed, the old version of SPYDER will be phased out.
One can make one's own channel-time selections by going through the BUD query interface and selecting a set of channels and a time window. A miniSEED file of the selected channels is placed in the IRIS FTP area for direct download. Be aware that this file is only miniSEED, so one will need a dataless SEED volume to be able to use it. Other services are available and planned. One can have a network's data delivered continuously through a LISS server. A prototype Fissures Seismogram Server has been developed and is currently being tested. By special arrangement one can get an orb2orb feed through Antelope. An AutoDRM that serves BUD data is also under development.
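Since BUD stores data in channel-day files, a channel-time selection like the one described above has to work out which days a requested window touches. The following Python sketch shows one way that bookkeeping might look. It is an illustration only and not the query interface's actual logic; the function name and the half-open window convention are assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def channel_days(start: datetime, end: datetime) -> List[Tuple[int, int]]:
    """Return (year, day-of-year) for every day the window [start, end) touches."""
    if end <= start:
        return []
    days: List[Tuple[int, int]] = []
    day = start.date()
    # A window ending exactly at midnight does not need the following day.
    last = (end - timedelta(microseconds=1)).date()
    while day <= last:
        days.append((day.year, day.timetuple().tm_yday))
        day += timedelta(days=1)
    return days
```

A 20-minute window straddling midnight on New Year's Eve 2001, for example, touches two channel-days per channel, (2001, 365) and (2002, 1), so two files per selected channel would have to be read and trimmed.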
Increasing volumes of seismic data are being distributed in real time far beyond the organization that operates the original station. In many cases the waveforms are being consolidated for specific local purposes. However, the IRIS DMC, whose job is to distribute seismic data, does the consolidation primarily to pass it on to others. So, in the immortal words of the IRIS DMS program manager, Tim Ahern (and the ES still groans and shakes his head when hearing them), when speaking of their Buffer of Uniform Data, "This BUD's for you."
Kradolfer, Urs (2000). Waves4U: Waveform availability through AutoDRM's, Seism. Res. Lett. 71, 79-82.
Malone, Steve (1996). "Near" realtime seismology, Seism. Res. Lett. 67(6), 52-54.
SRL encourages guest columnists to contribute to the "Electronic Seismologist." Please contact Steve Malone with your ideas. His e-mail address is firstname.lastname@example.org.
Posted: 15 November 2002 URL's updated: 29 January 2003