SRL 68:4 Electronic Seismologist

Steve Malone
E-mail: steve@geophys.washington.edu
Geophysics AK-50
University of Washington
Seattle, WA 98195
Phone: (206) 685-3811
Fax: (206) 543-0489

Why do so many seismologists who need to do computations on data start right off writing their own computer programs? Wouldn't you expect that for most computational needs someone has already written a program you could use or easily modify to do what is needed? Many of the general purpose seismic analysis packages such as SAC, Geotool, Datascope, or PITSA are very adaptable or configurable to a large variety of tasks. Of course, the one that can do what you need and that you already know doesn't understand your data format. So, at a minimum, you must write a data format converter. Too often, even when an existing public domain program is available that will do exactly the task at hand, a seismologist writes a new program or modifies an inappropriate program. Such duplication of effort is inefficient, yet it seems to be accepted, if not expected. As the complexity, sophistication, and particularly the volume of seismological data processing increases, such wasted effort becomes increasingly costly. What can we do?

To try to understand this situation better, the Electronic Seismologist (ES) attended the FISSURES (Framework for Integration of Scientific Software for University Research in Earth Sciences) workshop organized by IRIS in May of this year. Besides deriving long obscure acronyms, IRIS is working to help provide advice and direction to the seismological community regarding the development of software tools for the display and processing of seismic data. Approximately forty seismologists experienced in and interested in the details of computer application to seismological problems and representing a broad spectrum of seismological research organizations attended this workshop. Also, seven people with computer expertise from other disciplines were invited to provide different perspectives. Experts in commercial software development and database management issues as well as experts in establishing industry standards gave presentations and provided insights into how problems similar to ours are addressed in other arenas. The workshop organizers started off using a tree analogy (Figure 1) in which the roots are the data sources and the leaves are the end-user application packages. The trunk and branches connect data to applications. The current situation in seismology seems analogous to many small trees, each with only one or a few leaves and roots intertwined in a hopeless mess. Applications (leaves) on one tree cannot easily access the data (roots) of another. It would be good to have one tree in which all applications can easily access all data sets through a common trunk. The mechanism for doing this is, of course, a Seismic Attachment Protocol (SAP), which can move information between data applications as well as to and from storage.

Figure 1. Tree analogs to seismic software organization. "A" is the official IRIS FISSURES version of the ideal organization model. "B" is the Electronic Seismologist's interpretation of the current situation.

For three days there were presentations and discussion in both small and large groups, and botanical confusion (at least for the ES) reigned. What were we there for? What was the exact problem we were trying to solve? What was the process we were to use to solve the problem? What was this "SAP", and how do we go about defining it? Despite the confusion, there did seem to be some general agreement as to the need for a solution and some possible directions in which to move. By the time this SRL issue is published, the summer issue of the IRIS Newsletter should have a summary of the meeting, and there will have been additional discussions at the IRIS workshop in June. As this is being written, the day after the FISSURES workshop and without additional botany classes, the ES cannot make a definitive analysis of the current situation, but is encouraged that things may head in a productive direction. The rest of this article will consist of observations and impressions jotted down in notes at the workshop. Few if any of the attendees may agree with these interpretations (including the ES, who often cannot read his own notes).

First let's explore some possible reasons that so much computer code is written and rewritten in what seems to be such an inefficient exercise. Some of these reasons are not relevant to the "SAP" concept but should be recognized anyway. The verbalized reason for writing a seismic analysis application is almost always, "I just want to do science, and I need a way to process my data in the proper way." The ES sometimes heard (or thought he heard) unspoken words such as:

Such unfortunate, but very human, excuses certainly play some part in any serious activity done by bright, competitive, dedicated professionals, but these were not overt subjects at the workshop. More relevant comments heard relating to the general problems and frustrations of the current situation were:

Such undercurrents were subtle and maybe only imagined. The actual agenda of the workshop was divided into the following four general types of presentations or activities: (1) experiences and frustrations of seismologists with the current software situation, (2) current seismological software tools, (3) techniques other groups use or expect to use to handle similar problems, and (4) open discussions.

Seismologists' frustrating experiences using software to process large data sets included calculating centroid moment tensors using GSN data and combining PASSCAL data for refraction lines. In each case, specific solutions had been worked out using combinations of programs obtained from elsewhere, written, modified, and combined with specialized scripts. Directory/file-naming conventions helped with organization. These systems were not transportable, nor even easily used later by the same researcher. Reflection seismology-type experiments seem to be naturally better organized, but only if contained within one commercial processing package. There is no easy method to exchange or incorporate data from other types of sources. Regional network operators have very diverse, unsharable sets of software for routine recording and processing and often no in-house computer programmers to help maintain them. So they must depend on the good will of others for help.

There were no seismologists who thought they had a general solution to all parts of the problem. However, many had very impressive solutions to specific classes of problems. In fact, the number, sophistication, and power of the application packages described both impressed and depressed the ES. The sophistication of analysis packages has increased remarkably over the past few years, but the ES was reminded how much similarity and redundancy there still are. Modern computers and software tools have greatly increased the speed with which fancy applications can be developed, but they have not made the distribution and sharing of solutions easy enough to reduce significantly the duplication of effort. The ES was encouraged by several of the described packages which seem to be more in the line of flexible tool boxes rather than monolithic, "do everything" programs. In particular, the flexibility of MATLAB-based packages and the number of modules provided by the Datascope seismic analysis system were impressive. Even in these cases, as well as for other sophisticated analysis packages, it was painfully obvious to all that interchangeability of data between packages was a major problem.

After multiple presentations about the nth seismic analysis package, it was a refreshing change to hear from various computer industry experts and data managers from other disciplines about current thoughts regarding software design. In particular, the issue of writing software so that it is interchangeable, shareable, not tied to a specific data format, and reusable was addressed. Analogous efforts to the IRIS FISSURES initiative from the atmospheric science and GIS disciplines illustrated what types of things can be done to make software and data sharing more effective. Presentations by software developers of large commercial packages described leading-edge software technology and provided glimpses of the future. Interchangeability of software modules between competitors was seen as a way to "future-proof" one's current efforts. Make a better module and the world will buy it and make you rich, but only if it will interact with others in a seamless way. The key to this interaction is the interface. A generic interface must be well designed so as not to limit its use by any type of module. Separating the definition of the interface from potential implementations of it seems to be key. Don't get these two things mixed up! The ES as well as many others struggled with this imperative, since we naturally have always thought in terms of implementation. How does one go about defining this interface, or "SAP", in an abstract enough manner that it doesn't automatically imply a certain implementation? The concept of abstracting the essence of both data and action for the interface was foreign to most of us. There was a tendency for people to think of the "SAP" as a "super format" or wrapper around all older formats. This view, the experts insisted, was not correct.
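For readers who, like the ES, think more easily in terms of implementation, the distinction between an interface and its implementations can be sketched in a few lines of Python. This is only an illustration of the general idea, not a proposal for what "SAP" is or should be. Every name in it (Trace, WaveformSource, TextSource, MemorySource, peak_amplitude) and the one-line text layout are hypothetical and were not discussed at the workshop.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Trace:
    """Hypothetical abstract view of one waveform segment."""
    station: str
    sample_rate_hz: float
    samples: List[float]


class WaveformSource(ABC):
    """The interface: what an application may ask for, not how data is stored."""

    @abstractmethod
    def get_trace(self, station: str) -> Trace:
        """Return the trace recorded at the named station."""


class TextSource(WaveformSource):
    """One implementation, reading a made-up layout: 'station rate s1 s2 ...'."""

    def __init__(self, text: str) -> None:
        self._rows: Dict[str, Tuple[float, List[float]]] = {}
        for line in text.splitlines():
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            self._rows[parts[0]] = (float(parts[1]), [float(x) for x in parts[2:]])

    def get_trace(self, station: str) -> Trace:
        rate, samples = self._rows[station]
        return Trace(station, rate, samples)


class MemorySource(WaveformSource):
    """A second implementation, holding traces that are already in memory."""

    def __init__(self, traces: List[Trace]) -> None:
        self._traces = {t.station: t for t in traces}

    def get_trace(self, station: str) -> Trace:
        return self._traces[station]


def peak_amplitude(source: WaveformSource, station: str) -> float:
    """Application code: it depends only on the interface, never on a format."""
    trace = source.get_trace(station)
    return max(abs(s) for s in trace.samples)


text_source = TextSource("ABC 100.0 1.0 -3.5 2.0\n")
memory_source = MemorySource([Trace("ABC", 100.0, [1.0, -3.5, 2.0])])
print(peak_amplitude(text_source, "ABC"))
print(peak_amplitude(memory_source, "ABC"))
```

The application function never learns which storage scheme sits behind the interface, so a new data source can be added without touching the application. In that sense the interface is neither a "super format" nor a wrapper around older formats.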

Mechanisms seem to exist for designing an interface that does not box one in by building in or defining implementation. Other industries have experienced this process, and it may be time for seismologists to give it a try as well. The process used by others apparently is to have one group of "software architects" deal with the interface at a very abstract level while they consult with experts in the specific field to help understand the way data and applications interact. Can this process work in seismology? Since the seismological community is not part of the commercial world and has its own and different driving forces and limits, it is not obvious that the same process would be appropriate for us. Determining the specifics of such a process is a first step. Group discussions at the workshop grappled with both the concept of an abstract interface and a process for defining it for seismologists.

By the end of the workshop, perhaps because most were too tired, too confused, or too smug to express contrary opinions, there seemed to be a consensus that the idea of understanding, defining, and using "SAP" was worth pursuing. There was hope that the situation can improve and that IRIS, with the resources it has available, should follow up on the workshop with additional efforts. While the ES is similarly encouraged that things can improve, he does have some concerns about issues that were never addressed at the workshop. Early in the workshop a metaphor was used: Trying to get seismologists to move in the same direction regarding software development was like herding cats. The ES wonders if IRIS has thought about whether catnip or a big dog will best effect the desired outcome. Can the solution be made so overwhelmingly obvious that it will be adopted as a matter of course by most seismologists, or will it take a big dog to "encourage" use?

The ES is fairly skeptical about seismologists flocking to a "computer science-y" solution requiring a significant investment in learning something this new without some barking and nipping at heels. He is reminded of the standard format for exchanging earthquake data, SEED. While it is a format few like, because the IRIS DMC refuses to accept or provide earthquake data in any other format, most admit it has become an effective format. In this case the established use of the standard, in all its ugliness, has made the exchange of large earthquake waveform data sets much easier and more robust. Such a "big dog" is not likely to work in the case of "SAP."

On the other hand, catnip, in the form of support for reading and writing SEED, has greatly aided its acceptance. It is now fairly easy and quick to get customized data sets in SEED and convert them to one of four waveform formats using rdseed. IRIS's support for the development of a "SAP" is certainly necessary if it is to be done well enough to make a similar impact. Since "SAP" has no use on its own and will require adoption by a significant portion of the seismological software developers who are supported by many different organizations, some with little direct connection to the IRIS community, it is not clear what the catnip in this case might be. If "SAP" were something as simple as an exchange format, then once it was agreed to, its use would be fairly straightforward to incorporate into existing and future applications. Since it may require a fundamental shift in the way one thinks and writes code, it may be much more difficult for many of the established seismological software developers to run with. Serious catnip and at least a small yappy dog may be needed.

The Electronic Seismologist has concerns that these metaphorical uses of cats, dogs, and trees are leading in a nonproductive direction and so will stop at this point and try to concentrate his limited literary skills on purely technical issues in the future.

SRL encourages guest columnists to contribute to the "Electronic Seismologist." Please contact Steve Malone with your ideas. His e-mail address is steve@geophys.washington.edu.