ELECTRONIC SEISMOLOGIST
July/August 2002

Steve Malone
E-mail: steve@geophys.washington.edu
Geophysics, Box 351650
University of Washington
Seattle, WA 98195
Phone: (206) 685-3811
Fax: (206) 543-0489

LOGOUT AND LOGIN

After seven years and 32 columns, the Electronic Seismologist (ES) may have blown a fuse, short-circuited, or just powered down. There ain't no more juice in the circuits. The bit bucket is empty. The do-loop is done. Syntax error and the core-dump file is zero-length. It's time to pass the keyboard to someone with an object-oriented approach; with massive pushdown stacks, megabytes of memory, and an eight-pipe, parallel processor. Now in shutdown mode, the ES produces his last column. However, before your eyes glaze over and you go back to doing something useful, check out the report by the new ES that follows my ramblings. This time you get two ES's for the price of one. Such a deal!

As I retire from the publish-or-get-nagged business of providing columns for SRL every other month or so, I feel the need to look back to see how it all started. Don't worry, I won't get maudlin. A quick glance through some of my early columns reveals just how transitory much of this geeky computer stuff really is. My first column in early 1995 was published in the early days of the "World Wide Web" and made a big deal about how handy the Internet would be for seismology. No duh! In fact, the guts of the article were a reference list of Web sites, FTP sites, bulletin boards, finger servers, and e-mail addresses where data and information about seismology (excluding exploration seismology) could be found. This "seismosurfing" list had been generated by searching the Internet by hand (no search engines existed then) using guesses, heuristics, and luck. It probably contained most of the offerings available at the time. The list was made available on the Web (http://www.ess.washington.edu/seismosurfing.html) and has been kept mostly up-to-date since then. Of course, it is no longer updated by manual searching or even by automatic means; it relies on others sending me notes on what is missing or wrong. In particular, Torild van Eck from ORFEUS has been a great help in passing on references he somehow finds.

Seismosurfing has expanded from the initial list of 50 entries, only 15 of which were Web addresses, to more than 260 entries, almost all of which are Web addresses. I check out each request for placement on the list and turn down many because the referenced pages don't include fundamental seismic data or direct research results. I established these criteria early on to keep the list useful to me and not too long. While there are now many similar lists, some much more extensive and better formatted or indexed, "seismosurfing" still seems to be used by many. A look at our Web-server logs for the past year shows 4,000-8,000 hits per week, not including those from robots, and only about 20% of the hits on a given day come from repeat users. A quick look at several search engines shows that this page is referenced by between 430 and 740 different Web pages around the Internet. It now has a life of its own.

This searching brings up a pet peeve of mine regarding Web offerings. After wasting far too much time surfing the Web, I find the content of many sites to be minimal. Too often a site claiming to have information on a topic is nothing more than a list of references to other sites about that topic. Endless, contentless loops result. Those sites also often have a beautiful, grand layout with wild, colorful graphics and fancy animation but nothing of substance. It seems to me that often the simpler a Web page is, the more valuable its contents. Nice Web page dressing can help one enjoy the offerings but doesn't make up for a lack of substance.

My absorption with the Internet and its use in seismology has obviously colored many of my columns. Fortunately for the reader, other contributors have broadened the topics covered to include a larger variety of subjects. And lucky for me, I had to write fewer than half of the columns published. In fact, without the contributions of many other authors this column would have long since been relegated to /dev/null. Some authors came out of the woodwork, and others needed some prodding or encouragement from me. In each of those cases I was convinced the author had something of interest to say but was too bashful to write it up without being asked. There are certainly many others from whom I would have liked contributions, but they either didn't volunteer or declined invitations. While I (and certainly many readers) am very appreciative of all of these guest columnists (more than 30 seismologists contributed), I would like to thank in particular a group of seismologists from the University of Bergen who contributed three different articles. Not only is this group prolific in writing seismic software, but they also do a fine job of writing it up and making it available for others. I assume the new ES will also appreciate contributions from the community, and I encourage anyone with thoughts on related subjects to contact him.

Finally, I must thank my editors who have been patient, understanding, and gently prodding when writer's block paralyzed my keyboard. John Ebel had the initial idea for this column and has been highly encouraging all along. Susan Hough continued that encouragement and has added ideas and a skilled editorial pen.

With those remarks and appreciations I will wrap up and turn this column over to someone who is not shy, has illuminating observations, and writes knowledgeably about "Information Technology" ... whatever that is. I present to you my last guest author and the next "Electronic Seismologist", Tom Owens. The ES is dead; long live the ES.

To the new ES, an instructional departing haiku:
   Write with interest
   things seismic and digital;
   but mostly, have fun

NEWS FLASH ... INFORMATION TECHNOLOGY IS HERE TO STAY!

Thomas J. Owens
Department of Geological Sciences
University of South Carolina
Columbia, SC 29208
Telephone: +1-803-777-4530
Fax: +1-803-777-0906
E-mail: owens@sc.edu

In late March, the Electronic Seismologist sent his favorite cub reporter on his first special assignment to the EarthScope Computational Science and Information Technology Workshop in Snowbird, Utah. EarthScope (http://www.earthscope.org), in case you haven't heard, is a major program under consideration for funding in the U.S. that will examine a good part of the North American continent from all sorts of viewpoints in the coming decade. It clearly will challenge our ability to integrate many different data types into a cohesive view of Earth structure and processes. So with high hopes, notepad in hand, and a pencil in my ear (oops ... BEHIND my ear; next time I'll bring a laptop!), I headed to the mountains.

I came away with a strong sense of déjà vu. There we were, slightly disoriented geoscientists along with some outside agitators from the computer science community gathered together to wallow (or waller, where I come from) in acronym soup and to try to get a grip on a very important problem. Ah ... I've got it ... the year: 1997; the place: a nameless hotel on the outskirts of Chicago; the event: the IRIS FISSURES Workshop (Malone, 1997); the problem: produce a software framework to aid seismologists in data processing and research. Granted, five years later, there are quite a few differences. Many/most of the acronyms have changed, software frameworks are termed Information Technology, the problem has been expanded to include the whole of the geosciences, and, obviously, the venue is a vast improvement. The constants: simultaneously excited, confused, and concerned geoscientists; similarly stimulated and remarkably patient computer scientists; lack of a common language; and an overwhelming sense that the problem at hand is truly a grand challenge.

While at the workshop, I took lots of notes in anticipation of dazzling readers with the latest in acronyms and cutting-edge techno-speak. I'll probably resort to that eventually. In the end, one simple question captured the atmosphere so succinctly that it bears repeating at the outset of this discussion. I call it "The Question": Who is going to do all of this? First posed by Peter Shearer of UCSD but clearly on all our minds, this question captures the essence of the information technology dilemma that we face. So here is the practical and philosophical question that I'd like to discuss briefly in this column: Can we as a community develop the long-term relationships with the computer science ("IT") community, the financial resources, and the culture necessary to build and maintain a state-of-the-art information technology structure that will really make a difference in the way we do our science?

OK, let's break this down a bit. First, who are "we"? For the purposes of this discussion, I'd like to define "we" as "those who generate new data and knowledge about the Earth." Scattered around the globe with us, squirreled away in various hard drives, diskettes, floppies, map cabinets, and thin-section collections, is a lot of information that may be of interest to many other people. In addition to the collection and interpretation of data that we are best trained to deal with, we now spend an increasing amount of our effort integrating our results with those of other researchers included in my definition of "we", but with distinctly different expertise. Most of us now also have a clientele, a group of people ranging from K-12 students and teachers, to emergency planners and engineers, to policy makers who rely on us to provide them with reliable information needed to address their particular applied problems. In short, we have an increasing need to provide our data and results to others with varying needs and skills, and an increasing need to absorb data and results from other disciplines with which we have varying expertise and familiarity. Both of those tasks are hard because our colleagues and clientele are not always able to send or receive our information in a way that we can easily handle. Thus, most of these exchanges take place with those colleagues and clients with whom we have a close professional relationship, one that allows us to query and be queried until everyone is satisfied that they can use and understand what they have received.

Enter the computer scientists. No white horses, no claims of easy solutions, just a lot of enthusiasm and a vision for how the world COULD work. They describe a world in which queries come from people we don't even know and are answered automatically by our faithful computers. A world where we can find data by posing simple questions and get answers complete enough that we don't have to scratch our heads, curse, and call the person who collected the data numerous times before we have any chance of actually making use of what we get. If your initial response to this vision is "Yeah, right!", then might I suggest that you are crying out for help? Humor me and listen to these computer scientists for a bit. Their world is based on defining "Interfaces", "Web Services", "Markup Languages", and "Metadata" suitable for a geoscience environment.

Interfaces are the simplest to describe. They represent a definition of exactly how your program and my program are going to exchange information. For you FORTRAN buffs, interfaces are what you get when you put COMMON statements and arguments to subroutine calls on steroids. Interfaces are defined in one or more, get this, "Interface Definition Languages."
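
To make that a little more concrete, here is a minimal sketch of the idea in Python. The "waveform source" interface, its method names, and its argument types are all invented for illustration; they are not drawn from any real interface definition language, just meant to show what "agreeing on exactly how two programs exchange information" looks like in code:

    from abc import ABC, abstractmethod

    class WaveformSource(ABC):
        """Hypothetical interface: any program that serves waveforms agrees
        to provide exactly these operations, arguments, and return types."""

        @abstractmethod
        def channels(self, network: str, station: str) -> list[str]:
            """List the channel codes available for one station."""

        @abstractmethod
        def waveform(self, channel: str, start: float, end: float) -> list[float]:
            """Return the samples recorded on a channel between two times."""

The point is simply that once two programs agree on a contract like this, either side can be replaced or rewritten without the other side caring.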

A "Web service" is really just a program that sits between a data source and the rest of the world. A Web service needs an interface so the rest of the world can talk to it. The most popular way of doing this at the moment appears to be something called WSDL, or Web Service Definition Language. WSDL is new, but these concepts have been around awhile ... at least since the 1997 FISSURES Workshop. Don't confuse a Web service with a Web server. A Web server is an example of one specific Web service: feeding Web pages to their clients, usually the familiar Web browsers. In general, each Web service does some specific task, but those tasks can be completely unrelated to "surfin' the Web" as we know it.

Markup languages have grown up quite a bit since 1997. In 1997, most of us were aware of HTML (Hyper-Text Markup Language), a language that tells Web browsers how to display Web pages on their computer screens. This concept has grown because of the possibility of a data file itself describing its contents in a more flexible manner than our traditional "flat files" can. At the core of most markup languages at the moment is XML (Extensible Markup Language), which, as its name suggests, allows itself to be tailored to a specific application. For instance, we have tested the use of XML as a means of generating self-describing seismic velocity models (Danala, 2000), and we are also using it to make self-configuring user interfaces for a K-12 seismic data explorer. The big concept in the IT world at the moment seems to be the idea of a "semantic markup language", that is, an XML-based way of defining the vocabulary of a specific community that can then be used to develop smarter ways of querying distributed sources of information.
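
As a flavor of what "self-describing" means, here is a short Python sketch that writes a toy one-dimensional velocity model as XML. The element and attribute names are invented for this example; they are not those of the Danala (2000) work or any published schema. The idea is that the file carries its own structure and units, so a reader needs no separate format document:

    import xml.etree.ElementTree as ET

    # Build a toy, self-describing one-dimensional velocity model.
    model = ET.Element("velocityModel", name="toy-crust",
                       depthUnits="km", velocityUnits="km/s")
    for top, vp, vs in [(0.0, 5.8, 3.4), (20.0, 6.5, 3.7), (35.0, 8.1, 4.5)]:
        layer = ET.SubElement(model, "layer", topDepth=str(top))
        ET.SubElement(layer, "vp").text = str(vp)
        ET.SubElement(layer, "vs").text = str(vs)

    print(ET.tostring(model, encoding="unicode"))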

Next is "metadata." Metadata is that stuff that you know about your data that no one else knows. Right now, metadata is what makes your phone bill so high when you send a seemingly complete data set to someone else! It is the stuff buried in your field book, in your head, and scribbled on your desk blotter that really describes the nature and limitations of your data. Of all the elements of the computer scientist's brave new world, properly representing metadata is probably most critical to the success of IT in the geosciences. First, it's because it is the key to using the data wisely and, second, because it requires a commitment on the part of the data provider to encapsulate appropriate metadata in a useful digital form.

Those are the pieces; how do we proceed? At Snowbird, a challenge put before us was to develop an Earth Sciences Markup Language (ESML), a definition of what we do that is simple enough for a computer to interpret. Somewhere around here was when Peter Shearer asked "The Question", because it is clearly a daunting task. The answer to The Question revolves around our relationship with the computer science community. The comment was made that "these IT guys are a solution looking for a problem." Another way of saying this is that there are a lot of smart people out there willing to help us if we can just find a way to establish the right relationships. Our role in the relationship is to serve as "Domain Experts." We have to have the interest and patience to describe what we do in a manner that a computer scientist can understand and implement in a modern information management framework. Then we have to be willing to change the way we manage our data and metadata to allow us to fold it more easily into this emerging framework. The essence of The Question is that we have to do all of this and still find time to remain "those who generate new data and knowledge about the Earth!"

I wish I could close this commentary with a clear answer to The Question. But I can't. I can say that the potential reward is great. If I commit to this idea and someone whose data I could really use does as well, then the fruits of my labor will be the ability to do my science more effectively. Imagine being able to ask a search engine "Are there any xenoliths from the lower crust available in my study area?" and, if there are, being able to easily overlay their pressure/temperature estimates, with error bars, on your seismic velocity model. How about a middle school student who just learned that the Earth has a crust being able to ask "How thick is the crust beneath my house?" and receive a meaningful answer? Perhaps the difficult aspect of this is that the real reward for this effort may not be in the work that you put into integrating your own data into the IT framework that is envisioned, but rather in the work that others put into making their data available. I have my data and I can use it, of course. But I can't very well use other kinds of data at the moment. So we must develop a culture that encourages and rewards the effort it will take to prepare and contribute data in a manner that will almost certainly be a little unfamiliar to us all. If you think you can do better science when you can tap into a well-developed IT framework, then we need to start sometime, because our problems are not getting any simpler.

The motivation for the Snowbird workshop was that when EarthScope gets funded it could be the catalyst we need to drive our transition to IT-based geosciences. In addition, the National Science Foundation, through its Information Technology Research (ITR) Initiative, is providing the opportunity for computer scientists to get together with other communities to help us begin to take advantage of emerging technologies to solve our problems. One major project funded in the Earth sciences is the Southern California Earthquake Center (SCEC; http://www.scec.org) ITR project. SCEC, in collaboration with computer science experts, has received a major grant to begin to cast a spectrum of earthquake research problems in southern California into a modern IT framework. This could be the launch point for our community to frame the bigger semantic framework that will benefit the broader geoscience community.

OK, time for your homework assignments! [Can he DO that?] First, take a computer scientist or two to lunch. Describe what you do, ask them what they do, try to find some common ground. It will help both communities in the long run. Next, join the EarthScope IT discussion forum (http://www.scec.org/ESIT) and start listening/contributing. Third, follow the SCEC project. It is the clear prototype for geoscientific information technology research and management. Finally, please take note that I just called a five-year, ~$10 million project a "prototype!" This stuff takes time and money, so stay engaged in the process and contribute where you can. The diversity and volume of data that we need to do our science are not going to diminish, so don't bury your head in the sand and hope this IT stuff goes away.

REFERENCES

Ahern, T. (2001). DHI: Data Handling Interface for FISSURES, IRIS DMC Newsletter 3 (http://www.iris.edu/news/newsletter/vol3no3/page3.htm).

Danala, R. (2000). XML Representation of One-dimensional Seismic Velocity Models, M.S. Thesis, University of South Carolina, 121 pp.

Malone, S. (1997). The Electronic Seismologist goes to FISSURES, Seism. Res. Lett. 68, 489-492.


SRL encourages guest columnists to contribute to the "Electronic Seismologist." Please contact Steve Malone with your ideas. His e-mail address is steve@geophys.washington.edu.
