
Association of Research Libraries (ARL®)

Membership Meeting Proceedings

Transformations in Astronomy Research Due to the Internet


Eugene, Oregon
May 13-15, 1998

The Future Network: Transforming Learning and Scholarship


David Schade, Astronomer
Canadian Astronomy Data Centre

Going from medicine to astronomy is in a sense going from body to soul because, in fact, the study of the universe is something that's deep in our soul. We really need to understand how the universe we observe came to be in the first place--how it came to evolve from its early state to the state that we see it in today.

This morning I had some thoughts about distance learning and about what can and can't be achieved with computer-aided educational tools. One thing that struck me is that there is no substitute for looking at the real sky, because you don't understand the relationship between yourself as a human being and the real sky until you go out on a starry night and look up with your eyes, binoculars, or a telescope. In the same vein, there's no substitute for physics labs in universities. The reason I think labs can't be entirely replaced by computers is that a physics lab is meant to demonstrate the relationship between the theoretical and the physical. If you use computers, the tendency is to demonstrate the relationship between the theoretical and the simulated, that is, what's on the computer. You just don't see the same thing; it doesn't make the connection. You're comparing the theoretical with the theoretical, and you might as well not do it at all. So you have to be careful about what you expect from computer-aided learning, especially in physics, though I'm sure there are analogies in other fields.

I'm with the National Research Council of Canada, which is part of the federal government, and I work at the Canadian Astronomy Data Centre (CADC), which is part of the Herzberg Institute of Astrophysics. The mandate the federal government gave the CADC is to provide services that help the research community do better research, and to develop facilities that researchers don't have the capability to develop themselves. So we try to develop what are nowadays mostly web-based tools, which are then distributed to university researchers.

The Canadian Astronomy Data Centre is a very small group. There are really only about four or five of us, depending on whether you count temporary and contract workers. It was founded in 1986, has grown from three people to about four or five since then, and has made great advances in what we call data archiving. We think of data archiving in very broad terms; it involves data processing, data cataloging, data distribution, and a lot of other things.

You should understand what data actually is in astronomy: data is images. An image is a collection of pixels (picture elements), and each pixel is essentially a real number. So our data consists of collections of real numbers, together with the relationships and contextual information that make those numbers meaningful.
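To make the "pixels plus context" idea concrete, here is a minimal Python sketch, assuming an image is held as a plain 2-D list of floats and its context as a dictionary of header keywords. The keyword names are borrowed from FITS header conventions purely for illustration; the talk does not name a format, and the required set below is hypothetical. The point it shows is that the same pixel values are only reusable by someone else when the context travels with them.

```python
from dataclasses import dataclass, field

# Hypothetical set of context keywords an archive might require. The names
# follow common FITS header conventions; the talk does not specify any.
REQUIRED_CONTEXT = ("TELESCOP", "INSTRUME", "FILTER", "EXPTIME", "DATE-OBS")


@dataclass
class Observation:
    """An image as a grid of real-valued pixels plus its contextual header."""

    pixels: list[list[float]]
    header: dict[str, object] = field(default_factory=dict)

    def missing_context(self) -> list[str]:
        """Return the required keywords that are absent from the header."""
        return [key for key in REQUIRED_CONTEXT if key not in self.header]

    def is_reusable(self) -> bool:
        """True only if every required piece of context was recorded."""
        return not self.missing_context()


# Same pixels, with and without the accompanying context (made-up values).
pixels = [[1.0, 2.5], [0.3, 4.2]]
logged = Observation(pixels, {
    "TELESCOP": "CFHT",
    "INSTRUME": "example-camera",  # hypothetical instrument name
    "FILTER": "R",
    "EXPTIME": 600.0,              # seconds
    "DATE-OBS": "1998-05-13",
})
bare = Observation(pixels)

print(logged.is_reusable())    # True
print(bare.missing_context())  # ['TELESCOP', 'INSTRUME', 'FILTER', 'EXPTIME', 'DATE-OBS']
```

The bare observation holds exactly the same numbers as the logged one, yet nothing in it says which telescope, filter, or exposure produced them.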

Data archiving has been around for a while, and it's playing an ever-increasing role in astronomy. But data archiving needed the World Wide Web to thrive. We could have saved the data, but we never could have done anything very interesting with it without the Web. It's really the ideal interface and the ideal distribution system for these data, because it lets people browse the categories and get information very quickly, essentially instantaneously. So, although the Canadian Astronomy Data Centre was conceived in 1986 as a center where people would physically fly in and use the facilities, to my knowledge that never happened. As soon as the Internet became available, well before the World Wide Web, it was used as much as it could be, and its use just grew and grew. And the day the World Wide Web became available, we had tools built on it, and people were using web browsers to access the data.

Historically, space missions have had much better archives than ground-based telescopes. This is an important fact, because space missions are a very small minority of data sources. There are many more ground-based telescopes, but their data are not archived effectively, essentially because the people who lead these projects can physically get to the telescope and screw up the observing procedures. The way it works is that you make a proposal to a time allocation committee, and if they judge you worthy, you go to the telescope. And the way optical astronomers feel about the telescope is: "I got the time; it's my telescope; I run it the way I want." That has the unfortunate consequence that everybody runs it differently from everyone else. So when the data stream comes out, it's incomprehensible to anyone except the person who was there, and sometimes it's incomprehensible even to them a couple of months later. I'm not exaggerating. We have very specific experiences along these lines, and it's a big problem.

One example is the High Energy Astrophysics Science Archive Research Center (HEASARC). It's one of my favorite archives because it archives a whole range of high-energy emissions (X-ray emissions, gamma-ray emissions, etc.), puts all this data in one place, and lets you search across all these areas with one query. That is an important thing.

But they do other things as well. There is an archive, there is software, there is information, and there's public outreach and education. So, although I would call this an archive, we mean a lot more by the word "archive" than just storing the data.

Another example is the Hubble Space Telescope Archive, which comes from a space mission, of course, but contains optical data; in that way it's similar to ground-based data. Because the data come from a spacecraft, proper calibration procedures are applied to them, and the resulting data stream is very comprehensible.

My own organization is the Canadian Astronomy Data Centre, and we do some unique things. We archive Hubble Space Telescope data. We are not just a component of public data. We do some things better than they do, but they do some things for which we don't have the responsibility. We are science-focused and they are mission-focused, with many responsibilities for mission instrumentation, verification, and so forth that we don't have.

But we also have ground-based data. The CFHT, the Canada-France-Hawaii Telescope, has so far produced the best-archived ground-based data in the world, and even so, it's not very good. Essentially, it suffers from the problems I described: poor logging of information and poor calibration procedures. In other words, we are missing context. We have pixels, collections of real numbers, but without the accompanying information in a reliable form that would allow someone else to use them. So, since one of our main functions is to archive and distribute these data, I have spent the majority of my time acting as an advocate for the proper archiving of ground-based data. It's not easy. The other side of this coin is that individuals use the telescope and then go home with the data. They feel that it's their data, whereas it belongs to everybody, because the taxpayers paid for it. These data contain information that hasn't yet been extracted, and their information content evolves with time; there is still information in all these individual pieces of data.

So our big activity lately has been advocacy. We call it consulting. But, in fact, we believe we have to put procedures in place that guarantee these data will be useful to future generations of scientists and ensure that taxpayers get their money's worth. The big project on the horizon in ground-based astronomy is the Gemini Project, a consortium of Canada, the United States, Chile, Argentina, Brazil, and the United Kingdom. We're in the process of trying to get the project's board of directors to treat archiving as a real priority and to spend five or ten percent of their second-generation instrumentation budget, which is a drop in the bucket, toward that aim. We are having a hard time getting it.

I'm a real believer in data archiving, and not because I work for an archive; I work for an archive because I believe in it. I was hired because I was the biggest user of the Canadian Astronomy Data Centre in the country (it's true), and I publish papers based on its data. There are a lot of ways in which the Internet has transformed astronomy: the literature, webpages as a form of communication, e-mail, and all sorts of other ways. But I'm setting those aside for the moment. Each is a subject in itself, particularly the question of the online literature.

The bandwidth issue in archiving is illustrated by the whole question of data flow. In 1982 people used photographic plates, and a very active astronomer could fill about one filing cabinet a year. The labels on the envelopes and the filing cabinets constituted the archive, and that worked without any problem. That situation didn't change much until digital detectors came online, which was a long time ago now. Things have changed a lot since 1982. In '82 we had very small detectors, and maybe in a year you'd get a gigabyte of data out of a detector. That data was further distributed over many observers, so there weren't any storage problems at the individual level. By 1992 we were getting a gigabyte per night from a single instrument, and a researcher could still send it home and process it there. But things are exploding right now, and with the next generation of instrumentation we will be getting a gigabyte of data in just a few minutes.
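The growth described here can be put into rough numbers. The Python sketch below converts the three quoted rates into annual volumes and into the average network speed needed to move a year's data. The number of observing nights per year, the length of a night, and the reading of "a few minutes" as five minutes are all assumptions made for illustration, so only the orders of magnitude mean anything.

```python
# Rough annual data volumes implied by the figures quoted in the talk.
# Every constant marked "assumption" is illustrative, not from the talk.
NIGHTS_PER_YEAR = 200    # assumption: nights an instrument observes per year
NIGHT_MINUTES = 10 * 60  # assumption: a 10-hour observing night
FEW_MINUTES = 5          # assumption: "a few minutes" read as 5 minutes
SECONDS_PER_YEAR = 365 * 24 * 3600

ANNUAL_GB = {
    "1982": 1.0,                                              # ~1 GB per detector per year
    "1992": 1.0 * NIGHTS_PER_YEAR,                            # ~1 GB per instrument per night
    "next": (NIGHT_MINUTES / FEW_MINUTES) * NIGHTS_PER_YEAR,  # ~1 GB every few minutes
}


def growth(later: str, earlier: str) -> float:
    """Factor by which the annual volume grew between two eras."""
    return ANNUAL_GB[later] / ANNUAL_GB[earlier]


def sustained_mbit_per_s(gb_per_year: float) -> float:
    """Average link speed needed to move a year's data within a year (no bursts)."""
    bits = gb_per_year * 1e9 * 8  # decimal gigabytes to bits
    return bits / SECONDS_PER_YEAR / 1e6


for era, gb in ANNUAL_GB.items():
    print(f"{era:>4}: {gb:>8,.0f} GB/year, ~{sustained_mbit_per_s(gb):.3g} Mbit/s sustained")
print(f"1982 -> 1992: x{growth('1992', '1982'):.0f}; 1992 -> next: x{growth('next', '1992'):.0f}")
```

Under these assumptions the next-generation rate comes to 120 GB per night and 24,000 GB per year. Changing `NIGHTS_PER_YEAR` scales the results proportionally, and changing `FEW_MINUTES` scales the next-generation figure inversely.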

The Megacam Project is a collaborative project for deep, wide-field imaging at the CFHT. We will process approximately five terabytes of data, so we will need a much faster Internet just to maintain the status quo. When this begins, we will have to carry tapes again, something I thought we had left in the past.

So the way data archiving will really transform astronomy is that astronomers will become consumers of data rather than producers. Right now, observational astronomers spend a lot of their time writing proposals for telescope time, going to make the observations, and then coming back to process the data. That will change gradually. Of course, traveling to the telescope will never go away entirely, and, in fact, it shouldn't.

It comes back to that issue of soul. Sitting in an office processing data, you can forget what you're looking at. But it really reinvigorates you to go to a mountaintop in the Atacama Desert in Chile, where the only light you can see is a single one on the Pan-American Highway, and the Milky Way is so bright that it casts a shadow on the ground. Then you remember. The real heart of the matter is this great mystery we are trying to unravel, and seeing it is truly invigorating. So we shouldn't just be information managers. We have a soul, and we have to maintain it.

However, data archiving will give astronomers unprecedented access to a really wide range of observations. It already has, but the contents of these data archives are only now reaching a critical mass, and the links between all these different archives are really magnificent. Ultimately, we want to be able to observe the whole sky virtually. But that doesn't mean the end of observing, because we will always develop better instrumentation, with higher resolution in both spatial and energy terms.

Data mining adds a top layer to this whole archival process, and it is really necessary. You can't manually search all these databases. Right now you type in one thing at a time and search one database at a time, and it's very inefficient. Astronomers are bad researchers: their most salient characteristic is that they are always busy, and they won't do things that take a long time. So this data mining layer helps us interface with databases that are far too big to deal with manually. We can construct a query that decides which databases and which literature sources should be accessed and what processing should be done. That can't be done by hand.
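The data mining layer described here is, at its core, a query planner sitting over several archives. The following Python sketch illustrates only that planning-and-merging step, under stated assumptions: each archive is modeled as a hypothetical in-memory object that advertises which wavebands it covers and exposes a search-by-target function. The archive names and holdings are made up, and the literature sources and processing steps mentioned in the talk are left out. One query is routed only to the archives whose coverage matches, run in parallel with a thread pool, and merged into a single list.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Record:
    archive: str  # which archive returned the record
    target: str   # object name that was searched for
    band: str     # waveband of the observation


@dataclass
class Archive:
    """Hypothetical stand-in for one searchable archive."""
    name: str
    bands: frozenset[str]                  # wavebands the archive covers
    search: Callable[[str], list[Record]]  # look up observations by target name


def plan(archives: list[Archive], wanted: set[str]) -> list[Archive]:
    """Choose only the archives whose coverage overlaps the requested bands."""
    return [a for a in archives if a.bands & wanted]


def federated_query(archives: list[Archive], target: str, wanted: set[str]) -> list[Record]:
    """Send one query to every relevant archive in parallel and merge the results."""
    selected = plan(archives, wanted)
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda a: a.search(target), selected))
    merged = [r for results in result_lists for r in results if r.band in wanted]
    return sorted(merged, key=lambda r: (r.band, r.archive))


def make_archive(name: str, holdings: dict[str, list[str]]) -> Archive:
    """Build an in-memory archive from {target: [band, ...]} (illustrative data)."""
    bands = frozenset(b for bs in holdings.values() for b in bs)

    def search(target: str) -> list[Record]:
        return [Record(name, target, b) for b in holdings.get(target, [])]

    return Archive(name, bands, search)


archives = [
    make_archive("xray-archive", {"M87": ["x-ray"]}),
    make_archive("optical-archive", {"M87": ["optical"], "M31": ["optical"]}),
    make_archive("radio-archive", {"M31": ["radio"]}),
]

for record in federated_query(archives, "M87", {"x-ray", "optical"}):
    print(record)
```

In this example the radio archive is never contacted, because it cannot contribute to an X-ray or optical request. In a real system the in-memory `search` functions would presumably be replaced by network calls to each archive's own interface.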

When I started thinking about this, I asked myself, "What is the next revolution?" But I don't think there is a next one, not for a little while anyway. It's really a process of evolution. There are great ideas in the astronomy data archiving business; for everything you can think of doing, you can find a seed of it somewhere. But we are really short of the manpower to write software that works well on the Internet. The challenge is to take all these seeds, put them together, and achieve complete integration, giving us transparent cross-archive access to data and to the data mining tools that let us make sense of the information being archived.