James E. Gass, Lee E. Brotzman, Archibald Warnock, Debbie Kovalsky
Hughes STX; &
Frank J. Giovane
NASA/HQ
Introduction
Astronomical research is being transformed by improvements in wide-area networking and the availability of low-cost computing power. These developments have resulted in remote observing, distributed access to large quantities of scientific data, and the first steps toward the electronic submission of articles for publishing. The cornerstone of scientific research, the refereed literature, has not yet benefitted from these advances in information access. The result has been tremendous growth in what is being published, without improvements in researchers' ability to locate and retrieve articles of interest.
It is now technically feasible to place much of the astronomical literature and documentation on-line, providing researchers with direct access to this information. More importantly, with the addition of modern text searching methods, astronomers will have the ability to quickly find articles about a particular research topic and examine them as they wish.
Many technical details must be addressed before the journal publishers can move in this direction, and the impact on the scientific community, on the financial health of the journals, and on libraries must also be carefully considered. It is unlikely that publishers would be able to offer both paper and electronic versions of a journal for long.
In fact, there are two components to the problem of establishing on-line documents and literature: (1) the conversion of existing materials from printed pages to electronic files, and (2) the production of new literature in a form which can be placed on-line as published.
The STELAR Project
In March 1991, NASA and the American Astronomical Society (AAS) began hosting a series of workshops to explore the methods and potential impact of placing most of the astronomical documentation and literature on-line. These meetings identified a need for an experiment to study the technical and practical issues. In response, STELAR, the STudy of Electronic Literature for Astronomical Research, was launched. This project is a joint effort of AAS, ASP (Astronomical Society of the Pacific), NASA, publishers, editors, research libraries, and astronomers. Additional support is being provided by AIP, Library of Congress, NSF, and UNC Chapel Hill.
STELAR is a pilot project managed at NASA's Astrophysics Data Facility (ADF), located at the Goddard Space Flight Center in Greenbelt, MD. Its formal goal is to explore the use of electronic means for improving access to scientific literature, using astronomical publications to evaluate distribution, search, and retrieval techniques for full text and graphics display. The project is conducting a multi-phased study. The initial phases focus on the problem of converting existing literature for on-line access. STELAR will incorporate machine-readable abstracts provided by NASA's Scientific and Technical Information (STI) program and page images of several years' worth of the Astrophysical Journal (ApJ), Astrophysical Journal Supplement (ApJ Suppl), Astronomical Journal (AJ), and the Publications of the Astronomical Society of the Pacific (PASP). Recently, the publishers of Astronomy & Astrophysics (A&A) have also granted permission to use their journal for this study.
In the current phase of the study, a prototype system is under development to allow a limited number of test subjects to search these materials and view the articles of interest. The libraries at the Space Telescope Science Institute, NOAO/KPNO, NRAO/Charlottesville, and Goddard Space Flight Center will work with selected astronomers to evaluate the initial prototype expected to be available this fall.
Current Status
The STELAR prototype system uses a highly portable and fully open, multi-disciplinary document query and delivery system known as WAIS (Wide Area Information Server), which is the subject of a separate article in this newsletter.
STELAR currently provides access to machine-readable abstracts for eight leading academic journals of interest to the astronomical community (ApJ, ApJS, AJ, PASP, A&A, A&AS, Monthly Notices of the Royal Astronomical Society, or MNRAS, and Journal of Geophysical Research, or JGR). These abstracts have been supplied by NASA/STI from a database prepared for NASA's RECON system by an independent abstraction service. The RECON system database contains abstracts from as early as the mid-1960s. The ADF will update the set of available abstracts on a regular basis.
The completed prototype will link the abstracts to scanned bitmaps of the individual article pages. Access to the bitmaps will initially be limited to test groups at the libraries to respect the copyright concerns of the societies and the journal publishers. In addition to this controlled study, the ADF and STI are making the abstracts and several other text databases available to the astronomical community as part of NASA's commitment to its science community. These are described in the companion article. The STELAR project is seeking feedback from researchers on the usability and value of the system. This feedback will guide the refinement of successive prototypes.
STELAR agreements
At this time the STELAR project has received permission for:
Subscription to the society journals to keep the scanned bitmap images up to date (within 1 month of receipt from the publishers) for the duration of the STELAR experiment.
Access to scanned bitmap images and abstracts of the journals (ApJ, ApJ Suppl, ApJ Letters, and AJ from AAS; PASP from ASP; and A&A) from current issues through 1986 by members of the STELAR planning group, 4-6 selected astronomical libraries (including selected individuals as test subjects via the library access), and selected individuals with personal access. The proposed libraries and the selected individuals will be listed at the time of the presentations.
Distribution of bitmap images and abstracts to the above external sites. Thus this access would NOT limit the bitmap images to the NSSDC NDADS facilities.
In late 1993 the STELAR project expects to request from the participating societies, after presentations to their governing boards, permission for:
Access to scanned bitmap images and abstracts of the journals (ApJ, ApJ Suppl, ApJ Letters, and AJ from AAS and PASP from ASP) from 12-month-old issues through 1986 by members of the astronomical community, and current issues through 1986 by then-current subscribers to the particular journal. Access to the A&A journals is restricted to 1985-1990.
Distribution of bitmap images and abstracts to the above external sites and individuals. As above, the bitmap images would NOT be limited to the NSSDC NDADS facilities.
List of authorized subscribers to each journal.
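The proposed access rules above can be read as a small decision procedure. The following Python sketch is one possible reading, not a description of any STELAR software: the function and constant names are hypothetical, the 12-month embargo is counted in whole calendar months, and the treatment of A&A (available only for 1985-1990, regardless of subscription) is an assumption about how the restriction would apply.

```python
from datetime import date

# Assumed constants taken from the proposed rules; all names are hypothetical.
EARLIEST_YEAR = 1986          # society journals are covered back through 1986
EMBARGO_MONTHS = 12           # non-subscribers see issues at least 12 months old
AA_YEARS = range(1985, 1991)  # A&A coverage is restricted to 1985-1990


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (days are ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def may_view(journal: str, issue_date: date, today: date, is_subscriber: bool) -> bool:
    """Return True if a community member may view an issue under the proposal."""
    if journal == "A&A":
        # Assumption: the A&A window applies to everyone alike.
        return issue_date.year in AA_YEARS
    if issue_date.year < EARLIEST_YEAR or issue_date > today:
        return False
    if is_subscriber:
        # Current subscribers to the journal may view current issues.
        return True
    return months_between(issue_date, today) >= EMBARGO_MONTHS


today = date(1994, 1, 15)
recent_for_member = may_view("ApJ", date(1993, 6, 1), today, is_subscriber=False)
recent_for_subscriber = may_view("ApJ", date(1993, 6, 1), today, is_subscriber=True)
```

Here a June 1993 issue is seven months old in January 1994, so it would be visible to a subscriber but not yet to a general member of the community. The list of authorized subscribers requested from each journal is what would supply the `is_subscriber` flag.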
In 1994 the STELAR project expects to make presentations to the participating societies' governing boards as to the status and possible evolution of the pilot project. It is expected at this time that a request will be made to continue the experimental access for another year.
Future Plans
Subject to the approval of the copyright holders of the various journals, the STELAR Project plans to gradually make the scanned bitmaps of the article pages available to the astronomical community. Additional enhancements being investigated include indexing of the full text of the articles (when machine-readable versions of the published articles are available), making articles available in a mark-up language (TeX, SGML) or device-independent form, and the addition of errata and other forward references to the basic STELAR article structure.
For additional information about the STELAR Project, please contact the authors at:
stelar-info@hypatia.gsfc.nasa.gov.
Summary of overheads used in the STELAR presentation
OVERHEAD #1
- Formally, the goal of STELAR is to explore the use of electronic means for improving access to scientific literature, using astronomical publications to evaluate distribution, search, and retrieval techniques for full text and graphics display.
OVERHEAD #2
The Players include:
National Aeronautics and Space Administration
American Astronomical Society
Astronomical Society of the Pacific
Astronomical Libraries (Goddard, NRAO, STScI, NOAO)
Research astronomers
Library of Congress
University of Chicago Press
Astronomy & Astrophysics
Royal Astronomical Society
National Science Foundation
University of North Carolina/Chapel Hill
OVERHEAD #3
STELAR Overview
The first word of STELAR is study.
STELAR works with Astronomers, Professional societies, Librarians, Engineers, and Computer Programmers.
STELAR is a dynamic project.
STELAR is not only studying the effects of the system, but also the design issues.
What political issues have to be resolved?
STELAR is a first step toward electronically published journals, i.e., the distribution of data, textual, and graphical information over electronic networks.
The STELAR project is a laboratory bench.
OVERHEAD #4
Some of the important questions STELAR seeks to answer are listed below.
- With the ability to make an infinite number of original copies from just one electronic copy, how are journal subscriptions to be handled and the journals to be funded?
- How can the health of the journals be ensured through any transition?
- How are "published" papers to be protected from "changes," whether malicious or author's corrections?
- How will libraries "archive" the journals?
- What will the time and title coverage be?
- Will non-profit and for-profit publishers both use the same system?
- Who will create the electronic form?
- Who will maintain the "master" archive?
- Will the future form be required to be the same as the past form?
- What access methods are required?
- What are the scientists' requirements in using an electronic publication?
- What are the publishers' requirements?
- How are copyrights applied?
- What about the scientific community that will not have electronic access?
- What is the format for the electronic form?
- How are graphics plates included?
- How is the new form evaluated?
- How to capture author input?
- How to capture the graphics?
- Relation between journal and data archive?
- When does the journal stop and the data archive begin?
- How to make links between journal pages or pictures and the underlying data?
- Mandate or just encourage the archival of underlying data?
- How to stay current?
The STELAR Project is attempting to make it possible to answer as many of these questions as it can, along with others not yet anticipated. Most of the technical problems are solved or are solvable. The difficult problems, without any clear answers, are those that deal with the "political" issues (funding, subscriptions, copyright, etc.).
OVERHEAD #5
The Pilot Project
Objectives:
- Study the feasibility of routine electronic distribution of major astronomical journals.
- Gain practical experience in electronic journal distribution.
- Provide a broad base of researchers access to a test journal set so that their reactions can be used to guide future efforts.
OVERHEAD #6
Project Description:
A two-year evaluation period is planned starting in late 1992.
Two to five years of back issues will be used, initially as full-page images and ASCII abstracts (for "indexed" text searches).
We will evaluate "OCR" possibilities.
Access will be via the Astrophysics Data System (same as the archived data).
OVERHEAD #7
STELAR Architecture
- Currently based on WAIS (Wide Area Information Server) Technology.
- Full-text indexing of over 60,000 abstracts from NASA RECON system for eight journals (publicly available).
- Scanned bitmaps of ApJ, ApJ Suppl, AJ, and PASP from 1986-1990.
- Text index of abstracts is stored on a (Unix-based) SGI 4D workstation.
- Bitmaps are stored on NDADS (VMS-based) optical jukebox.
- STELAR has a single point-of-entry through the NDADS VAX cluster, which in turn runs a forwarding server to relay queries to servers on the other host machines and passes replies back.
- Article = collection of associated data objects: text abstract, scanned pages, markup text (SGML, TeX, DVI), references, errata.
- Current investigations include indexing/retrieval engines, bitmap viewers, user interfaces.
- The data volume for scanned pages is too large for deliverable media, so a centralized archive is required.
- Electronic networks are widely available for the transfer of data.
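Two of the points above, the article as a collection of associated data objects and the single point of entry that relays queries to other hosts, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the class names, field names, back-end names ("abstracts", "bitmaps"), and file names are hypothetical, the back-ends are plain in-process functions rather than networked servers, and nothing here reflects the actual STELAR or WAIS code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Article:
    """An article as a bundle of associated data objects (hypothetical layout)."""
    article_id: str                                        # hypothetical identifier
    abstract: str                                          # ASCII abstract text
    page_images: List[str] = field(default_factory=list)   # scanned page bitmaps
    markup: Dict[str, str] = field(default_factory=dict)   # e.g. keyed by "TeX", "SGML"
    references: List[str] = field(default_factory=list)
    errata: List[str] = field(default_factory=list)


class ForwardingServer:
    """Single point of entry that relays each request to a named back-end."""

    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[str], List[str]]] = {}

    def register(self, name: str, handler: Callable[[str], List[str]]) -> None:
        self._backends[name] = handler

    def forward(self, name: str, request: str) -> List[str]:
        if name not in self._backends:
            raise KeyError(f"no back-end registered for {name!r}")
        # Relay the request and pass the reply back unchanged.
        return self._backends[name](request)


# Made-up sample data standing in for the abstract index and the bitmap store.
ARTICLES = {
    "A1": Article("A1", "Optical variability of a bright quasar",
                  ["A1-p001.tif", "A1-p002.tif"]),
    "A2": Article("A2", "Photometry of a globular cluster", ["A2-p001.tif"]),
}


def search_abstracts(query: str) -> List[str]:
    """Stand-in for the text index: ids whose abstract contains every query word."""
    words = query.lower().split()
    return sorted(a.article_id for a in ARTICLES.values()
                  if all(w in a.abstract.lower() for w in words))


def fetch_page_images(article_id: str) -> List[str]:
    """Stand-in for the bitmap store: page image names for one article."""
    article = ARTICLES.get(article_id)
    return list(article.page_images) if article else []


gateway = ForwardingServer()
gateway.register("abstracts", search_abstracts)
gateway.register("bitmaps", fetch_page_images)
```

A client would first call `gateway.forward("abstracts", "quasar")` to find matching articles and then `gateway.forward("bitmaps", "A1")` to retrieve the page images, without needing to know which host holds which kind of data.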
OVERHEAD #8
At this time the STELAR project is requesting from the participating societies permission for:
Access to scanned bitmap images and abstracts of the society journals (ApJ, ApJ Suppl, ApJ Letters, and AJ from AAS and PASP from ASP) from current issues through 1986 by members of the STELAR planning group. This access would be limited to the NSSDC NDADS facilities.
Subscription to the society journals to keep the scanned bitmap images up to date (within 1 month of receipt from the publishers) during the duration of the STELAR experiment.
OVERHEAD #9
In June 1992 the STELAR project expects to request from the participating societies, after presentations to their governing boards, permission for:
Access to scanned bitmap images and abstracts of the society journals (ApJ, ApJ Suppl, ApJ Letters, and AJ from AAS and PASP from ASP) from current issues through 1986 by members of the STELAR planning group, 4-6 selected astronomical libraries (including selected individuals as test subjects via the library access), and selected individuals with personal access. The proposed libraries and the selected individuals will be listed at the time of the presentations.
Distribution of bitmap images and abstracts to the above external sites. Thus this access would NOT limit the bitmap images to the NSSDC NDADS facilities.
NOTE: This access and distribution does not place any restrictions as to which of the included journals would be available to the limited test group.
OVERHEAD #10
In June 1993 the STELAR project expects to request from the participating societies, again after presentations to their governing boards, permission for:
Access to scanned bitmap images and abstracts of the society journals (ApJ, ApJ Suppl, ApJ Letters, and AJ from AAS and PASP from ASP) from 12-month-old issues through 1986 by members of the astronomical community, and current issues through 1986 by then-current subscribers to the particular journal.
Distribution of bitmap images and abstracts to the above external sites and individuals. As above, the bitmap images would NOT be limited to the NSSDC NDADS facilities.
List of authorized subscribers to each journal.
NOTE: The STELAR planning group and the above selected astronomical libraries will continue to need unlimited access to the database for testing purposes.
In June 1994 the STELAR project expects to make presentations to the participating societies' governing boards as to the status and possible evolution of the pilot project. It is expected at this time that a request will be made to continue the experimental access for another year.
OVERHEAD #11
WAIS -- Wide Area Information Servers
What is WAIS? WAIS is a client/server search and retrieval system based on Z39.50-1988. It was developed by Thinking Machines, Inc. and released for public use in 1991. There are over 200 WAIS servers on the network, with thousands of client sites active. Clients and servers are available for many platforms, and WAIS can be layered on top of current access mechanisms.
OVERHEAD #12
WAIS System Highlights
- Client/Server system based on NISO Z39.50-1988. Evolving towards Z39.50-1992.
- The client requests information from the publisher.
- The server provides the most appropriate storage and search mechanisms for the data and provides the data in as many forms as appropriate, signaling this information to the client.
- The client can ignore it if so desired.
- The client provides the appropriate visual interface.
- The protocol (WAIS) is the glue that binds these "ideals" together.
- WAIS is an "Open Protocol."
- There are no copyright restrictions.
- Non-proprietary, which means it can be supported by any vendor.
- It can also be ignored by any (and all) vendors.
- WAIS has been ported to many platforms.
- Data organization is independent of the protocol.
- The client makes no assumptions about the organization of the data managed by the server.
- The client requires no knowledge of the naming or storage practices of the data publishers.
- Data format and presentation are independent of the protocol.
- Format information is passed as plain text strings.
- A single data object can exist in more than one format. The client uses the format information provided by the server to determine the best presentation mechanism for the data.
- The server neither knows nor cares how the client deals with the data. Search techniques are independent of the data format. It is not necessary to search the actual data that is returned.
- As an example, textual and relational databases can be searched using very different techniques. The data presented as a result of this search can be graphical.
- WAIS searches can be based on the notion of "relevance". "I know it when I see it!"
- Queries can be refined by marking previously returned data as "relevant". The search engine at the server end uses this information as appropriate to improve the quality of the search.
- "Post-Boolean searching" - Conceptually simpler to most users than refining Boolean queries and defining result-sets.
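The idea of refining a query by marking returned documents as "relevant" can be sketched in a few lines of Python. This is a deliberately simple term-weighting scheme chosen for illustration; it is not the algorithm used by the WAIS search engine, and the function names, the feedback weight of 0.5, and the sample abstracts are all assumptions.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(weights: Dict[str, float], document: str) -> float:
    """Sum of term weight times term frequency over the weighted terms."""
    counts = Counter(tokenize(document))
    return sum(weight * counts[term] for term, weight in weights.items())


def rank(weights: Dict[str, float], documents: Dict[str, str]) -> List[str]:
    """Ids of documents with a positive score, best first (ties broken by id)."""
    scored = [(score(weights, text), doc_id) for doc_id, text in documents.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for value, doc_id in ordered if value > 0]


def expand(query: str, relevant_texts: Iterable[str],
           feedback_weight: float = 0.5) -> Dict[str, float]:
    """Query terms get weight 1.0; each term of a relevant document adds a bonus."""
    weights = {term: 1.0 for term in tokenize(query)}
    for text in relevant_texts:
        for term in set(tokenize(text)):
            weights[term] = weights.get(term, 0.0) + feedback_weight
    return weights


# Made-up abstracts used only to exercise the sketch.
DOCS = {
    "d1": "Spectra of distant quasars",
    "d2": "Quasar absorption lines in the Lyman alpha forest",
    "d3": "Lyman alpha forest statistics at high redshift",
    "d4": "Globular cluster photometry",
}

first_pass = rank(expand("quasar", []), DOCS)
second_pass = rank(expand("quasar", [DOCS["d2"]]), DOCS)
```

The first pass finds only `d2`; after `d2` is marked relevant, its vocabulary pulls in `d3`, which shares "Lyman alpha forest" but never mentions the original query word. The user never has to write a Boolean expression, which is the point of the "post-Boolean" description above.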
OVERHEAD #13
WAIS Summary
Non-proprietary.
Not discipline specific.
Client/server based. Clients exist for UNIX, VMS, MS-DOS, and Macintosh platforms. Servers exist for VMS, MS-DOS, and UNIX.
Architecture provides gateways to many other information systems.
Includes server, clients, full-text indexing and retrieval.
NISO Z39.50-1988 standard communication protocol allows hardware independence between server and clients.
Modular code allows easy replacement of individual components (e.g., word stems, synonyms, boolean searches, factor spaces, SQL, and spatial indexes).
Originally written by Thinking Machines, Inc. and distributed free of charge. Full source code (in C) is available via anonymous FTP from think.com.
Capable of handling a variety of data types as documents (ASCII text, formatted text, graphics, images, and other scientific data).
Communication protocol and WAIS implementation allow for fee structures and validating users.
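Because a single data object can exist in more than one format, and format information is passed as plain text strings, a client has to pick the form it can best present. The Python sketch below shows one way a client might do that. The function name and the format labels ("TEXT", "TIFF", "GIF", "DVI") are hypothetical and are not taken from the WAIS protocol.

```python
from typing import Optional, Sequence


def choose_format(offered: Sequence[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the first client-preferred format the server offers, or None.

    Names are compared case-insensitively; the server's spelling is returned.
    """
    by_upper = {name.upper(): name for name in offered}
    for name in preferred:
        match = by_upper.get(name.upper())
        if match is not None:
            return match
    return None


# A client that can show GIF or TIFF images, and falls back to plain text.
picked = choose_format(offered=["TEXT", "TIFF"], preferred=["GIF", "tiff", "TEXT"])
```

The server only advertises what it has; the decision about presentation stays entirely with the client, which matches the separation of roles described in the highlights.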
OVERHEAD #14
Contacting STELAR
For information, send electronic mail to: stelar-info@hypatia.gsfc.nasa.gov
WAIS sources for STELAR products are available by anonymous FTP from:
hypatia.gsfc.nasa.gov, in the directory wais-sources.
WAIS clients for various systems are available by anonymous FTP.
Basic distribution system (includes server, indexer, X client, and character-mode client):
wais-8-b5.tar.Z from think.com
MOTIF client:
mxqwais.tar from ftp.eos.ncsu.edu
MS-DOS client:
pcwais.zip from ftp.oit.unc.edu
Macintosh client:
WAISstation from hypatia.gsfc.nasa.gov
OpenLook client:
openlook.tar from sunsite.oit.unc.edu
Microsoft Windows client:
winwais.zip from ftp.oit.unc.edu