Networked Information Resource Discovery and Retrieval

The Coalition for Networked Information continues to have a strong interest in accelerating the development of sophisticated navigational tools, an area now commonly referred to as networked information resource discovery and retrieval (NIDR). A panel at the Spring Task Force Meeting in Washington, DC, highlighted several exemplary initiatives: a CNI NIDR white paper, Harvest, and Portfolio.

Avra Michelson, Digital Libraries Department, MITRE Corporation, introduced a plenary panel on advances in NIDR. She is part of a team, along with Clifford Lynch, Director, Library Automation, University of California, Office of the President; Craig Summerhill, Systems Coordinator and Program Officer, Coalition for Networked Information; and Cecelia Preston, that is developing a Coalition white paper on NIDR.

CNI NIDR White Paper

Clifford Lynch set the context for the panel by describing the Coalition's white paper initiative, which began in the fall of 1994 with the objectives of framing the major research problems in the NIDR area and suggesting where standards work might be fruitful. The paper's four chapters will cover introductory material, architectural issues, content issues (metadata), and the extensions beyond the current framework that will be needed as software becomes more autonomous.

Lynch stated that the NIDR "problem" has two components. The first is discovery, which covers a large spectrum of activities, e.g., searching, organizing, browsing, selecting among items, and ranking items. The second component, retrieval, is sometimes narrowly viewed as the act of downloading information to a workstation, but it should have the broader meaning of making use of information resources. At present, Lynch stated, NIDR is treated as a graft-on to the existing uncontrolled, independent world of Internet resources.
He asked, "When will we see information spaces develop that integrate NIDR as part of their basic architectural design?" The CNI paper will examine the idea of tools defining information spaces as, for example, Gopher defines Gopherspace.

Lynch identified several other issues that will be addressed in the CNI paper. First, the networked environment needs an increased emphasis on selection and ranking of information resources. Discovery is not simply a process of inundating the user with candidate resources. Second, the developing mix of free and for-fee information resources on the network has implications for the existing and future framework of NIDR tools. Information retrieval protocols will have to become substantially richer to accommodate the pricing of objects. He stated that simple FTP models will become an increasing liability for the next generation of NIDR. A third basic issue to be addressed in the white paper is the current conception that humans are directly in command of the process, e.g., typing in search commands. At the same time, we all have visions of worlds that go well beyond this: worlds in which searching is facilitated by various types of software agents and in which disparate information resources can be linked together. It may be that, beyond retrieval, the next goal of NIDR is interoperability: linking a remote collection of information organizationally with a local resource. The CNI NIDR team has been struck by the difference between the immediate goals of many tools and the future world, which is much more mediated by software.

A draft of the first chapter of the NIDR white paper is available on the CNI server, and the team hopes to produce a full draft by fall. The paper will be discussed with various communities and by attendees at the Fall Task Force Meeting.

Harvest

Michael Schwartz, Associate Professor, Department of Computer Science, University of Colorado, spoke about Harvest, an efficient, community-tailored resource discovery tool. He began his presentation with a critique of current navigational tools, e.g., Archie, Veronica, Web robots, and WAIS. He noted that none of those tools has a community or topical focus; they all have poor scaling characteristics; they use unstructured, low-quality data; and they have "hard-wired" search algorithms.

The tool that Schwartz has developed, Harvest, couples an efficient, distributed gathering architecture with topical and/or community-focused "Brokers" (an index/search interface that accommodates many engines). Harvest addresses each of the problems inherent in other resource discovery tools. Its efficient gatherer can run at a number of sites, and an administrator can configure the data that will be collected. A sub-program can do selective text extraction, e.g., extracting only titles and abstracts, so that the resulting index uses much less space than a tool like WAIS but still delivers high precision and recall. Each Broker includes a plug-and-play index/search engine, and the architecture is not limited to text. Sample Brokers have been built for computer science technical reports, the SEC EDGAR files, and Web home pages. Harvest uses network-aware caching and replication for scalable access.

A key feature of Harvest is its network efficiency. It has the potential to greatly alleviate the network bottlenecks that develop when particular objects or particular servers become very popular with network users. Schwartz is now beginning to work on supporting more powerful environments than the unstructured, anarchic content of much of the current Internet. He is interested in integrating commercial search and retrieval engines, billing and encryption systems, content markup tools, and Z39.50 and other query interfaces into Harvest. More information is available at: http://harvest.cs.colorado.edu/.
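
The division of labor between gatherers and Brokers described above can be illustrated with a minimal Python sketch. It is not Harvest's code or data format: the `gather` function, the `Broker` class, the field names, and the example URLs are all hypothetical, and the simple inverted index merely stands in for whatever engine a real Broker would plug in. The sketch assumes each document is a dictionary with a `url` plus named text fields.

```python
import re
from collections import defaultdict


def gather(documents, fields=("title", "abstract")):
    """Hypothetical gatherer: keep only the configured fields of each
    document, so the summary is much smaller than the full text."""
    return [
        {"url": doc["url"], **{f: doc.get(f, "") for f in fields}}
        for doc in documents
    ]


class Broker:
    """Hypothetical Broker: indexes summaries from any number of gatherers."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> set of URLs

    def load(self, summaries):
        """Add every term of every gathered field to the inverted index."""
        for summary in summaries:
            for field, text in summary.items():
                if field == "url":
                    continue
                for term in re.findall(r"[a-z0-9]+", text.lower()):
                    self.index[term].add(summary["url"])

    def search(self, query):
        """Return the sorted URLs whose summaries contain all query terms."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)


# Two sites, each running its own gatherer (made-up sample data).
site_a = [{
    "url": "http://a.example/tr1",
    "title": "Scalable Resource Discovery",
    "abstract": "Distributed indexing of technical reports.",
    "body": "Long full text that mentions caching only in passing.",
}]
site_b = [{
    "url": "http://b.example/tr2",
    "title": "Caching on the Internet",
    "abstract": "Replication and caching for popular objects.",
    "body": "Long full text.",
}]

broker = Broker()
broker.load(gather(site_a))
broker.load(gather(site_b))
print(broker.search("caching"))
```

Because the gatherers pass along only titles and abstracts, the query for "caching" matches the second report alone, even though the full text of the first also contains the word. That is the space-for-coverage trade that selective extraction makes in this sketch.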

Portfolio

Ann Mueller, Technical Manager, Stanford University, described Portfolio, an enterprise-wide information management system prototyped at Stanford in 1994 and developed jointly by librarians and information technologists. The project provides an infrastructure for the institution's distributed computing architecture. It is an example of a multi-faceted information system, including information on the institution's faculty, computing resources, and library (including links to UC's MELVYL catalog); information on the local community; and links to Internet resources throughout the world. The developers seeded the collection with 400 resources and now have 3,000 internal and external resources.

Decisions on what will be included in Portfolio are made by information providers and subject specialists, who provide initial information about objects that is then augmented by library catalogers. Mueller noted that while the full potential for the use of metadata in this framework has not yet been realized, each item does have a metadata profile, and the system uses WAIS for indexing. A key attribute of this initiative is that it takes disparate resources and services and treats them as a single entity, presenting them in a consistent and flexible manner. The Portfolio developers are confident that they can adapt this system to the next generation of information clients and to new information and delivery paradigms.

The CNI NIDR white paper and other documents from the Spring 1995 Task Force Meeting are available on the Coalition's Internet server. The Coalition's homepage is at http://www.cni.org/CNI.homepage.html. To connect via Gopher, point your Gopher client to gopher.cni.org 70.

- Joan Lippincott, Assistant Executive Director

-------
ARL 181
A Bimonthly Newsletter of Research Library Issues and Actions
Association of Research Libraries
August 1995