Associate Law Librarian, Columbia University Law Library
Project JANUS is a five-year prototype digital library that uses the power of a massively parallel supercomputer to give users access to texts, images, sound, and video from remote and local workstations through advanced, user-friendly search and retrieval software. Project JANUS began in 1990 in response to a request from the Columbia Board of Trustees that the Law Library evaluate alternative modes of library access that used new technologies in lieu of physically expanding library space. Research by Law Librarian James Hoover and Willem Scholten, then Director of Computer Systems and Research, led them to the idea of coupling massively parallel supercomputing, state-of-the-art imaging, WAIS (Wide Area Information Servers), and free-text searching to build a "virtual library": the library of the future.
In November 1992, a Connection Machine 2 (CM-2) supercomputer on loan from Thinking Machines Corporation was installed in the Columbia Law Library for Project JANUS, making Columbia Law Library the first library to install a supercomputer.
JANUS lets users search multi-gigabyte databases for words, phrases, or whole paragraphs. Its integration of new imaging technology offers a valuable tool for archival preservation, and its search engine gives users full access to the text contained in images. As the project develops, users will have access to tens of thousands of books, both archival and current copyrighted editions. JANUS also provides preservation of, and enhanced access to, significant archival collections such as the Perlin Papers (the Rosenberg/Sobell FBI Surveillance Archive) and the Nuremberg Trial Papers.
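The source does not describe how JANUS indexes its text. As an illustration only, a positional inverted index is one standard way to answer word and phrase queries: it records where each word occurs, so a phrase matches wherever its words appear at consecutive positions. The sketch below assumes simple lowercase tokenization and in-memory Python dictionaries; the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def build_positional_index(docs):
    """Map each term to {doc_id: [word positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of documents containing the phrase's terms consecutively."""
    terms = tokenize(phrase)
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents that contain every term can contain the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in candidates:
        # Try each occurrence of the first term as the start of the phrase.
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)
```

For example, over `{1: "The Rosenberg trial papers", 2: "Papers of the trial"}`, a search for "trial papers" matches only document 1, because document 2 contains both words but not side by side.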
Columbia has partnered with Future InfoSystems, Inc., a new research and development company started by Willem Scholten, the former Director of Computer Systems and Research for the Law School, to continue expanding and developing Project JANUS. In the future, the JANUS digital library will support thousands of concurrent users searching terabytes of data with both Boolean and natural-language queries, and will retrieve sound, images, and full-motion video.
Future InfoSystems, Inc. (FIS) grew out of a collaboration between Thinking Machines Corporation of Cambridge, MA, and Columbia Law School in New York City to develop a digital library using massively parallel supercomputer power. The digital library, called Project JANUS, incorporates image, sound, and full-text retrieval. A prototype of the system currently runs on a Thinking Machines CM-2 supercomputer installed in the Columbia Law Library.
FIS has partnered with Thinking Machines Corporation to develop next-generation text retrieval software that builds on years of Thinking Machines research into text retrieval on massively parallel processing (MPP) machines.
FIS is developing a scalable full-text retrieval system that runs on platforms such as Thinking Machines Corporation's Connection Machine massively parallel supercomputers and other MPP supercomputers, and that scales down to single-processor Unix SPARC-10 workstations. The FIS retrieval engine supports both Boolean and natural-language queries and offers a special feature called "best-chunk" return, which positions the document viewer at the section of the document that most closely matches the query. The engine also supports full relevance feedback.
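The source does not describe how FIS computed best-chunk return. A minimal sketch, assuming fixed-size overlapping word windows scored by a simple count of query-term occurrences (a hypothetical stand-in for the engine's real ranking), shows the basic idea: score every window and open the viewer at the start of the highest-scoring one.

```python
import re


def best_chunk(text, query, chunk_size=50, step=25):
    """Return (start_word_offset, score) of the window that best matches the query.

    Hypothetical scoring: each window's score is the number of its words that
    are query terms. Ties go to the earliest window.
    """
    words = re.findall(r"\w+", text.lower())
    terms = set(re.findall(r"\w+", query.lower()))
    best_start, best_score = 0, -1
    # The stop value guarantees the last window reaches the end of the document.
    for start in range(0, max(len(words) - chunk_size, 0) + step, step):
        window = words[start:start + chunk_size]
        score = sum(1 for w in window if w in terms)
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score
```

Overlapping the windows (here, by half their width) keeps a passage that straddles a window boundary from being split between two lower-scoring chunks. A real viewer would map the returned word offset to a page or screen position.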
A unique aspect of FIS's new retrieval engine is its use of imaging technology. Using optical character recognition (OCR), the server provides full-text searching of bit-mapped document images, a revolutionary means of storing and accessing large numbers of documents that exist only on paper.
FIS also offers a new retrieval client that provides communications interoperability through full Z39.50 1993 compliance. The client will also offer gateways to other services, full image manipulation, Boolean and natural-language querying, and relevance feedback on digital images of text.
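The source does not say which relevance-feedback method the FIS engine or client uses. One classic technique, named here as an assumed stand-in, is Rocchio's algorithm: the query vector is moved toward the documents a user marks relevant and away from those marked non-relevant, q' = αq + β·mean(relevant) − γ·mean(non-relevant). The sketch below represents queries and documents as term-weight dictionaries; the weights α = 1.0, β = 0.75, γ = 0.15 are common textbook defaults, not FIS parameters.

```python
from collections import Counter


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reformulated query (term -> weight) using Rocchio feedback.

    Assumed representation: the query and each document are dicts of term
    weights. Default weights are textbook values, not taken from FIS.
    """
    new_query = Counter()
    for term, w in query.items():
        new_query[term] += alpha * w
    # Add the centroid of relevant docs and subtract the centroid of non-relevant ones.
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, w in doc.items():
                new_query[term] += coef * w / len(docs)
    # Drop terms whose weight fell to zero or below, as is common practice.
    return {t: w for t, w in new_query.items() if w > 0}
```

Marking a page about espionage as relevant, for instance, adds "espionage" to a query that originally contained only "rosenberg", so the next search can find related pages that never mention the original term.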
Future InfoSystems, Inc. recognizes the importance of flexible communication across networks and is therefore working to broaden the options for cross-system communication and data sharing. Using the Z39.50 communications protocol ensures compatibility with existing information servers such as WAIS, as well as with future systems.
Another important aspect of the FIS system is its scalability: its ability to grow as databases increase in size. Running on Thinking Machines' CM-5 supercomputers gives the system virtually unlimited room for growth.
Pairing imaging technology with full-text searching lets a library preserve access to a document in its original format, with censor marks and all accompanying notations intact, while giving users a much more flexible means of access. For the Rosenberg/Sobell Trial Archives, which have no finding aid, full-text searching is an invaluable tool for researchers.
The Perlin Papers were given to the Columbia Law Library in 1990 by Marshall Perlin, Law '42, the lawyer for Robert and Michael Meeropol, the sons of Julius and Ethel Rosenberg. Perlin, who spent years obtaining the papers, gave them to Columbia to ensure their continued accessibility. The collection contains approximately 250,000 pages, many of them FBI surveillance records of the Rosenbergs and others under government investigation at the time. Many of the pages are sixth-generation photocopies.
The Perlin Papers are the second JANUS experimental imaging project, and the one with the most exciting results so far. The pages are first scanned and saved as digital images. Optical character recognition (OCR), a process that "recognizes" the text in an image, is then performed on the pages. The database is built from both the ASCII text file created during recognition and the high-quality page image. The advantage of the JANUS system for collections like the Perlin Papers is immediately apparent. Because the system displays high-quality images rather than the text alone, censor marks and marginal notes are preserved. In addition, once the Perlin Papers are fully available online, they will be accessible to far more people than when they existed only on paper.
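The scan, recognize, and index steps described above can be sketched as a small data structure that keeps each page's OCR text alongside a reference to its image. Searches run over the OCR text, but results point back to the stored image, which is why censor marks and marginal notes survive. This is a minimal sketch under stated assumptions: the class names, file paths, and all-terms (Boolean AND) matching are hypothetical, and the OCR step is assumed to have already produced the text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PageRecord:
    """One scanned page: the page image plus the ASCII text OCR produced from it."""
    image_path: str  # hypothetical location of the high-quality page image
    ocr_text: str    # recognized text (may contain OCR errors)


@dataclass
class ImageTextIndex:
    """Hypothetical index: search runs over OCR text, results point to page images."""
    pages: list = field(default_factory=list)
    postings: dict = field(default_factory=dict)  # term -> set of page numbers

    def add_page(self, image_path, ocr_text):
        """Store a page and index its OCR text; OCR is assumed to have run already."""
        page_no = len(self.pages)
        self.pages.append(PageRecord(image_path, ocr_text))
        for term in re.findall(r"\w+", ocr_text.lower()):
            self.postings.setdefault(term, set()).add(page_no)
        return page_no

    def search(self, query):
        """Return image paths of pages whose OCR text contains every query term."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # The user is shown the stored image, not the OCR text, so censor
        # marks and marginal notes remain visible.
        return [self.pages[n].image_path for n in sorted(matches)]
```

Keeping the image as the displayed record also means OCR errors affect only whether a page is found, not what the reader sees once it is.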
JANUS will first serve scholars on Columbia's campus. Later, when it is fully operational, it will be accessible from any remote computer through a WAIS server, and via the Internet it will serve users nationally and internationally. A high-bandwidth network channel, such as those proposed for the NREN (National Research and Education Network) and the NII (National Information Infrastructure), would allow a large number of users to browse and work in the Columbia Law Library from any connection. The JANUS project is working to establish relationships with publishers to allow use of copyrighted materials in electronic form, and plans to develop programs to track and verify electronic use of licensed materials. The Columbia Law Library contains the nation's third-largest collection of legal materials.
Willem Scholten
Director of Computer Systems and Research
Columbia University School of Law
435 West 116th Street
New York, NY 10027
Voice: 212-854-7938
Fax: 212-854-7946
Email: willem@lawmail.law.columbia.edu