Association of Research Libraries (ARL®)

http://www.arl.org/resources/pubs/scat/3.shtml


Scholarly Communication and Technology

Online Books at Columbia

3. THE ONLINE BOOKS COLLECTION

The Online Books Evaluation Project began formal activity in January 1995. However, discussions with publishers about cooperating in such an effort by providing books and collaborating in research began in 1993, if not earlier. As noted in the Project's Analytical Principles and Design document, "The Online Books Evaluation Project is a component of the developing digital library at Columbia University. As part of its digital library effort, the Columbia University Libraries is acquiring a variety of reference and monographic books in electronic format to be included on the campus network; in most cases, those books will be available only to members of the Columbia community. Some of the books are being purchased; others are being provided on a pilot project basis by publishers who are seeking to understand how the academic community will use online books if they become more widely available in the future."

Columbia University Libraries provides the Columbia community with access to a substantial and growing set of full-text (journals and reference materials), image, data, and bibliographic online resources in addition to those that we are studying in the Online Books Evaluation Project. Some have been acquired or developed at Columbia and are maintained on servers here, e.g., art images and working papers. Others are maintained by publishers with access licensed to Columbia, e.g., Encyclopedia Britannica and Gale's Contemporary Authors and Encyclopedia of Associations. Still others are maintained elsewhere and are freely accessible to all, with Columbia subject specialists providing links to them on their subject home pages.


3.1 Design Of the Online Books Collection

When this Project was proposed, the World Wide Web was an emerging technology, and we still expected to develop specialized browsers for using the books in SGML format, just as other online projects were doing at the time. However, by the time the Project was funded and ready to mount books online, it was clear that the Web would soon be the best delivery system for maximizing availability of the books to the scholarly community. Web browsers had, and still have, annoying limitations, but we felt that they would become better over time and provide optimum flexibility to users.

Many other online projects are providing users with materials in PDF, scanned, or bitmapped format. These are effective formats for journal articles, which are finely indexed through existing sources and are short and easily printed. However, the greatest potential added value of online books, compared with their print counterparts, comes from truly digital books, i.e., books encoded as searchable text rather than as page images. Only in this type of format, for example, can users search the full text for terms or cut and paste parts of the book into another document. In addition, only this online format allows the development of truly interactive books that take advantage of the current and anticipated capabilities of Web technology. Examples include sound and video, data files and software for manipulating data, and links to other online resources. Perhaps only such enhanced online books will offer enough advantages over the traditional print format that scholars will be willing to substitute them for print in any or all of their modes of use and for any or all classes of books.

We have devoted considerable time and effort over the past two years to dealing with technical and design issues for the books. The design has evolved over this period as Web technology has advanced and as the Project team and users have reacted to early decisions. We will continue to work with users over the months ahead in order to provide basic design features that they endorse. We hope to begin to introduce more interactive features as appropriate to various books and to measure user response to them.

We look forward to comparing the results of our evaluations with those of online projects using other formats to explore whether format does make a significant difference in user attitudes and behavior.


3.2 Development of the Online Books Collection

3.2.1 Purchased Texts

Purchased texts included in the Online Books collection are The Oxford English Dictionary and classical texts in social thought from InteLex's Past Masters CD-ROM. Columbia converted the Past Masters texts from SGML to HTML for Web access. Ten Past Masters texts were made available to the Columbia community online in mid-1995, although with little publicity. Another 44 went online in July 1996, and publicity for the collection began early in the Fall. We intend to convert several other purchased CD-ROM products, largely literary texts, and include them in the collection in the near future. The Columbia digital library provides the scholarly community with access to many other full-text works. However, the ones described here have been the focus of our analysis, largely because they are mounted on local servers from which detailed usage information can be gathered.


3.2.2 Collaborating Publishers And Their Books

Publishers participating in the Project by providing electronic files for their books and collaborating in the research effort are Columbia University Press, Garland Publishing, Oxford University Press, and Simon and Schuster Higher Education. All but Garland have been involved in this Project since its inception; Garland joined the effort in 1996. The books provided by each publisher, and when each was introduced to the online collection, are as follows:

Columbia University Press: Two reference works, The Columbia Granger's Index to Poetry and The Concise Columbia Electronic Encyclopedia, have been available since the outset. The Press will provide three more reference books in 1997: The Columbia Electronic Encyclopedia, The Columbia Guide to Standard American Usage, and The Columbia World of Quotations. Monographs, anthologies, and textbooks are being provided in the fields of social work, literary criticism, political science, and earth and environmental science. The Project includes only books for which the Press can obtain both electronic files and author permissions. Sixteen such books are now in the collection, seven of them in the field of social work. The first of these books were made available online in September 1996. At this point, it appears that 27 more Press books published in these fields in the past three years will be available for our collection; they will be added in the next few months.

Garland Publishing: Three Garland reference works, The Chaucer Name Dictionary, Native American Women: A Biographical Dictionary, and African American Women: A Biographical Dictionary, were added to the collection from December 1996 through February 1997. We selected these books because Columbia has sizable user groups in Medieval and Women's Studies and because they were available in electronic format and amenable to conversion to HTML. Garland is reviewing its collection and its resource availability to determine whether it can provide any other books to the Project.

Oxford University Press: In 1995, Oxford agreed to provide its monographs in the fields of literary criticism, neuroscience, and philosophy from its 1995 through 1997 publication lists. Oxford reports that a substantial share of titles in these fields have low sales and hence represent the endangered scholarly monograph. As of early 1996, Oxford had provided electronic files for 19 monographs in literary criticism and philosophy. Oxford required the Project to provide an online ordering mechanism concurrent with the availability of its books; that ordering system was ready for use in October. Sixteen Oxford books were online by the end of 1996; 17 are now online. In June 1997, Oxford provided nine more books in literary criticism and philosophy. These should be online by fall 1997.

Simon and Schuster Higher Education: By late 1994, Simon and Schuster had agreed to contribute high-use titles, defined as books on reserve for Columbia courses that had relatively heavy circulation. Simon and Schuster provided electronic files for nine such books, most of them in business-related subjects, in Fall 1995. As of June 1997, two of the books were online and the others were expected to be ready before the new academic year.


3.2.3 The Challenge of Obtaining Electronic Files for Books from Publishers

The Project's 1997 Annual Report discusses publishers' difficulty in providing electronic files for books that are amenable to conversion to the HTML format being used in the Project. Those problems include:


3.3 User Access to the Collection

3.3.1 Formats and Functionalities Over Time

As of June 1997, the Columbia community had access to a total of 96 online texts that are part of the Online Books Project. The Libraries have each book in print form, circulating from the regular collection or Reserves, or non-circulating in Reference, as well as in one or more online formats. Appendix 1 summarizes the print access modes for all the modern books in the collection. The various online modes have differing functionalities beyond browsing or reading on screen. Appendix 2 summarizes the schedule of mounting for the online books and their functionalities.


3.3.2 Who Can Use The Collection

By agreement with the publishers, we restrict access to the Project's online books to members of the Columbia University community, i.e., faculty, staff and students of Columbia and affiliated institutions who use the books in the Libraries and from anywhere via network access. Until March 1997, books were also available, only on Libraries terminals, to alumni and others with reading privileges. This policy both protects the publishers' intellectual property and provides the Project with the ability to gather richer data on usage.

Through Winter 1997: We employed two methods during this period to control access.

In both cases, the server logs did not include information on the user. We initially planned to develop a directory of Columbia IP addresses by location and to link it to the server data. This would have let us distinguish broadly between dormitories and various other campus buildings. However, we decided that developing and maintaining this database would be too costly, given our near-term plans for individual user authentication. Instead, our analyses for the period before mid-March 1997 rely on inferences drawn from the host name of the user's computer.
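A minimal sketch of this kind of host-name deduction follows. It assumes the server log records a resolved host name for each request. The naming rules (host labels beginning with `dorm`, `lib`, or `dialup`) and the category names are hypothetical illustrations; the report does not describe Columbia's actual host-naming conventions.

```python
from collections import Counter

# Hypothetical rules: a host-name label prefix and the coarse location it
# suggests. These are illustrative assumptions, not Columbia's real naming.
RULES = [
    ("dorm", "dormitory"),
    ("lib", "library terminal"),
    ("dialup", "off-campus dial-in"),
]

def classify_host(host: str) -> str:
    """Infer a coarse location category from a logged host name."""
    host = host.lower().rstrip(".")
    # Only hosts inside the columbia.edu domain can be placed on campus.
    if host != "columbia.edu" and not host.endswith(".columbia.edu"):
        return "non-Columbia"
    labels = host.split(".")
    for prefix, category in RULES:
        if any(label.startswith(prefix) for label in labels):
            return category
    return "other campus"

def tally_locations(hosts):
    """Count logged requests per inferred location category."""
    return Counter(classify_host(h) for h in hosts)

if __name__ == "__main__":
    log_hosts = [
        "dorm12.cc.columbia.edu",
        "lib-term3.columbia.edu",
        "chem.columbia.edu",
        "www.aol.com",
    ]
    print(tally_locations(log_hosts))
```

Rules of this kind only approximate where a request came from; a host name says nothing about who the user is.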

As of March 15, 1997: For books in this Project and other materials with user restrictions, Columbia has developed and deployed a more robust system for Web authentication and access. This system permits a member of the Columbia community to use materials even if she connects through an Internet service provider such as AOL. It requires each user to sign in when he wants to use one or more items in the collection; during a session, he needs to sign in only once. Ultimately, data records will be session based. Each record will link all of a user's activities within a single session and will provide information on the identity of that user.
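Because each user signs in once per session, a session-based record can be built by grouping the individual page requests that share the same sign-in session. The sketch below assumes a hypothetical log format in which every request carries a `session_id` issued at sign-in, along with the user's ID. The report does not specify the actual record layout.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Hit:
    """One logged page request (hypothetical log layout)."""
    session_id: str   # assumed token issued when the user signs in
    user: str         # user ID captured at sign-in
    time: datetime
    url: str

@dataclass
class SessionRecord:
    """All of one user's activity within a single sign-in session."""
    session_id: str
    user: str
    start: datetime
    end: datetime
    urls: list = field(default_factory=list)

def build_sessions(hits):
    """Collapse individual requests into one record per session, in time order."""
    sessions = {}
    for hit in sorted(hits, key=lambda h: h.time):
        rec = sessions.get(hit.session_id)
        if rec is None:
            rec = SessionRecord(hit.session_id, hit.user, hit.time, hit.time)
            sessions[hit.session_id] = rec
        rec.end = hit.time          # latest request seen so far
        rec.urls.append(hit.url)    # pages viewed, in order
    return list(sessions.values())

if __name__ == "__main__":
    hits = [
        Hit("s1", "ab123", datetime(1997, 4, 1, 10, 0), "/dlc/olb/"),
        Hit("s2", "cd456", datetime(1997, 4, 1, 10, 5), "/cu/libraries/digital/"),
        Hit("s1", "ab123", datetime(1997, 4, 1, 10, 7),
            "/cu/libraries/digital/texts/social_sciences.html"),
    ]
    for rec in build_sessions(hits):
        print(rec.user, rec.start, rec.end, rec.urls)
```

Grouping by an explicit sign-in session avoids having to guess session boundaries from gaps between requests, which is what a stateless log would otherwise require.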

Future: Web browser/server interactions are stateless, i.e., each transaction is essentially independent of previous ones and the server retains no memory of a user's previous actions. For that reason, extending access control to resources on the Web has been a challenge. The new local authorization system manages access with information from the central authentication database. Because it is session based, it supports more extensive analysis of usage patterns. In particular, usage statistics can be tied to user characteristics. Transaction statistics can also be aggregated for individuals on the basis of their initial login, providing more continuous, session-based tracking. The management statistics system that will link access to a book with information on the user's affiliations and status should be fully ready this summer. To protect privacy, the personal key will be kept only long enough to look up the required demographic information. It will then be stored in encrypted form, to serve as an anonymous unique identification code.
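The privacy step can be sketched as follows. First, look up the user's affiliation and status. Then replace the user ID with a code derived from it under a secret key, so that records from the same person can still be linked without naming that person. This sketch swaps in a keyed hash (HMAC-SHA256 from Python's standard library) for the encryption the report mentions. The directory entries, field names, and key are hypothetical. Anyone holding the key and a list of user IDs could recompute the codes, so the key itself must be protected.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret held only by the statistics system.
SECRET_KEY = b"replace-with-a-long-random-secret"

# Hypothetical stand-in for the central authentication database.
DIRECTORY = {
    "jx2101": {"school": "Social Work", "status": "graduate student"},
    "km3302": {"school": "Arts and Sciences", "status": "faculty"},
}

def pseudonym(user_id: str) -> str:
    """Stable code for a user: the same ID and key always give the same code."""
    digest = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def anonymize(user_id: str, book: str) -> dict:
    """Attach demographics, then keep only the pseudonym, not the user ID."""
    info = DIRECTORY.get(user_id, {"school": "unknown", "status": "unknown"})
    return {"who": pseudonym(user_id), "book": book, **info}

def uses_by_status(records):
    """Aggregate book uses by the user's status (faculty, student, ...)."""
    return Counter(r["status"] for r in records)

if __name__ == "__main__":
    records = [
        anonymize("jx2101", "The Oxford English Dictionary"),
        anonymize("jx2101", "The Chaucer Name Dictionary"),
        anonymize("km3302", "The Oxford English Dictionary"),
    ]
    print(uses_by_status(records))
```

Because the code is stable, repeated uses by one person can still be counted as coming from the same individual, while the stored records carry only affiliation and status.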


3.3.3 Access Paths to CWeb Books

Users have six main ways of learning about the CWeb books: (1) word of mouth; (2) the online catalog; (3) the Libraries' Digital Collections Web page; (4) the Project's home page; (5) Web pages for specialized library collections; and (6) publicity flyers, email messages, and formal and informal presentations by librarians and Project staff directed at the faculty and students most likely to be interested in the various online book collections.

In CLIO, the online catalog, the record for each online book lists its Web address (URL). In the near future, when CLIO moves to the Web, a scholar will be able to click on that URL in the CLIO record and proceed directly to the book. During the period covered by this report, however, a scholar who wants to move from that CLIO record to the online book must copy or write down the URL, switch to a Web browser, and type the URL into the Location box.

The first CWeb access point for the monographic (non-reference) books is a set of links to Web pages for the subject categories into which we have grouped the books, plus a link to an alphabetical listing by author of all the texts in the collection. These links are on the Libraries' Digital Collections home page at http://www.columbia.edu/cu/libraries/digital/. (See Exhibit 1.)

Exhibit 1. Columbia University Digital Library Collections

http://www.columbia.edu/cu/libraries/digital/

A scholar starting at the Columbia University Web home page must take two steps to reach that list (to Libraries, to Digital Collections).

During Fall 1996 and Winter 1997, we sought ways to focus user attention on the collection, in the hope of achieving more use and feedback. At the end of 1996, we launched a new Project home page (http://www.columbia.edu/dlc/olb/); see Exhibit 2. This page has a brief description of the evaluation effort, a link to the page that includes copies of the Project documents, a button for comments about the design of the online books system, a button for sending email to the Project Coordinator, and a keyword search across all the books in the collection. In addition, it has links to groups of books in the collection: Historical Social Thought, Current Humanities, Current Social Science, Current Science, and Current Reference. We have included books in more than one of those groupings as appropriate; for example, each Garland reference book is in Current Reference and in another subject category.

Once the scholar moves to one of the topical collection pages, he sees the books arrayed by primary subject category; pictures of the books' dust jackets accompany some of the titles. (Exhibit 3 shows part of the Current Social Science page; http://www.columbia.edu/cu/libraries/digital/texts/social_sciences.html.) He has two options at this point: (1) clicking on one of those titles to go directly to the Table of Contents for that book or (2) doing a keyword search on the whole topical collection.

Besides these core locations, the online books on CWeb are typically linked to several pages where potential users might find them. Most of these are subject listings that collection bibliographers maintain, e.g., Online Books on the Social Work Library home page links to the Current Social Science page, or the Medieval Studies home page listing of Internet resources links to The Chaucer Name Dictionary.

A scholar wishing to use one of the online collections repeatedly could bookmark the relevant subject page. He would then need only to select that bookmark in his browser to reach that page.

The five Web reference books in the Online Books collection are also included in a separate set of pages maintained by the Reference Department. The scholar must traverse several levels before reaching any of the resources using this route. Finally, some of these resources are linked to Web pages created by various other Columbia groups.

Exhibit 2. Online Books Evaluation Project: Titles Included - Home Page

http://www.columbia.edu/dlc/olb/

Exhibit 3. Online Books Evaluation Project: Titles Included - Current Social Sciences

http://www.columbia.edu/cu/libraries/digital/texts/social_sciences.html

3.3.4 Publicity Campaign

Our publicity campaign for the online books collection has had several facets. The key component is a set of flyers, each focusing on one category of books. These flyers have major headlines followed by a listing of the online books available in that category, a brief explanation of the Online Books Evaluation Project, and then directions on how to reach and use the collection. These flyers have been sent to all the faculty members in each of the related departments and to graduate students whom we have identified as teaching in those departments. In some cases in which faculty members are using one of the titles in a course, we have provided copies of the flyer to each student. In some cases, we have gone to those classes to discuss the Project and how to use the books. We have also made presentations to faculty groups about the Project. More such presentations will be made in future semesters.

At this point, we are seeking a viable balance in our publicity. Over-promoting a collection that contains only a few books may create disgruntled potential users who are likely to be skeptical about the collection in the future. On the other hand, publicity is needed to create the awareness and sampling that are necessary precursors to regular use of online materials. Marketing research shows that publicity is most successful when a target group is generally seeking the product being offered. In our case, that means scholars are most likely to pay attention to publicity when they need to use one or more of the available books. Examples are Social Work students who are using one of the titles in a course, or undergraduate students who have been told to use The OED for an assignment.


