3. THE ONLINE BOOKS COLLECTION
The Online Books Evaluation Project began formal activity in January
1995. However, discussions with publishers about cooperating in
such an effort by providing books and collaborating in research
began in 1993, if not earlier. As noted in the Project's Analytical
Principles and Design document, "The Online Books Evaluation
Project is a component of the developing digital library at Columbia
University. As part of its digital library effort, the Columbia
University Libraries is acquiring a variety of reference and monographic
books in electronic format to be included on the campus network;
in most cases, those books will be available only to members of
the Columbia community. Some of the books are being purchased;
others are being provided on a pilot project basis by publishers
who are seeking to understand how the academic community will
use online books if they become more widely available in the future."
Columbia University Libraries provides the Columbia community
with access to a substantial and growing set of full text (journals
and reference materials), image, data and bibliographic online
resources in addition to those that we are studying in the Online
Books Evaluation Project. Some have been acquired or developed
at Columbia and are maintained on servers here, e.g., art images,
working papers. Others are maintained by publishers with access
licensed to Columbia, e.g., Encyclopedia Britannica and
Gale's Contemporary Authors and Encyclopedia of Associations.
Yet others are maintained elsewhere and access is free to
all, with Columbia subject specialists providing links on their
subject home pages.
3.1 Design of the Online Books Collection
When this Project was proposed, the World Wide Web was an emerging
technology, and we still expected to develop specialized browsers
for using the books in SGML format, just as other online projects
were doing at the time. However, by the time the Project was funded
and ready to mount books online, it was clear that the Web would
soon be the best delivery system for maximizing availability of
the books to the scholarly community. Web browsers had, and still
have, annoying limitations, but we felt that they would become
better over time and provide optimum flexibility to users.
Many other online projects are providing users with materials
in PDF, scanned, or bitmapped format. These are effective formats
for journal articles, which are finely indexed through existing
sources and which are short and easily printed. However, the greatest
potential added value from online books, compared to their print
counterparts, comes with truly digital books. Only in this type
of format, for example, can users search the full text for terms or
cut and paste parts of the book into another document. In addition,
only this online format allows the development of truly interactive
books that take advantage of the current and anticipated capabilities
of Web technology, such as the inclusion of sound and video, data
files and software for manipulating data, and links to other online
resources. Perhaps only such enhanced online books will offer
sufficient advantages over traditional print format that scholars
will be willing to substitute them for the print format for any
or all of their modes of use and for any or all classes of books.
We have devoted considerable time and effort over the past two
years to dealing with technical and design issues for the books.
The design has evolved over this period as Web technology has
advanced and as the Project team and users have reacted to early
decisions. We will continue to work with users over the months
ahead in order to provide basic design features that they endorse.
We hope to begin to introduce more interactive features as appropriate
to various books and to measure user response to them.
We look forward to comparing the results of our evaluations with
those of online projects using other formats to explore whether
format does make a significant difference in user attitudes and
behavior.
3.2 Development of the Online Books Collection
3.2.1 Purchased Texts
Purchased texts included in the Online Books collection are The
Oxford English Dictionary and classical texts in social thought
from InteLex's Past Masters CD-ROM. Columbia converted
the Past Masters texts from SGML to HTML for Web access.
Ten Past Masters texts were made available to the Columbia
community online in mid-1995, although with little publicity.
Another 44 went online in July 1996, with publicity for the collection
beginning early in the Fall. We intend to convert several other
purchased CD-ROM products, largely literary texts, and include
them in the collection in the near future. The Columbia digital
library provides access to many other full text works to the scholarly
community, but the ones described here have been the focus of
our analysis, in large part because they are mounted on local
servers from which detailed usage information can be gathered.
3.2.2 Collaborating Publishers and Their Books
Publishers participating in the Project by providing electronic
files for their books and collaborating in the research effort
are Columbia University Press, Garland Publishing, Oxford University
Press, and Simon and Schuster Higher Education. All but Garland
have been involved in this Project since its inception; Garland
joined the effort in 1996. The books provided by each publisher
and the timing of the introduction of those books to the online
collection are as follows:
Columbia University Press: Two reference works, The
Columbia Granger's Index to Poetry and The Concise Columbia
Electronic Encyclopedia, have been available since the outset.
Columbia will provide three more reference books - The Columbia
Electronic Encyclopedia, The Columbia Guide to Standard American
Usage, and The Columbia World of Quotations - in 1997.
Monographs, anthologies and textbooks are being provided in the
fields of social work, literary criticism, political science,
and earth and environmental science. The Project includes only
books for which the Press can obtain both electronic files and
author permissions. Sixteen such books are now in the collection,
seven of them in the field of social work. The first of these
books were made available online in September 1996. At this point,
it appears that 27 more CUP books published in these fields in
the past three years will be available to our collection; they
will be added in the next few months.
Garland Publishing: Three Garland reference works, The
Chaucer Name Dictionary, Native American Women: A Biographical
Dictionary, and African American Women: A Biographical
Dictionary, were added to the collection from December 1996
through February 1997. We selected these books because Columbia
has sizable user groups in Medieval and Women's Studies and because
they were available in electronic format and amenable to conversion
to HTML. Garland is reviewing its collection and its resource
availability to determine whether it can provide any other books
to the Project.
Oxford University Press: In 1995, Oxford agreed to provide
its monographs in the fields of literary criticism, neuroscience,
and philosophy from the publication lists for 1995 through 1997.
Oxford reports that a substantial share of titles in these fields
have low sales and, hence, represent the endangered scholarly
monograph. As of early 1996, Oxford had provided electronic
files for 19 monographs in the fields of literary criticism and
philosophy. Oxford required the Project to provide an online ordering
mechanism concurrent with the availability of its books; that
ordering system was ready for use in October. Sixteen Oxford books
were online by year end 1996; 17 are now online. In June 1997,
Oxford provided nine more books in literary criticism and philosophy.
These should be online by fall 1997.
Simon and Schuster Higher Education: By late 1994, Simon
and Schuster had agreed to contribute high use titles, defined
as books on reserve for Columbia courses that had relatively heavy
circulation. Simon and Schuster provided electronic files for
nine such books, most of them in business-related subjects, in
Fall 1995. As of June 1997, two of the books were online and the
others were expected to be ready before the new academic year.
3.2.3 The Challenge of Obtaining Electronic Files for Books from Publishers
The Project's 1997 Annual Report discusses publishers' difficulty
in providing electronic files for books that are amenable to conversion
to the HTML format being used in the Project. Those problems include:
- Neither the publishers nor their printers have ready access
to the final electronic files, e.g., typesetter's tapes, for books
unless specific provision has been made for systematic retention
and archiving of such files. Most publishers have not been able
routinely to provide the Project with copies of the electronic
files for books published since the early to mid-1990s.
- The electronic files for some books contain so many special
characters and graphics that conversion to HTML format is infeasible.
- Publishers never possessed electronic files for books that
authors supplied as camera-ready copy.
- After publication, seeking permissions from multiple copyright
owners involved in a book, such as a collection of essays, would
be too onerous.
- Interviews with authors reveal that those who refuse to include
their books in the Project do so for various reasons. Some fear
the ease of downloading and printing Web materials will tempt
users not to respect copyright and that scholars outside of the
Columbia community will receive copies of their works, thus reducing
their royalty income. Others oppose the concept of online books
and do not want to encourage them.
3.3 User Access to the Collection
3.3.1 Formats and Functionalities Over Time
As of June 1997, the Columbia community had access to a total
of 96 online texts that are part of the Online Books Project.
The Libraries have each book in print form, circulating from the
regular collection or Reserves, or non-circulating in Reference,
as well as in one or more online formats. Appendix 1 summarizes
the print access modes for all the modern books in the collection.
The various online modes have differing functionalities beyond
browsing or reading on screen. Appendix 2 summarizes the schedule
of mounting for the online books and their functionalities.
3.3.2 Who Can Use the Collection
By agreement with the publishers, we restrict access to the Project's
online books to members of the Columbia University community,
i.e., faculty, staff and students of Columbia and affiliated institutions
who use the books in the Libraries and from anywhere via network
access. Until March 1997, books were also available, only on Libraries
terminals, to alumni and others with reading privileges. This
policy both protects the publishers' intellectual property and
provides the Project with the ability to gather richer data on
usage.
Through Winter 1997: We employed two methods through Winter
1997 to maintain this control of access.
- To use books on CNet or at the Unix prompt, a scholar must
sign in with her Columbia email address and password. This remains
the case for this set of books. The exception to this rule is
the public CNet terminals.
- To access books on CWeb, a scholar was required to use a computer
with an address that the server recognizes as Columbia affiliated.
Members of the Columbia community who connected to CWeb from a
service like AOL were not able to use the collection. On the other
hand, guests using X-terminals on the Columbia campus could reach
those books.
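The address-based check used for CWeb can be illustrated with a minimal
sketch. It is not the Project's actual server configuration: the address
blocks below are reserved documentation ranges standing in for campus
networks, and the names CAMPUS_NETWORKS and is_campus_address are
hypothetical. The sketch shows why such a rule admits anyone at a campus
machine while excluding affiliated users who connect from elsewhere.

```python
import ipaddress

# Hypothetical campus address blocks. These are reserved documentation
# ranges, used here only as stand-ins for "Columbia affiliated" addresses.
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_campus_address(address: str) -> bool:
    """Return True if the client address falls inside a campus block."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Malformed addresses are treated as off campus.
        return False
    return any(ip in network for network in CAMPUS_NETWORKS)


# A guest at a campus terminal is admitted; an affiliated user dialing in
# through an outside provider is not, because only the address is checked.
print(is_campus_address("192.0.2.17"))
print(is_campus_address("203.0.113.5"))
```

Because the decision depends only on where the request comes from, it
cannot distinguish a guest on campus from a member of the community.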
In both cases, the data the server logs did not include information
on the user. We initially planned to develop a directory of Columbia
IP addresses by location and to link it to the server data in
order to make general discrimination between dormitories and various
other campus buildings. However, we decided that developing and
maintaining this database would be too costly, given our near
term plans for individual user authentication. Instead, our analyses
for the period before mid-March 1997 rest on deductions based
on the host name of the user's computer.
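As a rough illustration of deduction from host names, the sketch below
assigns each logged host to a coarse location category by its domain
suffix. The suffixes, category labels, and function names are
hypothetical assumptions chosen for the example; the report does not
describe the actual rules the Project applied.

```python
# Hypothetical mapping from host-name suffixes to coarse categories.
# More specific suffixes come first so that they win over general ones.
SUFFIX_CATEGORIES = [
    ("dorm.example.edu", "dormitory"),
    ("lib.example.edu", "library"),
    ("example.edu", "other campus"),
]


def classify_host(host_name: str) -> str:
    """Return a coarse location category deduced from a host name."""
    host = host_name.strip().lower().rstrip(".")
    for suffix, category in SUFFIX_CATEGORIES:
        # Require an exact match or a dot-delimited suffix, so that
        # "notexample.edu" is not mistaken for "example.edu".
        if host == suffix or host.endswith("." + suffix):
            return category
    return "unknown"


def tally_hits(host_names):
    """Count logged requests per deduced category."""
    counts = {}
    for name in host_names:
        category = classify_host(name)
        counts[category] = counts.get(category, 0) + 1
    return counts


log_hosts = ["pc12.dorm.example.edu", "ref3.lib.example.edu", "dialup.isp.example.net"]
print(tally_hits(log_hosts))
```

A tally of this kind says where requests originated, not who made them,
which is the limitation that individual authentication addresses.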
As of March 15, 1997: For books in this Project and other
materials with user restrictions, Columbia has developed and deployed
a more robust system for Web authentication and access. This system
permits a member of the Columbia community to use materials even
if she connects through an Internet service provider like AOL.
It requires each user to sign in when he wants to use one or more
items in the collection. During a session, he needs to sign in
only once. Ultimately, data records will be session based; that
is, all of a user's activities in a single session will be linked
into a single record that also provides information on the
identity of that user.
Future: Given that Web browser/server interactions are stateless,
i.e., each transaction is essentially independent of previous
ones and the server retains no memory of a user's previous actions,
carrying the ability to control access to resources over to the
Web has been a challenge. This local authorization system manages
access with information from the central authentication database.
This session-based system supports more extensive analysis of
usage patterns. In particular, usage statistics can be tied to
user characteristics. The management statistics system that will
link access to a book with information on the user's affiliations
and status should be fully ready this summer. Transaction
statistics may also be aggregated for individuals, based on their
initial 'login', providing more continuous, 'session based' tracking.
To protect privacy, the personal key will be retained long enough
to look up the required demographic information and will then
be retained in encrypted form, to serve as an anonymous unique
identification code.
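The session-based records and anonymous identification code described
above might be sketched as follows. This is an illustration under stated
assumptions, not the Project's system: the tuple layout of the log
entries, the names SECRET_KEY, anonymize, and build_session_records, and
the demographic lookup table are all hypothetical, and a keyed hash
(HMAC-SHA256) is used here as a stand-in for the "encrypted form" the
report mentions.

```python
import hashlib
import hmac

# Hypothetical secret; a real deployment would store it outside the code.
SECRET_KEY = b"replace-with-a-private-key"


def anonymize(user_key: str) -> str:
    """Replace a personal key with a stable code that does not reveal it."""
    digest = hmac.new(SECRET_KEY, user_key.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def build_session_records(transactions, demographics):
    """Group per-request log entries into one record per session.

    transactions: iterable of (session_id, user_key, book_id) tuples.
    demographics: dict mapping user_key to a status such as "faculty".
    """
    sessions = {}
    for session_id, user_key, book_id in transactions:
        record = sessions.get(session_id)
        if record is None:
            # Look up demographics while the personal key is available,
            # then keep only the anonymous code in the stored record.
            record = {
                "user_code": anonymize(user_key),
                "status": demographics.get(user_key, "unknown"),
                "books": [],
            }
            sessions[session_id] = record
        record["books"].append(book_id)
    return sessions


log = [("s1", "alice", "oed"), ("s1", "alice", "granger"), ("s2", "bob", "oed")]
records = build_session_records(log, {"alice": "faculty"})
print(records["s1"]["status"], records["s1"]["books"])
```

Because the same personal key always yields the same code, sessions by
one individual can still be linked over time without storing the key.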
3.3.3 Access Paths to CWeb Books
Users have six main alternatives for learning of the CWeb books:
(1) word of mouth; (2) the online catalog; (3) the Libraries'
Digital Collections Web page; (4) the Project's home page; (5)
Web pages for specialized library collections; and (6) publicity
flyers, email messages, and formal and informal presentations
by librarians and Project staff directed at the faculty and students
most likely to be interested in the various online book collections.
In CLIO, the online catalog, a record for each online book lists
its Web address (URL). In the near future when CLIO moves to the
Web, a scholar will be able to click on that URL in the CLIO record
and proceed directly to the book. During the period covered by
this report, however, in order to move from that CLIO record to
the online book, the scholar must either copy or write out the
URL, switch to the Web, and input the URL into the Location box.
The first CWeb access point for the monographic (non-reference)
books is a set of links to the Web pages with the subject categories
into which we have grouped the books and another link to an alphabetical
listing by author of all the texts in the collection. These links
are on the Libraries' Digital Collections home page at http://www.columbia.edu/cu/libraries/digital/.
(See Exhibit 1.)
Exhibit 1. Columbia University Digital Library Collections
http://www.columbia.edu/cu/libraries/digital/
A scholar starting at the Columbia University
Web home page must take two steps to reach that list (to Libraries,
to Digital Collections).
During Fall 1996 and Winter 1997, we sought ways to focus user
attention on the collection, in the hopes of achieving more use
and feedback. At the end of 1996, we launched a new Project home
page (http://www.columbia.edu/dlc/olb/); see Exhibit 2. This page
has a brief description of the evaluation effort, a link to the
page that includes copies of the Project documents, a button for
comments about the design of the online books system, a button
for sending email to the Project Coordinator, and a capability
to search by keyword throughout the books in the collection. In
addition, it has links to groups of books in the collection: Historical
Social Thought, Current Humanities, Current Social Science, Current
Science, and Current Reference. We have included books
in more than one of those groupings as appropriate; for example,
each Garland reference book is in Current Reference and
another subject category.
Once the scholar moves to one of the topical collection pages,
he sees the books arrayed by primary subject category; pictures
of the books' dust jackets accompany some of the titles. (Exhibit
3 has part of the Current Social Science page; http://www.columbia.edu/cu/libraries/digital/texts/social_sciences.html.)
He has two options at this point: (1) clicking on one of those
titles and going directly to the Table of Contents for that book
or (2) doing a keyword search on that whole topical collection.
Besides these core locations, the online books on CWeb are typically
linked to several pages where potential users might find them.
Most of these are subject listings that collection bibliographers
maintain, e.g., Online Books on the Social Work Library
home page links to the Current Social Science page, or
the Medieval Studies home page listing of Internet resources links
to The Chaucer Name Dictionary.
A scholar wishing to use one of the online collections repeatedly
could bookmark the relevant subject matter page. He would then
need only to select that bookmark from within his browser in order
to reach that page.
The five Web reference books in the Online Books collection are
also included in a separate set of pages maintained by the Reference
Department. The scholar must traverse several levels before reaching
any of the resources using this route. Finally, some of these
resources are linked to Web pages created by various other Columbia groups.
Exhibit 2. Online Books Evaluation Project: Titles Included - Home Page
http://www.columbia.edu/dlc/olb/
Exhibit 3. Online Books Evaluation Project: Titles Included - Current Social Sciences
http://www.columbia.edu/cu/libraries/digital/texts/social_sciences.html
3.3.4 Publicity Campaign
Our publicity campaign for the online books collection has had
several facets. The key component is a set of flyers, each focusing
on one category of books. These flyers have major headlines followed
by a listing of the online books available in that category, a
brief explanation of the Online Books Evaluation Project, and
then directions on how to reach and use the collection. These
flyers have been sent to all the faculty members in each of the
related departments and to graduate students whom we have identified
as teaching in those departments. In some cases in which faculty
members are using one of the titles in a course, we have provided
copies of the flyer to each student. In some cases, we have gone
to those classes to discuss the Project and how to use the books.
We have also made presentations to faculty groups about the Project.
More such presentations will be made in future semesters.
At this point, we are seeking a viable balance in our publicity.
Over-promoting a collection that contains only a few books may
create disgruntled potential users who are likely to be skeptical
about the collection in the future. On
the other hand, publicity is needed in order to create the awareness
and sampling that are necessary precedents to regular use of online
materials. Marketing research shows that publicity is most successful
in cases in which a target group is generally seeking the product
being offered. In our case, this means that scholars are likely to focus
on publicity when they need to use one or more of the available
books, e.g., Social Work students who are using one of the titles
in a course or undergraduate students who have been told to use
The OED for an assignment.