Association of Research Libraries (ARL®)


Scholarly Publishing on the Electronic Networks

Electronic Text & Scholarly Publishers: How & Why?

Evan Owens, Information Systems Manager

Journals Division, The University of Chicago Press

When Ann Okerson asked me to speak at this symposium, she assigned me the following topics:

and she asked me to relate this to our experiences at the University of Chicago Press.

I replied, via e-mail, that I was not sure that she was asking the right person as we had just barely started to move in that direction in the last year. She countered that that was exactly why she wanted me to speak: to learn why Chicago, a major University Press, had not been aggressively pursuing electronic text until now and what we have under development.

I tell you this, so that you will recognize that I am speaking to you today not as one who has conquered the Mt. Everest of "electronic text," but rather as a representative of an organization that has just launched, with some considerable trepidation, a major expedition -- actually two expeditions to tackle that mountain, two projects that take different approaches to the problem of obtaining and manipulating electronic text. If this morning's demonstrations were the flowers of e-publishing, what I am going to talk about is more like double-digging the herbaceous border: not glamorous, but necessary.

A few months later, I discussed this Symposium with David Rodgers, and he told me, in his inimitable style, to lose the details and to talk about the big picture: where we are going and how we intend to get there. I understood that to mean a discussion of strategies for implementing technological change rather than of the changes themselves.

Frankly, we at the Press are not entirely sure where we are going with e-journals right now. It does seem clear that to remain a player in the electronic future, we need to build on existing strengths in text processing. To that end we are busy training all of our copy editors and production staff in on-screen editing and text preparation and laying the groundwork for a general adoption of SGML. But we are doing that carefully, in a series of steps that are each expected to pay for themselves or even reduce our costs.

Thus we hope to have our cake and eat it too: to implement new technologies that we expect to need in the future in ways that will not increase our present costs. Sometimes, however, that will mean using the savings from one change to finance another.

What does it mean for a publisher to have E-Text?

In other words, what is an electronic text? This is an unscientific checklist that I use when I think about electronic projects:

There is nothing new or profound on this list. But it is important to recognize that plain ASCII is not enough. An electronic text lacking any formatting information, special characters, symbols, or art is a poor substitute for a printed text. Most of our journals use and need all of the elements on this list to convey information. One sometimes hears disparaging remarks about "glitzy" on-line delivery systems like that developed by OCLC, but I would argue the contrary, that even the OCLC project is not yet sophisticated enough to transmit the full information content of some of the paper texts that we publish.

What is its Usefulness and Potential?

Why would one want the text of a journal article in electronic form? Here is my list of things that we have considered doing with e-texts:

The point in the process at which a publisher must have an electronic text, and the method the publisher uses to obtain or create it, are determined by which of these is important to the publisher. One might, for example, decide that it is not necessary to have or to work with an electronic text until after conventional typesetting. In that case, one could recapture the text from the typesetter and then do further processing in electronic form. Or one could start at the beginning and obtain electronic files directly from authors. At the University of Chicago Press, we are doing both.

How do you go about obtaining it in a useful form?

We have looked at and are experimenting with most of the following methods:

But we are concentrating on the author and the front end of the process rather than the final stages, because we think it is going to be most efficient to add the additional coding that will be desirable in an electronic product at the copy-editing stage rather than by post-processing typesetting tapes.

Experience at the University of Chicago Press

Before I describe our current projects, a few words about why we haven't done much with electronic manuscripts in the past. (I am speaking here only for the Journals Division.)

The first reason was a strong preference not to be on the "bleeding edge" of technology. The not-for-profit sector cannot afford the high cost of being first with a new technology. (Of course, if you can use someone else's money . . . that's a different story.) We have preferred to be close followers rather than leaders where new technology is concerned. Unless an organization has money to burn, it's better to wait, watch others stumble, learn from other people's mistakes and successes, and then be ready to jump in at the right moment.

To us it appears that the moment is now, and we have acted accordingly: we have added staff, bought more equipment, launched what are for us big undertakings. But we have chosen projects that we expect to pay for themselves even with the extra costs, or that someone else will pay for.

The other reason that we have delayed working with electronic manuscripts until now has been that we had assumed (erroneously, as it turned out) that processing author-submitted electronic manuscripts would be more feasible for monographs than for journal articles. In a monograph, there is one author and perhaps several hundred text pages; once you've figured out how to work with the first chapter, the rest is easy and any time spent resolving problems should be insignificant compared to the savings in typesetting. In a journal, on the other hand, there is a new author -- and a potential new problem -- every 10 pages or so. Time spent coping with file format problems, we surmised, might quickly erase any cost savings from using the author's keystrokes. At the same time, we were getting better and better prices from our typesetters, reducing the incentive to change our ways.

But we conducted experiments in any case, and discovered, to our surprise, that our fears were greatly exaggerated. The files that we have received from journal authors have not presented any great problems at all. As it turns out, in some ways using authors' electronic texts seems to work better for journals than for books. With a journal, the typesetter, the editorial style, and the desired coding are all known in advance, so it is easy to determine quickly exactly what can and cannot be used from an author's electronic file. Our colleagues in the Books Division report that they often find themselves spending too much time trying to reconcile the author's electronic file with the evolving design of the book and the coding requirements of the typesetter.

The results of our first tests of on-screen editing of author-submitted electronic manuscripts were so good that we decided it was time to step on the gas. That project is now being extended to include all of the journals that are edited in-house, and eventually it will also be implemented at journals that are copy-edited at the editorial office. At the same time, we have been having discussions with the American Astronomical Society, for whom we publish the Astrophysical Journal, Letters, and Supplement -- around 20,000 typeset pages per year of intensive math, tables, and figures. The AAS has ambitious plans for electronic publishing, and we are working closely with them. So we are now undertaking a second, more elaborate project in electronic text processing in conjunction with the AAS.

Two E-Text Projects at the University of Chicago Press

I'm going to describe the two projects in some detail because they are based on different models for obtaining and processing electronic text. I referred to them earlier as lightly coded and heavily coded, or big target and small target. In the first project, we will accept almost anything that the author can throw our way (hence "big target"); in the second project, only manuscripts that are heavily coded by the author will be accepted (the "small target"): [Overhead 1]

Project 1: The Big Target

This project is primarily the work of John Muenning, our Electronic Manuscripts Supervisor. We accept any file in any word processor format, Mac or PC, and then translate to WordPerfect for DOS using translation programs like Software Bridge and MacLinkPlus. From the files we recover ASCII text, special characters, in-line formulae, embedded footnote calls, and text attributes (italic, bold, super, subscript, underline, small caps). We do not try to capture tabular material, display equations, or any attached graphics, though we do sometimes print out tables or equations from WordPerfect to get a clean copy to mark up for the typesetter. Given a file with that much usable coding, we can then copy edit a manuscript on screen using WordPerfect in almost exactly the same amount of time that it would take on paper. After copy-editing, all the WordPerfect codes are replaced with the appropriate typesetting codes and the file goes off to the typesetter. A plain ASCII file is not adequate for our purposes; it would require too much additional coding during editing to be cost effective. It is much easier to print out an ASCII file, edit it on paper, and then send it out for conventional typesetting. Within six to nine months of setting up this system, all of the participating journals are receiving 75 to 80% of manuscripts in usable electronic form.
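
To give the flavor of that last mechanical step, here is a small illustrative sketch, written in Python purely for exposition. It is not our actual WordPerfect macro set; it assumes, for the sake of the example, that the text attributes have already been exported as simple bracketed markers and that the typesetter expects the made-up codes shown in the mapping.

    # Illustrative sketch only: map hypothetical word-processor attribute
    # markers (assumed already exported as plain-text tags) to equally
    # hypothetical typesetter codes.  Our production macros run inside
    # WordPerfect and are considerably more involved.
    ATTRIBUTE_MAP = {
        "{italic}": "<it>",  "{/italic}": "<ro>",    # italic on / back to roman
        "{bold}":   "<bf>",  "{/bold}":   "<mf>",    # bold on / back to medium face
        "{sup}":    "<sup>", "{/sup}":    "</sup>",  # superscript on / off
        "{sub}":    "<sub>", "{/sub}":    "</sub>",  # subscript on / off
    }

    def to_typesetter_codes(edited_text):
        """Replace exported attribute markers with typesetter codes."""
        for marker, code in ATTRIBUTE_MAP.items():
            edited_text = edited_text.replace(marker, code)
        return edited_text

    if __name__ == "__main__":
        sample = "The {italic}Astrophysical Journal{/italic} reports H{sub}0{/sub}."
        print(to_typesetter_codes(sample))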

By sending completely coded body text in electronic form, without tables and display equations, we have reduced our typesetting costs by around 30%. The additional costs were more PCs, copies of WordPerfect, Software Bridge, and DocuComp, staff training time, and staff time to work out the macros required to massage the manuscripts and insert the appropriate typesetter's codes. Once this system is fully implemented, the next stage will be to develop tools for using author-supplied tables and equations. After that, or possibly at the same time, we expect to move from WordPerfect to SGML as our editing environment, and from supplying electronic manuscripts coded with the typesetter's coding system to supplying SGML-coded manuscripts. That will depend on the success of our second project, now underway.

Project 2: The Small Target

Our second project, for The Astrophysical Journal, is primarily my work, though many others are involved as well. The American Astronomical Society, sponsor of The Astrophysical Journal (published by the University of Chicago Press) and The Astronomical Journal (published by AIP), has decided that it wants its member authors to be able to submit manuscripts in entirely electronic form via e-mail and to have those manuscripts used as the starting point for the publishing process. To that end, the AAS has adopted LaTeX as its format for text and Encapsulated PostScript for line art; a format for halftones is yet to be determined. The AAS has created a set of LaTeX macros that authors will use to submit to its journals. Only manuscripts correctly formatted according to the AAS instructions will be accepted into the electronic production stream. This imposes a considerable burden on the authors, but it has tremendous advantages in the later stages of the process. The burden is mitigated somewhat by the fact that the AAS journals publish about 45% of the world's literature in astrophysics, so authors are likely to use the AAS macro set frequently.
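
To illustrate what "correctly formatted" means in practice, here is a hypothetical intake check, again sketched in Python for exposition only; the commands it looks for are placeholders and do not reflect the actual AAS macro set or software. The idea is simply that a submission missing the required structural markup never enters the electronic production stream.

    # Hypothetical intake check for author-submitted LaTeX manuscripts.
    # The required commands listed below are placeholders for illustration.
    import re

    REQUIRED = {
        "document style": r"\\documentstyle\[[^]]*\]\{article\}",
        "title":          r"\\title\{",
        "author":         r"\\author\{",
        "abstract":       r"\\begin\{abstract\}",
        "body":           r"\\begin\{document\}",
    }

    def check_submission(tex_source):
        """Return the names of any required elements that are missing."""
        return [name for name, pattern in REQUIRED.items()
                if not re.search(pattern, tex_source)]

    if __name__ == "__main__":
        sample = r"\documentstyle[aas]{article} \title{On Dust} \author{A. N. Author}"
        missing = check_submission(sample)
        if missing:
            print("Rejected; missing: " + ", ".join(missing))
        else:
            print("Accepted into the electronic production stream.")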

The processing of the electronic and paper files is shown in Overhead 2. The most important point is that we and the AAS have agreed to translate from LaTeX to SGML and to do the editing and coding in SGML. The SGML files will drive the conventional typesetting, but they will also be coded in such a way that they can form the basis of a future electronic delivery and retrieval system. The costs of this project are considerable: to be able to edit SGML text, including display equations and tabular material, we are using ArborText's SGML Publisher, which runs on Unix workstations. On the other hand, the cost savings will also be considerable. We will supply the typesetter with electronic files that are completely coded down to the last detail, ready for page makeup, final corrections, and output.
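
The heart of the scheme is that translation step. Purely as an illustration of its flavor -- the element names below are invented and are not our working DTD, and the real translation must also cope with display mathematics, tables, and cross-references -- a toy version in Python might look like this:

    # Toy LaTeX-to-SGML translation, for illustration only.
    import re

    RULES = [
        (r"\\title\{([^}]*)\}",   r"<TITLE>\1</TITLE>"),
        (r"\\author\{([^}]*)\}",  r"<AUTHOR>\1</AUTHOR>"),
        (r"\\section\{([^}]*)\}", r"<SECTION><HEAD>\1</HEAD>"),
        (r"\$([^$]*)\$",          r"<INLINE-MATH>\1</INLINE-MATH>"),
    ]

    def latex_to_sgml(tex):
        """Apply each rewrite rule in turn; unmatched text passes through."""
        for pattern, replacement in RULES:
            tex = re.sub(pattern, replacement, tex)
        return tex

    if __name__ == "__main__":
        print(latex_to_sgml(r"\section{Results} The flux is $F = L / 4 \pi d^2$."))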

Some of the cost savings in typesetting will be offset by increased costs in handling conventional manuscripts: getting the art into electronic form and translating the typesetting tapes of paper manuscripts into properly coded SGML. If all goes well, the end result will be a complete text of the journal in a uniform format, regardless of the initial format of the manuscript.

Conclusion

If we step back for a moment and look at what is happening in scholarly publishing, we can recognize that the problem is not electronic text per se, but how to manage rapid technological change without losing our shirts and/or our jobs in the process. Of course, we are not the only people who have to deal with this sort of problem. Our colleagues in the university computation and telecommunications departments worry about it all the time. If you have colleagues who can and will help you, you are fortunate indeed.

How much you can do for yourself depends on your size and your infrastructure. It is very easy to get over-extended and over-committed. The computer tabloids are full of horror stories about millions of dollars spent on useless projects. It is just as easy to hemorrhage money at the smaller scale at which university presses operate. In the last decade, the University of Chicago Press has gone from four of the original IBM PCs to 200+ PCs and Macs, four local-area networks and a wide-area network, two minicomputers with associated ordering and accounting systems, a Unix database server, four Sun workstations, fax gateways, two e-mail systems and three e-mail gateways, God knows how many printers, and so on. (It doesn't help that we are spread out over four buildings, two of which are not on the campus proper.) Soon it becomes a full-time job just keeping all the systems going -- several jobs, in fact. The moral is: be sure to figure the cost of maintaining the technological infrastructure into any project that would add significantly to your present systems.

In the course of my work, I read a lot of Information Systems literature, in which one can find, if not great prose, certainly lots of discussion of technological change. I have found particularly useful the following, by Prof. James Senn of the Georgia State University Information and Technology Management Center:

Key steps in adopting new technologies, for those driven by technical needs:

o Identify emerging technologies and assess their usefulness
o Develop and run prototypes along business lines
o Activate the new technology and integrate with existing operations

Possibly I like Senn's writing because it describes the strategy that we are following:

I would also add:

These are certainly unsettled times. One could easily persuade oneself that there is no future for the university press in journals publishing and begin to make plans to fold up shop gracefully. But that would be a mistake. Better to build on existing strengths in selecting and distributing information and in adding value through editing and formatting, and then to look for new ways to be of service to the academic community. That is the direction we are headed . . . electronic text is just the beginning.

Overhead One

Overhead Two