One of the proposal's proponents' regular strategies is to insist that moving to electronic journals is a much simpler process than other participants believe it to be. Richard Entlich, a Cornell librarian with substantial hands-on experience implementing online journals for university researchers, draws on that experience to point out the complexity of the publishing landscape and the interrelated nature of its parts.
Date: Thu, 7 Jul 94 14:58:46 EDT
From: "Stevan Harnad" harnad@Princeton.EDU
From: Richard Entlich rentlich@oldal.mannlib.cornell.edu
Subject: Re: Ginsparg's Reply to Garson
To: harnad@Princeton.EDU
Date: Thu, 7 Jul 94 13:53:12 EDT
Stevan,
You forwarded Paul Ginsparg's comments on Lorrin Garson's response to your "subversive" proposal to VPIEJ-L and perhaps elsewhere. Please forward my comments to whatever lists received his.
Dr. Ginsparg's comments on the CORE (Chemistry Online Retrieval Experiment) project are ill-informed. First of all, CORE was not conceived, nor has it ever been portrayed as a model for de novo electronic publishing. CORE is a retrospective conversion project, designed to test the efficacy of a variety of approaches to capturing previously published material, using whatever combination of machine-readable formats may be available or obtainable through conversion. Perhaps high energy physicists have no interest in anything published more than a few picoseconds ago, but in most disciplines, the existing print corpus has ongoing value.
Yes, CORE is using bitmapped page images, but it is hardly "another scan and shred project to post bitmaps of existing journals." Full-page bitmaps are used 1) because they are a reasonable alternative for conversion of existing print archives to machine-readable form, and 2) to capture portions of pages which were not available in machine-readable form, mainly illustrations of various types. However, the heart of the CORE project is over ten years of marked-up machine-readable text files from twenty ACS journals. These files are converted from ACS proprietary markup to SGML.
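[Ed. Note: For readers unfamiliar with this kind of conversion, the following is a minimal, hypothetical Python sketch of mapping a proprietary tagged format to SGML. The tag names, input format, and mapping are invented for illustration; they do not represent the actual ACS markup or the CORE conversion software.]

    # Hypothetical sketch of proprietary-markup-to-SGML conversion.
    # The input tags (%TI, %AU, %AB) are invented; the real ACS format differed.
    TAG_MAP = {
        "%TI": "title",
        "%AU": "author",
        "%AB": "abstract",
    }

    def convert_line(line):
        """Turn one proprietary-tagged line into an SGML element."""
        for tag, element in TAG_MAP.items():
            if line.startswith(tag):
                content = line[len(tag):].strip()
                return "<%s>%s</%s>" % (element, content, element)
        return line  # untagged text passes through unchanged

    def convert(lines):
        return "\n".join(convert_line(l) for l in lines)

    print(convert(["%TI Synthesis of a Novel Catalyst",
                   "%AU J. Q. Chemist"]))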
The resulting files can be searched, displayed and navigated via a sophisticated X Window based interface developed by OCLC called Scepter. Full-text searching is provided (including about two dozen fields, from author and title to CAS registry number and figure captions) and supports Boolean and adjacency operators, truncation, and direct searching on Greek letters and diacritics. Text is displayed using standard and custom-designed X Window fonts. The interface also supports direct access to article subsections, hypertext searching and citation linking, and full article printing.
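[Ed. Note: As an illustration of the kind of fielded searching described above, here is a minimal Python sketch of Boolean AND search across named fields with trailing-asterisk truncation. The field names and records are invented; this is in no way OCLC's Scepter code.]

    # Hypothetical sketch of fielded full-text search with truncation.
    # Each record is a dict of field name -> text; fields shown are examples.
    def term_matches(term, text):
        """Match one term; a trailing '*' means truncation (prefix match)."""
        words = text.lower().split()
        if term.endswith("*"):
            prefix = term[:-1].lower()
            return any(w.startswith(prefix) for w in words)
        return term.lower() in words

    def search(records, **field_terms):
        """Boolean AND across fields: every named field must match its term."""
        return [r for r in records
                if all(term_matches(t, r.get(f, ""))
                       for f, t in field_terms.items())]

    records = [{"author": "Smith", "title": "Catalytic oxidation of alkenes"},
               {"author": "Jones", "title": "Polymer synthesis"}]
    print(search(records, author="smith", title="cataly*"))  # matches Smith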
Article text, equations and tables are all displayed based on the existing machine-readable files. Only figures are displayed as bitmaps. CORE makes the best possible use of these bitmaps by extracting them from the full-page image file and making them accessible from icons embedded in the text. In Scepter, they are also displayed thumbnail size along with the article front matter so they can be browsed as a kind of "visual abstract."
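[Ed. Note: A sketch of the figure-extraction idea, written with the modern Pillow imaging library, which postdates CORE; the file names and crop coordinates are invented. It shows only the general technique: crop a figure region from a full-page bitmap and produce a small thumbnail for browsing.]

    # Hypothetical sketch: crop a figure out of a scanned page bitmap and
    # make a thumbnail. Requires Pillow; names and coordinates are invented.
    from PIL import Image

    def extract_figure(page_path, box, fig_path, thumb_path):
        """Crop box = (left, top, right, bottom) from the page image,
        save the figure full size, then save a browsable thumbnail."""
        page = Image.open(page_path)
        figure = page.crop(box)
        figure.save(fig_path)
        thumb = figure.copy()
        thumb.thumbnail((128, 128))  # "visual abstract" size
        thumb.save(thumb_path)

    extract_figure("page_042.tiff", (150, 900, 1450, 1800),
                   "fig1.tiff", "fig1_thumb.tiff")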
Another important element of CORE is that it is based on a large corpus of highly regarded publications, spanning many subdisciplines within chemistry. In addition to working out technical problems, CORE was designed to test user acceptance of network journal delivery in a variety of formats. A large enough body of material to create more than a "toy" system was seen as essential to the user testing process. Perhaps physicists are content with downloading TeK source or PostScript, but Ginsparg's system will not necessarily translate smoothly to other disciplines, at least not right away.
Not every group of scholars has the same degree of computing sophistication, or access to state-of-the-art computing equipment. Not everyone has ready access to and familiarity with Unix workstations, or can afford to replace equipment in order to keep pace with the latest network fad. For instance, there are still plenty of Macintoshes and PCs around which cannot run NCSA Mosaic.
I recognize that Ginsparg wants to make every physicist a self-publisher and believes that his colleagues all share that desire and are equipped to do so. Perhaps the pervasive use of computers in physics and the established standard of TeK for manuscript preparation make this reasonable--for physics. But even physicians, who are, as a group, wealthy and fairly technically literate, have expressed doubts about electronic journals. (See, for example, JAMA, May 6, 1992, vol. 267, no. 17, p. 2374 and The New England Journal of Medicine, Jan. 16, 1992, vol. 326, no. 3, pp. 195-97). Some of their concerns focus on the peer review process, but others focus on the expense of computing equipment and the lack of format standardization for manuscript generation.
And speaking of medicine, Ginsparg takes a shot at "...dead formats promoted in general by OCLC." OCLC happens to co-publish (with AAAS) an electronic journal in medicine, the Online Journal of Current Clinical Trials. Though I am in no way a spokesperson for OCLC, I am puzzled at Ginsparg's comments. OCLC has done pioneering work in the creation of de novo networked electronic journals, most of which is based on TeK and SGML. These hardly qualify as "dead formats."
Lest I come off sounding like an apologist for the publishing community, let me make my position clear. As a librarian, I am acutely aware of the down side of print publishing in terms of cost, distribution, access, time lag, functionality, space requirements, preservation, etc. Libraries have been too reluctant to embrace new technologies which offer potential solutions to some of these problems. But it is also hardly the case that Ginsparg's system resolves all the myriad issues involved in the transition from print to electronic publishing and distribution of scholarly articles. Some of the reticence on the part of libraries reflects the tremendous flux and lack of standardization in information technology. One does not throw out a proven, centuries-old system, whatever its flaws and limitations, without solid assurance that its replacement is a reliable, stable substitute for the long term.
I am as excited as anyone working in the electronic journal area about the promise of new technologies; I also recognize that progress towards network publishing will probably cause upheaval within libraries and very likely the disappearance of some. Libraries will attempt to find continuing relevance. Nevertheless, we will not support print publishing when it ceases to meet the needs of our patrons. In the meantime, despite the success of Ginsparg's preprint system, more research is needed in the areas of interface design, organization and classification of machine-readable files, the creation of machine-readable archives which will remain accessible for centuries, etc. Even though it is based on previously published material, CORE is helping to address these thorny issues.
Richard Entlich, Technical Project Manager
Albert R. Mann Library, Information Technology Section
Cornell University
entlich@cornell.edu
(Note: some of the above comments are based on a talk I gave at the 9th annual NASIG (North American Serials Interest Group) conference in Vancouver, BC last month and will subsequently appear in the conference proceedings.)
Date: Thu, 7 Jul 94 20:17:21 -0600
From: Paul Ginsparg 505-667-7353 ginsparg@qfwfq.lanl.gov
Subject: Re: Entlich Reply to Ginsparg
richard entlich's remarks miss the point. the point i was trying to make was that garson's examples of electronic involvement were all irrelevant to the argument at hand, that of cost estimates for true electronic research distribution, and were just confusing the issues.
i'm eager to see other kinds of publishing efforts that look promising. i offer the physics and related servers as an example to others who might want to do something similar; various features clearly will not be applicable for all communities. others can learn from our mistakes. (o'donnell's Chicago Journal of Theoretical Computer Science (MIT Press) will be a most interesting experiment -- to see if they can provide sufficient "value-added" for which people will voluntarily pay.)
re> [Ginsparg's] comments on the CORE (Chemistry Online Retrieval
re> Experiment) project are ill-informed. First of all, CORE was not conceived,
re> nor has it ever been portrayed as a model for de novo electronic publishing.
re> CORE is a retrospective conversion project,
correct, that's precisely why i identified it as irrelevant to the question of costs of an enterprise that starts electronic from inception.
re> Perhaps high energy physicists have no interest in anything
re> published more than a few picoseconds ago, but in most
re> disciplines, the existing print corpus has ongoing value.
my community accesses the archival database (journals in libraries) as well as the growing electronic one, never argued otherwise -- not sure why we're being reviled here. how best to port the archival database to electronic format is an important question, it is just not relevant to the issue at hand, as mentioned above. (and this is neither the proper forum to give an exhaustive technical critique of the "sophisticated X Window based interface developed by OCLC called Scepter.")
re> In addition to working out technical problems, CORE was
re> designed to test user acceptance of network journal delivery in a
re> variety of formats.
the report i heard from the head librarian at cornell (harvard "gateways to knowledge" meeting last fall) was that user acceptance was remarkably low for reasons they did not yet understand.
re> Perhaps physicists are content with downloading TeK source or
re> PostScript, but Ginsparg's system will not necessarily translate
re> smoothly to other disciplines, at least not right away.
that's TeX (the X according to Knuth is a chi, hence the pronunciation). undoubtedly it won't transfer smoothly, i have no doubt there are many features peculiar to my community. but we are looking towards the future and can envision a gradual transition. different communities will have different standards. perhaps no matter what word-processor is used, they may be able to choose the final output format (as we currently choose postscript for some applications): acrobat pdf, sgml, or some other -- all readily interconvertible. five years from now, the options for author-prepared documents are guaranteed to be dramatically improved over now; and each generation of more sophisticated software grows easier to use. the point is to start thinking ahead now.
re> For instance, there are still plenty of Macintoshes and PCs
re> around which cannot run NCSA Mosaic.
not sure i understand this comment. we've got macmosaic running here on the lowest end mac classic -- probably just means there are some macs and pc's not connected to the internet because no one installed mactcp or equivalent. it is true that the windows version of mosaic will not run on a pc that cannot run windows, but there will always be a mix of technology at any given time and servers can always provide a lowest common denominator interface (the systems i set up still allow for equal low-end e-mail access via dumb terminal and printer). the important point is that many communities will find self-sufficiency in their interests, and they will proceed accordingly.
re> OCLC happens to co-publish (with AAAS) an electronic journal in medicine,
re> the Online Journal of Current Clinical Trials.
yes, this was announced with great fanfare in mid '92. it required proprietary software that ran on low-end pc's ("for instance there are still plenty" of high end machines that do not run low-end pc emulation. in a few years will there be more of these or more "macs and pcs around which cannot run ncsa mosaic"?) and was far from state-of-the-art even at the time (i remember discussing this with representatives of other publishing companies.) after more than half a year it had published a grand total of only seven submissions (as reported in Science, another AAAS publication), and was used as the standard example of how not to proceed. i do not have statistics for how it is currently faring, but perhaps they have since made improvements to correct the deficiencies -- might even provide some solid basis for the 25% vs 75% cost question, but not if they're still too remote from critical mass.
re> OCLC has done pioneering work in the creation of de novo
re> networked electronic journals, most of which is based on TeX and
re> SGML. These hardly qualify as "dead formats."
as i mentioned in my message to andrew o., as a member of an aps advisory board i've seen their more recent proposals and while it is inappropriate to comment in detail here, i can readily affirm that there's nothing that impacts the issue of costs of publishing scientific vs. non-scientific material.
re> Lest I come off sounding like an apologist for the publishing community,
re> let me make my position clear. As a librarian, I am acutely aware of the
re> down side of print publishing in terms of cost, distribution, access,
re> time lag, functionality, space requirements, preservation, etc.
re> Libraries have been too reluctant to embrace new technologies which
re> offer potential solutions to some of these problems.
and i am entirely sympathetic to the plight of librarians for whom committing prematurely to the wrong technology would be a disaster. and i am sympathetic because i've always been a fan of libraries and librarians (aren't all academics?) and they're as much victims of the practices of pub co's as we are.
re> But it is also hardly the case that Ginsparg's system resolves all the
re> myriad issues involved in the transition from print to electronic
re> publishing and distribution of scholarly articles.
no argument.
re> Some of the reticence on the part of libraries reflects the tremendous flux
re> and lack of standardization in information technology. One does not throw
re> out a proven, centuries-old system, whatever its flaws and limitations,
re> without solid assurance that its replacement is a reliable, stable
re> substitute for the long term.
no argument. this is why it's so much easier for us to test the envelope -- the consequences of failure are less pronounced.
re> I am as excited as anyone working in the electronic journal area about
re> the promise of new technologies. I also recognize that progress towards
re> network publishing will probably cause upheaval within libraries and
re> very likely the disappearance of some. Libraries will attempt to find
re> continuing relevance.
important issues. and by no means clear at present what will be the evolving role of libraries (and in particular of university research libraries which satisfy a wide variety of different needs). perhaps they will be out of the loop entirely for many aspects of scholarly research communication, or perhaps they will become the natural local repositories to organize and serve this information to the rest of the world. cornell's mann library is clearly ahead of the game in technical sophistication (i have no problem with that, i got my doctorate from cornell) so may not be the best short-term model for involvement from the library community.
re> the creation of machine-readable archives which will
re> remain accessible for centuries, etc. Even though it is based
re> on previously published material, CORE is helping to address
re> some of these thorny issues.
very few libraries currently have dedicated resources to address these issues. but in the most optimistic scenario, perhaps this will become commonplace in a few years and libraries and research communities can become partners in subversion to their mutual benefit. time will tell.
none of these issues impact the cost distinction between scientific and non-scientific publication, however, and that was the original issue.
Paul Ginsparg
PS it is still not clear exactly how things will proceed from community to community -- harnad's original "subversion" proposal passed to an economist got back:
> ... but Harnad is a bit off (at least for econ types). Most of
> them care less about whether others read their stuff; what is important is
> publishing, because that is what determines salary and promotion.
> My guess is that around 2011 his vision will happen and journals
> will be a thing of the past, and I will be retired.
(cf. harnad on compos mentis; but the comment is also a bit off, of course, for the usual reason that the on-line versions will ultimately receive similar certification in your scheme and be used [or abused] for allocation of jobs, promotions, and grant money.)
[Ed. Note: Entlich's reply to Ginsparg, copied to Harnad, follows. At the time, this remained a private communication between the three, but is now used with permission to clarify the sequence of messages and ideas. This is followed by a final, private exchange between the three correspondents.]
Date: Fri, 8 Jul 94 14:52:38 EDT
To: ginsparg@qfwfq.lanl.gov
From: Richard Entlich rentlich@oldal.mannlib.cornell.edu
Subject: Ginsparg reply to Entlich
Cc: harnad@Princeton.EDU
Paul,
re>> His comments on the CORE (Chemistry Online Retrieval Experiment)
re>> project are ill-informed. First of all, CORE was not conceived, nor has it
re>> ever been portrayed as a model for de novo electronic publishing. CORE is
re>> a retrospective conversion project,
pg> correct, that's precisely why i identified it as irrelevant to the question
pg> of costs of an enterprise that starts electronic from inception.
I understand that the economics are the same whether we're using bitmaps or recycled phototypesetting tapes, since both involve reprocessing material that has already been through the print publishing process. But your comments did not focus on the retrospective aspect of CORE; they focused on the use of bitmaps of pages. I have no problem with your making the point that ACS' participation in CORE fails to address the cost issues you raised. I was disturbed by the incorrect characterization of CORE as primarily a bitmap scanning effort and the subsequent ridicule of such efforts.
re>> Perhaps high energy physicists have no interest in anything
re>> published more than a few picoseconds ago, but in most
re>> disciplines, the existing print corpus has ongoing value.
pg> my community accesses the archival database (journals in libraries) as
pg> well as the growing electronic one, never argued otherwise -- not sure why
pg> we're being reviled here. how best to port the archival database to
pg> electronic format is an important question, it is
pg> just not relevant to the issue at hand, as mentioned above.
Well, from what I've read of your preprint system, it did sound like archiving the submissions was something of an afterthought. However, the main point is that you seemed to be condemning all use of bitmaps, despite the fact that they may be the only practical way to "port the archival database to electronic format."
pg> (and this is neither the proper forum to give an exhaustive technical
pg> critique of the "sophisticated X Window based interface developed by
pg> OCLC called Scepter.")
Again, being unsure of how far your comments were promulgated, I was trying to set the record straight about CORE. You criticized all "scan and shred" projects, in which category you placed CORE, for being "unable to distinguish superficial appearance from information content" and for failing to "rethink the compromises embodied in the current paper format," instead "robotically propagat[ing] them to the electronic format." The description of Scepter was included to clear the air and to indicate that we are working on the very functionality "that can only be embodied in the electronic format from the start," even if we are not producing new electronic journals.
The fact is, CORE has not done a very good job of publicizing what it's been doing (which may explain your own misperceptions), and I would hate for people to write us off because of your comments.
re>> In addition to working out technical problems, CORE was
re>> designed to test user acceptance of network journal delivery in a
re>> variety of formats.
pg> undoubtedly it won't transfer smoothly, i have no doubt there are many
pg> features peculiar to my community. but we are looking towards the
pg> future and can envision a gradual transition. different communities
pg> will have different standards. perhaps no matter what word-processor
pg> is used, they may be able to choose the final output format (as we
pg> currently choose postscript for some applications): acrobat pdf, sgml, or
pg> some other -- all readily interconvertible. five years from now, the options
pg> for author-prepared documents are guaranteed to be dramatically
pg> improved over now; and each generation of more sophisticated
pg> software grows easier to use. the point is to start thinking ahead now,
pg> though it is still not clear exactly how things will proceed from community
pg> to community -- your original "subversion" proposal passed to an economist
pg> got back:
I basically agree, though I still think the pace of change alone will leave users in certain disciplines out.
re>> For instance, there are still plenty of Macintoshes and PCs
re>> around which cannot run NCSA Mosaic.
pg> not sure i understand this comment. we've got macmosaic running here
pg> on the lowest end mac classic -- probably just means there are some macs
pg> and pc's not connected to the internet because no one installed mactcp
pg> or equivalent.
No, I wasn't talking about network connectivity here. It's true that you can get MacMosaic to run on a "porthole" Mac. But with 4 MB of RAM available, System 7 needs 2, Mosaic itself wants another 1, and if you want to run several helper programs and load, say, a 2 MB QuickTime movie, you're really out of luck. My main point is that the ante for CPU speed, RAM, hard drive space, etc. gets upped regularly with each new network software innovation. Equipment replacement schedules in poorer disciplines are way behind the curve of such changes.
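[Ed. Note: The arithmetic behind Entlich's point, using his own figures: 2 MB (System 7) + 1 MB (Mosaic) + 2 MB (one QuickTime movie) = 5 MB of demand against 4 MB of installed RAM, before any helper applications are counted.]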
pg> it is true that the windows version of mosaic will not run on a pc that
pg> cannot run windows, but there will always be a mix of technology at
pg> any given time and servers can always provide a lowest common
pg> denominator interface (the systems i set up still allow for equal low-end
pg> e-mail access via dumb terminal and printer).
Yes, and we are very aware of the need to do this at the library, where equitable access is always a concern. On the other hand, you did describe full-text access to technical journals which lacked "mathematics, tables, or figures" as "less than useless," and I agree with that evaluation. We may feel good about providing vt100 emulation as an alternative, but we should at least admit that such users, as a result of their technological poverty, are not being well served.
pg> the important point is that many communities will find self-sufficiency
pg> in their interests, and they will proceed accordingly.
Agreed.
re>> OCLC happens to co-publish (with AAAS) an electronic journal in
re>> medicine, the Online Journal of Current Clinical Trials.
pg> yes, this was announced with great fanfare in mid '92.
pg> it required proprietary software that ran on low-end pc's
pg> ("for instance there are still plenty" of high end machines that do not run
pg> low-end pc emulation. in a few years will there be more of these or more
pg> "macs and pcs around which cannot run ncsa mosaic"?)
pg> and was far from state-of-the-art even at the time (i remember discussing
pg> this with representatives of other publishing companies.)
pg> after more than half a year it had published a grand total of only seven
pg> submissions (as reported in Science, another AAAS publication),
pg> and was used as the standard example of how not to proceed.
pg> i do not have statistics for how it is currently faring, but perhaps
pg> they have since made improvements to correct the deficiencies -- might
pg> even provide some solid basis for the 25% vs 75% cost question, but not
pg> if they're still too remote from critical mass.
I intentionally avoided addressing the success or failure of OJCCT. I only wanted to say that, to my knowledge, OCLC was not emphasizing bitmaps in its electronic journal projects.
re>> OCLC has done pioneering work in the creation of de novo
re>> networked electronic journals, most of which is based on TeX and
re>> SGML. These hardly qualify as "dead formats."
pg> as i mentioned in my message to andrew o., as a member of an aps advisory
pg> board i've seen their more recent proposals and while it is inappropriate
pg> to comment in detail here, i can readily affirm that there's nothing
pg> that impacts the issue of costs of publishing scientific vs.
pg> non-scientific material.
You may well have heard something that I'm not privy to.
re>> Lest I come off sounding like an apologist for the publishing community,
re>> let me make my position clear. As a librarian, I am acutely aware of the
re>> down side of print publishing in terms of cost, distribution, access,
re>> time lag, functionality, space requirements, preservation, etc.
re>> Libraries have been too reluctant to embrace new technologies which
re>> offer potential solutions to some of these problems.
pg> and i am entirely sympathetic to the plight of librarians for whom
pg> committing prematurely to the wrong technology would be a disaster.
pg> and i am sympathetic because i've always been a fan of libraries
pg> and librarians (aren't all academics?) and they're as much victims of
pg> the practices of pub co's as we are.
Libraries have been in a difficult position with respect to serials publishers for years. Some of the responses to the ongoing "serials crisis" have not been very productive. We need to work with our faculty and other constituents to determine how we can best continue to provide service in an environment where the library is no longer the center of the information universe.
Thanks for all your comments.
Date: Mon, 11 Jul 94 02:34:13 -0600
From: Paul Ginsparg 505-667-7353 ginsparg@qfwfq.lanl.gov
To: rentlich@oldal.mannlib.cornell.edu
Subject: Re: Ginsparg reply to Entlich
Richard, some minor dangling issues.
pg>> (the systems i set up still allow for equal low-end
pg>> e-mail access via dumb terminal and printer).
re> ...full-text access to technical journals which lacked "mathematics, tables,
re> or figures" as "less than useless," and I agree with that evaluation. We may
re> feel good about providing vt100 emulation as an alternative, but we should
re> at least admit that such users, as a result of their technological poverty,
re> are not being well served.
i was not clear enough. the low-end access is nonetheless full eqns/figs, just printed on laserprinter (so users are no worse off than they were with print -- and still somewhat better for the faster distribution -- they just miss out on all the higher end capabilities, e.g., hypertext access, window interface for searches, embedded hypertext in target text, links to .mpeg/.qt movies and other external software apps, etc.) this is again the advantage of starting from electronic material and not having to worry about transporting large bitmaps or reliability of ocr.
pg
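[Ed. Note: For concreteness, a minimal, hypothetical Python sketch of the sort of low-end e-mail access Ginsparg describes: a handler that reads a mailed "get <paper-id>" command and replies with the full PostScript, equations and figures intact, for printing on a local laser printer. The command syntax and file layout are invented and are not the actual xxx.lanl.gov server code.]

    # Hypothetical sketch of an e-mail document server: read a request from
    # stdin, look up the paper, and emit a reply carrying full PostScript,
    # so a dumb terminal plus printer suffices. Layout/syntax are invented.
    import os
    import sys

    ARCHIVE_DIR = "papers"  # e.g. papers/9407001.ps

    def handle_request(message_body):
        for line in message_body.splitlines():
            parts = line.strip().split()
            if len(parts) == 2 and parts[0] == "get":
                path = os.path.join(ARCHIVE_DIR, parts[1] + ".ps")
                if not os.path.exists(path):
                    return "no such paper: " + parts[1]
                with open(path) as f:
                    return f.read()  # full PostScript, eqns/figs intact
        return "usage: get <paper-id>"

    if __name__ == "__main__":
        sys.stdout.write(handle_request(sys.stdin.read()))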
[Ed. Note: The following is a message sent by Entlich only to VPIEJ-L, reiterating a few of the points he made above. Though not, strictly speaking, a part of the subversive discussion, it tidies up some important points.]
Date: Mon, 11 Jul 1994 13:57:57 EDT
From: Richard Entlich rentlich@oldal.mannlib.cornell.edu
Subject: Re: Ginsparg's Reply to Entlich
To: Multiple recipients of list VPIEJ-L VPIEJ-L@VTVM1.CC.VT.EDU
In-Reply-To: from "Stevan Harnad" at Jul 11, 94 8:43 am
pg> richard entlich's remarks miss the point. the point i was trying to
pg> make was that garson's examples of electronic involvement were all
pg> irrelevant to the argument at hand, that of cost estimates for true
pg> electronic research distribution, and were just confusing the issues.
Paul Ginsparg's intention may have been as stated above, but part of the effect was to promulgate a highly misleading description and unjustified criticism of a project in which I (and many others) have invested several years. (This accounts for the angry tone of my original response.) The CORE Project is obviously fair game for criticism, but even if that criticism was a sidebar to Ginsparg's thesis, it should have been based on fact, not speculation.
pg> (and this is neither the proper forum to give an exhaustive technical
pg> critique of the "sophisticated X Window based interface developed by
pg> OCLC called Scepter.")
Since the technical aspects of the CORE Project were inaccurately portrayed in this forum, what forum but this should be used to provide technical details which set the record straight?
Anyone interested in a brief summary and bibliography about the CORE Project may request one from me at the address given below.
Richard Entlich
Mann Library, Cornell University
entlich@cornell.edu