
Association of Research Libraries (ARL®)

Scholarly Journals at the Crossroads: A Subversive Proposal for Electronic Publishing

XVIII. Citations and Citation Frequency


The measure of use that is most easily quantified on a national or international basis is "citation frequency." This group of messages began during the net-wide subversive proposal discussion; some of the discussants then picked the topic up again for further probing, about two months after the main body of the conversation ended. Not every message in the sequence went directly to the public lists; there was more discussion among individuals, with some of the postings occasionally being referred to the wider audience. In this regard, a rudimentary kind of editing and peer review is already taking place.


Date: Thu, 11 Aug 1994 08:57:37 EDT
From: David Stodolsky david@arch.ping.dk

Stevan Harnad harnad@princeton.edu writes:

field by field is the ULTIMATE (cross-journal) acceptance rate: It is
my belief that in one form or other, just about EVERYTHING gets
published eventually, if the author is persistent enough, even if it's
in the unrefereed vanity press. Having approximately the same
manuscript refereed repeatedly for different journals is a drain on
resources, but I'm not sure how to get around it: the prestige
hierarchy is based in part on (intellectual) competition.

Moving the arena of competition from publication rates to citation rates is one way. Since almost everything gets published, why not just abandon this competition? There is no economic justification for prior review in electronic publication. Various preprint archives have already demonstrated that direct publication is viable.

Even with traditional publication, citation rates are given more weight than publication rates. Trying to move the old mechanisms of accreditation on-line is doomed to failure in the long run. We need more powerful methods of evaluating citations. The old system of simply counting them has long been recognized as inadequate. Networking tools allow us to see whether a citation supports or opposes a given publication.

This can reflect back upon the "publication rate." If an author sees that his/her article is being devalued by numerous bad reviews, then it would be wise to take it "out of circulation".

David S. Stodolsky, PhD
Internet: stodolsk@andromeda.rutgers.edu, or
Internet: david@arch.ping.dk
Peder Lykkes Vej 8, 4. tv.
DK-2300 Copenhagen S, Denmark
Voice + Fax: + 45 32 97 66 74


From: amo@research.att.com (Andrew Odlyzko)
Date: Sun, 14 Aug 94 08:40 EDT
Subject: citation frequency

Stevan,

In your comments on Bernard Naylor's "A SMALL CONTRIBUTION TO THE SUBVERSIVE DISCUSSION," one passage caught my eye, namely

Let's be more specific. Though it's risky to resort to figures from
hearsay (and that is all I must confess I have so far), I am confident
enough in what I am about to point out that even if I am wrong by one or
two orders of magnitude, the upshot is the same: The average published
scientific article has fewer than 10 readers and no citers; I'll bet the
same is true for the average piece of scholarship in the humanities.

I expect that your figure for no citers for the average scientific article is ultimately derived from the same source that I have seen quoted on many other occasions, namely the Science Citation Index (SCI). As I recall, the SCI figures indicated that only a couple of mathematics journals achieved an average of more than one citation to one of their articles, and most were well under one. Now if the average number of citations per article is below 0.5, then it certainly follows that most articles are not cited at all.

I have long been suspicious of the SCI figures, based on my own experience with them. It seemed that only a small selection of mathematics journals was covered, since often references that I knew existed would not be included in the SCI listings. However, your comment stimulated me to do some more thinking and research, and I believe I can show by a simple argument that the SCI estimates are bogus.

I have just picked up the latest issues of three mathematics journals that have accumulated on my stack of correspondence during my recent trip. They were from several areas of mathematics, and all were primary research journals, not survey ones. They contained 35 articles, and these 35 articles had a total of about 630 references, for an average of 18 references per article. (The range of number of references was from 3 to 51, and 18 seemed to be close to the median as well.) It seemed that of those typical 18 references, about 4 were to books, so there were usually about 14 references to research papers. This is a small sample, but it seems to me to be typical of the papers I see in mathematics, and so I did not bother to collect more data. It would be interesting to obtain similar estimates for other fields.

The figure of 14 backward references in a research paper is sufficient all by itself to show that the SCI figures are far from the truth. Since the scholarly literature is growing, the average number of references to a paper MUST BE IN EXCESS OF 14. To see this, consider a simple model in which papers published in a given decade reference only papers from the previous decade. In mathematics, about 250,000 papers were published during the 70s. Had there been only 250,000 papers published during the 80s, and each one referenced an average of 14 papers, each of the papers from the 70s would on average be referenced 14 times. However, the 80s saw the publication of 500,000 research articles in mathematics. Had they referenced an average of 14 papers from the 70s each, it would necessarily follow that the average number of citations per paper from the 70s would be 28. Thus it seems reasonable to estimate that the average number of citations to a mathematics paper is in the 15-30 range.
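Odlyzko's growth argument is pure arithmetic and can be checked in a few lines. The sketch below (Python, using the paper counts quoted in the message above) reproduces the 14-to-28 scaling:

```python
# Sketch of Odlyzko's growth argument: in the simple model where papers
# published in one decade cite only papers from the previous decade, the
# average number of citations RECEIVED per older paper is
#   (citing papers * references each) / cited papers.

def mean_citations(papers_prev_decade, papers_this_decade, refs_per_paper):
    """Average citations received by a previous-decade paper under the model."""
    total_references = papers_this_decade * refs_per_paper
    return total_references / papers_prev_decade

# Figures from the message: ~250,000 math papers in the 70s,
# ~500,000 in the 80s, ~14 references to research papers each.
print(mean_citations(250_000, 250_000, 14))  # no growth: 14.0
print(mean_citations(250_000, 500_000, 14))  # literature doubled: 28.0
```

Since the literature did roughly double, the mean citation rate must sit well above the sub-1 figures the SCI listings suggest.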

Comments:

  1. This argument does not have much bearing on the discussion of electronic journals. However, it might be important in terms of general policy issues. If the typical scholarly paper does get cited 30 times, as opposed to disappearing without a trace in the vast scholarly literature, then it is much easier to argue that public support for the original research and subsequent publication is warranted.

  2. The above argument can be used only to estimate the mean number of citations of a paper. For many purposes the median is a more useful figure, and it would be nice to obtain the complete distribution.

  3. As long as only the mean number of citations is of interest, it is possible to obtain a much better estimate than that presented above with only a little more work. It would suffice to take the 35 papers that I used and note the year of publication of each of the 630 references. Since we do have good data for the total number of mathematics papers published each year (from the reviewing journals Math. Rev. and Zentralblatt), we could then obtain a much better estimate for the total number of citations that a paper attracts, as well as the distribution of the time after publication that a paper is cited most often. This would provide much better data than that of SCI at a tiny fraction of the cost.

  4. The procedure suggested above, of sampling backward references, would not provide information on the variation in the impact that individual papers have, at least not without large sample sizes that would provide information about repeated references to a particular paper.

  5. In my article I used the figure of 20 serious readers per article. I don't think this is inconsistent with the estimates above, since scholars often reference papers that they do not know in detail. In mathematics, for example, a specialist in one area will often cite a result from another area without verifying it. By a serious reader in mathematics, I mean one who actually checks the technical details of the proofs at least to some extent. Clearly there are many more readers who just glance at papers to see what is in them.

  6. The SCI figures might be useful for gauging the relative merits of various journals within a given field.
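The sampling procedure proposed in comment 3 can be sketched as follows. The reference years and per-year publication counts here are hypothetical illustrations, not data from Math. Rev. or Zentralblatt:

```python
from collections import Counter

def citation_age_profile(reference_years, papers_per_year):
    """Estimate citations-per-paper by year of cited publication, from a
    sample of backward references in papers published in a single year.

    reference_years: publication years of the sampled references
    papers_per_year: dict mapping year -> number of papers published then
    """
    counts = Counter(reference_years)
    # Normalize each year's reference tally by the number of papers
    # published that year, giving a citations-per-paper estimate.
    return {year: counts[year] / papers_per_year[year]
            for year in counts}

# Hypothetical sample: references found in papers published in 1994.
refs = [1990, 1990, 1991, 1991, 1991, 1992, 1993]
pubs = {1990: 20_000, 1991: 21_000, 1992: 22_000, 1993: 23_000}
profile = citation_age_profile(refs, pubs)
```

Summing such profiles over all citing years would give the total-citation estimate comment 3 describes, at a tiny fraction of the cost of the SCI approach.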

Have you seen any arguments like this, debunking the SCI estimates?

Best regards,
Andrew


From: Stevan Harnad (harnad@clarity.princeton.edu)
To: Andrew Odlyzko

Dear Andrew,

Very interesting analysis, and we certainly need a lot more like this. I don't know of further literature, but perhaps those who read this posting will. Three comments:

(1) I don't think we need to prove that the average article has many readers or citers to justify esoteric research. First, some important contributions may be based on the work of very few people, who read and cite only one another. And second, as in all areas of human endeavor, there will always be the usual Gaussian cream-to-milk ratio: To skim off the top 0.01% cream, you need the full volume of milk. Let 1000 flowers bloom...

(2) It is not clear whether your sample of articles was a random sample (i.e., whether they were average articles). This may not matter much, but what certainly matters is the point you note: Are they mostly citing one another, or the cream of the crop (the rare "citation classics")? If the latter, this would still leave the average article (i.e., most) uncited and unread.

(3) There will no doubt be great variability in the answers to these questions from field to field (and subfield). What I think you don't contest is that, give or take an order or two of magnitude above and below 10, the vast bulk of the scholarly corpus is still ESOTERIC: It is a no-market literature. That's the key to the rationale for abandoning the trade model.

Stevan Harnad


From amo@research.att.com Sun Nov 20 19:19:01 1994
Message-Id: 9411210019.AA11083@a.cni.org
From: amo@research.att.com
Date: Sun, 20 Nov 94 18:54 EST
To: BERGE@guvax.acc.georgetown.edu
Cc: amo@research.att.com, ann@cni.org, garfield@aurora.cis.upenn.edu,
ginsparg@qfwfq.lanl.gov, harnad@ecs.soton.ac.uk, jlang@smtpgwy.isinet.com
Subject: how often are scholarly articles read

Stevan Harnad forwarded your query and his reply to me. The estimate that the average scholarly article is read fewer than 6 times is part of the folklore, but I don't know of any solid studies that support it.

One of the main difficulties is in defining what it means to read a paper. In "Tragic Loss...", I use the figure of 20 instead of 6, and cite it as a gross overestimate, but this is in reference to a thorough study, not browsing. If you include browsing, the correct number is on the order of several hundred. This is supported by the data from Paul Ginsparg's preprint server (we had some email discussion on this topic a few months ago) for electronically available preprints, as well as by various older studies of print journals, such as those in:

D. W. King, D. D. McDonald, and N. K. Roderer, "Scientific Journals in the United States. Their production, use and economics," Hutchinson Ross, 1981.

Citations are a different matter. There the folklore is that the average paper receives no citations at all. This is often supported by the journal impact factors from the Journal Citation Reports from ISI (the publishers of Science Citation Index). However, that bit of folklore is based on a misinterpretation of what the impact factors measure. In some email correspondence a few months ago, I showed that in mathematics, the average number of citations to a paper is around 20 to 30. Other fields are likely to have different figures.
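The misinterpretation is easy to see from what the impact factor actually measures: citations received in one year by a journal's articles from the previous two years, divided by the number of those articles. It is a two-year window, not a lifetime citation count. A sketch with made-up figures:

```python
def impact_factor(cites_to_prev_two_years, articles_prev_two_years):
    """Journal impact factor for a given year: citations received this
    year by articles from the previous two years, divided by the number
    of articles published in those two years (a two-year window only)."""
    return cites_to_prev_two_years / articles_prev_two_years

# Hypothetical journal: 200 articles over two years drew 120 citations
# in the measurement year, giving an impact factor well below 1 --
# even though those articles may go on to accumulate 20+ citations
# each over the following decade.
print(impact_factor(120, 200))  # 0.6
```

So a sub-1 impact factor says nothing about whether the average paper is ever cited over its lifetime.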

What will happen when most papers are available electronically is a fascinating question. There will be at least three different influential developments:

(1) For one thing, we are likely to have many more curious outsiders poking around. I do not believe that the general public will ever want to read most of my research papers in mathematics, but there are lots of amateurs as well as people who did get advanced training in mathematics but are working in other areas who like to look around. Nowadays they are limited by lack of convenient access to good libraries, but that barrier will disappear on the Net. Such people are likely to raise the amount of browsing that takes place.

(2) The scholars who do most of the browsing today are likely to do somewhat more in the future, since it will be so much easier. People are not going to read faster, but faster and more convenient access to information will mean they will be able to scan more in the same amount of time they do now.

(3) The most important development is likely to be the emergence of intelligent agents. Scholars and amateurs alike are likely to rely on software that will perform customized searches based on their interests. What this may mean is that people will look at a smaller number of papers, but their agents may go through vast numbers of papers to come up with that selection. This will require new techniques to deduce anything about how widely a paper is read, since most of the accesses will be by automated programs that will make the raw access data unreliable.

Andrew Odlyzko


From ginsparg@qfwfq.lanl.gov Sun Nov 20 21:07:42 1994
Date: Sun, 20 Nov 94 19:06:20 -0700
From: Paul Ginsparg 505-667-7353 ginsparg@qfwfq.lanl.gov
To: amo@research.att.com, ann@cni.org, garfield@aurora.cis.upenn.edu, harnad@ecs.soton.ac.uk, jlang@smtpgwy.isinet.com
Subject: RE: how often are scholarly articles read

From: Stevan Harnad harnad@ecs.soton.ac.uk
Date: Sun, 20 Nov 94 14:05:39 GMT
To: "Zane Berge, Ph.D." BERGE@guvax.acc.georgetown.edu
Subject: Re: Your FAQ regarding Electronic Publishing

But if the figure is confirmed, I still think the database is too tiny
and unstable to make comparisons yet -- except for the physics preprint
archive. (And note that the paper data could not possibly monitor the
paper BROWSING figures, which is what a lot of the electronic "hits"
are: Remember, I share your intuition that the convenience and reach of
the Net will raise the browse/read rate appreciably, but it won't be
easy to make objective comparisons initially; only after a few years of
use, when perhaps a sample of readers could be asked to systematically
monitor their own "hit" rates in the two media.)

indeed the data here continues to collect (and usage, especially on the www interface http://xxx.lanl.gov/, had a dramatic increase after the summer -- i need to collect some revised figures on # of hits / day, etc.). the "browsing rate" remains high -- almost no paper on hep-th gets fewer than 50 hits; the more popular ones instantly get a few hundred, and then there are the "megahits" -- typically review articles -- that get thousands of requests. (most recently, for example, there was hep-th/9411028, posted only two weeks ago, that already has over a thousand requests. it's a 153 page set of lecture notes "what is string theory?" delivered at les houches in september, also linked from my dedicated page http://xxx.lanl.gov/abs/hep-th/9411028 (i.e. the summer school proceedings i'm editing).

another positive virtue of that electronic version is that one late contributor no longer holds up the production -- things go on-line as they come in [actually i have two more contributions to post today or tomorrow as soon as i go over them just to verify the formatting]. also these proceedings have traditionally had b&w photos interspersed, but since i deal with a computer-literate community, i've been receiving them this year electronically as color jpegs and posting [the compression scheme gives reasonably good 24 bit quality at about 50-75kb per photo].)

stevan's comment that "the database is too tiny and unstable" for the time being remains correct, and interpretation of the data will not be entirely straightforward even when it is more complete.

From: amo@research.att.com
Date: Sun, 20 Nov 94 18:54 EST
To: BERGE@guvax.acc.georgetown.edu
Subject: how often are scholarly articles read

This will require new techniques to deduce anything about how
widely a paper is read, since most of the accesses will be by automated
programs that will make the raw access data unreliable.

not sure i agree with this. automated programs will be required to respect an identification protocol -- this is already the case, for example, for all of the better www "robots" and "spiders" that identify themselves as such in the "user-agent" field for each request. so things could in principle be set up to filter them out of the raw data if desired. automated programs are also fairly easy to detect dynamically when necessary -- i had some problems early on (see http://xxx.lanl.gov/RobotsBeware.html), and now when some automated program starts foolishly trying to index gigabytes of compressed postscript in violation of posted guidelines, it gets blasted out of existence after only a very few requests (remember these silly things have to open up a socket to read the data at their end... -- there was a period last spring when a new neophyte seemed to be born every week, and after getting something added to the official comp.infosystems.www faq about the problem, i decided it was still useful to demonstrate that servers too could run automated programs).
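The user-agent filtering Ginsparg describes can be sketched as follows. The log format and the robot-identifying substrings here are illustrative assumptions, not the server's actual configuration:

```python
# Sketch: filtering self-identified robots out of raw access logs.
# Assumes a simplified log line format "ip user_agent path"; the marker
# substrings are illustrative, not an official list.

ROBOT_MARKERS = ("robot", "spider", "crawler")

def is_robot(user_agent):
    """A well-behaved robot identifies itself in the User-Agent field."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)

def human_hits(log_lines):
    """Keep only requests whose User-Agent does not look automated."""
    kept = []
    for line in log_lines:
        ip, user_agent, path = line.split(" ", 2)
        if not is_robot(user_agent):
            kept.append(line)
    return kept

log = [
    "10.0.0.1 Mozilla/1.1 /abs/hep-th/9411028",
    "10.0.0.2 WebSpider/0.3 /abs/hep-th/9411028",
]
print(len(human_hits(log)))  # 1
```

This only removes the well-behaved, self-identifying agents; as the next message argues, it cannot settle what a "human" access means once agents do the fetching.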

on the other hand, it will remain impossible to distinguish browsing from reading (much less understanding...). the only way to gauge reading would be to adapt andrew's citation methodology, though in this case it would mean asking selected "typical" researchers to keep a diary over some period of time to keep track of papers they claim to read, and then scale from that to the size of the community and divide by total number of papers produced. in the electronic future, someone will undoubtedly try to enlist nielsen-like volunteers to have their electronic reading monitored automatically, but this will be subject to the same deficiencies as current ratings (for example, statistics would be skewed by those scholarly members of nielsen families who insist on reading their esoteric scholarly publications while watching television, perhaps in a split screen display).


From amo@research.att.com Sun Nov 20 22:47:14 1994
From: amo@research.att.com
Date: Sun, 20 Nov 94 22:04 EST
To: ginsparg@qfwfq.lanl.gov
Cc: amo@research.att.com, ann@cni.org, garfield@aurora.cis.upenn.edu, harnad@ecs.soton.ac.uk, jlang@smtpgwy.isinet.com
Subject: RE: how often are scholarly articles read

Paul,

Thanks a lot for your comments.

Concerning the issue of automated programs, the problem is not that they could not be distinguished from human beings in accessing the database. Instead, what is likely to happen is that just about the only accesses will be by automated programs. What will make statistical studies of accesses hard to interpret is that different people will have different modes of operation. For example, I may set up my intelligent agents to scan your preprint server for articles of potential interest to me, and to download them to my machine. Just to be on the safe side, I might set the filters to pick up 5 times as much material as I really care to even browse, and do the final winnowing down to the desired 20% manually on my machine. Would you not count any of the downloads on the grounds they were done by an automated agent, or would you count all of them, even though only 20% of them lead to browsing? If you try to put in a scaling factor, would you use the 20% that applies to me, or the 50% that is appropriate for somebody else?

For truly reliable data, we will surely need what you suggest, namely a Nielsen-type system of monitoring usage patterns of a selected sample. That is what was done in some earlier studies that were cited in either the [KingMR] or the [Machlup] books referenced in my essay. (I don't have either one at hand to check.)

Regards,

Andrew


From harnad@ecs.soton.ac.uk Sun Dec 11 12:47:22 1994
From: Stevan Harnad harnad@ecs.soton.ac.uk
To: amo@research.att.com
Subject: Re: electronic publishing pointer?
Cc: ginsparg@qfwfq.lanl.gov (Paul Ginsparg), ann@cni.org (Ann Okerson)

From: amo@research.att.com
Date: Sat, 10 Dec 94 17:56 EST

Stevan,

I am glad you found my remarks on dual publication of interest.
It's too bad you could not come to the MSRI workshop last week.
There were many interesting talks there that you would surely
have enjoyed.

Will Hearst (who runs the Hearst publishing empire, and has
an honors degree in math from Harvard, so often comes to math
meetings) said in the closing panel presentation that from
his experience with mass market publishing, a new medium does
not displace an old one entirely, but rather develops a new market.
Thus, for example, radio and television did cut down the circulation
of newspapers, but did not destroy them. Unfortunately I did
not get to catch him afterwards to ask how he views the displacement
of LPs by CDs, which seems to be a counterexample to his thesis.

Best regards,
Andrew

Hi Andrew, I too think that a parallel, free incarnation is fine, and would hasten developments. On the other hand, hybrid paper/electronic projects that provide the electronic version only if it piggy-backs on a paper subscription (i.e., the kinds of projects most publishers are naturally inclined to try at the moment) are, I believe, regressive and doomed to fail, either because they will not find takers willing to pay, or because they will be done in by the contraband trade in the electronic version. And while they are attempted, these hybrid projects will SLOW developments. (Readers, authors and publishers are still confused about what there is or ought to be, and this will compound the confusion for a bit, till it dies its natural death.)

The open, independent, parallel path, on the other hand, has much to recommend it. (I hope you see the subtle but essential difference.)

As to predictions based on newspapers/books vs. TV: I'm sure the analogy will hold for the commercial, entertainment and literary texts on the Net. They still have a long paper lifetime ahead of them, perhaps forever. I suspect that in the case of esoteric science/scholarship, though, the paper incarnation may well get replaced completely -- or rather as a graded function of degree of esotericity. Then the only question is: Where will the effective threshold for a continuing viable trade-model paper incarnation of texts that are freely available on the Net (or for a trade-model, subscription or access-based electronic version) fall on this continuum?

I read the Garfield/ISI statistics differently from you, I think. I still think the vast bulk of the scientific/scholarly periodical literature has no market; I don't think the citation stats for those selected journals contradict that. Besides, it's readership stats we really need, per article: comparing how many do read it under the present constrained, subscription-based conditions with how many would if it were on the Net always, for all, for free.

I think author-end subsidy for access to the quality-validated scholarly microphone will turn out to be the incontestably optimal solution for the work on the long end of the threshold in question.

Chrs,

Stevan


From: Stevan Harnad harnad@ecs.soton.ac.uk
Date: Sun, 11 Dec 94 19:56:16 GMT
To: garfield@aurora.cis.upenn.edu, amo@research.att.com
Subject: Citation stats

Hi, Gene,

Andrew branched this to me too. Could I ask a couple of follow-up questions about these stats?

Are they the average number of cites per article? What is the distribution of these cites (means and standard deviations and Ns)?

Because of course this still leaves open the possibility that a few citation classics in every volume or issue of the "core" journals are doing most of the work, still leaving most articles uncited or much less cited.

And of course self-citations would have to be subtracted from all these figures.

Best wishes, Stevan

Stevan Harnad
Professor of Psychology
Director, Cognitive Sciences Centre
Department of Psychology
University of Southampton
harnad@ecs.soton.ac.uk
harnad@princeton.edu
phone: +44 703 592582
fax: +44 703 594597


Date: Fri, 9 Dec 1994 16:40:50 -0500
From: garfield@aurora.cis.upenn.edu (E. Garfield)

Dear Andrew: As you know, David Hamilton did a disservice in Science (Dec. 7, 1990, p. 1331, and Jan. 4, 1991, p. 25) by claiming that a large percentage of scholarly material is not cited, without properly distinguishing between the core journals that are regularly and consistently cited and the large numbers of small journals which are rarely cited. David Pendlebury of ISI corrected some of this information in Science, March 22, 1991, p. 1410.

In some recent communications on the Internet, you make some estimates about math journals. I have asked David Pendlebury at ISI to provide me with data for the past 13 years of citations to a few of the leading math journals. This report could, of course, be extended to other journals but just to give you an idea, this will tell you how often articles published in 1981 had been cited during the past 13 years.

List of papers published in 1981, and citations to these papers for the period 1981-1993

Journal      ISSN       Year  Cites/paper through 1993
ADV MATH     0001-8708  81    16.63
ANN MATH     0003-486X  81    21.98
B AM MATH S  0273-0979  81    10.70
COM PA MATH  0010-3640  81    24.67
CR AC S I    0764-4442  81     2.59
DISCR MATH   0012-365X  81     3.27
DUKE MATH J  0012-7094  81     8.33
INDI MATH J  0022-2518  81    10.23
INVENT MATH  0020-9910  81    17.67
J ALGEBRA    0021-8693  81     5.79
J DIFF EQUA  0022-0396  81     7.46
J DIFF GEOM  0022-040X  81     3.88
J FUNCT ANA  0022-1236  81    10.83
J LOND MATH  0024-6107  81     3.81
J MATH ANAL  0022-247X  81     4.93
J PURE APPL  0022-4049  81     5.38
J REIN MATH  0075-4102  81     5.95
LECT N MATH  0075-8434  81     2.00
MATH ANNAL   0025-5831  81     6.83
MATH PROC C  0305-0041  81     4.00
MATH Z       0025-5874  81     6.35
NONLIN ANAL  0362-546X  81     4.50
P AM MATH S  0002-9939  81     2.84
P LOND MATH  0024-6115  81     9.27
PAC J MATH   0030-8730  81     3.29
T AM MATH S  0002-9947  81     5.99

Papers published between 81-93, cited in the period 81-93**

Journal      ISSN       Years  Cites/paper
ANN MATH     0003-486X  81-93  12.71
COM PA MATH  0010-3640  81-93   9.56
INVENT MATH  0020-9910  81-93   7.99
J DIFF GEOM  0022-040X  81-93   7.72
ADV MATH     0001-8708  81-93   6.71
B AM MATH S  0273-0979  81-93   6.67
P LOND MATH  0024-6115  81-93   5.14
J FUNCT ANA  0022-1236  81-93   4.86
INDI MATH J  0022-2518  81-93   4.41
J DIFF EQUA  0022-0396  81-93   4.17
DUKE MATH J  0012-7094  81-93   4.05
T AM MATH S  0002-9947  81-93   3.72
J REIN MATH  0075-4102  81-93   3.54
MATH ANNAL   0025-5831  81-93   3.46
MATH Z       0025-5874  81-93   3.07
J ALGEBRA    0021-8693  81-93   2.64
J LOND MATH  0024-6107  81-93   2.51
NONLIN ANAL  0362-546X  81-93   2.35
MATH PROC C  0305-0041  81-93   2.32
PAC J MATH   0030-8730  81-93   2.29
J PURE APPL  0022-4049  81-93   2.19
J MATH ANAL  0022-247X  81-93   2.10
CR AC S I    0764-4442  81-93   1.72
P AM MATH S  0002-9939  81-93   1.50
DISCR MATH   0012-365X  81-93   1.41
LECT N MATH  0075-8434  81-93   1.17

**It is important to realize that in the second list averages are lower, since articles published in the last year are included. Only a year by year study of cumulated cites can give a true picture. Note that the journal indicators file available from ISI also indicates percentage of uncitedness for each journal year.

For other reasons, I am interested in getting the same information on journals with a much wider readership and impact. You will be interested to know that in the same period and file, articles published in Science, Nature, etc. in 1981 have been cited on an average over 70 times each. In the list of journals that I have studied, there are only a small number of articles that are never cited.

Best wishes,
Eugene Garfield, Ph.D.
Chairman Emeritus ISI and Publisher, THE SCIENTIST
3501 Market Street
Philadelphia,PA 19104
Tel: (215)243-2205 // Fax: (215)387-1266
E-mail: garfield@aurora.cis.upenn.edu


From: amo@research.att.com
Date: Wed, 14 Dec 94 05:32 EST
To: harnad@ecs.soton.ac.uk
Cc: ann@cni.org, ginsparg@qfwfq.lanl.gov
Subject: Re: E-Pub

Stevan, Who, me? Sorry, you have the wrong guy. I certainly do not recognize anything I ever wrote as implying that "it could all be fought out by a Darwinian popularity contest among readers and commentators of the posted papers and their successive iterations." All that I ever claimed (and I still claim) is that if we had a continuation of a chaotic system on the net, there would be a Darwinian evolution of some type of peer review. Moreover, given the speed with which everything moves on the Net, such Darwinian evolution would be extremely rapid. Like Paul, I am not satisfied with the present refereeing system, even though I have grown up with a much better one than he has to deal with in his field. That's why I spent so much time in my essay complaining about the inadequacies of what we have. My point is that we can do better on the Net, with improved tools.

As an example of how quickly a review system can evolve on the Net, let me cite the following story (which I also mentioned at the MSRI workshop two weeks ago). Last summer, at a cryptography conference, a bunch of us were standing around, and the conversation turned to the cypherpunks mailing list. One of my colleagues was complaining that some of the most interesting news items about security (such as about changes in government Clipper chip policy, lawsuits over basic public key cryptography patents, etc.) were showing up first on that list, but that he (let us call him X) found it much too time consuming to wade through the huge amount of stuff pouring through in order to dig up the few nuggets of interesting information. At that point one of the other chaps, call him Y, opined that it was not all that hard at all, and that he found it amusing to scan all that material. Here is roughly how it went from that point on:

X: "How much would you charge to store the valuable pieces and send them to me once a day for a year?"

Y: "Twenty bucks."

X: pulls out a $20 bill and hands it to Y.

Z, W: "Here is $20 for me. How about a discount rate for our group?..."

People are resourceful, and they will find ways to cope with information overload.

Best regards, Andrew


From: Stevan Harnad harnad@ecs.soton.ac.uk
Date: Fri, 16 Dec 94 22:46:30 GMT
To: garfield@aurora.cis.upenn.edu (E. Garfield)
Subject: Re: Citation Stats
Cc: amo@research.att.com (Andrew Odlyzko EJ)

Yes, these are averages for 1981. We could also obtain separate averages
for "cited" papers, thus omitting the small number of uncited papers.
The second set of numbers is a cumulative average for all the
papers published in the period 1981-93 and that is why the
averages are lower -- less chance for more recent years to
accumulate citations.

Hi Gene, any way to get actual DISTRIBUTIONS? (Average citations = 10 could happen because most papers get 10, fewer get 8 or 12, and so on, all the way to the fewest getting 1 or 19; OR it could happen because most papers = 0 and a few get LOTS of citations.) Which is it? Only variances and distribution statistics will tell you, not averages.
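Harnad's point can be made concrete with two made-up per-paper citation distributions that share the same mean but tell opposite stories (the numbers are illustrative only, not ISI data):

```python
from statistics import mean, median

# Two hypothetical sets of per-paper citation counts, both averaging 10.
tight = [8, 9, 10, 10, 10, 10, 11, 12]   # most papers near the mean
skewed = [0, 0, 0, 0, 0, 0, 0, 80]       # one "citation classic", rest uncited

assert mean(tight) == mean(skewed) == 10

print(median(tight))   # 10.0 -- the typical paper really is cited ~10 times
print(median(skewed))  # 0.0  -- the typical paper is never cited at all

uncited_fraction = sum(1 for c in skewed if c == 0) / len(skewed)
print(uncited_fraction)  # 0.875
```

An identical journal-level average is compatible with either picture, which is why the variance and the uncitedness fraction are needed to settle the question.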

Second, self-citations must be subtracted, or you could already pump it to an average of 10 right there!

Any chance of getting data like that?

Happy Holidays!

Stevan


From: amo@research.att.com
Date: Mon, 2 Jan 95 08:40 EST
To: harnad@ecs.soton.ac.uk
Cc: 70244.1532@compuserve.com, B.Naylor@soton.ac.uk, ann@cni.org, dpendle@isinet.com, garfield@aurora.cis.upenn.edu, ginsparg@qfwfq.lanl.gov, lederberg@rockvax.rockefeller.edu, quinn@math.vt.edu
Subject: citation frequency

Stevan,

At the request of Gene Garfield, David Pendlebury of ISI has provided me with some information about citation statistics, of the type we both felt would be useful to have. He wrote that generally citations to a paper peak in years 2-3 (and in years 3-5 in chemistry and applied sciences). He also provided pointers to two articles in Science, both by David Hamilton, in the Dec. 7, 1990, and Jan. 4, 1991, issues, that were based on studies carried out by Pendlebury at the request of Science. One statistic indicated that 55% of the papers in the ISI database (for the 1984 publication year) had not received a single citation in the 5 years after publication. The "uncitedness" fraction ranged from 9.2% for atomic, molecular, and chemical physics, to 47.4% for all the so-called hard sciences, to 72% for all of engineering, to 90.1% for political science, 95.5% for history (but only 29.2% for history and philosophy of science) and 99.6% for architecture. The overall figures are

47.4% hard sciences

72.0% engineering

74.7% social sciences

98.0% humanities

The March 22, 1991 issue of Science has a letter from Pendlebury mentioning a variety of caveats that have to be applied when interpreting these statistics. For example, many of the items in the ISI database that were used in compiling the statistics for the Hamilton articles were meeting abstracts, editorials, and so on, and thus would not be regarded as primary scholarly publications. Excluding just those items lowers the "uncitedness" fraction to 22.4% for the 1984 science articles, 48.0% for the social sciences, and 93.1% for the arts and humanities.

Perhaps our difference of opinion, with you feeling that the vast majority of esoteric scholarly papers are never cited, and me claiming that at least a large fraction of the papers do get cited, reflects the different fields we are in. In any case, we both get support for our opinions from the ISI data.

Best regards, and Happy New Year, Andrew


From: Stevan Harnad harnad@ecs.soton.ac.uk
Date: Mon, 2 Jan 95 17:33:17 GMT

Andrew, Thanks for forwarding the ISI data. There are no doubt differences between fields, and no doubt they are in the direction you note (one would also like some statistics on numbers of publications, authors and readers in each field).

You're right, that there's support there for BOTH of our (opposite) views! To sort things out one would AT LEAST have to know the following:

(1) Were self-citations (for all co-authors) systematically eliminated from this set? That's always good for a few gratuitous cites per article.

(2) Breakdowns by citation-frequency would be more informative (no doubt there were some in the articles you mention) than dichotomous cited/noncited data.

(3) Besides wanting to know (a) the absolute numbers of publications, authors and readers in the different fields, and (b) how these might be related to citation frequency, one would want to relate them to (c) the journal prestige hierarchy in each field (no doubt ISI has these figures too) and perhaps even (d) the degree of interdisciplinarity of the field.

I'm not sure the variance is accounted for entirely by the hardsci-softsci-nonsci continuum (though it might be).

Happy '95, Stevan

