The Electronic Archive: Scientific Publishing for the 1990s

The Electronic Archive: Scientific Publishing for the 1990s Author(s): William Gardner Source: Psychological Science, Vol. 1, No. 6 (Nov., 1990), pp. 333-341 Published by: Sage Publications, Inc. on behalf of the Association for Psychological Science Stable URL: http://www.jstor.org/stable/40062819 . Accessed: 16/06/2014 02:34 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp . JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. . Sage Publications, Inc. and Association for Psychological Science are collaborating with JSTOR to digitize, preserve and extend access to Psychological Science. http://www.jstor.org This content downloaded from 194.29.185.216 on Mon, 16 Jun 2014 02:34:39 AM All use subject to JSTOR Terms and Conditions


THE ELECTRONIC ARCHIVE: SCIENTIFIC PUBLISHING FOR THE 1990s

William Gardner

PSYCHOLOGICAL SCIENCE

Electronic Publishing

University of Virginia

PS looks at the promise, problems, and prospects of electronic publishing

Abstract - I offer a description and rationale for an electronic-journal publishing program for psychologists, called the electronic archive. Three principles are critical. First, electronic publishing must retain the readability of a traditional printed journal. Second, it must be both accessible and attractive to all members of the discipline, whether they use computers or not. Most importantly, it must provide improved facilities for retrieving information, while continuing to serve as a permanent archive of the Society. I argue that the primary advantage of electronic publishing is not the inexpensive delivery of text, but the use of a centralized archive to concentrate resources for discovering and utilizing information. The archive would provide a platform for programs embodying knowledge about the field and the intellectual goals of individual users to facilitate the intelligent retrieval of text. By using the dynamic branching and graphical display capacities of the computer, the archive can present texts in ways that cannot be rendered in print. These facilities can give scholars personalized access to information with increased scope and depth.

Electronic journals have obvious rationales. Authors are frustrated by long delays in the publication of their articles. Many libraries cannot afford new journals: library subscription prices rose at about 10% per annum in the early 1980s, substantially above the rate of inflation, while the number of scientific and technical journals doubles every 30 years (Lambert, 1985). Electronic publishing would be faster for authors and cheaper for libraries and readers.
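To make the two growth figures above concrete, the short Python sketch below converts between an annual growth rate and a doubling time. It uses only the figures quoted from Lambert (1985). The function names are illustrative, and steady compound growth is an assumption of the sketch, not a claim made in the source.

```python
import math


def doubling_time(annual_rate):
    """Years needed to double at a steady compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def annual_rate_from_doubling(years):
    """Steady compound annual growth rate implied by a doubling time."""
    return 2 ** (1 / years) - 1


# Assumption: the 10% price rise and the 30-year doubling are treated
# as constant compound rates, purely for illustration.
price_doubling_years = doubling_time(0.10)
journal_growth_rate = annual_rate_from_doubling(30)
```

Under that assumption, prices rising 10% a year double in a little over seven years, while a 30-year doubling in the number of journals corresponds to growth of roughly 2.3% a year.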

Despite years of discussion (Rogers & Hurt, 1989; Seiler, 1989; Senders, 1977; Singleton, 1981), however, there are presently no electronic journals in psychology.

Correspondence and reprint requests to William Gardner, Department of Psychiatry, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15216; electronic mail to [email protected].

The problems with electronic scientific publishing are literally visible in the many electronic newsletters that do exist. Newsletter issues are text files sent over the various networks, using either news software or mailing lists. You read them by scrolling on the screen or downloading them to a word processor. These newsletters are inexpensive to produce and distribute, but in most respects electronic newsletters are inferior to printed journals as media for information storage, retrieval, and dissemination.

Most computer screens are terrible for reading

The lowest common denominator viewing technology, the monochrome PC screen, forces the newsletter publisher to omit the highly developed fonts and layouts of a printed journal. However, tying the journal to a high-quality display format would drastically restrict its readership.

Newsletter publishing excludes non-computer-using members of the discipline

Therefore, it is an inappropriate medium for the journals of record of a scientific society. Serious researchers would not send important work to a newsletter journal and, as a consequence, serious readers would not acquire it.

Personal computers are poor facilities for storing and retrieving text

One can easily scan the table of contents of a printed journal and turn to the desired page. The best one could do with a text file is to scan the contents at the head and then search for the title. With a printed journal, you can easily store information about prior searches with bookmarks and highlighters, but few academics have the tools or skills to do this electronically. It is also difficult for users to archive text files. With a cumulative index, you can reliably retrieve an article from a shelf of printed journals. Users receiving electronic newsletters are likely to store them haphazardly on their fragile hard disks.

VOL. 1, NO. 6, NOVEMBER 1990 Copyright © 1990 American Psychological Society 333


Successful electronic scholarly publishing must solve all these problems. It must retain the highly evolved readability of a traditional printed journal. It must be accessible and attractive to all members of the discipline, whether they use computers or not. Most importantly, it must provide improved facilities for retrieving information, while continuing to serve as a permanent archive of the society.

THE ELECTRONIC ARCHIVE

Instead of an electronic newsletter or journal, I envision an electronic archive. A journal is a serial publication, with articles that are bound into issues, delivered on a fixed schedule, and stored on shelves. The electronic archive would retain the article as the fundamental unit of scientific communication. But it would publish articles, not journals, on demand for individual readers, in whatever format suits them best (and one option would be bound and printed serial issues, that is, a traditional journal). Instead of contracting with subscribers to purchase all that it publishes, it would communicate to them what it has, so they can acquire and read selectively. What follows is a description of how such an archive might work. It is illustrated by a description of the life of an article, from its submission to the archive through its acquisition by three users of varying computer skills.

The article

Professor Z has just mailed an article entitled "Age differences in risk perception" to the APS editor for cognitive development. Four months and two rounds of review later, Z's article is accepted. The editor sends the text to the electronic archive's typesetter. The typesetter scans Z's printed text and figures, converting them to a computer file in the APS archival storage format. Ninety minutes of fine-tuning the code and she has printed a galley that is FAXed to Z, who checks it and FAXes it back the next day. The typesetter corrects the code and transmits the archive format file to the APS archivist.

The archivist is responsible for seeing that the article is stored and catalogued in the electronic archive, the database of the Society's publications. She runs a program that assigns Z's article a call number and establishes links between it and all other papers in the archive that Z has cited (Conklin, 1987). She runs Indexer, a program that indexes Z's article using the words in the text and keywords supplied by the author. Indexer suggests some additional indexing terms, based on its thesauri (Rada & Martin, 1987) and inferences from its knowledge about some of the keywords (Humphrey, 1989). Then the archivist runs a program that examines the keywords, indexed terms, and citations in Z's article and generates a list of APS subscribers who are expected to be interested in his work. Computer-generated regular and electronic mail begins flowing to these subscribers.
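The archivist's steps above (indexing, thesaurus suggestions, and subscriber matching) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of Indexer or of any system cited in the article: the stopword list, the dictionary-based thesaurus, the subscriber profile layout, and the overlap threshold are all hypothetical.

```python
import re

# Assumption: a tiny illustrative stopword list.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def index_article(text, author_keywords, thesaurus):
    """Return (index_terms, suggested_terms) for one article.

    thesaurus maps a keyword to related terms (hypothetical layout).
    """
    words = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
    keywords = {k.lower() for k in author_keywords}
    terms = words | keywords
    suggested = set()
    for keyword in keywords:
        suggested.update(thesaurus.get(keyword, ()))
    # Only suggest terms that are not already in the index.
    return terms, suggested - terms


def interested_subscribers(terms, cited, profiles, threshold=2):
    """List subscribers whose profile overlaps the article enough.

    profiles maps a name to {"interests": set, "requested": set}, where
    "requested" holds call numbers of articles ordered in the past.
    """
    scores = {}
    for name, profile in profiles.items():
        score = len(terms & profile["interests"])
        score += len(cited & profile["requested"])
        if score >= threshold:
            scores[name] = score
    # Highest score first, ties broken alphabetically.
    return sorted(scores, key=lambda n: (-scores[n], n))
```

A simple overlap count stands in here for whatever matching rule a real archive would use. The point is only that one set of index terms can drive both cataloguing and the mailing list.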

A low-skill user

Professor A, Director of Child-Clinical Psychology at a prestigious university, has never touched a computer. He learns of Z's article through a postal letter three weeks after the paper was archived. The letter prints the abstracts from all the developmental articles accepted by APS in the previous month, plus a few that the computer guesses he might want based on an interest survey and his history of article requests. Four times a year, he also gets the texts of all the articles accepted by the APS clinical editor, sent bound with a table of contents, rather than as loose reprints.

Z's article is relevant to a review A is writing, so he checks a box on the form and drops it in the mail. Four days later, the form is opened and placed in a scanner run by a program that processes subscriber orders. The archive-formatted text of Z's paper is converted to TeX typesetting code (Knuth, 1984) and then to PostScript, which is laser-printed and mailed 3rd class. Professor A receives Z's article 6 weeks after its acceptance and a charge is added to his APS account.

A moderate-skill user

B is directing a research program on substance abuse at a medical school. Forty-five minutes after the paper was archived, he reads Z's abstract in an electronic letter from the archive, received on a Macintosh connected to the medical school's LAN. It dawns on B that he will have to read this article: Z is famous for propagating the wrong answer to the question motivating the grant proposal that B is now preparing. He could request the article from the archive by typing r and then putting SEND Z in the reply. But 3rd class is too slow, given that the grant is due Friday. He types FAX Z, exits mail and frantically resumes writing the grant. Minutes later, Z's article appears in the lab's FAX machine, but B has already forgotten it and doesn't check the FAX until Friday afternoon. Returning to the Macintosh, he remotely logs into the archive's debate computer and finds the bulletin board where discussions of Z's article are posted.[1] There are already seven contributions, all disparaging. B is mildly elated, a feeling that transmutes to jealousy when he realizes that he could have been first to point out what was wrong with the piece.
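The reply-by-mail ordering that B uses (SEND Z or FAX Z) amounts to a very small command protocol. The Python sketch below shows one way such a reply could be parsed. Only the two command words come from the scenario. The catalogue mapping, the call-number format, and the function name are hypothetical.

```python
# The two command words appear in the scenario; the delivery labels
# they map to are an assumption of this sketch.
DELIVERY = {"SEND": "mail", "FAX": "fax"}


def parse_order(reply_body, catalogue):
    """Return (delivery_method, call_number) for the first valid order line.

    catalogue maps the short label used in the letter (for example "Z")
    to an archive call number. Returns None if no order is found.
    """
    for line in reply_body.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue
        command, label = parts[0].upper(), parts[1].strip()
        if command in DELIVERY and label in catalogue:
            return DELIVERY[command], catalogue[label]
    return None
```

Lines that are not orders, such as quoted text from the original letter, are skipped, so the subscriber can reply without editing the message.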

[1] Printed debates on published articles are featured in the journals Behavioral and Brain Sciences and Contemporary Anthropology. The possibilities of on-line scientific conferences were studied in the Electronic Information Exchange System project (Standera, 1987).

A high-skill user

It is five years after Z's article was archived and the first day of C's post-doc. She is working on a substance abuse prevention project in an Adolescent Medicine program. She needs to learn quickly about adolescents' risk perceptions and decision making.

C looks at her new Sun 6 workstation and says "Literature search, APS." A 10" x 8" pewter frame appears on the screen, enclosing what appears to be Dürer's Erasmus of Rotterdam (Fig. 1). However, the caption in the frame behind Erasmus now reads 'NSF LIBRARIAN, NATIONAL CENTER FOR SUPERCOMPUTING APPLICATIONS. MCMXCVI.' One of the leather-bound books is titled Search Strategies, another User Model.

Fig. 1. Erasmus of Rotterdam, Albrecht Dürer, 1526. Erasmus and Dürer actively participated in the last great media shift, the print revolution (Eisenstein, 1979). Erasmus oversaw the printing of many of his own books. Dürer designed typefaces and wrote books on perspective, geometry, military engineering, and, like Knuth (1984), the mathematical design of type (Panofsky, 1955).

The Librarian announces that "You are connected to the APS archive." It occurs to C that before searching she should tell the archive about her new job: it believes that she is still a graduate student in quantitative psychology. "User Model," she orders. That book opens to another engraving, which zooms to cover most of the Erasmus. It is a pastiche of the Dürer Melencolia, bat and all, but behind the glowering Angel is a Sun 6 instead of a truncated rhomboid (Fig. 2). The User Model interviews C and revises her biography and interest survey. The image dissolves back to the Erasmus. "Search Strategies," says C. "Boolean or Graph-directed?" replies the Librarian. "Graph-directed, and make it my default." The Boolean option retrieves titles that satisfy a logical expression of keywords (Cooper, 1988), while the Graph-directed option displays the articles as nodes in a network, with edges representing the citation of one article by another (Conklin, 1987).

Fig. 2. Melencolia I, Albrecht Dürer, 1514. It has been suggested that this engraving is a memorial to Johann Müller, who went by the Latin name Regiomontanus. An important figure in the Renaissance revival of Hellenistic astronomy, Regiomontanus established the first scientific press (Eisenstein, 1979).

"What are you looking for?" asks the Librarian. "I'm interested in refereed articles on decision making, substance abuse, risk perception. And adolescents." The searching process will weight these keywords in calculating a function assigning relevance scores to the refereed articles in the archive. "Does personal relevance matter?" "Yes, make it default." The data in the User Model (the interest survey, her education, and the records of her prior searches) will also affect the relevance function (Brajnik, Guido, & Tasso, 1987). For example, mathematical articles will be more relevant to her than to most psychologists. "Does citation frequency matter?" "No," but she doesn't make it default. Weighting citation frequency helps ensure that you have read what everyone else has read.
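The relevance function in this exchange combines three ingredients: weighted query keywords, the personal data in the User Model, and (optionally) citation frequency. The Python sketch below shows one simple way to combine them as a weighted sum. The article does not specify the function, so the additive form, the parameter names, and the default weights are all assumptions made for illustration.

```python
def relevance(article, query_weights, user_interests=None,
              use_citations=False, personal_weight=0.5,
              citation_weight=0.1):
    """Score one article against a query (illustrative weighted sum).

    article is {"keywords": set, "times_cited": int} (hypothetical layout).
    query_weights maps each query keyword to its weight.
    user_interests is a set of terms from the User Model, or None if
    personal relevance is switched off.
    """
    # Sum the weights of the query keywords that the article carries.
    score = sum(weight for keyword, weight in query_weights.items()
                if keyword in article["keywords"])
    if user_interests is not None:
        # Personal relevance: overlap with the user's own interests.
        score += personal_weight * len(article["keywords"] & user_interests)
    if use_citations:
        # Citation frequency: off in C's search, as in the scenario.
        score += citation_weight * article["times_cited"]
    return score
```

With personal relevance switched on, an article tagged with a term from C's own profile (say, a mathematical topic) scores higher for her than for a user without that interest, which is the behavior the scenario describes.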

"Start now," C orders. There is a pause. "Should I show you articles in clinical psychology?" asks the Librarian, which was, in effect, perplexed to discover articles with high relevance scores in a domain of the archive that C has never visited. The User Model inferred that the search may be diverging from C's goals. C says "Yes," so the Librarian finishes calculating the relevance scores while the User Model changes its assumptions about C. The frame behind Erasmus zooms slightly to accommodate a short list of the references with the highest relevance scores (Frisse, 1988).

C moves the cursor to a title on the list. "I want the Notebook," she remarks, as she highlights the title and clicks. The Erasmus glides up and to the right, filling the upper right corner of the screen. The page on which Erasmus writes lifts off his lectern, floats down below the portrait, then zooms to fill the lower right corner of the screen. C's name, the date, and a summary of the search goals inferred by the Librarian are printed at the top of the page. Some whitespace follows, then the author, title, and publication information of the highlighted reference. The remainder of the screen (that is, everything to the left of the Erasmus and the Notebook) displays an immense, 3-dimensional network (Fairchild, Poltrock, & Furnas, 1988; Utting & Yankelovich, 1989), with edges that extend off the screen in the foreground and vanish in a haze of spiderweb at the back. It is the contents of the archive. The nodes of the graph are boxes containing references denoted by first author and brief title. The edges are the citation links. In the center of the foreground, toward the top, is a highlighted box containing the title C had clicked.

"Abstract." The highlighted article's abstract appears below the reference information in the Notebook. C reads the abstract and types some comments. She moves the network cursor down a citation link to a later article and clicks. The network shifts slightly, so that the new article occupies the center of the screen. "Abstract." A pause. "Full text." An 8.5" x 11" page, white with dense black characters, covers the center of the network. Her attention is caught by an equation,

$$\frac{\partial U}{\partial x_i} = MU_i = \lambda p_i, \qquad i = 1, \ldots, n. \tag{2}$$

defining the marginal utilities of accident prevention behaviors. "Equations are catalogued now," thinks C, and asks, "Equations similar to (2), any near us?" The journal page shrinks and moves to the lower left corner. The network hurtles past as the new article moves to center screen. "Display the equation." The text opens to that point. It isn't what she had hoped. "Back." They return quickly to Z's article. "Print it." The pages of Z's text start appearing on her printer.
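The two search strategies the Librarian offers in this scenario can be sketched briefly in Python. The Boolean option filters articles by a logical expression over keywords, and the Graph-directed option follows citation links between articles. This is a minimal sketch under stated assumptions: the restricted query form (all of, any of, none of), the dictionary layouts, and the function names are hypothetical and are not taken from the systems the article cites.

```python
def boolean_search(articles, all_of=(), any_of=(), none_of=()):
    """Return call numbers whose keywords satisfy a simple Boolean query.

    articles maps a call number to a set of keywords (hypothetical layout).
    The query means: every term in all_of AND at least one term in any_of
    (if any are given) AND no term in none_of.
    """
    hits = []
    for call_number, keywords in articles.items():
        has_all = all(k in keywords for k in all_of)
        has_any = not any_of or any(k in keywords for k in any_of)
        has_none = not any(k in keywords for k in none_of)
        if has_all and has_any and has_none:
            hits.append(call_number)
    return sorted(hits)


def citation_neighbors(cites, call_number):
    """Return (articles this one cites, later articles that cite it).

    cites maps a call number to the set of call numbers it cites, so each
    entry is a set of directed edges in the citation network.
    """
    cited = sorted(cites.get(call_number, ()))
    citing = sorted(a for a, refs in cites.items() if call_number in refs)
    return cited, citing
```

Moving "down a citation link to a later article", as C does, corresponds to stepping from a node to one of the articles in the second list returned by `citation_neighbors`.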

Advantages of the Archive

These scenarios focus on three important aspects of the archive. First, the archive can provide useful electronically mediated services to the entire field, despite the variation in members' involvements with computers. Second, the archive personalizes the journal. People can get the articles they want as they are published, based on their own definition of their professional needs. The archive allows readers to take selective reading to its logical conclusion: users can compile their own selection of articles and, in effect, create the journal that is most relevant to them. However, traditional journals remain an option. There will still be editorial boards responsible for subdisciplines and subscribers can elect to receive all the articles accepted by a specific editor.

Third, the software of the archive should actively support scholarly work on its documents. Researchers need assistance in discovering, recovering, and using the subset of information relevant to them (Raymond, Canas, Tompa, & Safayeni, 1989). Providing on-line access to scientific text may be necessary but it is hardly sufficient. A long-run goal for the archive should be the development of a suite of programs embodying knowledge about users and the field to support the intelligent retrieval of text (Chiaramella & Defude, 1987; Fox, 1987). Another long-run goal should be the design and implementation of a hypertext structure interlinking the articles (Bush, 1945; Conklin, 1987; Halasz, 1988; Raymond & Tompa, 1988). The idea is to use the graphical display and dynamic branching capacities of the computer to present relations among documents (and fragments of text within documents) that cannot be printed on a conventional page.

Other themes could have been stressed and are relevant to a complete consideration of the topic. Low storage and distribution costs would allow the inclusion of a wider range of documents and materials in the archive. For example, data files could be linked to articles that report analyses based on them, increasing the credibility of Results sections (Jardetzky, 1989) and allowing other researchers to use the data. Articles that are important but of narrow interest, such as replication studies, could be deposited and effectively accessed. The computerized archive will facilitate rigorous reviewing. For example, citations could be checked quickly and cheaply to verify that the articles referred to actually say what is claimed in the manuscript under review. This is the practice in law journals, which use student labor for editing and reviewing, but it is not standard for science journals, which rely on peer reviewers. Finally, the near future will provide revolutionary technologies for the interactive display of scientific information with motion, color, sound, and three dimensions (Christodoulakis, Theodoridou, Ho, Papa, & Pathria, 1986; Fox, 1989; Frenkel, 1989). The integration of voice into electronic documents means that the electronic archive can provide a more efficient means of preserving and retrieving talks and their accompanying graphics than conventional audio technology.

MAKING IT HAPPEN

We cannot, however, leap to full hypertext or multimedia electronic publishing. The archive has to evolve from the current practices of scholarly life and remain continuous with them. The high-skill users can only be supported if the base is well served. To begin discussion of how to bring the electronic archive into being, I consider problems encountered in previous experiments with alternative journal schemes. I then identify the key technical problem to be solved: the design of an architecture for representing documents in the archive.[2]

Problems of Electronic Journals

Many of these proposals have antecedents in the history of alternative journal publications (Lambert, 1985). The publication of separate articles from a common depository instead of bound journals was proposed by J.D. Bernal in 1948. Bernal's idea was never implemented, but several scientific societies have experimented with converting journals to "separates," or offering separates as an alternative. Separates publishing was offered by the American Society of Civil Engineers, the Physical and Chemical Societies of Great Britain, the Psychonomic Society, and the American Psychological Association (for the 1969 volume of the Journal of Applied Psychology), but all of these groups have reverted to conventional journal publications. Separates publication was found to be as expensive as serial publication, if not more so, and was not overwhelmingly popular with readers. There are, however, successful separates publication schemes run by the Society of Automotive Engineers, the American Chemical Society, and Elsevier Science Publishers. Some experimental, on-line scientific journals were launched in the late 1970s (Schackel, 1983; Turoff & Hiltz, 1982). These experiments failed, largely because authors would not submit articles to unknown journals. It does not follow, however, that a contemporary electronic journal would meet a similar fate.

Cost of electronic publishing

Can the archive provide services at a quality and price that will allow it to compete successfully with printed journals? In particular, can the start-up investment be covered? Conventional journals may require 5 or 6 years to recover a publisher's start-up costs (Woodworth, 1979). No definitive answer can be given without a detailed study. But the strongest argument for the feasibility of the electronic archive is that non-human computing costs halve every few years, while printed journal costs have been increasing faster than inflation (Senders, 1977). In the long run, technology and economics must favor electronic scientific publishing, and there are signs that they favor it now (Office of Technology Assessment, 1988).

Much of the technology of electronic publishing has become comparatively inexpensive, including products for scanning documents, processing images, storing large volumes of information, manipulating text, and rendering documents on paper or screens (van Vliet, 1989). What these technologies allow is illustrated by an innovation in textbook publishing by McGraw-Hill and Eastman Kodak (McDowell, 1989). Beginning in the Fall of 1990, instructors will be able to order versions of certain high-volume textbooks with the instructors' own selection of chapters and supplementary material. These versions will be hardbound, correctly paginated and indexed, and available in lots as small as 10. This suggests that the processing of individual orders by the archive can be automated to a great degree, and that the archive can also print on demand rather than producing a large run of separates, which must be kept in inventory and may never be sold. Finally, telecommunications equipment necessary for electronic distribution of printed documents is now widely diffused. Texts can be distributed at low print quality via FAX or at higher quality through modem-equipped or networked personal computers with laser printers.

Although purely electronic journals have not been successful, many print journals now have a parallel existence on-line. Many trade and financial journals (e.g., Harvard Business Review) are now accessible on-line through services like Knight-Ridder's Dialog or Dow Jones. Case reports of judicial decisions are available on-line through the LEXIS service. Several publications of the Association for Computing Machinery (Fox, 1988), Elsevier Science Publishers, and the American Chemical Society are now available on-line.[3] Finally, separates publication has occurred in psychology, albeit indirectly, because some bibliographic databases and interlibrary lending services (Campbell & Stern, 1987) now offer full text document delivery. So, for example, after you locate a reference using PsycLIT or Current Contents you can order a printed copy of the text by telephone, surface, or electronic mail and have it delivered by mail or FAX. This service is not cheap ($14 per article for a document delivered from a search of PsycLIT) and must derive in part from hand xeroxing.

[2] There are also many ancillary questions. For example, how will copyright issues be resolved, particularly for older documents that might be included in the archive (de Sola Pool, 1983)?

Resistance by authors and readers

Resistance to electronic journal publishing by authors and readers must be anticipated. The resistance will stem, in all likelihood, from the belief that few others will read the journal and that, as a consequence, nothing important will appear there. Until recently, it was certainly true that few psychologists could read an electronic journal. However, the level of computer skills among academics is much higher than when the first experiments with electronic journals were conducted. All but a few researchers now use computers somehow and most use word processing intensively. The use of a personal computer for remote access to larger computers or networked systems is now a given rather than an innovation.

Moreover, research universities in the developed world are rapidly connecting to the Internet. The bandwidth and reliability of network service can be expected to improve, particularly if the proposed Federal High Performance Computing Program (that will include a National Research and Education Network) is enacted (Mace, 1989; Sun, 1989).[4] Installation of Integrated Services Digital Networks (ISDNs; see Bocker, 1988) in place of current analog telephone service will also be widespread in the late 1990s. ISDNs will allow individuals or institutions who are not connected to the Internet to access the archive through multiple simultaneous telephone channels, each running an order of magnitude faster than the best contemporary telecommunications data links.[5] These improvements of the communication networks will mean that large volumes of textual and graphical information can be transmitted to people and offices both on and off the conventional academic nets.

In a longer view, however, the resistance of readers and authors to electronic publishing should be viewed as data about the quality of the interface between the archive and its users, rather than evidence of skill deficits among scientists. People will come to the archive to the extent that it can be made into a convivial environment for scholarly work (Fischer & Lemke, 1988). There is an emerging body of knowledge about how to engineer computer systems so as to optimize human-computer interaction (Card, Moran, & Newell, 1983; Carroll, 1987; Guindon, 1988; Norman & Draper, 1986). Much of this work is being done by cognitive psychologists and one should expect that the electronic archive would be an ideal platform for aspects of this research. The electronic archive should be more than a database: it should also be an organization of applied researchers who would work to adapt the archive to meet the needs of its users.

Summary

It seems unlikely that either cost or reader resistance will be permanent barriers for the electronic archive. Many components of the system are already appearing as commercial ventures, including services for bibliographical database searching, on-line text databases, document delivery services, and customized textbook publishing. In a sense, the chief novelty in the archive proposal is that all these services, plus journal editing and publishing, should be fused into a single organization run by scientists. My claim is that this organization would be more than the sum of its parts: It is the appropriate organizational vehicle for travel to a future computerized network of scientific texts. This ambitious goal can only be reached through a prolonged evolution, constantly informed by cognitive, computer, and information scientific research. The program to achieve it should involve the broadest possible consortium of social scientists, because external funding will be needed, particularly in the early stages. This funding can easily be justified, however, in terms of economies of scale that apply across all of academia. The investment in software development for a psychological electronic archive would be repaid to society through the establishment of similar archives in other fields.

An Architecture for Representing Scientific Documents

There are important choices to be made before we can specify a software design, let alone begin electronically archiving scientific articles. The authoritative copy of the article in the archive has to be encoded in some format and the text database must be given a logical structure (Horak, 1985; Mamrak, Kaelbling, Nicholas, & Share, 1987). At least three goals should govern these choices. First, we must retain the text quality, that is, typography and layout, of traditional journals. Second, we want to be able to access and manipulate components of text in an intelligent manner. Third, the design for representing documents should be standardizable across social scientific disciplines.

[3] Further experiments with the electronic publication and delivery of academic and technical materials are being conducted by the European Community (Commission of the European Communities, 1987) and the U.S. government (Office of Technology Assessment, 1988).

[4] The upgrading of the network, to be installed by 1996, is designed to facilitate commercial and academic research use of supercomputers. A $1.75 billion initiative is sponsored by Senator Gore and a $1.9 billion variant is supported by the Bush administration. It is conceivable that Federal support might be obtained for a pilot project in electronic scholarly publishing.

[5] Conversion of the communication networks from copper wire to fiber optic cables all the way to the individual user's terminal would increase the amount of information that can be transmitted by many additional orders of magnitude.

Text formatting

Putting text into the archive will involve more than simply typing the words and punctuation into a computer's memory. An on-line database storing articles in plain text files inherits most of the problems of an electronic newsletter. For example, it loses the typography, much of the formatting, and most of the readability of a printed journal. A simple text file also provides little support for the hypertext and intelligent retrieval technology that is the long-run goal of the archive.

The encoding of an article's text must specify the information necessary to render the complex typography and layout of a scientific article on diverse media and devices. For example, it must distinguish the beginning of the abstract, which might not be indented, from the beginning of an ordinary paragraph, which will be. The text format codes used in word processing programs are unsuitable, because they specify the physical appearance of the article on a printed page (Coombs, Renear, & DeRose, 1987), for example, that paragraphs are separated by so many millimeters. These specifications will be incorrect for other media and become obsolete with every change in the archive's graphic design. The permanent and authoritative encoding of an article should simply identify textual components like abstracts, paragraphs, and equations. You would not ordinarily read a file in this form. Rather, the identifying codes would be translated into specifications of the physical appearance of the text to suit the specific media at hand. This strategy for encoding computerized texts has recently been endorsed by a consortium of Humanities groups (Association for Literary and Linguistic Computing, 1989; Barnard, Fraser, & Logan, 1988).
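The separation of stored logical codes from medium-specific appearance can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Component` class, the two style tables, the indent widths, and the sample sentences are hypothetical and do not correspond to any actual archive format or markup standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One logical piece of an article; `kind` says what it is, not how it looks."""
    kind: str   # "abstract", "paragraph", or "equation" (illustrative set)
    text: str


# Appearance lives outside the stored article, one table per medium.
# The indent widths are invented for this example.
PRINT_STYLE = {"abstract": 0, "paragraph": 4, "equation": 8}
SCREEN_STYLE = {"abstract": 2, "paragraph": 0, "equation": 4}


def render(article, style):
    """Translate logical components into indented lines for one medium."""
    lines = []
    for component in article:
        indent = style[component.kind]   # KeyError if the medium has no rule
        lines.append(" " * indent + component.text)
    return "\n".join(lines)


# Hypothetical article content, stored once in logical form.
article = [
    Component("abstract", "Journals should be archived electronically."),
    Component("paragraph", "Printed journals are costly to distribute."),
    Component("equation", "cost = pages * copies"),
]

print(render(article, PRINT_STYLE))    # abstract flush left, paragraph indented
print(render(article, SCREEN_STYLE))   # same stored text, different appearance
```

In this sketch a change in graphic design touches only a style table; the stored components are never rewritten, which is the property the argument above depends on.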

Database structure

We must also decide how to file text within the archive. One extreme choice would be to treat the documents in the archive as the smallest integral units known to the database. At the other extreme, one could treat each sentence (equation, etc.) as a unit known to the database, and represent documents as higher order objects constructed from these atoms. Making documents be atoms would be vastly simpler and cheaper, but it would make the discourse structure within the text invisible (Kintsch & van Dijk, 1978). It seems almost certain that intelligent and hypertextual support for scholarship will require the ability to catalogue, manipulate, and point to text components below the level of the article. Making sentences be atoms, however, would be pointless in the absence of working technology that could effectively use them (see Smolensky, Fox, King, & Lewis, 1988, for research toward this goal). Informed by ongoing research in computer science (Gutting, Zicari, & Choy, 1989; Tompa, 1989; Trigg & Weiser, 1986), we should seek an intermediate level of discourse structure, above the sentence but below the document. Perhaps the classical text objects of the APA publication manual (American Psychological Association, 1983) - that is, Abstract, Introduction, Methods, Results, Discussion, Table, Figure, etc. - provide a level where discussion could begin. It can be anticipated, however, that we will eventually want to perform database operations on even smaller fragments of text, so extensibility must be a primary design goal.
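One way to picture an intermediate, extensible level of structure is the following Python sketch. It is a sketch under stated assumptions, not a proposed design: the `TextObject` and `Document` classes, the `find` and `address` methods, the identifier `gardner-1990`, and the sample text are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TextObject:
    """A unit the database can catalogue and point to."""
    kind: str                                       # e.g. "Abstract", "Methods", "Table"
    text: str = ""
    children: list = field(default_factory=list)    # finer units, added later if needed


@dataclass
class Document:
    doc_id: str
    objects: list = field(default_factory=list)

    def find(self, kind):
        """Return every top-level text object of the given kind."""
        return [obj for obj in self.objects if obj.kind == kind]

    def address(self, path):
        """Resolve a path such as (1, 0): top-level object 1, then its child 0."""
        node = self.objects[path[0]]
        for index in path[1:]:
            node = node.children[index]
        return node


# Hypothetical document filed at the level of APA-style text objects.
doc = Document("gardner-1990", [
    TextObject("Abstract", "A proposal for an electronic archive."),
    TextObject("Discussion", "Dispersion of texts is inefficient."),
])

# Extensibility: a section is later subdivided into sentence-level children,
# while existing pointers to the section itself still resolve unchanged.
doc.objects[1].children.append(
    TextObject("Sentence", "Dispersion of texts is inefficient.")
)

print(doc.find("Abstract")[0].text)
print(doc.address((1,)).kind)
print(doc.address((1, 0)).kind)
```

The design choice illustrated is that pointers address sections first and finer fragments only by extension, so the database can begin at the level of Abstract or Discussion and grow downward without invalidating earlier references.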

Summary

The architecture of the representation of texts will be critical for the archive's future. On the one hand, a well-designed text format and database structure would provide a foundation for the growth of the archive to a hypertextual and intelligent text retrieval system. On the other hand, a poor design would isolate the archive and hinder its integration into an eventual, cross-disciplinary, inter-science electronic library.

DISCUSSION

The dispersion of archived texts through the reproduction and distributed storage of serial journal issues - whether through print or electronic media - is the primary inefficiency of traditional scientific publishing. The purpose of saving distribution costs and concentrating the archive in a single place is to create capital that can be spent on developing new facilities to automate and extend scholarly work. Only a centralized electronic database with a text format, file structure, and user interface designed for intelligent text retrieval will meet the needs of researchers in the next century. By actively supporting the efficient and intelligent retrieval of information, the electronic archive will lower the marginal cost of good scholarship, and this will attract readers and authors.

A design for the computerized representation of scientific texts will affect how these texts are read and understood and, thus, how we do science (see Katsch, 1989, for a discussion of the effect of communication technology on the legal system). But this has been true for every technical change in the representation of text. In 1707 John Locke regretted the lack of a true understanding of the epistles of St. Paul,6 due to

The dividing of them into Chapters and Verses, . . . whereby they are so chop'd and minc'd, and as they are now Printed, stand so broken and divided, that not only the Common People take the Verses usually for distinct Aphorisms, but even Men of more advanced Knowledge in reading them, lose very much of the strength and force of the Coherence, and the Light that depends on it. (Locke, 1987, p. 105)

He saw that the problem was an interaction between the representation of the text and the psychology of reading:

When the Eye is constantly disturbed with loose Sentences, that by their standing and separation, appear as so many distinct Fragments; the Mind will have much ado to take in, and to carry on in its Memory an uniform Discourse of dependent Reasonings. (Locke, 1987, p. 105)7

Because of the interdependence between the technology of representation and human understanding, every re-presentation of a text really creates a new text (McGann, 1987). If electronic publishing is to be forced upon us by economics, we should begin work immediately to design the electronic representation of scientific text to our maximum advantage.

The social scientific literature is today broken and divided, and in need of a new textual apparatus. The goal of electronic representation of scientific text is to increase the scope of what any researcher can see of the scientific literature. This was precisely Locke's goal:

Our Minds are so weak and narrow, that they have need of all the helps and assistances can be procured, to lay before them undisturbedly, the Thread and Coherence of any Discourse. (Locke, 1987, p. 105)

The computerization of scientific text should, in time, lead to the weaving of a more coherent system of texts and, thus, a more powerful science.

Acknowledgment - Thanks to Michael Cole, Janna Herman, Michael Kubovy, Jerome McGann, Donald Norman, Jeannine Pinto, Sandra Scarr, Arthur Schulman, and Michael Strait for constructive feedback and support.

6. The text is from An essay for the understanding of St. Pauls epistles. By consulting St. Paul himself, the preface to Locke's commentaries on the letters. I found this reference in a discussion by McKenzie (1986).

7. One could argue that the problem was not the technology of enumerated chapters and verses - these unify the text by supporting cross-reference - but rather their effect on those who fail to perceive their merely technical character.

REFERENCES

American Psychological Association. (1983). Publication manual of the American Psychological Association (3rd ed.). Washington, DC: American Psychological Association.

Association for Literary and Linguistic Computing. (1989). Reports from representatives. Literary and Linguistic Computing, 4, 51-58.

Barnard, D., Fraser, C., & Logan, G. (1988). Generalized markup for literary texts. Literary and Linguistic Computing, 3, 26-31.

Bocker, P. (1988). The integrated services digital network: Concepts, methods, systems. New York: Springer Verlag.

Brajnik, G., Guida, G., & Tasso, G. (1987). User modeling in intelligent information retrieval. Information Processing and Management, 23, 305-320.

Bush, V. (1945). As we may think. Atlantic Monthly, July, 101-108.

Campbell, R., & Stern, B. (1987). ADONIS - A new approach to document delivery. Microcomputers for Information Management, 4, 87-107.

Card, S., Moran, T., & Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, NJ: Erlbaum.

Carroll, J. (Ed.). (1987). Interfacing thought: Cognitive aspects of human-computer interaction. Cambridge, MA: MIT Press.

Chiaramella, Y., & Defude, B. (1987). A prototype of an intelligent system for information retrieval: IOTA. Information Processing and Management, 23, 285-303.

Christodoulakis, S., Theodoridou, M., Ho, F., Papa, M., & Pathria, A. (1986). Multimedia document presentation, information extraction, and document formation in MINOS: a model and a system. ACM Transactions on Office Information Systems, 4, 345-383.

Commission of the European Communities. (1987). Electronic publishing: The new way to communicate. London: Kogan Page.

Conklin, J. (1987). Hypertext: An introduction and survey. IEEE Computer, 20, 17-41.

Coombs, J., Renear, A., & DeRose, S. (1987). Markup systems and the future of scholarly text processing. Communications of the ACM, 30, 933-947.

Cooper, W. (1988). Getting beyond Boole. Information Processing and Management, 24, 243-248.

de Sola Pool, I. (1983). Technologies of freedom. Cambridge, MA: Harvard University Press.

Eisenstein, E. (1979). The printing press as an agent of change. Communications and cultural transformations in early-modern Europe (Vols. 1-2). New York: Cambridge University Press.

Fairchild, K., Poltrock, S., & Furnas, G. (1988). SemNet: Three-dimensional graphic representations of large knowledge bases. In R. Guindon (Ed.), Cognitive science and its applications for human-computer interaction (pp. 201-234). Hillsdale, NJ: Erlbaum.

Fischer, G., & Lemke, A. (1988). Constrained design processes: Steps toward convivial computing. In R. Guindon (Ed.), Cognitive science and its applications for human-computer interaction (pp. 1-58). Hillsdale, NJ: Erlbaum.

Fox, E. (1987). Development of the CODER system: A testbed for artificial intelligence methods in information retrieval. Information Processing and Management, 23, 341-366.

Fox, E. (1988). ACM Press database and electronic products - New services for the information age. Communications of the ACM, 31, 948-951.

Fox, E. (1989). The coming revolution in interactive digital video. Communications of the ACM, 31, 794-801.

Frenkel, K. (1989). The next generation of interactive technologies. Communications of the ACM, 31, 872-881.

Frisse, M. (1988). Searching for information in a hypertext medical handbook. Communications of the ACM, 31, 880-886.

Guindon, R. (Ed.). (1988). Cognitive science and its applications for human-computer interaction. Hillsdale, NJ: Erlbaum.

Gutting, R., Zicari, R., & Choy, D. (1989). An algebra for structured office documents. ACM Transactions on Information Systems, 7, 123-157.

Halasz, F. (1988). Reflections on Notecards: Seven issues for the next generation of hypermedia systems. Communications of the ACM, 31, 836-852.

Horak, W. (1985). Office document architecture and office document interchange formats: Current status of international standardization. IEEE Computer, 18, 50-60.

Humphrey, S. (1989). A knowledge-based expert system for computer-assisted indexing. IEEE Expert, 4(3), 25-38.

Jardetzky, O. (1989). Reporting biological structures. Science, 246, 431.

Katsch, M. (1989). The electronic media and the transformation of law. New York: Oxford University Press.

Kintsch, W., & van Dijk, T. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363-394.

Knuth, D. (1984). The TEXbook. Reading, MA: Addison-Wesley.

Lambert, J. (1985). Scientific and technical journals. London: Clive Bingley.

Locke, J. (1987). A paraphrase and notes on the epistles of St. Paul to the Galatians, 1 and 2 Corinthians, Romans, Ephesians. Volume 1 of The Clarendon edition of the works of John Locke, A. Wainwright (Ed.). Oxford: Oxford University Press.


Mace, P. (1989, September 11). High speed national net proposed. InfoWorld, p. 3.

Mamrak, S., Kaelbling, M., Nicholas, C., & Share, M. (1987). A software architecture for supporting the exchange of electronic manuscripts. Communications of the ACM, 30, 408-414.

McDowell, E. (1989, October 23). Facts to fit every fancy: Custom textbooks are here. The New York Times, pp. D1, D11.

McGann, J. (1987). Social values and poetic acts. Cambridge, MA: Harvard University Press.

McKenzie, D. (1986). Bibliography and the sociology of texts. The 1985 Panizzi Lectures. London: The British Library.

Norman, D., & Draper, S. (Eds.). (1986). User-centered system design: New perspectives on human-computer interaction. Hillsdale, NJ: Erlbaum.

Office of Technology Assessment. (1988). Informing the nation. Federal information dissemination in an electronic age. Washington, DC: United States Congress Office of Technology Assessment.

Panofsky, E. (1955). The life and art of Albrecht Durer. Princeton: Princeton University Press.

Rada, R., & Martin, B. (1987). Augmenting thesauri for information systems. ACM Transactions on Office Information Systems, 5, 378-392.

Raymond, D., Canas, A., Tompa, F., & Safayeni, F. (1989). Measuring the effectiveness of personal database structures. International Journal of Man-Machine Studies, 31, 237-256.

Raymond, D., & Tompa, F. (1988). Hypertext and the Oxford English Dictionary. Communications of the ACM, 31, 871-879.

Rogers, S., & Hurt, C. (1989, October 18). How scholarly communication should work in the 21st century. Chronicle of Higher Education, p. A56.

Schackel, B. (1983). The BLEND system: Programme for the study of some "Electronic Journals." Journal of the American Society for Information Science, 34, 22-30.

Seiler, L. (1989). The future of the scholarly journal. Academic Computing, 4, 14-16.

Senders, J. (1977). An on-line scientific journal. The Information Scientist, 11, 3-9.

Singleton, A. (1981). The electronic journal and its relatives. Scholarly Publishing, 72(3), 3-18.

Smolensky, P., Fox, B., King, R., & Lewis, C. (1988). Computer-aided reasoned discourse or, how to argue with a computer. In R. Guindon (Ed.), Cognitive science and its applications for human-computer interaction (pp. 109-162). Hillsdale, NJ: Erlbaum.

Standera, O. (1987). The electronic era of publishing: An overview of concepts, technologies, and methods. New York: Elsevier.

Sun, M. (1989). Research news: Supercomputer market needs supersalesmen. Science, 243, 596-597.

Tompa, F. (1989). A data model for flexible hypertext database systems. ACM Transactions on Information Systems, 7, 85-100.

Trigg, R., & Weiser, M. (1986). TEXTNET: A network-based approach to text handling. ACM Transactions on Office Information Systems, 4, 1-23.

Turoff, M., & Hiltz, S. (1982). The electronic journal: A progress report. Journal of the American Society for Information Science, 33, 195-202.

Utting, K., & Yankelovich, N. (1989). Context and orientation in hypermedia networks. ACM Transactions on Information Systems, 7, 58-84.

van Vliet, J. (Ed.). (1989). Document manipulation and typography. Proceedings of the International Conference on Electronic Publishing, Document Manipulation, and Typography, 1988. Cambridge, UK: Cambridge University Press.

Woodworth, D. (1979). Financing serials from the producer to the user. Oxford: Basil Blackwell.

