Read Ebook: Workshop on Electronic Texts: Proceedings 9-10 June 1992 by Library Of Congress Daly James Editor
Font size:
Background color:
Text color:
Add to tbrJar First Page Next Page
Ebook has 647 lines and 73194 words, and 13 pages
Acknowledgements
Introduction
Proceedings Welcome Prosser Gifford and Carl Fleischhauer
Appendix I: Program
Appendix II: Abstracts
Acknowledgements
I would like to thank Carl Fleischhauer and Prosser Gifford for the opportunity to learn about areas of human activity unknown to me a scant ten months ago, and the David and Lucile Packard Foundation for supporting that opportunity. The help given by others is acknowledged on a separate page.
INTRODUCTION
The Workshop on Electronic Texts drew together representatives of various projects and interest groups to compare ideas, beliefs, experiences, and, in particular, methods of placing and presenting historical textual materials in computerized form. Most attendees gained much in insight and outlook from the event. But the assembly did not form a new nation, or, to put it another way, the diversity of projects and interests was too great to draw the representatives into a cohesive, action-oriented body.
Everyone attending the Workshop shared an interest in preserving and providing access to historical texts. But within this broad field the attendees represented a variety of formal, informal, figurative, and literal groups, with many individuals belonging to more than one. These groups may be defined roughly according to the following topics or activities:
This summary is arranged thematically and does not follow the actual sequence of presentations.
NOTES: In this document, the phrase electronic text is used to mean any computerized reproduction or version of a document, book, article, or manuscript , and not merely a machine- readable or machine-searchable text.
The Workshop was held at the Library of Congress on 9-10 June 1992, with funding from the David and Lucile Packard Foundation. The document that follows represents a summary of the presentations made at the Workshop and was compiled by James DALY. This introduction was written by DALY and Carl FLEISCHHAUER.
PRESERVATION AND IMAGING
In her talk on preservation, Patricia BATTIN struck an ecumenical and flexible note as she endorsed the creation and dissemination of a variety of types of digital copies. Do not be too narrow in defining what counts as a preservation element, BATTIN counseled; for the present, at least, digital copies made with preservation in mind cannot be as narrowly standardized as, say, microfilm copies with the same objective. Setting standards precipitously can inhibit creativity, but delay can result in chaos, she advised.
In part, BATTIN's position reflected the unsettled nature of image-format standards, and attendees could hear echoes of this unsettledness in the comments of various speakers. For example, Jean BARONAS reviewed the status of several formal standards moving through committees of experts; and Clifford LYNCH encouraged the use of a new guideline for transmitting document images on Internet. Testimony from participants in the National Agricultural Library's Text Digitization Program and LC's American Memory project highlighted some of the challenges to the actual creation or interchange of images, including difficulties in converting preservation microfilm to digital form. Donald WATERS reported on the progress of a master plan for a project at Yale University to convert books on microfilm to digital image sets, Project Open Book .
The Workshop offered rather less of an imaging practicum than planned, but "how-to" hints emerge at various points, for example, throughout KENNEY's presentation and in the discussion of arcana such as thresholding and dithering offered by George THOMA and FLEISCHHAUER.
NOTES: Although there is a sense in which any reproductions of historical materials preserve the human record, specialists in the field have developed particular guidelines for the creation of acceptable preservation copies.
Titles and affiliations of presenters are given at the beginning of their respective talks and in the Directory of Participants .
THE MACHINE-READABLE TEXT: MARKUP AND USE
The sections of the Workshop that dealt with machine-readable text tended to be more concerned with access and use than with preservation, at least in the narrow technical sense. Michael SPERBERG-McQUEEN made a forceful presentation on the Text Encoding Initiative's implementation of the Standard Generalized Markup Language . His ideas were echoed by Susan HOCKEY, Elli MYLONAS, and Stuart WEIBEL. While the presentations made by the TEI advocates contained no practicum, their discussion focused on the value of the finished product, what the European Community calls reusability, but what may also be termed durability. They argued that marking up--that is, coding--a text in a well-conceived way will permit it to be moved from one computer environment to another, as well as to be used by various users. Two kinds of markup were distinguished: 1) procedural markup, which describes the features of a text , and 2) descriptive markup, which describes the structure or elements of a document .
The TEI proponents emphasized the importance of texts to scholarship. They explained how heavily coded texts can underlie research, play a role in scholarly communication, and facilitate classroom teaching. SPERBERG-McQUEEN reminded listeners that a written or printed item is merely a representation of the abstraction we call a text. To concern ourselves with faithfully reproducing a printed instance of the text, SPERBERG-McQUEEN argued, is to concern ourselves with the representation of a representation . The TEI proponents' interest in images tends to focus on corollary materials for use in teaching, for example, photographs of the Acropolis to accompany a Greek text.
Three nonTEI presentations described approaches to preparing machine-readable text that are less rigorous and thus less expensive. In the case of the Papers of George Washington, Dorothy TWOHIG explained that the digital version will provide a not-quite-perfect rendering of the transcribed text--some 135,000 documents, available for research during the decades while the perfect or print version is completed. Members of the American Memory team and the staff of NAL's Text Digitization Program also outlined a middle ground concerning searchable texts. In the case of American Memory, contractors produce texts with about 99-percent accuracy that serve as "browse" or "reference" versions of written or printed originals. End users who need faithful copies or perfect renditions must refer to accompanying sets of digital facsimile images or consult copies of the originals in a nearby library or archive. American Memory staff argued that the high cost of producing 100-percent accurate copies would prevent LC from offering access to large parts of its collections.
THE MACHINE-READABLE TEXT: METHODS OF CONVERSION
Although the Workshop did not include a systematic examination of the methods for converting texts from paper into machine-readable form, nevertheless, various speakers touched upon this matter. For example, WEIBEL reported that OCLC has experimented with a merging of multiple optical character recognition systems that will reduce errors from an unacceptable rate of 5 characters out of every l,000 to an unacceptable rate of 2 characters out of every l,000.
OPTIONS FOR DISSEMINATION
The topic of dissemination proper emerged at various points during the Workshop. At the session devoted to national and international computer networks, LYNCH, Howard BESSER, Ronald LARSEN, and Edwin BROWNRIGG highlighted the virtues of Internet today and of the network that will evolve from Internet. Listeners could discern in these narratives a vision of an information democracy in which millions of citizens freely find and use what they need. LYNCH noted that a lack of standards inhibits disseminating multimedia on the network, a topic also discussed by BESSER. LARSEN addressed the issues of network scalability and modularity and commented upon the difficulty of anticipating the effects of growth in orders of magnitude. BROWNRIGG talked about the ability of packet radio to provide certain links in a network without the need for wiring. However, the presenters also called attention to the shortcomings and incongruities of present-day computer networks. For example: 1) Network use is growing dramatically, but much network traffic consists of personal communication . 2) Large bodies of information are available, but a user's ability to search across their entirety is limited. 3) There are significant resources for science and technology, but few network sources provide content in the humanities. 4) Machine-readable texts are commonplace, but the capability of the system to deal with images lags behind. A glimpse of a multimedia future for networks, however, was provided by Maria LEBRON in her overview of the Online Journal of Current Clinical Trials , and the process of scholarly publishing on-line.
The contrasting form of the CD-ROM disk was never systematically analyzed, but attendees could glean an impression from several of the show-and-tell presentations. The Perseus and American Memory examples demonstrated recently published disks, while the descriptions of the IBYCUS version of the Papers of George Washington and Chadwyck-Healey's Patrologia Latina Database told of disks to come. According to Eric CALALUCA, PLD's principal focus has been on converting Jacques-Paul Migne's definitive collection of Latin texts to machine-readable form. Although everyone could share the network advocates' enthusiasm for an on-line future, the possibility of rolling up one's sleeves for a session with a CD-ROM containing both textual materials and a powerful retrieval engine made the disk seem an appealing vessel indeed. The overall discussion suggested that the transition from CD-ROM to on-line networked access may prove far slower and more difficult than has been anticipated.
WHO ARE THE USERS AND WHAT DO THEY DO?
Although concerned with the technicalities of production, the Workshop never lost sight of the purposes and uses of electronic versions of textual materials. As noted above, those interested in imaging discussed the problematical matter of digital preservation, while the TEI proponents described how machine-readable texts can be used in research. This latter topic received thorough treatment in the paper read by Avra MICHELSON. She placed the phenomenon of electronic texts within the context of broader trends in information technology and scholarly communication.
Among other things, MICHELSON described on-line conferences that represent a vigorous and important intellectual forum for certain disciplines. Internet now carries more than 700 conferences, with about 80 percent of these devoted to topics in the social sciences and the humanities. Other scholars use on-line networks for "distance learning." Meanwhile, there has been a tremendous growth in end-user computing; professors today are less likely than their predecessors to ask the campus computer center to process their data. Electronic texts are one key to these sophisticated applications, MICHELSON reported, and more and more scholars in the humanities now work in an on-line environment. Toward the end of the Workshop, Michael LESK presented a corollary to MICHELSON's talk, reporting the results of an experiment that compared the work of one group of chemistry students using traditional printed texts and two groups using electronic sources. The experiment demonstrated that in the event one does not know what to read, one needs the electronic systems; the electronic systems hold no advantage at the moment if one knows what to read, but neither do they impose a penalty.
DALY provided an anecdotal account of the revolutionizing impact of the new technology on his previous methods of research in the field of classics. His account, by extrapolation, served to illustrate in part the arguments made by MICHELSON concerning the positive effects of the sudden and radical transformation being wrought in the ways scholars work.
Susan VECCIA and Joanne FREEMAN delineated the use of electronic materials outside the university. The most interesting aspect of their use, FREEMAN said, could be seen as a paradox: teachers in elementary and secondary schools requested access to primary source materials but, at the same time, found that "primariness" itself made these materials difficult for their students to use.
OTHER TOPICS
Marybeth PETERS reviewed copyright law in the United States and offered advice during a lively discussion of this subject. But uncertainty remains concerning the price of copyright in a digital medium, because a solution remains to be worked out concerning management and synthesis of copyrighted and out-of-copyright pieces of a database.
As moderator of the final session of the Workshop, Prosser GIFFORD directed discussion to future courses of action and the potential role of LC in advancing them. Among the recommendations that emerged were the following:
CONCLUSION
The Workshop was valuable because it brought together partisans from various groups and provided an occasion to compare goals and methods. The more committed partisans frequently communicate with others in their groups, but less often across group boundaries. The Workshop was also valuable to attendees--including those involved with American Memory--who came less committed to particular approaches or concepts. These attendees learned a great deal, and plan to select and employ elements of imaging, text-coding, and networked distribution that suit their respective projects and purposes.
Still, reality rears its ugly head: no breakthrough has been achieved. On the imaging side, one confronts a proliferation of competing data-interchange standards and a lack of consensus on the role of digital facsimiles in preservation. In the realm of machine-readable texts, one encounters a reasonably mature standard but methodological difficulties and high costs. These latter problems, of course, represent a special impediment to the desire, as it is sometimes expressed in the popular press, "to put the Library of Congress on line." In the words of one participant, there was "no solution to the economic problems--the projects that are out there are surviving, but it is going to be a lot of work to transform the information industry, and so far the investment to do that is not forthcoming" .
PROCEEDINGS
WELCOME
From AM's emphasis followed the questions with which the Workshop began: who will use these materials, and in what form will they wish to use them. But an even larger issue deserving mention, in GIFFORD's view, was the phenomenal growth in Internet connectivity. He expressed the hope that the prospect of greater interconnectedness than ever before would lead to: 1) much more cooperative and mutually supportive endeavors; 2) development of systems of shared and distributed responsibilities to avoid duplication and to ensure accuracy and preservation of unique materials; and 3) agreement on the necessary standards and development of the appropriate directories and indices to make navigation straightforward among the varied resources that are, and increasingly will be, available. In this connection, GIFFORD requested that participants reflect from the outset upon the sorts of outcomes they thought the Workshop might have. Did those present constitute a group with sufficient common interests to propose a next step or next steps, and if so, what might those be? They would return to these questions the following afternoon.
Carl FLEISCHHAUER, coordinator, American Memory, Library of Congress, emphasized that he would attempt to represent the people who perform some of the work of converting or preparing materials and that the core of the Workshop had to do with preparation and production. FLEISCHHAUER then drew a distinction between the long term, when many things would be available and connected in the ways that GIFFORD described, and the short term, in which AM not only has wrestled with the issue of what is the best course to pursue but also has faced a variety of technical challenges.
FLEISCHHAUER remarked AM's endeavors to deal with a wide range of library formats, such as motion picture collections, sound-recording collections, and pictorial collections of various sorts, especially collections of photographs. In the course of these efforts, AM kept coming back to textual materials--manuscripts or rare printed matter, bound materials, etc. Text posed the greatest conversion challenge of all. Thus, the genesis of the Workshop, which reflects the problems faced by AM. These problems include physical problems. For example, those in the library and archive business deal with collections made up of fragile and rare manuscript items, bound materials, especially the notoriously brittle bound materials of the late nineteenth century. These are precious cultural artifacts, however, as well as interesting sources of information, and LC desires to retain and conserve them. AM needs to handle things without damaging them. Guillotining a book to run it through a sheet feeder must be avoided at all costs.
Beyond physical problems, issues pertaining to quality arose. For example, the desire to provide users with a searchable text is affected by the question of acceptable level of accuracy. One hundred percent accuracy is tremendously expensive. On the other hand, the output of optical character recognition can be tremendously inaccurate. Although AM has attempted to find a middle ground, uncertainty persists as to whether or not it has discovered the right solution.
Questions of quality arose concerning images as well. FLEISCHHAUER contrasted the extremely high level of quality of the digital images in the Cornell Xerox Project with AM's efforts to provide a browse-quality or access-quality image, as opposed to an archival or preservation image. FLEISCHHAUER therefore welcomed the opportunity to compare notes.
FLEISCHHAUER observed in passing that conversations he had had about networks have begun to signal that for various forms of media a determination may be made that there is a browse-quality item, or a distribution-and-access-quality item that may coexist in some systems with a higher quality archival item that would be inconvenient to send through the network because of its size. FLEISCHHAUER referred, of course, to images more than to searchable text.
As AM considered those questions, several conceptual issues arose: ought AM occasionally to reproduce materials entirely through an image set, at other times, entirely through a text set, and in some cases, a mix? There probably would be times when the historical authenticity of an artifact would require that its image be used. An image might be desirable as a recourse for users if one could not provide 100-percent accurate text. Again, AM wondered, as a practical matter, if a distinction could be drawn between rare printed matter that might exist in multiple collections--that is, in ten or fifteen libraries. In such cases, the need for perfect reproduction would be less than for unique items. Implicit in his remarks, FLEISCHHAUER conceded, was the admission that AM has been tilting strongly towards quantity and drawing back a little from perfect quality. That is, it seemed to AM that society would be better served if more things were distributed by LC--even if they were not quite perfect--than if fewer things, perfectly represented, were distributed. This was stated as a proposition to be tested, with responses to be gathered from users.
In thinking about issues related to reproduction of materials and seeing other people engaged in parallel activities, AM deemed it useful to convene a conference. Hence, the Workshop. FLEISCHHAUER thereupon surveyed the several groups represented: 1) the world of images ; 2) the world of text and scholarship and, within this group, those concerned with language--FLEISCHHAUER confessed to finding delightful irony in the fact that some of the most advanced thinkers on computerized texts are those dealing with ancient Greek and Roman materials; 3) the network world; and 4) the general world of library science, which includes people interested in preservation and cataloging.
FLEISCHHAUER concluded his remarks with special thanks to the David and Lucile Packard Foundation for its support of the meeting, the American Memory group, the Office for Scholarly Programs, the National Demonstration Lab, and the Office of Special Events. He expressed the hope that David Woodley Packard might be able to attend, noting that Packard's work and the work of the foundation had sponsored a number of projects in the text area.
Serving as moderator, James DALY acknowledged the generosity of all the presenters for giving of their time, counsel, and patience in planning the Workshop, as well as of members of the American Memory project and other Library of Congress staff, and the David and Lucile Packard Foundation and its executive director, Colburn S. Wilbur.
Add to tbrJar First Page Next Page