David Chesnutt
History Department
University of South Carolina
Columbia, SC 29208
Tel: 803-777-6525 Fax: 803-777-4494
The World Wide Web holds great promise for teachers and researchers -- particularly those of us who have devoted most of our careers to editing and publishing historical documents. The Web can and does deliver material to classrooms and libraries in a way never before possible. Instead of the few hundred research libraries which now hold the volumes and microfilms we have prepared, we dream of a time when those works will be available in every university, college, or public library--and perhaps even in high school libraries. The cultural mandate to provide the resources seems clear, but the path is somewhat uncertain. To help define that path, the Model Editions Partnership (http://mep.cla.sc.edu) was organized in the spring of 1994 to develop a series of models for historical editions in the digital age.
Modern historical editing dates from the publication of Julian Boyd's first volume of The Papers of Thomas Jefferson in 1950. Although there had been earlier compilations of the papers of famous Americans, his carefully prepared texts of Jefferson's letters and other writings, "warts and all," set a new standard for accuracy and reliability. His equally careful selection of what to include and what not to include reflected the historian's thoughtful appraisal of what needed to be set before readers so they could begin to understand the essential Jefferson. And Boyd's incisive commentary provided the context needed to place Jefferson in the wider world of the American Revolution and the early national period of American history.
Boyd's greatest legacy, however, was the model he set for the generations of historians who followed in his footsteps--editing the letters and documents of a broad range of individuals who played roles, both large and small, in creating the new nation. The book and microfilm editions which followed in the wake of Boyd's Jefferson made available hundreds of thousands of historical documents gathered from repositories on both sides of the Atlantic and, sometimes, both sides of the Pacific. As historical editors move toward the age of the digital library, however, they face the challenge of developing new kinds of editions in which to present the letters and documents which explain our past. The Model Editions Partnership is a first step toward meeting that challenge.
The partnership brought together editors from seven on-going editorial projects and leaders from the Center for Electronic Texts in the Humanities (CETH) and the Text Encoding Initiative (TEI). Major funding for the three-year project, which began July 1, 1995, is provided by the National Historical Publications and Records Commission (NHPRC) with additional support from the University of South Carolina, Rutgers University and the University of Illinois at Chicago. The partnership's goals are:
Perhaps equally important will be the documentation of the trials and errors as well as the successes of the effort.
The editors and projects in the partnership include Charlene Bangs Bickford and William diGiacomantonio, Documentary History of the First Federal Congress; John P. Kaminski and Richard Leffler, Documentary History of the Ratification of the Constitution and the Bill of Rights; G. Cullom Davis and Martha L. Benner, Lincoln Legal Papers; Dennis M. Conrad and Martha J. King, Papers of General Nathanael Greene; C. James Taylor, Peggy J. Clark, and myself, Papers of Henry Laurens; Esther Katz and Cathy Moran Hajo, Papers of Margaret Sanger; Ann D. Gordon and Tamara G. Miller, Papers of Elizabeth Cady Stanton and Susan B. Anthony. I serve as project director and as co-coordinator with Susan Hockey, the director of CETH, and C.M. Sperberg-McQueen, the editor-in-chief of the TEI. (Addresses for each of these projects have been appended.)
Historical editions are printed on paper designed to last 300 years. If fragile electronic text is to survive even five years, it must outlive the rapid changes in hardware and software. To tie an edition to a particular kind of software or hardware platform would ensure its death. The de facto standard for building platform-independent and application-independent text today is the architecture of the Standard Generalized Markup Language (SGML). In the humanities, the most sophisticated SGML application is the Text Encoding Initiative Guidelines for markup, which were published in the spring of 1994. That the partnership was being organized at the same time was no coincidence. The TEI Guidelines provide the foundation on which the partnership will build its model editions.
Markup like the HyperText Markup Language (HTML) is simply a set of conventions which control the appearance of text and provide links to other texts and digital resources. Like all SGML applications, HTML provides a set of "start tags" and "end tags"; in HTML these are used chiefly to determine the appearance of the text. The most familiar HTML tags are probably those which control paragraphs (P), headings (H1, H2, H3, etc.), unordered and ordered lists (UL, OL), and italic and bold text (I, B). But where HTML affects display, SGML markup can be used to support access. Thus, the TEI Guidelines provide markup which enables scholars to identify any part of a text which may be of interest.
Consider a few examples. Conventionally, the names of ships, books and journals are all displayed in italics. The student of 18th-century trade patterns would profit if the names of ships were clearly identified by the markup in an electronic edition of mercantile letters. A student seeking to compile a list of "works cited" would surely be grateful for an electronic edition in which the names of books and journals were marked. Each of these examples may seem relatively simple in isolation, but imagine them multiplied on a national scale, supported by a national database encompassing hundreds of editions. A project of such scope and scale is well within the possible, but well-designed markup is essential if editors and other scholars are to produce the kind of intellectual resources which will serve the public interest. To achieve that end, the partnership is in the early stages of designing a subset of the TEI Guidelines tailored for historical editions.
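The difference between marking what a phrase is and marking how it looks can be illustrated with a small sketch. The fragment below is invented for illustration and is only TEI-like: it is written as well-formed XML purely so that the Python standard library can parse it, and the element choices (a name element with a type attribute for ships, a title element for books and journals) are assumptions rather than the partnership's actual markup scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a TEI-like fragment written as well-formed XML so the
# standard library can parse it. The letter text and the names are invented.
SAMPLE = """
<p>The <name type="ship">Heart of Oak</name> sailed for Charleston last week.
I have sent by her a copy of <title>The Gentleman's Magazine</title>, and the
<name type="ship">Polly</name> will follow with the rice.</p>
"""


def names_of_type(xml_text, name_type):
    """Return the text of every <name> element carrying the given type."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("name") if el.get("type") == name_type]


def titles(xml_text):
    """Return the text of every <title> element (books and journals)."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("title")]


ships = names_of_type(SAMPLE, "ship")
works_cited = titles(SAMPLE)
print(ships)
print(works_cited)
```

Had all three phrases simply been tagged as italic, a display program could render them identically, but no program could tell the ships from the magazine.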
Although markup is essential, the first task of the editors was to develop a set of principles to govern the creation of the markup scheme for the models. The partnership agreed on five principles of design for an electronic edition.
These principles are discussed at some length in a "Prospectus for Electronic Historical Editions" posted on the Web site of the partnership (http://mep.cla.sc.edu). The basic point to be made here is that the editors wanted a set of clear statements which emphasize that scholarly, not technical, criteria should govern the development of editions in an electronic environment.
The first three principles address the editors' scholarly concerns. "Current scholarly editorial practice" encompasses the basic practice of providing reliable texts of the documents, adequate commentary to explain their historical context, and tools like good indices to provide intellectual access--all hallmarks of the Boyd tradition. Well-designed markup must accommodate those practices. "Changes in editorial practice" will certainly emerge as editors learn to take advantage of the electronic environment. Even in this early stage of the partnership, the editors have begun to realize that new forms of annotation and commentary are both possible and desirable in electronic editions. Other changes in the way in which editions are organized or made accessible to readers are also likely. "Post-publication enhancements" refer to the ability to integrate newly-found documents, to correct misreadings of an existing text, or perhaps to create a subset of the larger edition which can be used as a classroom reader. Although we cannot anticipate all of the needs of future generations of scholars, we can design our editions so that the texts are both durable and reusable.
The last two principles address more practical matters. Because we are in an age of transition in which historical editions may continue to be published in book and microfilm editions, markup which accommodates "multiple forms of publication" is important. Well-designed markup will enable projects that begin as microfilm editions or projects that begin as book editions to migrate smoothly to image editions or live-text editions on CD-ROM or the Web. "Non-proprietary standards" are essential if long-term resources are to survive in the midst of rapidly changing technology. De facto proprietary standards are simply too volatile. WordStar 3.3 became a kind of de facto standard for text files in the mid-1980s. Almost any word processing software of that period could read those files, which is no longer the case. For text files, the partnership will use the SGML standard inherent in the TEI Guidelines. Relevant standards for images and other digital resources have yet to be determined.
In addition to articulating principles to guide the design for editorial markup, the editors defined a typology for electronic historical editions. Editors have a well-developed shorthand for describing current editions. "Microfilm editions" contain images of documents and usually have indices and other access tools, but limited commentary. "Selected Letterpress Editions" are usually based on microfilm editions and tend to present a small sample of the documents. They include transcriptions of selected documents, extensive commentary, indices and other editorial apparatus. "Comprehensive Letterpress Editions" are generally understood to be more exhaustive and may or may not be based on previous microfilm editions. (The latter descriptions were developed in the 1950s when volumes were typeset in hot metal and printed on letterpresses.) A somewhat analogous typology is set forth in the partnership's Prospectus: Image Editions, Live-Text Editions, Combined Editions and Transitional Editions.
Image editions are envisioned as editions which present images or pictures of historical documents linked to control files and other types of scholarly apparatus. Control files usually identify each document, the date it was created, and the repository which holds the original. Other bits of information may also be included (the author and recipient of a letter, copyright information for modern materials, etc.).
Two of the partner projects, one addressing the papers of Abraham Lincoln and the other the papers of Margaret Sanger, fall into this category. These projects are creating "silicon microfilms" and both editions will go far beyond current microfilm editions by providing greater supplementary information; by allowing users to define subsets of the documents to suit their particular interests; and by eliminating the necessary tedium of cranking through reel after reel of film to reach the documents.
The Lincoln Legal Papers is creating a CD-ROM edition of letters and documents relating to Lincoln's career as a lawyer before his election as president. The editors have amassed more than 250,000 photocopies from which they are creating digital images. Extensive database files are used to provide item-level control of the collection and to provide information about the individuals involved, the types of cases at issue, written summaries of the cases, and many other kinds of information. These database files will be used to retrieve documents and to provide supplementary information about the documents.
The editors of the Margaret Sanger Papers are planning an equally interesting image edition to bring together three discrete microfilm collections totaling more than 300,000 pages. The Sanger databases provide item-level access as well as links to four related research files created by the project: a chronology of Sanger's day-to-day activities, biographical sketches of prominent correspondents, copyright information regarding individual correspondents, and repository holdings for the documents. Hypertext links will also bring together letters, enclosures, and referenced documents which are now separated in the three microfilm collections.
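The item-level control files described for image editions, and the ability of users to define their own subsets, can be sketched roughly as follows. This is a minimal illustration only: the field names, the record values, and the subset function are all hypothetical and do not describe the actual Lincoln or Sanger databases.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ControlRecord:
    """One item-level entry in a hypothetical control file."""
    doc_id: str
    created: date
    repository: str
    image_file: str
    author: Optional[str] = None
    recipient: Optional[str] = None


# Invented records, for illustration only.
RECORDS = [
    ControlRecord("doc-0001", date(1850, 3, 4), "Repository A", "img/0001.tif",
                  author="Correspondent X", recipient="Correspondent Y"),
    ControlRecord("doc-0002", date(1852, 7, 19), "Repository B", "img/0002.tif",
                  author="Correspondent Y"),
    ControlRecord("doc-0003", date(1855, 1, 2), "Repository A", "img/0003.tif"),
]


def subset(records, repository=None, start=None, end=None):
    """Select records by repository and/or an inclusive date range."""
    selected = []
    for rec in records:
        if repository is not None and rec.repository != repository:
            continue
        if start is not None and rec.created < start:
            continue
        if end is not None and rec.created > end:
            continue
        selected.append(rec)
    return selected


early_a = subset(RECORDS, repository="Repository A", end=date(1852, 12, 31))
print([rec.doc_id for rec in early_a])
```

Each selected record carries the name of its image file, so a reader's subset leads directly to the pictures of the documents rather than to a reel of film.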
Live-text editions will contain searchable ASCII transcriptions of documents. They are seen as being somewhat like letterpress editions, since transcriptions of documents, commentary, indices and other scholarly apparatus would form the core of this kind of edition. The editors of the Papers of Elizabeth Cady Stanton and Susan B. Anthony are in the initial phases of creating a six-volume, selected letterpress edition. Because they are using computers to prepare the texts and supplementary materials for print publication, the creation of a "live-text" edition seems eminently feasible.
Combined editions would include both images of the documents and live-text transcriptions of them. This type of edition would be well suited for a small classroom "reader" which provided students with both images and transcriptions of seminal documents like the Declaration of Independence. This is a concept which has been incorporated in some of the Library of Congress exhibits like the Walt Whitman journals and Lincoln's Gettysburg Address.
Transitional editions are seen as a way of bringing together existing letterpress volumes and subsequent volumes or supplements for which live-text exists. Editions like the Ratification project, the First Congress project, the Laurens Papers, the Greene Papers and almost every major edition began many years ago before computers were used to prepare material for the printed volume. Most of these projects also began using word processing software in the early 1980s and thus have electronic files for their more recent volumes. Because creating live-text for the early volumes does not seem financially feasible at this time, we have proposed that this type of edition combine images of the printed pages for the early volumes with live-texts for the later volumes or supplements. Printed indices for the earlier volumes would be combined with the electronic indices of later volumes to produce a comprehensive index for the entire edition. Page numbers in the comprehensive index would point either to page images or to live-text as appropriate. Our preliminary experiments indicate that this kind of edition would be both a technologically feasible and cost-effective approach to making entire editions available in an electronic form deliverable on CD-ROM or over networks.
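The comprehensive index proposed for transitional editions can be sketched in a few lines. This is an illustration under stated assumptions, not a description of the partnership's preliminary experiments: the index headings, the volume and page numbers, the set of live-text volumes, and the form of the target strings are all invented.

```python
from collections import defaultdict

# Hypothetical data: each index maps a heading to (volume, page) references.
PRINTED_INDEX = {            # keyed in from the early, image-only volumes
    "rice trade": [(1, 42), (2, 310)],
    "shipping": [(2, 15)],
}
ELECTRONIC_INDEX = {         # generated from the later, live-text volumes
    "rice trade": [(9, 7)],
    "indigo": [(10, 128)],
}
LIVE_TEXT_VOLUMES = {9, 10}  # assumption: volumes for which live text exists


def target(volume, page):
    """Point a reference at live text if it exists, else at a page image."""
    kind = "text" if volume in LIVE_TEXT_VOLUMES else "image"
    return f"{kind}/v{volume:02d}/p{page:04d}"


def merge_indices(*indices):
    """Combine several indices into one mapping of heading to sorted targets."""
    merged = defaultdict(set)
    for index in indices:
        for heading, refs in index.items():
            merged[heading].update(refs)
    return {
        heading: [target(v, p) for v, p in sorted(refs)]
        for heading, refs in sorted(merged.items())
    }


comprehensive = merge_indices(PRINTED_INDEX, ELECTRONIC_INDEX)
for heading, targets in comprehensive.items():
    print(heading, targets)
```

The reader consults a single index; only the resolution step needs to know whether a given volume survives as page images or as live text.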
Although we are continually working with the new concepts, the partnership's major focus in the next few months will be on defining a subset of the TEI markup for historical editions. The Guidelines provide more than 400 tags in the 1,300-page reference work. Though many of those are not relevant to our work because they relate to linguistics or dictionary tagging, we will have to make use of the extension features of the Guidelines to create a few additional tags suited to the needs of our editions. When all is said and done, we hope to have a subset of around a hundred or so tags that will meet the needs of most historical editions. But like the TEI Guidelines, our subset will retain that flexibility which allows scholars to think of interesting new ways of analyzing historical texts.
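One way to test a candidate subset against real material is to tally the tags an encoded sample actually uses and flag any that fall outside the subset. The sketch below assumes a toy list of allowed tags and an invented sample written as well-formed XML for the sake of the Python standard library; neither reflects the subset the partnership will eventually define.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: an illustrative handful of tags, not the partnership's subset.
ALLOWED_TAGS = {"text", "body", "div", "head", "p", "name", "title", "date", "note"}

# Invented sample document, written as well-formed XML.
SAMPLE = """
<text><body><div>
  <head>To a London merchant</head>
  <p>Written <date>4 March 1763</date> aboard the <name type="ship">Polly</name>.</p>
  <p>A <gloss>barrel</gloss> of rice is enclosed.</p>
</div></body></text>
"""


def tag_usage(xml_text):
    """Count how often each tag occurs in the sample."""
    return Counter(el.tag for el in ET.fromstring(xml_text).iter())


def outside_subset(xml_text, allowed):
    """List, in alphabetical order, the tags used but not in the subset."""
    return sorted(tag for tag in tag_usage(xml_text) if tag not in allowed)


usage = tag_usage(SAMPLE)
unexpected = outside_subset(SAMPLE, ALLOWED_TAGS)
print(usage.most_common())
print(unexpected)
```

A tag that turns up in the second list is either an encoding slip or evidence that the subset needs to grow.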
We expect to complete our "Markup Guidelines for Historical Editions" by the mid-summer of 1996 and to have encoded samples ready to begin working with CD-ROM delivery by the end of 1996 and Internet delivery six months later. Completion of the models will be an important milestone, but the models are simply a small step toward our ultimate goal: the creation of a national database for documentary editions on the Internet. Plans are already underway for a parallel project which will begin building a more extensive testbed of historical editions on the basis of our experience in the partnership. Beyond that, we have begun discussions with editors in other disciplines to explore the possibility of creating a comprehensive database which reflects the rich diversity of American culture. But the models are the first step.
hdl://cnri.dlib/november95-chesnutt