Archivists
Workbench: White Paper Robin
Chandler, Online Archive of California Bill
Landis, University of California, Irvine Bradley
Westbrook, University of California, San Diego 1
November 2001
Background Ten
to fifteen years ago the process of archival description was fairly simple. Typically,
archivists created inventories or finding aids for archival collections using
a word processor or, in some cases, a typewriter. Administrative information-such
as deeds of gift, accession records and action logs-was kept as printed forms
in collection control files. Some repositories with sufficient staff expertise
and access to an online bibliographic utility created collection-level MARC records
for their archival holdings. Beginning
around 1990 the complexity of the descriptive process increased dramatically as
archivists began to experiment with the Internet as a tool for publicizing their
collections to the research community. Archivists first utilized Gopher and WAIS
technology to deliver ASCII versions of collection finding aids but quickly migrated
to HTML-encoded finding aids once that encoding scheme was broadly introduced
in 1993. HTML served to improve the presentation of finding aids; however, its
limitations for facilitating searching and navigating online finding aids was
quickly apparent to archivists. Furthermore, it did not help to promote consistent
identification of encoded data elements within and across repositories. Dissatisfaction
with these drawbacks led to the development of an SGML DTD specifically for encoding
archival collection descriptions and facilitating their publication online. This
DTD, known as Encoded Archival Description (EAD), allows archivists to represent
the hierarchical structure inherent in archival collections in encoding and utilize
it for searching and navigating through a finding aid or groups of finding aids.
EAD also makes possible the kind of data encoding standardization that more predictable,
less idiosyncratic access systems require. The success of EAD quickly led to the
construction of union databases of EAD-encoded finding aids, of which the Online
Archive of California (OAC) was the first. Similar statewide efforts are underway
in New Mexico, Texas, Virginia, and North Carolina, along with several international
projects. An analysis of
OAC efforts thus far reveals two areas clearly in need of additional work if the
OAC is to mature satisfactorily as a user-responsive database of finding aids
and associated digital objects representing archival holdings in California repositories.
First, while many significant archival repositories in California are currently
participating, many more are not. Moreover, some repositories are not able to
participate very actively. Among the factors that help to explain this are the
difficulty of integrating encoding with description and the cost and complexity
of maintaining separate description and encoding processes. The majority of archivists
create a finding aid using a word processing application, followed by a secondary
encoding process utilizing one of several available methods: manual encoding,
use of scripts and macros, or commercial encoding tools such as Author/Editor
or XMetaL. A recent posting to the Archives listserv by the Director of the Five
College On-Line Finding Aids Access Project is indicative of the challenges:
We are
starting a three-year EAD project that involves five institutions, and will encompass
both conversion of legacy finding aids and the creation of new ones. Four of the
five institutions are currently using Word or WordPerfect to create finding aids.
Of these, two also have some collection-level description in a database format
(InMagic and Minaret) but neither of these use the database for complete finding
aids. One institution generates complete finding aids for all of its collections
from a database (Minaret). We would like to utilize the same encoding method for
new finding aids at all institutions. (Archives listserv, 25 Jan. 2001) Second,
the encoding of data is highly inconsistent, thereby impeding the functionality
of the union database. For instance, searches of scope and content notes suffer
a certain amount of imprecision since many encoded finding aids in OAC lack a
scope and content note, while other finding aids have the scope and content note
coded with a tag other than the scope and content tag. Achieving optimal performance
in a union database requires a high degree of encoding and, to a lesser extent,
content consistency. Such consistency enables the construction of navigational
interfaces and search indexes to support more sophisticated and precise use of
the data by both archivists and researchers. Lacking encoding consistency, a union
database of SGML-encoded finding aids has not much more functionality than one
created utilizing ASCII text or HTML encoding. In short, integrating multi-institutional
descriptive and encoding processes and normalizing archival data are essential
for developing the OAC efficiently and effectively, but each requires a time-consuming
and expensive effort that most individual repositories simply cannot undertake.
In the fall of 2000, the
OAC Metadata Standards Committee formulated best practice guidelines to reduce
encoding inconsistencies in newly encoded OAC finding aids. To be effective, however,
these guidelines must be incorporated into a work process that integrates description
and encoding. In recent years, several efforts have been made to reduce the learning
curve for incorporating EAD encoding into an individual repository's workflow.
The Society of American Archivists' EAD Application Guidelines and Michael Fox's
EAD Cookbook are two notable examples. Neither of these tools, though, helps effect
the integration of description and encoding into a single process.
Proposal
As
a solution to this continuing problem, we believe a national level project to
build tools for addressing these issues is needed. We envision an initial planning
meeting to strategize about these issues and the tools their solutions require.
Individuals attending the meeting would include domain experts in archival description,
information technology and administration. We anticipate this initial meeting
will lead to a series of planning meetings based on identified high level tasks.
The foremost purpose of these meetings would be to begin development of a suite
of Open Source tools to increase the efficiency of managing archival collections
and producing EAD-encoded finding aids by integrating description and encoding
and creating metadata for digital objects associated with finding aids. The meetings
would also identify funding sources to support construction and testing of a prototype.
When implemented, an archivists' workbench would result simultaneously in more
consistently encoded data in the OAC, in more sophisticated searching and navigation
of finding aids and attached digital facsimiles, and more streamlined processes
for administering archival collections. A suite of digital tools to support archives
would be indispensable in building multifunctional virtual collections that would
satisfy the interest and needs of disparate audiences. This
suite of tools--an archivists' workbench--would satisfy requirements for several
archival tasks or processes: - It
would support input and editing of information elements derived from archivists'
management of archival collections, including appraisal, accessioning, and processing.
- It would support
input and editing of all necessary forms of metadata regarding original or surrogate
digital objects associated with a collection.
-
It would facilitate manipulation and use of that data by archivists in management
of collections, both online and in printed reports.
-
It would support searching, extracting, displaying, and publishing the data for
a variety of research needs in both online encoded and print formats.
- It would promote quality
assurance of the data. · It would enable exporting of data in multiple encoding
standards, and it would be adaptable to emerging encoding standards.
The
suite of tools would likely consist of the following components:
- Databases of archival administrative
and descriptive data.
-
Web-based data inputting and editing templates / forms.
-
Specialized scripts for querying data in various ways.
-
Specialized output style sheets for a number of encoded (e.g., EAD, MOAII, TEI,
HTML) and print (e.g., printed finding aids and other printed research and access
tools) formats.
Constructing
and implementing an archivists' workbench would alleviate the two fundamental
problems mentioned above: it would increase standardization of descriptive data
elements and would allow data encoding to be done behind the scenes, as it were,
according to pre-established encoding protocols. The use of input / editing templates
would increase data consistency to a certain degree, while still allowing repositories
a reasonable amount of latitude in degree of detail used in a given description.
The tools in an archivists' workbench would streamline the descriptive process,
as archivists would be able to begin describing a collection at the point of accession,
amplifying and completing the description as the collection becomes fully processed.
In the end, after collection descriptions are completed using the tool suite,
the workbench could easily facilitate the development of more sophisticated access
mechanisms that would benefit specialized researchers. Standardization
and consistency of encoding and description will facilitate more sophisticated
uses of encoded data within the larger world of the California Digital Library
and nationally. Within the OAC testbed, encoding done with this suite of tools
would enable the creation of topical views, based on controlled access terms,
of OAC resources for which curators and other specialists might provide a contextual
overview, as well as permit end users to extract, merge, and otherwise manipulate
information resources from the OAC in a way that is more meaningful to their individual
needs. A very specific objective of this endeavor will be to enable "out of context"
searching of finding aid data and attached digital objects, which will greatly
supplement the "in context" searching now supported. "In context" searching returns
a set of finding aids that contain matches to the search query. Each finding aid
then must be searched individually for the match(es). "Out of context" searching
will return only the part of the finding aid or the digital objects that match
the query. However, these results should be presented in such a way that the researcher
can easily identify the collection and its repository to which the description
pertains and, also, that the researcher can easily jump to the part of the finding
aid from which the description or object is taken. Enabling "out of context searching"
is fundamental to establishing true, multipurpose virtual collections. The archivists'
workbench would make managing collection data and digital objects much easier
than it is currently. PROJECT
WORK SEQUENCE Phase 1:
Basic design. A project
team will convene a series of retreats with a dozen or so identified experts in
archival description and / or information technology, the goal of which will be
to determine and elaborate the data and metadata requirements for an archivists'
workbench, to develop clear output pathways for extracting descriptive data in
a number of predefined structures, and to discuss the advantages and drawbacks
of the various technological options available to realize this suite of tools.
Ultimately, the project team and the experts invited to these retreats will produce
the specifications for a prototype archivists' workbench. The specifications,
in turn, will be offered to the archival community for comment and additional
refinement. Phase 2: Prototype
development. The project
team will develop a prototype archivists' workbench based on the specifications
defined in Phase 1. The prototype, at various phases in its development, will
be tested by a representative group of participants, including the OAC as a testbed,
and their feedback will be incorporated into ongoing refinement of the prototype.
At the same time, members of the project team will begin work on formulating documentation
strategy needed for training project participants in the use of the archivists'
workbench. Phase 3: Funding
procurement Members of
the project team will develop funding request(s) to support construction and implementation
of the tool suite. Phase
4: Archivists' workbench implementation. The
project team will develop a documentary infrastructure to support implementation
of the archivists' workbench among all project participants, including the OAC
testbed, and training of staff from these repositories in use of the workbench.
This work will also include identification of costs associated with training and
continued documentation. Phase 4 will also include the definition of a procedure
within the CDL for future development of the archivists' workbench to insure that
it remains synchronized with pertinent technological developments. As happened
with EAD, the workbench will be available to other repositories and consortia
outside of CDL-OAC. It is anticipated the archivists' workbench will be tested
and implemented first at the partnering institutions. Potentially, these include
UCI and UCSD, and prospectively Cornell University, Library of Congress, Minnesota
Historical Society, University of Pittsburgh, and Yale University. PHASE
ONE DETAIL It is expected
the initial meeting will cost between $5,000.00 to $7,000.00 dollars, depending
on how many participants invited are from the mid-west or east. Funding will be
used to cover travel and per diem expenses. Phase One High Level Tasks for Discussion
(not prioritized) 1: Administration
Identifying organizational
structures and resources required to develop and implement the project. 2:
Data Modeling (Description): Identify
the descriptive elements required to be managed by the tool set, as well as the
preferred type of tools for managing them. 3:
Data Modeling (Administration): Identify
the administrative elements required to be managed by the tool set, as well as
the preferred type of tools for managing them. Administrative data elements are
key to managing collections, but they are often ancillary to the descriptive data
that constitutes the larger portion of a finding aid. Moreover, they don't necessarily
belong in a finding aid, as their value is largely administrative and not research.
Attention must be given to integrate descriptive and administrative elements in
the same tools, but it is recognized that the differences between descriptive
and administrative data, as well as the idiosyncratic ways archival collections
are managed by different repositories, may require the construction of separate
but related tool sets. 4:
System Integration: Discuss
and define the system parameters based on data modeling considerations. 5:
Prototype Development: Develop
a prototype archivists' workbench, a paper based schematic of the tool set, showing
its functions and relationships between component parts. 6:
Response to / Critique of Prototype: Selected
archivists, administrators, and information technologists critique and refine
the prototype and define the process for constructing the prototype. 7:
Education / Promotion / Community Participation Define
method for promulgating the workbench and assisting its implementation in the
general archival community. 8:
Maintenance of Archivist's Workbench Discuss
issues and develop strategies for evolving workbench to take advantage of technological
developments. |