|
PROSPECTING AN
ARCHIVISTS' DIGITAL TOOLKIT INTRODUCTION
Archives
and manuscript resources have been an indispensable part of the modern research
endeavor. Many groundbreaking works in numerous disciplines would not have been
executed without access to the kinds of documentary evidence present in institutional
archival collections. Initial development and application of online technologies
to archival resources in the 1990s has been a key means for promoting archival
resources to a greater number of researchers across research and national domains.
The Encoded Archival Description Document Type Definition has provided the means
for an increasing number of repositories to publicize their holdings to remote
researchers and to participate in the development of union databases of finding
aids. These databases have matured quickly to include not only finding aids but
also digital surrogates of collection materials that can be accessed with or without
the use of a finding aid. Presenting digital surrogates via the internet permits
archivists and curators to share their historical content with other kinds of
audiences, for example, K-12 classrooms, in a manner more efficient for all concerned
and without threatening or degrading the condition of the original resources.
While the initial convergence of archival information and digital technology
was and continues to be very beneficial and exciting, it also has revealed or
created problems that could be easily ignored in the earlier days of printed finding
aids and MARC collection records but that now represent significant obstacles
to the aggregation and usability of archival finding aids and resource surrogates.
For researchers to use union databases of finding aids and digital surrogates
effectively and efficiently, standards for content and structure of resource description
must be adhered to by all participating repositories. Without employing such standards,
union databases will only be able to serve gross chunks of information. This will
become intolerable and useless to researchers as the magnitude of union databases
increases and the chunks of information increases from three or five to fifty
or more finding aids to browse through. However, application of content and structure
standards requires substantial training and modification of work patterns. As
many repositories will attest, EAD encoding of finding aids has resulted in adding
another work routine in an already labor intensive processing regimen. The
development of a suite of digital tools to support archival processing work and
access would help to solve this problem to a substantial degree, although not
completely. A suite of digital tools or toolkit could be designed to force adaptation
and adherence to extant content standards. It could be constructed so that structure
standards are applied automatically in the production of outputs such as EAD encoded
finding aids and standardized digital objects (e.g., METS), thereby reducing the
need and cost of training. And it can be built to completely or nearly completely
automate some routines, thereby streamlining a repository's processing work. But
most important a toolkit designed according to the requirements suggested above
and described more fully below will lead to more compatible data streams into
union databases and to more efficient and productive use of the those union databases.
Such a toolkit will promote and support good research. BACKGROUND Sponsored
by the Digital Library Federation (DLF) and the California Digital Library (CDL),
twenty-one archivists and information technologists met in La Jolla, Calif., on
February 4-5, 2002. The purpose of the meeting, known as the Archivists' Workbench
meeting, was to discuss the concept of a workbench or suite of digital tools that
would facilitate collection and management of information about archival materials
at the various points along the life cycle of those collections. Ideally, a workbench
would facilitate integration of the disparate filing systems and databases now
used in most archival repositories for collecting and managing their archival
information, and it would enable more efficient production of various outputs,
ranging from encoded finding aids for use by end users to internal administrative
reports. Chief among the meeting's successfully met objectives was validation
of a broad need for a digital toolkit that would: - Create efficiencies
in data capture and reuse at various points in repository workflows;
- Reduce barriers to participation in consortial and institutional access
systems by making digital encoding for online access a system byproduct rather
than a complex additional segment of staff work;
- Reduce educational
requirements and training tasks by automating complex encoding procedures and
other kinds of work routines;
- Increase application of data content
and structure standards, assuring greater interoperability of end-user access
products such as encoded finding aids and digital objects; and
-
Integrate in one system, serving one or more archival repositories, archival data
typically dispersed across several databases and filing systems, digital and analog.
Participants
in the February meeting also discussed strengths and weaknesses of a variety of
technological solutions that might serve as a possible platform on which to build
this suite of tools. In addition, participants considered incomplete or unsuccessful
efforts by the archival community during the late 1980s and early 1990s to construct
a comprehensive data management utility, as well as the lessons derived from those
efforts. In light of lessons learned from previous unsuccessful attempts
to build archival information systems, meeting participants concluded it was extremely
important to focus narrowly initial design of an archivists' workbench. Earlier
attempts at the creation of such tools had failed in part because they aimed for
comprehensiveness of process and participation at the outset. Participants in
the Archivists' Workbench meeting decided it would be best focus initial construction
and application of the toolkit to a homogeneous group of repositories, smallish
archives and special collection units in which one professional is typically responsible
for most, if not all, of the archival work. This group was targeted because meeting
participants believe such repositories are lacking in staffing resources to standardize
their archival processes and contribute their descriptions and surrogates to consortial
databases and because publication of the archival materials administered by these
repositories would greatly benefit the research community. In addition, such repositories
represent a middle ground between the "lone processor" historical society
and the multi-staffed manuscripts and archives unit that exist at a few of the
nation's research libraries. Workflows would be easier to discern in those environments,
and it would be easier to build upon those results, presuming their success, to
enlarge subsequent designs to include a broader range of repositories and more
complicated workflows. Participants in the February meeting also cautioned
that this current effort to construct a suite of digital tools for archivists
not become paralyzed at the outset due to too grand a vision. They advised that
a few key archival functions be targeted. That advice has been considered thoroughly
in the aftermath of the February meeting and during the composition of this grant
request. The planning process, for which funding is being requested with this
proposal, will be devoted in large part to identifying those archival functions
that are typical and related and, hence, could and should be accommodated in a
toolkit. The objective is not to be comprehensive in the initial design but, rather,
to make sure we allow collection of related data when it can be collected relatively
easily and enable thorough use of all data collected. Another objective is to
build the toolkit with an eye toward facilitating future modifications and extensions.
In short, an accommodating design, and not a comprehensive design, is the target
of the planning sessions. The particulars of that design will be the product of
the planning sessions. The meeting concluded with the commitment of twelve
participants, known as the Archivist Workbench Core Team, to begin defining the
functional requirements and system attributes of a workbench by elaborating and
specifying the high-level requirements agreed to during the meeting and to join
together in a planning process, the objective of which is to define a paper prototype
of the archivists' workbench and secure funding for building and testing a working
prototype. DESIGN CONSIDERATIONS FOR AN ARCHIVISTS' WORKBENCH First
among the high level requirements validated at the meeting is that the tool set
needs to be informed by the life cycle of an archival collection or item as it
progresses through a repository, from first contacts with a creator or donor of
the archival materials through completion of the arrangement and description to
use of the resource by the research community. However, while it is true that
all collections or documents reflect the same basic life cycle, how that life
cycle is articulated in one repository may differ in some ways from its articulation
in another repository. Work may be sequenced one way in one repository and another
way in another repository. One repository may cluster its data differently than
does another repository. And one repository may choose not to collect information
than another repository believes indispensable. Second, every archival
function typically has two basic aspects. One aspect is the physical labor required
to perform the function, such as transferring a set of boxes to the custody of
the repository. The other aspect is the documentation or representation of the
task and its results. Archival representation is the sum of the recording of the
archival work of acquiring, processing, and servicing of archival materials. Historically,
data generated from these events has been stored in a variety of locations, some
digital (e.g., spreadsheets, databases, word processor files) and some analog
(e.g., paper collection files, rolodexes, printed finding aids). As a consequence,
the richness of this information and its myriad relationships has rarely been
utilized to its fullest potential by archivists and curators. Third, as
demonstrated during the February meeting, there are significant differences across
repositories regarding the sequence or workflow of the archival functions generating
the representations, not to mention differences in how repositories represent
each function (i.e., character and number of data elements). Meeting attendees
agreed that an archivists' workbench would need to be flexible and adaptable to
different work environments and able to accommodate different workflows. With
minimal customizing, the suite of tools should be deployable on a single desktop
in a one-person repository, or on a network serving a larger repository or even
a consortium of repositories such as the Five Colleges or participants in CDL's
Online Archive of California. Meeting participants also agreed it was important
for the toolkit to accommodate processes and workflows as established by individual
repositories, since variance in institutional missions, staffing patterns, funding,
and space are important determinants for how a repository represents and sequences
its archival work. Accommodating a range of representational practices and workflows
is complicated by the probability that not all archival repositories define their
archival functions with the same delimiters. This state of affairs necessitates
building flexibility into the toolkit that permits implementers to tailor it to
their own needs but without compromising archival standards for content and structure
that are imperative for developing broadly useful consortial access systems to
archival resources. Obviously, it is inevitable that successful design and implementation
of an archivists' workbench will require repositories to analyze their local practice
and evaluate whether or not changes to those practice would be beneficial; however,
the toolkit will enjoy even greater success if it can accommodate a wide range
of those local practices and minimize the need for conformity to the toolkit.
The strong consensus reached in the February Archivists' Workbench meeting
was that a modular design would best accommodate different work environments and
workflows; hence, a blueprint for a suite of tools or toolkit would be the desired
outcome of the planning phase of this project. Modules determined by archival
functions or predictable archival representation events allow for sequencing the
modules in a manner that best conforms to the actual workflow employed in a given
repository. In simple terms, a modular toolkit would consist of input templates
and associated program code, storage data tables, and output formats and associated
program code. The configuration of input screens would be determined by repository
workflows, and they would funnel data to the storage data tables. These storage
tables would not necessarily reflect boundaries or relationships suggested by
the input templates. When the same data is required in the representation of different
archival functions, it would be collected at the first available opportunity in
the workflow, stored in a single location in the storage tables and reused for
representation of subsequent functions. Data would be entered and stored according
to community content standards. For example, controlled access terms would be
entered and stored in accord with the principles of the LC Name Authority File,
the LC Subject Heading list and other established thesauri. Data structure and
transmission standards would be applied on export of information in one of the
defined output routines. Output products would minimally include encoded and printed
finding aids, standardized digital objects (MOA2 or METS), and cross collection
browse lists created by archivists in response to end user queries, but they could
also include provisional MARC and DC cataloging records for the collection and
selected sub-parts and a wide and diverse set of administrative reports such as
shelf lists, or periodic quantitative statements on major functions such as acquisition,
digitization, or cataloging. Effective delimitation of the modules, accompanied
by sufficient documentation, should make the suite of tools capable of being implemented
differently by different repositories, or of being modified by a single repository
through time to reflect changes in the workflow pattern due to changing staff
levels or repository goals. In addition, if modules are defined at high enough
levels of granularity, it will be possible for modules to be combined in such
a way that best reflects how archival functions are defined and represented in
a specific repository. Finally, this design approach will enable repositories
to use only those modules pertinent to their current workflow. Assuming, for example,
that the toolkit includes a digital object production module, a repository not
creating digital objects could elect not to use it at all or use it at a later
date when the repository begins to create and upload digital objects. Participants
in the February Archivists Workbench meeting clearly confirmed that the most pressing
need at present is a tool to facilitate the output of encoded finding aids to
enable online access to archival resources through repository websites and union
databases. Nonetheless, participants also agreed that while efficient production
of finding aids and other access products should be the primary rationale for
building a toolkit, it should not be sole objective for an archivists' workbench.
Consideration should also be given to how the archival information might be re-used
for other purposes already extant in archival repositories and how it could be
adapted to future needs. The toolkit we envision incorporates finding aid production
but looks well beyond it to include a greater range of functionality that could
result in significant efficiencies for archival workers across the range of archival
work and not just for finding aid encoding. For example, we envision a toolkit
that, with some adaptation, could facilitate ingestion of electronic records and
their associated metadata, as well as other kinds of born digital materials. A
service and maintenance model is the final critical feature for an archivists'
toolkit. Meeting participants concurred it would be folly to invest considerable
resources in constructing a suite of digital tools and not address how the toolkit
will be maintained and modified over time to keep current with technological developments
and changes in archival work. A good service model would satisfy several basic
requirements: - Provide training for repositories in the use of the
toolkit;
- Provide ample documentation of all component parts
of the toolkit;
- Provide assistance to toolkit users with implementing
and customizing the input templates and output formats;
- Provide
structure and procedure for updating the toolkit in a timely and appropriate manner
to keep pace with technological evolutions; and
- Provide a mechanism
for tracking all registered users so they can be easily notified of new modifications
and features.
FUNDING REQUEST The Five Colleges,
a Consortium in Western Massachusetts made up of Amherst College, Hampshire College,
Mount Holyoke College, Smith College and the University of Massachusetts at Amherst,
and the California Digital Library request funding of $40,000.00 to support 5
two-day planning meetings over the course of a year for developing the functional
requirements, system attributes, paper prototype, and business / service model
for a digital toolkit that would embody the objectives agreed to in the February
Archivists' Workbench meeting in La Jolla. The meetings will lead to the development
of a paper prototype for the toolkit and a grant request for construction and
trial implementation of a working prototype. Team members A core
group of 12 persons will participate in each of the projected 5 two-day meetings.
The exact composition of meeting attendees will be adjusted where necessary to
bolster the content and fulfill the objectives of the particular meetings. For
the planning phase, a core team of 12 persons will be composed of Five Colleges
and University of California personnel and other participants from the original
Archivists' Workbench meeting who have volunteered their contributions to this
project. The Five College Archivists Group (Daria D'Arienzo, Amherst College;
Susan Dayall, Hampshire College; Peter Carini, Mount Holyoke College; Nanci Young,
Smith College and a staff member from the University of Massachusetts) led by
Peter Carini and Kelcy Shepard will represent the Five Colleges. Robin Chandler,
Bill Landis, and Brad Westbrook will represent the University of California. Other
members will be Mary Lacy (LC), Merrilee Proffitt (RLG), Chris Prom (Univ. of
Illinois), Clayton Redding (American Institute of Physics), David Ruddy (Cornell
Univ.), Elizabeth Shaw (Univ. of Pittsburgh), and Elizabeth Yakel (Univ. of Michigan).
Archivists and curators from the Five Colleges will also participate in the meetings,
contributing substantial information to the first few meetings. It is also expected
that other domain experts may be needed for specific aspects of the planning phase;
for example detailing storage and platform options. It is not expected that a
facilitator will be required during this planning phase of the project. The
core team will be broken in to sub-teams, which will be assigned tasks for the
entire planning process or particular meetings. Sub-teams are identified in the
description of the meetings below. Projected Meetings: Data Modeling
(2 Meetings) On the basis of examining work flows and case scenarios for
several archival repositories conducted prior to this meeting, participants will
identify a range of archival functions and the data elements used to represent
them. Attention will then turn to defining the input templates. This will require
specifying which data elements need to be governed by community standards and
best practice guidelines. While primary emphasis will be placed on making
sure the templates enable adequate representation of each function, consideration
will also be given to customizability and usability of the input forms. Usability,
and methods for testing it, will be high priorities throughout the entire project.
Sub-team: Peter Carini, Chris Prom, Kelcy Shepherd, and Beth Yakel will
assume responsibility for workflow descriptions and data specifications from a
number of archival repositories. They will analyze the information they obtain
and present a list of data elements used by surveyed repositories and a descriptive
analysis of the variance among workflows. This data will be used in the meeting
to determine the number and range of data elements required for the toolkit and
basic kinds of workflows that need to be encompassed. This understanding can then
lead to productive design of input templates and specification of input or data
entry rules. The sub-team will draw substantially but not exclusively on
input from Five Colleges archivists and curators. Output Products (1 meeting) The
chief concern of this meeting will be defining a variety of output products that
will be available in the prototype archivists' toolkit and assessing the products
of the previous two meeting to insure that adequate data has been captured and
stored to support the output of these specific products. Attention will also be
devoted to the need for support of some degree of customization in these output
routines, for example, layouts of printed finding aids and administrative reports.
Sub-team Bill Landis, Merrilee Proffitt, David Ruddy, and Brad Westbrook
will identify the outputs enabled by the data elements legitimized at the conclusion
of the first meeting. They will present versions of these outputs, with recommended
formatting, to the second meeting for modification. Storage Architecture
(1 meeting) Once the data elements are identified, the input templates developed,
and the desired outputs specified, attention will be given to the architecture
for storing the data to enable variable outputs. Efforts will be made to identify
repeating data elements that need be stored only once and other data elements
useful for linking data subsets. Prior to the meeting participants will investigate
various technical options for the storage architecture for the toolkit and during
the meeting will discuss the strengths and weaknesses of each, deriving specifications
that will be used during the development of the working prototype. Sub-team
Clayton Redding and Liz Shaw will present models for how the data is to be mapped
from the input templates to storage "containers". This analysis may
require the presence of another domain expert. Platform and Service Considerations
(1 meeting) Participants will come to the last meeting in this sequence
of meetings prepared to discuss the advantages and disadvantages of various software
environments and platforms that might be used for the toolkit. Arguments will
be weighed for it being a relational database or an object oriented database,
for it being an SQL or XML database, and for the software being proprietary or
open source. Sub-team Clayton Redding and Liz Shaw will present options
for software environment and platform for the toolkit. Again, this component of
the planning phase may require the contribution(s) of additional domain experts.
During the second day of the meeting, participants will discuss service
requirements and models for the archivists' toolkit. In addition, an inventory
of documentation needs for the toolkit will composed, and a strategy for evaluating
the utility of the toolkit will be developed. Finally, an attempt will be made
to identify institutions interested in sole or joint governance of the toolkit.
Sub-team Peter Carini, Robin Chandler, Mary Lacy, and Merrilee Proffitt
will present varying models for governing and sustaining utility of the toolkit.
As part of their presentation, sub-team members will attempt to gain some sense
of institutional support for each service model. Meeting attendees will evaluate
the different models and rank them according to projected success. Overall
Process, Budget, and Outcomes It is expected the 5 two day meetings will
be conducted over a 12 month period beginning shortly after funding is granted.
Typically, meetings will be held at either a Five Colleges site or a University
of California site (Oakland or San Diego), but meetings may be held at other sites
if it is cost effective and convenient to do so. Meetings will be separated
by adequate time to allow for resolution of the issues raised and objectives targeted
in the previous meeting and preparation for the upcoming meeting. Each
meeting will be substantially documented. Sub-team members Robin Chandler, Kelcy
Shepherd, Brad Westbrook, and Beth Yakel will be responsible for collecting and
creating documentation and for synthesizing into a subsequent grant proposal to
support construction of a working prototype. The documentation sub-team will record
meeting contents and deliver them to meeting participants well in advance of the
next scheduled meeting. Expectations are that each meeting will cost approximately
$8,000.00. Each meeting includes a cushion of $400.00 (a total of $1,600.00 to
$2,000.00 for the 5 meetings) to invite additional domain experts to particular
meetings. It is expected that additional domain experts will only be required
in two of the meetings at most. Meetings will be held in either the Five Colleges
Area, San Diego, or Oakland, thereby alleviating the need for two airfares for
each meeting. Meeting space and technology will be donated by the Five Colleges,
the CDL, or UCSD, depending on where the meeting is held.
$4,000.00 Ten airfares at average cost of $400.00 per airfare $1,800.00
Six rooms for two days at $150.00 per day (assumption is team members will
share rooms) $1,800.00 Meal per diems of $75.00 for each per team member
(12) for two days $400.00 Extra guest support _________ $8,000.00
Total cost per meeting The projected total
cost for four meetings is $32.000.00, or $40,000.00 for five meetings if a fifth
meeting proves necessary. The planning meetings will be extensively documented
and result in several discernible sets of data: - Archival data
elements, rules for their entry in templates, and template mock ups
- Data
storage model, depicting relationships among data elements
- Output
products and the rules for formatting them
- An informed decision
for the best platform and software environment
- One or more detailed
service models (e.g., centralized vs. distributed) for how the toolkit is to be
maintained and sustained over time and an assessment of what kinds of education
and training mechanisms will be useful
At the conclusion of the
five meetings, team members will use online and telephone interaction to refine
these data sets and then knit them together to form a paper prototype of the toolkit.
This prototype will serve as the basis for a subsequent and more substantial funding
request to support development and trial implementation of a working prototype
and a more detailed service model. After the successful recruitment of programmers,
a working prototype will be designed, accompanied with an effective user interface,
and then deployed for trial implementation. It is expected that implementations
will be tested at the Five Colleges, selected repositories participating in the
OAC, and other repositories represented by the team members. During the implementation
trial, the results obtained during the planning phase will be iteratively tested
and, where necessary, modified to meet real life practices. Programmers and toolkit
implementers will work closely and quickly to optimize the tool's performance.
These trials will be accompanied by on-going evaluation and iterative design modification.
Evaluation techniques will include usability analyses, on-site observations, surveys,
and interviews with archivists at participating institutions. __________ Note:
A single three day meeting, to take place in western Mass., was funded by DLF.
(6/21/02) |