GUIDELINES FOR DIGITIZATION

EDITED BY KATHERINE M. WISSER

2007 REVISED EDITION

North Carolina ECHO (Exploring Cultural Heritage Online) is supported with federal Library Services and Technology Act (LSTA) funds made possible through a grant from the Institute of Museum and Library Services (IMLS) administered by the State Library of North Carolina, a division of the Department of Cultural Resources.

TABLE OF CONTENTS

Introduction
Chapter 1 Project Planning
Chapter 2 Selection
Chapter 3 Legal Considerations
Chapter 4 Digital Production
Chapter 5 Metadata
Chapter 6 Digital Preservation
Chapter 7 Presenting Your Digital Project
Chapter 8 Targeting the K-12 Audience
Chapter 9 Project Evaluation
Chapter 10 Project Management
Resources
Glossary

INTRODUCTION

All of North Carolina's cultural institutions work together to make the state's unique cultural and historical resources accessible for the education and enjoyment of people of all ages in the state, nation, and the world.

Vision Statement of the NC ECHO Advisory Committee

North Carolina has many treasures -- letters written by slaves, tape-recorded reminiscences of old veterans, quilts pieced together from family scraps, photographs of main streets long vanished, paintings by old masters, and diaries of young dreamers. Over the years, the state's libraries, museums, archives, and historical and genealogical societies have diligently collected, preserved, and made accessible special materials such as these to educate, entertain, and enlighten their communities. And among the stacks, storage rooms, and locked cases rests the raw material that will undoubtedly inspire future letters or books, tape recordings, handicrafts, photographs, artwork, and dreams.

In 1998, the State Library Commission made the task of making these treasures more accessible to the people of North Carolina one of its priorities. In essence, it sought to bring these treasures out of the stacks, open the storage rooms, and unlock the cases. The following year, the Commission appointed an Access to Special Collections Working Group (ASCWG) to begin the planning process by which this greater access would occur. ASCWG determined that the State's cultural institutions could best achieve this greater access by making use of digital technologies and the World Wide Web.

Seeking input from their colleagues, ASCWG held a Special Collections Leadership Conference in High Point, North Carolina, in March of 2000. Approximately one hundred twenty representatives of the State's libraries, archives, museums, and historical and genealogical societies gathered to review the progress of major digitization programs from around the country and to make recommendations for a North Carolina project. These representatives reached a consensus for statewide action that included the call for:

• Long-term vision for the project, as well as a process by which to execute it;

• Searchable Web Portal to present existing digital materials and their finding aids created and maintained by North Carolina institutions;

• Set of standards for digitization and access of digital materials;

• Statewide survey of cultural repositories to identify needs, priorities, and opportunities; and

• North Carolina's Department of Cultural Resources to take the leadership role in the project.

This statewide action has been addressed in a variety of ways. A survey was created, and all 100 North Carolina counties were visited by December 2005 (http://www.ncecho.org/travelog/travelog.asp). Ongoing surveying and a final report are currently underway. Continuing education programs have been developed and are held on an ongoing basis for cultural institutions (http://www.ncecho.org/conted/continuing_education_template.asp). Other initiatives include metadata standards, search and retrieval development, and K-12 initiatives involving partners with other statewide programs. In accordance with these initiatives,
these guidelines provide an overview of some major issues involved in digitization, some of the best practices being followed in the field, and "a set of standards for digitization and access of digital materials" for those institutions wishing to participate in the statewide program. Drawing upon information provided by leaders in the field of digitization, these guidelines will ask and attempt to answer some of the questions institutions involved in digitization will want to ask themselves or others. Knowing what questions to ask and when to ask them is a key to any successful effort. Answers to these questions can come from within the institution or organization, or from a variety of other sources. This guide is just one of the sources to consult when addressing key digitization issues. In addition, the guidelines do not represent a static document. As technology development creates new techniques, processes and concerns, the NC ECHO Guidelines for Digitization are revised to represent timely advice. If more extensive information is needed, there are links to national and international digitization literature at the end of each chapter. A Resources section at the end of the guidelines groups all documents and links in one place for easy access.

Are You Ready to Digitize Your Collections?

Cultural institutions face many challenges. They collect, preserve, and make their special materials available to the public, often with limited resources. Digitization appears to be yet another major project. Yet the long-term promise of digitization is compelling to even the most "challenged" institutions. This new project, however, involves more than simply setting a scanner in the stacks and keeping it humming. Traditional practices form the superstructure of any digital project: basic preservation techniques, good descriptive cataloging, and standard arrangement and description must be performed before the first digital image is created.

Digitization is not a "replacement activity," but rather an addition to traditional cultural repository techniques and procedures, and in many cases serves as an enhancement of them. Digitization does not necessarily mean starting from scratch; in many cases, it involves building upon work performed years and years ago. A review of digitization initiatives suggests that successful projects

• have support among institution administrators and boards,

• begin with an inventory and assessment of holdings (whether informal or formal) and their extant information management tools (finding aids, indexes, registration records, etc.),

• find allies among potential collaborators,

• understand and follow standards and best practices being used by other institutions,

• draft a plan that outlines work flow, staffing, a schedule of activities, and a budget, and

• start with a project that is "do-able," and celebrate early successes.

Before warming up the scanner or setting the digital camera tripod, institutions may wish to ask themselves some questions:

Why are we considering digitization?

Is it to answer the frequent queries of a group of users who send email every day, asking, "What have you scanned that I can see from home?" Is it to meet the expectations of board members or to have an answer the next time an administrator asks, "Have we thought about putting that on the Web?" Is it for publicity and public relations? Is it to help preserve fragile materials by reducing their handling? Is it to provide the greatest possible access to the treasures your constituents have entrusted to you for safekeeping?

Does our institution hold materials worth all of this effort?

Do you hold unique materials or are they the same items held by many other regional collections? Would your holdings be of interest to users searching from home?

Do we have administrative support for the project?

Does the boss know what you are up to, and does he or she think it's a good idea? Is the board aware of the resources required? Are the different divisions, branches, or sections of the organization willing to collaborate to make this digitization project successful?

Do we have the financial and technical support available to sustain a digitization project?

Where is the extra money going to come from? Who is capable of fixing the inevitable glitches that accompany any project requiring wires, plugs, and a bit of electricity? Who do you have on staff that can "mark-up" the online collection so that it can "speak" to search engines? Can you find appropriate training opportunities, and, once again, where is the extra money going to come from?

Do we have the copyright to the materials we wish to digitize?

Do you hold permission letters, or use agreements? Can you get them? Is there any way to determine who holds copyright on your materials?

Are we willing to commit our institution to the long-term maintenance of our online creation?

Who will answer the reference questions generated by the online collection? Have you factored in the migration of the digital exhibit into hardware upgrades? Have you assigned space and made arrangements for the preservation of the backups and archived digital materials created by the project?

If you know why you are going to digitize and are content with those reasons, if you hold materials worth the effort and have the support of administrators, if you have the financial and technical underpinnings, and if you are committed to sustaining the effort for the long haul, then you may be ready to begin the digitization process--but creating digital images is still a few steps away.

Collaboration

Cultural institutions have long known the value of collaboration: interlibrary loan, traveling exhibitions, joint conservation centers, and consortial disaster plans demonstrate both the willingness to collaborate and the advantages of doing so. The advent of new technologies has extended
and reinforced many extant collaborative undertakings of cultural institutions. By their very nature, digital projects not only benefit from collaboration through the sharing of resources and expertise but also lend themselves easily to collaborative undertakings. Digitization reduces the distance between repositories to a keystroke, eliminates barriers erected between different types of research materials, and reunites separated collections.

Online collaboration offers great promise for the users of cultural materials. Through this collaboration, museum and library collections can be consulted simultaneously. For example, items at the Outer Banks History Center and the Mountain Heritage Center, with the entire breadth of the state physically separating them, share the promise of easily being consulted from one place, the researcher's home. The Internet can link collections never before brought together and virtually reunite holdings that may have been separated, all to the user's benefit.

Equally important, perhaps, is the potential interaction of the state's many excellent small, often volunteer-run collections with the state's major repositories. As a whole, the smaller institutions constitute the largest holders of cultural information in the state of North Carolina. Current surveying results estimate that there are over 900 individual repositories across North Carolina's 100 counties. Many of these small to mid-sized institutions have collections that are at risk for a variety of reasons, particularly preservation and conservation concerns. In addition, many collections have limited or no public access, making them essentially hidden to the public at large. Digitization can dramatically change the visibility and accessibility of these collections. It holds the promise of greatly expanding the state's collective cultural knowledge.
These smaller institutions hold the history of local and regional North Carolina, and it is often to the local and regional collections that schools look to build educational units for their curriculum and to stimulate young students to study history, anthropology, science, literature, and myriad other subjects. They do it by first looking, quite literally, in their own back yards. Many schools across the state search for sources for local history and find the process frustrating. Digitization initiatives offer solutions to these problems.

The larger institutions within the state, many of which are nationally and internationally recognized, often have greater technological, fiscal and staffing resources. Many have begun digitization or are well into the process. When smaller institutions begin to plan for digitizing their collections, they may want to collaborate with their colleagues at the larger institutions. Larger institutions may, in turn, wish to reach out to smaller, local institutions in order to expand their intellectual base. Often holdings at one institution can be linked to holdings at another, or institutions can share in the development of a digital project built around a particular concept. Small institutions can learn from the larger institutions' practices and successes, while contributing valuable insight on content and organization as well as a reality check for technical experimentation.

Standards and Best Practices

Digitization involves a myriad of standards and best practices that inform digital production and access. While standards and best practices are not new to cultural heritage institutions, the nature of digitization makes adherence an imperative. Making decisions about which standards to follow and which practices are really best can be a daunting and overwhelming task. In addition, digital standards are more fluid than traditional standards. Often, they
must be reshaped to quickly assimilate new information services. This fluidity has indeed created a shift in the way we understand the term "standard." Yet, fluid or static, standards make it easier for everyone to use information. Technical standards are generally developed by a process of voluntary consensus. As those elements of a digital project (the technology, the software, hardware, cataloging standards, etc.) are most often in a continual state of flux and are likely to remain so, consensus can be difficult to reach. Unlike manual practices which have been standardized for years in libraries, the manual practices of museums and archives often relied on idiosyncratic and local processes. As a result, technical standards in archives and especially in museums have been notoriously difficult to establish and have lagged behind the standards of uniform practice adopted in libraries. Further complicating the picture is the fact that the "pioneer nature" of the digital world encouraged non-standardized practices to proliferate, resulting in local collections exercising their creativity. Today the shifting landscape of digital practice, where revision and conversion dominate the horizon, where practices appear and disappear, and where everyone recommends their favorite solution, can create great confusion for cultural heritage professionals considering digitization. Many library and museum managers have elected to simply wait out the confusion, expecting the dust to eventually settle. They wait for a definitive manual of procedures that will miraculously support their home-grown practices. Others forge ahead informed by a plan of action gathered from their understanding of current practice, exposure to the literature of digitization, and a strong support network. If institutions are to reap the benefits of digitization and compete with the rapid commercialization of cultural collections, it is clear that the latter is the preferred course. 
While standards are constantly under revision and can restrict creativity and innovation, there also exist what are referred to as "best practices." Best practices (often the germinator of standards) allow managers to pick and choose among the many practices in use today and to evaluate how those practices might be a good fit within the context of their institutions. Today, best practices guide most of the processes of developing digital projects. Accompanying those best practices are the advances in technology over the last few years. A range of new hardware and software options has greatly simplified digitization efforts. Technology is now more affordable for institutions with limited budgets and does not require extensive technical training to implement and maintain. New technologies and well-developed best practices have enabled many collection managers in smaller institutions to pursue digital solutions to deeper indexing, to access to images, to preservation of fragile collections, and to overall improved retrieval of materials.

Conclusion

Digitization carries great promise for the caretakers of the cultural heritage of North Carolina and for the many people who are interested in its treasures. Running the scanner, snapping the digital camera, or banging the keyboard, however, needs to follow an institutional assessment of support and resources and a careful planning process. If this process leads to a digital project, then that digitization will build upon the conventional practices that provide the foundation for our institutions. Digitization does not replace these practices; it works in conjunction with them. While digitization enhances conventional access and preservation practices, it carries with it the promise of greater interaction between
collections. This is true only if participating institutions collaborate early to realize this future goal. Best practices and standards are perhaps the best tools to ensure that the collections of separate institutions "speak to each other" in the virtual world, bringing greater value to their users. It is with the users in mind that the North Carolina State Library Commission, its Access to Special Collections Working Group (now the NC ECHO Advisory Committee), and its many partners throughout the State of North Carolina began their work in 2000. Scholars and students, hobbyists and businessmen may consult that long-ago letter or book, that tape recording or handicraft, that forgotten photograph or major work of art, never knowing the first thing about metadata or standards, resolutions or work flow charts. They only know that they've found what they were looking for and that they are happier and better informed for it.

These guidelines are structured to help you successfully initiate digital projects. Chapters 1 and 2 (Project Planning and Selection) provide direction on the essential initial stages of a digitization project. Chapter 3 (Legal Considerations) uncovers some of the major components of United States copyright law as it pertains to digitization, including suggestions and recommendations for activities to undertake to ensure compliance with the law. Chapter 4 (Digital Production) outlines the details for creating digital surrogates of your materials. Included are decision-making matrices for hardware and software as well as the technical standards endorsed by NC ECHO. A new section of that chapter deals with the challenges of audio digitization. Chapter 5 (Metadata) introduces the concept of metadata and outlines standards for constructing appropriate and adequate metadata to accompany your digital images and assure access to your materials in the online environment. Chapter 6 (Digital Preservation) introduces issues about sustainability and long-term persistence of your digital project, including recommendations about storage practices and media. Chapter 7 (Presenting Your Digital Project) covers issues of web design and accessibility to ensure that the hard work of digital production and metadata is not lost in poor presentation on the Internet. Chapter 8 (Targeting the K-12 Audience) deals with the specific issues that are important in creating digital projects that will be used primarily as an educational resource. Chapter 9 (Project Evaluation) discusses the components of evaluation to take into consideration throughout the life of a digital project. Finally, Chapter 10 (Project Management) discusses the very real impact of digital projects on institutions from a management perspective. This chapter includes sections on workflow and staffing, training, timelines and objectives, physical facilities, and disaster preparedness.

Taken as a whole, the Guidelines for Digitization attempts to offer insight into the many areas that emerge as a digital project is underway. Careful planning and management are essential, but without a clear understanding of the various elements of a digital project, institutions are sure to hit many potholes along the way. These guidelines seek to help you avoid the most common ones.

For Further Reading

A Framework of Guidance for Building Good Digital Collections, Digital Library Forum, Institute of Museum and Library Services, available at: http://www.niso.org/framework/Framework2.html

Kenney, Anne R. and Oya Y. Rieger. Moving Theory into Practice: Digital Imaging for Libraries and Archives. Mountain View, CA: Research Libraries Group, 2000.

Moving Theory into Practice: A Digital Imaging Tutorial, Cornell University Library, available at: http://www.library.cornell.edu/preservation/tutorial/contents.html

Smith, Abby. “Why Digitize?” Washington, D.C.: Council on Library and Information Resources, 1999. Available at: http://www.clir.org/pubs/reports/pub80-smith/pub80.html

CHAPTER 1 PROJECT PLANNING

These guidelines provide a detailed examination of the many aspects of creating and maintaining a digital project. Many of the components of a digital project need to be discussed before the scanner is turned on, the tripod is set up, a single element of metadata is written, or a reference question is answered. Project planning forms the core of a digital project because it addresses each aspect of digitization and its impact on your institution. This chapter will provide an overview of the issues that need to be considered in project planning, with specific references to more detailed information provided in other sections of the Guidelines for Digitization.

Before the selection process, the purchasing of hardware and software, and the assignment of staff, planners of digital projects must do just that -- plan! And planning for digitization projects involves an assessment of several factors as well as all the foresight that can be mustered. Know your strengths. Know your weaknesses. Determine where your opportunities lie and how you can best take advantage of them, adapting where you can to meet the project's challenges. Know thyself, but also know thy users, their expectations, and their needs. There are several points to consider that will provide more assurance of success:

• Understanding the institution’s goals and missions and where the digital project fits into those goals and missions

• Envisioning exactly what the digital project will be, what its component parts are, and which areas can be developed now and which later

• Assessing an institution’s existing resources against those that need to be acquired

• Establishing the standards that will be adhered to in conducting the digitization project

• Beginning the documentation process to assure that decisions are well-communicated

• Planning the implementation of the project, including milestones and a timetable

• Monitoring, evaluating, and providing in-project direction

This chapter will help you in this process by discussing seven general areas that should be addressed in a digitization project's planning process: goals, objectives, and scope of the project; definition of intended audience; analysis of collection materials; needs analysis; cost assessment and impact on institution; development of standards and processes; and project evaluation.

Goals, objectives, and scope of the project

You have to know where you are going in order to get there. An explicit statement of a project's goals ensures that all personnel share the same "destination" and can measure their accomplishments. Determining the scope of a project gives it focus and guarantees greater impact and more efficient use of resources. The first part of this process is to develop a keen understanding of the overall goals and mission of your institution. Digital projects divert a great deal of resources and require a lot of attention. If a digital project
will not help your institution to meet those goals and objectives, then it may not be the right choice at this time. Once you have determined that the digital project is going to be part of the overall institutional goals, you want to determine the individual project’s goals and objectives. Your project may have a specific, single goal, such as the development of a Web site to support a special exhibit or event, or the content presented in the digital project may be used for a variety of purposes and applications. The latter, "use-neutral," approach assumes that future use of the material will be varied. Use-neutral digitization aims for longevity, high quality, and as many uses as imaginable. For example, a use-neutral digitization project would include the creation of Web sites to support the interests of scholars, hobbyists, and schoolchildren. It also would provide digital master duplicates for future Web projects or traditional publishing. While the use-neutral approach to digitization can be more costly at first, requiring more planning, more training, and more storage space, it may be the most cost-effective strategy in the long term. It is certainly the best approach for the preservation of sensitive originals (in that the originals do not have to be re-handled or re-scanned for each new, specific digital creation). Use-neutral digitization is one aspect of a "scan once methodology" discussed in the chapter on Digital Production. These goals and objectives will allow you to determine the scope of this particular project. Even if you decide on a use-neutral approach to your digitization project, you should not try to do everything at first. Determining the scope of your project will allow you to undertake an achievable digital project that will serve as a foundation for a digital program (for more on transitioning from digital projects to digital programs, see Final Thoughts).
Determining the Audience

Part of setting the goals, objectives, and scope of the project is to identify the dimensions of your intended audience. In the use-neutral project described above, it is clear that even a small digital project can be geared toward several different audiences. Outlining the intended audience(s) will provide elementary information in the planning of the project as it impacts both the selection of what to digitize (see Selection) and design of the online presence of your digital project (see Presenting Your Digital Project). Identifying potential users will help to define your digitization strategies. This is not something that should be determined in isolation. The project leader should gather information and feedback from various members of the organization, patrons, and experts, who will help you to identify and make decisions regarding a digital project. It is also clear that this audience determination will be guided by your institution’s goals and objectives. It may be that the decision to move toward digitization is an attempt to expand your user-base beyond your traditional patrons. Or it could be to better serve the existing patron-base. All of this should be clear in both your institution’s mission and the project’s goals and objectives.

In thinking about the audience, it is important to think not only of intended users of the digital project, but to make note of potential ones. One of the most important lessons learned early on by digital project producers is that while you may intend to reach a specific audience, placing digital material on the Web brings a number of unpredicted users as well. The impact these unpredicted users may have on an institution can be lessened by giving some initial thought to who those users might be and how they will be handled as they begin to make contact with the institution.

Analysis of collection materials

Planners will want to survey their holdings to determine which of their collections will best meet the goals they have established. This initial survey may be made with the help of questions such as the following:

• Will we choose documents, photos, slides, negatives, objects, or oversized materials to digitize?

• Will the materials be a mixture of formats (such as manuscripts, maps, photographs, etc.)?

• How much material will need to be captured to a digital format?

• What is the condition of the material?

• Will items require special treatment or handling?

• Will the material be digitized from the original or from a surrogate (e.g., a photograph of the object or photocopy of the fragile manuscript)?

• Does the material have a physical relationship to something (e.g., to a mount, an album page, or a pedestal)?

• How much time will be involved in physical preparation of the material to be digitized?

Combining both the project goals and this analysis will provide an assessment of the selection process that is discussed in more detail in the chapter on Selection. It should also be noted that this analysis will affect decisions of hardware and software purchasing and other associated costs in preparing the materials for digitization.

Needs Analysis

Once these overview aspects of the digitization project have been established, and it is clear that a digital project will meet your needs and an audience has been determined, the next step in planning a digital project is to take stock of your environment and resources to assess needs. Typically, this kind of analysis achieves several goals. These include but are not limited to:

• determining funding sources,

• assessing staffing required, and

• examining the extent and type of technical support needed.

To conduct this analysis, it is helpful to ask specific questions such as:

Equipment:

• Do you have the hardware to digitize?

• Do you have the software to digitize?

• Do you have adequate storage for master digital images?

• Do you have the software and hardware to provide access to the digitized collections and documents?

• Will your equipment provide the speed of access needed for large files?

• Will you be able to upgrade equipment as newer technologies come online?

Materials documentation and conventional practices:

• Do you have sound materials documentation, or will you need to substantially re-work your collection data?

• Do you have appropriate metadata for the collection or can it be derived quickly from previous work on the collection? (i.e., do you have document identification, acquisition records, provenance information, indexing?)

• Do the formats for the digital capture, storage, preservation, metadata, and access meet institutional, state, national, and international standards?

Administration and staffing:

• Have you considered the scale of the project and how it will affect routine work flow?

• Does the cost of the digitization project fit within the planned budget? Is the project worth the cost? Will additional funds be needed to complete the project?

• Do you have enough time to complete the project?

• Do you have sufficiently skilled staff (including those who understand the technical needs of digitization) to effectively complete the project?

• Do you have the means to train staff and keep their training current?

Audience and patrons:

• Will the digitized materials meet your audience's needs?
• Will increased access to some materials have an effect on your institution’s public services? How will you handle increased interest?

By conducting this kind of analysis, you will be able to refine the project’s goals and objectives. These questions also allow you to avoid some of the common pitfalls of embarking on a digitization project.

Cost analysis and impact on institution

Digitization projects are exciting to undertake. Often they represent a change of pace from the day-to-day work that you do, and there is a lot of room for creativity! However, it is important to understand up front what a digitization project “costs” and what the impact on your institution will be. Below is a table that covers the different kinds of “costs” that exist with digitization projects. Many of these expenses will be things that you already have, as discovered in your needs analysis. However, remember that resources that are allocated to a digital project are still an expense because they are diverted from their current work. This is especially true with staff time, which is often overlooked as a “cost” for a digital project.


Category | Cost specifics | Comments/Options

Hardware | Digital capture equipment, computer, and storage | Will you use existing or purchase new?

Software | Digital capture, image manipulation, design, and access | Will you use existing or purchase new?

Staff wages | Project management, selection, preservation and conservation, digital capture and image processing, metadata creation, web design, quality control, and evaluation | Staff wages should include not only those new staff that are hired but an estimation of the allocation of existing staff time devoted to the project. In particular, don’t forget the administrative details that the project manager will need to handle; this is often a “hidden cost.”

Training costs | Trainer and staff time in training | Training primarily has to do with the time spent by both internal staff and new staff; the project manager should be trained in all aspects of a project while new staff can be trained for specific roles.

Presentation and preservation costs | Server space, data migration and long term preservation | Assess how much server space is available and will be needed to host the digital project; can you use existing or purchase new? Migration and long term preservation is also a cost that needs to be assessed.

Material costs | Preparation for digitization, conservation | Typically expressed in time, but conservation work may need to be outsourced and there can be associated supply costs.
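The point that existing resources still count as a cost can be made concrete with a small tally. The following is only a sketch: the `CostItem` structure, the `summarize` function, and every dollar figure are hypothetical placeholders chosen for illustration, not figures from this guide.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    category: str      # a category from the cost table (e.g. "Hardware")
    description: str   # what the money or time is spent on
    amount: float      # estimated cost in dollars (placeholder figure)
    existing: bool     # True if the resource is already owned or on staff


def summarize(items):
    """Return (total cost, new outlay, value of diverted existing resources)."""
    total = sum(item.amount for item in items)
    diverted = sum(item.amount for item in items if item.existing)
    return total, total - diverted, diverted


# Hypothetical example values, for illustration only.
items = [
    CostItem("Hardware", "new scanner", 1500.0, existing=False),
    CostItem("Staff wages", "existing staff time given to the project", 4000.0, existing=True),
    CostItem("Training costs", "staff time spent in training", 500.0, existing=True),
]

total, new_outlay, diverted = summarize(items)
print(f"Total: {total:.2f}  New outlay: {new_outlay:.2f}  Diverted: {diverted:.2f}")
```

Keeping the diverted amount separate from new purchases makes the "hidden" staff-time cost visible in the budget rather than leaving it out.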

Adoption of standards and processes

The determination of image capture specifications is one of the first considerations addressed by digital image managers. It is also one of the most complex decisions to be made, affecting the ultimate size of the digital collection, and influencing all decisions concerning equipment, storage, presentation, and staffing. Many of the standards are addressed in other sections of this guide, such as Digital Production and Metadata.


Planners will want to explore the actual processes involved in the project. This will assist them in developing a workflow plan and help them understand how following standards for digital production and metadata will influence the time and ultimately the cost of the project. As standards and best practices are adopted and as the processes and workflow are refined, project managers need to be flexible. Digitization requires constant adjustment to keep pace with changing technology and standards of practice. Be prepared to be "under development" indefinitely.

Documentation

The importance of making all these decisions will be lost if they are not recorded in the planning phase to provide the backbone of the documentation of the digital project. Documentation allows the project to be managed and worked effectively and efficiently. It also prepares the project for the migration and sustainability issues that electronic resources of all kinds later face. The continuity provided by documentation will be of benefit throughout the life-cycle of the project, as staff come and go, and in planning future projects. Individual tasks should be clearly defined, and documentation should provide that information. Aspects that need to be included in the documentation are:

• Project goals and missions
• Selection criteria to be used and items selected
• Digitization and metadata standards chosen
• Workflow and tasks to be performed
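One way to keep these four aspects together is a single structured record that can be saved alongside the project files. The sketch below is an assumption about how such a record might look: the `ProjectDocumentation` class, its field names, and all sample values are hypothetical and are not prescribed by this guide.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ProjectDocumentation:
    goals: list                 # project goals and missions
    selection_criteria: list    # criteria used to choose materials
    items_selected: list        # identifiers of the items chosen
    standards: dict             # digitization and metadata standards chosen
    workflow: list              # tasks to be performed, in order
    decisions: list = field(default_factory=list)  # dated log of decisions

    def record_decision(self, date, decision, reason):
        """Append a dated decision so it is not re-argued or contradicted later."""
        self.decisions.append({"date": date, "decision": decision, "reason": reason})


# Placeholder values, for illustration only.
doc = ProjectDocumentation(
    goals=["Provide online access to a photograph collection"],
    selection_criteria=["audience", "preservation", "value"],
    items_selected=["item-0001", "item-0002"],
    standards={"capture": "(see Digital Production)", "metadata": "(see Metadata)"},
    workflow=["prepare materials", "digital capture", "quality control", "metadata creation"],
)
doc.record_decision("2007-03-01", "Capture masters before derivatives", "Keeps one workflow for all items")

# Plain-text JSON is easy to store with the project and to read in later years.
text = json.dumps(asdict(doc), indent=2)
print(text)
```

A plain, human-readable format is used here on the assumption that the record must outlive the staff and software that created it.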

How will you document your project? Documentation strategies are an essential aspect of the planning process, as details and decisions made during the planning process can guide the project. Documentation ensures that decisions are recorded to avoid repetition or conflicting solutions. This documentation will also guide you in the sustainability of the project over the long term (see Digital Preservation for more information).

Evaluation

The last step of the planning process should include an outline of how you will evaluate the digital project against your goals or objectives. From the very beginning, digitization planners will want to think about assessment and evaluation, asking:

• How will the project be assessed?
• Should assessment tools be built into the project?
• Can a quantitative as well as qualitative assessment be taken throughout the project?

This evaluation stage will allow you to re-examine the choices made in the planning process, make necessary adjustments, and apply the lessons learned to implement digitization more successfully the next time. Thorough evaluation of the project -- throughout and at the end -- will provide ways to refine your current project as well as inform you on future projects (see Project Evaluation for more information).


Conclusion

Even the best-laid plans can be upset by unexpected obstacles and problems. No amount of planning will cover every exigency, so plan and be prepared to re-plan. At some point, all plans must lead to the creation of that first digital image for the work to actually begin. Successful digitization projects are the products of successful planning. While it is tempting to plunge right in, a more methodical approach will save time, effort, and resources in the long run. It will also help make certain that projects maintain focus. By maintaining focus, projects will more than likely meet their goals. Planning certainly will help when it comes time to choose which materials should be digitized.

Further Reading

Colet, Linda Serenson. “Planning an Imaging Project,” prepared as one of the Guides to Quality in Visual Resource Imaging, July 2000 for the Research Libraries Group (RLG) and the Digital Library Federation (DLF), http://www.rlg.org/legacy/visguides/visguide1.html.

NDLP Project Planning Checklist, National Digital Library Program, Library of Congress, available at: http://lcweb2.loc.gov/ammem/prjplan.html

Planning Digital Projects for Historical Collections in New York State, New York Public Library, available at: http://digital.nypl.org/brochure/

CHAPTER 2 SELECTION

How do we choose to take from our past certain things to remember? How do we decide which of life's many stories we wish to tell? It is a mysterious process, and one that occurs daily in museums, archives, and libraries. The documents, photographs, and objects that are the evidence for our stories, often come to their acid-free, carefully-controlled environments willy-nilly, by happenstance as much as by planning. They have been absent-mindedly or quite determinedly winnowed by their creators, his or her family, and by the erstwhile field mouse. They have been evaluated by graduate students, yard-sale goers, file clerks, and scholars, as well as by the keepers of family heirlooms. For most institutions, the creation of online collections will mean one more series of choices, one more set of evaluations, one more group of interpreters. Can these individuals move beyond subjective processes when making decisions about what to select for digitization? Probably not. However, they can ask certain questions that will more objectively guide their selection process. This chapter will discuss the issues of selection and help you define your selection criteria. Once the initial planning phase has been completed, it is time to select materials for digitization. The analysis of collection materials done during the project planning phase should provide a strong foundation for determining your selection criteria. These selection criteria apply to both the preliminary selection of collections and then the more detailed selection of material within collections, including both the physical and intellectual aspects of selection that need to be considered. Each category covers both the macro and micro selection processes. Determining your Selection Criteria How does one choose the best materials to digitize? 
Acknowledging that content selection is most often driven by subjective responses, the following provides some framework to help you make those selection decisions more objectively. Below are the central elements to be considered in selection with some questions to help you assess material. These central elements focus on seven areas that form the framework for your selection criteria: audience, impact on your institution, intellectual control, intellectual property rights, preservation, technical considerations, and value. Each section provides guidance and questions that should help you to think through defining your selection criteria.


Audience

• Who are the expected users? Who is the intended audience?
• Will the material be of interest to a large public?
• Will the original materials be appropriate for multiple levels of users or a specific audience?
• Will the project make materials available to a population that otherwise would be unable to use the collection (e.g., disabled population, home-bound, or international users)?

Impact on your institution

• Is the scope of the project within the range of your staffing and budget or will you need additional funding to successfully complete the project?
• Will the product have immediate utility?
• Will digitization increase the demand for the materials or for other, related materials (and if so, do you have staff to handle the demand)?

Intellectual control

• Will the digitization provide better indexing and better bibliographic control of the material?
• Will digital capture enhance use through a contextual presentation?
• Will the project raise the knowledge base of staff about the materials within the institution?

Intellectual property rights

• Do you hold copyright on the materials you plan to digitize? If not, do you know who holds copyright, and can you get the copyright holder to grant permission for its digitization? (For more information on this, see Legal Considerations)

Preservation

• Will the digitization aid in the preservation of deteriorating materials by diverting resources to their conservation or decreasing the wear and tear on originals by providing a digital surrogate?
• Will the materials hold up under the handling and processing required by digitization? Will special handling of the material to prevent damaging it be necessary? Will that special handling be costly? Do the materials require special technology considerations in order to digitize them without damaging them?

Technical considerations

• Will your knowledge of technology be sufficient to meet the needs of the material?
• Will your technology allow for quality reproduction of materials?
• Will the Web site have visual appeal online?

Value

• Does the project duplicate materials available at another repository or are they unique to your collection?
• Will the resulting digital collection have enduring value?
• Will the project make the content more broadly available?


• Will digitization give the collection "added value"?
• Will digitization improve legibility of originals?
• Will the project provide educational material that can be used in resource-based learning?
• Will the digitized collection have the potential to attract funding, either through external grants (i.e., it meets the criteria of funding agencies) or in terms of raising revenue (i.e., is it marketable)?
• Will the project generate institutional prestige?
• Will the project be in keeping with policies at the institutional level?

Harvard University has created a decision-making matrix that provides step-by-step guidance through the selection process. This can be a good place to start when defining your selection criteria and incorporates many of the ideas outlined above. Selection for Digitizing: A Decision-Making Matrix, Harvard University Libraries, available at: http://preserve.harvard.edu/bibliographies/matrix.pdf

An Example of the Selection Process:

The small and modestly funded Historical Society of Lower Turkey Fork in rural northwest North Carolina has a collection of some 2,500 photographs donated by a local auto mechanic who shot photographs of his family for some 25 years. The Society repository also has a collection of about two linear feet of manuscript papers and photographs of a turn-of-the-century land developer who negotiated the purchase of many tracts of land from the local Indian tribes. The third largest collection is an oral history collection of audiotapes (150 of them) and photographs of soldiers who fought in WWI all gathered in the 1940's and no longer associated with any documentation. A fourth collection is a group of approximately 150 glass-plate negatives of life on Lower Turkey Fork in the late 1890's shot by a Belgian anthropologist and photographer who studied rural farm methods. Also in the collection is a journal of his activity, photographic practices, and encounters while in Turkey Fork. Remaining collections include an assortment of letters, documents and photographs, and 500 objects including art, historical objects, and four turn-of-the-century horse carriages kept in a barn behind the historical society.

The staff wants to digitize their holdings but wonders what to select for digitization. While opinions may differ, the most likely candidate for first selection might be the Belgian photographer's work and his journal. The collection size is modest, and it is likely to be of wide-ranging interest. The collection is currently inaccessible because of the fragile nature of the glass plates. Audience, preservation, and value have all informed this choice. The mechanic's collection is too large and not likely to appeal to a general audience beyond Turkey Fork. Transcription of the WWI soldiers' tapes would be very labor intensive, and their age and composition would require special handling. Also, they have problematic documentation. The scattered letters, documents, photos and objects appear to be too unfocused to be of sufficient interest at this time. They chose the Belgian photographer's work for digitization.

Certainly, you could expand on this analysis and you may choose differently based on your perception of the selection criteria. Remember, though, that applying your selection criteria requires some objective guides that go beyond initial response.


Documenting your Selection Criteria

As part of the selection process, you should record the criteria that you are using to choose materials for digitization. This documentation process serves several purposes. First, it allows you to revisit the original materials to ensure you have consistently applied the selection criteria. Once the digitization project is underway, you may decide to change individual items selected. Well-documented selection criteria will guide any changes you make in your selection and remind you of the decisions made during this process. In addition, your documentation allows for more productive teamwork because all members of the team will follow the same protocol. Finally, documentation will provide a framework for the next digitization project, allowing for consistency across digitization projects.

Conclusion

One of the most important services performed by archives, libraries, and museums is selection, choosing from the many products of the living those few items which will best tell their stories. Digitization means that cultural caretakers will find themselves conducting another series of selections among their collections. Every institution knows its own audience best and thus will have its own set of selection criteria based upon its audience's needs.

Further Reading

Columbia University. “Selection Criteria for Digital Imaging Projects” available at: http://www.columbia.edu/cu/lweb/projects/digital/criteria.html

"Guidelines for Selection" compiled by P. Ayris (UCL) as part of the joint RLG and NPO Preservation Conference, Warwick, 1998 http://www.rlg.org/preserv/joint/ayris.html

Harvard University. “Selection for Digitizing: A Decision-Making Matrix” http://preserve.harvard.edu/bibliographies/selection.html

Hazen, Dan, Jeffrey Horrell, and Jan Merrill-Oldham. Selecting Research Collections for Digitization, Council on Library and Information Resources, 1998. Available at: http://www.clir.org/pubs/reports/hazen/pub74.html

Oxford University. “Assessment Criteria for Digitization” http://www.bodley.ox.ac.uk/scoping/assessment.html

CHAPTER 3 LEGAL CONSIDERATIONS

This chapter outlines the important legal issues a cultural institution should consider when beginning a Digitization Program, offers suggestions and recommendations for activities you can undertake to ensure compliance with the law, and provides resources for further investigation of this complex topic. Intellectual property rights management is the most pressing legal concern for institutions proposing to digitize their collections. Intellectual property rights include copyright, trademarks, patents, publicity rights, privacy, and trade secrets, but it is copyright that will mostly concern this audience. Today, copyright protection begins as soon as the original work is fixed in a tangible medium of expression. It is no longer necessary to register or to publish the work to copyright it -- unpublished work is now fully protected by copyright. A copyright holder controls the rights of reproduction, modification, transmission, display, and performance of the copyrighted material, whatever its format. Digitization can involve all of these activities; therefore, copyright is an important concern when starting a digitization project. Cultural institutions are primarily interested in two issues surrounding copyright: how they can legally digitize material in which they may not hold the copyright, and how they may ensure that no one else can use the materials they have digitized without their approval (tacit or otherwise).

Definitions of Copyright

The purpose of copyright law, as mandated in the Constitution, is to "promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries" (Article 1, section 8 of the United States Constitution). Copyright protection arises automatically when an original work is fixed in a tangible medium of expression (registration or publication is not required), and it allows the copyright holder to prohibit or permit certain uses of the work. Prohibited use of material is referred to as "infringement" and is actionable. Because copyright is a property right, the ownership may change many times over the course of the property's life. In addition, physical ownership does not translate to copyright ownership. For instance, just because a collection is in a repository does not mean that repository holds the copyright to the items in that collection. Copyright often still resides with the author, the publisher, the photographer, the artist, or a combination of these, to name a few of the possibilities. Because publishing an item on the internet involves at least the rights of reproduction, display, and transmission, copyright ownership must be determined before the item is included in a digitization project. To avoid copyright infringement, permission to digitize and present online should be obtained in writing from the copyright holder for material that is still protected under copyright law. In order to address these issues, institutions need to determine the copyright owner, the duration (time) of the copyright and the classification of the work. Copyright rules can vary from format to format, which can affect the copyright status of the materials and your ability to legally digitize them.


To complicate matters, sometimes works may have two or more copyrights. For example, a work of art may be copyrighted and a photograph of the work of art may also be copyrighted AND the book in which the image appears may be copyrighted. Multiple copyrights are often difficult to determine, but, if there is indication that they exist, permission should be obtained from all holders of copyright. Tools exist, though, to help you navigate the various layers of complex rights situations. For more information about the United States Copyright Law, see:

United States Copyright Office, available at: http://www.copyright.gov/

Legal Information Institute, Cornell Law School, available at: http://www4.law.cornell.edu/wex/index.php/Copyright

Library of Congress Copyright Office, Copyright Basics, available at: http://www.copyright.gov/circs/circ1.html

What Can Be Copyrighted? The first challenge to copyright is to understand what can be copyrighted. Any original work, excluding federal documents created by a federal government employee within the scope of their employment, may be copyrighted. This may include any of the following:

• Art (pictorial, graphic, textual, and sculptural) work
• Literary works
• Dramatic works
• Musical works
• Sound recordings
• Motion pictures and other audiovisual works
• Pantomimes and other choreographic works
• Architectural works

“Orphan works” is a term for those items that cannot be cleared of copyright but for which there is no recourse for discovering copyright ownership. A good source of current information concerning copyright and “orphan works” is Copyright and Art Issues (http://darkwing.uoregon.edu/~csundt/copyweb/). Peter Hirtle discusses the real challenges that institutions face with orphan works in “Adopting ‘Orphan Works’”, available at: http://www.rlg.org/en/page.php?Page_ID=20571#article3

Copyright Term

One of the primary ways copyright of a work is discussed is the work’s copyright term. The copyright term refers to the length of time during which the copyright is honored. Copyright terms vary based upon the circumstances in which a work was created, whether the item is published or unpublished and other variables. For example, current federal copyright statutes determine that the copyright term for published works is life of the creator plus 70 years in most cases. For works of hire, anonymous works, or works that cannot be tied to a single creator’s life, the copyright term is 95 years from publication or 120 years from creation, whichever is shorter. Variables such as date of creation and copyright statutes make copyright terms more complicated, though, so copyright terms can


be of various lengths. For more details on copyright terms as applied in a digitization project, see below.

Copyright Issues for Digitization

There are three issues that dominate copyright questions for cultural institutions embarking upon a digitization project:

• Is the work in the Public Domain?
• Does my action fall under Fair Use?
• Am I respecting the Moral Rights of the creator?

The Public Domain The Public Domain is defined as “all entities, information, and creative works that are available for use by anyone for any reason without restriction.”1 A public domain work is a creative work that is not protected by copyright and which may be freely used by everyone. Reasons that a work is in the public domain and not protected include:

(1) the work was created by a U.S. Government employee in the course of their duties;
(2) the work was created before copyright laws were established;
(3) the author failed to satisfy statutory formalities to establish the copyright; or
(4) the term of copyright for the work has expired.

Works in the public domain may be used freely by anyone. It is assumed that many of the materials in special collections are old and in the public domain, but often this is not the case. Determining if the material you wish to digitize is free of copyright restrictions is a critical first step in the digitization process and linked to the selection process. In general, works published in the United States prior to 1923 are in the public domain. Works published later may be in the public domain but will require some research first. Generally, the copyright holder is the person who created the work. Since registration is not currently required to document this ownership, it can be difficult to determine copyright ownership. If, however, ownership of copyright is transferred to another party, as with publishing or film companies, there must be a written assignment. This type of copyright is then easier to trace. When the copyright term expires, the work is in the public domain. There are several resources available to track the various changes in copyright law, and to use as a tool in trying to assess the copyright status of an item. Laura Gasaway, Director of the Law Library and Professor of Law at the University of North Carolina, Chapel Hill, has created a chart to help determine whether or not any material in question is in the public domain. The chart can be found at http://www.unc.edu/~unclng/public-d.htm, and it is reproduced here as it existed in March 2007.

1 Zorich, D. “Why the Public Domain is Not Just a Mickey Mouse Issue,” NINCH Copyright Town Meeting, Chicago Historical Society, January 11, 2000. http://www.ninch.org/copyright/2000/chicagozorich.html


WHEN U.S. WORKS PASS INTO THE PUBLIC DOMAIN

DATE OF WORK | PROTECTED FROM | TERM

Created 1-1-78 or after | When work is fixed in tangible medium of expression | Life + 70 years (note 1) (or, if work of corporate authorship, the shorter of 95 years from publication or 120 years from creation (note 2))

Published before 1923 | In public domain | None

Published from 1923 - 63 | When published with notice (note 3) | 28 years + could be renewed for 47 years, now extended by 20 years for a total renewal of 67 years. If not so renewed, now in public domain

Published from 1964 - 77 | When published with notice | 28 years for first term; now automatic extension of 67 years for second term

Created before 1-1-78 but not published | 1-1-78, the effective date of the 1976 Act which eliminated common law copyright | Life + 70 years or 12-31-2002, whichever is greater

Created before 1-1-78 but published between then and 12-31-2002 | 1-1-78, the effective date of the 1976 Act which eliminated common law copyright | Life + 70 years or 12-31-2047, whichever is greater

Notes:

1 Term of joint works is measured by life of the longest-lived author.
2 Works for hire, anonymous and pseudonymous works also have this term. 17 U.S.C. § 302(c).
3 Under the 1909 Act, works published without notice went into the public domain upon publication. Works published without notice between 1-1-78 and 3-1-89, effective date of the Berne Convention Implementation Act, retained copyright only if efforts to correct the accidental omission of notice were made within five years, such as by placing notice on unsold copies. 17 U.S.C. § 405.

(Notes courtesy of Professor Tom Field, Franklin Pierce Law Center and Lolly Gasaway)

Another resource, created by Peter Hirtle at Cornell University, “Copyright Term and the Public Domain, 1 January 2007,” is available at: http://www.copyright.cornell.edu/training/Hirtle_Public_Domain.htm
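The rows of the chart can be read as a short series of tests on a work's dates. The sketch below expresses them in Python under several stated assumptions: the function names and parameters are hypothetical, dates are reduced to calendar years, and the special rule in note 3 for works published without notice between 1978 and 1989 is not modeled. It is an illustration of how the chart is organized, not legal advice and not a substitute for the chart itself.

```python
def last_protected_year(created, published=None, author_death=None,
                        notice=True, renewed=False, corporate=False):
    """Return the last calendar year of protection under the chart's rows,
    or None where the chart lists no term (public domain from the start)."""
    if created >= 1978:
        if corporate:
            # Shorter of 95 years from publication or 120 years from creation.
            terms = [created + 120]
            if published is not None:
                terms.append(published + 95)
            return min(terms)
        if author_death is None:
            raise ValueError("a life + 70 term needs the author's death year")
        return author_death + 70
    if published is None:
        # Created before 1978 but not published.
        if author_death is None:
            raise ValueError("a life + 70 term needs the author's death year")
        return max(author_death + 70, 2002)
    if published < 1923:
        return None
    if published <= 1977:
        if not notice:
            return None  # note 3: no notice under the 1909 Act
        if published <= 1963 and not renewed:
            return published + 28  # first term only
        return published + 28 + 67
    if published <= 2002:
        # Created before 1978, published 1978 through 2002.
        if author_death is None:
            raise ValueError("a life + 70 term needs the author's death year")
        return max(author_death + 70, 2047)
    raise ValueError("this case is not covered by the chart")


def in_public_domain(year, **work):
    """True if, by the chart's rules, the work is unprotected in the given year."""
    last = last_protected_year(**work)
    return last is None or year > last


print(in_public_domain(2007, created=1900, published=1910))
print(in_public_domain(2007, created=1970, published=1970))
```

The function raises an error instead of guessing when a needed date is missing, which mirrors the chapter's advice that works outside the clear cases require research first.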


Fair Use Cultural institutions may wish to digitize materials that are not in the public domain and whose copyright they do not own. In this case, they should examine whether the material and the way they wish to use it may be covered by fair use. Fair use is an exemption under U.S. copyright law that allows one to legally use copyrighted material without explicit permission of the copyright owner.

§107. Limitations on Exclusive Rights: Fair Use

Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include –

• the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
• the nature of the copyrighted work;
• the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
• the effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.

Educational use alone is not enough to constitute fair use. In today's fast moving digital environment many of the "fair uses" are under debate. What is "fair"? As long as the use is "fair," i.e. does not infringe on the rights of the holder of the copyright, the user is not likely to be in violation of copyright; however, continuous use, selling the item or any portion of it, and/or charging for use, is most likely beyond "fair use."

Misunderstandings often take place when the copyright of manuscript material comes under consideration. For example, would the 1862 diary of a Civil War soldier who died in 1863 still be protected by copyright? To most researchers’ surprise, before January 1, 2003, the answer was yes. Unpublished works created before 1978 did not enter into the public domain until January 1, 2003 at the earliest. For most manuscripts, the term of protection is life of the author plus 70 years, with January 1, 2003 being the earliest expiration date. Because terms run through the end of the calendar year, this meant that works created by an author who died before 1933 entered the public domain on January 1, 2003. If the death date of the soldier was unknown, the expiration of copyright would remain the same. Unpublished works where the death date is unknown are protected for 120 years from the date of creation or until December 31, 2002, whichever is later.
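The diary example can be checked with a few lines of arithmetic. This sketch assumes the rules as stated in this chapter for unpublished works created before 1978, and assumes that a term runs through December 31 of its final year; the function name `public_domain_date` is hypothetical.

```python
from datetime import date


def public_domain_date(created, author_death=None):
    """Date on which an unpublished work created before 1978 enters the
    public domain, using the rules described in this chapter."""
    if author_death is not None:
        last_year = author_death + 70   # life of the author plus 70 years
    else:
        last_year = created + 120       # death date unknown: 120 years from creation
    last_year = max(last_year, 2002)    # never earlier than December 31, 2002
    return date(last_year + 1, 1, 1)    # protection ends with the calendar year


# The 1862 diary of a soldier who died in 1863, with and without a known death date.
print(public_domain_date(1862, author_death=1863))
print(public_domain_date(1862))
```

Both calls give the same date because each computed term ends well before the December 31, 2002 floor, which is why the floor controls the result for older manuscripts.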


Fair use guidelines that exist on the web include:

• North Carolina State University’s Scholarly Communication Center’s Copyright Tutorial Series on Fair Use, available at: http://www.lib.ncsu.edu/scc/tutorial/copyuse/fairuse1.html

• Stanford University Libraries’ Copyright and Fair Use, available at: http://fairuse.stanford.edu/Copyright_and_Fair_Use_Overview/index.html

These sites should be consulted when analyzing material proposed to be digitized that may fall under the fair use category.

Moral Rights

Works of art and photographs may also enjoy moral rights. Although an institution may own copyright to a piece of literature or art, the creator of that work may retain a moral right to the work. These moral rights grant protection from derogatory treatment of the work. These rights are granted only to the original author (and cannot be transferred, although they can be waived) and are applicable only to works of visual art which exist in single copies or in multiples up to 200. The two statutory moral rights are:

(1) the right of attribution, meaning the right to have the author's name attached to or deleted from the work;

(2) the right of integrity, meaning the right to prevent mutilation or distortion of the work which would prejudice the author's honor or reputation.

For cultural institutions this means that it is essential to make sure the original artist’s name is linked with his or her creation (appropriate metadata use can help a great deal with this) and that works be used in their entirety where possible and not be amended (digital copies should not be significantly cropped or edited). It is also important to remember that not all moral rights fall within U.S. copyright laws. In other countries such as France, moral rights may survive even after copyright expires. Moral rights require that in digitization, as in other activities, the creative work being reproduced be treated in a manner that does not compromise its artistic integrity beyond recognition, and that it not be used in a context that may be objectionable to the creator or the creator’s estate.

Permission to Use Copyrighted Material

To use non-public domain material legally when that use is not a fair use, you must obtain permission from the rights holder(s). If you do not know who holds the rights to the material in question, you may find the answer online through one of the following resources.

• The Getting Permission Page by Georgia Harper, Office of General Counsel, University of Texas System, available at: http://www.utsystem.edu/OGC/IntellectualProperty/PERMISSN.HTM

• The Copyright Clearance Center, available at: http://www.copyright.com

• U.S. Copyright Office, available at: http://www.copyright.gov


When asking for permission to use copyrighted material, you should describe clearly:

• the work you want to use (with a copy if possible);

• the scope of the project;

• where and how you intend to use the work (e.g., names of key contributors, approximate size of the project, URL, anticipated life of the project, how many users may access it, and how it is going to be distributed);

• any future use you envision; and

• the specific rights you seek (e.g., presentation, publication, general use, etc.).

In addition, you should ask the copyright owner for instructions on the wording of credit lines, the copyright notice related to their material, any other conditions they might have, and any fees that might apply. You should also ask for confirmation that they have the authority to grant permission. If they cannot confirm their authority, request that they direct you to the appropriate rights-holders. You may ask them to assure you in writing that your use will not, to their knowledge, infringe the rights of any third party.

The McCain Library and Archives at the University of Southern Mississippi, in creating its digital archive “Civil Rights in Mississippi,” has created an excellent toolkit for pursuing copyright permissions, available at: http://www.lib.usm.edu/~spcol/crda/ipp/index.html. This toolkit includes sample letters, checklists, workflow information, and a model framework for digitization of material that will necessitate extensive copyright permissions. If you are contemplating a collection that consists of materials which necessitate copyright permissions, this site will be an invaluable resource.

It is important to address copyright in the planning phase of your digitization project. It will affect the timeline of the project if permissions need to be sought and may have a strong impact on selection. Devise a system for determining copyright from the beginning of your digitization project. Make sure to document all your efforts to trace the rights-holders: if they prove to be untraceable or unresponsive and you decide to go ahead with the project, your documentation can help to prove “good faith best efforts” or “due diligence” if the original rights-holders appear and initiate legal proceedings at a later date.

In addition to assuring that they have not infringed copyright, many institutions have agreements to protect both the donors of collections and the collections themselves from users of materials who may abuse copyright. A user agreement serves to remind patrons of copyright ownership and to describe the specific use of the collection or the item. Rights management statements in metadata help institutions keep track of copyright for items and collections as well.

Other Intellectual Property Rights Concerns

Copyright is the single most important property rights issue a cultural institution will face when planning its digitization project, but other rights do exist and need to be considered. These less noticeable intellectual property rights include rights of privacy, rights of publicity, and certain laws governing patents, trademarks, and trade secrets. Rights of privacy and publicity on the Internet are largely self-regulated in the United States. Patents and trademarks for digitized material, manipulations of digitized material, and the creation of databases are controversial topics presently being debated in courts and boardrooms all over the country and may become increasingly important to cultural institutions involved in digitization as time passes.


For a comprehensive treatment of these rights management issues, consult The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials (http://www.nyu.edu/its/humanities/ninchguide/IV/), which is updated as events unfold in this arena. If there is any question regarding the use of a particular item, do not publish it digitally until all rights issues have been resolved and all inquiry has been exhausted.

Conclusion

Publishing your institution's materials on the Internet and hoping for the best is not a good idea. Repositories seeking to digitize materials will want to remain sensitive to copyright restrictions and to provide digital images of only those materials in the public domain and materials for which they either hold copyright, have permission from the copyright holder to use, or have exhausted all avenues possible to find the rights-holders (in which case large disclaimers stating that copyright information is not known should be affixed to all images of the material in question). Other property rights should be taken into account and fully scrutinized as well. When considering public access, strive to make sure that all digitized material is handled with responsibility, sensitivity, and sensibility with respect to intellectual property rights.

Further Reading

Bolinski, Dorissa, Christopher Mautner, and Timothy McLain. Creating Acceptable Use Policies. California: Classroom Connect, 1998.

Copyright and Art Issues http://darkwing.uoregon.edu/~csundt/copyweb/

Copyright Clearance Center http://www.copyright.com

Cornell University Legal Information Institute, Copyright Law Materials http://www.law.cornell.edu/topics/copyright.html

Franklin Pierce Law Center. “The IP Mall” http://www.ipmall.fplc.edu/

Gassaway, Laura N. "When U.S. Works Pass into the Public Domain." http://www.unc.edu/~unclng/public-d.htm

Harper, Georgia. Copyright Crash Course http://www.utsystem.edu/ogc/intellectualproperty/cprtindx.htm

Hoon, Peggy. “Scholarly Communication at NC State” http://www.lib.ncsu.edu/scc/main.html

Indiana University, Copyright Management Center http://www.copyright.iupui.edu/, Bloomington, Indiana.

McCain Library and Archives, University of Southern Mississippi, “Civil Rights in Mississippi: Intellectual Property and Privacy Information” http://www.lib.usm.edu/%7Espcol/crda/ipp/index.html

McCord Hoffman, Gretchen. Copyright in Cyberspace: Questions & Answers for Librarians. Neal-Schuman Publishers, 2001.

Minow, Mary. “Library Digitization Projects and Copyright” http://www.llrx.com/features/digitization.htm

Nebraska Library Commission. “Copyright Handbook: Issues for libraries and schools” http://www.nlc.state.ne.us/libdev/copyright/copyright1.html

PDInfo, Copyright and the Public Domain http://www.pdinfo.com/copyrt.htm

Simpson, Carol Mann. Copyright for Schools: A Practical Guide, Third Edition. (Professional Growth Series). Ohio: Linworth, 2001.

Society of American Archivists. "Basic Principles for Managing Intellectual Property In the Digital Environment: An Archival Perspective" http://www.archivists.org/statements/managing-intproperty.asp

Stanford University Libraries. Copyright and Fair Use http://fairuse.stanford.edu/Copyright_and_Fair_Use_Overview/index.html

Templeton, Brad. “10 Big Myths about copyright explained” http://www.templetons.com/brad/copymyths.html

U.S. Copyright Office Home Page http://www.copyright.gov/

CHAPTER 4 DIGITAL PRODUCTION

Digital production is probably one of the easiest functions of creating a digital project, and the most fun! It is exciting to see long-stored items, fragile materials, and negatives come to life on a screen. While scanning or photographing is easy, it can be deceptively so, and if you are looking for a high quality product, it can get complicated. Once the initial thrill wears off, the processes in production rapidly become repetitive and even boring, causing mistakes to be made. Aside from the physical handling of original materials, digital production is time-consuming and redundant work, and therefore the imaging should be done correctly the first time. It is very difficult and expensive to go back and scan or photograph again or to recover documentation that did not accompany the original production image.

To this end, NC ECHO follows the “scan once methodology” (this covers scanning, digital photography, and digital audio). In basic terms, this means that in creating digital objects, the digital production should be done at the highest level of quality that an institution can afford. The higher the quality, the longer the life of the object and the more versatile its uses will be. As you plan for digital production and determine the level of quality your institution can support, consider the future uses of the digital images. Do not anticipate returning to re-digitize. Many originals could suffer from the handling and exposure to the bright light required by digitization. For instance, it is reported that scanning exposes a document to four times as much destructive light as one photocopy. Therefore it is best to simply "scan once," to create a master image, and to make any future duplicates from that master image.

This chapter of the Guidelines specifies the equipment, standards, and techniques required to conduct the digital production portion of a digital project. It discusses digital production stations, the “scan once methodology,” purchasing decisions for hardware and software, imaging and audio standards, basic steps to follow, documentation, and quality assurance.

Setting up a Digital Production Station

One of the major considerations in beginning a digitization project is to make the best use of workspace. While some considerations listed here may not be practicable for your institution, we provide the best possible scenario and reasoning so that you can make the appropriate decisions according to your physical configurations. In an ideal world, institutions would have the resources for a designated digital production station that would not only be perfect for scanning or digital imaging with a camera but also not deprive you of space already devoted to other activities. This ideal is rare, so institutions are often faced with multi-purposing space or reconfigurations that put additional strains on an already limited resource.

Chapter 4 -- Digital Production -32-

In order to make the best use of existing space, begin with the following questions:

1. What will the space be used for? This can include not only the physical work of producing digital images, but the creation of metadata and the preservation of originals. Will the space be someone’s office? Can other staff members use the station where it is located? What other work will have to be accommodated besides digital production?

2. What type of materials will be captured? Having a clear vision of the formats of the original material intended for digital capture is important in terms of space because those formats may have space requirements themselves or their capture may have physical implications. Large flat surfaces in the production area are a must, as is the ability to secure the digitization area or a portion of it.

3. What sort of equipment will be used? It is important to recognize that different capture methods require variations in the physical configuration of your workspace. Scanners require table-top locations; digital cameras require space for tripods and other set-up requirements similar to a photography studio to optimize the capture.

4. How many staff will be working at one time? The number of people who can work effectively in one space at the same time will be determined by the configuration of the workspace. It is important to consider the maximum capacity of a workspace before establishing schedules for your digital production staff and catalogers.

Once you have considered the above questions and surveyed your existing physical facilities, consider the following as “best practice” for digital production workspace configurations:

• Be healthy, be safe: This includes not only the physical surroundings but the workstations, chairs, and equipment selected for digitization. Also be aware that such things as cords stretched across well-traveled areas or overloaded outlets present safety risks and are violations of safety regulations. Scanning and metadata capture also require long periods of sitting, so an ergonomic chair and work table are essential.

• Lighting the area: Digitization of images requires a great deal of viewing and visual discrimination and manipulation. Lighting should be standardized so that visual judgments are made consistently. Techniques to help with this include painting the room a neutral color, eliminating extraneous lighting from outside the work area, eliminating overhead lighting to decrease glare but using enough light to assure that eye-strain does not become a health factor, and a background desktop color that is mid-gray to balance your images across the monitor screen. You will also want to use the same lighting when you calibrate the equipment as you will have while scanning or photographing.

• Contentment of personnel: The people working on the project form the core of it. Check in with them to make sure that they are comfortable and happy. Encourage regular breaks from the workstation, camaraderie, and teamwork. Ease boredom through radio or music. Consider the use of headphones if conflicts arising from personal taste become an issue. Set achievable goals and be sure that personnel feel proud of those achievements. Hold regular team meetings to ensure open communication and ask questions about their comfort. Pay attention to any ergonomic complaints they might have. Above all, accommodate their individual work patterns as much as possible while maintaining a cohesive project team.

• Stability: Try to maintain a stable environment for project team members as much as possible. Not only will this improve the digital capture of your materials, but it will also reduce apprehension and chaos among project staff.

Establishing an environment that is functional and pleasant will make the experience of a digital project all the more rewarding and will increase your efficiency in producing the digital images.

The “Scan Once Methodology”

It is expensive for institutions to go back and re-digitize their holdings. Few ever do so. In addition, many originals could suffer from the handling and exposure to bright light required by digitization. Therefore, it is best to simply “scan once,” create a master image, and make any future duplicates from it.

Step One – Create a Master Image

The highest quality copy of a digital image, often called the master image, is expected to be a quality surrogate of the original. As such, it should represent the un-manipulated original and be created at a high resolution and stored in an uncompressed format (usually TIFF). High resolution equals large amounts of information captured, and large amounts of information captured usually equal a higher quality digital image. The higher the quality, the longer the life of the digital copy and the more versatile its uses. It is the master image that holds the promise of versatility and longevity. From it, high quality prints or publications might be made as well as derivatives for a variety of uses.

Step Two – Create an Access Image

Access images are lower resolution copies taken from the master by using a “save as” function and changing the storage format and resolution. Access images may be of varying quality and are generally manipulated for better display upon the screen or page (cropping, re-sizing, etc.). Additional images, such as “thumbnails” (even lower resolution copies), may also be created from the master or access image. These thumbnails allow for even quicker downloads of pages and faster retrieval of large numbers of images. Suggested resolutions, bit depths, and storage formats of each of these types of digital reproductions (master, access, and thumbnail) are outlined below. Images created from the master are often referred to as derivative images.
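As a rough illustration of how derivatives relate to the master, the following Python sketch computes the pixel dimensions an access image and a thumbnail would have when a master is resampled to a lower resolution. The function name is hypothetical, and the 600, 300, and 72 dpi figures are simply the values these Guidelines suggest for photographs; actual resampling would be done in your imaging software.

```python
def derivative_size(master_px, master_dpi, target_dpi):
    """Scale a (width, height) pixel size from the master resolution to a target resolution."""
    width, height = master_px
    # Pixel counts shrink in proportion to the drop in resolution.
    new_width = max(1, round(width * target_dpi / master_dpi))
    new_height = max(1, round(height * target_dpi / master_dpi))
    return (new_width, new_height)


# A 5 x 4 inch photograph scanned at 600 dpi gives a 3000 x 2400 pixel master.
master = (3000, 2400)
access = derivative_size(master, master_dpi=600, target_dpi=300)
thumbnail = derivative_size(master, master_dpi=600, target_dpi=72)
print(access)     # (1500, 1200)
print(thumbnail)  # (360, 288)
```

The master itself is never altered in this process: each derivative is computed from it and saved as a separate file.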

Step Three – Storing the Master Image

The master image is the copy to be maintained for the long-term. As such, it should be stored appropriately. Master images take up a great deal of space, and most institutions will not wish to store them for the long-term on computer hard-drives. Some institutions maintaining large amounts of digital images will wish to work with a form of tape or server backup, while those institutions engaged in more modest digital products may choose to store master images on CDs. If an institution decides to use CDs as a storage medium, it is suggested that two copies of each CD be prepared and stored separately. One will serve as the “master” CD and the other will be the “use” CD from which access images, copies for users, etc. may be prepared. CDs used in this way should be “refreshed” regularly, that is, copied from the old CD to a new CD (approximately every 5 years).
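For planning purposes, the number of discs a project will need can be estimated with simple arithmetic. The Python sketch below is a rough estimate under stated assumptions: the function names are hypothetical, the 650 MB disc capacity is an assumed figure (check the capacity of the discs you actually buy), and file-format and file-system overhead are ignored.

```python
import math

# Assumption: a nominal 650 MB disc; adjust for the media in use.
CD_CAPACITY_BYTES = 650 * 1024 * 1024


def cds_needed(image_count, bytes_per_image, copies=2,
               cd_capacity_bytes=CD_CAPACITY_BYTES):
    """Estimate the total discs needed when every disc is burned `copies` times."""
    images_per_cd = cd_capacity_bytes // bytes_per_image
    if images_per_cd == 0:
        raise ValueError("a single image does not fit on one disc")
    discs_per_set = math.ceil(image_count / images_per_cd)
    return discs_per_set * copies


def next_refresh_year(burn_year, interval_years=5):
    """Year in which a disc should be copied to new media."""
    return burn_year + interval_years


# 500 uncompressed masters of 4000 x 3000 pixels at 24 bits (3 bytes) per pixel,
# stored as a "master" set and a "use" set.
print(cds_needed(500, 4000 * 3000 * 3))  # 56
print(next_refresh_year(2007))           # 2012
```

Only 18 such masters fit on one disc, so 500 images need 28 discs per set and 56 discs for the two sets; the count doubles again each time the discs are refreshed without discarding the old ones.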

Getting the Equipment

Selection of the necessary equipment can have the greatest impact on the quality of images for a digital project. The development of scanning and digital camera technology has led to a proliferation of equipment varying in quality and availability. This section provides the necessary information to make an effective decision for your institution. Before any equipment is purchased, consider the following overall questions:

1. What can your staff and your physical environment accommodate?

2. What can your current technology support?

3. What type of material are you digitizing (photos, documentation, art images, artifacts, etc.)?

4. What financial restrictions do you have?

5. How will you provide storage for your project?

Hardware: Digital Capture

There are basically six types of digital production devices.

Flatbed scanner - The most commonly used type, it accepts a broad range of formats and varies in quality and price. Flatbed scanners are typically built with a scan area of about 8” x 11,” but larger flatbed scanners are available. They can be purchased with transparency adapters, which handle negatives and slides very easily. High-end scanners have less of a problem with "flare" and now come with front-side USB and FireWire connectors, which are much easier to use, especially with digital cameras.

Sheet-fed scanner - Similar to the flatbed scanner. It is used for batch work and should never be employed with originals because of potential jamming, which could damage or destroy the originals.

Drum scanner – The drum scanner produces high quality images but is quite expensive. Because materials are affixed to a rotating drum, drum scanners are not recommended for cultural heritage materials but are suitable for surrogate negatives and transparencies. There are now drum scanners, sometimes called roll scanners, that use a conveyor belt arrangement instead of a rotating drum, which is less damaging to the original materials. Again, they are quite expensive.

Reprographic stand scanner – Also known as an overhead scanner, this type is quite expensive but allows for digitization of books and oversized materials with minimal damage to the original. The reprographic stand scanner has a camera mounted over the scanning area, decreasing the amount of pressure placed on a book spine and allowing for a large scanning area.

Digital camera - Good for 3-dimensional objects, digital cameras vary widely in quality and price. They also have a problem with "flare," or bright patches on the images. Lenses are geared toward the capture of 3-dimensional scenes and may introduce distortions to flat materials. If a digital camera is necessary, it works best in a controlled, studio-type environment.

Film scanner – Specifically designed to digitize transparent materials such as 35 mm film, the film scanner is particularly good for roll film, but less productive for slides. It too has a problem with "flare."

Pros and Cons of Digital Capture Devices

Flatbed scanner
Pros:
• Highly addressable
• Inexpensive
• Many units can handle both transmission and reflection materials
• Flexible software drivers
• Most good up to 600 dpi of real resolution
• Low learning curve
Cons:
• Low productivity, frequent document handling
• Tendency toward streaking and color misregistration
• Prone to inflated marketing claims

Sheet-fed scanner
Pros:
• High productivity
• As good as or better than flatbed scanners
• Many automatic features
Cons:
• Unsuitable for fragile, bound, wrinkled, 3-D, or inflexible objects
• More expensive than flatbed scanners
• May not handle all sizes of documents

Drum scanner
Pros:
• Very high image quality
• High resolution
• Low noise
• High dynamic range
• Good tone/color fidelity
• Few artifacts
• Very flexible software drivers
• Variable sampling rate
Cons:
• Expensive
• Low productivity
• Frequent handling
• High operator skill level
• Handles limited document types; must be mountable on drum

Reprographic stand scanner
Pros:
• Very high image quality
• High resolution
• Low noise
• High dynamic range
• Good tone/color fidelity
• Few artifacts
• Flexible software drivers
Cons:
• Expensive
• High operator skill level
• Frequent document handling, although with minimized impact on the document

Camera
Pros:
• Can handle a variety of document/object types (3-D, bound, glass plates, non-flat, oversized)
• Unlimited field size
• User-controlled lighting
• Rapid capture for area arrays
• Non-contact capture
• May have interchangeable lenses
• Generally good image quality
Cons:
• Good models expensive
• Limited sensor size
• Low productivity for linear array types
• Nonuniformity artifacts common
• Area array devices prone to low dynamic range due to flare
• Moderate skill level required

Film scanner
Pros:
• Highly productive for roll film
• Low flare/good dynamic range for linear arrays
Cons:
• Low productivity for sheet film or slides
• Potential for high flare in area-array devices
• Dust/scratch artifacts common
• Image quality characterization difficult due to lack of targets

For detailed information about selecting the appropriate scanner, see Don Williams, “Selecting a Scanner,” Guides to Quality in Visual Resource Imaging, available at: http://www.rlg.org/legacy/visguides/visguide2.html. The above table was adapted from his documentation.

Hardware: Computers

Select the computer that will be used in digital production. It is recommended that one computer be devoted to this work. The best choice is a computer that:

• has as much Random Access Memory (RAM) as possible (at least 512 MB). More memory allows the computer to process large amounts of image data more quickly.

• has a processor that is optimized for image manipulation.

• supports high-speed data input through USB 2.0 or IEEE 1394 (“FireWire”) connections.

• has an ISO 9660 compliant CD-RW burner to create archival storage CD-ROMs of your digital images.

If you are going to be purchasing a new computer to act as your digitization station, it is recommended that you review trade publications such as PC Magazine to help make an informed decision. In making these decisions, it is recommended that you involve your technology support as much as possible. Not only can technology personnel help in making decisions, but they will be better able to sustain their support throughout your digitization project. Digital camera reviews can be found at http://www.dpreview.com/.


Hardware: Purchasing

In purchasing hardware, consider these issues: What are the resolution capabilities? Is the scan bed large enough to handle your originals? How long does it take to scan one image at your master image specifications? Does the manufacturer have a good reputation for service and durability? Optics quality is important. Manufacturers' claims may sometimes be unreliable, especially those relating to the number of pages scanned per minute and the maximum possible resolution. Look for reviews, ask those using the equipment, and pay close attention to actual rather than interpolated resolution. A scanner's speed is directly related to the associated computer's capabilities. The more RAM and hard disk space and the faster the CPU, the better.

Software

Some kind of software usually accompanies the digital production device. For a scanner, this is the scanning software, and for a digital camera, this is the software that provides the interface to download images from the camera to the computer. A second kind of software, image manipulation software, is used to manipulate the scanned image. It may come with the scanner, but bundled versions usually allow for only very basic editing of an image. Remember, you get what you pay for. Manipulation software is installed on the hard drive of a computer and is used to orient the image; crop it; adjust brightness, contrast, and resolution; transform; flip; or otherwise manipulate the image.

The de facto standard for image manipulation is the software package Adobe PhotoShop. It can import the scanning software so that you are able to scan and manipulate the image within the PhotoShop umbrella application. There are several versions of PhotoShop, ranging from PhotoShop Elements (about $40.00) to PhotoShop Creative Suite Premium (about $1,200). Other imaging software is adequate for basic tasks (Paint Shop Pro, Deskscan II, etc.). It is recommended that you look for software that allows you some flexibility for advanced manipulation and saves the image in all the common formats (e.g., TIFF, JPEG, GIF). It is also recommended that the software allow conversion from one format to another. If the project will require the processing of a large volume of images, it is best to consider software that allows batch processing, such as PhotoShop, DeBabelizer, or ImageMagick, which will enable the automatic processing of files and the standardization of compression. When selecting image manipulation software, institutions should look for:

• Ability to work directly with scanner software through TWAIN or other plug-ins

• Support for a wide variety of file formats

• Tools for controllable image optimization (e.g., color adjustment or color spaces)

• Usable documentation and reliable technical support

• Extensibility

• Ability to create macros for frequently applied functions

• Batch processing

Software: Purchasing

Manipulation software. How versatile is the software? What storage formats does it support? What are the options for manipulating the image? Can you turn off some of the options or does the software force you to “improve” the image?

Scan software. What are its resolution capabilities? What are the save file options? Can you set the default? Does it allow you to change the default settings or must you change them each time you scan or start a scanning session?

Purchasing Equipment

The main factors to consider in purchase:

Cost

Scanners and digital cameras can range anywhere from $100.00 (or less) to thousands of dollars. Generally you get what you pay for. Scanners in the mid-range of several hundred dollars are likely to be adequate for most scanning projects. Look carefully at warranties, maintenance reputation, reliability, good documentation, flexibility of the scanning platform, and non-proprietary interface cards.

Installation

Installing the scanner should be very direct. With only a few exceptions, the scanner is a plug and play peripheral. Be very careful to purchase a scanner that does not require a proprietary interface card, as this card may create incompatibility in other computer functions. A USB interface has become the standard (although SCSI 2 is still better, and FireWire connectors are almost as popular because they are faster). SCSI 2 allows attachment of other devices to the computer with few complications (tape drives, Zip drives, CD-ROM drives, etc.) but also requires a special hardware card. The other devices may be required for storage and for transport of large files as the institution's digital collections grow. Installation is not an issue for digital cameras, although you will want to be assured that the accompanying hardware will work on your computer platform.

Destination of the image Web? File? Print? If the use will be for Web images alone, an inexpensive capture device may suffice. If archiving or migration is of concern, aim for the higher-end machines. Since it is recommended to “scan once,” most institutions, no matter the size will want to factor in both master and access images. Resolution needed A 4 x 5 photograph will be fine on a 600 dpi scanner. A 1 x 2 contact print will need a higher resolution, more in the range of 1200 dpi, and will require a more expensive scanner. Number of items to be scanned If you plan to process large collections, the 30 seconds or more needed to scan one image can add up to an enormous drain on resources. Consider buying a faster scanner or buy two scanners (this won't help if your staff is small!). A "single pass" scanner is the faster scanner but may not capture all the information. Format of items to be digitized Slides, photographs, color, grayscale, half-tone print, graphics, text, three-dimensional objects, etc. will all need to be treated differently for best results. Can the scanner handle a variety of formats? If there are three-dimensional objects or large oversized flat materials to be digitized, a digital camera will need to be purchased. Slides and film require more sophisticated scanners, and the purchase price will be higher if a stand-alone system is

Chapter 4 -- Digital Production -39-

purchased. Additional Tools Some tools will come with the scanner. These often include masks for transparencies and negatives. These are strongly recommended, as a dark surrounding field for transparencies and negatives produces the best scanned image. Compressed air, and/or a soft brush will be useful for photographs and to keep the bed of the scanner free of lint. Tripods and other equipment are necessary for a digital camera to create a stable digitization station. These would be items that would have to be purchased in addition to your camera. And of course, add to this list of tools cotton gloves for those handling originals. If you are purchasing an expensive capture device, company representatives should demonstrate its capabilities. You should also negotiate a trial period in which you can evaluate the results of digitizing a full range of materials. What Can be Digitized and How Below is a table showing major types of materials that can be digitized: the type of file (master, access, and thumbnail) and suggestions for corresponding resolution, storage format, and bit-depth. These suggestions are based upon standards and best practices being followed by some of the nation's major digitization projects. The resolutions, abbreviations, bit types etc. mentioned in this table are discussed later in this section. FORMAT TYPE MASTER IMAGE ACCESS IMAGE THUMBNAIL

TEXT (printed documents)
Master image: Scan at 200-300 dpi grayscale. Uncompressed TIFF, Intel (IBM) byte order, bit depth 8.
Access image: 8 bit grayscale JPEG, quality 4-6 on a 1-10 scale (medium). File resolution 200 dpi, unaltered image size.
Thumbnail image: Generally not used for text files.

PHOTOGRAPHS
Master image: Scan at 4000 pixels on long side OR 600 dpi. Uncompressed TIFF, Intel (IBM) byte order. Color scan: RGB color, 24 bit. Black and white scan: 8 bit grayscale.
Access image: 8 bit grayscale or 24 bit color JPEG, quality 8-10 on a 1-10 scale (high). File resolution 300 dpi, unaltered image size.
Thumbnail image: 4 bit grayscale or 8 bit color JPEG, quality 4-5 on a 1-10 scale (medium). 72 dpi.

DOCUMENTS (manuscript materials)
Master image: Scan at 4000 pixels on long side OR 600 dpi. Uncompressed TIFF, Intel (IBM) byte order. Color scan: RGB color, 24 bit. Black and white scan: 8 bit grayscale.
Access image: 8 bit grayscale or 24 bit color JPEG, quality 8-10 on a 1-10 scale (high). File resolution 300 dpi, unaltered image size.
Thumbnail image: 4 bit grayscale or 8 bit color JPEG, quality 4-6 on a 1-10 scale (medium). 72 dpi.

MAPS, DRAWINGS, BI-TONAL
Master image: Scan at 300 dpi. Intel (IBM) byte order. RGB color, bit depth 24.
Access image: 8 bit grayscale or 24 bit color JPEG, quality 8-10 on a 1-10 scale (high). File resolution 200-300 dpi, unaltered image size OR reduced to the equivalent of 8 x 10”.
Thumbnail image: Optional for bi-tonal maps and drawings. 4 bit grayscale or 8 bit color JPEG, quality 4-6 on a 1-10 scale (medium). 72 dpi.

OBJECTS
Master image: Use a digital camera at 300-600 dpi. Uncompressed TIFF, RGB color, bit depth 24.
Access image: 8 bit grayscale or 24 bit color JPEG, quality 8-10 on a 1-10 scale (high). File resolution 300 dpi, unaltered image size.
Thumbnail image: 4 bit grayscale or 8 bit color JPEG, quality 4-5 on a 1-10 scale (medium). 72 dpi.

Digital Audio Standards

Many digitization projects are interested in including digital audio, whether digitizing analog audio media or creating new digital media. Audio files provide depth and variety to digital projects.

Transferring analog audio to a digital medium is a relatively simple process. The conversion involves four devices: an analog audio playback device, an analog-to-digital converter, a computer to process the digital signal, and a device for digital file storage. A mixing device can also be included. Several audio software programs allow manipulation of the audio, including volume adjustments, tracking, equalization, noise reduction, and compression. For master files, these methods are used sparingly, but for derivative files they can help to provide enhanced access to the audio file.

Digital audio files can be recorded in many formats, such as WAV, AIF, and MP3. The most important aspect in selecting a file format is to choose one that is non-proprietary, with a high potential for future readability. Uncompressed formats will provide maximum audio fidelity. The WAV file format was developed by Microsoft and is in widespread use. WAV is readable by virtually all audio software programs. The AIF file type was developed by Apple Computer and is also used widely. Both WAV and AIF are uncompressed and accepted for long-term file storage. The MP3 file format has emerged as the file type of choice for many applications. This file format is highly compressed for electronic transfer. It is recommended that institutions use WAV for master files; MP3 can be used for access files and delivery on the Web.

Audio File Storage Requirements

Requirements: Minimum
Sample rate: 44.1 kHz
Bit depth: 16-bit
Pros: Maximizes storage space; lowest level of processing time
Cons: Concerns over migration quality; limits ability to enhance source file for delivery

Requirements: Recommended
Sample rate: 44.1 kHz
Bit depth: 24-bit
Pros: Accurate reproduction of source material; increased dynamic range; increased ability to enhance source file for delivery; current professional audio standards
Cons: Requires 50% additional storage space; requires additional processing time

Requirements: Optimal
Sample rate: 96 kHz
Bit depth: 24-bit
Pros: Increased frequency range; further increased ability to enhance source file for delivery; highest recommended current quality
Cons: Dramatically increased storage space and processing time; may require compression for delivery
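The storage trade-offs in the three levels follow from simple arithmetic: uncompressed audio takes roughly sample rate x bytes per sample x channels x seconds. The sketch below is only an illustration; the function name `wav_bytes`, the choice of stereo (two channels), and the one-hour duration are assumptions made for the example, and the small WAV file header is ignored.

```python
def wav_bytes(sample_rate_hz, bit_depth, seconds, channels=2):
    """Approximate size in bytes of uncompressed audio (file header ignored)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds


HOUR = 60 * 60  # seconds; an assumed recording length for the comparison

minimum = wav_bytes(44_100, 16, HOUR)      # "Minimum" level
recommended = wav_bytes(44_100, 24, HOUR)  # "Recommended" level
optimal = wav_bytes(96_000, 24, HOUR)      # "Optimal" level

for label, size in [("Minimum", minimum),
                    ("Recommended", recommended),
                    ("Optimal", optimal)]:
    print(f"{label}: about {size / 1_000_000:.0f} MB per stereo hour")
```

Under these assumptions the Recommended level needs exactly 50% more space than the Minimum level, which matches the figure given in the table.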

The above table and text are based upon the Digital Audio Best Practices, version 2.0, published by the CDP Digital Audio Working Group in November 2005 (available at: http://www.cdpheritage.org/digital/audio/documents/CDPDABP_1-2.pdf). For detailed information about audio digitization, please see their guidelines.

Elements of a Digital Object

3 Types of Scan

Scanners generally support three types of scans and present these options to their users.

• Bi-tonal - also known as line art or "black and white," it is best for printed text and high contrast graphics. While once a popular type of scan, it is not used as often today.

• Grayscale - provides a range of shades of gray in an image and delivers a better quality of scan than black and white. It is best for continuous tone documents and black and white negatives.

• Color - duplicates the range of possible colors in an image; the greater the range, the more accurately the scan duplicates the original. It is best for photographs and any document with color. Digital cameras produce color only.

File Formats

Digital images are stored in five major types of formats. The type of format and the level of resolution are what distinguish the “levels” of scan (master, access, and thumbnail). In order to save space and "move faster" over the Internet, some formats drop information from an image. When the image is opened later, the software analyzes what was not dropped, infers what must have been discarded, and partially reconstructs the original image. This process is called "compression" (more precisely, lossy compression). Digitizers of fragile originals do not compress master images but attempt to maintain as much original information as possible.

• TIFF (Tagged Image File Format) is a storage format that does not compress the images and thus does not "drop" or lose information from an original digital capture. It is used for master images. TIFF is the preferred file format because it is designed for all platforms and is ubiquitous. Any image editing program produced in the last 10 years can open TIFFs so it will be around for a long time.

• JPEG (Joint Photographic Experts Group) is a compressible storage format and does drop some pixel information so that images might be stored in less space and be retrieved faster. It is often used for access or thumbnail images that are presented by the Web.

• GIF (Graphics Interchange Format) is a compressible storage format. It is limited to a palette of 256 colors, so images with more colors lose some information when saved as GIF; in exchange, images can be stored in less space and retrieved faster. It is sometimes used for access and thumbnail images that are presented on the Web. It is best for images with large areas of one or more colors. It is a proprietary format.

• PNG (Portable Network Graphics) is a compressible storage format that does not drop pixel information. It supports 24-bit color, but it does not allow saving the metadata as TIFF and JPEG do. It is still relatively new and not yet supported by all image viewers.

• JPEG 2000. This format is not related to the regular JPEG. It uses compression algorithms with an option to not lose pixel information and to communicate metadata and structure within the code stream. For more information, see http://www.jpeg.org/jpeg2000/

LEVEL OF SCAN: Master image
FILE FORMAT: TIFF
USED FOR: Long-term storage or print
ALTER?: Do not alter, resize, or compress.

LEVEL OF SCAN: Access image
FILE FORMAT: JPEG or JPEG 2000
USED FOR: Screen display or print
ALTER?: Taken from the master, it is altered for presentation over the Web or other uses.

LEVEL OF SCAN: Thumbnail
FILE FORMAT: JPEG or GIF
USED FOR: Screen display
ALTER?: Taken from the access image, reduced in size but not altered otherwise.

Master images must be of the highest quality. Web images do not require such stringent quality controls. But before compromising on image quality, consider the cost of migrating the image. Because migration is costly, it is far sounder to migrate a high quality (master) image than one of lesser quality. All digital images will have to be migrated if kept long enough. Their caretakers will have no choice. While the primary use of images in North Carolina ECHO is focused on Web access, repositories need to be mindful of future use, remembering the fragile nature of the originals. Publishing on the Web will result in requests for high quality copies of the images, so consider all possible needs before you produce your digital master image. Remember the advice: it is better to "Scan Once, Save Twice!"

Basic Production Steps

Scanning

The basic steps in scanning an image will be determined by the format of the material to be scanned, but all formats (color, B&W, and bi-tonal) share common scanning techniques. While a full scanning manual is beyond the scope of this document, scanning an individual image might look like this:

1. Align material on the clean scanner bed; mask if necessary. (Because old documents "flake," cleaning after each image may be required.)
2. Preview the scan.
3. Crop the image, leaving sufficient margins (white space).
4. Using the scanner software, set the resolution (dpi/ppi) and/or printer scale (dpi/lpi).
5. Scan.
6. Save at high resolution a raw image using the TIFF format.
7. Transfer the TIFF master to a file on your computer (with accompanying documentation).
8. Pull up the image in the manipulation software.
9. Using the manipulation software, crop carefully. CAUTION: do not over-crop. (Master files should maintain a margin, showing future users that the whole image has been digitized.)
10. Adjust histogram. (A histogram is a graph of brightness values vs. the number of pixels having each value; histograms are included in the image manipulation software. Be careful: this procedure reduces the amount of original information.)
11. Set gray mid-point (if used).
12. Adjust image size, if needed.
13. Make adjustments (tone, sharpness, noise, etc.) for the clearest image possible.
14. Adjust the resolution as needed for access.
15. Check for quality against original.
16. Write second (derivative) file to TIFF or to JPEG. (This is the access or Web image.)
17. Change resolution (dpi/ppi) and write third (derivative) file to GIF or JPEG. (This is the thumbnail image.)
18. Store the master image in a secure format. (See Digital Preservation of this guide)
19. Add the access and thumbnail images to your Web site. (See Metadata and Presenting your Digital Project of this guide)

Digital Cameras

1. Align objects on copy stand directly under the camera.
2. Clean the camera lens.
3. Preview the image through the viewfinder or the LCD viewer on the camera before you take the picture. This will give you a more accurate idea of what will be captured in your digital image. If your camera is wired directly to your computer, you may be able to preview the image on the computer screen.
4. Check the lighting in the room to be sure that it is correct. Turn off overhead lights and close curtains.
5. Check that your camera settings are appropriate for the object you are capturing.
6. Save the image as a high resolution TIFF.
7. Transfer the TIFF master to a file on your computer (with accompanying documentation).
8. Pull up the image in the manipulation software.
9. Using the manipulation software, crop carefully. CAUTION: do not over-crop. (Master files should maintain a margin, showing future users that the whole image has been digitized.)
10. Adjust histogram. (A histogram is a graph of brightness values vs. the number of pixels having each value; histograms are included in the image manipulation software. Be careful: this procedure reduces the amount of original information.)
11. Set gray mid-point (if used).
12. Adjust image size, if needed.
13. Make adjustments (tone, sharpness, noise, etc.) for the clearest image possible.
14. Check for quality against original.
15. Write second (derivative) file to TIFF or to JPEG. (This is the access or Web image.)
16. Change resolution (dpi/ppi) and write third (derivative) file to GIF or JPEG. (This is the thumbnail image.)
17. Store the master image in a secure format. (See Digital Preservation of this guide)
18. Add the access and thumbnail images to your Web site. (See Metadata and Presenting your Digital Project of this guide)

Digital Audio

1. Check the condition of the analog audio media to ensure that they will not be damaged through conversion (stickiness or shedding requires consultation with a conservator before conversion).
2. Hook up devices (audio playback device, analog-to-digital converter, and computer with audio software).
3. Choose WAV file at maximum allowable kHz and bit depth and write to digital file.
4. Make adjustments and changes to digital audio file using audio software.

Image Size and Proportion

When trying to determine the size of the image as it will appear on a monitor, confusion often arises from the method of measurement. What is the difference between dpi, ppi, and lpi? The original image may be measured in inches, centimeters, or millimeters. The screen image is measured in ppi (pixels per inch) or dpi (dots per inch).

• DPI - Early manufacturers of laser printers devised the convention of DOTS PER INCH to suggest the quality of the print of an image. The term dpi is used when preparing an image for printing.

• PPI - PIXELS PER INCH is a more accurate measurement for describing the image in its digital form on the monitor screen. The word pixel comes from "picture elements." Ppi is used to indicate the resolution of a photograph. Pixels do not have a size in a computer nor do they have a digital size. The computer converts pixels to numbers, and this array of numbers is recognized as an image. The size of the image or the scanned object determines the number of pixels. The horizontal measurement is always given first. The pixel dimensions of the image should not exceed those of the screen. If they do, users must scroll to see the image and are never able to see all of it at once.

• LPI - LINES PER INCH is sometimes used interchangeably with dpi. For example, FotoLook software, which comes with some Agfa scanners, uses lpi instead of dpi as a measurement.

LANDSCAPE VIEW 640 x 480 pixels

PORTRAIT VIEW 480 X 640 pixels

A longer horizontal dimension indicates a "landscape" view. A longer vertical dimension indicates a "portrait" view. In the example above, if you want the entire image to show on the 640 x 480 pixel computer screen, you would have to resize the portrait view to reduce the vertical dimension to 480 pixels or less. The information in a pixel is fixed and does not change. What can change is the array of numbers of pixels. Increasing the numbers of pixels increases the size and resolution of an image. All scanning is a "sampling" of portions of the original. The higher the resolution of this sampling, the more real information you have to work with. However, the higher the ppi, the longer it will take the computer to load the image to the screen.
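The resizing described above can be worked out as a small calculation. This is a minimal sketch: the function name `fit_within` is made up for illustration, and it assumes the image should keep its proportions and never be enlarged.

```python
def fit_within(width, height, max_width, max_height):
    """Scale pixel dimensions down, keeping proportions, to fit a screen."""
    # Use the tighter of the two limits; never scale above the original size.
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale)


# Portrait image (480 x 640) shown on a 640 x 480 screen.
print(fit_within(480, 640, 640, 480))  # (360, 480)

# Landscape image (640 x 480) already fits the same screen.
print(fit_within(640, 480, 640, 480))  # (640, 480)
```

Reducing the vertical dimension of the portrait view to 480 pixels therefore reduces its width to 360 pixels.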

Consider the following examples when figuring out the proportion of computer screen image to original.

If the original image is 4 inches high, and you want a screen image which is 1 inch high, to get an image that is 1/4 of your original, scan the image at 25%. To get good detail select 300 ppi or higher.

When you are ready to place the image on the Web page, use a photo application such as Photoshop to reduce the resolution to 72 ppi, the standard Web image resolution. Your image will be 1 inch high and will read well on the Web. It will also load quickly, as the lower ppi allows for more rapid loading.
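The arithmetic behind this example can be sketched as follows. The function name `output_pixels` is hypothetical, and the sketch assumes that the scanner's scale setting is applied to the original dimension before the resolution is applied.

```python
def output_pixels(original_inches, scale, ppi):
    """Pixels along one dimension: original size x scale x resolution."""
    return round(original_inches * scale * ppi)


# A 4-inch-high original scanned at 25% and 300 ppi: 1 inch at 300 ppi.
print(output_pixels(4, 0.25, 300))  # 300

# The same 1-inch-high image reduced to 72 ppi for the Web.
print(output_pixels(4, 0.25, 72))   # 72
```

The Web version carries far fewer pixels than the scan, which is why it loads more quickly.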

Let's say you want a print of your image, and you capture it at 600 dpi (remember, dots per inch for printing). The image will print satisfactorily, but the screen resolution will be overkill. Typically, the resolution of the access image should be about 1/2 the resolution you want for making a print. Note that the 300 ppi (remember, pixels per inch for screen) used for the image above is 1/2 the print resolution of 600 dpi. Keeping your image resolution at 1/2 your print resolution is a good rule of thumb and will lessen the confusion often found between print resolution and image resolution.

To recap: the print size of an image is important only for printing. If you go by the print pixel size and carry that image to the screen, it will usually exceed the screen size. For example, a printed image of 3 x 2 inches scanned at 300 ppi will be 900 x 600 pixels on the screen, and the same image scanned at 600 ppi will be 1800 x 1200 pixels! Therefore, the image intended for a printer is far too large to fit most screens and must be resized with your photo application by lowering the resolution.

COLOR information in an image is dependent on the number of colors or shades of gray that can be carried by a pixel. This carrying capacity is referred to as the pixel's dynamic range or its bit-depth. The standard minimum carrying capacity is 8 bits, and currently the maximum is 42 bits. However, most image viewing software cannot display 42 bits yet.

A BIT is the smallest storage unit in a computer. It is the unit of measurement for determining the range of color or shades of gray found in an image. The greater the dynamic range or bit-depth, the greater the subtlety of color or gray. Remember the three types of scan: bi-tonal, grayscale, and color? There is a preferred bit-depth for scanning each of these.

Bit Depth

TYPE OF SCAN: bi-tonal
PREFERRED BIT-DEPTH: 1 bit
THIS MEANS: each pixel is either black or white

TYPE OF SCAN: grayscale
PREFERRED BIT-DEPTH: 8 bit
THIS MEANS: each pixel can be 1 of 256 shades of gray

TYPE OF SCAN: color
PREFERRED BIT-DEPTH: 8 bit or 24 bit
THIS MEANS: 8 bit: each pixel can be 1 of 256 shades of color; 24 bit: each pixel can be 1 of 16.8 million color possibilities
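The figures in this table come from powers of two: a pixel with n bits can take 2 to the power n values. The sketch below also estimates the size of an uncompressed image at each bit depth. The function names are illustrative, and the example image (a 4 x 5 inch photograph scanned at 600 dpi, giving 2400 x 3000 pixels) is an assumption chosen for the arithmetic; real files add a small amount of header information.

```python
import math


def tonal_values(bit_depth):
    """Number of distinct values one pixel can take at a given bit depth."""
    return 2 ** bit_depth


def uncompressed_bytes(width_px, height_px, bit_depth):
    """Approximate size of uncompressed pixel data (file header ignored)."""
    return math.ceil(width_px * height_px * bit_depth / 8)


for bits in (1, 8, 24):
    size_mb = uncompressed_bytes(2400, 3000, bits) / 1_000_000
    print(f"{bits}-bit: {tonal_values(bits):,} values per pixel, "
          f"about {size_mb:.1f} MB uncompressed")
```

Moving from 8 bit grayscale to 24 bit color triples the storage needed for the same pixel dimensions.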

One bit bi-tonal is obviously more suited to line drawings and text, and 8 bit and 24 bit color are more suited to images where the full range of colors is needed. Eight bit grayscale is the de facto standard for black and white photography and many graphics. One bit bi-tonal may not be satisfactory for some text and line drawings because it does not capture enough information. Experimentation with grayscale may also be necessary. Likewise, some badly faded color photographs may be better scanned in 8 bit grayscale.

Some will wish to know why their institution should digitize at a higher resolution than a computer screen is able to present. The answer lies in potential and multiple uses. High quality printers (the type used in books and magazines, for example) use higher resolutions than do computer screens. If a publisher spots an image in a digital collection, he or she may wish to use it in a printed work. If the original has only been scanned for access (low resolution or compressed), then it will need to be scanned again at a higher resolution and in an uncompressed format. This means re-handling and another exposure to a bright light source. More handling and more light mean more damage to the original. And who knows? Higher resolution computer screens and delivery systems that can handle the correspondingly large files may be a common part of homes and offices in the not-too-distant future.

Signal to Noise or Image Quality Features

The signal to noise ratio is the primary measure of image quality. If the signal to noise (S/N) ratio is high, the image quality will be high. However, measuring signal to noise is largely a subjective process. "Noise," or degradation of the image, is not generally a good thing. The more noise, the poorer the image. The aim in a good scan is to decrease the noise. There are several ways to do this that use standard image quality features found in most image manipulation software.

The most common noise reduction features are:

Despeckling
Despeckling a photo makes global changes to the entire photo, smoothing color transitions in an image, which can help remove graininess.

Descreening
Often scanned images of half-tone printed objects end up with ripples or patterned "stars." This is called a moiré effect. Descreening reduces this effect by defocusing the original scanned image.

Low pass filtering
Low pass filtering "averages" pixel information along lines of great contrast, "smoothing" or "softening" an image.

Decreasing contrast or balancing gamma (generally lowering it)
"Gamma" is the contrast affecting the mid-level grays or midtones of an image. Adjusting the gamma of an image allows you to change brightness values of the middle range of gray tones without dramatically altering the shadows and highlights. Decreasing gamma "drops" the brightness or contrast.

Adjusting and Optimizing Color Palette
Some screen images will require color manipulation to match the original. If a change to a master image is required, it should be recorded as a part of the metadata associated with the digital copy.

Be careful not to use features that can increase noise. The following features should be used with caution as they increase the noise of a scan:

Increasing the contrast or increasing the gamma
"Gamma" is the contrast affecting the mid-level grays or midtones of an image. Adjusting the gamma of an image allows you to change brightness values of the middle range of gray tones without dramatically altering the shadows and highlights. Increasing gamma "raises" the brightness or contrast.

Aggressive color management and manipulation
Color is generated for scanned images by "mixing" pixels of various basic color values, such as red, green and blue. By changing these values of individual pixels, the color mix of the overall image can be altered.

Sharpening
Sharpening accentuates the differences between adjoining areas of significantly different hue or tone.

Reminder: Image quality is generally measured by evaluating some or all of the following features:

• Tone Reproduction
• Resolution
• Color Reproduction
• Noise
• Detail and edge reproduction
• Artifacts, including nonuniformity, dust and scratches, streaks, color misregistration, aliasing, contouring/quantization

Monitor Display

When images are viewed on a monitor, the monitor resolution will often vary with the type of monitor used. Standard monitor resolutions follow:

Monitor    No. of pixels x No. of lines    Quality
VGA        640 x 480                       Low
SVGA       800 x 600                       Medium
XGA        1024 x 768                      High
SXGA       1280 x 1024                     High

Monitors should also be calibrated for best results. Software packages like Adobe Photoshop often include a basic monitor-calibration tool.

Quality Assurance

Quality assurance needs to be performed throughout the creation of your digital images. While it may seem like a daunting task, digital images should be checked to assure that they are at the level of quality specified by your institution. Documents can be skewed without the staff member creating the digital images noticing. Always use more than one set of eyes to assure that your images are consistent and of high quality. It is recommended that you establish a system early for quality assurance and do the work on a regular basis. Digital objects tend to pile up, and you don’t want to have a large number to go through at one time. That situation can lead to sloppy checking. Some institutions choose to look at a sampling of the images once initial production setup has been established. A good quality assurance program will help you to stick to the “Scan Once Methodology” by assuring that the master image is accurate.

Optical Character Recognition

When a computer scans a text, all it duplicates are graphical bits on a virtual page. In other words, it creates a digital image or copy of the page. A user cannot edit or search in the newly created document. If that image is passed through an Optical Character Recognition (OCR) program, the software converts the shapes it recognizes into individual letters, creating a text document. However, OCR recognizes and converts few documents perfectly. It makes frequent errors, especially if the original image is blurred, faded, or otherwise unclear. Work with OCR is even more labor intensive than straightforward imaging of pages, requiring a great deal of editing and quality control. It does, however, produce a much more versatile digital product. OCR’d documents must be proofed, word by word. Such things as unusual proper names, blemishes on the document, uneven light, tables, borders around text, offset fonts, superscripts, and subscripts can throw the word recognition off.

There are various devices available for capturing images for OCR, but a desktop scanner will also work. A desktop scanner digitizes the image by dividing it into hundreds of pixel-sized boxes per inch and representing each box with either a 1 or a 0; the OCR program then organizes the patterns of dots into characters. This allows the computer to translate character images into editable text.

In addition to OCR, there is PDF (Portable Document Format), an open and universal file format that preserves the fonts, images, graphics, and layout of any source document, regardless of the application and platform used to create it. Governments and enterprises around the world have adopted PDF to streamline document management, increase productivity, and reduce reliance on paper. PDF is available to anyone who wants to develop tools to create, view, or manipulate PDF documents. A good place to research PDFs is http://www.pdfzone.com.

Documenting the Digital Object

Many institutions are now realizing the importance of documenting the process of digital production. There are many reasons documentation is wise. One of the most critical is that as technology changes, migration of earlier formats is a reality. The more documentation that is available, the easier and less costly the migration will be when it occurs. And it will occur. NC ECHO has created a preservation metadata standard that helps you document your digital production by encouraging you to record information about each digital object you create (http://www.ncecho.org/presmet/index.htm). Some aspects of the preservation metadata remain relatively stable while others will change for each image. But preservation metadata is not the only documentation you should consider. Other documentation that can inform your digital production and management includes:

• Document planning decisions

Were all images in a collection scanned? If not, what was the thinking behind the selection?

• Document capture, editing, and processing decisions How were images manipulated for presentation? Were images batch processed, etc.?

• Keep institutional decision memos. These memos will allow you to revisit decisions made previously.

• Document revisions of institutional decisions. As with documenting initial decisions, any revisions need to be monitored and should be available for review.

• Administrative guidelines. Administrative guidelines and training materials provide a valuable framework for this digital project as well as future ones. Learn things once and benefit from those lessons in the long term.

• Workflow guidelines/workflow revisions. These should clear up any issues with responsibilities and help you track particular images.
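As an illustration of how capture and processing decisions might be recorded for each image, here is a minimal sketch of a production log entry. Every field name and value below is hypothetical and chosen only for the example; none is taken from the NC ECHO preservation metadata standard, which should be consulted for the actual elements.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CaptureRecord:
    """One entry in a hypothetical production log (field names illustrative)."""
    identifier: str
    capture_device: str
    resolution_ppi: int
    bit_depth: int
    master_format: str
    operator: str
    capture_date: str
    edits: list = field(default_factory=list)


record = CaptureRecord(
    identifier="example-0001",
    capture_device="flatbed scanner",
    resolution_ppi=600,
    bit_depth=24,
    master_format="TIFF (uncompressed)",
    operator="staff initials",
    capture_date="2007-01-15",
)
# Note each processing decision as it is made.
record.edits.append("access copy: cropped margins, saved as JPEG")

# Store the entry as one line of JSON text in a plain log file.
log_line = json.dumps(asdict(record), sort_keys=True)
print(log_line)
```

Keeping such entries in a plain, widely readable form means they can be migrated along with the images they describe.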

The Historic American Sheet Music Project from the Rare Book, Manuscript, and Special Collections Library at Duke University has created some documentation as an example: (http://scriptorium.lib.duke.edu/sheetmusic/procedure.html)

Conclusion

Digital production is the fun part of a digitization project, and it can be done relatively easily. Unfortunately, the ease can be deceptive. To produce a "down and dirty" quick digital image takes almost no time and effort at all. With just a bit more time, effort, and storage space, an image can be created that would
• Better preserve the original by reducing the number of times it would need to be handled in the future
• Be more easily migrated into new technologies and storage media
• Serve multiple purposes such as providing sources for publishing as well as Web access.

To support the long-term viability of a digitization project, the production process should be thoroughly documented to help future caretakers and users of the images. Once there is a documented digital master, a variety of processes (including some that are automated) can be used to manipulate the access and thumbnail images for better presentation over the Web. These images are not expected to be surrogates of the originals, but "overviews" or "quick views" of them, pointing the way to either the original or master digital image. The scanner or camera awaits -- and it only takes a little more time and effort to give those images a better chance at varied use and long-term viability.

Further reading

Besser, Howard. Procedures and Practices for Scanning. http://sunsite.berkeley.edu/Imaging/Databases/Scanning/

CDP Digital Audio Working Group. Digital Audio Best Practices, version 2.0. November 2005. http://www.cdpheritage.org/digital/audio/documents/CDPDABP_1-2.pdf

Research Libraries Group and Digital Library Federation. Guides to Quality in Visual Resource Imaging. http://www.rlg.org/legacy/visguides/visguide6.html

Kenney, Anne and Steven Chapman. Digital Imaging for Libraries and Archives. Ithaca, New York: Department of Preservation and Conservation, Cornell University Library, June 1996.

Kenney, Anne R. and Oya Y. Rieger. Moving Theory into Practice. Research Libraries Group, 2000. See also: http://www.library.cornell.edu/preservation/tutorial/index.html

Williams, Don. “Selecting a Scanner,” Guides to Quality in Visual Resource Imaging. http://www.rlg.org/visguides/visguide2.html

CHAPTER 5 METADATA

Metadata is like interest --- it accrues over time. To stretch the metaphor further, wise investments generate the best return on intellectual capital. Carefully designed metadata results in the best information management in the short- and long-term.1

Overview How do we find the materials in our libraries, archives, museums, and historical societies? The descriptive tools that allow special collections to be accessed are in myriad forms. Yet, libraries, archives, museums, and historical societies--indeed all cultural heritage institutions--are dependent upon these tools of access to make them viable. Among the many information repositories, libraries have the longest history of providing accessibility in a standard format. The broad acceptance of cataloging conventions such as the Anglo-American Cataloging Rules and the MARC21 format allows users to move easily from library to library. In contrast, historical societies, museums, and archives, have often used locally developed cataloging and access tools, reflecting the special nature of their holdings. Archives, for example, hold materials in many different formats (e.g., manuscripts, oral histories, photographs, objects, and films). Historical museums are even more idiosyncratic, and art museums are hybrids, combining many objects, archives, and library materials. From institution to institution (and individual collection to individual collection), their access tools vary in descriptive elements and formats. The uniqueness of special collections has made the development and implementation of broad, uniform practices difficult, preventing broad cross-collection access. Recent advances, however, offer hope for greater, more uniform access in the near future. Digitization has been a clear part of those efforts, and every digital project must address metadata issues to provide the best access to their materials and to ensure that their collection information is available in the larger arena of digital access. Today, there are several good descriptive systems available for use in cultural institutions. The most widely adopted is Dublin Core, a general descriptive system used by many multi-partner digitization projects to manage their electronic resources. 
Other systems include Encoded Archival Description (EAD), which is a system of encoding archival finding aids, and the Text Encoding Initiative (TEI), a system for encoding textual documents primarily from the humanities and social sciences. These systems are generally favored by large, established institutions. Other descriptive systems have been developed for specific formats. Often, these individual systems can be related to each other through the descriptive elements that they share (e.g., creator/author or subject). This process is often referred to as "crosswalking." Shared collection access methods (e.g., searching by subject across the holdings of several archives or across an archive, museum, and library) were difficult to

1 Anne J. Gilliland-Swetland “Setting the Stage” in Introduction to Metadata (Getty Standards Program), 7/5/2000, http://www.getty.edu/research/conducting_research/standards/intrometadata/.

Chapter 5 – Metadata -54-

accomplish in the pre-digital age. With computers, the dream of shared access is rapidly becoming a reality. The uniform description of resources (what librarians have always called cataloging) in an electronic form is one of the first steps in creating shared access. Describing a resource is a difficult process, but an important one if the resource is to be accessible to the user. The more conformity to uniform practices, the more likely the resource will be located and used. The choice of a "cataloging system" is actually a choice of "metadata" formats.

What is Metadata?

Metadata is informally defined as “information about information” or any data associated with a resource that describes that particular resource. A more general definition that is useful for cultural institutions is “structured information about any information resource of any media type or format.”2 In this context, an information object is anything that can be addressed and manipulated by a human or a system as a discrete entity. The essential aspect of a metadata system that describes an object, then, is its ability to provide a structured format for information about that object.

Metadata itself is essentially a modern term for the bibliographic information that libraries traditionally entered into their catalogs or registry information on collections that museums have entered into their systems; however, the term metadata is most commonly used to refer to descriptive information about World Wide Web resources. Cultural heritage institutions have been creating metadata for as long as they have been collecting cultural materials for their preservation and presentation to the public. The impact that the digital environment has had on metadata is the creation of electronic information in structured formats.

The creation of metadata for digital resources is an important part of any digitization project and must be incorporated into a project’s workflow.
Metadata should be created and associated with the digital resource to support the discovery, use, management, reusability, and sustainability of that resource. Metadata relating to digital resources is most often divided into five conceptual types (with some overlap among the five):

Descriptive metadata: information used for the indexing, discovery, and identification of a digital resource.

Analytical metadata: information about the subject and context of a digital resource.

Structural metadata: information used to display and navigate a digital resource; also includes information on the internal organization of the digital resource. Structural metadata might include information such as the structural divisions of a resource (e.g., chapters in a book) or sub-object relationships (such as individual diary entries in a diary section).

Administrative metadata: information needed for the management of the digital resource, which includes information regarding access, display, and rights management.

2 Priscilla Caplan, Metadata Fundamentals for All Librarians, (Chicago: American Library Association, 2003), p. 3.

Preservation metadata: information about the digital image for preservation purposes, including the resolution at which the images were scanned, the hardware/software used to produce the image, compression information, pixel dimensions, etc., important for migration and long-term sustainability of the digital resource. This includes the technical aspects of the digital asset, including how it was created, what equipment was used, etc.
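As a minimal sketch of how the five conceptual types might sit side by side in a single record, the Python example below groups them in one container. The class name, field names, and sample values are hypothetical illustrations and are not part of any NC ECHO standard.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DigitalObjectMetadata:
    """Hypothetical container grouping the five conceptual metadata types."""
    descriptive: dict = field(default_factory=dict)
    analytical: dict = field(default_factory=dict)
    structural: dict = field(default_factory=dict)
    administrative: dict = field(default_factory=dict)
    preservation: dict = field(default_factory=dict)

    def missing_types(self):
        """Return the names of the metadata types that hold no values yet."""
        return [name for name, values in asdict(self).items() if not values]


# Sample values are invented for illustration only.
record = DigitalObjectMetadata(
    descriptive={"title": "Diary of a farm family", "identifier": "example-0001"},
    structural={"divisions": ["entry 1", "entry 2"]},
    preservation={
        "resolution_ppi": 600,
        "compression": "none",
        "pixel_dimensions": (4800, 6000),
    },
)

print(record.missing_types())  # ['analytical', 'administrative']
```

A check of this kind can be run during a project's workflow to flag records for which a whole category of metadata has not yet been captured.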

"Finding" or "accessing" holdings is the most visible role of metadata in the electronic environment. Today’s users are coming to the digital resource from their home, work, school, etc., at any time of the day, and often without the assistance of a librarian, archivist, curator, museum educator, or other cultural heritage professional. In addition, digital resources present their own unique characteristics, and cultural institutions need to consider these characteristics as they try to integrate management of these resources into their traditional holdings. Metadata for digital resources needs to provide information that:

• certifies the authenticity and degree of completeness of the content;
• establishes and documents the context of the content;
• identifies and exploits the structural relationships that exist between and within information objects;
• provides a range of intellectual access points for an increasingly diverse range of users;
• provides some of the information that an information professional might have provided in a physical reference or research setting;
• provides information about the digital resource to the information professional to aid in the resource's sustainability.

Unfortunately, there is no uniform metadata solution for all cultural materials. The metadata for text is different from the metadata for visual images. Further, the elements used to describe an object can change and grow as more becomes known about that object. Metadata should be thought of as a dynamic process. New metadata schemes for different formats of cultural materials, or for different needs in managing those materials, continue to emerge. It is important to stay current as the field of metadata grows and changes.

How do I select the best metadata standard for my materials?

As indicated above, there is a wide variety of metadata standards available to cultural institutions. Selection of a standard should be based on the needs of the repository and its users. Deciding which metadata system to use for a collection can be a very individualized process and a daunting one. Here are some general questions to ask while making choices about metadata systems:

• What is the purpose of the metadata process?
• Is there an institution similar to ours that is using a particular metadata documentation standard? Are they happy with the standard they chose? What would they do differently if they had it to do over?
• What is the reputation of the selected standard? How widely used is it? How old is the standard? Is it likely to be around for some time?
• Are my practice and experience compatible with the standard? Can I understand the elements as they relate to my collection?

• What sort of system am I going to be using to maintain my metadata? Does it have pre-defined fields, need additional resources, or do I need to develop a metadata system on my own?
• If I select a specific standard, will my metadata be compatible with larger systems (e.g., NC ECHO or one of the major local university initiatives)?

Recommended Metadata Standards for North Carolina

After a review of the most prominent metadata systems, consortial requirements, the descriptive tools being used by the state's largest digitization projects, and the types and holdings of institutions throughout the state, NC ECHO issued the following policy on metadata:

North Carolina ECHO recommends that North Carolina institutions wishing to participate in the statewide digitization project follow the metadata standards of at minimum North Carolina Dublin Core, while acknowledging that some participating institutions may additionally employ the more robust descriptive systems such as MODS, EAD, TEI and others.

NC ECHO chose Dublin Core because it can be used to describe a wide variety of digital resources. It is the baseline of metadata standards. In its simplest form, it provides a basic level of access that involves the completion of only seventeen fields of information. In addition, Dublin Core is relatively easy to crosswalk from other metadata systems, so existing descriptive systems (even minimal, homegrown ones) can conform to the Dublin Core fields, which are extremely basic. To learn more about Dublin Core, consult its web site (http://www.dublincore.org/).

Dublin Core Elements

Dublin Core is composed of 17 elements (see table below). They are familiar points of description and access to most workers in and users of cultural institutions.

The 17 Dublin Core Metadata Element Set* Summary

TITLE DC.Title

The name of the object: the title of a book, name given to a work of art, name of a manuscript collection, map name, etc. If the item is unnamed, give the item a descriptive title. Omit articles such as 'the', 'a', and 'an', which often come at the beginning of a descriptive title.

CREATOR DC.Creator

The person(s), family(ies), organization(s), or corporate body(ies) primarily responsible for the creation of the object, collection, or item being described.

SUBJECT DC.Subject

What the content of the resource is about or what it is, expressed by terms, including: topical, personal, corporate, or geographic for significant people, places, organizations, events, and topics reflected.

DESCRIPTION DC.Description

A textual description of the content of the resource, such as an abstract, tables of contents, or free-text account of the object. This information can be taken from the object or provided by the record creator and can include specialized information not included in other elements.

PUBLISHER DC.Publisher The institution or repository that makes the resource available on the Web.

CONTRIBUTOR

DC.Contributor

The person(s), family(ies), organization(s), or corporate body(ies) that made significant secondary contributions to the creation of the object, collection, or item being described.

DATE DC.Date The date of creation of the original item.

TYPE DC.Type The genre or nature of the resource, such as sound recording, image, physical object, collection, or text.

FORMAT

DC.Format.Extent The extent of the original item being described. Can be in number of pages or linear feet, dimensions, etc.

DC.Format.Medium The physical manifestation of the original object represented by a controlled vocabulary term.

IDENTIFIER DC.Identifier

A character string or record number that clearly and uniquely identifies a digital object or resource. The Identifier element ensures that individual digital objects can be managed, stored, recalled, and used reliably. NC ECHO recommends the use of This element may be the accession number, record number, ISBN, or the URL (Uniform Resource Locator or World Wide Web address).

SOURCE DC.Source

A reference to an aggregated resource from which the present resource is derived. The Source element is used to cite any other resource from which the digital resource was derived, either in whole or in part. Some digital resources are “born digital” and derive from no pre-existing resource; in these cases, the Source element is not used. Note the relationship between the Source element and the Relation element. Because the Source element shows a derivative relationship with another resource, it is used only for that purpose. Other relationships should be included in the Relation element.

LANGUAGE DC.Language

The language(s) of the intellectual content of the resource. This can be the language(s) in which a text is written or the spoken language(s) of an audio or video resource.

RELATION DC.Relation

The relation of the resource being described to other resources. Element includes a variety of refinements to express the kind of relationship that exists between the resource and the other objects.

COVERAGE

DC.Coverage.Spatial The geographic location(s) associated with the resource.

DC.Coverage.Temporal The time period associated with the resource.

RIGHTS DC.Rights

A rights management or usage statement, a URL that links to a rights management statement, or a URL that links to a service providing information on rights management of the resource.

AUDIENCE DC.Audience The audience for whom the resource is intended. Not applicable in all situations.

PROVENANCE DC.Provenance Information about the custodial history or acquisition of the resource by the institution.
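The element labels in the table above (DC.Title, DC.Creator, and so on) are commonly embedded in a web page as HTML meta tags. The Python sketch below shows one way such tags could be generated from label and value pairs. The function name and the sample record are hypothetical, and the output is not claimed to match what the NC ECHO template produces.

```python
from html import escape


def dublin_core_meta_tags(record):
    """Render (label, value) pairs as HTML <meta> tags, one tag per value.

    Repeatable elements (for example, several subjects) are simply
    repeated pairs in the input list.
    """
    tags = []
    for label, value in record:
        # escape() protects quotes and angle brackets inside attribute values.
        tags.append('<meta name="{}" content="{}">'.format(escape(label), escape(value)))
    return "\n".join(tags)


# Sample values are invented for illustration only.
record = [
    ("DC.Title", "Main Street, looking north"),
    ("DC.Creator", "Unknown photographer"),
    ("DC.Subject", "Streets"),
    ("DC.Subject", "Business districts"),
    ("DC.Type", "Image"),
    ("DC.Rights", 'Contact the repository for "reuse" terms'),
]

print(dublin_core_meta_tags(record))
```

Keeping the record as a list of pairs, rather than a dictionary, lets an element such as DC.Subject appear more than once without extra bookkeeping.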

NC ECHO has a working group that examines the Dublin Core standard and provides implementation guidelines for NC ECHO participating institutions. These guidelines provide a general introduction to the Dublin Core standard and should assist institutions in analyzing their existing descriptive systems and adapting them to at least the minimal requirements of Dublin Core. Each element has been examined, and specific implementation guidance for each is included. In addition, the NC ECHO Dublin Core template provides an online tool for the creation of Dublin Core metadata. This web form will help with syntactic expressions and assure uniformity in the creation of HTML-coded Dublin Core so that institutions can concentrate on the content of the metadata rather than its computerized structure. The template and use documentation are available at http://www.ncecho.org/ncdc/index.htm.

Other Metadata Standards

While Dublin Core is the baseline, minimum recommendation for metadata standards, there are other standards that provide richer descriptive tools, retrieval possibilities, and other management capabilities for specific types of cultural materials. For example, Dublin Core is not as efficient a tool as some systems when describing relationships between materials and hierarchies of information. This can be significant in creating description for manuscript and archival collections. Typically, a manuscript collection is composed of series, a series of subseries, a subseries of boxes, a box of folders, and a folder of individual items. Another metadata standard, Encoded Archival Description (EAD), has been developed to address the need to describe relationships between materials and is discussed here in more detail. A brief list of other metadata standards follows.
EAD (Encoded Archival Description)

Encoded Archival Description (EAD) is a metadata system that leverages the structure of archival description found in archival finding aids through its encoding standard. It is an Extensible Markup Language (XML) document type definition (DTD) that enables EAD-encoded finding aids to be searched, retrieved, displayed, and exchanged. EAD is platform-independent and is maintained by the Society of American Archivists. It is a recognized international standard. EAD is especially helpful in information retrieval because of its ability to identify particular areas of description in the finding aid and its ability to present information in a hierarchical fashion. By marking up a finding aid in EAD, the relationships between the series and subseries are maintained in the retrieval of the information about the collection.

NCEAD is NC ECHO’s working group on the implementation of EAD in North Carolina. NCEAD has generated Best Practice Guidelines, tools, and supporting documentation to ease the implementation of EAD for North Carolina institutions. See http://www.ncecho.org/ncead/index.htm.
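To illustrate how nested markup preserves the relationship between series and subseries, the Python sketch below builds a small fragment using numbered component elements in the style of EAD 2002 (<c01>, <c02>, <c03>) and then walks it as an outline. The fragment is deliberately simplified and is not a valid EAD document: it omits the header and the collection-level description. The helper names and sample titles are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_component(parent, tag, level, title):
    """Append a component (<c01>, <c02>, ...) holding a <did>/<unittitle>."""
    component = ET.SubElement(parent, tag, {"level": level})
    did = ET.SubElement(component, "did")
    ET.SubElement(did, "unittitle").text = title
    return component


# Simplified fragment: collection > series > subseries > file.
archdesc = ET.Element("archdesc", {"level": "collection"})
dsc = ET.SubElement(archdesc, "dsc")
series = add_component(dsc, "c01", "series", "Correspondence")
subseries = add_component(series, "c02", "subseries", "Family letters")
add_component(subseries, "c03", "file", "Letters, 1861-1865")


def outline(element, depth=0):
    """Yield indented titles, preserving the nesting of the components."""
    for child in element:
        if child.tag.startswith("c0"):
            yield "  " * depth + child.findtext("did/unittitle")
            yield from outline(child, depth + 1)


print("\n".join(outline(dsc)))
```

Because each component is nested inside its parent, a search hit on a folder title can still be displayed in the context of its subseries and series.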

Society of American Archivists EAD Resources

• Encoded Archival Description Application Guidelines, Society of American Archivists, 1999, http://www.loc.gov/ead/ag/aghome.html

• Encoded Archival Description Tag Library Version 2002, Society of American Archivists, 2002, http://www.loc.gov/ead/tglib/index.html

• Official EAD Web Site, http://www.loc.gov/ead/

• EAD Help Pages, EAD Roundtable, http://www.iath.virginia.edu/ead/

EAC (Encoded Archival Context)

EAC is an emerging standard for the description of record creators. It provides sections on identity, description (both formal and informal), relationships, and record maintenance. The standard approaches cultural heritage materials from a new perspective. Rather than describing materials, it describes the creators and provides connections to the materials relevant to those creators.

NC ECHO has a working group, NCEAC, that has examined the beta standard and adopted a union model for the NC ECHO project, entitled “North Carolina Biographical and Historical Information Online” (http://digitalnc.org/ncbhio/index.htm). This project includes content guidelines, input forms, and browse capabilities for existing records. Most importantly, the project relies on partner institutions contributing information about the people, families, and corporate bodies that have created the state’s cultural heritage materials.

EAC Standards Documentation

• Encoded Archival Context Beta http://www.iath.virginia.edu/eac/

MODS (Metadata Object Description Schema)

MODS is an XML schema developed by the Library of Congress. It is described as a bibliographic element set, but it may be used for a variety of different types of resources. MODS should be considered a richer metadata set than Dublin Core, with the advantages of the XML platform. It has been derived from the MARC standard, but provides a flexible platform for the description of digital objects.

Library of Congress MODS site

• Metadata Object Description Schema http://www.loc.gov/standards/mods/

Visual Resources & Object Standards

Categories for the Description of Works of Art (CDWA)

CDWA was created by the Getty Art Museum for the description of works of art and is used throughout California museums to catalog their holdings. See http://www.getty.edu/research/conducting_research/standards/cdwa/

Cataloging Cultural Objects (CCO)

CCO was designed for the description of many types of cultural objects, including architecture, archaeological sites, and artifacts as well as functional objects from the realm of material culture. Like CDWA, though, it focuses on works of art and their visual surrogates and is not directly intended for historical objects, science and technology specimens, and the like. It focuses on the data content standard and recommendations of controlled vocabularies. The primary emphasis is descriptive metadata intended to describe a cultural work. That description is then used in systems intended to manage that data. CCO excludes administrative and technical metadata insofar as they do not affect the description of the object, and it is therefore recommended that CCO be used in conjunction with other standards to address all the metadata needs of an institution.

Visual Resources Association Core Categories (VRA Core)

This standard was created for the description of visual resources, such as photographs. It is a content standard and therefore does not provide a structured environment to leverage the standard in a computer environment. However, like CDWA, it can provide some help in determining where to find certain points of information regarding an item that you will use in your metadata. See http://www.vraweb.org/vracore3.htm

To address the issue of metadata for visual resources and objects, NC ECHO has collaborated with the North Carolina Museums Council (NCMC) and visual resources archivists to create a Metadata Working Group.
This group is analyzing existing metadata standards, primarily CCO, to create basic content guidelines for the description of visual resources and objects. These guidelines, along with recommendations for implementation with various collection management systems, will be available soon.

Text Encoding Standards

Text Encoding Initiative (TEI)

TEI is the standard system of encoding transcribed documents for presentation on the Web (often rare books, pamphlets, etc.). It is not used to mark up finding aids or to "catalog" digital resources as Dublin Core is. TEI is, however, one of the most prominent systems used to bring full-text resources (and not just images of those resources) to researchers via the Web. The TEI (http://www.tei-c.org/) provides guidelines for the long-term preservation of electronic data, and a means of supporting effective usage of such data in many subject areas. It is the encoding scheme of choice for the production of critical and scholarly editions of literary texts, for scholarly reference works and large linguistic corpora, and for the management and production of detailed metadata associated with electronic text and cultural heritage collections of many types.

NC ECHO is working on recommendations for the implementation of TEI for various texts, focusing in particular on the structure of the “bibliographic information” located in the TEI header. These guidelines will provide both general information on the TEI standard and document-type templates to be used for the wide variety of materials that can be encoded using TEI.

Oral Histories

Oral histories present interesting issues for metadata. NC ECHO is working with an Oral History Metadata Group to provide guidance on metadata for institutions that maintain oral history collections. The group will produce recommended guidelines for collection description as well as item-level oral history description.

Preservation Metadata

Maintaining information about the creation and maintenance of your digital objects is an important aspect of digitization because it ensures the longevity of your work. NC ECHO has constructed a preservation metadata standard to aid in the long-term sustainability of the digital content created in digitization projects. The tools developed include a content standard as well as a Microsoft Access database tool available for institutions that might need it. See http://www.ncecho.org/presmet/index.htm for more information.

“Crosswalking”

"Crosswalking," the ability to move data across several different platforms, may be thought of as translating an element set in one metadata system to a related element set in another metadata system. This translation allows a user to search across the two systems. Crosswalking is also referred to as "mapping." As defined by a NISO White Paper, October 1998, a crosswalk is "a set of transformations applied to the content of elements in a source metadata standard that results in the storage of appropriately modified content in the analogous elements of a target metadata standard." For more detailed information on crosswalking, see “Issues in Crosswalking Content Metadata Standards” (http://www.niso.org/press/whitepapers/crosswalk.html).

The crosswalking chart below demonstrates that many metadata systems share the same conceptual fields, even if those fields are not called the same thing in different systems.
It is NC ECHO's goal to use crosswalks to tie together the different metadata standards employed by the state's cultural institutions. By creating consistent and standardized metadata throughout the digitization projects and representations of your collections, you are contributing to this goal. The inclusion of the metadata standards below does not provide the comprehensive

Crosswalking summary

Manuscripts and Archives | Photographs | Oral History | Objects | Maps | MARC | Dublin Core | EAD <archdesc>
Title | Title | Title | Object name or Title | Title | 245 | Title | <unittitle>
Author/Creator | Photographer | Interviewee | Creator/Maker | Cartographer | 1XX | Creator | <origination>
Contributor | Contributor | Interviewer | Contributor | Contributor | 7XX | Contributor | <origination>
Notes: Biographical, Scope & Content | Notes: Biographical, Scope & Content | Notes: Biographical, Scope & Content | Notes: Description of Object | Notes: Description of Map | 520, 545 | Description | <abstract> <bioghist> <scopecontent>
Date(s) | Date of creation | Date(s) of interview | Date of creation | Date of creation | 245 ‡f, 245 ‡g, 260 ‡c | Date | <unitdate>
Material type | Physical medium | Physical medium | Medium of material, material type | Geospatial reference data | 340, 342 | Format.Medium | <physdesc>: <genreform> <physfacet>
Volume(s) | Number of photographs, Dimensions | Length of interview, number of tapes | Dimension | Dimension | 300, 360 | Format.Extent | <physdesc>: <extent> <dimensions>
Accession number, collection number | ID number | ID number | ID number | ID number | 035 | Identifier | <unitid>
Language | n/a | Language | n/a | Language | 040 | Language | <langmaterial>
Access & Reproduction, Copyright | Permissions, Copyright | Permissions, Copyright | Permissions | Permissions, Copyright | 540 | Rights | <userestrict>
Repository | Repository | Repository | Repository | Publisher, Repository | 500, 710 | Publisher | <repository>

Subjects:

Personal names | Personal names | Personal names | Personal names | Personal names | 600 | Subject | <controlaccess> <persname>
Corporate names | Corporate names | Corporate names | Corporate names | Corporate names | 610 | Subject | <controlaccess> <corpname>
Places | Geographical names | Geographical names | Geographical names | Geographical name | 651 | Coverage.Spatial | <controlaccess> <geogname>
Topics | Subject | Subject | Subject | Subject | 650 | Subject | <controlaccess> <subject>
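The chart above can be read as a lookup table. As a minimal sketch, the Python example below encodes a subset of the MARC and Dublin Core columns and uses it to translate a list of tagged values. The function names and sample data are hypothetical, and the sketch ignores subfields, the 500/710 Publisher row, and the content transformations that a full crosswalk would require.

```python
# Subset of the MARC-to-Dublin Core pairs shown in the chart.
MARC_TO_DC = {
    "245": "Title",
    "520": "Description",
    "545": "Description",
    "035": "Identifier",
    "540": "Rights",
    "600": "Subject",
    "610": "Subject",
    "650": "Subject",
    "651": "Coverage.Spatial",
}


def crosswalk_tag(tag):
    """Map one MARC tag to a Dublin Core element, or None if not covered."""
    if tag in MARC_TO_DC:
        return MARC_TO_DC[tag]
    if tag.startswith("1"):  # the chart's 1XX row
        return "Creator"
    if tag.startswith("7"):  # the chart's 7XX row (simplification: 710 included)
        return "Contributor"
    return None


def crosswalk(marc_fields):
    """Translate (tag, value) pairs into a dict of Dublin Core value lists."""
    dc = {}
    for tag, value in marc_fields:
        element = crosswalk_tag(tag)
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


# Sample values are invented for illustration only.
sample = [
    ("100", "Smith, Jane"),
    ("245", "Letters home"),
    ("650", "Agriculture"),
    ("651", "Wake County (N.C.)"),
    ("999", "local note"),  # not in the chart, so it is dropped
]
print(crosswalk(sample))
```

Note that the mapping runs in one direction only: several MARC fields collapse into a single Dublin Core element, so the original tags cannot be recovered from the Dublin Core record alone.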

Shareable Metadata

The principle of shareable metadata goes to the heart of metadata for digitization projects that are published on the Web. Shareable metadata is metadata that conforms to standards and includes the data elements that allow for contextual understanding. Fields such as “repository” (DC.Publisher) and conformance to technical standards are components of shareable metadata. NC ECHO promotes the creation of shareable metadata by its partner institutions through its various implementation and best practice guidelines. For more information about the concept of shareable metadata and the reasons for its application, see “Moving toward shareable metadata” by Sarah Shreeves, Jenn Riley, and Liz Milewicz, available at http://www.firstmonday.org/issues/issue11_8/shreeves/index.html.

Controlled Vocabularies

A controlled vocabulary is a set of terms used consistently and defined very carefully. It helps little if archivists, museum professionals, and librarians recognize the same metadata fields, but then choose to fill them with their own descriptive phrasing. That is where controlled vocabularies enter the picture. A controlled vocabulary is used when the search results need to be consistent. If indexing is to work, a controlled vocabulary is a must.

Several different descriptive elements lend themselves to controlled vocabularies. Names of creators or contributors, genres or mediums, and subject listings all reap the benefits of controlled vocabularies. Other fields, such as Date and Language, rely on data content standards that dictate the way that information is entered. NC ECHO metadata guidelines provide instructions on these data content standards wherever possible. The best practice is to select terms from controlled vocabularies, thesauri, and subject heading lists to use as subject elements, rather than just using keywords.
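The difference between free keywords and controlled terms can be sketched in a few lines of Python. In the example below, the term list and the "use for" references are hypothetical stand-ins; a real project would draw on a published thesaurus or subject heading list and follow its instructions for use.

```python
# Hypothetical preferred terms standing in for a published vocabulary.
CONTROLLED_SUBJECTS = {"Quilts", "Oral history", "Tobacco industry"}

# Hypothetical "use for" references from variant keywords to preferred terms.
USE_FOR = {"quilting": "Quilts", "oral histories": "Oral history"}


def normalize_subject(term):
    """Return the preferred term for a keyword, or None if it is not covered."""
    cleaned = " ".join(term.split())  # collapse stray whitespace
    for preferred in CONTROLLED_SUBJECTS:
        if cleaned.lower() == preferred.lower():
            return preferred
    return USE_FOR.get(cleaned.lower())


def check_subjects(terms):
    """Split keywords into accepted preferred terms and rejected keywords."""
    accepted, rejected = [], []
    for term in terms:
        preferred = normalize_subject(term)
        if preferred is None:
            rejected.append(term)
        elif preferred not in accepted:
            accepted.append(preferred)
    return accepted, rejected


accepted, rejected = check_subjects(["quilts", "Quilting", "oral  histories", "old stuff"])
print(accepted)  # ['Quilts', 'Oral history']
print(rejected)  # ['old stuff']
```

Rejected keywords are returned rather than silently discarded so that a cataloger can decide whether a suitable controlled term exists.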
Employing terminology from controlled vocabularies ensures consistency and can improve the quality of search results. It also can reduce the likelihood of spelling errors when inputting metadata records. Recognizing the diverse nature of the statewide initiatives and the involvement of a broad range of cultural heritage institutions, controlled vocabularies have been expanded to include subject discipline taxonomies and thesauri. Several states are developing geographic-based lists of terms that may be helpful in achieving a level of consistency in terminology. Many of the thesauri, subject heading lists, and taxonomies are currently available via the web. Each controlled vocabulary or thesaurus also comes with its own instructions for use, which should be consulted. It is the proper application of a controlled vocabulary that allows for appropriate, shareable metadata.

Describing your digital project

While metadata is essential to facilitate the use of the materials within your digital project, you should also consider the use of an overall description of your digital project. Primarily associated with the homepage of the project, the inclusion of project-level Dublin Core will greatly facilitate the location and inclusion of your digital project in consortial and aggregated online resources. The NC ECHO Dublin Core Implementation Guidelines provide an appendix (http://www.ncecho.org/ncdc/ncdublincore2007.htm) that outlines the application of Dublin Core to a digital project as a whole. It is the creation of this metadata that eases the inclusion of your digital project in NC ECHO’s Catalog of Online Collections and Exhibits.

Conclusion

North Carolina's cultural institutions could scan their entire holdings. They could post on the Internet a digital image of every item sitting on their shelves and in storage cases. They could fill computer server after server with good information, but if it takes a researcher six weeks of scrolling through screens to find what he or she wants, all of that scanning will have been performed in vain. Metadata, information about information, helps researchers find what they are looking for. If institutions use standard systems of metadata and apply them in standardized ways, they provide their researchers with tools that will help them identify resources within their institutions and will lead to the ability to search across repositories.

Further Reading

Caplan, Priscilla. Metadata Fundamentals for All Librarians. Chicago: American Library Association, 2003.

Duval, Erik, et al. “Metadata Principles and Practicalities.” D-Lib Magazine 8, no. 4 (April 2002).

Hodge, Gail. Metadata Made Simpler. Annapolis: NISO Press, 2001.

Hudgins, Jean, Grace Agnew, and Elizabeth Brown. Getting Mileage out of Metadata: Applications for the Library. Chicago: American Library Association, 1999.

Introduction to Metadata: Pathways to Digital Information. Murtha Baca, ed. California: Getty Information Institute, 1998. http://www.getty.edu/research/institute/standards/intrometadata

Smith, Terence R. “The Meta-Information Environment of Digital Libraries.” D-Lib Magazine, July/August 1996.

St. Pierre, Margaret, and William P. LaPlant, Jr. Issues in Crosswalking Content Metadata Standards. 1998. http://www.niso.org/press/whitepapers/crosswalk.html

Taylor, Arlene. The Organization of Information. Englewood, Co.: Libraries Unlimited, Inc., 1999.

Weibel, Stuart. “Metadata: The Foundations of Resource Description.” D-Lib Magazine, July 1995.

Zeng, Marcia Lei. "Metadata Elements for Object Description and Representation: A Case Report from a Historical Fashion Collection Project." Journal of the American Society for Information Science 50, no. 13 (1999): 1193-1208.

Selected Metadata Schemes

CDWA: Categories for the Description of Works of Art. http://www.getty.edu/research/conducting_research/standards/cdwa/

Dublin Core. http://www.dublincore.org/

NC Dublin Core. http://www.ncecho.org/ncdc/index.htm

Encoded Archival Description. http://www.loc.gov/ead/

NCEAD. http://www.ncecho.org/ncead/

MARC. http://www.loc.gov/marc

Furrie, Betty. Understanding MARC Bibliographic: Machine-Readable Cataloging. The Library of Congress, 2003. http://www.loc.gov/marc/umb

Understanding MARC Authority: Machine-Readable Cataloging. The Library of Congress, 2005. http://www.loc.gov/marc/uma/

METS. http://www.loc.gov/standards/mets/

METS: An Overview and Tutorial. http://www.loc.gov/standards/mets/METSOverview.html

TEI: Text Encoding Initiative. http://www.tei-c.org/

Teach Yourself TEI. http://www.tei-c.org/Tutorials/index.html

Seaman, David. The Electronic Text Center Introduction to TEI and Guide to Document Preparation. http://etext.lib.virginia.edu/tei/uvatei.html

VRA Core. http://www.vraweb.org/vracore3.htm

CHAPTER 6 DIGITAL PRESERVATION

The preservation of cultural heritage materials is a cornerstone of our work as professionals. Ensuring the longevity of the materials entrusted to our care is a concern no matter what other activities you undertake. Clay tablets, stored properly, will last thousands of years. Good rag paper, if kept away from pests and in the right environment, will last hundreds of years. The proof of their longevity lies in museums and rare book collections around the world. Someday, future curators, archivists, and librarians will know how long a floppy disk, a CD, or a hard drive will last, but until then we have only estimates. The creation of digital surrogates in no way alleviates these concerns. Instead, it opens a new arena with preservation concerns of its own.

This chapter discusses the preservation of digital materials. It does not cover traditional preservation, because voluminous material is already available to meet those challenges. The life of inadequately stabilized and housed original documents, artifacts, published works, or works of art will not be extended through the process of digitization. Digitization will do nothing to help the condition of the original. In short, digitization is not a preservation measure for originals, even though it may reduce further damage by providing access to surrogates.

Institutions can invest a great deal of effort and expense in the digitization of their collections and in the presentation of these digitized collections online. Until recently, the digital world has not given high priority to the preservation and storage of its content. The very nature of the Web is one of impermanence and fluidity, and digital information presented via the Internet has grown with little regard for preservation. Fortunately, cultural institutions by their very nature think long-term, and preservation of digital material has emerged as a major concern in the cultural heritage community. It is necessary that the digital objects created remain accessible for as long as possible, both to the intended users and to the wider community.

Digitization projects are costly and time consuming, and the digitization process can subject original materials to potentially damaging exposure to light and excessive handling. Because of these realities, it is essential to ensure that the digitization process need not be repeated. Doing it right the first time, following the “scan once methodology,” and properly preserving the digital items produced saves money and time and spares originals from additional handling and unnecessary wear and tear.

More than any other aspect of a digitization project, digital preservation, by its very nature, requires vigilance on the part of cultural heritage professionals. Digital preservation is a new and developing field. Cultural heritage professionals must monitor research being conducted in the fast-paced world of digital technology. In addition, the issues surrounding the preservation of digital objects are more urgent than those surrounding traditional materials, because of the lack of stability in both digital storage media and the equipment required to interpret and access digital materials. This chapter seeks to equip you to do both: to remain vigilant and to keep up with that research. It introduces the challenges of digital preservation and provides information and recommendations to guide decision making for digital preservation strategies.

Chapter 6 -- Digital Preservation -68-

Digital Preservation Challenges

At its most fundamental level, digital material is composed of ones and zeros. Software programs are written to create this binary structure and to interpret it into the variety of forms that we use, such as images, text documents, databases, video, and sound files. Without an appropriate machine, even the simplest of these is impossible to decipher. Consider the variety of computer hardware and software readily available and how fast popular computers and programs become obsolete.

Hardware obsolescence

Hardware obsolescence concerns the availability of the appropriate equipment to read the digital storage media selected. For instance, just five years ago, a 3 1/2-inch floppy disk was used for virtually everything digital anyone wanted to save or transport. Yet in today’s computers, 3 1/2-inch floppy disk drives must be specially ordered to be installed on a standard PC. Even if your storage media survive the ravages of time unharmed, eventually there will be no computer extant that has the appropriate mechanism to read that medium. Without the appropriate hardware, files contained on that floppy disk are irretrievably lost. In order to prevent this kind of preservation problem, institutions and companies are maintaining working machines that will read the variety of media that have been supplanted by newer technologies. This kind of access, though, can be expensive. Maintaining the hardware requires space and expertise, and outsourcing access to obsolete storage media requires funds that may not be readily available.

Software obsolescence

Closely connected to the issues of hardware obsolescence is the development of software applications. In interacting with those ones and zeros, software programs use their own systems for creation and interpretation. Often, a file created in one software program cannot be interpreted by another because the second program does not know how to read that particular configuration of ones and zeros. Updates to newer versions of software, or changes from one program to another, can create obstacles to access for those digital objects.

Fragility

Both hardware and software obsolescence underscore the fragility of digital objects. Their very nature requires machines that are supplanted as technology advances, and they require programs that are updated as new functionality becomes available. However, there is another kind of fragility that needs to be considered as well, for fragility is not just an operational matter. As with our more traditional objects, digital media have their own specific environmental concerns, and degradation of digital media can be subtle and is often not visible to the naked eye.

Keeping Current

Digital preservation is a fast-growing area of research in information science and technology. The Commission on Preservation and Access (CPA), the Digital Preservation Consortium, CLIR (Council on Library and Information Resources), and the Digital Library Federation (DLF) are working to develop methodological frameworks and to ensure continuing research and development in the little-understood areas of digital preservation.


A series of publications and additional initiatives can be found on the preservation pages of DLF, available at: http://www.diglib.org/preserve.htm.

Cornell University has created a thorough tutorial on digital preservation management available at: http://www.library.cornell.edu/iris/tutorial/dpm/eng_index.html.

Additional resources can be found in the Further Reading section of this chapter and the Resources section at the end of these Guidelines.

Creating a Digital Preservation Strategy

How can an institution keep up with all the various types of digital files, programs, and computers being used in-house, much less keep abreast of emerging technologies? How much preservation is enough? How much is too much? What are the deciding factors for your institution? These are all issues that need to be considered when devising a digital preservation strategy. Here are four places to start:

1) Software/hardware migration. Because of the issues of obsolescence, all digital objects must be migrated at some point, at the very least to a file format that the latest technology can recognize. If you have chosen to preserve the whole system, then operating systems and functional software must be migrated as well. Full system migration must be carried out frequently to ensure access and usability. To protect your digital assets, formulate a migration policy that is implemented on a regular schedule rather than as a reaction to new software or hardware. After migration, it is crucial to test your documents to ensure that functionality has been preserved.

2) Physical deterioration of digital media. As with other formats, all digital media deteriorate over time. This process will be more rapid if storage conditions are bad, such as a damp basement or a pile of CDs stacked one on top of another. Correct storage (e.g. in racks that enable the disks to be stored separately) and an environmentally controlled location will help to optimize stability and protect digital information from loss. Digital media should be checked and refreshed regularly to ensure that the data is still readable, and this process should be part of your preservation policy. Preserve your data on a medium for which the hardware exists to transfer it to a later medium if the original becomes obsolete. Remember that it is costly to use a data recovery agent to move files from an obsolete medium, so make sure your preservation policy will prevent this from happening, and migrate while the process is still straightforward. Digital media should also be part of an institution’s disaster preparedness plan. See Project Management for more details on disaster plans.

3) Metadata. Information about the creation and maintenance of your digital objects is crucial to their preservation. The NC ECHO Preservation Metadata for Digital Objects (http://www.ncecho.org/presmet/index.htm) provides the basic elements that should be recorded to document the properties of your digital objects. Many collection management or digital asset management systems have incorporated this into their structure, but other digitization projects will be created outside of these systems. It is important to ensure that this metadata is recorded somewhere.

4) User needs and preferences. This is a complex issue that may cause certain formats to become effectively obsolete even while they remain technically functional. User acceptance, and its decline, will be one of the key “trigger events” that will compel migrations to new delivery versions of digital collections.
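The regular media checks in item 2 and the record keeping in item 3 can be partly automated. The following is a minimal sketch, not an NC ECHO tool: it assumes master files sit together in one directory, and it uses SHA-256 checksums, a technique these Guidelines do not prescribe. The function names and the manifest fields ("sha256", "bytes", "recorded") are hypothetical and are not elements of the NC ECHO Preservation Metadata for Digital Objects.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record checksum, size, and recording time for every file under root.

    The field names are illustrative assumptions only.
    """
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = {
                "sha256": sha256_of(path),
                "bytes": path.stat().st_size,
                "recorded": datetime.now(timezone.utc).isoformat(),
            }
    return manifest


def verify_manifest(root, manifest):
    """Return (file name, problem) pairs for files that are missing or changed."""
    root = Path(root)
    problems = []
    for name, record in manifest.items():
        path = root / name
        if not path.is_file():
            problems.append((name, "missing"))
        elif sha256_of(path) != record["sha256"]:
            problems.append((name, "checksum mismatch"))
    return problems
```

A manifest built when the masters are first written can be stored alongside them and checked again on each scheduled media inspection or after each migration; an empty result from verify_manifest suggests the files are still readable and unchanged.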

The Digital Preservation Process

An institution can easily become overwhelmed by the avalanche of issues that affect the process of planning for digital preservation. This section seeks to address several issues at the core of digital preservation, including digital storage during the digitization process, migration of digital material, storage media for the short term and long term, and trends for the future. It includes guidelines on what is currently considered minimum recommended practice and best practice for digital preservation, with the understanding that these standards are fluid and require revisiting often. The very best result that cultural institutions can hope to achieve for the long-term sustainability of digital material will be accomplished through good digital preservation planning and vigilant management.

There are essentially five main storage applications that occur during the digitization process: production, data transportation, presentation to the public, backup or archiving, and migration.

Production

The production or creation of digital material generally requires sufficient hard disk capacity to store working files while they are being manipulated and developed. If the collection is considerable and there is a large production environment, a Redundant Array of Inexpensive Disks (RAID) may be the most appropriate choice; otherwise, storage for active files can generally be handled by a large hard drive. Be warned that in networked environments it can be difficult to determine what will reside on your hard drive and what will be forwarded to a server, because multiple versions of files can become confusing. It is important to outline the various processes you need to perform on the same images and then determine how many “active” files you need at any one time. This demonstrates the role of file management and file naming in the preservation process.

Data Transportation

Generally, moving digital information is handled by portable storage devices such as recordable CDs (compact disks) and more recently by DVDs (Digital Video Disks or Digital Versatile Disks). The capacity of the CD and DVD is greater than that of the tape drive, an early favorite for data transportation, though their transfer rate is slower than that of tape. Another feature of the CD is its compatibility across platforms. The CD-R (a CD which can only be written upon once) is a secure format; its "write once mechanism" does not allow overwriting. CD-RW (CD-Rewritable) is less secure but more versatile.

Presentation to the Public

Most institutions making images of their collections available to the public via the Internet make use of in-house servers or rent space on commercial servers.


Backup/Archiving

Digital collections should be backed up on a routine basis, in a format that is easily accessible, and the backups should be stored at a location remote from the original source. When evaluating storage for backup, the inevitable dilemma is between speed and cost. Most managers prefer tape for backup, as it may be used at non-peak hours, when speed is not an issue. For small networked systems, tape backup is the common practice. Remember that no digital media is considered permanent.

Migration

In the context of digital preservation, migration refers to the shifting of digital objects from old media formats and software programs to newer ones. Migration of backed-up digital material needs to be as easy and cost-effective as possible for institutions to buy into a system. The continual drain on fiscal resources to repeatedly upgrade equipment and software can be borne by some institutions, but others will find it difficult to stay abreast of continual migration. Every institution must decide what information will be saved and migrated and what will not, based on a combination of cost effectiveness, intellectual necessity, and moral and professional obligation. When making these tough decisions, refer to the four preservation strategy issues discussed above.

Types of Digital Storage Media

Longevity of a digital medium depends on many factors: the type of media (CD, DVD, tape, etc.), how often and in what way the media is handled, and how the media is stored. It is important to keep in mind that even with proper maintenance and great luck, no digital format is permanent or archival. The very best result that cultural institutions can hope to accomplish is long-term sustainability of digital material through good preservation planning and vigilant management. The storage medium is an essential part of that process. There are two types of digital storage media: portable and non-portable. Each has advantages and disadvantages for long-term storage.

Portable Media

CD (CD-R and CD-RW)

CD-R or Compact Disk Recordable is a format that is read in a CD-ROM drive and written in a CD-R drive. The CD-R format is an inexpensive way to store digital object masters, which typically require many megabytes of storage. Currently the CD-R disk will store 650 MB, though approximately 100 MB of that capacity should remain free to allow for manipulation of the data. Unlike common music CDs, these disks are more susceptible to scratches, to fingerprints, and to extremes in temperature and light. They should be handled and stored with great care. They are also susceptible to other destructive agents. If writing on the disk, only a water-based felt-tip pen should be used. An alcohol-based felt-tip pen can migrate through the protective layer and possibly affect the integrity of the data.

CD-R conforms to the ISO standard 9660, which is an established standard that allows a file system to be used under a variety of operating systems. The standard applies only to the data track of a CD-ROM and not to audio tracks or any other media, such as erasable-optical drives. Thus CD-Rs may be read by any of a variety of operating systems such as UNIX and MS-DOS.

CD-RW or Compact Disk-Rewritable format allows the re-writing of information on the disk, requiring a special CD-RW drive. These disks will not run on a standard CD-R drive unless they are specially formatted to run on that drive. Unless an institution requires the convenience of re-write, the CD-R format is the better choice. Currently many prefer the CD-R (Compact Disk-Recordable) format for archival storage, though there is debate regarding its archival quality. Storing data on a CD-R generally requires that the data be gathered on a hard drive and then “written” to the CD-R.

DVD – (DVD-R and DVD-RW)

DVD technology (Digital Video Disk or Digital Versatile Disk) is a recent addition to the growing optical disk technology market. DVD drives are backward compatible, so they may be used to read CDs, but CD-R and CD-RW drives cannot read a DVD. A DVD-ROM drive is needed to read a DVD-R disk, and some DVD-R disks do not play on some machines. DVD-ROM is different from DVD-Video: the former handles data, while the latter is reserved largely for the commercial video market. A DVD-R disk will hold approximately 4.7 gigabytes. The enormous storage available on a DVD makes it appealing as a storage device; gold standard DVDs have been developed to meet archival standards.
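The capacities quoted above (650 MB per CD-R with roughly 100 MB left free, and approximately 4.7 gigabytes per DVD-R) make it possible to estimate how many disks a set of master files will need. The short sketch below illustrates that arithmetic only; the function name is hypothetical, 1 gigabyte is treated as 1000 MB, and leaving no free space on a DVD-R is an assumption, since the Guidelines state a reserve only for CD-R.

```python
import math


def disks_needed(total_mb, capacity_mb, reserve_mb=0):
    """Return how many disks are needed to hold total_mb of files.

    capacity_mb is the stated capacity of one disk and reserve_mb is the
    space deliberately left free on each disk.
    """
    usable_mb = capacity_mb - reserve_mb
    if usable_mb <= 0:
        raise ValueError("reserve must be smaller than the disk capacity")
    return math.ceil(total_mb / usable_mb)


# 5,500 MB of master files (an invented project size) on CD-R,
# with 100 MB of each 650 MB disk left free:
print(disks_needed(5500, 650, reserve_mb=100))  # 10

# The same files on DVD-R, assuming 4,700 MB per disk and no reserve:
print(disks_needed(5500, 4700))  # 2
```

Every additional disk is another item to label, store, inspect, and eventually migrate, so this count also gives a rough sense of the ongoing care a given medium will demand.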

Improving the lifespan of CDs and DVDs

Always:
• Store media in a controlled archival environment
• Store media in a jewel case or protective sleeve when not in use
• If using sleeves, use those that are of low-lint and acid-free archival quality
• Wear gloves when handling the master disks

Avoid:
• Damage to the upper and lower surfaces and edges of the disk
• Scratching and contact with surfaces that might result in grease deposits (e.g. human hands)
• Exposing disks to direct sunlight

Never:
• Attach or fix anything to the surface of a disk
• Write on any part of the disk other than the plastic area of the spindle

DAT Tape, DLT Tape, ZIP® and JAZ® drives

Tape, ZIP® and JAZ® drives are all magnetic media, and magnetic media is NOT recommended for long-term storage. Tape is, however, an excellent intermediate medium, particularly for transport of data and for backup.


Improving the lifespan of DLTs

Always:
• Keep tape in its protective case when not in use
• Move tape in its case
• Store the tapes in an appropriate archival environment
• Store tape vertically

Avoid:
• Placing tape near magnetic fields
• Moving the tapes about
• Exposing tapes to direct sunlight

Never:
• Stack tapes horizontally
• Put adhesive labels on the cartridge
• Touch the surface of the tape
• Put a tape that has been dropped in a drive without first visually inspecting it to make certain that the tape has not been dislodged or moved

The above charts, modified from tables in the NINCH Guide to Good Practice (available at http://www.nyu.edu/its/humanities/ninchguide/XIV/), can assist in making sure digital storage media last as long as possible.

Non-portable media

Network Servers (drives)

The minimum storage space recommended for network servers changes every three to six months. Suffice it to say that if a server is required, it should be purchased to be adequate for the first two years of the project. Depending upon the size of their digital holdings, larger institutions may need to upgrade on an annual basis, especially if production levels of digital materials are high.

Hard Drives (PC disk drives)

It is recommended that institutions purchase the largest hard drive they can afford. If it is possible to purchase two hard drives, this will provide a more flexible storage system. If managers of digital projects use hard drives for image storage, they should defragment them on a regular basis to maintain optimum performance. Hard drives are not recommended for long-term storage. The following recommendations are based on available hardware in the medium price range:

• Minimum processor: Pentium II (300-450 MHz)*
• Recommended processor: Pentium IV (up to 512 MHz)
• Minimum configuration: 40-80 GB for one hard drive and expansion slot for additional hard drive
• Recommended configuration: two hard drives, 40-80 GB each, and/or a shared storage option


Other Digital Storage Concerns

The amount of storage required depends on a number of inter-related issues including but not limited to the size of your digital holdings, your institution’s budget, and your institution’s digital preservation strategy. There are a number of storage issues other than cost, amount, and media permanence that cultural repositories should factor into decisions regarding the “long term” storage of digital materials. These include labeling, file management, and metadata issues.

As soon as digitization projects get going, the number of images piles up. As noted above, CDs and DVDs hold immense amounts of data, and it is unlikely that contents lists on the jewel cases will remain practical in the long run, as more and more storage devices are used. Therefore, in addition to the labeling that would occur on the outside of a CD storage case, managers of digital projects should maintain “preservation metadata” for each image. The information necessary is explained in the NC ECHO Preservation Metadata for Digital Objects (http://www.ncecho.org/presmet/index.htm).

It is also recommended that the file naming conventions follow these standards:

• Attempt to conform to ISO 9660 naming standard (a standard that defines a file system usable under a variety of operating systems)
• Establish a file naming convention and document any extensions later made to it
• Base names on accession numbers or unique IDs
• Avoid case sensitivity
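The conventions above can be checked automatically before files are written to disk. The sketch below is a simplified illustration rather than a full implementation of ISO 9660: it tests only the common "8.3" form of a Level 1 name (up to eight characters, a dot, and up to three characters, using uppercase letters, digits, and underscores) and ignores details such as version suffixes. The function names and the sample identifier "ms-0042" are hypothetical.

```python
import re

# Simplified ISO 9660 Level 1 pattern: 1-8 characters, then an optional
# dot and 1-3 characters; uppercase letters, digits, and underscores only.
ISO9660_LEVEL1 = re.compile(r"[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?")


def is_iso9660_level1(name):
    """Return True if name fits the simplified 8.3 pattern above."""
    return ISO9660_LEVEL1.fullmatch(name) is not None


def normalize_name(base_id, extension):
    """Build a file name from an accession number or unique ID.

    Letters are uppercased and any other disallowed character becomes an
    underscore. Raises ValueError if the result still does not fit.
    """
    base = re.sub(r"[^A-Z0-9_]", "_", base_id.upper())
    ext = re.sub(r"[^A-Z0-9_]", "_", extension.upper())
    name = f"{base}.{ext}"
    if not is_iso9660_level1(name):
        raise ValueError(f"{name!r} does not fit the 8.3 pattern")
    return name


def case_collisions(names):
    """Group names that differ only by letter case."""
    seen = {}
    for name in names:
        seen.setdefault(name.lower(), []).append(name)
    return [group for group in seen.values() if len(group) > 1]


print(normalize_name("ms-0042", "tif"))             # MS_0042.TIF
print(is_iso9660_level1("ms0042.tif"))              # False
print(case_collisions(["A001.TIF", "a001.tif"]))    # [['A001.TIF', 'a001.tif']]
```

Because many accession numbers are longer than eight characters, an institution following the standard strictly may need a shorter unique ID for file names and a lookup table, kept with its preservation metadata, that maps each ID back to the accession number.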

NC ECHO recommended storage standards

• Master file storage:
   o Minimum recommendation: Gold CD-R
   o Best practice recommendation: Redundant Hard Disk storage and/or Hard Disk with Tape Backup
• CD names are simple date/time stamps (e.g., 19990412_1628)
• ISO 9660 standard is used as strictly as possible
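A date/time stamp in the form shown above (year, month, day, an underscore, then hours and minutes) can be generated rather than typed by hand. This is a minimal sketch; the function name disk_label is hypothetical, and reading the example 19990412_1628 as 12 April 1999 at 16:28 is an interpretation of the pattern rather than something the Guidelines state.

```python
from datetime import datetime


def disk_label(moment):
    """Format a datetime as YYYYMMDD_HHMM for use as a CD name."""
    return moment.strftime("%Y%m%d_%H%M")


print(disk_label(datetime(1999, 4, 12, 16, 28)))  # 19990412_1628
```

Names in this form sort chronologically and use only digits and an underscore, so they stay within the character set that ISO 9660 allows.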

Conclusion

Digital preservation seeks to achieve longevity of the digital object with all its original properties intact. This is a daunting task and one not easily tackled or mastered. Many questions in the field of digital preservation remain unanswered, and many more questions will emerge as technology relentlessly forges ahead with new developments. Whether your institution has only the means to preserve the minimum content of your digital creations or can afford to preserve the whole discovery and display system, policies should be put in place to ensure the long-term sustainability and accessibility of the digital content you have chosen to preserve.

Further Reading

Benford, Gregory. Deep Time: How Humanity Communicates Across Millennia. New York: Avon, 1999.

Conway, Paul. "The Implications of Digital Imaging for Preservation." In Preservation of Library and Archival Materials, 2nd ed. Edited by Sherelyn Ogden. Andover, MA: Northeast Document Conservation Center, 1994.

Conway, Paul. Preservation in the Digital World. Available at: http://www.clir.org/pubs/abstract/pub62.html

Development of a Testing Methodology to Predict Optical Disk Life Expectancy Values (Summary). http://palimpsest.stanford.edu/byorg/nara/nistsum.html

Digital Preservation Coalition. http://www.dpconline.org

Digital Projects Guidelines. Attachment 11, Arizona State Library, Archives and Public Records. Available at: http://www.lib.az.us/digital/dg_a11.html

“Long-Term Usability of Optical Media - The National Archives and Records Administration and the Long-Term Usability of Optical Media for Federal Records: Three Critical Problem Areas.” http://palimpsest.stanford.edu/bytopic/electronic-records/electronic-storage-media/critiss.html

Rothenberg, Jeff. “Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation.” January 1998. http://www.clir.org/pubs/reports/rothenberg/contents.html

CHAPTER 7 PRESENTING YOUR DIGITAL PROJECT

Web sites open new venues for making cultural resources and information available. They have the potential to undo existing limitations of location, hours, and even the fragility of documents and artifacts. Materials can be open to audiences around the world twenty-four hours a day.

These opportunities also bring responsibilities and challenges. With more access, there is less control over who is using the materials or how they are used. “The one with control is not the one with the message but the one with the mouse,” said John Gehl and Suzanne Douglas in an article for the now defunct electronic magazine The World and I. This can mean increased reference questions and a need to develop new ways of interacting with users. Web page creators don’t know if their readers are more expert in their subject than they are. They don’t know if their viewers are Asian, African American, Scots-Irish, Hispanic, Native American, German, rich, poor, deaf, blind, physically impaired, man or woman, child or adult. This democratization of access suggested by Gehl and Douglas is seen in the now famous cartoon by Peter Steiner for the New Yorker, “On the Internet, Nobody Knows You’re a Dog.”1 Still, there are some instances when web pages need to provide access for specific audiences and for special audience needs, and in all instances it is important to provide good navigation and clear context.

If web pages are to be useful they need to be read. Web designers Jakob Nielsen and Donald Norman recognized that “the Internet follows a kind of Sheer Design Darwinism: Survival of the easiest,” which means “usability is not a luxury on the internet; it’s essential to survival.” To assure that the pages are read, it is now common to find the most important information "front-loaded." This means that the primary content is placed in an abstract or overview at the beginning of the document so the reader may make a quick judgment whether to read further. The intent of this "front loading" goes beyond just the quick browse; it is also used by automated search agents (including spiders and search engines) when they roam the Internet in search of specific content. Whether the need is the human interface or the technological convenience, the Web site must have usability.

This chapter of the Guidelines addresses the issues of audience, access for the disabled, web design, software, and interoperability. A final section discusses the choice and implementation of collection management systems as they apply to the presentation of digital projects.

Audience - "The one with the mouse..."

Who is the User?

While it is true that the Internet provides an anonymous status for users, web page creators must design for some audience. A series of questions will help you to define your audience. Reflecting on decisions made during Planning will help you when you are ready to look at presenting your digital project on the web.

• Who is expected to use the Web site?
• How is the user expected to use the information?
• What information is expected to be used frequently?
• Are there different age-ranges in the audience that need to be accommodated?
• What is the educational range of the user?
• Do any of your users have disabilities? Have you accommodated them?

1 Available at: http://www.unc.edu/depts/jomc/academics/dri/idog.html

Chapter 7 -- Presenting your Digital Project -78-

The designers and developers of web pages need to be particularly vigilant in addressing the needs of a variety of users from the first grader to the older researcher. The school-age user will have needs that may not be consistent with those of the adult researcher; the graduate student may have research requirements not shared by the lawyer or businessman. How can the needs of these populations and others be met through one web site? This is just one question that local project managers will wish to ask when trying to decide how best to present their digital images to the public.

Access for the disabled

“People with disabilities are perhaps the single segment of society with the most to gain from the new technologies of the electronic age. Yet they have among the lowest rates of use of these technologies. As a result, the potential benefits of computers and the Internet to the disability community are a long way from being realized.”2

In planning web sites, we need to keep in mind the kinds of disabilities that can affect web site access. These include color blindness, repetitive stress injury, deafness, tinnitus, blindness, memory impairments, cognitive disabilities, and seizure disorders. According to the 2000 U.S. Census, 21.1% of North Carolina’s population over five years old was identified as having some kind of disability.3 Nationwide, disabled people who own computers comprise only one-quarter of the total disabled population. The digital divide is real for the disabled; it is a yawning chasm.

2 Stephen Kaye, Computer and Internet Use Among People with Disabilities, 3/1/2000.

3 Health and Disability in North Carolina, 2003, available at: http://www.schs.state.nc.us/SCHS/pdf/HDReport.pdf. See also the 2000 Census report at http://www.census.gov/population/cen2000/phc-t32/tab01-NC.pdf for information about North Carolina’s disability statistics.

Key Tips for Accessibility

• Describe all graphics using simple ALT text.
• Don't use columns.
• Keep link text brief.
• Keep navigation simple. Don't jump around.
• Provide alternatives to all controls and applets.
• Make use of the “noframes” tag to direct user to no-frame pages.
• If tables are used, provide alternate pages with no tables.
• Avoid reliance only on style sheets.
• Provide closed captions or transcripts for all audio.
• Provide ASCII and/or HTML alternatives for proprietary formats.
• Avoid scrolling marquees.
• Provide titles for objects.
• Consider size and readability of fonts for various age groups.
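The first tip, describing all graphics with ALT text, is one of the few that can be checked mechanically. The sketch below is an illustration only and is not a substitute for a full accessibility review: it uses the HTML parser in Python's standard library to list images whose ALT text is missing or blank. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag with missing or blank alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            # An attribute written without a value is reported as None.
            alt_text = (attr_map.get("alt") or "").strip()
            if not alt_text:
                self.missing.append(attr_map.get("src") or "(no src)")


def find_images_missing_alt(html_text):
    """Return the sources of images in html_text that lack ALT text."""
    finder = MissingAltFinder()
    finder.feed(html_text)
    finder.close()
    return finder.missing


page = '<img src="quilt.jpg" alt="Pieced quilt, ca. 1890"><img src="letter.jpg">'
print(find_images_missing_alt(page))  # ['letter.jpg']
```

A check like this only confirms that a description exists. Whether the description is simple and useful to someone who cannot see the image still has to be judged by a person.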

For expanded Web guidelines see the W3C Web Content Accessibility Guidelines 1.0 at http://www.w3.org/TR/WAI-WEBCONTENT/. The Web carries great promise for the disabled; but it must become a more accessible medium, if they are to fully benefit from that promise. Project managers are encouraged to make special efforts to make their materials as accessible as possible to the disabled. There are a variety of resources available to help you make this possible.

Unified Web Site Accessibility Guidelines (http://www.w3.org/WAI/GL/central.htm)
Established by the W3C (The World Wide Web Consortium), the mission of this site is "... to lead the Web to its full potential. It includes promoting a high degree of usability for people with disabilities. The Web Accessibility Initiative (WAI), in coordination with organizations around the world, is pursuing accessibility of the Web through five primary areas of work: technology, guidelines, tools, education & outreach, and research & development." This Web page is currently the most definitive site for clear guidance on making Web pages accessible.

Bobby (http://bobby.watchfire.com/bobby/html/en/index.jsp)
A free service that assists Web authors in providing approved sites for users with disabilities. The service will identify and repair significant barriers to access by individuals with disabilities. Sites that display a "Bobby Approved" icon are ADA compliant.

Viewable With Any Browser (http://www.anybrowser.org/campaign/)
A guide to technical standards and programs that assure pages will be able to be viewed by any browser.

Web Page Design
When designing a Web page, you do not have to begin from scratch. There are many good examples available for Web page design, and once you have made initial design decisions those examples will help refine your ideas. As you gather design ideas, consider your potential audiences, what kinds of navigation you want to use, your site structure (that is, how the pages relate to one another), and ways of providing context for your materials. Context is particularly important for primary sources on the Web. At a minimum, primary sources should be identified with a title/description, date, and creator. Some key tips follow:

Use Design Grids
The design grid you choose need not follow a set fashion or technology, but it should follow the best practices of well-established sites that have similar content. There are many good examples both on the Web and in books. For example, the Yale Web Style Guide (http://www.webstyleguide.com/index.html?/) has a series of design grids that are excellent models. The software FrontPage also has a series of templates or design grids from which to choose.

Browsers are Important!
You can't know which Web browsers your viewers are using. Some viewers will use Netscape, others Microsoft Internet Explorer. Some viewers will be using a Macintosh computer and others a PC. Color is particularly browser sensitive. What looks good on your machine may be grotesque on another machine if the color is not "browser safe" (http://www.primeshop.com/html/216colrs.htm). When planning a Web design, be sure to create something that is "browser neutral": a page that will look good when viewed in almost any browser.

Viewers Are Impatient!
Don't tax your viewer's patience with large, slow-loading images. The average time a viewer will wait for a page to load is less than one minute. If you want your pages to be viewed, keep your image size small, preferably around 72 dpi (see Digital Production). JPEG images are generally the favored image type for the Web. They load progressively and generally keep the viewer's attention until the loading is complete.

Use Design "Systems"
For small sites (e.g., a handful of Web pages), it makes sense to design and code pages individually. For anything bigger than that, managers should think about building a design system (using a template or database mechanism) to build their Web site. This makes it simpler to apply global changes and to migrate the underlying data independently of the Web site's design and layout. In addition, most institutions will want their sites to grow gracefully, and it is worth the time up front to build sites systematically, as this will save time when changes or additions are needed later.

Design for Different User Pathways
Recognize that users won't necessarily follow the paths that you expect them to on your site. Design it so that users will get results (or at least an explanation of how best to get results) no matter which path they take. Make the default view the one you think will be most broadly useful (since most people will use the default), but allow users the flexibility to meet their own needs, using your resources in more complex (or simpler) ways. Provide many different approaches to your resources and allow users to choose which best fits their needs. Functions such as browsing, searching, and grouping and sorting of results are all viable ways to support different pathways.

Design for Printing
Many users will want to print pages from your site. Design your site so that printing is easy and efficient. Provide contextual information (name, address, URL of your institution or project) on every Web page so that printouts will display this information.

Links to Homepage
Always keep in mind that users can and will arrive at your site at almost any point via external links or search engines. You need to help those users get to your homepage. Always provide contextual information and links to the homepage on every page of the site.
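To make the idea of a design "system" concrete, here is a minimal Python sketch in which every page is generated from one shared template, so that the institution's name, address, page URL, and a link to the homepage appear on every page and can be changed in one place. It uses the standard library's string.Template; the institution details, the example.org URLs, and the names PAGE_TEMPLATE, SITE, and render_page are all hypothetical and stand in for whatever template or database mechanism a project actually adopts.

```python
from html import escape
from string import Template

# One shared layout: change it here and every generated page changes with it.
PAGE_TEMPLATE = Template("""<html>
<head><title>$title - $institution</title></head>
<body>
<h1>$title</h1>
$body
<hr>
<p>$institution, $address<br>
<a href="$home_url">Home</a> | This page: $page_url</p>
</body>
</html>""")

# Hypothetical site-wide details; a real project would substitute its own.
SITE = {
    "institution": "Example County Historical Society",
    "address": "1 Main Street, Anytown, NC",
    "home_url": "http://www.example.org/",
}


def render_page(title, body_html, page_url):
    """Fill the shared template for one page.

    body_html is assumed to be trusted, already-formatted HTML;
    the title and page URL are escaped before insertion.
    """
    return PAGE_TEMPLATE.substitute(
        SITE,
        title=escape(title),
        body=body_html,
        page_url=escape(page_url),
    )


if __name__ == "__main__":
    print(render_page(
        "Quilts & Coverlets",
        "<p>Photographs of quilts pieced together from family scraps.</p>",
        "http://www.example.org/quilts.html",
    ))
```

Because the footer is part of the template, a printout of any page carries the contextual information, and a visitor who arrives from a search engine always has a route to the homepage.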

Software
Professional Web designers and teachers concur that there is no software yet that will substitute for learning a little basic HTML. It is relatively easy to learn and worth the effort. There are a number of great introductory books available, including HTML For Dummies. Two excellent online resources for Web designers, from beginners to the most advanced, are the WebMonkey site (http://webmonkey.com/) and the W3Schools HTML Tutorial (http://www.w3schools.com/html/default.asp).

There are several authoring environments (Web page creation software) available. The clear favorite among the pros, teachers, and even WebMonkey for functionality, integration of other applications (e.g., Flash and Fireworks), site management tools, generation of good HTML code, and standards compliance is Macromedia's Dreamweaver. Dreamweaver is the gold-standard authoring environment and so can occasionally be overwhelming, but it comes with respectable documentation, including texts of popular and readable O'Reilly books on HTML.

The Netscape browser comes with a fairly basic but serviceable page authoring environment called Composer. While some Web design professionals would like more functionality and better site management tools, they seem to agree that Composer is a fine way to get a page or two up on a server quickly or to make minor changes to existing pages. Netscape is available for all versions of Windows, Macintosh, and Linux. The latest version of Netscape is available on the Web (http://channels.netscape.com/ns/browsers/default.jsp).

Interoperability Issues
Interoperability refers to the ability of different systems to communicate with one another. One example is the ISO standard ISO 23950, better known as ANSI Z39.50, and its Profile for Access to Digital Collections (http://www.loc.gov/z3950/agency/profiles/collections.html). This standard defines the way two computers share bibliographic data, images, and multimedia data. Developed in 1995, the Z39.50 standard is based on client-server architecture and does not depend on individual systems. It is now the standard for the Internet and has grown to include a whole range of Z39.50 profiles, among them the CIMI Profile: Z39.50 Application Profile for Cultural Heritage Information (http://www.cimi.org/public_docs/HarmonizedProfile/HarmonProfile1.htm).

Collection Management Systems
Collection management systems have been used by museums for many years. These systems provide a structured way to input metadata, often include a component for digital images, and more recently include a publishing component that allows you to present materials online. They have become a popular choice for digitization projects, including the presentation of digital collections, metadata, and accompanying images. There are a wide variety of collection management systems available, with scalable functionality and price tags. Below are some questions you should ask when considering the acquisition of a collection management system:

System considerations:

1) What is the platform the system needs in order to operate? Do you have that platform (including any additional plug-ins) available to you?

2) How easily does the system facilitate online publication of your collections?

3) Does the system have customizable functionality, including search, retrieval, and browse capabilities? Are these functions written in a script that you or someone on your staff can easily manipulate?

4) Does the system comply with national or local standards? If not, can it be customized to encourage standards compliance?

5) Does the system facilitate metadata importation/exportation?

6) Does the system provide free-text searching (for textual documents)? Cross-collection searching?

7) Does the system provide a variety of outputs (such as PDF versions) that may be desired by your audience?

8) Does the system come with adequate accompanying documentation for your implementation?

9) Are there others in the NC ECHO community or nationally who are using the system for you to consult?

Workflow considerations:

1) Does the system have customizable metadata and image input to facilitate your individualized workflow?

2) Does the system have controlled vocabulary capabilities? How flexible is that functionality?

3) Does the system include quality control features to facilitate that part of the process?

4) Does the system have multiple-user capability for simultaneous data/image input?

5) Does the system have built-in error correction?

6) Does the system handle rights management information?

Publication considerations:

1) Does the system have a customizable web interface? Is the interface written in a script that you or a staff member can easily manipulate?

2) Does the system provide customizable functionality and display to provide contextual information?

3) Does the system come with adequate accompanying documentation for your audience?

4) Is the functionality of the system clear to a wide variety of audiences, including digital novices and technology sophisticates?
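Some of the questions above, such as whether a system supports metadata importation and exportation, can be tested hands-on during a product trial: export a few records, re-import them, and confirm that nothing is lost. The Python sketch below shows the shape of such a round-trip test using CSV as the exchange format. It is an illustration only; the field names, the sample records, and the functions export_records and import_records are assumptions made for this example and do not describe any particular collection management system.

```python
import csv
import io

# Hypothetical field list; a real project would use the fields of its own
# metadata scheme (see the Metadata chapter).
FIELDS = ["identifier", "title", "creator", "date", "rights"]


def export_records(records):
    """Write a list of record dictionaries to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        # Missing fields are exported as empty strings rather than dropped.
        writer.writerow({field: record.get(field, "") for field in FIELDS})
    return buffer.getvalue()


def import_records(csv_text):
    """Read CSV text back into a list of record dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


# Invented sample records used only to exercise the round trip.
RECORDS = [
    {"identifier": "img0001", "title": "Main Street, looking north",
     "creator": "Unknown", "date": "1912", "rights": "Public domain"},
    {"identifier": "ltr0002", "title": "Letter to a sister",
     "creator": "Unknown", "date": "1863", "rights": "Permission required"},
]

if __name__ == "__main__":
    exported = export_records(RECORDS)
    print(exported)
    print("Round trip preserved all records:", import_records(exported) == RECORDS)
```

A title containing a comma is included deliberately, since punctuation inside field values is a common place for exports to go wrong.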

Conclusion
Librarians have long been accustomed to the mantra "meeting user needs." They and their colleagues in cultural institutions should not stop chanting this phrase when they face a Web page editor. The needs of the user come first. Be aware of your different audiences and design for their particular requirements. Remember to present your materials in such a way that people with disabilities have access to them. Whatever the audience, it is a good idea to place an abstract and overview of the important issues covered by your Web page at the beginning of the page. This helps with the retrieval of information. Keep in mind that different browsers present pages in different ways. Make sure your page is "browser neutral" and looks good and makes sense whether it is presented by Netscape, Explorer, or another browser. And no matter how good your page looks, if it takes a long time to load, no one will ever see it. Keep the images small, cut out the animation, and make that page load faster.

Further Reading
Apple Web Design Guide, http://www.geo.tu-freiberg.de/docs/apple/web_design/intro.html
Fleming, Jennifer and Richard Koman. Web Navigation: Designing the User Experience. Cambridge, MA: O'Reilly and Associates, 1998.
Gray, Douglas E. Preparing Graphics for the Web, http://www.dsdesign.com/articles//gif.htm
Lynch, Patrick J. and Sarah Horton. Web Style Guide: Basic Design Principles for Creating Web Sites. 2nd edition, Yale University Press, 2002. http://info.med.yale.edu/caim/manual/contents.html
Niederst, Jennifer. Web Design in a Nutshell: A Desktop Quick Reference. 2nd edition. Beijing, Sebastopol, CA: O’Reilly, 2001.
Nielsen, Jakob. Designing Web Usability: The Practice of Simplicity, 1999.

CHAPTER 8 Targeting the K-12 Audience

The creative uses students and teachers can make of the wide variety of digitized resources available from North Carolina’s cultural institutions are boundless. Many students have little or no opportunity to leave their immediate geographic area to visit North Carolina’s libraries, museums, and other cultural institutions. Sharing digitized collections with K-12 students and their teachers can make a significant contribution to the ultimate goal of life-long learning.

Digitized special collections of North Carolina's libraries, archives, museums, historic sites, and other cultural institutions can offer students and teachers in the K-12 community a wealth of resources for research activities to enhance learning. Typically, students and teachers will investigate sites, looking for material relevant to a research question, a lesson plan, or an instructional unit. As part of the research process, students will gather information from one or more Web sites, analyze and synthesize their findings, and incorporate them along with multimedia elements into an authoring tool to share with others.

Cultural institutions that choose to provide digitized resources for this audience will need to consider a variety of issues in addition to the North Carolina Standard Course of Study, including developmental needs, searching and navigation capability, and infrastructure limitations. The following guidelines may be helpful when deciding whether to serve this audience and how to provide the most effective access to digitized resources.

Teacher Resources
The addition of teacher resources related to a Web site’s content can promote the most effective use of the information provided. Teacher materials could include suggested activities, lesson plans, and examples of student projects. LEARN NC provides a rich resource for lesson plans, available at: http://www.learnnc.org/lessons/. Cultural institutions may want to contract with teachers or partner with local colleges of education in order to create lesson plans and relevant instructional activities to accompany digitized resources and submit those plans to the LEARN NC portal. If a cultural institution offers professional development opportunities such as workshops or online courses, it would be useful to include this information in a teacher resource area as well.

Web Site Design
The developmental needs of students are an important consideration when designing access to digitized resources. The following suggestions reflect the experience and feedback of educators who frequently work with students as they use Internet-based resources.

Chapter 8 -- Targeting the K-12 Audience -86-

Orientation/Introduction
It is helpful for students and teachers to view a brief introduction to the purpose and organization of the Web site as well as a general description of the types of resources (e.g., print documents, oral histories, artifacts, local photographs) that can be found throughout the site. Introductory text also might include a timeline or information on the historical period covered by the resources. This type of information can help students and teachers to determine quickly if the site is relevant and how to get started using it. The Introduction should be obvious when entering the home page or easily accessible from a link to a secondary page.

Locating Information on the Site
Students and teachers need multiple options for finding relevant information. Search options could include menus and an index with hyperlinked main topics and sub-topics. (A word of caution: avoid “fly-out” menus that have very small print and that often do not stay open.) Simple keyword searching features such as Google’s free search engine also can be added to a site. Terms within text that might be unfamiliar to students can be highlighted and hyperlinked to short definitions.

Multimedia
Multimedia, such as music, audio, video, VR, and graphic files, is an important element of a Web site because it engages students and thereby enhances their learning. These files should be provided in several formats to meet the needs of various software applications. For example, students often use HyperStudio and Microsoft PowerPoint for multimedia authoring. AVI or WAV sound files as well as GIF or JPEG graphic files work best in these programs. RealPlayer and QuickTime are commonly used for viewing video files. Images should appear thumbnail size whenever possible (with an option for enlarged view) to minimize load time.

Technical Issues
Equipment and browser capability vary considerably among North Carolina’s schools. It is best to assume that access will be with older versions of Internet browsers. In addition, many schools still use the Macintosh platform, creating challenges when designing Web pages that will display properly on this platform as well as in Internet Explorer. Compromises in design frequently have to be made, such as in the brightness of colors used, in order to have the Web pages display effectively on both platforms. In addition, schools must address the needs of diverse users, including students with physical disabilities. Web design for the K-12 environment should, as much as possible, follow Section 508 Standards of the Rehabilitation Act as amended in 1998.

Curriculum Alignment
North Carolina cultural resources and primary sources have particular relevance to the Social Studies Curriculum due to the emphasis on North Carolina’s history and geography in grades four and eight, as well as to information literacy skills in the Information Skills Curriculum. The following is a brief overview of the Information Skills and Social Studies curricula. It is important to note that cultural resources and primary sources also can have relevance to other curricular areas such as the Arts and Sciences. Complete information regarding the North Carolina Standard Course of Study is available at the Web site of the North Carolina Department of Public Instruction: http://www.ncpublicschools.org/curriculum.

The Information Skills Curriculum follows an integrated and holistic approach whereby:

Classroom instruction in all subject areas requires students to access, analyze, evaluate, organize, and use information from a wide variety of resources (print, non-print, electronic). Students must be able to synthesize information and construct meaning to solve problems, make decisions, and communicate ideas and information in a variety of formats (print, graphical, audio, video, multimedia, web-based) to meet academic and personal needs. Practicing and refining these skills at all grade levels enables students to be effective learners and to make the connection between classroom learning and resources (print, non-print, and electronic), whether accessed in the classroom, library media center, or community. This practice is known in educational literature as resource-based learning.1

The goals and objectives of the North Carolina Social Studies Standard Course of Study closely parallel the national social studies curriculum standards. In 1992, the Board of Directors of the National Council for the Social Studies (NCSS), the primary membership organization for social studies educators, adopted the following definition:

Social studies is the integrated study of the social sciences and humanities to promote civic competence. Within the school program, social studies provides coordinated, systematic study drawing upon such disciplines as anthropology, archaeology, economics, geography, history, law, philosophy, political science, psychology, religion, and sociology, as well as appropriate content from the humanities, mathematics, and the natural sciences.

The following strands of the North Carolina Social Studies Standard Course of Study provide a framework for studying and analyzing social studies at each grade:

Individual Identity and Development - In each society, individual identity is shaped by one's culture, by groups, and by institutions.

Cultures and Diversity - There are similarities as well as differences between and among cultures. Culture helps people to understand themselves as both individuals and as members of a group. As cultural borrowing becomes more prevalent, the differences between cultures become less defined.

Government and Active Citizenship - Power structures have historical foundations but continue to evolve. How people create and change structures of power, authority and governance, and the role and the relative importance they assign to the individual citizen varies over time and place. Examining civic ideals and practices across time and in diverse societies enables students to recognize gaps between the practice and the ideals of civic responsibility.

Historic Perspectives - Seeking to understand the historical roots of present day cultures enables students to develop a perspective on their own place in time. Knowing what things were like in the past and how they changed and developed over time in a variety of societies and cultures provides students with a broader view of their own history.

1 C.A. Haycock, Resource-based learning: a shift in the roles of teacher, learner. NASSP Bulletin, 75 (535), 1991, pp. 15-22.

Geographic Relationships - Studying places and the people who inhabit them as well as their interactions and mutual impact on each other enables the student to develop a spatial perspective on their place in the world going beyond personal location.

Economics and Development - Students recognize that having wants/needs that exceed resources available generates a variety of solutions in different circumstances. How people organize for the production, distribution, and consumption of goods and services varies over time and space.

Global Connections - Connections between cultures have existed for centuries, but in modern times they have become increasingly diverse and have had a greater impact on the quality of life in North Carolina, the nation, and the world.

Technological Influences and Society - Technological changes over time have had significant impacts on the development of cultures. As technology has spread over place and time, it has influenced and been influenced by people and their perceptions.

While there is the potential for students at a variety of grade levels to be able to use digitized resources on the NC ECHO Web site, the social studies curriculum in the following grades would have particular relevance to these resources.

Grade 2 - Regional Studies: Local, State, US, and World

Grade 3 - Citizenship: People Who Make a Difference

Grade 4 - North Carolina: Geography and History

Grade 8 - North Carolina: Creation and Development of the State

Grade 10 - Civics and Economics

Grade 11 - United States History

In addition, digitized cultural resources would have relevance for various social studies elective courses, such as African American Studies, American Indian Studies, and Contemporary Issues in North Carolina History.

Criteria
Cultural institutions that are considering the K-12 audience should review the criteria used to evaluate Web sites for the North Carolina schools, developed by the Evaluation Services Section of the North Carolina Department of Public Instruction and reproduced below.

Criteria for Evaluating Web Sites North Carolina Department of Public Instruction

Evaluation Services

Content

Accuracy:

• Error-free information
• Current information
• Updated frequently
• Recent "last" update
• Objective, balanced presentation of information
• Bias-free viewpoints and images
• Correct use of grammar, spelling, and sentence structure
• Primary outlink (link that takes you to additional site) content is relevant, authentic, and appropriate

Authority:

• Expertise/reputation of author/designer
• Contact information for author/designer
• Expertise/reputation of host site

Appropriateness:

• Concepts and vocabulary relevant to students' abilities
• Information relevant to the North Carolina K-12 curriculum
• Interaction compatible with the physical and intellectual maturity of intended audience

Scope:

• Information of sufficient scope to adequately cover the topic for the intended audience
• Logical progression of topics within original site (site being evaluated) and primary outlinks
• Information offered not easily available in other sources

Presentation:

• Site follows good graphic design principles
• Screen displays uncluttered and concise
• Captions, labels, or legends for all visuals
• Legible text and print size appropriate for the intended audience
• Graphics and art functional, not merely decorative
• Information presented through text, motion, still images, and sound
• Information presented in a manner to stimulate imagination and curiosity
• Product advertising not intrusive

Technical Aspects

Navigation:

• Ready access to site; site not overloaded
• Images load within reasonable timeframe
• Intuitive icons, menus, and directional symbols that foster independent use
• Inlinks (links that take you to locations within the original site) that allow easy navigation throughout the site
• Standard multimedia formats
• Logical options for printing/downloading all or selected text and graphics

Copyright
North Carolina public schools and their districts have copyright guidelines defined in local school board policy, and North Carolina educators are expected to abide by the Fair Use Guidelines of copyright law. Educators who develop instructional presentations for distance education are expected to follow the limits and special conditions for using digital resources as outlined in the TEACH Act that became law in 2002. The Information and Computer/Technology Skills curricula of the North Carolina Standard Course of Study for K-12 emphasize the awareness of copyright law, adherence to copyright law and guidelines, and respect for the ownership of ideas and information, including the citing of copyrighted resources.

Students in North Carolina schools frequently use the Internet for research, following guidelines outlined in a district-level "Acceptable Use Policy" (AUP). An AUP is a mutually agreed upon document that provides guidelines for students and teachers regarding access and ethical use of the Internet. It is generally signed by students and their parents and agreed to by teachers and administrators. Developers of digitized resources also need to be aware that requirements for Internet safety policies and filtering measures defined by the Children’s Internet Protection Act and the Neighborhood Children’s Internet Protection Act can affect Web site design and access.

In order to use digitized images from sites such as NC ECHO, students and educators are expected to determine if they meet Copyright Guidelines for Fair Use. For this reason, cultural institutions need to state on their Web sites whether and how digitized images may be lawfully used, by addressing questions such as:

• Are all the digitized and posted resources cleared with the copyright holder?
• What digitized images are in the public domain?
• How will users know what is in the public domain versus copyrighted materials?
• Does the agency sponsoring the site have a policy on the use of its digitized images?
• Is an e-mail link available on the site for requesting permission to use copyrighted resources for educational projects?

Note: If images are protected by copyright, students and educators are expected to obtain permission in writing to use the resources. Schools often have a form letter for copyright permission.

Conclusion
The K-12 audience is an important one for the cultural heritage community. These guidelines are meant to help you create resources that are easily integrated into the K-12 community. Students and teachers are creative users of online material, but there are requirements that must be followed and ways to make that use more beneficial.

Further Reading
Crash Course in Copyright: The TEACH Act Finally Becomes Law. University of Texas. 13 November 2002. http://www.utsystem.edu/ogc/intellectualproperty/teachact.htm
Internet School Library Media Center: Copyright for Educators, http://falcon.jmu.edu/~ramseyil/copy.htm
LEARN NC, http://www.learnnc.org/
Rehabilitation Act: Section 508 Standards, http://www.section508.gov/index.cfm?FuseAction=Content&ID=12

CHAPTER 9 PROJECT EVALUATION AND ASSESSMENT

Counting web hits is not enough.

Creating a digital project involves multiple steps and considerations, including evaluation: formative evaluation during the development of the digital resources and summative evaluation to assess continuing impacts. An evaluative component therefore needs to be planned during project initiation, both to identify potential improvements and to identify the impacts of the digital project over time. This helps in understanding costs and benefits as well as whether the presentation and interpretive framework are appropriate for users. This chapter deals with the components of project evaluation. It concludes with suggestions on different evaluation methodologies that can be employed to provide a fruitful understanding of the effectiveness of a project.

Qualities of Evaluation
In order to construct a good evaluation framework, project planners need to understand why evaluation is so important and what the general characteristics of evaluation are. In beginning to contemplate evaluation, consider these questions: why are you doing evaluation, what do you want to find out, and what will you do with the answers? Answering these questions is the first step in developing an evaluation plan and provides a perspective on the qualities of evaluation. Three reasons why evaluation matters:1

1) To improve performance by helping project staff manage the process of developing, planning, and implementing prototypes and systems.

2) To provide evidence for usability, cost-effectiveness, and added value of projects, including systems, output, and configurations developed.

3) To contribute to the overall learning from the project.

And six things to remember about evaluation:2

1) Evaluation results from design, not accident.
2) Evaluation has purpose.
3) Evaluation is about quality.
4) Evaluation is more than measurement.
5) Evaluation doesn’t have to be big.
6) There is no one right way to evaluate.

There are a wide variety of different answers to the second question: what do you want to find out. Thinking about this in the planning process, though, will help you develop appropriate evaluation approaches. For instance, you may want to discover whether or not your workflow is appropriate and efficient for the production of the digital project. On the

1 Cedars Evaluation Plan, available at http://www.leeds.ac.uk/cedars/documents/ABA03.html 2 Adapted from Danny P. Wallace and Connie Van Fleet, ed. Library Evaluation: A Casebook and Can-do Guide. (Englewood, Co.: Libraries Unlimited, Inc., 2001), pp. 3-4.

Chapter 9 -- Project Evaluation -94-

other hand, you may want to find out if the website is pleasing to look at or easy to use. Also, some funding agencies require a focus on identifying desired outcomes and basing an evaluation plan on identifying actual outcomes. Deciding what you want to find out determines what kinds of measurements you will take and what questions you will ask of the data collected. Finally, who your intended audience is for the evaluation is an important consideration. Are you reporting administrative details or are you reporting user feedback? This has an impact not only on the form that that dissemination plan comes in, but in the direct results that can take place from the evaluation process. Evaluation is only as useful as the actions that can be implemented as a result, including decisions that no actions need to take place. When does evaluation take place? This question can be deceptive. Most would answer that you complete a project and evaluate it at that stage. But, because evaluation is grounded in the goals and objectives (or desired outcomes), the evaluation process begins at the planning stage when those goals and objectives (or outcomes) are outlined. Evaluation is an iterative process, potentially done at every stage of a project to ensure the project is still heading in the right direction. What is being evaluated, though, does change. Early evaluation strategies could focus on productivity; middle evaluations often examine prototypes of the project; final evaluations often look to end-users for feedback. Each comprises a step in the evaluation system of a digital project. Formative and Summative Evaluation A summative evaluation is defined as providing information on the efficacy of the project. In other words, does the project do what it is designed to do? Summative evaluations are typically quantitative and use numeric scores to assess achievement, though, case examples of project impacts could be used to provide a context for such quantitative assessment. 
In contrast, formative evaluation is done to elicit improvements that can be made to a project. It is more complex than summative evaluation. As Robert Stake notes, “When the cook tastes the soup, that’s formative; when the guests taste the soup, that’s summative.” Making a distinction between formative and summative evaluation allows you to identify the goals of the evaluation measurement. Both formative and summative evaluation measurements should be planned and carried out over the life of a project. This binary approach to evaluation delineates the different uses and constructions that are carried out through the life of a project integrating evaluation strategies.

Cost of evaluation

Evaluation isn’t free. Thus, the costs of evaluation, including human resources, need to be incorporated into the operating budget for the project. Costs can be considered as both direct and indirect. Direct costs can include materials, administrative and staff time, consultant fees, and participant honorariums. Indirect costs include facilities costs, telecommunications, utilities, and other basic support expenditures. Indirect costs are harder to identify and are therefore often left out of budgeting when planning a project, but they are important to consider. Another kind of cost that needs to be considered is the indirect human cost of work disruption and the impact that evaluation can have on staff morale. In planning evaluation


strategies, it is important that all staff understand the plan and the value of the evaluation approach, and that the project, not the personnel, is what is being evaluated. Ultimately, evaluation will not happen if it isn’t a part of someone’s planned project tasks.

Evaluation measures

There are numerous evaluation measures that can be used for digital projects and a variety of ways that those measurements can be divided. The first way to think about different measurements is to consider what the basis for measurement is:

Transaction-based measures: number of hits, transaction logs, etc.
Time-based measures: service hours, peak levels, duration at the web site
Cost-based measures: cost-benefit analysis for production, return on investment for the project
User-based measures: activities, group use, user satisfaction
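
As a rough illustration of a transaction-based measure, the sketch below counts successful page requests and distinct visiting hosts from web server log lines. It assumes the server writes the Common Log Format; the sample entries, the paths, and the function name summarize_log are invented for illustration and do not come from any particular project.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def summarize_log(lines):
    """Return (hits per path, distinct hosts per path) for successful requests."""
    hits = Counter()
    hosts = {}
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if not match.group("status").startswith("2"):
            continue  # count only successful (2xx) responses
        path = match.group("path")
        hits[path] += 1
        hosts.setdefault(path, set()).add(match.group("host"))
    unique_hosts = {path: len(seen) for path, seen in hosts.items()}
    return hits, unique_hosts


# Hypothetical sample entries, invented for illustration.
SAMPLE_LOG = [
    '192.0.2.10 - - [12/Mar/2007:10:15:32 -0500] "GET /index.html HTTP/1.1" 200 5120',
    '192.0.2.10 - - [12/Mar/2007:10:16:01 -0500] "GET /letters/item42.html HTTP/1.1" 200 20480',
    '198.51.100.7 - - [12/Mar/2007:11:02:45 -0500] "GET /letters/item42.html HTTP/1.1" 200 20480',
    '198.51.100.7 - - [12/Mar/2007:11:03:10 -0500] "GET /missing.html HTTP/1.1" 404 312',
]

if __name__ == "__main__":
    hits, unique_hosts = summarize_log(SAMPLE_LOG)
    for path, count in hits.most_common():
        print(f"{path}: {count} hits from {unique_hosts[path]} host(s)")
```

Counting distinct hosts alongside raw hits is one simple way to look beyond the home page and beyond the bare totals; a real analysis would also need to filter out search-engine crawlers and staff traffic.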

Another common way to delineate evaluation measures is based upon the audience participating in the evaluation. Are they experts, developers, end users, or a mix? The presence or absence of the evaluator has an effect on the measurement tool as well. Some measurements are obtrusive (interviews, focus groups, etc.) while others are unobtrusive (online surveys, spontaneous feedback, transaction log analysis). Finally, the types of data collected also affect the evaluation plan. These can be divided into quantitative and qualitative evaluation measures. Each measurement, though, requires not just the collection of data, but the analysis of that data to discern meaning for the evaluation process. Qualitative data, such as in-depth interviews with people using the digital resource, is often critical to making sense of quantitative data, such as number of users.

What can we count and what will it tell us?

Libraries are infamous for their focus on statistical evaluation frameworks. In the past, we have focused on counting things. This is also true in the virtual environment. Counts of things can tell us about our performance.

Web hits: not just the home page and not just the numbers but who and where and how.
Reference statistics: how many requests do you receive as a result of the project? How do you find out why someone comes to you?
Availability and reliability: how many times does the site go down, or are you providing 24/7 access?
Re-usability: what are the numbers of links to your project?
Cost-effectiveness: evaluation of process, quality assurance, cost per item to digitize. In planning for the next digital project, this will be especially important to know.
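
Two of the counts above reduce to simple arithmetic. The sketch below is one possible way to compute a cost per item (direct plus indirect costs divided by items digitized) and an availability fraction for a reporting period. All dollar amounts, hours, and names here are hypothetical, chosen only to make the calculation concrete.

```python
def cost_per_item(direct_costs, indirect_costs, items_digitized):
    """Total project cost (direct plus indirect) divided by items digitized."""
    if items_digitized <= 0:
        raise ValueError("items_digitized must be a positive number")
    total_cost = sum(direct_costs.values()) + sum(indirect_costs.values())
    return total_cost / items_digitized


def availability(hours_down, hours_in_period):
    """Fraction of the reporting period during which the site was reachable."""
    if hours_in_period <= 0:
        raise ValueError("hours_in_period must be a positive number")
    return 1 - hours_down / hours_in_period


# Hypothetical figures, invented for illustration only.
DIRECT_COSTS = {"staff_time": 12000.0, "materials": 1500.0, "consultant_fees": 2500.0}
INDIRECT_COSTS = {"facilities": 3000.0, "utilities": 1000.0}

if __name__ == "__main__":
    per_item = cost_per_item(DIRECT_COSTS, INDIRECT_COSTS, items_digitized=4000)
    print(f"Cost per item: ${per_item:.2f}")
    print(f"Availability: {availability(hours_down=3.0, hours_in_period=720.0):.1%}")
```

Including the indirect costs in the total matters here: leaving them out, as often happens in budgeting, would understate the cost per item carried forward to the next project.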

What do they think and what will they tell us?

Qualitative evaluation measures are more challenging but often yield more useful information about digital project effectiveness. The qualities examined in qualitative measures include effectiveness, efficiency, reusability, learnability, and satisfaction. This last is the least tangible and speaks to questions about ease of use, organization, labeling, visual appearance, content, quality of data, and ‘sense making’ of the materials and presentation.

Qualitative measures also demonstrate the difference between obtrusive and unobtrusive measurements. In general, research comparing different approaches to survey administration indicates that an online survey is more likely to result in honest answers than a face-to-face interview; however, the face-to-face interview environment allows the interviewers to increase the utility of the information being given by providing opportunities for clarification and expansion.

Interviews: an evaluation interview is a structured social interaction between an evaluator and a subject who is identified as significant to the evaluation process. In interviews, the evaluator initiates and controls the exchange to obtain comparable information relevant to an understanding of the project.

Surveys: surveys can collect both quantitative and qualitative information about users of a digital resource. In digital projects, these surveys tend to be provided online and can either be prompted or be voluntary. Demographic data is a useful analytic tool to include in surveys, but open-ended questions can solicit user opinions. It is recommended that a survey contain a combination of controlled and open-ended questions to provide the best mix of data for analysis.

Focus groups: focus groups consist of the gathering of a group of people and asking them for their attitude, opinion, etc. The primary characteristic of the focus group is the interactive nature of the group, where participants are encouraged and free to talk to other group members. The disadvantage to focus groups is at the heart of the group as well, because evaluators may have less control over the interview process than they would in a one-on-one interview.
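
To illustrate how a survey that mixes controlled and open-ended questions might be summarized, the sketch below tallies a 1-to-5 rating and sets the free-text comments aside for qualitative reading. The field names (ease_of_use, comments), the rating scale, and the sample responses are assumptions made for this example only.

```python
from collections import Counter
from statistics import mean


def summarize_survey(responses, rating_field="ease_of_use", comment_field="comments"):
    """Tally a controlled rating question and collect open-ended comments."""
    # Controlled question: keep only responses where a rating was given.
    ratings = [r[rating_field] for r in responses if r.get(rating_field) is not None]
    # Open-ended question: keep non-empty comments for later qualitative review.
    comments = [
        (r.get(comment_field) or "").strip()
        for r in responses
        if (r.get(comment_field) or "").strip()
    ]
    return {
        "rating_counts": Counter(ratings),
        "mean_rating": mean(ratings) if ratings else None,
        "comments": comments,
    }


# Hypothetical responses, invented for illustration.
SAMPLE_RESPONSES = [
    {"ease_of_use": 4, "comments": "Search worked well, but the labels confused me."},
    {"ease_of_use": 5, "comments": ""},
    {"ease_of_use": 3, "comments": "Images loaded slowly."},
    {"ease_of_use": None, "comments": "I only came for one photograph."},
]

if __name__ == "__main__":
    summary = summarize_survey(SAMPLE_RESPONSES)
    print("Rating counts:", dict(summary["rating_counts"]))
    print("Mean rating:", summary["mean_rating"])
    for comment in summary["comments"]:
        print("Comment:", comment)
```

The numeric tally answers "how satisfied"; the comments still have to be read by a person to learn why, which is the mix of data the recommendation above is after.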

Conclusion

Because evaluation is an iterative process, an evaluation plan for digital projects should include more than one approach. Evaluation is time-consuming, though, and needs to be considered in the budgeting for a project before the project begins. The most important aspect of evaluation is the results that come from it, and what those results can tell you about your project and future ones.

Dimensions of Evaluation [3]

Dimension                         Continuum
Types of measurement              Observing user interactions <-> Collecting user opinions
Test audience                     Expert developers <-> End-users
Presence of evaluator             Obtrusive <-> Unobtrusive
Timing of data collection         Synchronous <-> Asynchronous
Types of data                     Quantitative <-> Qualitative
Timing of analysis & reporting    Synchronous <-> Asynchronous

[3] Adapted from http://www.rlg.org/preserv/diginews/diginews3-3.html [12/20/2005]


Further Reading

Jeng, Judy. “What is Usability in the Context of the Digital Library and How can it be Measured?” in Information Technology and Libraries, 24, June 2005, pp. 47-56.
Mathur, Mira. “Advent of Digital Libraries and Measuring their performance: a review” in DESIDOC Bulletin of Information Technology, 25, 2005, pp. 19-25.
Rieger, Robert and Geri Gay. “Tools and Techniques in Evaluating Digital Imaging Projects” in RLG DigiNews, 3(3), 1999, http://www.rlg.org/preserv/diginews/diginews3-3.html
Wallace, Danny P. and Connie Van Fleet, ed. Library Evaluation: a casebook and can-do guide. Englewood, Co.: Libraries Unlimited, 2001.

CHAPTER 10 PROJECT MANAGEMENT

Digitization requires close management because of the rate of change inherent in digital projects, the complex nature of digitization processes, and the high level of training required of a digitization project’s staff. In the technology domain, change and unpredictability are facts of life, and often represent opportunities rather than disasters for a well-planned project. Your management goal should be to create a flexible, adaptable system whose staff and procedures can accommodate change.

Any plan to digitize collections must consider the changes this type of endeavor will bring to the workplace, that is, how this new set of tasks will affect the organization. Institutions and their administrators should acknowledge at the outset the long-term benefits of short-term increases in training, equipment costs, and disruptions in routines. While equipment costs often draw the greatest attention at budget time, support expenses are usually greater and have more long-range implications for the institution. Even so, as anyone who has ever bought a computer knows, equipment is not a one-time expense. The rapid turnover in technology requires near-constant migration and upgrades. If an institution is to be successful at moving to a digital environment, it must learn early how to allocate resources for the long haul.

This chapter seeks to illuminate the many factors that affect digitization project management and to identify methods for successfully addressing these issues. Most successful digitization projects:

1. Define goals and objectives,
2. Establish a working staff/team,
3. Agree on a plan of action,
4. Agree on a timetable and end product,
5. Monitor project process,
6. Re-assess and revise goals as unforeseen situations develop, and/or
7. Control the process and the outcomes, and
8. Assess the outcomes/results.

Managers should work with other institutional administrators and boards to balance organizational expectations against strained resources, realizing an economy of scale. All involved must learn how to coordinate access issues with preservation issues and how to maintain currency while keeping costs in check. In most institutions resources are always overburdened, and moving collections toward digital access will not decrease this burden, but managers may find that redistributing the load can be both stimulating for workers and rewarding for users.

Human Resources

A project’s long-term success depends on the accurate assessment of the required human resources. Institutional staff vary in their areas of expertise, and different types of projects require different skills. Most digitization projects in cultural institutions will require that the following task sets be addressed:

Conservation: A crucial aspect of any digitization initiative will be a conservation assessment of the analog materials. Under some conditions this may show that before some material can be digitized, it will require conservation intervention.

Digitization/Encoding: This can involve digital imaging, keyboarding, Optical Character Recognition, character or full-text encoding, or a combination of these.

Metadata/Cataloging: The creation of metadata records for the digital material is a specialized task. This work may also involve cataloging the analog material or searching for information to enhance the metadata record where it is absent from the analog version.

Technical Development/Support: This falls into two distinct areas: (1) the creation or implementation of specific IT solutions for creating, managing, delivering, or preserving the digital material, and (2) the provision of IT support for project hardware and software. This latter area includes workstations, desktop applications, network services, and capture devices.

In smaller institutions staff may carry out tasks in more than one area. For example, the digitizer may also handle technical development, or the project manager may take on metadata creation. A digitization project staff may include any combination of the following: advisory board, project manager, curatorial staff, archive staff, library staff, volunteers, interns, catalogers, systems analyst, programmer, web designer, or photographer. Above all, digitization projects involve a team approach, even if that team is very small. A variety of skills and expertise are required to execute a successful digitization project. Below are some tips for hiring new staff for a digitization project.

Staffing decisions are some of the most critical decisions managers will have to make when planning a digitization project. Whether the manager is hiring new staff or is faced with the re-tooling of existing staff, or both, the strategies are similar. When hiring, the possibility of identifying the specific skills required might be easier, but the applicant pool is generally a small one. Identifying specific skills within the existing staff can be challenging, but other variables such as work-habits and attention to detail may be easier to forecast.

Your institution’s organization may make it possible for you to employ volunteers and student assistants or interns. Depending on their level of skill, these types of workers generally can provide assistance with cataloging, digital production, arranging and organizing, physical facility maintenance, specific research, and other similar tasks.

Contracted services can also be very useful for grant-funded projects and other short-term projects and can pull in expert staff for brief periods of time. These short-term personnel can be useful in raising the training levels of career staff and in introducing stimulating and alternative work methodologies.
Remember, the scale of the digital project will depend on the funding or staff allocated to it. It makes no sense to undertake a large-scale project if the funding and staffing are limited by inflexibility in any area and are likely to remain so. Start small and the environment may be more flexible and responsive than you imagined.


Training

It is very important to keep in mind that while your institution’s digitization initiatives may begin as finite projects, ultimately digitization will necessarily be incorporated into the long-term, ongoing operations of your organization. Plans to digitize must reflect the institution's ability and desire to hire and train individuals at the highest level of quality. Since the market for competent staff is highly competitive, it is often only the larger institutions that can afford skilled and fully trained workers for their digital imaging efforts. Other institutions should be willing to support both formal and informal on-the-job training. Their employees need to be able to do their routine jobs while learning to be proficient "digitizers."

Training within a digitization project is often not so dependent on the accumulation of knowledge that results in being “trained” as it is on the ability of the trainee to be in a constant “state of training” and to be able to work directly with others to reach common goals. The rapid change in technology and in practice within digitization projects requires constant re-training and re-positioning of staff. In most cases, especially in smaller institutions, this will mean that project managers will need to spend a good bit of time keeping up-to-date on developments in the field, learning about emerging standards and best practices, and then taking what's valuable, incorporating it into the project plan, and training other staff in its implementation. The flexible employee who learns quickly will flourish in the type of environment required by digital undertakings, where the less flexible, linear learner may have difficulty. Above all, training should accommodate the various learning styles of staff.

Changes in project staff can also present training issues. As staff move from one job to another, skilled workers can come and go.
Therefore it is recommended that the project manager be trained in all aspects of the project so that staff changes do not necessarily mean a hiatus in work production and, as new staff members begin on the project, an effective training system is in place. The creation of a training manual for each project may be useful. With background information on the project and step-by-step instructions for each task, it often saves time in the long run and ensures that everyone on the project gets the same information. For it to be successful, training should be firmly established in a good workflow design. Project managers will find that creating a training manual for digitization projects can lead to the institution’s digitization program manual as an institution moves from tackling digitization case by case towards folding digitization into its ongoing operations.

Workflow and Quality Management

Once your institution has established what work needs to be accomplished, what staffing is needed with what skills, and what training is needed, you will want to ensure that these variables are managed effectively and that the people hired or charged with working on the project are utilized wisely. Management’s challenge is to determine the skills and attributes of employees and to determine how these employees can most efficiently contribute to the overall success of the project. As stated above, in smaller institutions staff may carry out tasks in more than one area. Even in those cases, adopting a structured management scheme encourages efficiency.

The following diagram and description, taken from Chapter II of The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials


(Version 1.1 of the First Edition, published February 2003, http://www.nyu.edu/its/humanities/ninchguide/), illustrates a simple management structure that can be used by digitization projects and programs of any size.

The steering group functions as an executive board and includes all constituents who are directly involved in the project, even if not employed by it, such as curators, archivists, subject specialists and education officers. In practice it is common for the steering group to be an existing committee within an institution.

The advisory committee is a broader-based group, providing general advice on the project's focus and direction. Members can include the steering group with additional appointments from external organizations bringing particular areas of expertise, such as evaluation, to the initiative. There may be more than one advisory committee, or the advisory committee may be broken down into sub-committees, each of which supplies more focused technical, academic or editorial decision-making support.

In a small project scenario, the steering committee may consist of only a few upper management people and the advisory board may be only these plus a couple of members of a Friends organization. The project manager may also be the digitizer and encoder or also be the cataloger. In reality, there may be only two employees working on the project, but having a structure in place and known to all those working on a project creates an environment where everyone involved knows where the buck stops, who has authority over what tasks, and who is responsible for what tasks. A comfort zone in which people can relax and work is created.

It is essential to have a single project manager who is responsible for the project, overseeing its daily management. In most cases the project manager provides the necessary project management experience, supplemented by internal or external advice. An institution needs to assign both accountability and authority to the project manager position, to avoid the process being bogged down by myriad interactions with the advisory group or groups to deal with daily operations.


Project Steps and Timelines

The steps in a digitization project generally follow the 8 common elements listed above, but there are several processes found within those elements. As the digitization project gets underway, the project manager should outline those processes to utilize staff time efficiently and to assure that no one process gets missed. The timeline for digitization of collections will naturally be determined by the institutional goals for their digital projects, by the staffing of the institution, and by the fiscal resources available. A general table of the steps in the digitization process based on the model established by the Library of Congress, National Digital Library Program, follows. It may need to be adjusted depending on the goals of the individual institution.

The Digitization Process

Steps, with approximate % of time

Collection Preparation: 20%
  Establish project goals and objectives
  Select items or collection(s)
  Research copyright, use restrictions, other
    - record information appropriately
  Plan the project
    - a small project or a small collection?
    - a selected project across collections?
    - the audience?
    - type of access? metadata?
    - see Project Planning and Selection chapters
  Develop the work-plan with staff and admin.
  Hire or re-assign staff
  Determine division of labor and roles of staff
  Train staff in proper handling, etc.
  Define work space

Organization: 20%
  Determine the structure and/or arrangement of material
  Prepare the material
    - Organize: reformat material if necessary
    - Preserve: repair or adjust material
    - Describe: develop finding aid, catalog, or database
  Determine name and subject authorities
    - LC, AAT, ULAN, etc.
  Apply consistent digital naming conventions
  Establish processes for physical "handling"
    - fragile material
    - oversized material, etc.
  Establish access and use guidelines

Outsourcing (optional): 15%*
  Determine the costs of contracted services
  Establish reputation of service
  Allocate portions to be outsourced
  Prepare RFPs, if necessary
  Draft work statement
  Draft timeline
  Evaluate proposals
  Workflow management to outsourcing vendor
  *Note that there is time involved in outsourcing.

Digital Capture Process: 15%
  See Digital Production Chapter
  Scan
  Conduct post-processing
    - create multiples (access, thumbnails, etc.)
    - name files
    - convert text, format, create headers, compress, set up for Web

Quality Review: 10%
  Inspect 10% of records for accuracy
  Inspect 10% of images for quality
  Check technical requirements and standards
  Give feedback to administration, contracted services
  Record assessment
  Make adjustments where necessary

Archive/File Management: 10%
  Determine archival storage method
  Record all necessary information for migration purposes

Prepare for Web Access: 20%
  Prepare HTML files
  Create indexes
  Assess quality of Web creation
    - Web accessibility for disabled
    - Consistent with current standards
  Test, re-design, if necessary
  Establish distribution network (internal and external)
  Prepare educational modules, if applicable

Assessment: 5%
  Qualitative assessment
  Quantitative assessment
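
The percentages in the table above can be turned into a rough time budget. The sketch below assumes that the seven non-optional steps account for the full 100% and that outsourcing, where used, is extra time on top; the table itself does not spell this out, so treat that reading as an assumption. It also shows one way to draw the 10% sample called for under Quality Review. The total of 400 hours, the item count, and the function names are hypothetical.

```python
import random

# Approximate shares of project time for the non-optional steps in the table.
CORE_STEP_SHARES = {
    "Collection Preparation": 20,
    "Organization": 20,
    "Digital Capture Process": 15,
    "Quality Review": 10,
    "Archive/File Management": 10,
    "Prepare for Web Access": 20,
    "Assessment": 5,
}

# Assumption: the optional outsourcing step adds time beyond the core 100%.
OPTIONAL_OUTSOURCING_SHARE = 15


def allocate_hours(total_hours, shares=CORE_STEP_SHARES):
    """Split a total number of project hours across steps by percentage."""
    if sum(shares.values()) != 100:
        raise ValueError("step shares must sum to 100 percent")
    return {step: total_hours * pct / 100 for step, pct in shares.items()}


def review_sample(item_ids, percent=10, seed=None):
    """Pick a random sample of items for quality review, rounding up."""
    items = list(item_ids)
    if not items:
        return []
    # Integer ceiling of len(items) * percent / 100, with at least one item.
    sample_size = max(1, (len(items) * percent + 99) // 100)
    return random.Random(seed).sample(items, sample_size)


if __name__ == "__main__":
    total = 400  # hypothetical total staff hours for the core steps
    for step, hours in allocate_hours(total).items():
        print(f"{step}: {hours:.0f} hours")
    print(f"Outsourcing, if used: {total * OPTIONAL_OUTSOURCING_SHARE / 100:.0f} extra hours")
    print("Items to inspect:", len(review_sample(range(250))))
```

A fixed seed makes the review sample repeatable, which can be useful if the same items need to be re-inspected after adjustments are made.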

Your institution may find it helpful to use some sort of simple workflow chart. The following is adapted from a document designed by Jan Blodgett, College Archivist at Davidson College, Davidson, NC:

Digitization Project Workflow

Columns: Task; Staff; Timeframe; Completed

Task: Identify documents
  Staff: Archivist
  Timeframe: 2 weeks

Task: Pull documents
  1. Use document removed cards
  2. Place in scan file cabinet
  Staff: Project assistant
  Timeframe: 5 hours/week, to be done semi-weekly

Task: Scan
  1. Clean scanner
  2. Calibrate
  3. Scan
  4. Save following file name procedures
  5. Move folder to catalogers cabinet/flag for cataloging
  Staff: Project assistant
  Timeframe: 3 hours/day. Use document workflow sheet for daily work.

Task: Cataloging/Metadata
  1. Prepare Dublin Core records
  2. Create text file for web pages and transfer to web staff
  Staff: Archivist
  Timeframe: 2 hrs/day. Use document workflow sheet for daily work.

Task: Image Manipulation
  1. Create jpg and thumbnail images
  2. Store images in appropriate folders on server
  3. Create backup CDs for tiffs
  4. Update index for backup CDs
  Staff: Project assistant
  Timeframe: 3 hrs/day

Task: Web site development
  1. Using template, create pages for each image
  2. Include contextual information on page and Dublin Core in header
  Staff: Project assistant
  Timeframe: 2 hrs/day

A chart such as the above can be modified in any number of ways to reflect the activities, staffing, and time frames your digitization project requires. It will also be necessary for the project manager to recognize that any workflow chart or project planning table is not set in stone; it should allow for changes to be made due to unforeseen circumstances. Having a chart or table allows the project manager and project work team to map out work and see exactly what needs to be done when and by whom. By having all staff aware of each others’ responsibilities and deadlines, the effects of any changes can be more easily understood by all involved.

It should be noted that just because a project has been completed and mounted on the Web does not mean that staff may ignore the digital product. It is at this point that concerns about site maintenance and data migration begin to pay off. Even if digital products were self-maintaining, they probably would continue to draw the attention of staff. Most digital collections made available online cause an increase in requests for the material and increase the reference duties of the host institution.

Outsourcing

Outsourcing is an attractive option for some institutions because the expense of an in-house digitization project can be considerable if the required infrastructure is not present. It may be the only possibility for institutions wishing to digitize unusual or over-sized materials (large architectural drawings, maps, or poster collections). Whatever the size of the originals, outsourcing is a particularly appealing alternative if the digitization project is a one-time-only endeavor. If, however, an institution acknowledges that digitization will be an ongoing process, then outsourcing loses some of its luster. No one wants to become overly dependent upon another entity for a core activity. Because the decision to outsource is a difficult one, a consultant may be useful. Whether an institution works with a consultant or not, it certainly will want to consider doing a cost/benefit analysis of outsourcing versus in-house digitization.

Full outsourcing may involve sending materials off-site to a location where they are scanned and then returned to the collection. This will work if the items are sturdy and involve limited preservation concerns. (There are some special cases, e.g., text encoding, where photocopies of originals can be sent offsite.) All too often, however, the material to be digitized needs special handling, is rare or fragile, or simply cannot leave the physical premises of the institution, conditions inherent in most special collections. For this reason, full outsourcing is not an option for most institutions.

The key to a successful outsourcing project is the link of the metadata with the object.
Managers will need to establish policies and procedures for tracking the material outsourced and matching it when it is returned. Some vendors will provide metadata on request. Whether hybrid or full outsourcing, managers can contact a commercial digitization vendor who will estimate the cost of the venture and negotiate a timetable. But even when it is outsourced, a digitization project is never completely "delegated." Managers will still be managing, even if from a distance.

Disaster Preparedness

It may not happen today, but something will go wrong sometime. Pipes can break, electrical shorts can become fires, and computers have been known to crash. Because of this, digitization projects and their products should be included in existing institution disaster plans, and project managers need to take special precautions to protect not only the originals but their digital surrogates. The best way to accomplish this is with backups. Automatic tape drive backups can help preserve Web productions, while master digital copies can be easily duplicated on CDs and stored off-site, such as with a partner institution.

If it is important that a project site remains up and running during “difficulties,” institutions may wish to explore mirror sites. Such sites duplicate information on several servers that are separated geographically. All of these servers are connected to the Internet and, if one goes down, the others are supposed to remain up and running. While few cultural institutions may wish to go this far in disaster preparedness, all will want to make some initial plans, create backups from the very first of every project, and explore off-site


storage of duplicate master images.

Conclusion

Because of the high level of staff training required, the complex nature of the individual projects, and the almost constant change inherent in digitization, managing digital projects is a challenge. A well-trained, flexible staff, some sort of training manual, and a clear workflow plan with timelines and goals built-in are the best tools to help managers accomplish their tasks. These workflow plans also may demonstrate where it might be more economical to outsource portions of a project. Unless the entire project is outsourced, some space will need to be set aside for digitization activities and conventional preservation and safety concerns should drive this allocation of space. All project managers should factor disaster preparedness into their plans from the beginning of any project.

Project managers should also embrace the reality that their first or pilot digitization projects are most likely just the beginnings of digitization for their institutions. Developing long-term philosophies for how digitization is to be incorporated into the everyday workflow of your institution cannot start too soon and should be on a manager’s mind from day one of the first digitization project attempted.

Further Reading

Building an Emergency Plan: A Guide for Museums and Other Cultural Institutions. Compiled by Valerie Dorge and Sharon L. Jones. Los Angeles: Getty Conservation Institute, c1999.
Digital Projects Guidelines. Arizona State Library, Archives and Public Records. http://www.lib.az.us/digital/
Handbook for Digital Projects: A Management Tool for Preservation and Access. Northeast Document Conservation Center. First Edition. Maxine K. Sitts, editor. 2000. http://www.nedcc.org/oldnedccsite/digital/dman2.pdf
The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials (Version 1.1 of the First Edition, published February 2003, http://www.nyu.edu/its/humanities/ninchguide/)
RLG Tools for Digital Imaging. http://www.rlg.org/preserv/RLGtools.html
SOLINET. Disaster Mitigation and Recovery Resources. http://www.solinet.net/preservation/preservation_templ.cfm?doc_id=71

GLOSSARY

AACR2: Anglo-American Cataloging Rules, 2nd Edition. Content rules used in the creation of cataloging records.
AAT: Art and Architecture Thesaurus; a publication of the Getty Information Institute. A thesaurus for terms to describe art and architecture.
access point: a name, term, phrase or code that is used to search, identify or locate a file, document, record, or object.
acquisitions information: information about the acquisition of the collection or objects by the repository.
Acrobat: Adobe’s electronic document format. Documents can be created from within a word processor, from postscript, or from scanned pages. The documents are highly portable, yet maintain the look of the original. Acrobat is especially useful in this area because Adobe makes the reader available for free.
administrative information: information regarding the administration of the collection or object. May include acquisitions information, provenance, use restrictions, access restrictions, copyright ownership, citation information, and general processing information. Administrative information can refer to all or part of a collection.
administrative metadata: metadata primarily intended to facilitate the management of resources.
angled brackets: an SGML/XML syntax convention to set apart a tag, < >.
ANSI: American National Standards Institute, an organization which accredits other standards development organizations.

APPM: Archives, Personal Papers and Manuscripts: A Cataloging Manual for Archival Repositories, Historical Societies, and Manuscript Libraries, by Steven L. Hensen. Published by SAA as a supplemental set of cataloging rules. APPM has been superseded by DACS.
ASP: Active Server Pages – web pages that use scripts to access information from a database stored on the server. See also HTML.
attribute: a modifier for the meaning of an element; a named property of an element that may carry different values depending upon the context in which it occurs.
authority control: the process of verifying and authorizing the choice of unique access points, such as names, subjects, and forms, and assuring that the access points are consistently applied and maintained in an information retrieval system. See also controlled vocabulary.
authority file: a group of authority records searchable by all established headings and cross-references.
authority record: an entry that contains information about an access point. An authority record establishes the form of the heading and determines cross-references and relationships of the heading to other headings.
biographical/historical note: highlights of the life and activities of a person, family, or corporate body that generated the documents described in a finding aid. A biographical/historical note is intended to provide contextual information for researchers.
bit depth: see dynamic range.


BMP: Windows Bitmap. Usually uncompressed but can be compressed (lossless). Up to 32 bit depth. Standard for Windows imaging. Large file sizes. Not supported in some browsers and some non-Windows applications.
boilerplate text: standardized text used for labels and other text applied to all digital files (e.g. copyright notice, citation format, etc.).
CCO: Cataloging Cultural Objects.
CDWA: Categories for the Description of Works of Art, a metadata standard for describing works of art for the purpose of art historical scholarship. Developed by the Getty Information Institute.
close tag: the tag that closes an element, also called an end tag.
component level: an EAD expression for the hierarchy of nested information in a finding aid.
compression: the re-encoding of data to make it smaller. Most image file formats use compression because image files tend to be large and consume large amounts of disk space and transmission time over networks.
controlled access: a list of index terms for a finding aid.
controlled vocabulary: formal limits on a vocabulary, useful for consistent use of vocabulary terms.
copyright term: the length of time during which the copyright is honored.
crosswalk: an authoritative mapping from the metadata elements of one scheme to the elements of another.
DACS: Describing Archives: A Content Standard, published and officially endorsed by the Society of American Archivists as an output-neutral content standard for archival description.
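The crosswalk entry above can be illustrated with a minimal sketch. The local field names (item_name, maker, date_made, summary, shelf) are invented for this example; the target names title, creator, date, and description are Dublin Core elements. A real crosswalk would be agreed on and documented by the institution rather than improvised in code.

```python
# Hypothetical crosswalk: local catalog field names (invented for this
# sketch) mapped to Dublin Core element names.
LOCAL_TO_DUBLIN_CORE = {
    "item_name": "title",
    "maker": "creator",
    "date_made": "date",
    "summary": "description",
}


def crosswalk(record, mapping=LOCAL_TO_DUBLIN_CORE):
    """Split a record into (converted, unmapped) dictionaries.

    Fields listed in the mapping are renamed to the target element name.
    Fields with no mapping are returned separately so nothing is lost.
    """
    converted, unmapped = {}, {}
    for field, value in record.items():
        if field in mapping:
            converted[mapping[field]] = value
        else:
            unmapped[field] = value
    return converted, unmapped


local_record = {"item_name": "Family quilt", "maker": "Unknown", "shelf": "B-12"}
converted, unmapped = crosswalk(local_record)
print(converted)
print(unmapped)
```

Returning the unmapped fields separately is a design choice for this sketch: it makes gaps in the mapping visible instead of discarding data silently.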

DAT: digital audio tape, a magnetic tape originally designed for use in audio applications, but now popular for storing data. Capacities range up to 12 gigabytes.
DCMI: Dublin Core Metadata Initiative, an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models; responsible for the maintenance of the Dublin Core metadata scheme.
derivative image: an image that has been created from another image. Usually involves a loss of information. Techniques to create derivative images include sampling to a lower resolution, using lossy compression techniques, or altering an image with image manipulation software during image processing.
descriptive metadata: metadata primarily intended to serve the purposes of discovery, identification, and selection.
digitization: the conversion from printed paper, film, or other media formats to an electronic format where the object is represented as either black and white dots, color or grayscale pixels, or 1s and 0s.
DjVu: An electronic document format primarily useful for scanning documents. Key features are support for different resolutions and compression types for photo areas of an image versus text. Uses a variant of JBIG2 compression for binary image data and wavelets for continuous areas such as photos. For more information, see http://www.princtonimaging.com/djvu/
DLT: digital linear tape, a fairly new high-end tape format. Capacities range up to 35 gigabytes.
download: to transmit a file from one computer to another. Usually implies retrieving a file from a remote computer to a local one, or from a large computer to a smaller one. FTP is a commonly used command for this.


dtd: document type definition, the formal specifications and definitions of the structural elements and markup to be used in encoding specific types of documents in SGML/XML.
Dublin Core: metadata element set created to facilitate the discovery of electronic resources. Consists of 15 core elements and is typically used in conjunction with HTML. Maintained by the DCMI (http://www.dublincore.org/).
DVD: digital video disk, an optical storage medium that can store up to 4.7 gigabytes (single layer), 8.5 GB (double layer), 9.4 GB (double sided, single layer), or 17 GB (double sided, double layer). Transfer rates and seek times are similar to those of CD-ROMs for currently available drives. The DVD specs include higher level specs for audio and video capabilities.
dynamic range: the number of colors or shades of gray that can be represented by a pixel. The smallest unit of data stored in a computer is called a bit. Dynamic range is a measurement of the number of bits used to represent each pixel in a digital image. Also called bit depth.
EAD: Encoded Archival Description, an SGML/XML dtd for the description of archival finding aids that reflects the hierarchical arrangement of archival materials. EAD provides a framework for information storage, retrieval and display on the World Wide Web. Maintained by SAA with web support from the Library of Congress (http://www.loc.gov./ead/).
electronic document: a document that consists of 1s and 0s and requires hardware and software for access. Documents become more useful when stored electronically because they can be widely distributed instantly and allow searching. Best practice for the preservation of electronic documents is still under development. HTML and PDF are well-known electronic document formats.
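The dynamic range entry above ties the number of representable shades to the number of bits per pixel. As a small illustration (the function name is chosen for this sketch), each added bit doubles the number of distinct values a pixel can hold:

```python
def shades_for_bit_depth(bits_per_pixel):
    """Return how many distinct values a pixel with this many bits can hold."""
    return 2 ** bits_per_pixel


# 1-bit (black and white), 8-bit (grayscale), and 24-bit (color) pixels.
for bits in (1, 8, 24):
    print(bits, "bits per pixel:", shades_for_bit_depth(bits), "values")
```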

element: an essential building block of metadata schemes that serves to identify and surround the content of sections of the metadata. Elements are constructed of an open tag and a close tag. Elements may contain other elements, attributes and values, or PCDATA, or may be empty.
encoding rules: the syntax or prescribed order for the elements contained in the metadata description.
end tag: see close tag.
entity: an independent file that is used to include external information.
finding aid: a tool used to communicate the contents of an archival collection. The finding aid typically includes administrative information, contextual information, scope and content information, intellectual organization and physical location information for archival and manuscript materials.
FPX: Flashpix, a file format with 8-24 bit depth, developed by Kodak. Files may be uncompressed or compressed, and audio can be embedded in images. It supports text fields, stores various resolutions in one file, and has consistent color, but is not supported by most software.
G4 Compression: A compression technique used in Fax Group 4. It produces very good results for black and white images, and is frequently used as an option in TIFF files. It is also used in Adobe Acrobat (PDF) files.
GIF: Graphics Interchange Format. An 8-bit image file format that is commonly used on the Web. GIF uses LZW compression, which makes it good for color and grayscale images, but it does not compress as well as G4 for black and white. LZW is “lossless,” which means it will not compress as well as JPEG, but will retain all of the image’s quality. PNG is designed to replace GIF.
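The element, attribute, and tag entries in this glossary can be seen together in a short sketch using Python's standard xml.etree.ElementTree module. The record, title, and note element names and the type attribute are made up for illustration and do not come from any particular metadata scheme.

```python
import xml.etree.ElementTree as ET

# A made-up fragment: <record> is the parent element, type="photograph" is an
# attribute with its value, <title> holds character data, <note/> is empty.
SAMPLE = '<record type="photograph"><title>Main Street</title><note/></record>'

root = ET.fromstring(SAMPLE)

element_name = root.tag                       # name taken from the open tag
attributes = root.attrib                      # attribute names and values
subelements = [child.tag for child in root]   # elements nested in the parent
title_text = root.find("title").text          # character data inside <title>
note_text = root.find("note").text            # None, because <note/> is empty

print(element_name, attributes, subelements, title_text, note_text)
```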


grayscale: an image type that uses black, white, and a range of shades of gray. The number of shades of gray depends on the number of bits per pixel. The larger the number of shades of gray, the better the image will look, and the larger the file will be.
HTML: hypertext markup language; most common procedural markup language found on the Web. An international standard for coding text to make it appear with formatting on web pages. HTML includes the structure of documents (title, headings, etc.) and the formatting (bold, fonts, and font size). For example, <b>Headline</b> would make the word Headline appear in bold.
HTTP: Hypertext Transfer Protocol. The protocol used to transfer HTML pages and other files between web servers and web browsers, which then interpret and display them.
ICR: Intelligent Character Recognition. The process of recognizing handwritten characters. Similar to OCR, but more difficult since OCR works from printed text. Used for forms you fill out that are then scanned to gather the information you have provided on the form.
image capture: using a scanner, digital camera, or other device to create a digital representation of an object.
image file format: when a page is scanned, the page can be stored in a number of file formats. The type should be chosen based on the desired use of the image and the software that will be used. Different file formats commonly use different methods of compression as well, and some types of images compress better in some formats than in others.
image manipulation: making changes (e.g. tonal adjustments, cropping, moiré reductions, etc.) to an image using image processing software; altering the image from its original digital capture.

instance: the text and tags (excluding the dtd and related files) of an individual SGML/XML-encoded document, such as a single EAD-encoded finding aid.
interoperability: the ability of multiple systems, using different hardware and software platforms, data structures, and interfaces, to communicate, exchange, and share data.
ISAD(G): General International Standard for Archival Description, a general framework for archival description developed by the International Council on Archives.
ISBN: International Standard Book Number, an identifier for nonserial print publications.
ISO-9660: International Organization for Standardization standard 9660, Information processing – Volume and file structure of CD-ROM for information exchange. A file system format standard developed for CD-ROMs using the CD-XA encoding standard. It is supported by Microsoft operating systems, UNIX, and Macintosh.
JBIG: A “lossless” image compression format for binary (black and white) images. Compresses better than G4 by up to 25 percent. Also supports progressive encoding. Licensing issues have slowed its adoption for use.
JBIG2: A “lossy” image compression format for binary (black and white) images. A JBIG2 compressor identifies common objects (usually characters) in the image and creates a dictionary with references to those objects. Lossiness is induced by allowing similar objects to be represented by a single dictionary entry. This format is supported in PDF 1.4 and greater.
JPG, JPEG: Joint Photographic Experts Group. An 8-24 bit image file format that is best suited for photographs. It supports “lossiness,” which means that it will throw away some detail in order to achieve better compression. The amount of compression is variable, allowing a trade-off between quality and file size. It does not work well for text. Widely used as a delivery format.
JPEG 2000: An image format that provides for the inclusion of metadata and structural elements for the image within the code stream.
LCNAF: Library of Congress Name Authority File. A controlled vocabulary used for the names of persons, corporate bodies, uniform and series titles, available at http://authorities.loc.gov/
LCSH: Library of Congress Subject Headings, a controlled vocabulary used for creating subject terms and geographical terms.
link: encoding that is used for navigation. The link is seen on the browser side and allows the user to “click” on it to go somewhere else in the document or on the internet.
MARC: Machine-Readable Cataloging. Data structure standard used in Integrated Library Systems (ILS) for Online Public Access Catalogs (OPACs).
metadata: structured information that describes, explains, locates, and otherwise makes it easier to retrieve and use an information resource.
metadata harvesting: a technique for extracting metadata from individual repositories and collecting it in a central catalog to facilitate search interoperability.
metadata scheme: a set of metadata elements and rules for their use that has been defined for a particular purpose.
metalanguage: a language used to describe other languages. SGML and XML are examples of metalanguages.
METS: Metadata Encoding and Transmission Standard, a specification for structural metadata.

migration: a digital preservation technique to preserve the integrity of digital files by transferring them across hardware and software configurations and subsequent generations of computer technology.
navigation: moving around a document or the internet.
nesting: the way in which SGML/XML sub-elements may be contained within other elements to create a multilevel document.
noise: data or unidentified marks picked up in digital capture or data transfer that do not correspond to the original.
OAI: Open Archives Initiative, an organization that maintains a protocol for harvesting metadata from distributed repositories.
OCR: Optical Character Recognition, a process that produces a page of text from an image file.
open tag: the tag that opens an element, also called a start tag.
parent element: an element that may contain other elements, referred to as subelements of the parent element.
parse: a check against the XML syntactic rules. See also validate.
PCD: ImagePac, PhotoCD. Lossy compression. 24 bit depth. Has 5 layered image resolutions. Used mainly for delivery of high quality images on CD.
PCT: PICT. Compressed. Mac standard. Up to 32 bit. Supported by Macs and a highly limited number of PC applications.
PDF: Portable Document Format, the term Adobe uses to describe Acrobat files. 4-64 bit depth. Uncompressed. Used mainly to image documents for delivery. Requires a plug-in or Adobe application to view. See also Acrobat.
PCDATA: parsed character data, i.e. text.
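The parse entry above describes a check against XML's syntactic rules. A minimal sketch of such a well-formedness check with Python's standard library follows; the function name is chosen for this example. Note that this only tests syntax. It does not validate a document against a dtd, which the standard-library parser does not do.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text obeys XML syntax rules, False otherwise.

    This checks well-formedness only (matching tags, proper nesting);
    it does not validate the document against a dtd or schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b>text</b></a>"))   # properly nested
print(is_well_formed("<a><b>text</a></b>"))   # tags closed in the wrong order
```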


pixel: short for picture element. Pixels make up an image. Each pixel can represent a number of different shades or colors, depending on how much storage space is allocated for it.
PNG: Portable Network Graphics, lossless compression. 24 bit. Designed to replace GIF due to patent issues with LZW compression. Some programs cannot read it.
portable: functional across differing types of computers and operating systems. This can be used to describe programs or electronic documents.
preservation metadata: metadata primarily intended to help manage the process of ensuring the long-term preservation and usability of digital information resources.
progressive encoding: a method by which multiple resolutions of the same image are stored in the same image file. Imaging systems can efficiently serve lower-than-maximum resolutions with images encoded this way. Total file size is increased, but smaller amounts of data can be transmitted to clients.
proofing: a service by which the resulting OCR text or PDF file is repaired for errors induced by the electronic process.
provenance: history of ownership of materials prior to acquisition by the current institution.
qualifier: see refinement.
quality control: techniques used to ensure that high quality is maintained through the various stages of digitization.
quantization: to reduce the number of colors or shades of gray in an image, with the goal being to reduce the file size while maintaining image quality. Also used to display images with more colors than are available on the display device.
refinement: in Dublin Core and other metadata schemes, a term that restricts the meaning of an element or identifies the encoding scheme used in representing the value of the element (also known as a qualifier).
resolution: the number of pixels (in both height and width) making up an image. The more pixels, the higher the resolution; the higher the resolution, the greater its clarity and definition and the greater the file size. Can be expressed as pixel dimensions (640 x 480 pixels) or in terms of dots per inch (dpi). It is recommended that you use between 72 and 100 dpi for images that will be displayed on the screen, and 300 dpi for images that will print on common inexpensive printers.
rights metadata: metadata primarily intended to enable the management of rights related to information resources; a type of administrative metadata.
RLG: Research Libraries Group, a not-for-profit membership organization of libraries, archives, museums, and other cultural heritage institutions, now part of OCLC.
RLIN: Research Libraries Information Network, a cataloging system and union catalog run by RLG.
SAA: Society of American Archivists.
scanning: see digitization.
schema: in XML, a way of defining a document type, used as an alternative to the dtd.
scheme: a formally defined set of metadata elements or fields.
scope and content note: note containing information regarding the scope and content of an archival collection. This note is written to provide an overall description of the collection and includes such information as significant topics covered by the collection; significant individuals, associations, corporations, societies, and events documented by the collection; and the extent to which the material covers these topics. It can also include the media the collection exists in and its organization.
semantics: the definitions of the meaning of metadata elements, as opposed to the rules for encoding or representing the values of the elements. See also syntax.
server: host computer for web pages.
SGML: Standard Generalized Markup Language, a platform-neutral standard for creating documents. It is a series of rules that define document structures.
skew: during printing or scanning, the degree to which the page is not vertical. De-skewing is a process where the computer detects and corrects the skew in an image file.
source code: the code (usually HTML) behind any web page viewed by a browser. To see the source code of a page in Internet Explorer, right click on the page and select View Source, or click on View on the tool bar and then select View Source. In Netscape, it is referred to as Page Source.
SPF: Still Picture Interchange Format (SPIFF), the official JPEG format. Lossless compression; supports text data fields, thumbnails, and alternative color spaces. There is not a lot of support for this format, but it is designed to be read by applications that can handle jpg.
start tag: see open tag.
structural metadata: metadata that describes the internal organization of a resource and its place in an external organization, including any relationships it has with other resources.
subelement: an element that is available within one or more other elements. In EAD, every element except the document element <ead> is a subelement of one or more parent elements.
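The resolution and dynamic range entries in this glossary together determine how large an uncompressed image file will be. The following sketch works through the arithmetic for a hypothetical 8 x 10 inch original; the function names and the sample dimensions are assumptions made for illustration, and real files add some overhead for headers.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a scan of the given size at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Approximate size of the raw pixel data, rounded up to whole bytes."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8


# A hypothetical 8 x 10 inch photograph scanned at 300 dpi in 24-bit color.
width_px, height_px = pixel_dimensions(8, 10, 300)
size = uncompressed_bytes(width_px, height_px, 24)
print(width_px, "x", height_px, "pixels,", size, "bytes")
```

Running the same calculation at 72 dpi shows why lower-resolution derivatives are so much smaller than masters: the pixel count falls with the square of the resolution.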

surrogate: a secondary object meant to substitute for the original, such as a photograph of an artwork used in place of the artwork.
syntax: how a metadata scheme is structured for exchange in a machine-readable form, including the rules regarding that structure rather than their meaning. Common syntaxes include MARC, SGML, and XML. See also semantics.
tag: the syntactic expression of an element; tag and element are used interchangeably, although tag refers to the actual representation of the element, while element refers to the intellectual content of the tag.
tag library: a document that lists the names of the SGML or XML elements and attributes alphabetically, along with their definitions, tag names, and rules for their use.
technical metadata: metadata primarily intended to document the creation and characteristics of digital files.
TGA: TARGA format, compressed or uncompressed. Up to 32 bit, common in animation packages, good for interchange.
thesaurus: a controlled vocabulary with syndetic structure in which all allowable terms are given and relationships between terms are shown.
thresholding: when converting a pixel from grayscale to black and white, the threshold is the gray value above which a pixel will be considered white and below or equal to which it will be considered black.
TIF, TIFF: Tagged Image File Format, an industry standard image file format originally developed for desktop publishing. 1 to 64 bit depth, used mostly for high quality imaging and archival storage. Generally uncompressed and high quality, resulting in large file sizes. Most TIFF readers only read a maximum of 24-bit color. Delivery over the web is hampered by file sizes. Although LZW compression can reduce these file sizes by 33%, it should not be used for archival material. TIFF is unique in that it incorporates multiple compression techniques, allowing the user to specify the best format for a type of image, and in that one file can contain multiple images.
unicode: syntactic representation of special characters to eliminate conflict between XML syntax and textual content. A complete set of Unicode values is available at: http://www.ncecho.org/ncead/documents/unicode.htm
URL: Uniform Resource Locator – the “address” for a web site (Ex.: http://www.ncecho.org/Guide/toc.htm). HTTP is the method of connection; www.ncecho.org is the name of the host computer or server, also known as the domain name; /Guide/ is the particular directory on that computer; and toc.htm is the specific file. .htm is the kind of file, also called the file extension. Note that URLs are a specific kind of URI (Uniform Resource Identifier).
validate: to assure that a document conforms to the rules in a dtd. See also parse.
value: the specific expression of an attribute.
vocabulary: the universe of values that can be used for a particular metadata element.
W3C: World Wide Web Consortium, an international committee working to provide vision and standards for the internet.
wrapper element: an element designed only as a container for other elements. Wrapper elements may have attributes and values but must contain one or more sub-elements in order to include text.
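The parts of a URL named in the URL entry above can be pulled apart with Python's standard library. This is only an illustrative sketch using the example address from that entry; the variable names are chosen for this example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

parts = urlsplit("http://www.ncecho.org/Guide/toc.htm")
path = PurePosixPath(parts.path)

method = parts.scheme          # method of connection
host = parts.netloc            # host computer or server (domain name)
directory = str(path.parent)   # directory on that computer
filename = path.name           # the specific file
extension = path.suffix        # the kind of file (file extension)

print(method, host, directory, filename, extension)
```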

XHTML: eXtensible Hypertext Markup Language, an emerging markup language that reformulates HTML as an XML application. Because XHTML documents follow XML syntax rules, they can be processed and exchanged with standard XML tools.
XML: eXtensible Markup Language, a way of coding text to allow for content searching and manipulation. An adaptation of SGML for use on the Web.
Z39.50: An ANSI/NISO standard protocol for system-to-system search and retrieval. Also an International Standard, ISO 23950, “Information Retrieval (Z39.50): Application Service Definition and Protocol Specification.” This standard is commonly used for the interchange of information in library catalogs and other databases.
zooming: making an image appear larger (zoom in) or smaller (zoom out) by re-displaying the image at different resolutions. Higher resolutions will make the image appear larger and easier to read.
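The thresholding entry earlier in this glossary can be illustrated with a few lines of Python. This is a minimal sketch that assumes 8-bit grayscale values (0 for black through 255 for white) stored as nested lists; the cutoff of 127 is an arbitrary choice for the example, and imaging software typically lets the operator adjust it.

```python
def threshold(gray_rows, cutoff=127):
    """Convert grayscale rows to black (0) and white (1) pixels.

    Values above the cutoff become white; values at or below it become black.
    """
    return [[1 if value > cutoff else 0 for value in row] for row in gray_rows]


sample = [[0, 127, 128, 255]]
print(threshold(sample))
```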

RESOURCES

A Framework of Guidance for Building Good Digital Collections, Digital Library Forum, Institute of Museum and Library Services, available at: http://www.niso.org/framework/Framework2.html
Apple Web Design Guide, http://www.geo.tu-freiberg.de/docs/apple/web_design/intro.html
Benford, Gregory. Deep Time: How Humanity Communicates Across Millennia, New York: Avon, 1999.
Besser, Howard. Procedures and Practices for Scanning. http://sunsite.berkeley.edu/Imaging/Databases/Scanning/
Bolinski, Dorissa, Christopher Mautner, and Timothy McLain. Creating Acceptable Use Policies. California: Classroom Connect, 1998.
Building an Emergency Plan: A Guide for Museums and Other Cultural Institutions. Compiled by Valerie Dorge and Sharon L. Jones. Los Angeles: Getty Conservation Institute, c1999.
Caplan, Priscilla. Metadata Fundamentals for all Librarians. Chicago: American Library Association, 2003.
CDP Digital Audio Working Group, Digital Audio Best Practices, version 2.0, November 2005, http://www.cdpheritage.org/digital/audio/documents/CDPDABP_1-2.pdf
Colet, Linda Serenson. “Planning an Imaging Project,” prepared as one of the Guides to Quality in Visual Resource Imaging, July 2000 for the Research Libraries Group (RLG) and the Digital Library Federation (DLF), http://www.rlg.org/legacy/visguides/visguide1.html.
Columbia University. “Selection Criteria for Digital Imaging Projects” available at: http://www.columbia.edu/cu/lweb/projects/digital/criteria.html
Conway, Paul. "The Implications of Digital Imaging for Preservation." In Preservation of Library and Archival Materials, 2nd ed. Edited by Sherelyn Ogden. Andover, MA: Northeast Document Conservation Center, 1994.
Conway, Paul. Preservation in the Digital World, available at: http://www.clir.org/pubs/abstract/pub62.html.
Copyright and Art Issues http://darkwing.uoregon.edu/~csundt/copyweb/
Copyright Clearance Center http://www.copyright.com
Cornell University Legal Information Institute, Copyright Law Materials http://www.law.cornell.edu/topics/copyright.html


Crash Course in Copyright: The TEACH Act Finally Becomes Law. University of Texas. 13 November 2002. http://www.utsystem.edu/ogc/intellectualproperty/teachact.htm
Development of a Testing Methodology to Predict Optical Disk Life Expectancy Values (Summary) http://palimpsest.stanford.edu/byorg/nara/nistsum.html
Digital Preservation Coalition http://www.dpconline.org
Digital Projects Guidelines. Arizona State Library, Archives and Public Records http://www.lib.az.us/digital/
Duval, Erik, et al. “Metadata Principles and Practicalities” in D-Lib Magazine, 8(4), April 2002.
Fleming, Jennifer and Richard Koman. Web Navigation: Designing the User Experience. Cambridge, MA: O'Reilly and Associates, 1998.
Franklin Pierce Law Center. “The IP Mall” http://www.ipmall.fplc.edu/
Gassaway, Laura N. "When U.S. Works Pass into the Public Domain." http://www.unc.edu/~unclng/public-d.htm
Gray, Douglas E. Preparing Graphics for the Web, http://www.dsdesign.com/articles//gif.htm
"Guidelines for Selection" compiled by P. Ayris (UCL) as part of the joint RLG and NPO Preservation Conference, Warwick, 1998 http://www.rlg.org/preserv/joint/ayris.html
Handbook for Digital Projects: A Management Tool for Preservation and Access. Northeast Document Conservation Center. First Edition. Maxine K. Sitts, editor. 2000 http://www.nedcc.org/oldnedccsite/digital/dman2.pdf
Harper, Georgia. Copyright Crash Course http://www.utsystem.edu/ogc/intellectualproperty/cprtindx.htm
Harvard University. “Selection for Digitizing: A Decision-Making Matrix” http://preserve.harvard.edu/bibliographies/selection.html
Hazen, Dan, Jeffrey Horrell, and Jan Merrill-Oldham. Selecting Research Collections for Digitization, Council on Library and Information Resources, 1998. Available at: http://www.clir.org/pubs/reports/hazen/pub74.html
Hodge, Gail. Metadata Made Simpler. Annapolis: NISO Press, 2001.
Hoon, Peggy. “Scholarly Communication at NC State” http://www.lib.ncsu.edu/scc/main.html
Hudgins, Jean, Grace Agnew, and Elizabeth Brown. Getting Mileage out of Metadata: Applications for the Library. Chicago: American Library Association, 1999.


Indiana University, Copyright Management Center, Bloomington, Indiana. http://www.copyright.iupui.edu/
Internet School Library Media Center: Copyright for Educators http://falcon.jmu.edu/~ramseyil/copy.htm
Introduction to Metadata: Pathways to Digital Information. Murtha Baca, ed. California: Getty Information Institute, 1998. http://www.getty.edu/research/institute/standards/intrometadata
Jeng, Judy. “What is Usability in the Context of the Digital Library and How can it be Measured?” in Information Technology and Libraries, 24, June 2005, pp. 47-56.
Kenney, Anne R. and Oya Y. Rieger. Moving Theory into Practice: Digital Imaging for Libraries and Archives. Mountain View, CA: Research Libraries Group, 2000. See also: http://www.library.cornell.edu/preservation/tutorial/index.html
Kenney, Anne and Steven Chapman. Digital Imaging for Libraries and Archives. Ithaca, New York: Department of Preservation and Conservation, Cornell University Library, June 1996.
LEARN NC, http://www.learnnc.org/
“Long-Term Usability of Optical Media - The National Archives and Records Administration and the Long-Term Usability of Optical Media for Federal Records: Three Critical Problem Areas” http://palimpsest.stanford.edu/bytopic/electronic-records/electronic-storage-media/critiss.html
Lynch, Patrick J. and Sarah Horton. Web Style Guide: Basic Design Principles for Creating Web Sites. 2nd edition, Yale University Press, 2002 http://info.med.yale.edu/caim/manual/contents.html
Mathur, Mira. “Advent of Digital Libraries and Measuring their performance: a review” in DESIDOC Bulletin of Information Technology, 25, 2005, pp. 19-25.
McCain Library and Archives, University of Southern Mississippi, “Civil Rights in Mississippi: Intellectual Property and Privacy Information” http://www.lib.usm.edu/%7Espcol/crda/ipp/index.html.
McCord Hoffman, Gretchen. Copyright in Cyberspace: Questions & Answers for Librarians. Neal-Schuman Publishers, 2001.
Minow, Mary. “Library Digitization Projects and Copyright” available at: http://www.llrx.com/features/digitization.htm
Moving Theory into Practice: A Digital Imaging Tutorial, Cornell University Library, available at: http://www.library.cornell.edu/preservation/tutorial/contents.html
NDLP Project Planning Checklist, National Digital Library Program, Library of Congress, available at: http://lcweb2.loc.gov/ammem/prjplan.html
Nebraska Library Commission. “Copyright Handbook: Issues for libraries and schools” http://www.nlc.state.ne.us/libdev/copyright/copyright1.html


Niederst, Jennifer. Web Design in a Nutshell: A Desktop Quick Reference. 2nd edition. Beijing, Sebastopol, CA: O’Reilly, 2001.
Nielsen, Jakob. Designing Web Usability: The Practice of Simplicity, 1999.
The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials (Version 1.1 of the First Edition, published February 2003, http://www.nyu.edu/its/humanities/ninchguide/)
Oxford University. “Assessment Criteria for Digitization” http://www.bodley.ox.ac.uk/scoping/assessment.html
PDInfo, Copyright and the Public Domain, http://www.pdinfo.com/copyrt.htm
Planning Digital Projects for Historical Collections in New York State, New York Public Library, available at: http://digital.nypl.org/brochure/
Rehabilitation Act: Section 508 Standards http://www.section508.gov/index.cfm?FuseAction=Content&ID=12
Rieger, Robert and Geri Gay. “Tools and Techniques in Evaluating Digital Imaging Projects” in RLG DigiNews, 3(3), 1999, http://www.rlg.org/preserv/diginews/diginews3-3.html
Research Libraries Group and Digital Library Federation. Guides to Quality in Visual Resource Imaging, http://www.rlg.org/legacy/visguides/visguide6.html
Simpson, Carol Mann. Copyright for Schools: A Practical Guide, Third Edition. (Professional Growth Series). Ohio: Linworth, 2001.
RLG Tools for Digital Imaging http://www.rlg.org/preserv/RLGtools.html
Rothenberg, Jeff. “Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation,” http://www.clir.org/pubs/reports/rothenberg/contents.html (January 1998).
St. Pierre, Margaret and William P. LaPlant, Jr. Issues in Crosswalking Content Metadata Standards. 1998. http://www.niso.org/press/whitepapers/crosswalk.html
Smith, Abby, “Why Digitize?” Washington D.C.: Council on Library and Information Resources, 1999. Available at: http://www.clir.org/pubs/reports/pub80-smith/pub80.html
Smith, Terence R. (1996). “The Meta-Information Environment of Digital Libraries.” in D-Lib Magazine. July/August 1996.
Society of American Archivists. "Basic Principles for Managing Intellectual Property In the Digital Environment: An Archival Perspective" http://www.archivists.org/statements/managing-intproperty.asp
SOLINET. Disaster Mitigation and Recovery Resources http://www.solinet.net/preservation/preservation_templ.cfm?doc_id=71


Stanford University Libraries. Copyright and Fair Use http://fairuse.stanford.edu/Copyright_and_Fair_Use_Overview/index.html
Taylor, Arlene. The Organization of Information. Englewood, Co.: Libraries Unlimited, Inc., 1999.
Templeton, Brad. “10 Big Myths about copyright explained” http://www.templetons.com/brad/copymyths.html
U.S. Copyright Office Home Page http://www.copyright.gov/
Wallace, Danny P. and Connie Van Fleet, ed. Library Evaluation: a casebook and can-do guide. Englewood, Co.: Libraries Unlimited, 2001.
Weibel, Stuart (1995). “Metadata: The Foundations of Resource Description” D-Lib Magazine. July 1995.
Williams, Don. “Selecting a Scanner,” Guides to Quality in Visual Resource Imaging, http://www.rlg.org/visguides/visguide2.html
Zeng, Marcia Lei. "Metadata Elements for Object Description and Representation: A Case Report from a Historical Fashion Collection Project." Journal of the American Society for Information Science 50, no. 13 (1999): 1193-1208.