1 CS 502: Computing Methods for Digital Libraries Lecture 22 Repositories.
1
CS 502: Computing Methods for Digital Libraries
Lecture 9
Conversion to Digital Formats
Anne Kenney, Cornell University Library
2
What are Digital Images?
• Electronic snapshots taken of a scene or scanned from documents
• sampled and mapped as a grid of dots or picture elements (pixels)
• each pixel is assigned a tonal value (black, white, grays, colors), represented in binary code
• the code is stored, either as is or reduced (compressed)
• the code is read and interpreted by a computer to create an analog version for display or printing
Four Scanning Methods
• bitonal
• grayscale
• color
• special treatment
4
Digital Image Quality is Governed By:
• resolution and threshold
• bit depth
• image enhancement
• color management
• compression
• system performance
• operator judgment and care
5
Resolution
• determined by number of pixels used to represent the image
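• expressed in dots per inch (dpi), a linear measure; the number of pixels per square inch is the square of the dpi
• increasing resolution increases the level of detail captured and geometrically increases file size (doubling the dpi quadruples the number of pixels)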
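The arithmetic behind the last bullet can be sketched as follows. This is a minimal illustration, assuming a rectangular document measured in inches; the function name `pixel_count` and the 8 x 10 inch page are illustrative choices, not part of the lecture.

```python
def pixel_count(width_in, height_in, dpi):
    """Return (width_px, height_px, total_pixels) for a scan at the given dpi."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px

# Compare the three resolutions shown on the "Effects of Resolution" slide.
for dpi in (200, 300, 600):
    w, h, total = pixel_count(8, 10, dpi)
    print(f"{dpi} dpi: {w} x {h} = {total:,} pixels")
```

Tripling the resolution from 200 dpi to 600 dpi multiplies the pixel count by nine, which is why file size grows so quickly with resolution.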
Effects of Resolution
[Figure: the same sample shown at 600 dpi, 300 dpi, and 200 dpi]
7
Threshold Setting in Bitonal Scanning
defines the point on a scale from 0 to 255 at which gray values will be interpreted either as black or white
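A minimal sketch of thresholding, assuming 8-bit gray values where 0 is black and 255 is white, and assuming that values below the threshold become black. Scanner software may use the opposite convention; the function name `to_bitonal` and the sample values are hypothetical.

```python
def to_bitonal(gray_rows, threshold):
    """Map 8-bit gray values (0 = black, 255 = white) to 1-bit pixels.

    Assumed convention: values below the threshold become black (0),
    all other values become white (1).
    """
    return [[0 if value < threshold else 1 for value in row] for row in gray_rows]

sample = [[30, 80, 120, 200]]   # one row of gray values, dark to light
print(to_bitonal(sample, 100))  # [[0, 0, 1, 1]]
print(to_bitonal(sample, 60))   # [[0, 1, 1, 1]]
```

Under this convention, lowering the threshold turns mid-gray pixels white, so faint strokes can drop out; raising it turns them black, so strokes thicken and background noise can fill in.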
8
Effects of Threshold
[Figure: the same scan at threshold = 100 and at threshold = 60]
9
Bit Depth
• number of bits used to represent each pixel, typically 8 bits or more per channel
• representing 256 (2^8) levels for grayscale and 16.7 million (2^24) levels for color
example: 8-bit grayscale pixel
00000000 = black
11111111 = white
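The level counts above follow directly from the number of bits. A short sketch (the function name `tonal_levels` is illustrative):

```python
def tonal_levels(bits_per_pixel):
    """Number of distinct values a pixel with this many bits can take."""
    return 2 ** bits_per_pixel

for bits in (1, 8, 24):
    print(f"{bits}-bit: {tonal_levels(bits):,} levels")

# The two 8-bit patterns from the slide, read as binary numbers.
print(int("00000000", 2), int("11111111", 2))  # 0 255
```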
10
Bit Depth
• increasing bit depth increases the level of gray or color information that can be represented and arithmetically increases file size
• affects resolution requirements
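A rough sketch of how bit depth scales uncompressed file size, assuming no header and no row padding (real file formats add some overhead). The function name and the 8 x 10 inch page are illustrative assumptions.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bit_depth):
    """Approximate raw image size: total pixels times bits per pixel, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * bit_depth + 7) // 8  # round up to whole bytes

for label, dpi, depth in (("600 dpi, 1-bit", 600, 1),
                          ("300 dpi, 8-bit gray", 300, 8),
                          ("300 dpi, 24-bit color", 300, 24)):
    size = uncompressed_size_bytes(8, 10, dpi, depth)
    print(f"{label}: {size / 1_000_000:.1f} MB")
```

At a fixed resolution the size grows in direct proportion to bit depth: the 24-bit file is three times the size of the 8-bit file.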
11
Effects of Grayscale on Image Quality
[Figure: the same image in 3-bit gray and in 8-bit gray]
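One way to see the difference is to reduce 8-bit gray values to fewer bits. The sketch below uses simple uniform quantization, which is an assumption; the lecture does not say how its 3-bit example was produced, and `reduce_bit_depth` is a hypothetical helper.

```python
def reduce_bit_depth(value, bits):
    """Quantize an 8-bit gray value to `bits` bits, rescaled to 0-255 for display."""
    levels = 2 ** bits
    index = value * levels // 256        # which of the `levels` bins the value falls in
    return index * 255 // (levels - 1)   # spread the bins evenly over 0..255

# All 256 input grays collapse to only 8 output grays at 3 bits.
three_bit = sorted({reduce_bit_depth(v, 3) for v in range(256)})
print(len(three_bit), three_bit)
```

With so few output levels, smooth tonal gradations turn into visible bands.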
12
Image Enhancement
• can be used to improve image capture
• use raises concerns about fidelity and authenticity
13
Effects of Filters
[Figure: the same image with no filters used and with maximum enhancement]
14
Image Editing
15
Compression
• reduces file size for processing, storage, transmission, and display
• image quality may be affected by the compression techniques used and the level of compression applied
16
Compression Variables
• lossless versus lossy compression
• proprietary vs. open schemes
• level of industry support
• bitonal vs. gray/color
17
Common Compression Schemes
• bitonal
– ITU Group 4: lossless
– JBIG (ISO 11544): lossless
– CPC: lossy
– DigiPaper
• grayscale/color
– LZW: lossless
– JPEG: lossy
– Kodak Image Pac: “visually lossless”
– Fractal and wavelet compression
18
Effects of JPEG Compression
[Figure: a 300 dpi, 8-bit grayscale image as an uncompressed TIFF and with JPEG 18.5:1 compression]
19
Compression Observations
• the richer the file, the more efficient and sustainable the compression
• the more complex the image, the poorer the compression
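The second observation can be illustrated with any lossless compressor. The sketch below uses DEFLATE from Python's standard `zlib` module as a stand-in; it is not one of the schemes listed in this lecture, and the two synthetic "images" are extreme cases chosen for illustration.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher means better compression)."""
    return len(data) / len(zlib.compress(data, 9))

width, height = 256, 256
flat = bytes([255]) * (width * height)  # a blank white page, 8-bit gray
rng = random.Random(0)                  # fixed seed so the run is repeatable
noisy = bytes(rng.randrange(256) for _ in range(width * height))  # random noise

print(f"flat page:  {compression_ratio(flat):.1f}:1")
print(f"noisy page: {compression_ratio(noisy):.2f}:1")
```

The blank page shrinks dramatically, while the noise barely compresses at all. Real scans fall somewhere between these two extremes.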
20
Equipment used and its performance over time
• scanners offer wide range of capabilities to capture detail, dynamic range, and color
• scanners with same stated functionality can produce different results
• calibration, age of equipment, and environment affect quality
21
Equipment used and its performance over time
• attributes and capabilities of monitor and/or printer are also factors
• assess quality visually and computationally
– use targets
– control QC environment
– increasing availability of software to assess resolution, tone, color, artifacts
22
Image Capture:
Create digital objects rich enough to be useful over time in the most cost-effective manner.
23
How to determine what’s good enough?
• Connoisseurship of document attributes
• Objective characterizations
• Translation between analog and digital
– measurement to scanning requirement to corresponding image metrics
– e.g., detail size → resolution → MTF
– tonal range → bit depth → signal-to-noise ratio
24
Case Study
• Brittle Books--printed text, use of metal type, commercial publishers, objective measurement, use of Quality Index from micrographics
• 600 dpi 1-bit capture adequately preserves informational content of text-based materials
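The slide does not give the Quality Index formula. The sketch below assumes the bitonal form commonly cited in Cornell's digital imaging guidance, QI = (dpi x 0.039 x h) / 3, where h is the height in millimeters of the smallest significant character and 0.039 converts millimeters to inches. Treat both the formula and the 1 mm character height as assumptions to check against the original benchmarking publications.

```python
MM_TO_INCH = 0.039  # rounded conversion factor, as assumed in the cited formula

def bitonal_quality_index(dpi, char_height_mm):
    """Assumed bitonal QI: (dpi * 0.039 * h) / 3, with h in millimeters."""
    return dpi * MM_TO_INCH * char_height_mm / 3

# A 1 mm character scanned at 600 dpi, 1-bit.
print(round(bitonal_quality_index(600, 1.0), 1))  # 7.8
```

Under these assumptions a 1 mm character at 600 dpi yields a QI of about 8, which is consistent with the claim that 600 dpi 1-bit capture preserves the informational content of printed text.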
25
Ensuring Full Informational Capture: “No More, No Less”
[Figure: graph with axes labeled "cost" and "image quality and utility", marking the desired point of capture]
26
Create One Scan To Serve Multiple Uses
• Derive alternative formats/approaches to meet current and future information needs
• Base “derivative” requirements on document attributes, technical infrastructure, user requirements, and cost
• Understand technical links affecting presentation and utility of derivatives
27
User Requirements
• completeness
• legibility
• speed of delivery
• “cooked” files
28
Derivatives from a Digital Master
• the richer the image, the better the derivative
– a derivative from a rich file is superior in quality to one from a poorer scan
– the richer the image, the better the image processing
[Figure: fitting a scanned page to a monitor]
monitor: 800 x 600 pixels
document: 8” x 10”, 200 dpi (1,600 x 2,000 pixels)
document at 60 dpi: 480 x 600 pixels
document at 100 dpi: 800 x 1,000 pixels
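The 60 dpi and 100 dpi figures above follow from the monitor dimensions. A minimal sketch, with hypothetical function names, of how a derivative's resolution might be chosen to fit a display:

```python
def dpi_to_fit_page(doc_w_in, doc_h_in, screen_w_px, screen_h_px):
    """Highest dpi at which the whole page fits on screen without scrolling."""
    return min(screen_w_px / doc_w_in, screen_h_px / doc_h_in)

def dpi_to_fit_width(doc_w_in, screen_w_px):
    """Highest dpi at which the page width fits; the reader scrolls vertically."""
    return screen_w_px / doc_w_in

def scaled_size(doc_w_in, doc_h_in, dpi):
    """Pixel dimensions of the page at the given dpi."""
    return round(doc_w_in * dpi), round(doc_h_in * dpi)

whole_page = dpi_to_fit_page(8, 10, 800, 600)
full_width = dpi_to_fit_width(8, 800)
print(whole_page, scaled_size(8, 10, whole_page))  # 60.0 (480, 600)
print(full_width, scaled_size(8, 10, full_width))  # 100.0 (800, 1000)
```

Neither derivative comes close to the 1,600 x 2,000 pixels of the 200 dpi master, so on an 800 x 600 display the choice is between completeness (whole page visible) and legibility (more pixels per character).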
Compression/File Format Comparison for Derivative Files
[Figure: the same image as TIFF, uncompressed; GIF, compressed 6:1 (NARA); JPEG, compressed 20:1 (LC)]
33
Alternatives for Displaying Oversize Images
• File formats and compression schemes that support multi-resolution image delivery, e.g., wavelet compression, GridPix, Flashpix
• User tools for representing scale (Blake Project ImageSizer, a Java applet) and for improving image quality
34
Recommendations Coalescing
• Intent of conversion drives decisions
– issues of access considered at conversion
– notion of long-term utility and cross-institutional resources gaining ground
• Access images will change with:
– changing user needs and capabilities
– changes in technologies: file formats, technical infrastructure, compression, web browsers, processing programs, scaling routines