Digitization Workflow Management System for
Massive Digitization Projects
Bibliotheca Alexandrina
November 19, 2006
The 2nd International Conference on Universal Digital Library 2006
(ICUDL 2006)
Mohamed Yakout, Noha Adly, Magdy Nagi
mohamed.yakout@bibalex.org, noha.adly@bibalex.org, magdy.nagi@bibalex.org
Goals
Automate, track and manage the digitization workflow.
Flexibility in defining digitization workflow Phases.
Support dynamic evolution and deviations, with history tracking.
Flexible integration with the LIS and the Library Digital Repository.
Accept external, partially digitized Jobs and start them in the proper Phase of the digitization workflow.
Manage multiple projects simultaneously, with a diversity of materials (books, journals, manuscripts, audio, video, slides, etc.).
Related Work
Manual workflow management using several software packages (MS Excel, MS SharePoint, MS Project).
Simple workflow tracking systems with limited capabilities.
Several digitization activities (digital capturing, image processing, OCRing, …) integrated in one software package:
  DOCWorks from CCS
  BookRestorer from i2s
  OUPS
Limitations:
  Tightly coupled with certain tools; other tools cannot easily be integrated.
  No resource management (e.g. workstations and users).
  Lack of project and collection management.
  Manual file handling between the storage server and clients.
  Workflow exceptions, dynamic evolution and deviations are handled only through manual intervention.
System Data Model
Phase: a task that should be applied within the digitization process (Scanning, Processing, OCRing, Encoding, Publishing, Zipping for archiving).
Job: the object being digitized (a book by Naguib Mahfouz, photos of an event, a map of Alexandria, a music sheet by Omar Khayrat).
Job Type: all types of materials in the system (Book, Manuscripts, Map, Journals, Audio, Video).
Collection: a logical grouping of Jobs (Nasser, AlexMed, AMEEL).
Workstation: the computer used to perform the Phase.
User: the system users, with several roles (digital lab operators, shift operators, administrator).
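The six entities above can be sketched as plain data classes. This is only an illustrative reading of the slide, not the system's actual schema: every field name (`phase_sequence`, `current_phase`, `role`, `hostname`) and the sample values at the bottom are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Phase:
    """A task applied within the digitization process, e.g. Scanning."""
    name: str


@dataclass
class JobType:
    """A type of material, with its default sequence of Phases."""
    name: str
    phase_sequence: List[Phase] = field(default_factory=list)


@dataclass
class Collection:
    """A logical grouping of Jobs."""
    name: str


@dataclass
class Workstation:
    """The computer used to perform a Phase."""
    hostname: str


@dataclass
class User:
    """A system user; the role string is a hypothetical field."""
    username: str
    role: str


@dataclass
class Job:
    """The object being digitized."""
    title: str
    job_type: JobType
    collection: Collection
    current_phase: Optional[Phase] = None
    operator: Optional[User] = None
    workstation: Optional[Workstation] = None


# Hypothetical sample data, for illustration only.
scanning, ocr = Phase("Scanning"), Phase("OCRing")
books = JobType("Book", [scanning, ocr])
job = Job("Sample book", books, Collection("Sample collection"),
          current_phase=scanning)
```

Keeping the Phase sequence on the Job Type rather than on each Job means that a change to the sequence is visible to every Job of that type.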
System Architecture
[Architecture diagram. Components shown:
Phase Manager: Job Types A, B and C, each with Phases 1 to N; each Phase X consists of PrePhase X, Phase X and PostPhase X.
Handlers: XML Phases Definition Handler, File Handler, Database Handler, Authentication and Authorization Handler.
Back ends: DAF, Database Stored Procedures, LIS Server (LIS), File Server.
Modules: Check-In Module, Administration Module, Reporting Module, Check-Out (to the Digital Documents Repository), Archiving Module (to Off-line Storage).
Central store: Jobs in the System.]
System Handlers
XML Phases Definition Handler
  Pre-Phase and Post-Phase
    Physical section
    Database section
    Reflection Call

<Phase Name="Book Arabic OCR">
  <PrePhase>
    <Physical Mode="UnRestricted">
      <Folder Name="OTIFF" Create="false" ToDestination="false" NewName="OTIFF" Mode="Restricted">
        <File Name="OriginalFiles" Type="tif" Count="+" ToDestination="false" Compare=""/>
      </Folder>
      . .
    </Physical>
  </PrePhase>
  <PostPhase>
    <Physical Mode="UnRestricted">
      <Folder Name="TXT" Create="false" ToDestination="true" NewName="TXT" Mode="Restricted">
        <File Name="" Type="frf" Count="1" ToDestination="true" Compare=""/>
        <File Name="" Type="art" Count="1" ToDestination="true" Compare=""/>
      </Folder>
    </Physical>
    <Database>
      <Field Name="Font" DisplayName="Font Family: " />
      <Field Name="LrnPage" DisplayName="Learn Page : "/>
      . .
    </Database>
    <ReflectionCall Method="packageName.doSomething" />
  </PostPhase>
</Phase>
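To illustrate how a Physical section could drive a check on a Job's output, here is a minimal Python sketch. It is not the system's handler: the function names are hypothetical, the sample XML is trimmed to the attributes the sketch reads, and the meaning of `Count` ("+" as one or more, a number as exactly that many) is an assumption inferred from the example.

```python
import xml.etree.ElementTree as ET

# PostPhase section of the example above, trimmed to the attributes used here.
PHASE_XML = """
<Phase Name="Book Arabic OCR">
  <PostPhase>
    <Physical Mode="UnRestricted">
      <Folder Name="TXT">
        <File Name="" Type="frf" Count="1"/>
        <File Name="" Type="art" Count="1"/>
      </Folder>
    </Physical>
  </PostPhase>
</Phase>
"""


def count_ok(rule: str, actual: int) -> bool:
    """Assumed semantics: "+" is one or more, a number is an exact count."""
    if rule == "+":
        return actual >= 1
    return actual == int(rule)


def check_post_phase(phase_xml: str, folder_listing: dict) -> list:
    """Return a list of problems found in a Job's output folders.

    folder_listing maps a folder name to the file names found in it.
    """
    problems = []
    root = ET.fromstring(phase_xml)
    for folder in root.findall("./PostPhase/Physical/Folder"):
        name = folder.get("Name")
        files = folder_listing.get(name)
        if files is None:
            problems.append(f"missing folder {name}")
            continue
        for rule in folder.findall("File"):
            ext = "." + rule.get("Type")
            actual = sum(1 for f in files if f.lower().endswith(ext))
            if not count_ok(rule.get("Count"), actual):
                problems.append(
                    f"{name}: expected {rule.get('Count')} *{ext} file(s), "
                    f"found {actual}"
                )
    return problems


print(check_post_phase(PHASE_XML, {"TXT": ["book.frf", "book.art"]}))
print(check_post_phase(PHASE_XML, {"TXT": ["book.frf"]}))
```

The first call reports no problems; the second reports the missing `.art` file. Declaring the expected folders and files in XML keeps such checks out of the code, so a new Phase needs only a new definition.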
System Modules
Check-In
  Plug-in based for integration.
  Creates the Job in the system.
  Assigns the Job to any Phase.
Check-Out
  Java Reflection Call section of the XML Phases Definition.
  Ingests the Job's digital objects into the repository.
[Diagram: DWMS with its plug-ins.
Check-in plug-ins: Virtua, DigiArab, MARC File, ..., MODS File.
Check-out plug-ins: DAR, Fedora, DSpace, ..., aDORe, each feeding the repository of the same name.]
System Modules
Phases Manager
  Request a new Job.
  Download the Job's folders and files.
  Submit the Job back to the system to continue with other Phases.
  Reject a Job and recommend another Phase, specifying the reasons.
  Redirect a Job from the default Phase Sequence.
  Provide file-level information to help solve problems.
System Modules (Contd)
Reporting
  Workflow tracking
  Pending items
  Late Jobs
  Operator rates
  Build customized reports
Archiving
  On different media of different sizes, and on online storage
Administration
BA Digitization Workflow
[Workflow diagram: Jobs enter through Check-in, pass through the Phases of their Job Type, and leave through Check-out.]
Job Type: Arabic Books. Phases: Scanning, Processing, OCRing, Encoding & Publishing, QA, Archiving
Job Type: Latin Books. Phases: Scanning, Processing, OCRing, Encoding & Publishing, QA, Archiving
Job Type: Manuscripts. Phases: Scanning, Processing, Encoding & Publishing, QA, Archiving
Job Type: Small Images. Phases: Scanning, Processing, Publishing, Archiving
Job Type: Large Images. Phases: Scanning, Processing, Publishing, Archiving
Job Type: Maps. Phases: Scanning, Processing, Publishing, Archiving
Quality Assurance
Supported at two different stages:
  QA information is maintained at the file level while a Job moves from one Phase to another.
  A QA Phase is defined in the Digitization Phase Sequence as the last Phase before Archiving.
[Diagram: Arabic Books Phase Sequence (Scanning, Processing, OCRing, Encoding & Publishing, QA, Archiving), annotated with information at the level of the output objects (pages).]
Achieving Flexibility Using DWMS
The Phase Sequence defined for a Job Type is a guide rather than a prescription.
A Phase may or may not be part of the Phase Sequence; the operator can assign the Job to any of these Phases.
Jobs can be forwarded dynamically to another Phase in the Phase Sequence.
Changes in the Phase Sequence affect both current and new Jobs in the system, leading to natural process evolution.
[Diagrams: the Arabic Books Phase Sequence (Scanning, Processing, OCRing, Encoding & Publishing, QA, Archiving) shown three times; the last variant omits Encoding & Publishing.]
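The "guide rather than a prescription" idea can be sketched as a routing function in which an explicit redirect always overrides the default sequence. This is a minimal illustration under assumptions: the function `next_phase` and its `redirect_to` parameter are hypothetical names, and Phases are represented as plain strings.

```python
from typing import List, Optional

ARABIC_BOOKS = ["Scanning", "Processing", "OCRing",
                "Encoding & Publishing", "QA", "Archiving"]


def next_phase(sequence: List[str], current: str,
               redirect_to: Optional[str] = None) -> Optional[str]:
    """Pick the Phase a Job moves to after finishing `current`.

    An explicit redirect wins, even to a Phase outside the sequence.
    Otherwise the Job follows the default sequence. Returns None when
    `current` is the last Phase of the sequence.
    """
    if redirect_to is not None:
        return redirect_to
    if current not in sequence:
        # No default successor exists for an out-of-sequence Phase.
        raise ValueError(f"{current!r} needs an explicit redirect")
    i = sequence.index(current)
    return sequence[i + 1] if i + 1 < len(sequence) else None


# Default routing follows the sequence.
print(next_phase(ARABIC_BOOKS, "OCRing"))

# Dropping a Phase from the sequence changes routing for Jobs already
# in the system, because the sequence is consulted at transition time.
shorter = [p for p in ARABIC_BOOKS if p != "Encoding & Publishing"]
print(next_phase(shorter, "OCRing"))

# An operator can still send a Job back to an earlier Phase.
print(next_phase(ARABIC_BOOKS, "QA", redirect_to="Scanning"))
```

Because the sequence is looked up on each transition instead of being copied into the Job, an edited sequence applies to current Jobs as well as new ones.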
Job Life Cycle
[State diagram. States: Start, Assign, Reject, Redirect, Finish.
Transition labels: New Job; Job assigned to next stage; Reject job for some problems; Administrator accepts the rejection; Recommend re-doing a Phase for the Job; Administrator accepts the recommendation; File transfer; Ordinary job finishing; To Repository.]
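A life cycle like this is naturally expressed as a small state machine. The sketch below uses the five state names and some of the transition labels from the slide, but the wiring between states is an assumption, since the arrows of the original diagram are not recoverable from the text. Treat it as one plausible reading, not the system's actual behavior.

```python
from enum import Enum


class State(Enum):
    START = "Start"
    ASSIGN = "Assign"
    REJECT = "Reject"
    REDIRECT = "Redirect"
    FINISH = "Finish"


# (state, event) -> next state. Event names follow the slide labels;
# which states each event connects is assumed.
TRANSITIONS = {
    (State.START, "new job"): State.ASSIGN,
    (State.ASSIGN, "job assigned to next stage"): State.ASSIGN,
    (State.ASSIGN, "reject job for some problems"): State.REJECT,
    (State.REJECT, "administrator accepts the rejection"): State.REDIRECT,
    (State.REDIRECT, "administrator accepts the recommendation"): State.ASSIGN,
    (State.ASSIGN, "ordinary job finishing"): State.FINISH,
}


def step(state: State, event: str) -> State:
    """Apply one event, refusing transitions the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.value}")


# Walk one Job through a rejection and back to a normal finish.
state = State.START
for event in ["new job",
              "reject job for some problems",
              "administrator accepts the rejection",
              "administrator accepts the recommendation",
              "ordinary job finishing"]:
    state = step(state, event)
print(state.value)
```

Keeping the allowed transitions in one table makes it easy to log every step, which fits the goal of tracking deviations with a history.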
Future Work
Check-out plug-in for Fedora.
Check-in plug-ins will be implemented to support various metadata standard formats (MODS, DC, VAR, etc.).
Enhance the software interface with graphical tools that help design and follow the digitization process.
Thank You
mohamed.yakout@bibalex.org