DHAD Digital Humanities Computer Vision for Abu …afornes/videos/AFornes_DHAD2017.pdf · IEEE...
Computer Vision for
Historical Document Image Analysis
Alicia Fornés
Computer Vision Center, Barcelona, Spain
http://www.cvc.uab.es/people/afornes/
DHAD
Digital Humanities
Abu Dhabi
The Computer Vision Centre (CVC)
• A non-profit institution in Barcelona
• More than 130 researchers and
technicians devoted to R&D on
Computer Vision
• The most advanced resources in
Computer Vision hardware and software
• Computer vision is the science and
technology of machines that see.
CVC Areas of expertise
Index
Document Image Analysis
Handwriting Recognition
Word spotting
Conclusions
Introduction
Digitization in Archives
Preservation: Face the paper deterioration problem
Storage: Avoid having kilometers of shelves
Accessibility: Allow users around the world to access the cultural heritage
Historical document image analysis and recognition is the area of computer
vision that addresses the problem of automatically recognizing document
contents (printed or handwritten text, or graphical elements).
Europeana
A Document Image
A digital image is a two-dimensional function I(x,y) whose value at each point (x,y)
(a pixel) is the light intensity at that point.
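A minimal sketch of this definition, assuming a grayscale image stored as a NumPy array (the array contents and the helper name `intensity` are illustrative, not from the slides):

```python
import numpy as np

# A tiny 3x4 grayscale "document image": 0 = black ink, 255 = white paper.
image = np.array([
    [255, 255, 255, 255],
    [255,   0,   0, 255],
    [255, 255, 255, 255],
], dtype=np.uint8)

def intensity(img, x, y):
    """Return I(x, y); x is the column index and y is the row index."""
    return int(img[y, x])

height, width = image.shape
# Count "ink" pixels, assuming anything darker than mid-gray is ink.
ink_pixels = int((image < 128).sum())
```

Note that arrays are indexed as `img[row, column]`, so I(x,y) corresponds to `img[y, x]`.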
Main tasks: Document Enhancement
A.Fornés, X.Otazu, J.Lladós. Show through cancellation and image enhancement by multiresolution
contrast processing. International Conference on Document Analysis and Recognition, 2013.
Text documents: main tasks
Layout analysis: to detect (crop) records, lines, words for subsequent
recognition.
Full transcription: to convert images to editable text.
Word spotting: given a query word to search,
to locate at image level visually similar word snippets.
dit dia reberê de Hieronym Ponsich corder de Bara fill de Juâ Pon=
BLOCKS
WORDS
LINES
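As one possible illustration of the LINES step of layout analysis, the sketch below splits a binarized page into text lines with a horizontal projection profile. This is a classic baseline chosen here for illustration, not necessarily the method used in the talk; the function name `segment_lines` and the toy page are assumptions.

```python
import numpy as np

def segment_lines(binary, min_ink=1):
    """Return (top, bottom) row ranges of text lines; bottom is exclusive.

    binary: 2D array where 1 = ink and 0 = background.
    """
    profile = binary.sum(axis=1)  # amount of ink in each row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # the text line ends
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines

# Toy page with two "text lines" separated by blank rows.
page = np.zeros((8, 10), dtype=int)
page[1:3, 2:8] = 1
page[5:7, 1:9] = 1
line_boxes = segment_lines(page)
```

Real historical pages usually need more robust methods, since skewed or touching lines break the assumption of blank rows between lines.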
Graphic documents
Pictures
Paintings
Maps
Music Scores
Drawings
Writer Identification
• Determine the author of a piece of handwriting among a set of writers
• Applications
Forensics, such as signature verification
Typically applied to text documents
• Other scenarios: paintings, drawings, music scores…
Dating
Evolution of handwriting (variation over time)
Pianola Rolls
• Player piano rolls reflect the interpretation and reception of music during the early 20th century
• Some are real piano recordings (with tempo & dynamics): a historical recording!
• There are more than 10,000 pianola rolls in Spain
Pianola Roll Digitizer: https://www.youtube.com/watch?v=vmTryKCM_e8&feature=youtu.be
• Preservation (digital copy)
• Creation of MIDI (for listening)
HANDWRITING RECOGNITION APPLICATION TO HISTORICAL DEMOGRAPHIC DOCS
Universal access to the picture of the past
[Figure labels: storytelling, health, migrations, social mobility, genealogy tourism]
Lots of info locked in historical archives
• Birth, marriage, death records
• Census records
Record linkage: Social Network of the Past
Citizens, Scholars
Services
Technical architecture
Spaces: Image Space, Transcription Space, Contextual Knowledge Space
Processes: Scanning, HW recognition, Crowdsourcing, Data mining, exploitation
Data mining:
• Harmonization
• Record linkage
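The harmonization and record linkage steps named in the architecture could be sketched as follows. This is a minimal illustration using string similarity from the Python standard library; the function names, the 0.85 threshold and the spelling variant in the toy data are assumptions, not the system described in the talk.

```python
import unicodedata
from difflib import SequenceMatcher

def harmonize(name):
    """Normalize a name: strip accents, lowercase, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())

def similarity(a, b):
    """Similarity in [0, 1] between two harmonized names."""
    return SequenceMatcher(None, harmonize(a), harmonize(b)).ratio()

def link_records(records_a, records_b, threshold=0.85):
    """Link each record in A to its most similar record in B, if similar enough."""
    if not records_b:
        return []
    links = []
    for i, a in enumerate(records_a):
        best_score, best_j = max((similarity(a, b), j) for j, b in enumerate(records_b))
        if best_score >= threshold:
            links.append((i, best_j))
    return links

# Toy data: names taken from the slides, plus one invented spelling variant.
marriages = ["Jaume Torrent", "Hieronym Ponsich"]
census = ["Mariano Nin Figueras", "jaume  torrént"]
links = link_records(marriages, census)
```

In practice, linkage would also compare dates, places and family relations rather than names alone.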
Handwriting Recognition
Image:
Transcription: Anna Victoria donsella filla de Jaume Torrent pages de
Difficulties
- Different handwriting styles
- Large Vocabulary, Segmentation problem
Techniques
- Hidden Markov Models / Neural Networks
- We need to learn (we need annotated data)
[Figure label: “forest”]
Objective: Information Extraction (fill a database)
Advantage:
All marriage records share similar syntactic structures:
Handwriting Recognition → Information Extraction
Semantic Recognition and Information Extraction
Semantic Recognition / Word Image Categorization
• Deep Neural Networks (CNNs)
• Named Entity Detection and Classification
E.g. (male/female name, surname,
occupation, place, date)
Information Extraction
• Language Models (e.g. grammars, n-grams, dictionaries)
• Fill in a database
E.g. wife’s name, husband’s occupation, father’s place of origin, etc.
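To make the "fill in a database" step concrete, here is a minimal sketch that exploits the regular syntax of marriage records. It assumes each word already carries a semantic category (the output of the word categorization step); the category names, the hand-assigned tags and the single rule are illustrative assumptions, far simpler than a real grammar or n-gram language model.

```python
def extract_record(tagged_words):
    """Fill database fields from (word, category) pairs.

    Assumed rule: words before a "relation" keyword describe the wife;
    words after it describe her father.
    """
    record = {}
    role = "wife"
    for word, category in tagged_words:
        if category == "relation":          # e.g. "filla" (daughter)
            role = "wife_father"
        elif category in ("name", "surname", "occupation", "place"):
            key = f"{role}_{category}"
            record[key] = (record.get(key, "") + " " + word).strip()
    return record

# Transcription from the Handwriting Recognition slide, tagged by hand
# for illustration only.
tagged = [
    ("Anna", "name"), ("Victoria", "name"), ("donsella", "status"),
    ("filla", "relation"), ("de", "other"),
    ("Jaume", "name"), ("Torrent", "surname"),
    ("pages", "occupation"), ("de", "other"),
]
record = extract_record(tagged)
```

The resulting dictionary maps directly onto database columns such as the wife's name or her father's occupation.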
Summary
Demo: http://dag.cvc.uab.es/infoesposalles/media-gallery/
WORD SPOTTING
Introduction
Users would like to search information in document images like this!
Word Spotting by similarity
Query vs. Database: Dynamic Time Warping
(sequence alignment)
When there is NO training data (annotated data)
Word Spotting by similarity (query-by-example)
The search is based on shape/appearance.
It can be adapted to any kind of script and document (words, symbols, etc.)
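Query-by-example with Dynamic Time Warping can be sketched as below. The sketch assumes each word image has already been reduced to a 1D feature sequence (for example, one value per pixel column); the feature choice, function names and toy sequences are assumptions made for illustration.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping cost between two 1D feature sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def rank_by_similarity(query, database):
    """Return database indices sorted from most to least similar to the query."""
    return sorted(range(len(database)), key=lambda k: dtw_distance(query, database[k]))

query = [0, 2, 4, 2, 0]
database = [
    [5, 5, 5, 5],             # unrelated shape
    [0, 0, 2, 4, 4, 2, 0],    # same shape, written wider
    [0, 3, 3, 0],             # roughly similar shape
]
ranking = rank_by_similarity(query, database)
```

Because the alignment may stretch or compress either sequence, the wider version of the same shape still matches the query at zero cost. No annotated data is involved, only a distance between sequences.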
M. Rusiñol, D. Aldavert, R. Toledo and J. Lladós. Browsing Heterogeneous Document Collections by a Segmentation-free
Word Spotting Method. International Conference on Document Analysis and Recognition (ICDAR), 2011.
Word Spotting by text (query-by-string)
When there is training data (annotated data), we can learn a common subspace for word images and text strings (e.g. “hotel”, “pizzeria”) using Canonical Correlation Analysis (CCA).
J.Almazán, A.Gordo, A.Fornés, E.Valveny. Word Spotting and Recognition with Embedded Attributes.
IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 36, issue 12, 2014.
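The idea of learning a common subspace with CCA can be sketched with plain regularized linear CCA in NumPy. This is a textbook formulation, not necessarily the exact variant of the cited paper, and the "image" and "text" features below are random stand-ins; all function and variable names are assumptions.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def fit_cca(X, Y, dim, reg=1e-3):
    """Learn projections Wx, Wy mapping paired features X, Y to a common subspace."""
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])   # regularized covariances
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, _, Vt = np.linalg.svd(Kx @ Cxy @ Ky)           # whitened cross-covariance
    return mean_x, mean_y, Kx @ U[:, :dim], Ky @ Vt.T[:, :dim]

# Toy paired data: "text" features are a noisy linear function of "image" features.
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(200, 5))
text_feats = image_feats @ rng.normal(size=(5, 4)) + 0.01 * rng.normal(size=(200, 4))

mean_x, mean_y, Wx, Wy = fit_cca(image_feats, text_feats, dim=2)
img_emb = (image_feats - mean_x) @ Wx
txt_emb = (text_feats - mean_y) @ Wy

# Query-by-string: embed the text query, then find the closest word image.
nearest = int(np.argmin(np.linalg.norm(img_emb - txt_emb[0], axis=1)))
```

Once both modalities live in the same subspace, a typed query and a cropped word image can be compared with an ordinary distance, which is what allows query-by-string and query-by-example to share one index.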
Demo application in Android device (Tablet)
Segmentation-based. Query-by-string & Query-by-example
http://dag.cvc.uab.es/infoesposalles/media-gallery/
Combined searches
- AND
- OR
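Combined searches can be illustrated with set operations over the records returned by each individual word-spotting query. The record identifiers and the `combine` helper are hypothetical; the demo application may implement this differently.

```python
def combine(hits_a, hits_b, mode):
    """Combine two sets of matching record ids with AND or OR."""
    if mode == "AND":
        return hits_a & hits_b   # records containing both query words
    if mode == "OR":
        return hits_a | hits_b   # records containing at least one query word
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical spotting results: record ids where each query word was found.
hits = {
    "Torrent": {3, 8, 15},
    "pages": {8, 15, 21},
}
both = combine(hits["Torrent"], hits["pages"], "AND")
either = combine(hits["Torrent"], hits["pages"], "OR")
```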
P.Riba, J.Almazán, A.Fornés, D.Fernández, E.Valveny, J.Lladós. e-Crowds: a mobile platform for browsing and searching in
historical demography-related manuscripts. International Conference on Frontiers in Handwriting Recognition (ICFHR), 2014.
WORD SPOTTING IN A TRANSCRIPTION TOOL
Catalan census records example
[Figure labels: household, relationship, members, street name, street number, machine-printed column headers]
Census records: extrinsic context
[Figure: timeline of census years: 1828, 1833, 1857, 1878, 1881, 1886, 1889, 1890, 1900, 1906, 1910, 1915, 1920, 1924, 1930, 1940, 1945, 1950, 1955]
Household linkage (redundancy): extrinsic context
Extrinsic context in census records
Jaime tort botella
Mercedes sanllehi comas
Julia tort sanllehi
Pedro id id
Jaime id id
Mariano Nin Figueras
@ = Carretera 1
[Figure: information transfer pipeline. Labels: census of previous point in time; census page at time t; layout segmentation; street address recognition (street numbers); information retrieval; family information of home i at time t-1; query-by-string word spotting; family information of home i]
J.Mas, A.Fornés, J.Lladós. An Interactive Transcription System of Census Records using
Word Spotting based Information Transfer. IAPR Document Analysis Systems Workshop (DAS), 2016.
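The information transfer idea can be sketched as follows: the household recorded at an address in the previous census supplies the strings to spot on the current page. The sketch assumes that "id" in the listing stands for "idem" (repeat the value from the row above) and that the example household belongs to "Carretera 1"; both are assumptions, as are the function names.

```python
def expand_idem(rows):
    """Replace 'id' tokens with the token in the same position of the row above."""
    expanded = []
    for row in rows:
        tokens = row.split()
        if expanded:
            previous = expanded[-1]
            tokens = [
                previous[i] if tok.lower() == "id" and i < len(previous) else tok
                for i, tok in enumerate(tokens)
            ]
        expanded.append(tokens)
    return [" ".join(tokens) for tokens in expanded]

def queries_for_address(previous_census, address):
    """Words to search for on the current page, taken from the census at time t-1."""
    names = expand_idem(previous_census.get(address, []))
    return sorted({word.lower() for name in names for word in name.split()})

# Toy excerpt of the census at time t-1, keyed by street address.
previous_census = {
    "Carretera 1": ["Julia tort sanllehi", "Pedro id id", "Jaime id id"],
}
full_names = expand_idem(previous_census["Carretera 1"])
queries = queries_for_address(previous_census, "Carretera 1")
```

Each returned word would then be used as a query-by-string search restricted to the household region of the page at time t, so that matches can be proposed to the transcriber instead of being typed from scratch.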
Steps
Citizen participation: transcription (crowdsourcing)
Transcription (crowdsourcing)
Transfer of information (word spotting)
Results available for society
Search Engine: http://dagapp.cvc.uab.es/CercadorSantFeliu/
CONCLUSIONS
• Handwriting Recognition is feasible only when there is enough labelled data for training
• Word Spotting is useful when no labelled data is available
• Handwriting Recognition and Word Spotting:
– Can be useful for semi-automatic transcription
– Can be used for validation (check whether the user made any mistakes)
– Can be used for indexing the information contained in document collections
But… transcription is not enough.
• The goal is to evolve towards Document Understanding
– Incorporate the knowledge from experts
– Collaboration between humanists and computer scientists
Thank you!