DHAD Digital Humanities Computer Vision for Abu …afornes/videos/AFornes_DHAD2017.pdf · IEEE...

Post on 19-Sep-2018


Computer Vision for

Historical Document Image Analysis

Alicia Fornés

Computer Vision Center, Barcelona, Spain

afornes@cvc.uab.es

http://www.cvc.uab.es/people/afornes/

DHAD

Digital Humanities

Abu Dhabi

The Computer Vision Centre (CVC)

• A non-profit institution in Barcelona

• More than 130 researchers and

technicians devoted to R&D on

Computer Vision

• The most advanced resources in

Computer Vision hardware and software

• Computer vision is the science and

technology of machines that see.

CVC Areas of expertise

Index

Document Image Analysis

Handwriting Recognition

Word spotting

Conclusions

Introduction

Digitization in Archives

Preservation: Face the paper deterioration problem

Storage: Avoid having kilometers of shelves

Accessibility: Allow users around the world to access cultural heritage

Historical document image analysis and recognition is the area of computer

vision that addresses the problem of automatically recognizing document

contents (printed or handwritten text, or graphical elements).

Europeana

A Document Image

A digital image is a two-dimensional function I(x,y) whose value at each point (x,y), called a pixel, is the light intensity at that point.
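A minimal sketch of the definition above, assuming a grayscale image stored as rows of intensities (0 = black, 255 = white). The names `image`, `intensity` and `binarize`, and the threshold of 128, are illustrative assumptions and do not come from the slides.

```python
# A tiny 3x4 grayscale image: each entry is the light intensity of one pixel.
image = [
    [255, 255, 200, 255],
    [255,  30,  40, 255],
    [255, 255, 210, 255],
]

def intensity(img, x, y):
    """Return I(x, y), where x is the column index and y is the row index."""
    return img[y][x]

def binarize(img, threshold=128):
    """Mark dark pixels (ink) as 1 and light pixels (paper) as 0.

    The fixed threshold is an assumption made for illustration only.
    """
    return [[1 if value < threshold else 0 for value in row] for row in img]

ink_mask = binarize(image)
```

Here `intensity(image, 1, 1)` reads the dark pixel in the middle row, and `ink_mask` keeps only the pixels darker than the threshold.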

Main tasks: Document Enhancement

A.Fornés, X.Otazu, J.Lladós. Show through cancellation and image enhancement by multiresolution

contrast processing. International Conference on Document Analysis and Recognition, 2013.

Text documents: main tasks

Layout analysis: to detect (crop) records, lines, words for subsequent

recognition.

Full transcription: to convert images to editable text.

Word spotting: given a query word to search,

to locate at image level visually similar word snippets.

dit dia reberê de Hieronym Ponsich corder de Bara fill de Juâ Pon=

BLOCKS

WORDS

LINES

Graphic documents

Pictures

Paintings

Maps

Music Scores

Drawings

Writer Identification

• Determine the author of a piece of handwriting among a set of writers

• Applications

Forensic, such as signature verification

Typically applied to text documents

• Other scenarios: paintings, drawings, music scores…

Dating

Evolution of handwriting (variation over time)

Pianola Rolls

• Player piano rolls reflect music interpretation and reception during the early 20th century

• Some are real piano recordings (with tempo & dynamics): historical recordings!

• There are more than 10,000 pianola rolls in Spain

Pianola Roll Digitizer: https://www.youtube.com/watch?v=vmTryKCM_e8&feature=youtu.be

• Preservation (digital copy)

• Creation of MIDI (for listening)

HANDWRITING RECOGNITION: APPLICATION TO HISTORICAL DEMOGRAPHIC DOCS

Universal access to the picture of the past: storytelling, health, migrations, social mobility, genealogy, tourism

Lots of info locked in historical archives

• Birth, marriage, death records

• Census records

Record linkage: Social Network of the Past

Citizens, Scholars

Services

Technical architecture

[Diagram: three spaces (Image Space, Transcription Space, Contextual knowledge Space) connected by the processes Scanning, HW recognition, Crowdsourcing, Data mining (Harmonization, Record linkage) and exploitation]

Handwriting Recognition

Image:

Transcription: Anna Victoria donsella filla de Jaume Torrent pages de

Difficulties

- Different handwriting styles

- Large Vocabulary, Segmentation problem

Techniques

- Hidden Markov Models / Neural Networks

- We need to learn (we need annotated data)

“forest”

Objective: Information Extraction (fill a database)

Advantage:

All marriage records share similar syntactic structures:

Handwriting Recognition → Information Extraction

Semantic Recognition and Information Extraction

Semantic Recognition / Word Image Categorization

• Deep Neural Networks (CNNs)

• Named Entity Detection and Classification

E.g. (male/female name, surname,

occupation, place, date)

Information Extraction

• Language Models (e.g. grammars, n-grams, dictionaries)

• Fill in a database

E.g. wife’s name, husband’s occupation, father’s place of origin, etc.
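The final "fill in a database" step can be sketched as follows. This is only an illustration: the words come from the marriage-record snippet shown earlier in the slides, but the category and role tags are hand-assigned assumptions here (in the real system they would be predicted), and `fill_record` is a hypothetical helper, not the authors' implementation.

```python
from collections import defaultdict

def fill_record(tagged_words):
    """Group (word, category, role) triples into database fields.

    Field names have the form "<role>_<category>", e.g. "husband_occupation".
    Words tagged with the category "other" carry no field and are skipped.
    """
    fields = defaultdict(list)
    for word, category, role in tagged_words:
        if category == "other":
            continue
        fields[f"{role}_{category}"].append(word)
    return {name: " ".join(words) for name, words in fields.items()}

# Hand-tagged example; the tags are assumptions made for illustration.
tagged = [
    ("Hieronym", "name", "husband"),
    ("Ponsich", "surname", "husband"),
    ("corder", "occupation", "husband"),
    ("de", "other", "none"),
    ("Bara", "place", "husband"),
]

record = fill_record(tagged)
```

The sketch does not include a language model: grammars, n-grams or dictionaries would act earlier, when deciding which category and role each word receives.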

Summary

Demo: http://dag.cvc.uab.es/infoesposalles/media-gallery/

WORD SPOTTING

Introduction

Users would like to search for information in document images like this!

Word Spotting by similarity

Query

Database

Dynamic Time Warping (sequence alignment)

When there is NO training data (annotated data)
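Dynamic Time Warping can be sketched in a few lines. The version below is a textbook formulation, assuming each word image has already been reduced to a sequence of numbers (for example, one value per pixel column); the feature choice and the names `dtw_distance` and `rank_by_similarity` are assumptions, not details taken from the slides.

```python
def dtw_distance(query, candidate):
    """Dynamic Time Warping cost between two sequences of numbers.

    cost[i][j] is the cheapest alignment of the first i query values
    with the first j candidate values.
    """
    n, m = len(query), len(candidate)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - candidate[j - 1])
            # Extend the best of: insertion, deletion, or a direct match.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

def rank_by_similarity(query, database):
    """Return database keys ordered from most to least similar to the query."""
    return sorted(database, key=lambda key: dtw_distance(query, database[key]))
```

Because the alignment may stretch or compress either sequence, the same word written wider or narrower still obtains a low cost, which is why no annotated training data is needed.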

Word Spotting by similarity (query-by-example)

The search is based on shape/appearance.

It can be adapted to any kind of script and document (words, symbols, etc.)

M. Rusiñol, D. Aldavert, R. Toledo and J. Lladós. Browsing Heterogeneous Document Collections by a Segmentation-free

Word Spotting Method. International Conference on Document Analysis and Recognition (ICDAR), 2011.

“hotel”

“pizzeria”

Canonical Correlation Analysis (CCA) learns a common subspace

J.Almazán, A.Gordo, A.Fornés, E.Valveny. Word Spotting and Recognition with Embedded Attributes.

IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 36, issue 12, 2014.
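The paper cited above represents text strings with a pyramidal histogram of characters (PHOC). The sketch below is a simplified version for illustration only: it uses levels 1 to 3 over the lowercase ASCII alphabet and omits the bigram part, so it is not the configuration used in the paper. The function name `phoc` and its parameters are assumptions.

```python
import string

ALPHABET = string.ascii_lowercase

def phoc(word, levels=(1, 2, 3)):
    """Simplified pyramidal histogram of characters for a word.

    At each level the word is split into `level` equal regions. A character
    is marked in a region when at least half of the character's extent
    falls inside that region. Integer arithmetic avoids rounding issues.
    """
    word = word.lower()
    n = len(word)
    vector = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(ALPHABET)
            for k, ch in enumerate(word):
                if ch not in ALPHABET:
                    continue
                # Scale by n * level: the character spans [k*level, (k+1)*level]
                # and the region spans [region*n, (region+1)*n].
                overlap = (min((k + 1) * level, (region + 1) * n)
                           - max(k * level, region * n))
                if 2 * overlap >= level:
                    bits[ALPHABET.index(ch)] = 1
            vector.extend(bits)
    return vector

hotel = phoc("hotel")
```

With a binary vector like this for the query string and attribute scores predicted from each word image, both can be projected into a common subspace and compared there.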

Word Spotting by text (query-by-string)

When there is training data (annotated data), we can learn

Demo application on an Android device (Tablet)

Segmentation-based. Query-by-string & Query-by-example

http://dag.cvc.uab.es/infoesposalles/media-gallery/

Combined searches

- AND

- OR
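A combined search can be sketched as set operations over the hits of each single-word query. This assumes each query returns a collection of record identifiers; the function `combine` and the identifiers are hypothetical.

```python
def combine(hits_a, hits_b, mode="AND"):
    """Combine the record ids returned by two word-spotting queries.

    AND keeps records that contain both words; OR keeps records that
    contain at least one of them.
    """
    a, b = set(hits_a), set(hits_b)
    if mode == "AND":
        return sorted(a & b)
    if mode == "OR":
        return sorted(a | b)
    raise ValueError(f"unknown mode: {mode!r}")

both = combine([3, 7, 12], [7, 12, 20], mode="AND")
either = combine([3, 7, 12], [7, 12, 20], mode="OR")
```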

P.Riba, J.Almazán, A.Fornés, D.Fernández, E.Valveny, J.Lladós. e-Crowds: a mobile platform for browsing and searching in

historical demography-related manuscripts. International Conference on Frontiers in Handwriting Recognition (ICFHR), 2014.

WORD SPOTTING IN A

TRANSCRIPTION TOOL

Catalan census records example

household

relationship

members

street name

street number

machine printed

header of columns

Census records: extrinsic context

[Figure: timeline of census years 1828, 1833, 1857, 1878, 1881, 1886, 1889, 1890, 1900, 1906, 1910, 1915, 1920, 1924, 1930, 1940, 1945, 1950, 1955]

Household linkage (redundancy): extrinsic context

Extrinsic context in census records

Jaime tort botella

Mercedes sanllehi comas

Julia tort sanllehi

Pedro id id

Jaime id id

Mariano Nin Figueras

[Figure: information transfer between censuses. Labels: Census page at time t; Layout segmentation; Street numbers; Street address recognition (@ = Carretera 1); Information retrieval; Census of previous point in time; Family information of home i at time t-1; Query by string word spotting; Family information of home i]

Joan Mas, Alicia Fornés, Josep Lladós (2016). An Interactive Transcription System of Census Records using

Word Spotting based Information Transfer. In IAPR Document Analysis Systems Workshop (DAS), 2016.
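The information-transfer idea can be sketched as follows, assuming the street address of a household has already been recognized and a query-by-string word spotter is available. Both `transfer` and the `spot` callback are hypothetical stand-ins, and the fake spotter below only simulates a page; this is not the system described in the cited paper.

```python
def transfer(previous_census, address, spot):
    """Look for the people recorded at `address` in the previous census.

    `previous_census` maps an address to the names transcribed at time t-1.
    `spot` is a hypothetical query-by-string word spotter: it takes a name
    and returns its location on the current page, or None if not found.
    Returns {name: location} for every name found again at time t.
    """
    found = {}
    for name in previous_census.get(address, []):
        location = spot(name)
        if location is not None:
            found[name] = location
    return found

# Names taken from the example page; the line numbers are made up.
previous_census = {"Carretera 1": ["Jaime tort botella", "Mercedes sanllehi comas"]}
page_at_time_t = {"Jaime tort botella": 1}

def fake_spot(name):
    """Stand-in for a real word spotter: a dictionary lookup."""
    return page_at_time_t.get(name)

proposals = transfer(previous_census, "Carretera 1", fake_spot)
```

Names found again can be offered to the transcriber as suggestions, so only the household members that changed between censuses need to be typed by hand.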

Steps

Citizen participation: transcription (crowdsourcing)

Transcription (crowdsourcing)

Transfer of information (word spotting)

Results available for society

Search Engine: http://dagapp.cvc.uab.es/CercadorSantFeliu/

CONCLUSIONS

• Handwriting Recognition works only if there is enough labelled data to train

• Word Spotting is useful when no labelled data is available

• Handwriting Recognition and Word Spotting:

– Can be useful for semi-automatic transcription

– Can be used for validation (check if the user made any mistakes)

– Can be used for indexing the information contained in document collections

But… transcription is not enough.

• The goal is to evolve towards Document Understanding

– Incorporate the knowledge from experts

– Collaboration between Humanists and Computer Scientists


Thank you!
