DHAD Digital Humanities Computer Vision for Abu …afornes/videos/AFornes_DHAD2017.pdf · IEEE...


Page 1: DHAD Digital Humanities Computer Vision for Abu …afornes/videos/AFornes_DHAD2017.pdf · IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 36, issue 12, 2014.

Computer Vision for

Historical Document Image Analysis

Alicia Fornés

Computer Vision Center, Barcelona, Spain

[email protected]

http://www.cvc.uab.es/people/afornes/

DHAD

Digital Humanities

Abu Dhabi

Page 2:

The Computer Vision Centre (CVC)

• A non-profit institution in Barcelona

• More than 130 researchers and technicians devoted to R&D on Computer Vision

• The most advanced resources in Computer Vision hardware and software

• Computer vision is the science and technology of machines that see.

Page 3:

CVC Areas of expertise

Page 4:

Index

Document Image Analysis

Handwriting Recognition

Word spotting

Conclusions

Page 5:

Introduction

Digitization in Archives

Preservation: Face the paper deterioration problem

Storage: Avoid having kilometers of shelves

Accessibility: Allow users around the world to access cultural heritage

Historical document image analysis and recognition is the area of computer vision that addresses the problem of automatically recognizing document contents (printed or handwritten text, or graphical elements).

Europeana

Page 6:

A Document Image

A digital image is a two-dimensional function I(x, y) in which each point (x, y), called a pixel, holds the light intensity at that point.
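As a minimal sketch of this definition, a grayscale image can be stored as a 2-D array and I(x, y) read by indexing. The 8-bit convention (0 = black, 255 = white) and the helper name `intensity` are assumptions made for illustration; they do not come from the slides.

```python
import numpy as np

# A tiny 4x5 grayscale image. Assumed convention: 0 = dark ink, 255 = white paper.
image = np.array([
    [255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255],
    [255,   0, 255, 255, 255],
    [255, 255, 255, 255, 255],
], dtype=np.uint8)

def intensity(img, x, y):
    """Return I(x, y), where x is the column index and y is the row index."""
    return int(img[y, x])

height, width = image.shape
```

Note that arrays are indexed row first, so I(x, y) corresponds to `img[y, x]`.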

Page 7:

Main tasks: Document Enhancement

Page 8:

Main tasks: Document Enhancement

A. Fornés, X. Otazu, J. Lladós. Show through cancellation and image enhancement by multiresolution contrast processing. International Conference on Document Analysis and Recognition, 2013.

Page 9:

Text documents: main tasks

Layout analysis: to detect (crop) records, lines, and words for subsequent recognition.

Full transcription: to convert images to editable text.

Word spotting: given a query word to search, to locate at image level visually similar word snippets.

[Figure: sample line "dit dia reberê de Hieronym Ponsich corder de Bara fill de Juâ Pon=" segmented into BLOCKS, LINES, and WORDS]
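One common way to detect text lines is a horizontal projection profile. It is shown here only as a hedged illustration, since the slides do not say which layout analysis method is used. The function name `segment_lines`, the `min_ink` threshold, and the toy page are assumptions.

```python
import numpy as np

def segment_lines(binary, min_ink=1):
    """Split a binarized page (1 = ink, 0 = background) into text-line row ranges."""
    profile = binary.sum(axis=1)      # number of ink pixels in each row
    is_text = profile >= min_ink      # rows that contain enough ink
    lines, start = [], None
    for row, flag in enumerate(is_text):
        if flag and start is None:
            start = row               # a text line begins
        elif not flag and start is not None:
            lines.append((start, row))  # half-open range [start, row)
            start = None
    if start is not None:             # line touching the bottom border
        lines.append((start, len(is_text)))
    return lines

# Toy page with two "text lines" separated by blank rows.
page = np.zeros((8, 6), dtype=np.uint8)
page[1:3, 1:5] = 1
page[5:7, 0:4] = 1
line_ranges = segment_lines(page)
```

The same idea applied column-wise inside each line gives a rough word segmentation, although historical handwriting with touching or skewed lines usually needs more robust methods.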

Page 10:

Graphic documents

Pictures

Paintings

Maps

Music Scores

Drawings

Page 11:

Writer Identification

• Determine the author of a piece of handwriting among a set of writers

• Utilities

Forensic, such as signature verification

Typically applied to text documents

• Other scenarios: paintings, drawings, music scores…

Page 12:

Dating

Evolution of handwriting (variation over time)

Page 13:

Pianola Rolls

• Player piano rolls reflect how music was interpreted and received during the early 20th century

• Some are real piano recordings (with tempo & dynamics): historical recordings!

• There are more than 10,000 pianola rolls in Spain

Pianola Roll Digitizer: https://www.youtube.com/watch?v=vmTryKCM_e8&feature=youtu.be

• Preservation (digital copy)

• Creation of MIDI (for listening)

Page 14:

HANDWRITING RECOGNITION APPLICATION TO HISTORICAL DEMOGRAPHIC DOCS

Page 15:

Lots of info locked in historical archives

• Birth, marriage, death records

• Census records

Record linkage: Social Network of the Past

Universal access to the picture of the past

[Figure labels: storytelling; health; migrations; social mobility; genealogy tourism; Citizens; Scholars; Services]

Page 16:

Technical architecture

[Figure: technical architecture diagram. Labels: Scanning; Image Space; HW recognition; Crowdsourcing; Transcription Space; Data mining (Harmonization, Record linkage); Contextual knowledge Space; exploitation]

Page 17:

Handwriting Recognition

Image:

Transcription: Anna Victoria donsella filla de Jaume Torrent pages de

Difficulties

- Different handwriting styles

- Large Vocabulary, Segmentation problem

Techniques

- Hidden Markov Models / Neural Networks

- We need to learn (we need annotated data)

[Figure label: “forest”]

Page 18:

Objective: Information Extraction (fill a database)

Advantage:

All marriage records share similar syntactic structures:

Handwriting Recognition Information Extraction

Page 19:

Semantic Recognition and Information Extraction

Semantic Recognition / Word Image Categorization

• Deep Neural Networks (CNNs)

• Named Entity Detection and Classification

E.g. (male/female name, surname, occupation, place, date)

Information Extraction

• Language Models (e.g. grammars, n-grams, dictionaries)

• Fill in a database

E.g. wife’s name, husband’s occupation, father’s place of origin, etc.
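To illustrate how a shared syntactic structure helps fill a database, the toy sketch below uses a regular expression as a stand-in for the grammars and language models named above. The pattern, the field names, and the glosses in the comments ("donsella" as maiden, "filla de" as daughter of, "pages" as farmer) are assumptions made for this example; the input is the transcription fragment shown on the Handwriting Recognition slide.

```python
import re

# Hypothetical pattern for one fragment of a marriage record:
#   "<bride> donsella filla de <father> pages ..."
# Assumed glosses: "donsella" = maiden, "filla de" = daughter of, "pages" = farmer.
BRIDE_PATTERN = re.compile(
    r"(?P<bride>.+?) donsella filla de (?P<father>.+?) (?P<occupation>pages)\b"
)

def extract_bride_fields(transcription):
    """Return a dict of database fields, or None if the pattern does not match."""
    match = BRIDE_PATTERN.search(transcription)
    return match.groupdict() if match else None

record = extract_bride_fields(
    "Anna Victoria donsella filla de Jaume Torrent pages de"
)
```

A real system would need to cope with recognition errors and spelling variation, which is where probabilistic language models are more suitable than a fixed pattern.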

Page 20:

Summary

Demo: http://dag.cvc.uab.es/infoesposalles/media-gallery/

Page 21:

WORD SPOTTING

Page 22:

Introduction

Users would like to search information in document images like this!

Page 23:

Word Spotting by similarity

When there is NO training data (annotated data)

Dynamic Time Warping (sequence alignment)

[Figure labels: Query; Database]
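Dynamic Time Warping can be sketched as follows. This is the textbook formulation applied to 1-D sequences, not necessarily the variant used in the talk. The feature choice (one value per image column, for example an ink count) and the toy sequences are assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping cost between two 1-D feature sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])           # local distance
            cost[i, j] = d + min(cost[i - 1, j],    # insertion
                                 cost[i, j - 1],    # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])

# Hypothetical per-column features of a query word and two database words.
query = [0, 2, 4, 2, 0]
database = {
    "stretched_copy": [0, 0, 2, 4, 4, 2, 0],
    "different_word": [4, 4, 0, 0, 4, 4, 0],
}
ranking = sorted(database, key=lambda name: dtw_distance(query, database[name]))
```

Because the alignment may stretch or compress either sequence, a wider or narrower rendering of the same word shape can still match at low cost, which is why no annotated training data is required.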

Page 24:

Word Spotting by similarity (query-by-example)

The search is based on shape/appearance.

It can be adapted to any kind of script and document (words, symbols, etc.)

M. Rusiñol, D. Aldavert, R. Toledo and J. Lladós. Browsing Heterogeneous Document Collections by a Segmentation-free Word Spotting Method. International Conference on Document Analysis and Recognition (ICDAR), 2011.

Page 25:

Word Spotting by text (query-by-string)

When there is training data (annotated data), we can learn

Canonical Correlation Analysis (CCA) learns a common subspace

[Figure labels: “hotel”; “pizzeria”]

J. Almazán, A. Gordo, A. Fornés, E. Valveny. Word Spotting and Recognition with Embedded Attributes. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 36, issue 12, 2014.

Page 26:

Demo application in Android device (Tablet)

Segmentation-based. Query-by-string & Query-by-example

http://dag.cvc.uab.es/infoesposalles/media-gallery/

Combined searches

- AND

- OR

P. Riba, J. Almazán, A. Fornés, D. Fernández, E. Valveny, J. Lladós. e-Crowds: a mobile platform for browsing and searching in historical demography-related manuscripts. International Conference on Frontiers in Handwriting Recognition (ICFHR), 2014.
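The combined AND / OR searches listed above can be read as set operations over the hits of individual queries. The sketch below assumes that each query returns a set of record IDs; the function name `combine_hits` and the sample IDs are hypothetical and do not describe the e-Crowds implementation.

```python
def combine_hits(hits_a, hits_b, mode="AND"):
    """Combine two sets of record IDs returned by separate word-spotting queries."""
    if mode == "AND":
        return hits_a & hits_b   # records matching both queries
    if mode == "OR":
        return hits_a | hits_b   # records matching at least one query
    raise ValueError(f"unsupported mode: {mode}")

# Hypothetical hits for two queries.
hits_first = {3, 7, 12}
hits_second = {7, 12, 20}
both = combine_hits(hits_first, hits_second, "AND")
either = combine_hits(hits_first, hits_second, "OR")
```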

Page 27:

WORD SPOTTING IN A TRANSCRIPTION TOOL

Page 28:

Catalan census records example

[Figure: annotated census page. Labels: household; relationship; members; street name; street number; machine-printed header of columns]

Page 29:

Census records: extrinsic context

[Figure: timeline of census years 1828, 1833, 1857, 1878, 1881, 1886, 1889, 1890, 1900, 1906, 1910, 1915, 1920, 1924, 1930, 1940, 1945, 1950, 1955]

Household linkage (redundancy): extrinsic context

Page 30:

Extrinsic context in census records

Jaime tort botella

Mercedes sanllehi comas

Julia tort sanllehi

Pedro id id

Jaime id id

Mariano Nin Figueras

[Figure: information transfer pipeline. Labels: Census page at time t; Layout segmentation; Street numbers; Street address recognition (@ = Carretera 1); Information retrieval; Census of previous point in time; Family information of home i at time t-1; Query by string word spotting; Family information of home i]

Joan Mas, Alicia Fornés, Josep Lladós. An Interactive Transcription System of Census Records using Word Spotting based Information Transfer. IAPR Document Analysis Systems Workshop (DAS), 2016.
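The information transfer idea can be sketched as a lookup: once the street address of a household is recognized at time t, the names recorded at that address at time t-1 become candidate query strings for word spotting on the current page. The index layout, the function names, and the grouping of the names under a single address are assumptions for illustration. Reading "id" as "idem" (same as the entry above) is also an assumption.

```python
def expand_idem(names):
    """Replace 'id' tokens with the token at the same position in the previous entry."""
    expanded, previous = [], []
    for name in names:
        tokens = [
            previous[i] if token == "id" and i < len(previous) else token
            for i, token in enumerate(name.split())
        ]
        expanded.append(" ".join(tokens))
        previous = tokens
    return expanded

# Hypothetical index of the previous census (time t-1), keyed by (street, number).
previous_census = {
    ("Carretera", 1): [
        "Jaime tort botella",
        "Mercedes sanllehi comas",
        "Julia tort sanllehi",
        "Pedro id id",
        "Jaime id id",
    ],
}

def candidate_queries(street, number, index):
    """Names from time t-1 to use as query-by-string inputs at time t."""
    return expand_idem(index.get((street, number), []))

queries = candidate_queries("Carretera", 1, previous_census)
```

Each returned string would then be searched on the current census page, and confirmed matches could be proposed to the transcriber instead of being typed from scratch.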

Page 31:

Steps

Page 32:

Citizen participation: transcription (crowdsourcing)

Transcription (crowdsourcing)

Transfer of information (word spotting)

Page 33:

Results available for society

Search Engine: http://dagapp.cvc.uab.es/CercadorSantFeliu/

Page 34:

CONCLUSIONS

Page 35:

Conclusions

• Handwriting Recognition: only if there is enough labelled data for training

• Word Spotting: useful when no labelled data is available

• Handwriting Recognition and Word Spotting:

– Can be useful for semi-automatic transcription

– Can be used for validation (checking whether the user made any mistakes)

– Can be used for indexing the information contained in document collections

But... transcription is not enough.

• The goal is to evolve towards Document Understanding

– Incorporate the knowledge from experts

– Collaboration between humanists and computer scientists

Page 36:

Thank you!

Computer Vision for Historical Document Image Analysis

Alicia Fornés

Computer Vision Center, Barcelona, Spain

[email protected]

http://www.cvc.uab.es/people/afornes/