Konstantin Anisimovich Tatiana Danielyanimages.abbyy.com/Partner News/june19/State-of-the-art in...

49
State-of-the-art in neural network implemented by ABBYY R&D Konstantin Anisimovich Tatiana Danielyan


A brief overview of this presentation

Power of Deep Learning

Image Processing

OCR

Document analysis and capture

NLP

2

Traditional Machine Learning vs. Deep learning.

Traditional Machine Learning: Input → Feature extraction → Classification → Output (Plane / Not Plane)

Deep Learning: Input → Feature extraction + Classification → Output (Plane / Not Plane)

3

Deep learning is a class of machine learning algorithms that use a graph of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input.
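The layered structure described here can be illustrated with a minimal sketch. It is not ABBYY's implementation: the weights are hypothetical toy values, and tanh is an assumed choice of nonlinearity. Each layer computes weighted sums of its inputs, applies the nonlinearity, and passes the result to the next layer.

```python
import math

def dense(inputs, weights, biases):
    """One layer: weighted sums of the inputs followed by a nonlinearity (tanh)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

# Hypothetical toy weights: 2 inputs -> 3 hidden units -> 1 output.
layers = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]
output = forward([1.0, 2.0], layers)
```

Without the nonlinearity, any stack of such layers would collapse into a single linear map, which is why the per-layer activation matters.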

Deep learning

4

Advantages of Deep Learning

Learning high-level features from data in an incremental manner.

No need for domain expertise or hand-crafted feature extraction.

Can solve the problem end to end, while traditional machine learning techniques need the problem to be broken down into parts that are solved separately, with their results combined at the final stage.

Power of Deep Learning

5

We use Deep Learning in all our AI components

Deep learning: why deep neural networks are not always the best solution

6

Disadvantages of Deep Learning

When “natural” features exist, SVM and GBDT (gradient-boosted decision trees) provide comparable or better results using fewer computing resources.

Deep learning does not work well with small datasets.

Deep networks are largely a “black box”: even now researchers do not fully understand what happens inside them.

Content

Image Processing

1. Image processing, correction and improvement

2. Image classification

3. Object detection

4. Text detection (Find text)

5. Neural network barcode detection

7

Image processing, correction and improvement

• Crop launch criterion: a CNN classifier that decides whether the crop mechanism should be called.

• Crop: gradient boosting to estimate crop hypotheses. We plan to integrate a CNN approach to correct more complicated geometric distortions (documents with folds and creases) and to support new document types (IDs).

• Improvement: CNNs for removing color elements and for detecting and removing photometric distortions on images (noise, glare, blur).

TASK
To correct geometric and photometric distortions on images of various document types, so that they look like an ideal scan and the resulting OCR quality improves.

8

Image processing, correction and improvement: Crop launch criterion
Examples

False pass

Redundant crop process
The captured image contains an already cropped document or other content (no document).
Issues:
• It takes additional time to process, which can be critical in a mobile application.
• An improper crop result can appear.
Solution: run a crop launch criterion module that classifies the input image and runs the crop module only if necessary. This module should work much faster than the crop module.

Normal crop process
The crop module receives the image, processes it and outputs the cropped document.

image → crop module
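The gating idea on this slide can be sketched as follows. This is only an illustration under assumptions: `needs_crop` stands for the fast CNN classifier and `crop` for the crop module, and the toy stand-ins below (a border of zero pixels meaning "background present") are hypothetical.

```python
from typing import Callable, List

Image = List[List[int]]  # hypothetical stand-in for a real image type

def process(image: Image,
            needs_crop: Callable[[Image], bool],
            crop: Callable[[Image], Image]) -> Image:
    """Run the expensive crop step only when the fast classifier asks for it."""
    if needs_crop(image):
        return crop(image)
    return image  # already cropped, or no document: skip the crop module

# Toy stand-ins (assumptions): a first row of zeros means "background present".
def toy_needs_crop(image: Image) -> bool:
    return all(value == 0 for value in image[0])

def toy_crop(image: Image) -> Image:
    """Strip a one-pixel border from every side."""
    return [row[1:-1] for row in image[1:-1]]

photo = [[0, 0, 0, 0], [0, 5, 6, 0], [0, 7, 8, 0], [0, 0, 0, 0]]
scan = [[5, 6], [7, 8]]
cropped_photo = process(photo, toy_needs_crop, toy_crop)
untouched_scan = process(scan, toy_needs_crop, toy_crop)
```

The design only pays off if the classifier is much cheaper than the crop module, which is the requirement stated on the slide.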

9

Image processing, correction and improvement: Crop
Examples

10

Original photo image Preprocessed image

Image semantic segmentation

11

Image processing, correction and improvement: Image semantic segmentation
Examples

12

Image processing, correction and improvement: Image semantic segmentation
NN architecture

Facts about this NN:
• CNN architecture
• 10-11 layers
• 1 output with multi-class answer (edges, text regions, photos, stamps)
• Training dataset: 2000 real images
• + augmentation during training gives more samples (×10)
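As a rough illustration of how augmentation during training multiplies a small dataset, the sketch below produces ten perturbed copies of one grayscale image. The specific perturbations (brightness shift and pixel noise) and their ranges are assumptions; the slides do not say which augmentations are used.

```python
import random

def augment(image, rng):
    """Return a randomly perturbed copy of a grayscale image (rows of 0-255 values)."""
    shift = rng.randint(-20, 20)  # one random brightness shift per image
    return [
        [min(255, max(0, pixel + shift + rng.randint(-5, 5))) for pixel in row]
        for row in image
    ]

def augmented_samples(image, copies=10, seed=0):
    """Generate `copies` augmented variants of one training image."""
    rng = random.Random(seed)
    return [augment(image, rng) for _ in range(copies)]

original = [[0, 128, 255], [10, 20, 30]]
samples = augmented_samples(original)
```

Photometric changes like these leave segmentation labels untouched; geometric augmentations (rotation, perspective) would require applying the same transform to the label mask.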

13

Image classification

• Classification: a CNN classifies images by document type (letters, invoices, business cards, etc.) and extracts document “features” for future ML models (transfer learning).

• Clustering: based on the classification features.

TASK
To detect the class of the processed document in order to choose the right image preprocessing scenario and set of recognition settings. Document clustering is needed to process different classes with different flexible descriptions.
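Clustering on classification features can be sketched with a small k-means loop. The slides do not name the clustering algorithm, so k-means is an assumption here, and the two-dimensional feature vectors are hypothetical stand-ins for features taken from the CNN.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Plain k-means with a deterministic start: the first k points are the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical feature vectors for four documents.
features = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
labels, centroids = kmeans(features, k=2)
```

Documents 0 and 2 end up in one cluster and documents 1 and 3 in the other, so each group could then be handled with its own flexible description.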

14

Image classification
Examples

A4 document Double page Receipt Contract Insurance policy Invoice

15

Image classification
NN architecture

Facts about this NN:
• CNN architecture
• 18 layers
• Training dataset: public datasets with different images
• 10 000-15 000 real images of printed documents

16

Document objects detection (stamps, signatures, logos)

• Semantic document segmentation: a CNN for semantic document segmentation.

TASK
To improve the quality of document analysis by excluding areas with stamps, signatures and logos from the analysis.

To detect areas with color stamps and signatures, so that color elements can be filtered out and black printed text preserved for further OCR.

17

Document objects detection (stamps, signatures, logos)
Examples (DA result without object detection)

18

• Incorrect text order: further natural language text analysis is impossible
• Missed text: some text is inside a picture block and cannot be recognized

Document objects detection (stamps, signatures, logos)
Examples (DA result with object detection)

19

• Correct text order: further natural language text analysis is possible
• All text is inside text blocks and can be recognized

Text Detection (on real environment scenes and ID documents)

• Text detection: a CNN for real environment scenes (model optimized for mobile devices, 3.5 Mb).

• Text detection: a modified architecture for ID documents, trained on various ID images.

TASK
To detect text on a mobile photo in order to recognize it and, for ID documents, to identify its type (Name, Date, Document number, etc.) for data capture scenarios.

20

Text Detection (on real environment scenes and ID documents)
Examples

[Figure: two rows of examples, each showing the original image, the traditional approach and the DL approach. Fields labeled on the ID document: Name, Surname, Date, Document Number]

21

Text Detection
NN architecture

Facts about this NN:
• CNN architecture
• 231 layers
• 5 outputs make this CNN universal

Training dataset:
• 10 000 images ≈ 100 000 text regions
• Augmentation during training gives more samples: ×100-1000

[Figure: network diagram with five outputs, Exit 1 to Exit 5]

22

Neural network barcode detection

We use a mixed approach:

• Traditional barcode detector for the postcodes;

• CNN barcode detector for other barcode types.

TASK
To improve the quality of barcode detection in 2 main scenarios:

- to sort the document stream using the barcode on the page as a feature;

- to find and read the value of the barcode.

23

Neural network barcode detection

24

Detection: comparative testing between the traditional approach (our current solution) and the mixed approach (CNN + traditional for postcodes). The chart shows the share of correctly found barcodes:

Barcode type                 Traditional approach   Mixed approach
Australian_Post              78%                    78%
Aztec                        95%                    90%
Codabar                      100%                   100%
Code39                       98%                    96%
Code39.Code 32               100%                   100%
Code93                       99%                    100%
Code128                      98%                    100%
DataMatrix                   72%                    93%
EAN 8                        88%                    100%
EAN 13                       96%                    100%
IATA 2 of 5                  100%                   100%
Industrial 2 of 5            96%                    100%
Interleaved 2 of 5           100%                   100%
ISBN                         95%                    100%
ITF (USA_Mortgage)           100%                   100%
Kix                          94%                    94%
Matrix 2 of 5                100%                   100%
MaxiCode                     96%                    99%
PDF417                       97%                    99%
Postnet                      78%                    78%
QRCode                       91%                    98%
Royalmail                    74%                    74%
UCC128                       100%                   100%
UPC-A                        87%                    99%
UPC-E                        96%                    100%
IntelligentMail (USPS-4CB)   90%                    90%
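The detection chart can be summarized with a few lines of arithmetic. The sketch below uses the percentages from the detection chart; pairing each value with a barcode type follows the left-to-right order of the chart labels, which is an assumption about the flattened chart rather than something the slides state.

```python
TYPES = [
    "Australian_Post", "Aztec", "Codabar", "Code39", "Code39.Code 32", "Code93",
    "Code128", "DataMatrix", "EAN 8", "EAN 13", "IATA 2 of 5", "Industrial 2 of 5",
    "Interleaved 2 of 5", "ISBN", "ITF", "Kix", "Matrix 2 of 5", "MaxiCode",
    "PDF417", "Postnet", "QRCode", "Royalmail", "UCC128", "UPC-A", "UPC-E",
    "IntelligentMail",
]
# Share of correctly found barcodes, in percent, per type.
TRADITIONAL = [78, 95, 100, 98, 100, 99, 98, 72, 88, 96, 100, 96, 100,
               95, 100, 94, 100, 96, 97, 78, 91, 74, 100, 87, 96, 90]
MIXED = [78, 90, 100, 96, 100, 100, 100, 93, 100, 100, 100, 100, 100,
         100, 100, 94, 100, 99, 99, 78, 98, 74, 100, 99, 100, 90]

def mean(values):
    return sum(values) / len(values)

# Change in percentage points when switching to the mixed approach.
changes = {name: m - t for name, t, m in zip(TYPES, TRADITIONAL, MIXED)}
improved = [name for name, delta in changes.items() if delta > 0]
regressed = [name for name, delta in changes.items() if delta < 0]
largest_gain = max(changes, key=changes.get)
```

Under that reading, the unweighted mean detection rate rises from 93.0% to about 95.7%, twelve types improve (DataMatrix most, by 21 points), Aztec and Code39 get slightly worse, and the postal codes handled by the traditional detector are unchanged.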

Full-cycle barcode processing (detection + recognition): the mixed approach compared with the traditional approach. The chart shows the share of correctly found and recognized barcodes:

Barcode type                 Traditional approach   Mixed approach
Australian_Post              30%                    31%
Aztec                        63%                    71%
Codabar                      90%                    92%
Code39                       85%                    86%
Code39.Code 32               96%                    98%
Code93                       79%                    91%
Code128                      91%                    93%
DataMatrix                   61%                    73%
EAN 8                        64%                    73%
EAN 13                       77%                    79%
IATA 2 of 5                  87%                    89%
Industrial 2 of 5            63%                    74%
Interleaved 2 of 5           92%                    94%
ISBN                         66%                    68%
ITF (USA_Mortgage)           100%                   100%
Kix                          69%                    70%
Matrix 2 of 5                98%                    98%
MaxiCode                     91%                    93%
PDF417                       87%                    87%
Postnet                      41%                    46%
QRCode                       76%                    79%
Royalmail                    59%                    59%
UCC128                       93%                    93%
UPC-A                        38%                    47%
UPC-E                        91%                    96%
IntelligentMail (USPS-4CB)   79%                    79%

Neural network barcode detection

25

Traditional approach examples: 1D example (Code128), 2D example (QRCode)

Mixed approach examples (traditional for postcodes + CNN for other barcode types): 1D example (Code128), 2D example (QRCode)

Neural network barcode detection

26

Facts about this NN:
• CNN architecture
• 24 layers
• Training dataset: 8000 real images
• + augmentation during training gives more samples (×10)

Content

OCR

1. Architecture of OCR: segmentation, recognition, context analysis

2. Classical approach: word oversegmentation

3. CNN for hieroglyphs recognition

4. End-to-end word recognition without character separation

27

Architecture of OCR: segmentation, recognition, context analysis

OCR stages

Document → Page → Text block → Text line → Word → Symbol

28

Classical approach for recognition: word oversegmentation

TASK
Image → text. We need to get the sequence of text symbols for the image of a text line. The classic approach needs to segment the line into words and each word into characters.

29

CNN for hieroglyphs recognition

TASK
Image → text. We need to get the sequence of text symbols for the image of a text line. We use the same base method: the word is split into characters. The convolutional neural network approach allows us to integrate a new method of character recognition.

Real images

30

[Chart: Japanese results. Quality (w/o space errors, %) against speed (char/sec) for TwoStage CNN, TwoStage CNN fast mode, classic approach and classic approach fast mode]

CNN for hieroglyphs recognition
NN architecture

Facts about this NN:
• Two-stage CNN architecture
• 7 layers for every net
• ~8 000 symbols are divided into 100 clusters (groups of symbols)
• Training dataset: 7700 real images for symbols
• + 7000 synthetic images
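The two-stage idea can be sketched as a pair of lookups: a first network chooses a cluster of symbols, and a second network, specific to that cluster, chooses the symbol. Everything below is a hypothetical toy (the "networks" are plain functions and the image is a dictionary with a stroke count); only the two-stage control flow reflects the slide.

```python
def argmax(scores):
    """Return the label with the highest score."""
    return max(scores, key=scores.get)

def recognize(image, cluster_net, symbol_nets):
    """Stage 1 picks a cluster of symbols; stage 2 picks the symbol inside it."""
    cluster = argmax(cluster_net(image))
    return argmax(symbol_nets[cluster](image))

# Toy stand-ins for the two stages (assumptions, not real networks).
def toy_cluster_net(image):
    strokes = image["strokes"]
    return {"simple": 1.0 / strokes, "complex": strokes / 20.0}

toy_symbol_nets = {
    "simple": lambda image: {"一": 1.0 if image["strokes"] == 1 else 0.0,
                             "二": 1.0 if image["strokes"] == 2 else 0.0},
    "complex": lambda image: {"語": 0.7, "読": 0.3},
}

simple_symbol = recognize({"strokes": 2}, toy_cluster_net, toy_symbol_nets)
complex_symbol = recognize({"strokes": 14}, toy_cluster_net, toy_symbol_nets)
```

With about 8 000 symbols in 100 clusters, each stage chooses among roughly a hundred alternatives (assuming clusters of similar size) instead of one network choosing among thousands.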

31

End-to-end word recognition without character separation

TASK
Image → text. We need to get the sequence of text symbols for the image of a text line. The end-to-end neural network approach allows us to skip the character separation stage.

Original image

Traditional approach with word segmentation

End-to-end approach result

32

End-to-end word recognition without character separation
NN architecture

Facts about this NN:

• CNN + RNN (LSTM, Long short-term memory) architecture
• 37 layers
• Training dataset: 100 000 real images (one image = image of one word/fragment)
• + 200 000 augmented images
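The slides do not say how the CNN + LSTM output is turned into text. One common technique for recognizers that skip character segmentation is CTC (connectionist temporal classification) decoding, shown here as an assumption: the network emits per-frame scores over the alphabet plus a "blank", repeated labels are collapsed, and blanks are dropped.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy CTC decoding: best label per frame, collapse repeats, drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    decoded = []
    previous = blank
    for index in best:
        if index != blank and index != previous:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)

def one_hot(index, size):
    """Toy per-frame score vector that fully favours one label."""
    return [1.0 if position == index else 0.0 for position in range(size)]

ALPHABET = ["", "c", "a", "t"]  # index 0 is the CTC blank
frames = [one_hot(i, len(ALPHABET)) for i in [1, 1, 0, 2, 2, 0, 3]]
word = ctc_greedy_decode(frames, ALPHABET)
```

The blank between two identical labels is what lets a doubled letter survive the collapse step, so frames favouring "a, blank, a" decode to "aa".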

33

End-to-end word recognition without character separation
Results

34

[Chart: Arabic results. Quality (w/o space errors, %) against speed (char/sec) for Books, Docs and News/Magazines]

Content

Document analysis and capture

1. DNN for Invoice Capture: field detection, line items

2. On-premise supervised learning for Invoices

35

DNN for Invoice Capture: field detection, line items

36

TASK
To improve Invoice capture out of the box (OOTB)

ML method
Pre-trained NN for Invoices

Example fields:
Field name     Field value
Total          1,287.39
BSB            0320000
Account        403762
Vendor         Nestle Australia Ltd
TAX Invoice    1126609718
Invoice Date   05/03/14

Table elements: Table heading, Line item 1, Total line

DNN for Invoice Capture: field detection, line items

37

DNN for Invoice Capture: field detection

38

USA
Fields          F-measure   Fields count
InvoiceDate     0.89        3015
InvoiceNumber   0.86        2940
Total           0.83        3158
Test dataset – 3265 documents. Test was made in March 2019.

Traditional approach
Fields
Invoice Number  0.76
Invoice Date    0.79
Total           0.83

Australia
Fields          F-measure   Fields count
InvoiceDate     0.94        2426
InvoiceNumber   0.87        2380
Total           0.83        2424
Test dataset – 2726 documents

Canada
Fields          F-measure   Fields count
InvoiceDate     0.89        553
InvoiceNumber   0.87        553
Total           0.87        548
Test dataset – 638 documents

Sweden
Fields          F-measure   Fields count
InvoiceDate     0.95        614
InvoiceNumber   0.91        558
Total           0.83        615
Test dataset – 628 documents

France
Fields          F-measure   Fields count
InvoiceDate     0.89        535
InvoiceNumber   0.85        536
Total           0.83        540
Test dataset – 568 documents

Germany
Fields          F-measure   Fields count
InvoiceDate     0.99        57
InvoiceNumber   0.98        57
Total           0.95        55
Test dataset – 122 documents

Great Britain
Fields          F-measure   Fields count
InvoiceDate     0.92        622
InvoiceNumber   0.93        615
Total           0.90        627
Test dataset – 646 documents
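For readers unfamiliar with the metric in these tables, the sketch below computes the F-measure from extraction counts. It assumes the balanced F1 variant (harmonic mean of precision and recall); the slides do not state which variant was used, and the counts in the example are made up.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 8 fields extracted correctly, 2 wrong, 2 missed.
example = f_measure(true_positives=8, false_positives=2, false_negatives=2)
```

Because it is a harmonic mean, the score drops sharply when either precision or recall is low, so a field that is often extracted but often wrong does not score well.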

On-premise supervised learning for Invoices

Combines a pre-trained DL model with on-the-fly learning, so that the local model can be updated in real time.

The algorithm automatically detects the type of the incoming invoice and relates it to those invoices that have already been edited.

What is the value?
Reusing verification results to extract data from a new “similar” invoice increases the extraction quality.
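One way to picture the combination of a pre-trained model with on-the-fly learning is a local memory of verified results keyed by invoice type. The sketch below is a hypothetical design, not ABBYY's: the class name, the `layout_key` used to recognize "similar" invoices, and the region tuples (x, y, width, height) are all assumptions.

```python
class LocalInvoiceMemory:
    """Keeps operator-verified field regions per invoice layout (hypothetical design)."""

    def __init__(self):
        self._verified = {}  # layout key -> {field name: region}

    def learn(self, layout_key, verified_regions):
        """Store regions confirmed or corrected during verification."""
        self._verified.setdefault(layout_key, {}).update(verified_regions)

    def locate(self, layout_key, pretrained_regions):
        """Start from the pre-trained model's proposals, prefer verified regions."""
        regions = dict(pretrained_regions)
        regions.update(self._verified.get(layout_key, {}))
        return regions

memory = LocalInvoiceMemory()
proposal = {"Total": (400, 700, 80, 20), "InvoiceDate": (50, 40, 90, 20)}
before = memory.locate("vendor-a", proposal)
memory.learn("vendor-a", {"Total": (400, 720, 80, 20)})
after = memory.locate("vendor-a", proposal)
```

The update takes effect on the very next invoice of the same type, while invoices of unseen types still fall back to the pre-trained proposals.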

39

Content

NLP

1. Flexi NLP: Machine Learning on the customer side

2. Flexi NLP: Use cases

40

Natural language processing (NLP)* is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.

What is NLP and why do we need it? Main NLP tasks

* https://en.wikipedia.org/wiki/Natural_language_processing

41

Bi-LSTM for Named Entity Recognition (NER)

NER - ORG NER - LOC

42

Flexi NLP: Machine Learning on the customer side

Input: Text (e-mail, contact, news, corporate charter…), e.g. “Bank opened its branch…”

• Lexical Analysis: splits text into words and sentences; executes morphological analysis.
• Semantic Analysis: understands text on the sentence level and detects links between sentences.
• Section extraction: splits a document into relevant sections.
• Decision Tree ensemble: votes for the best variant.
• Output: Entities

43

[Diagram: Semantic Analysis feeds Decision Tree ensembles, which extract Entities, Intents, Facts and Relationships. + NLP Studio]

NER: Named Entity Recognition

The State Bank of India (SBI) has announced that it has opened its first branch in South Korea.

Flexi NLP: Fact extraction

NER: “Organization” (ORG), “Location” (LOC)
Semantic: resolved anaphora (Bank – it)
Semantic context: “Organization” opens a branch in “Location”

44

NLP “knows” a lot about each word in the sentence. This knowledge is fed into a machine learning model and used as features.
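To make "knowledge used as features" concrete, the sketch below builds a feature dictionary for one word of the example sentence. The features here are simple surface properties chosen for illustration; they are assumptions standing in for the richer morphological and semantic attributes the slide refers to.

```python
def token_features(tokens, index):
    """Describe one token and its neighbours as a feature dictionary."""
    token = tokens[index]
    return {
        "lower": token.lower(),
        "is_capitalized": token[:1].isupper(),
        "is_upper": token.isupper(),
        "is_digit": token.isdigit(),
        "prev": tokens[index - 1].lower() if index > 0 else "<s>",
        "next": tokens[index + 1].lower() if index + 1 < len(tokens) else "</s>",
    }

tokens = "The State Bank of India has opened its first branch in South Korea".split()
bank_features = token_features(tokens, 2)
```

A dictionary like this per token is the kind of input a classical model such as a decision tree ensemble can consume after vectorization.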

Flexi NLP: Fact extraction for Contracts, Leases, Loans

45

TASK
Extract Agreement Date, Party Name, Agreement Max Amount, Available Amount, etc.

USE CASE
Audit, Contract migration

Segments:
• Preamble
• Contract Subject
• Available Amount

Attributes:
• Agreement Date
• Party Name
• Agreement Max Amount
• Available Amount

Flexi NLP: Fact extraction for News analysis

46

NER: “Organization”, “Location”
Semantic context: Who fined Whom and Why

TASK
Extract risk factors (positive and negative), e.g. the fact of fraud by a business.

USE CASE
Risk management

Example text:
THE CENTRAL BANK has fined Allied Irish Bank (AIB) nearly €2.3 million for a series of anti-money laundering and terrorist financing compliance failures.

The €2,275,000 fine relates to six breaches of the Criminal Justice (Money Laundering and Terrorist Financing) Act 2010 (CJA 2010).

AIB had admitted to all six breaches.

Flexi NLP: Fact extraction for Healthcare

47

TASK
Extract medications, adverse effects and the relationships between them

USE CASE
Healthcare: Research Article Analysis (pharmacovigilance)

Entities:
Age          59-year-old
Gender       Female
Symptom      Loss of muscle strength
Medication   Simvastatin
Dosage       40 mg PO qDay in the evening
Medication   Fluconazole
Dosage       150 mg PO as a single dose
Symptom      Rhabdomyolysis

Relationships (connect medications and side effects):
Medications: Simvastatin, Fluconazole
Side effects: Loss of muscle strength, Rhabdomyolysis

48

ABBYY AI

1. We create intelligent skills for a robot-clerk that is able to solve practical tasks.

2. Implementing these skills makes it possible to reduce the time spent on routine operations or to make balanced management decisions.

3. We combine Knowledge Engineering and Machine Learning approaches to reach the best result with limited training data.

4. We use the following machine learning algorithms: Naive Bayes, SVM, Gradient Boosting, Deep Learning, Evolutionary Algorithms.

Thank you

Questions & Answers