LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou
KDD 2020
Outline
1. Background
2. Motivation
3. Method
4. Experiments
5. Conclusion
1. Background
Document Understanding in Real World
Form, Receipt, Report, Invoice
Born-digital Documents, Scanned Documents
Visually-rich Documents
Preprocessing
• Scanned documents
  • File format: .jpg, .png, …
  • Toolkit: optical character recognition (OCR)
  • Open source tools: Tesseract
• Born-digital documents
  • File format: .docx, .pdf, .pptx, …
  • Toolkit: DOCX parser, PDF parser, …
  • Open source tools: python-docx, pdfminer, PyMuPDF
Documents → OCR tools or a specific parser → semi-structured data
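Whichever tool is used, the semi-structured output can be pictured as a list of words with bounding boxes. The following is a minimal sketch under stated assumptions: the `Word` record and the `normalize_box` helper are hypothetical names, and rescaling coordinates to a 0 to 1000 range is an assumption borrowed from common practice rather than something stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word plus its bounding box in page pixels (hypothetical record)."""
    text: str
    x0: int  # left
    y0: int  # top
    x1: int  # right
    y1: int  # bottom


def normalize_box(word: Word, page_width: int, page_height: int, scale: int = 1000) -> Word:
    """Rescale pixel coordinates to integers in [0, scale] so pages of any size are comparable."""
    return Word(
        word.text,
        scale * word.x0 // page_width,
        scale * word.y0 // page_height,
        scale * word.x1 // page_width,
        scale * word.y1 // page_height,
    )


# Toy OCR/parser output for a 500 x 1000 pixel page (made-up values).
words = [Word("Total", 50, 100, 150, 120), Word("4.95", 400, 100, 450, 120)]
normalized = [normalize_box(w, page_width=500, page_height=1000) for w in words]
```

Integer division keeps the coordinates usable as indices into an embedding table later on.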
Typical Document Understanding Task
Form Understanding

| Key | Value |
| --- | --- |
| TO | Lorillard Corporation |
| ADDRESS | 666 Fifth Avenue |
| CITY | New York |
| … | … |

Receipt Understanding

| Key | Value |
| --- | --- |
| Total | 4.95 |
| Company | StarBucks Store |
| Address | 11302 Euclid Avenue, Cleveland, OH |
| Date | 12/07/2014 |

Document Image Classification

Category: Form
Sequence Labeling
CRF, LSTM, LSTM+CRF, BiLSTM+CRF
Huang, Zhiheng et al. “Bidirectional LSTM-CRF Models for Sequence Tagging.” ArXiv abs/1508.01991 (2015).
Graph Convolution for Multimodal Information Extraction from Visually Rich Documents
• Propose a graph convolution based model to combine textual and visual information.
• Combine graph embedding with text embedding using a standard BiLSTM-CRF model.
Liu, Xiaojing et al. “Graph Convolution for Multimodal Information Extraction from Visually Rich Documents.” NAACL-HLT (2019).
Examples of VRDs and example entities to extract.
Document Modeling
• Model each document as a fully connected graph of text segments
• A document $D$ is a tuple $(T, E)$, where $T = \{t_1, t_2, \dots, t_n\}$, $t_i \in T$, is a set of $n$ text nodes
• $R = \{r_{i1}, r_{i2}, \dots, r_{ij}\}$, $r_{ij} \in R$, is a set of edges
• $E = T \times R \times T$ is a set of directed edges of the form $(t_i, r_{ij}, t_j)$
Liu, Xiaojing et al. “Graph Convolution for Multimodal Information Extraction from Visually Rich Documents.” NAACL-HLT (2019).
Document graph
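The graph definition above can be sketched in a few lines. This is an illustration only: `build_document_graph` is a hypothetical helper, nodes are represented by their indices, and leaving out self-loops is an assumption, since the slide does not say whether a node is connected to itself.

```python
from itertools import permutations


def build_document_graph(text_segments):
    """Return (nodes, edges) for a fully connected directed graph over text segments."""
    nodes = list(range(len(text_segments)))
    # Every ordered pair (i, j) with i != j becomes a directed edge.
    edges = list(permutations(nodes, 2))
    return nodes, edges


segments = ["TO", "Lorillard Corporation", "ADDRESS"]
nodes, edges = build_document_graph(segments)
```

With n text segments this yields n(n-1) directed edges, so the graph grows quadratically with the number of segments.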
Feature Extraction
• Edge embedding $r_{ij} = \left[x_{ij},\ y_{ij},\ \frac{w_i}{h_i},\ \frac{h_j}{h_i},\ \frac{w_j}{h_i}\right]$, where
  • $x_{ij}$ and $y_{ij}$ are the horizontal and vertical distances between the two text boxes
  • $w_i$ and $h_i$ are the width and height of the corresponding text box
Liu, Xiaojing et al. “Graph Convolution for Multimodal Information Extraction from Visually Rich Documents.” NAACL-HLT (2019).
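As a rough illustration of the edge embedding, the sketch below computes the five features for one pair of boxes. The `Box` record and `edge_embedding` function are hypothetical names, and measuring the distances between top-left corners is an assumption, since the slide does not specify the reference points.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Text box given by its top-left corner, width and height (hypothetical record)."""
    x: float
    y: float
    w: float
    h: float


def edge_embedding(box_i: Box, box_j: Box) -> list:
    """Features for the directed edge i -> j: [x_ij, y_ij, w_i/h_i, h_j/h_i, w_j/h_i]."""
    x_ij = box_j.x - box_i.x  # horizontal distance (assumed: between top-left corners)
    y_ij = box_j.y - box_i.y  # vertical distance (assumed: between top-left corners)
    return [x_ij, y_ij, box_i.w / box_i.h, box_j.h / box_i.h, box_j.w / box_i.h]


key_box = Box(x=0, y=0, w=40, h=10)
value_box = Box(x=100, y=20, w=20, h=5)
features = edge_embedding(key_box, value_box)
```

Dividing by the height of box i makes the last three features independent of the scan resolution.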
The Popular BERT and Its Family
• Contextual embedding
• Pre-training technique
BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding
• Incorporate contextualized embedding into the grid document representation
Denk, Timo I. and Christian Reisswig. “BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding.” Document Intelligence Workshop at NeurIPS (2019).
2. Motivation
Motivations
1. Previous work: contextual text embedding + non-contextual spatial information
2. Local invariance in document layout
3. Extra information in visually rich documents
Be Contextual
Problem: contextual text embedding + non-contextual spatial information
Contextualizing spatial information to represent local invariance
Local Invariance in Document Layout
• Relative positions of words in a document contribute a lot to the semantic representation.
• Local invariance
  • Key-value layout: left/right or up/down
  • Table layout: grid
• Pre-training can exploit this local invariance and better align the layout information with the semantic representation.
Visual Feature in Document Style
• Document-level
  • The whole image can indicate the document layout
• Word-level
  • Visual features, styles such as bold, underline, and italic
Insufficient and Expensive Labeled Data
Massive unlabeled documents
Few labeled documents
Pre-training Techniques
Self-supervised training on large amounts of text.
Supervised training on a specific task with labeled data.
Language Understanding
Text-only feature
Document Image Understanding
Text feature
Layout feature
Style feature
…
Goals
1. 2D Language Model: contextual text embedding + contextual spatial information
2. Modeling and pre-training local invariance in document layout
3. Utilizing visual information in visually rich documents
3. Method
LayoutLM Architecture
2-D Position Embedding
BERT
LayoutLM
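To make the BERT versus LayoutLM contrast concrete, the sketch below builds a token's input vector both ways with toy embedding tables. All names (`make_table`, `bert_input`, `layoutlm_input`) and sizes are hypothetical. Sharing one table per axis (x0 and x1 use the same table, as do y0 and y1) and keeping the 1-D sequence position alongside the 2-D terms are assumptions of this sketch, not statements from the slide.

```python
import random

HIDDEN = 4        # toy hidden size
MAX_COORD = 1000  # assumed upper bound on coordinates


def make_table(num_rows, seed):
    """Toy embedding table with small integer entries so sums are exact."""
    rng = random.Random(seed)
    return [[rng.randint(-5, 5) for _ in range(HIDDEN)] for _ in range(num_rows)]


vocab = {"[CLS]": 0, "Date": 1, "Routed:": 2}
text_table = make_table(len(vocab), seed=0)
seq_table = make_table(16, seed=1)           # 1-D position, as in BERT
x_table = make_table(MAX_COORD + 1, seed=2)  # shared by x0 and x1 (assumption)
y_table = make_table(MAX_COORD + 1, seed=3)  # shared by y0 and y1 (assumption)


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]


def bert_input(token, index):
    """Text embedding plus 1-D position embedding."""
    return add(text_table[vocab[token]], seq_table[index])


def layoutlm_input(token, index, box):
    """BERT-style input plus the four 2-D position embeddings of the bounding box."""
    x0, y0, x1, y1 = box
    return add(bert_input(token, index), x_table[x0], y_table[y0], x_table[x1], y_table[y1])


date_vector = layoutlm_input("Date", 1, (86, 138, 112, 148))
```

Because the layout terms are simply added, the rest of the Transformer can stay unchanged.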
Image Embedding
Pre-training for LayoutLM
• Masked Visual-Language Model

| Input | Text embedding | Position embedding (x0) | Position embedding (y0) | Position embedding (x1) | Position embedding (y1) |
| --- | --- | --- | --- | --- | --- |
| [CLS] | E([CLS]) | E(0) | E(0) | E(maxW) | E(maxH) |
| Date | E(Date) | E(86) | E(138) | E(112) | E(148) |
| MASK | E(Routed:) | E(117) | E(138) | E(162) | E(148) |
| January | E(January) | E(227) | E(138) | E(277) | E(153) |
| 11, | E(11,) | E(281) | E(138) | E(293) | E(148) |
| 1994 | E(1994) | E(303) | E(139) | E(331) | E(149) |
| Contract | E(Contract) | E(415) | E(138) | E(464) | E(149) |
| MASK | E(No.) | E(468) | E(139) | E(487) | E(149) |
| 4011 | E(4011) | E(556) | E(139) | E(583) | E(150) |
| 0000 | E(0000) | E(589) | E(139) | E(621) | E(150) |

For each token, the text embedding (Text) and the four position embeddings (Layout) are summed.
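The masking step on this slide can be sketched as follows: the text of selected tokens is hidden while every bounding box is kept, so the model has to predict the word from its context and its position. `mask_tokens` is a hypothetical helper. In practice the masked positions would be sampled at random; they are fixed here only to reproduce the slide's example.

```python
MASK = "[MASK]"


def mask_tokens(tokens, boxes, masked_indices):
    """Hide the text at the given indices; leave every bounding box untouched."""
    hidden = set(masked_indices)
    masked_tokens = [MASK if i in hidden else tok for i, tok in enumerate(tokens)]
    # Training targets: the original token where masked, None elsewhere.
    targets = [tok if i in hidden else None for i, tok in enumerate(tokens)]
    return masked_tokens, list(boxes), targets


tokens = ["Date", "Routed:", "January", "11,", "1994", "Contract", "No.", "4011"]
boxes = [
    (86, 138, 112, 148), (117, 138, 162, 148), (227, 138, 277, 153), (281, 138, 293, 148),
    (303, 139, 331, 149), (415, 138, 464, 149), (468, 139, 487, 149), (556, 139, 583, 150),
]
masked_tokens, kept_boxes, targets = mask_tokens(tokens, boxes, masked_indices=[1, 6])
```

Here `masked_tokens` reads "Date [MASK] January 11, 1994 Contract [MASK] 4011", while `kept_boxes` is identical to `boxes`.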
Pre-training for LayoutLM
• Document Image Classification

| Input | Text embedding | Position embedding (x0) | Position embedding (y0) | Position embedding (x1) | Position embedding (y1) |
| --- | --- | --- | --- | --- | --- |
| [CLS] | E([CLS]) | E(0) | E(0) | E(maxW) | E(maxH) |
| Date | E(Date) | E(86) | E(138) | E(112) | E(148) |
| Routed: | E(Routed:) | E(117) | E(138) | E(162) | E(148) |
| January | E(January) | E(227) | E(138) | E(277) | E(153) |
| 11, | E(11,) | E(281) | E(138) | E(293) | E(148) |
| 1994 | E(1994) | E(303) | E(139) | E(331) | E(149) |
| Contract | E(Contract) | E(415) | E(138) | E(464) | E(149) |
| No. | E(No.) | E(468) | E(139) | E(487) | E(149) |
| 4011 | E(4011) | E(556) | E(139) | E(583) | E(150) |
| 0000 | E(0000) | E(589) | E(139) | E(621) | E(150) |

For each token, the text embedding (Text) and the four position embeddings (Layout) are summed.
Pre-training Data
11 million scanned document images from IIT-CDIP Test Collection 1.0 https://ir.nist.gov/cdip/
4. Experiments
Downstream Tasks
Form Understanding
Receipt Understanding
Document Image Classification
Form Understanding with LayoutLM
[Task] Sequence labeling (B-I-O class labels) for key-value from forms
[Data] 149 training, 50 testing
[Metric] Precision, Recall, F1
[Baseline] Pre-trained BERT and RoBERTa
FUNSD: Form Understanding in Noisy Scanned Documents
https://guillaumejaume.github.io/FUNSD/
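For readers unfamiliar with B-I-O labels, the sketch below turns labeled spans into one tag per token. `bio_encode` is a hypothetical helper, and the `KEY`/`VALUE` label names are placeholders chosen for this illustration; they are not the dataset's actual label set.

```python
def bio_encode(tokens, spans):
    """Convert (start, end_exclusive, label) spans into one B-I-O tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # remaining tokens of the span
    return tags


tokens = ["TO", "Lorillard", "Corporation", "CITY", "New", "York"]
spans = [(0, 1, "KEY"), (1, 3, "VALUE"), (3, 4, "KEY"), (4, 6, "VALUE")]
tags = bio_encode(tokens, spans)
```

The model is then trained to predict `tags` from the tokens and their layout.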
Form Understanding with LayoutLM
Receipt Understanding with LayoutLM
"company": "STARBUCKS STORE #10208",
"date": "12/07/2014",
"address": "11302 EUCLID AVENUE, CLEVELAND, OH (216) 229-0749",
"total": "4.95",
ICDAR 2019 Robust Reading Challenge on Key Information Extraction from Scanned Receipts
https://rrc.cvc.uab.es/?ch=13&com=tasks
[Task] Sequence labeling (B-I-O class labels) for values from receipts
[Data] 626 training, 347 testing
[Metric] Precision, Recall, F1
[Baseline] Pre-trained BERT, RoBERTa
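A rough sketch of how predicted tags could be turned back into fields and scored with the listed metrics. `bio_decode` and `precision_recall_f1` are hypothetical helpers, the tag names are placeholders, and scoring by exact match of (label, text) pairs is an assumption; the challenge's official scoring may differ.

```python
def bio_decode(tokens, tags):
    """Collect (label, text) pairs from B-I-O tags."""
    entities, label, words = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and label == tag[2:]:
            words.append(token)  # continue the open span
            continue
        if label is not None:
            entities.append((label, " ".join(words)))  # close the open span
        if tag.startswith("B-"):
            label, words = tag[2:], [token]
        else:
            label, words = None, []  # "O" or a stray "I-" tag
    if label is not None:
        entities.append((label, " ".join(words)))
    return entities


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 over sets of (label, text) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


tokens = ["STARBUCKS", "STORE", "#10208", "12/07/2014", "TOTAL", "4.95"]
tags = ["B-COMPANY", "I-COMPANY", "I-COMPANY", "B-DATE", "O", "B-TOTAL"]
predicted = bio_decode(tokens, tags)
gold = [
    ("COMPANY", "STARBUCKS STORE #10208"),
    ("DATE", "12/07/2014"),
    ("TOTAL", "4.95"),
    ("ADDRESS", "11302 EUCLID AVENUE, CLEVELAND, OH (216) 229-0749"),
]
precision, recall, f1 = precision_recall_f1(predicted, gold)
```

In this toy case the address is never predicted, so recall is lower than precision.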
Receipt Understanding with LayoutLM
Document Image Classification with LayoutLM
[Task] Image Classification (16 classes)
[Data] RVL-CDIP dataset (320K training, 40K validation, 40K testing)
[Metric] Accuracy
[Baseline] InceptionResNetV2, LadderNet, Multimodal
https://www.cs.cmu.edu/~aharley/rvl-cdip/
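One plausible way to set up this task is a linear classifier on top of the [CLS] representation, scored by accuracy. The sketch below is a toy illustration under that assumption: `predict_class` and `accuracy` are hypothetical helpers, the weights are hand-picked rather than learned, and 3 classes stand in for the 16 real ones.

```python
def predict_class(cls_vector, weights, bias):
    """Linear layer plus argmax over class scores (hypothetical classification head)."""
    scores = [
        sum(w * x for w, x in zip(row, cls_vector)) + b
        for row, b in zip(weights, bias)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy setup: 3 classes instead of 16, 2-dimensional [CLS] vectors.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bias = [0.0, 0.0, 0.0]
cls_vectors = [[2.0, 0.5], [0.1, 3.0], [-1.0, -2.0], [1.0, 2.0]]
labels = [0, 1, 2, 0]
predictions = [predict_class(v, weights, bias) for v in cls_vectors]
score = accuracy(predictions, labels)
```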
Document Image Classification with LayoutLM
Different Data and Epochs
Different Initialization Methods
Visualization: Table Detection Task on DocBank
BERT LayoutLM BERT LayoutLM
Error Correct Ground Truth
Li, Minghao et al. “DocBank: A Benchmark Dataset for Document Layout Analysis.” ArXiv abs/2006.01038 (2020).
5. Conclusion
• LayoutLM
  • 1st document-level pre-trained model using text and layout
  • Supports different downstream tasks
    • Form/Invoice understanding
    • Receipt understanding
    • Document image classification
• Paper: https://arxiv.org/abs/1912.13318
• Code: https://aka.ms/layoutlm
LayoutLM
How to conduct research as an undergraduate?
My suggestions
1. Being self-motivated and hard-working
2. Doing well in math and programming courses
3. Finding a group/professor/graduate student
4. Getting involved in a research project
Working with a Professor/Graduate Student
• Clear goal
  • A topic or an idea
  • Conference deadline
• Weekly one-to-one meeting
  • Progress report: reading, code, results
More Advice
• How to Do Research With a Professor?
  • Jason Eisner, CS professor at Johns Hopkins University, ACL Fellow
  • http://www.cs.jhu.edu/~jason/advice/how-to-work-with-a-professor.html
• How undergraduates can do successful research (in Chinese)
  • Minlie Huang, CS professor at Tsinghua University
  • http://coai.cs.tsinghua.edu.cn/hml/media/files/undergraduate-res.pdf
Life at MSRA
Conditions of Good Research
• Novel Topic/Idea
  • Mentorship
  • Diverse research areas
• Computing Resource
  • Azure Machine Learning
• Programming Skill
  • Research & Develop
Acknowledgement
Acknowledgement: MSRA NLC Group
Ming Zhou, Lei Cui, Furu Wei
UniLM Family: https://github.com/microsoft/unilm
• UniLM (v1@NeurIPS'19 | v2@ICML'20): unified pre-training for language understanding and generation
• MiniLM (arXiv'20): small pre-trained models for language understanding and generation
• LayoutLM (v1@KDD'20): multimodal (text + layout/format + image) pre-training for document understanding
• s2s-ft: sequence-to-sequence fine-tuning toolkit
© Copyright Microsoft Corporation. All rights reserved.
Thank you for listening