Khmer ocr itc

Khmer OCR, LONG Seangmeng, Lecturer and researcher, GIC - ITC, [email protected], 17/02/2012

Transcript of Khmer ocr itc

Page 1: Khmer ocr itc

Khmer OCR

LONG Seangmeng
Lecturer and researcher, GIC - ITC

[email protected]
17/02/2012

Page 2: Khmer ocr itc

Khmer OCR

• OCR System
• Khmer OCR Project
• State of the Art
• Work Done
• Current Work

Page 3: Khmer ocr itc

OCR System

OCR

Page 4: Khmer ocr itc

Khmer OCR Project

• 2011-2012
• Team
  – 1 researcher
  – 1 intern student (5th year)
• Develop a Khmer OCR system
  – Font independent
  – Size independent

Page 5: Khmer ocr itc

State of the Art

Author | Limitation | Result
C. Chey, P. Kumhom and K. Chamnongthai | 10 characters (បពជកភណឃសវទ) | 92%
C. Chey, P. Kumhom and K. Chamnongthai | 20 fonts | 92.85% (size 22), 91.66% (size 18), 89.27% (size 12)
L. Ing and A. Muaz | Limon R1 22 | 98.88%
V. Kruy | Font and size independent | 97%
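The slides do not say how the accuracy figures above were measured. One common convention, assumed here purely for illustration, is character accuracy: one minus the edit distance between the OCR output and the ground-truth text, divided by the ground-truth length. The function names below are hypothetical and the sketch is not taken from any of the cited works.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    # previous[j] holds the distance between reference[:i-1] and hypothesis[:j].
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - (edit distance / reference length), clamped at 0.

    Assumption: accuracy is measured per code point against a ground-truth
    transcription; the cited works may have used a different measure.
    """
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))
```

For example, character_accuracy("abcd", "abxd") gives 0.75, since one of four characters is wrong. For Khmer it may be more meaningful to count errors per character cluster than per code point.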

Tesseract
• One of the top 3 engines in the 1995 UNLV accuracy test
• Considered one of the most accurate open source OCR engines
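As a rough illustration of how Tesseract is typically driven, the sketch below builds the standard command line (tesseract IMAGE OUTPUTBASE -l LANG) and reads the resulting text file. The language code "khm" and the helper names are assumptions for illustration; the name of the trained data produced in this project is not given in the slides.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base, lang="khm"):
    """Build the argument list for the Tesseract command-line tool.

    Assumption: a trained data file for `lang` is installed in tessdata.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="khm"):
    """Run Tesseract and return the recognized text.

    Tesseract writes its plain-text result to OUTPUTBASE.txt.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(tesseract_command(image_path, output_base, lang), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling run_ocr("page.png", "page") would recognize page.png and return the contents of page.txt, provided Tesseract and the Khmer trained data are installed.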

Page 6: Khmer ocr itc

Work Done

• Training Tesseract for Khmer font
  – Khmer OS font
  – 2210 character clusters
  – 11 MB
• Problems
  – Some characters not detected
  – Some characters misdetected
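To make the notion of a character cluster concrete, here is a minimal sketch that splits Khmer text into clusters: a base consonant or independent vowel, followed by any subscript consonants (COENG, U+17D2, plus a consonant), dependent vowels and signs. The rule set is a simplification assumed for illustration; the slides do not describe how the 2210 clusters were defined, and the name split_clusters is hypothetical.

```python
import re

# Simplified Khmer cluster rule (an assumption, not the project's definition):
#   base   = consonant or independent vowel (U+1780..U+17B3)
#   coeng  = U+17D2, which turns the following consonant into a subscript
#   mark   = dependent vowels and signs (U+17B4..U+17D1, U+17D3, U+17DD)
_BASE = "\u1780-\u17B3"
_MARK = "\u17B4-\u17D1\u17D3\u17DD"
_COENG = "\u17D2"

CLUSTER_RE = re.compile(
    f"[{_BASE}](?:{_COENG}[{_BASE}]|[{_MARK}])*|.",
    re.DOTALL,
)


def split_clusters(text: str) -> list:
    """Split text into Khmer character clusters.

    Any character that does not start a Khmer cluster (digits, Latin
    letters, punctuation, stray marks) is returned as its own item.
    """
    return CLUSTER_RE.findall(text)
```

With this rule, split_clusters("ខ្មែរ") returns ["ខ្មែ", "រ"]: the first cluster is the consonant ខ with the subscript ម and the vowel ែ, and the second is the lone consonant រ.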

Page 7: Khmer ocr itc

Current Work

• Improve the work done by Vanna Kruy
  – Improve performance
  – Create an easy-to-use GUI
  – Make it easy to add new fonts

Page 8: Khmer ocr itc

Thanks for your attention!

Questions???