OCRFeeder LinuxTag 2011
![Page 1: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/1.jpg)
static void
f_do_barnacle_install_properties (GObjectClass *gobject_class)
{
  GParamSpec *pspec;

  /* Party code attribute */
  pspec = g_param_spec_uint64 (F_DO_BARNACLE_CODE,
                               "Barnacle code.",
                               "Barnacle code",
                               0, G_MAXUINT64,
                               G_MAXUINT64 /* default value */,
                               G_PARAM_READABLE | G_PARAM_WRITABLE |
                               G_PARAM_PRIVATE);
  g_object_class_install_property (gobject_class,
                                   F_DO_BARNACLE_PROP_CODE,
Joaquim Rocha
OCRFeeder
Converting printed documents into digital formats
Berlin, May 2011
![Page 2: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/2.jpg)
Joaquim Rocha (Igalia) · OCRFeeder · LinuxTag 2011
What is it?
Document Analysis and Optical Character Recognition
for GNOME
![Page 3: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/3.jpg)
Why?
Paper has a number of problems
No applications for GNU/Linux do a fair job
![Page 4: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/4.jpg)
Paper problems: Security
CC Photo by: http://www.flickr.com/photos/badwsky/
![Page 5: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/5.jpg)
Paper problems: Preservation
CC Photo by: http://www.flickr.com/photos/98469445@N00/
![Page 6: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/6.jpg)
Paper problems: Data processing
CC Photo by: http://www.flickr.com/photos/hugovk/
![Page 7: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/7.jpg)
Paper problems: Ecology
CC Photo by: http://www.flickr.com/photos/pranavsingh/
![Page 8: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/8.jpg)
No fair conversion apps for GNU/Linux
apart from OCR engines, but...
![Page 9: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/9.jpg)
OCR != Document Conversion
(it only deals with chars)
(does not consider the layout)
(does not distinguish contents)
![Page 10: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/10.jpg)
What's needed is
Document Analysis and Recognition
(conversion of documents to an electronic format)
(first projects in the 80s)
![Page 11: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/11.jpg)
Where were we at?
* Some closed solutions
* Only for proprietary systems
* Various prices
* Still... arguable results
![Page 12: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/12.jpg)
How?
![Page 13: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/13.jpg)
So many layouts...
CC Photo by: http://www.flickr.com/photos/uber-tuber/
![Page 14: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/14.jpg)
Layouts vary with the type of document
What works for detecting one won't work for others
![Page 15: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/15.jpg)
OCRFeeder focuses on contents, not on layouts!
![Page 16: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/16.jpg)
Key concept:
If a document image can be divided into windows of 1 (content)
or 0 (not content), then it is possible to group all the
1s and outline the contents
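The key concept can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OCRFeeder's actual code: the image is assumed to be a list of rows of grayscale values, a window is assumed to be content when it holds any pixel darker than a threshold, and the names `classify_windows` and `outline_contents` are hypothetical.

```python
from collections import deque


def classify_windows(image, window_size, threshold=128):
    """Return a grid of 1 (content) / 0 (not content), one cell per window.

    `image` is a list of rows of grayscale values (0 = black, 255 = white).
    Assumption: a window is content if any pixel is darker than `threshold`.
    """
    height, width = len(image), len(image[0])
    grid = []
    for top in range(0, height, window_size):
        row = []
        for left in range(0, width, window_size):
            has_ink = any(
                image[y][x] < threshold
                for y in range(top, min(top + window_size, height))
                for x in range(left, min(left + window_size, width))
            )
            row.append(1 if has_ink else 0)
        grid.append(row)
    return grid


def outline_contents(grid):
    """Group 4-connected 1-windows and return one bounding box per group.

    Each box is (left, top, right, bottom) in window coordinates, inclusive.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over neighbouring content windows.
            queue = deque([(r, c)])
            seen[r][c] = True
            left, top, right, bottom = c, r, c, r
            while queue:
                y, x = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes
```

Each returned box outlines one content area without assuming anything about the page layout; multiplying its coordinates by the window size gives the pixel region to hand to an OCR engine.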
![Page 17: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/17.jpg)
![Page 18: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/18.jpg)
Recognition:
System-wide OCR engines are used
Engines are configured from the GUI or XML files
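As a rough idea of what an XML-configured engine could look like, the sketch below parses a small engine description and builds the command line for one image. The element names (`name`, `engine_path`, `arguments`), the `$IMAGE` placeholder and the helper names are all assumptions made for illustration and may not match OCRFeeder's real file format; the `stdout` argument is one way recent Tesseract versions print the recognized text.

```python
import shlex
import subprocess
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical engine description; the schema is an assumption.
ENGINE_XML = """\
<engine>
    <name>Tesseract</name>
    <engine_path>/usr/bin/tesseract</engine_path>
    <arguments>$IMAGE stdout</arguments>
</engine>
"""


def load_engine(xml_text):
    """Read the engine name, executable path and argument template."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name"),
        "path": root.findtext("engine_path"),
        "arguments": root.findtext("arguments"),
    }


def build_command(engine, image_path):
    """Fill in the $IMAGE placeholder and return an argv list."""
    arguments = Template(engine["arguments"]).substitute(
        IMAGE=shlex.quote(image_path)
    )
    return [engine["path"]] + shlex.split(arguments)


def recognize(engine, image_path):
    """Run the system-wide engine and return whatever text it prints."""
    result = subprocess.run(
        build_command(engine, image_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Keeping the command template in data rather than in code is what would let any command-line OCR engine be plugged in without changing the application.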
![Page 19: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/19.jpg)
![Page 20: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/20.jpg)
Most known free OCR engines are detected and configured
automatically:
* Tesseract
* GOCR
* OCRAD
* Cuneiform
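One plausible way to detect the engines listed above is to look for their executables on the `PATH`. The sketch below assumes the usual binary names (`tesseract`, `gocr`, `ocrad`, `cuneiform`); the `detect_engines` helper is hypothetical and is not taken from OCRFeeder.

```python
import shutil

# Display name -> executable name (assumed to be the usual binary names).
KNOWN_ENGINES = {
    "Tesseract": "tesseract",
    "GOCR": "gocr",
    "OCRAD": "ocrad",
    "Cuneiform": "cuneiform",
}


def detect_engines(which=shutil.which):
    """Return {engine name: executable path} for every engine found.

    `which` is injectable so the lookup can be replaced in tests.
    """
    found = {}
    for name, executable in KNOWN_ENGINES.items():
        path = which(executable)
        if path is not None:
            found[name] = path
    return found
```

Calling `detect_engines()` with no arguments checks the real system; each path found could then be written into an engine configuration automatically.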
![Page 21: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/21.jpg)
Export formats:
ODT
HTML
Plain text
![Page 22: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/22.jpg)
User interaction:
Users can edit everything
and review the algorithm's results
So the UI can work in attended and unattended modes
The CLI only works in unattended mode
![Page 23: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/23.jpg)
![Page 24: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/24.jpg)
Demo time!
![Page 25: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/25.jpg)
Other features:
* PDF import
* Font style editing
* OCR results cleaning
* Unpaper preprocessor
* Image deskewing
* Project saving/loading
![Page 26: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/26.jpg)
A11y:
* OCRFeeder is a very useful tool for visually impaired users
* Last year, the main target of its development was to improve a11y
![Page 27: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/27.jpg)
Future:
* Integrate OCRopus as an alternative analysis backend
* More export formats: hOCR, PDF, etc.
* Make OCR engines' management easier
![Page 28: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/28.jpg)
Webpage: http://live.gnome.org/OCRFeeder
git: http://git.gnome.org/ocrfeeder
Bugzilla: http://bugzilla.gnome.org (product: OCRFeeder)
![Page 29: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/29.jpg)
Manual in German:
http://wiki.ubuntuusers.de/OCRFeeder
![Page 30: OCRFeeder LinuxTag 2011](https://reader034.fdocuments.us/reader034/viewer/2022051818/54bb77f64a795937348b45a5/html5/thumbnails/30.jpg)
Thank you!