The IMPACT Interoperability Framework - Workflows for OCR and beyond
Clemens Neudecker, KB National Library of the Netherlands
2nd IMPACT Conference, British Library, London, 24/25 October 2011
Background
More than 20 individual software components for specific challenges
Prototyping new algorithms, improving commercial solutions
Different programming languages (C, C++, Java, etc.) and platforms (Windows/Linux)
Extensible with third-party applications
IMPACT Interoperability Framework (IIF)
Architecture
Java
Web Services
Apache
Taverna
Open source, available at https://github.com/impactcentre
Free hackathon 14/15 November, University of Manchester: http://impact-mygrid-taverna-hackathon.wikispaces.com/
Integration
Only requirement: a command-line executable
A generic command-line wrapper produces a web service
The web service is exposed as a workflow module with documentation
Quick & easy integration: developers can focus on their application and worry less about integration, resulting in higher-quality software
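The integration idea can be sketched in Python. A generic wrapper turns any command-line executable into a callable service operation by filling a command template with request parameters, then returns the tool's exit code and output. This is a minimal sketch, not the IIF wrapper itself, which is Java-based. The class `CommandLineService`, its methods and the `{placeholder}` template syntax are all hypothetical.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ServiceResult:
    """What a wrapped tool hands back to the caller (hypothetical structure)."""
    exit_code: int
    stdout: str
    stderr: str


class CommandLineService:
    """Hypothetical generic wrapper: one instance per command-line tool."""

    def __init__(self, name: str, template: list[str], description: str = ""):
        # template is an argv list whose items may contain {placeholders},
        # e.g. ["binarize", "-i", "{input}", "-o", "{output}"]
        self.name = name
        self.template = template
        self.description = description  # documentation shown with the module

    def build_argv(self, params: dict[str, str]) -> list[str]:
        # Fill each placeholder; a missing parameter raises KeyError early.
        return [part.format(**params) for part in self.template]

    def invoke(self, params: dict[str, str], timeout: float = 60.0) -> ServiceResult:
        argv = self.build_argv(params)
        # No shell is involved, so parameter values cannot inject extra commands.
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        return ServiceResult(proc.returncode, proc.stdout, proc.stderr)
```

Wrapping is then a matter of describing the command line once. For example, `CommandLineService("binarize", ["binarize", "-i", "{input}", "-o", "{output}"])` would wrap a hypothetical `binarize` tool. A web service layer would only need to map incoming request fields to `params`.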
Workflows
OCR workflow = data pipeline
Building blocks = processing modules (nodes)
Integration = interaction between nodes (mashups)
Collaboration with
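The "workflow = data pipeline" idea can be made concrete with a minimal Python sketch. Each node is a function, and the workflow feeds each node's output into the next. The toy `binarize` and `fake_ocr` nodes are hypothetical placeholders for real IMPACT modules. A Taverna workflow is a graph that can branch, whereas this sketch covers only a linear chain.

```python
from typing import Callable

# A node takes the previous node's output and returns its own output.
Node = Callable[[object], object]


def run_workflow(nodes: list[tuple[str, Node]], data: object) -> tuple[object, list[str]]:
    """Run nodes in order, feeding each one's output into the next.

    Returns the final output plus a simple trace of executed node names,
    a stand-in for the provenance a real workflow engine records.
    """
    trace = []
    for name, node in nodes:
        data = node(data)
        trace.append(name)
    return data, trace


# Hypothetical stand-ins for real processing modules (binarization, OCR, ...).
def binarize(pixels):
    return [1 if p > 127 else 0 for p in pixels]


def fake_ocr(bits):
    return "".join("#" if b else "." for b in bits)


pipeline = [("binarize", binarize), ("ocr", fake_ocr)]
```

For instance, `run_workflow(pipeline, [200, 10, 130])` returns `("#.#", ["binarize", "ocr"])`. Swapping or inserting a node changes the workflow without touching the other modules.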
Evaluation features
Text comparison of result with ground truth, using the Levenshtein distance method
Word evaluation (with reading order)
Layout-based comparison of result with ground truth, using the Page Analysis and Ground Truth Elements (PAGE) Framework
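The text comparison rests on the Levenshtein (edit) distance. The sketch below computes it over any sequence, so one function yields a character-level error rate and, after splitting on whitespace, a word-level error rate in reading order. Normalising by the ground-truth length is a common convention assumed here. The function names are illustrative, not the API of the IMPACT evaluation tools. The layout-based PAGE comparison is not covered.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # Keep only one row of the dynamic-programming table at a time.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def error_rate(result: Sequence, ground_truth: Sequence) -> float:
    """Edit distance normalised by the ground-truth length."""
    if not ground_truth:
        # Convention chosen for this sketch: empty vs empty is perfect.
        return 0.0 if not result else 1.0
    return levenshtein(result, ground_truth) / len(ground_truth)


def character_error_rate(ocr: str, gt: str) -> float:
    return error_rate(ocr, gt)


def word_error_rate(ocr: str, gt: str) -> float:
    # Words are compared as sequences, so reading order affects the score.
    return error_rate(ocr.split(), gt.split())
```

Reading order matters at word level. For example, "sat cat the" against the ground truth "the cat sat" needs two substitutions, even though the same words are present.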
Community
Web 2.0-style workflow registry
Ready-to-use and documented resources
Community of experts
Sharing of experiments and know-how
Local client: Taverna Workbench
Background: biosciences
Developed and maintained by myGrid, UK
Open source
GUI for design and execution of web services & workflows
Remote client: Portal
SOAP/REST API
Remote execution of web services & workflows
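The REST case of such a remote call is sketched below using only the Python standard library. The endpoint layout (`/workflows/{id}/runs`), the JSON payload shape and `portal.example.org` are hypothetical placeholders, not the actual IMPACT portal interface. The request is only built here; passing it to `urllib.request.urlopen` would send it.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; the real portal API is not described in these slides.
PORTAL_URL = "https://portal.example.org/api"


def build_run_request(workflow_id: str, inputs: dict[str, str]) -> urllib.request.Request:
    """Prepare (but do not send) a POST asking the portal to run a workflow."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    path = urllib.parse.quote(workflow_id, safe="")  # escape unsafe characters
    return urllib.request.Request(
        f"{PORTAL_URL}/workflows/{path}/runs",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
```

A client would then call `urllib.request.urlopen(build_run_request("ocr-basic", {"image": "https://example.org/page1.tif"}))` and read the JSON response. Both names in that call are made up for illustration.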
Results Repository
Custom service for IMPACT: automatic storage of workflow outputs and provenance via WebDAV
Fully interoperable, since HTTP-based
Configurable storage of result sets
Creation of reports using Apache POI
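Because WebDAV extends HTTP, storing a result needs nothing beyond ordinary HTTP requests: `MKCOL` creates a collection and `PUT` uploads a file. The Python sketch below only prepares such requests. The server URL, the one-collection-per-run layout and the file names are hypothetical assumptions, not the IMPACT repository's actual structure.

```python
import urllib.parse
import urllib.request

# Hypothetical repository location; any WebDAV server would do.
REPO_URL = "https://results.example.org/dav"


def mkcol_request(collection: str) -> urllib.request.Request:
    """WebDAV MKCOL: create a collection (folder), e.g. one per workflow run."""
    url = f"{REPO_URL}/{urllib.parse.quote(collection)}/"
    return urllib.request.Request(url, method="MKCOL")


def put_request(collection: str, filename: str, content: bytes,
                content_type: str = "application/octet-stream") -> urllib.request.Request:
    """WebDAV stores a file with a plain HTTP PUT to its target URL."""
    url = (f"{REPO_URL}/{urllib.parse.quote(collection)}/"
           f"{urllib.parse.quote(filename)}")
    return urllib.request.Request(url, data=content, method="PUT",
                                  headers={"Content-Type": content_type})
```

A run could first send `mkcol_request("run 42")`, then one `put_request` for the OCR output and another for a provenance record. Since every step is plain HTTP, any HTTP-capable client or server can take part.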
Scalability
A central ESB (Enterprise Service Bus) proxy manages multiple service copies
Process parallelization, load distribution, failover, security
Served more than 2 million requests
Throughput improvement of 94% with every additional instance
Tested on Dutch Supercomputing Cloud (“Enlighten Your Research”)
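The proxy's load distribution and failover can be illustrated with a small round-robin dispatcher in Python. This is a minimal in-process sketch: each "service copy" is modelled as a plain function, and the class name `RoundRobinProxy` is hypothetical. A real ESB proxy works over the network and also handles security, which is not shown.

```python
import itertools
from typing import Callable, Sequence


class RoundRobinProxy:
    """Toy stand-in for a proxy spreading requests over several service copies.

    Requests go to the copies in turn (load distribution); if a copy raises,
    the next one is tried (failover).
    """

    def __init__(self, backends: Sequence[Callable[[str], str]]):
        if not backends:
            raise ValueError("need at least one backend")
        self.backends = list(backends)
        self._turn = itertools.cycle(range(len(self.backends)))

    def call(self, request: str) -> str:
        start = next(self._turn)
        errors = []
        # Try each copy at most once, starting with the one whose turn it is.
        for offset in range(len(self.backends)):
            backend = self.backends[(start + offset) % len(self.backends)]
            try:
                return backend(request)
            except Exception as exc:
                errors.append(exc)
        raise RuntimeError(
            f"all {len(self.backends)} service copies failed") from errors[-1]
```

Adding a service copy only means appending another backend. Callers keep talking to the single proxy, which is what allows instances to be added for throughput without changing the workflows.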
Outlook
Online service for testing/evaluation
Specification & guidelines
Extending the scope:
Workflows for linguistic analysis: CLARIN
Workflows for preservation: SCAPE
Even better scalability: MapReduce
Supported by a community of developers & practitioners
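The Map/Reduce idea named in the outlook works in three steps: split a collection into independent pieces, process each piece separately (map), then merge the partial results (reduce). The Python sketch below shows the pattern in miniature, computing word frequencies over a few pages of OCR text with a thread pool. It illustrates the pattern only; it is not a Hadoop job and not part of the IIF, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_page(page_text: str) -> Counter:
    """Map step: process one page independently (here: count its words)."""
    return Counter(page_text.lower().split())


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge two partial results; merge order does not matter."""
    return a + b


def word_frequencies(pages: list[str], workers: int = 4) -> Counter:
    # Pages are independent, so the map step can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_page, pages))
    return reduce(reduce_counts, partials, Counter())
```

The merge step is associative, so partial results can be combined in any grouping. That property is what lets a real MapReduce framework spread the same work across many machines.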
“Anyway, the thing about progress is that it always seems greater than it really is.”
Ludwig Wittgenstein, Philosophical Investigations (quoting Johann Nestroy)
xkcd.com/688