BSB Demo Day - Schlarb - Workflow-Design
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
Decision-Making in Digitisation through Experimental Workflow Development
Sven Schlarb, Austrian National Library
IMPACT Demo Day
Munich, 11 October 2011

OCR: Challenges …
I. Image preprocessing and OCR

OCR: Challenges …
II. Linguistic post-processing (mixed languages, historical variants, etc.)
Example: historical variants of the Dutch word 'wereld' (world):
werelt weerelt wereld weerelds wereldt werelden weereld werrelts waerelds weerlyt wereldts vveerelts waereld weerelden waerelden weerlt werlt werelds sweerels zwerlys swarels swerelts werelts swerrels weirelts tsweerelds werret vverelt werlts werrelt worreld werlden wareld weirelt weireld waerelt werreld werld vvereld weerelts werlde tswerels werreldts weereldt wereldje waereldje weurlt wald weëled
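
One way to picture this post-processing step is a lexicon that maps attested historical spellings to a modern headword. The sketch below is only a minimal illustration: the dictionary holds a handful of the variants listed above, and the names `VARIANT_LEXICON`, `normalise` and `normalise_text` are assumptions made for this example, not part of any IMPACT tool.

```python
# Hypothetical mini-lexicon: historical spelling -> modern Dutch headword.
# Only a few of the variants from the slide are included for illustration.
VARIANT_LEXICON = {
    "werelt": "wereld",
    "weerelt": "wereld",
    "waereld": "wereld",
    "vvereld": "wereld",
    "werlt": "wereld",
}


def normalise(token: str) -> str:
    """Return the modern headword for a known variant, else the token unchanged."""
    return VARIANT_LEXICON.get(token.lower(), token)


def normalise_text(text: str) -> str:
    """Normalise a whitespace-separated OCR text token by token."""
    return " ".join(normalise(token) for token in text.split())


print(normalise_text("de werelt en de waereld"))
```

A real lexicon would also have to handle inflected forms and ambiguous spellings, which a flat dictionary lookup does not cover.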

… and a variety of solutions
• 22 different 'tools' from different developers and from different work packages
• Different technical environments:
– OCR (C++, C#),
– Image processing & lexica (C, C++, DLL),
– Command-line programs (Windows/Linux),
– Java, Ruby, PHP, Perl, etc.
→ IMPACT Interoperability Framework (IIF)

Technical challenges
• Scalability
– Volume of input data (single pages / thousands of books or newspapers)
– Size of input data (e.g. very high-resolution images)
• Stability
– Parallelisation: cloned nodes → same behaviour?
– Failover: alternative nodes in case of errors
– Correct functioning of the individual components
• Transparency
– Understandable error messages during batch processing at the different architecture levels (tool, service and workflow level)
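
The failover and transparency points can be sketched together: try each node in turn, and if all of them fail, report which node failed and why. This is a minimal sketch under stated assumptions; the nodes are plain Python callables standing in for remote services, and `call_with_failover` and `AllNodesFailed` are hypothetical names, not part of the IMPACT framework.

```python
from typing import Callable, Sequence


class AllNodesFailed(RuntimeError):
    """Raised when every node in the list has failed for a request."""


def call_with_failover(nodes: Sequence[Callable[[str], str]], payload: str) -> str:
    """Try each node in order and return the first successful result."""
    errors = []
    for index, node in enumerate(nodes):
        try:
            return node(payload)
        except Exception as exc:  # a real system would catch narrower errors
            # Record which node failed and why, so the batch log stays readable.
            errors.append(f"node {index}: {exc}")
    raise AllNodesFailed("; ".join(errors))


def broken_node(payload: str) -> str:
    # Stand-in for a node that is currently unreachable.
    raise ConnectionError("service unavailable")


def healthy_node(payload: str) -> str:
    # Stand-in for a working node; upper-casing marks the payload as processed.
    return payload.upper()


print(call_with_failover([broken_node, healthy_node], "page-0001"))
```

Collecting the per-node errors instead of discarding them is what keeps a failed batch run diagnosable afterwards.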

Experimental workflow development
• Sample data available online → reproducibility
• Workflows immediately executable → comparability
• Workflow development as a joint, cross-institutional activity → annotation, rating
• "At-a-glance" view of the workflow
• Discoverability of components, workflows and workflow fragments
• Central result data store

Interoperability Framework
• Interoperability vs. integration
• Web-based vs. local application/platform
• Java 6
• Apache Tomcat
• Apache Axis2
• Apache Synapse (optional)
• Taverna Workflow Engine

Tool Wrapper
Requirement: the tool is available as a command-line program
Tool wrapper code is available in the GitHub repository of the Open Planets Foundation (OPF):
https://github.com/openplanets/scape/tree/master/xa-toolwrapper
• Minimal integration effort for tool developers
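
The core idea of wrapping a command-line program can be sketched in a few lines: run the tool as a subprocess and hand back its exit code and output in a uniform structure. This is not the OPF tool wrapper's implementation; `run_tool` and `ToolResult` are hypothetical names, and the Python interpreter is used as a stand-in for a real tool so that the example runs anywhere.

```python
import subprocess
import sys
from typing import List, NamedTuple


class ToolResult(NamedTuple):
    exit_code: int
    stdout: str
    stderr: str


def run_tool(command: List[str], timeout_seconds: float = 60.0) -> ToolResult:
    """Run a command-line tool and capture its exit code and output."""
    completed = subprocess.run(
        command,
        capture_output=True,   # collect stdout and stderr instead of printing them
        text=True,             # decode output to str
        timeout=timeout_seconds,
        check=False,           # report failures via exit_code, do not raise
    )
    return ToolResult(completed.returncode, completed.stdout, completed.stderr)


# Stand-in for a real tool: the Python interpreter echoing its argument.
result = run_tool([sys.executable, "-c", "import sys; print(sys.argv[1])", "page.tif"])
print(result.exit_code, result.stdout.strip())
```

Because the wrapper only needs a command line, the tool itself can stay in whatever language and environment it was written for.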

Service Oriented Architecture
• Java as the programming language
• Standard Apache components
• Synapse as Enterprise Service Bus (load balancing & failover)
• HTTPS encryption & Basic Auth
• Minimal effort for component deployment
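
On the client side, Basic Auth amounts to sending a base64-encoded `username:password` pair in the `Authorization` header, which is why it is combined with HTTPS. The sketch below only prepares such a request and does not send it; the endpoint URL and credentials are hypothetical placeholders, not real IMPACT services.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_service_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated request to a service endpoint."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Hypothetical endpoint and credentials, for illustration only.
request = build_service_request("https://example.org/services/ocr", "demo", "secret")
print(request.get_header("Authorization"))
```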

Linking individual components into a "workflow"

Workflow development
• OCR workflow = data processing pipeline
• Components = processing steps (nodes)
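
The pipeline view can be made concrete by treating each component as a function and the workflow as their composition. This is a minimal sketch; the two steps are toy text filters standing in for real components such as binarisation or OCR, and `build_pipeline` is a name invented for this example.

```python
from functools import reduce
from typing import Callable, Iterable

Step = Callable[[str], str]


def build_pipeline(steps: Iterable[Step]) -> Step:
    """Chain processing steps so the output of one feeds the next."""
    step_list = list(steps)

    def run(data: str) -> str:
        return reduce(lambda value, step: step(value), step_list, data)

    return run


# Placeholder steps standing in for real components.
def strip_noise(text: str) -> str:
    return text.replace("~", "")


def collapse_spaces(text: str) -> str:
    return " ".join(text.split())


pipeline = build_pipeline([strip_noise, collapse_spaces])
print(pipeline("Die  ~Welt~   ist   rund"))
```

Since a pipeline is itself a step, a finished workflow can be reused as a component inside a larger one.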

Workflow components
• "Basic" workflow = minimal component for an IMPACT tool
• Well documented, sample data available, executable

Workflow management
• Component directory: myExperiment
• Local client: Taverna Workbench
• Web client: project website

Workflow directory
• Publish components and workflows
• Rate, tag, comment, ...
• References to the components and workflows of other users that were used

Component catalogue?
[Diagram labels: Tool; GetImageFromURL; URL string; bitonal image; RGB image. Annotation: "Input and output binary image, but incompatible".]
How to find the corresponding tool?
How to proceed in case of a gap?
Many errors occur because the requirements on input and output data are not sufficiently specified (formalised!).
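
What formalised input and output requirements could look like is sketched below: each component declares its types, so a mismatch is detected before execution and a catalogue can be searched for a component that closes the gap. Only `GetImageFromURL` is named on the slide; its assumed output type, the other two components and the type labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Component:
    name: str
    input_type: str
    output_type: str


def can_connect(producer: Component, consumer: Component) -> bool:
    """Two components fit together when the declared types match exactly."""
    return producer.output_type == consumer.input_type


def find_bridge(producer: Component, consumer: Component,
                catalogue: List[Component]) -> Optional[Component]:
    """Look for a single component that closes the gap between two others."""
    for candidate in catalogue:
        if (candidate.input_type == producer.output_type
                and candidate.output_type == consumer.input_type):
            return candidate
    return None


# Assumed type declarations; the RGB output of GetImageFromURL is a guess.
fetch = Component("GetImageFromURL", "url-string", "rgb-image")
ocr = Component("HypotheticalOcr", "bitonal-image", "text")
binarise = Component("HypotheticalBinarisation", "rgb-image", "bitonal-image")

print(can_connect(fetch, ocr))
bridge = find_bridge(fetch, ocr, [binarise])
print(bridge.name if bridge else "no bridge found")
```

Plain string labels are the simplest possible type system; the slide's note about binary images that are nevertheless incompatible suggests that real declarations would need finer detail than a single label.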

Local client: Taverna Workbench
http://www.taverna.org.uk/
• Background: bioinformatics
• Developed by myGrid, Manchester
• Available for Windows/Linux/OS X as open source

Workflow development in Taverna
• Workflows can easily be built from available components and workflows (drag and drop)
• Note: complexity is limited → group steps that belong together into a single component

Web client: Taverna Server/Workflow Parser
• SOAP/REST API
• Remote workflow execution by passing the XML instance

Use case: workflows for evaluation
• Tool A vs. tool B (tool A (v1) vs. tool A (v2))
• Workflow X (tool A + B) vs. workflow Y (tool A + C)
• Determine the optimal workflow with respect to the source material
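
Such a comparison needs a score per workflow on the same sample. The slides do not name a metric, so the sketch below assumes a character error rate based on edit distance against a ground-truth transcription; the two "workflows" are toy string functions, and all names are invented for this example.

```python
from typing import Callable, Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance normalised by the ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(ocr_text, ground_truth) / len(ground_truth)


def best_workflow(workflows: Dict[str, Callable[[str], str]],
                  sample_input: str, ground_truth: str) -> str:
    """Return the name of the workflow with the lowest error rate on one sample."""
    def score(name: str) -> float:
        return character_error_rate(workflows[name](sample_input), ground_truth)
    return min(workflows, key=score)


# Two stand-in workflows; real ones would call OCR and post-processing services.
workflows = {
    "X": lambda text: text.replace("vv", "w"),
    "Y": lambda text: text,
}
print(best_workflow(workflows, "vvereld", "wereld"))
```

A single sample is used here only to keep the example short; a meaningful evaluation would average over a representative set of pages from the source material.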

Central result data store
Interface for storing result data (WebDAV) and for report generation (Apache POI), implemented as a workflow module

Workflows in ongoing projects
• Workflows in digitisation → IMPACT
• Workflows in linguistic analysis → CLARIN
• Workflows in long-term preservation → SCAPE
• And many more ...

Compatibility of workflow frameworks
• Example: UIMA ↔ Taverna
• Named-entity extraction → linguistic analysis → Semantic Web
• Digitisation, OCR → long-term preservation

Thank you! Questions?