CLARIN WP2 Tech Infra WP2 Breakout 2008 open discussion about all aspects my slides just the pacemaker

CLARIN WP2 Tech Infra

WP2 Breakout 2008

open discussion about all aspects; my slides are just the pacemaker

Focus

• CLARIN Technical Infrastructure
  • must offer an open marketplace for “ALL” resources and tools
  • needs to be the supermarket in the LRT area
  • must be open, extensible and web-based
  • must not be bound by decisions of external groups
  • must build upon the experience of many experts and work already done
  • must re-use components that are out there and that fit into the open policy
  • must not be fiction – but more important is sustainability
  • must be coherent with administrative/organizational/financial constraints
  • all code to be developed must be open source and free to use (academics)

• CLARIN TI is about integration and interoperability of existing LRT
• CLARIN TI is about scaling up – so not so much about fundamentally new ideas

Preparation Phase

• EC funds: in prep phase (3 years) only piloting and tests to allow cost estimation
• national funds: at the end some demo cases
• evaluation: national evaluations probably already in 2 years
• necessary: early domain-wide services at the web site (repository/archiving, ISOcat, PID, translation, etc.)
• development contribution in WP2 largely from national funds – real money
  • who has got them so far?

• tasks of WP leaders
  • all writing and overhead by WP leaders (except financial admin)
  • all interaction between national and EC level by WP leaders
  • clear separation between EC-level and national-level activities
  • all reports/deliverables subject to extensive discussion in EB
  • special workshops and seminars planned
  • WP members should be able to just focus on content

• real work organized in “small” working groups
  • when national funds are available, WG leaders from sites other than MPI

Institutes / Persons

MPI: Peter Wittenburg, Dieter van Uytvanck, Daan Broeder, Marc Kemps-Snijders
INL: Jeannine Beeken, Remco van Veenendaal
ELDA: Khalid Choukri, Viktoria Arranz
DFKI: Thierry Declerck
ILC: Nicoletta Calzolari, Riccardo Del Gratta
WROCUT: Maciej Piasecki, Bartosz Broda
OTA: Martin Wynne
UPF: Nuria Bel, Anna Guardiola, Santiago Bel
ILSP: Stelios Piperidis, Maria Gavrilidou
RACAI: Dan Tufis, Epapadat?
USFD: Wim Peters, Adam Funk
Helsinki: Tero Aalto (CSC), (Kimmo Koskenniemi)
Lund: Sven Strömqvist
Leipzig: Gerhard Heyer
Latvia: Inguna Skadina
Leuven: Ineke Schuurman
Utrecht: Jan Odijk, (Steven Krauwer)
UniVie: Gerhard Budin, Csilla Bornemisza
Tübingen: Erhard Hinrichs, Lothar Lemnitzer, Andreas Witt

WP2 Tasks/WGs

• WG2.1: Centers Network Formation MPI T0 d=rep/T6
  types of services, types of centers, requirements
• WG2.2: Federation Foundation MPI T0 d=rep/T6
  LRT requirements, architecture, schemas, criteria, selection
• WG2.3: Federation Building MPI T6 d=dev/T18/36
  analysis, component installation + adaptation, agreements
• WG2.4: Registry Requirements MPI T0 d=rep/T9
  experiences, new requirements, model+schema, ISO
• WG2.5: Registry Infrastructure MPI T6 d=dev/T18/36
  design, building of components, integration, testing
• WG2.6: Web-Services and Workflow Requirements ? T6 d=rep/T12/24
  analysis, experiences, requirements
• WG2.7: WS and WF Creation ? T12 d=dev/T24/36
  encapsulation methods, services development, WF test tool
• WG2.8: Service & Application Building ? T24 d=dev/T36
  show cases, cross-searching, LREP interaction
• WG2.9: Cost estimates MPI T21 d=rep/T24/36
  for construction + operation phase

WP Interaction

• WP2 – WP5: close interaction required
  • WP2: all aspects that have to do with LRT as a whole
  • WP5: all internal LRT aspects
  • WP2 more the IT side – WP5 more the linguistic side
  • WP2 the specifications, frameworks etc. – WP5 the integration

• WP2 – WP7: close interaction required
  • smooth LRT domain only when licensing and trust agreements are settled
  • WP2: all specifications from a technical/infrastructure perspective
  • WP7: all agreements from a formal/legal perspective

• WP2 – WP3/4: WP2 needs to listen to requirements and wishes
  • WP2 will provide information about possibilities, plans etc.

• WP2 – WP6: WP2 (all partners) needs to provide information for dissemination
  • WP2 should participate in national training programs

• WP2 – WP8: WP2 needs to listen to constraints
  • WP2 needs to contribute estimates (costs etc.)

WG2.1 Service Centers

• need to add a persistent infrastructure layer on top of the landscape formed by accidental and temporary collaborations, one that is easily accessible for everyone and offers high availability so that humanities scholars can rely on it
• perhaps different types of centers depending on the services they offer
• fundamental deal: researchers give their babies and get seamless access to more
• centers need to change their attitudes – they have to offer a true service mentality, a new form of openness and technical accessibility, and little bureaucratic overhead
• access suitable for humanities research – unpredictable access patterns

WG2.1 Types of Services

• LR Services
  • uploading and integration of new resources/versions
  • long-term data preservation
  • curation of LR (immediate conversion, ...)
  • allowing access to LR via web services
  • building virtual collections
  • finding appropriate tools

• LT Services
  • offer web services to execute LT
  • allow combining LT into larger workflows (chains of operations)
  • conversion services/translation services/...

• Infrastructure Services
  • ISOcat services for concepts, terminology, relations
  • PID service for unique and persistent identifiers
  • metadata services (all sorts of usages, all sorts of LRT)
  • etc.

• Advisory Services (WP6) – where to locate general advice?

WG2.1 Requirements

• determine types of centers depending on types of services
  • need a taxonomy of services (WP5)

• determine together with WP3/4 what the required “business” models are
  • what are humanities researchers expecting – it’s much more than NLP!
  • searching for patterns in large virtual collections, easy application of tools
  • how much and which overhead is allowed so as not to hamper innovation
  • nevertheless proper handling of IPR

• determine technical requirements for service centers
  • example: proper repository/archiving system

• determine general requirements
  • what is the national support
  • what kind of expertise is around
  • what is the size and capability of the staff
  • geographic spreading – political aspects

• prepare a call for participation in a first network of centers (together with WG2.2)
• analyze all applications and determine local situation
• make a selection based on criteria

WG2.2/3 LRT Federation

• what are we going for?
  • alternative to the Google model of “centralized” data management and ownership
  • distributed but nevertheless coordinated model that
    • can act as a juridical person to make contracts
    • can interact with LRT providers about licenses etc. (goal is simplification)
    • can interact with national identity federations to establish trust relations and make deals

• if we don’t organize ourselves, others will take over control of scientific data
• competition requires efficient operations

[Figure: schema with the labels "national Identity Federations", "eJournal Service Providers", "LRT Service Providers" and "Trust Agreement(s)"]

WG2.2 Federation Requirements

• Technical Requirements
  • joint metadata registry for resources and tools based on long experience (see WG2.4/5)
  • support for virtual collections with resources from different archives based on metadata, and for combining services into more complex operations
  • unique way of referencing electronic resources and fragments in the federation
  • single sign-on/identity principles in the federation
  • all based on trusted and signed certificates (quality of certificates)

• Trust Relations (together with WP7)
  • trust agreements with national identity federations
  • CLARIN needs to build a federation based on simplified and unified rules for licensing, accessing, user authentication etc.
  • what kind of auditing is required?
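
The requirements for virtual collections and for uniquely referencing resources and fragments could be sketched roughly as follows. This is only an illustration under stated assumptions, not a CLARIN design: the class names `ResourceRef` and `VirtualCollection`, the PID strings, the archive names and the `#fragment` convention are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class ResourceRef:
    """Reference to a resource, or a fragment of it, by persistent identifier."""
    pid: str                        # hypothetical PID string
    archive: str                    # hypothetical name of the holding center
    fragment: Optional[str] = None  # optional part inside the resource

    def uri(self) -> str:
        # Assumed convention: the fragment is appended after '#'.
        if self.fragment is None:
            return self.pid
        return f"{self.pid}#{self.fragment}"


@dataclass
class VirtualCollection:
    """A named set of references that may span several archives."""
    name: str
    members: List[ResourceRef] = field(default_factory=list)

    def add(self, ref: ResourceRef) -> None:
        # Keep each reference only once.
        if ref not in self.members:
            self.members.append(ref)

    def archives(self) -> Set[str]:
        # Which centers contribute to this collection?
        return {member.archive for member in self.members}


collection = VirtualCollection("child-language-sample")
collection.add(ResourceRef("hdl:0000/session-a", "archive-1"))
collection.add(ResourceRef("hdl:0000/session-b", "archive-2", fragment="t=10,20"))
```

The point of the sketch is that the collection stores only identifiers, so the resources themselves stay in their home archives.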

WG2.2 Work

• understand the requirements of LRT providers with respect to a federation
• interact with TERENA, national IFs and organizations to understand the trends (attribute sets and values, schemas, architecture)
  • usage of attributes under debate (mostly EduPerson, but different MPG requirements)
  • grid integration to get applications under the distributed AA scheme (NL project)

• what are the trust agreements in the federation – also a matter of licenses (WP7)

• criteria for centers to participate in the federation (see DAM-LR) (strong enough, national support, appropriate staff, national grid support, ...)
• launch a call for participation in the first round
• check the situation of all applicants – can’t take everyone in the preparatory phase
  • check includes the repository and service structure
• promised to have at least 10 participating centers
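
How attribute sets might be used by a service provider can be illustrated with a small sketch. The attribute names `eduPersonScopedAffiliation` and `eduPersonEntitlement` come from the eduPerson schema; the access policy, the function `is_authorized` and all values below are hypothetical and only show the idea of deciding access from released attributes.

```python
def is_authorized(attributes, required_affiliations, required_entitlement=None):
    """Decide access from attributes released by an identity provider.

    attributes: dict mapping attribute name to a list of string values.
    required_affiliations: affiliations (e.g. {"member"}) of which one must match.
    required_entitlement: optional entitlement value that must be present.
    """
    scoped = attributes.get("eduPersonScopedAffiliation", [])
    # "staff@example.org" -> "staff"; the scope part is ignored in this sketch.
    affiliations = {value.split("@", 1)[0] for value in scoped}
    if not affiliations & set(required_affiliations):
        return False
    if required_entitlement is not None:
        return required_entitlement in attributes.get("eduPersonEntitlement", [])
    return True


# Hypothetical attribute release for one user.
user = {
    "eduPersonScopedAffiliation": ["member@example.org", "staff@example.org"],
    "eduPersonEntitlement": ["urn:example:lrt:academic-use"],
}
```

A real federation would also have to agree on which attributes are released at all, which is exactly the debate noted above.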

WG2.3 Federation Building

• follow-up WG, mainly with the selected centers

• deep analysis of local repository, service and authentication structure
• get certificates and PKI working
• run training courses for local experts
• install components (repository, MD infra, Handle System for PID)
• install main AAI component Shibboleth and set up schema etc. (help from CC)
• adapt, test and integrate all components

• make some agreements with national IFs (WP7)

• procedure in two steps – a lot of detail work
  • first the “safe” candidates
  • later the other candidates
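
For the Handle System component mentioned above, a PID of the form prefix/suffix can be turned into a resolvable URL via the public proxy at hdl.handle.net. The helper below is a minimal sketch; the function name `handle_to_url` and the handle used in the example are hypothetical.

```python
from urllib.parse import quote


def handle_to_url(handle: str, proxy: str = "https://hdl.handle.net") -> str:
    """Build a proxy URL for a handle of the form 'prefix/suffix'."""
    prefix, separator, suffix = handle.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    # Percent-encode characters that are not safe in a URL path.
    return f"{proxy.rstrip('/')}/{quote(prefix)}/{quote(suffix)}"


# Hypothetical handle, for illustration only.
example_url = handle_to_url("1234.5/abc-001")
```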

WG2.4/5 LRT Registry

• joint metadata registry for resources and tools based on long experience
  • metadata is an increasingly important research resource
  • metadata is part of a proper registry mechanism

• need to do a deep analysis of current practices and trends
  • domain-crossing initiatives: DC, OAI-PMH
  • working distributed infrastructures in the domain: IMDI, OLAC, …
  • projects and organizations: ELDA catalogue, TEI usage, CHILDES, etc.
  • initiatives for tools: DFKI tool registry
  • technologies at web service level (UDDI, ebXML, WSDL, etc.)
  • importance of ontology-based IS like “LT World”

• know the current limitations
  • coverage is still too small in the LRT domain
  • inappropriate descriptions, fixed schemas, unsuitable vocabularies
  • hardly any localization
  • little customization (virtual collections, dynamic abstraction, faceted search, ...)
  • hardly any use of or support for PIDs
  • not suitable for the web services domain
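
As a reminder of how metadata harvesting with OAI-PMH works at the request level, the sketch below builds a `ListRecords` request URL. The verb and the `metadataPrefix`, `set` and `from` arguments are part of the OAI-PMH protocol; the function name and the repository base URL are hypothetical.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec       # selective harvesting by set
    if from_date:
        params["from"] = from_date     # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, for illustration only.
request_url = oai_list_records_url("https://archive.example.org/oai", from_date="2008-01-01")
```

A harvester would fetch this URL and follow the resumption tokens in the responses; that part is left out here.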

WG2.4 Registry Requirements

• deep analysis of current practices and their limitations

• determining the essentials
  • concept registration and PIDs are essential – schemas less so
  • integration of relevant concepts in ISOcat and creation of relations
  • requirements for specific resource types and sub-disciplines
  • what is required to extend to LRT web services
  • how to integrate the huge amount of legacy material
  • get a proper LRT taxonomy as basis (WP5)
  • establish a board for the ISOcat MD profile
  • is social tagging an issue
  • how to split responsibility between national contributors

• discuss ODD component model implications and opportunities

• specify a flexible component based CLARIN “standard” that can be submitted to ISO as well (model, core + extensions)

• specify the requirements for the infrastructure (tools, portals, reps, harvesting gateways, etc)
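
The idea of a component-based "standard" (core + extensions) whose elements point to registered concepts could look roughly like this. It is a sketch under assumptions, not the model to be specified: `Element`, `Component`, `build_profile`, `missing_required` and all concept identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Element:
    name: str
    concept_id: str         # hypothetical identifier in a concept registry
    required: bool = False


@dataclass(frozen=True)
class Component:
    name: str
    elements: Tuple[Element, ...]


def build_profile(core: Component, *extensions: Component) -> Dict[str, Element]:
    """Combine a core component with extensions into one flat profile."""
    profile: Dict[str, Element] = {}
    for component in (core, *extensions):
        for element in component.elements:
            profile[f"{component.name}.{element.name}"] = element
    return profile


def missing_required(profile: Dict[str, Element], record: Dict[str, str]) -> List[str]:
    """List required profile elements that a metadata record does not fill."""
    return sorted(key for key, el in profile.items() if el.required and key not in record)


core = Component("core", (Element("title", "DC-0001", True),
                          Element("language", "DC-0002", True)))
speech = Component("speech", (Element("samplingRate", "DC-0003"),))
speech_profile = build_profile(core, speech)
```

Because every element carries a concept identifier, two profiles with different element names can still be compared at the concept level.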

WG2.5 Registry Building

• follow-up of WG2.4 with those partners who are capable

• design components and component schema
• integrate relevant concepts in ISOcat, create relations, enter localizations and sub-discipline variants
• design set of infrastructure components and define APIs
• decide about code development aspects and task division
• develop code
• do the code integration, testing etc.
• set up portals
• give help for integration of existing collections and services (with WP5)
• write manuals, do training courses etc.

• Registry should be finished after three years!

WG2.6/7 WS and WF

• essential for CLARIN is a flexible and simple way to integrate and combine resources and tools (virtual collections, chains of operations, profile match, etc.)
  • need to understand what humanities researchers find simple

• need a deep analysis of the current practices and trends
  • the registry part of this will be dealt with in WG2.4/5 (UDDI, ebXML, etc.)
  • there are general suggestions such as
    • WSDL, REST for interface specification
    • BPEL, jBPM, Yahoo Pipes, etc. for workflow languages and graphical WF options
  • there is domain-specific knowledge and experience such as
    • within GATE (SAFE), UIMA, Bricks, at RACAI, at MPI etc.

• how to get all the LRT which is out there into an SOA
• how to achieve an open and flexible setup
• how much standardization is required with respect to formats etc. (strong/weak typing)
• what kind of conversion routines are required
• what are the special requirements of grid computing applications (HTTP might be too slow) and services such as media streaming
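
The question of strong versus weak typing in a chain of operations can be made concrete with a small sketch: each tool declares an input and an output format, and a chain is only run if consecutive formats match. Everything here (the `Tool` class, the format labels, the three toy tools) is hypothetical and stands in for real wrapped LRT services.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Tool:
    name: str
    input_format: str
    output_format: str
    run: Callable


def check_chain(tools: List[Tool]) -> List[str]:
    """Report format mismatches between consecutive tools in a chain."""
    problems = []
    for previous, following in zip(tools, tools[1:]):
        if previous.output_format != following.input_format:
            problems.append(
                f"{previous.name} -> {following.name}: "
                f"{previous.output_format} != {following.input_format}"
            )
    return problems


def run_chain(tools: List[Tool], data):
    """Run the tools in order, refusing chains whose formats do not match."""
    problems = check_chain(tools)
    if problems:
        raise ValueError("; ".join(problems))
    for tool in tools:
        data = tool.run(data)
    return data


tokenizer = Tool("tokenizer", "text/plain", "tokens", lambda text: text.split())
lowercaser = Tool("lowercaser", "tokens", "tokens", lambda toks: [t.lower() for t in toks])
counter = Tool("counter", "tokens", "frequency-table",
               lambda toks: {t: toks.count(t) for t in toks})
```

With strict checks like this, a conversion service would be inserted wherever two formats differ; a weakly typed setup would skip the check and leave the risk to the user.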

WG2.6 WS & WF Requirements

• carry out the deep analysis
  • understand in detail what humanities researchers would like to have and which degree of complexity they can handle (WP3/4)
  • understand in detail what other initiatives have been trying out
  • analyze chains of operations in LRT to understand the interoperability issues (WP5)
  • analyze requirements of profile matching
  • determine how ISOcat etc. can be included to achieve interoperability at tag level

• hold workshops with experts

• write requirements specification document
• detect necessity of additional standards (WP5)
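
Interoperability at tag level through a shared concept registry can be pictured as follows: each tagset maps its local tags to registry concepts, and a tag is translated by going through the shared concept. The tagsets, the concept identifiers and the function `translate_tag` are hypothetical; a real mapping would use identifiers from a registry such as ISOcat.

```python
# Hypothetical local tagsets, each mapped to shared concept identifiers.
TAGSET_A = {"NN": "concept:noun", "VB": "concept:verb"}
TAGSET_B = {"N": "concept:noun", "V": "concept:verb", "ADJ": "concept:adjective"}


def translate_tag(tag, source_map, target_map):
    """Translate a tag between tagsets via the concept both refer to.

    Returns None when the tag is unknown or the target has no matching concept.
    """
    concept = source_map.get(tag)
    if concept is None:
        return None
    for target_tag, target_concept in target_map.items():
        if target_concept == concept:
            return target_tag
    return None
```

For instance, `translate_tag("NN", TAGSET_A, TAGSET_B)` goes from `NN` to the noun concept and then to `N`, while a tag without a counterpart yields `None` and signals where a conversion rule or a new concept is needed.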

WG2.7 WS & WF Building

• the general goal is to
  • include a large number of resources in the service landscape (WP5)
  • include a number of tools in the service landscape (WP5)
  • show the potential of an open WS domain at the end
  • implement a first simple graphical WF framework (or re-use existing stuff)
  • estimate costs for the construction and operation phase, i.e. broad coverage

• for the selection of LRT components
  • need an architectural design and a framework
  • design and develop encapsulation/wrapping methods
  • create services
  • create the required conversion services
  • develop a first simple WF tool and/or re-use existing stuff

• carry out some grid computing tests based on fast, shared file systems

WG2.8 Basic Services

• any suggestions are welcome – until now only vague commitments such as

• domain-wide searches
  • metadata
  • content (which architecture?, which rights?, ...)
  • combined

• LREP profile matching service

• more ideas to come from humanities projects

WG2.8 Not to forget

• offer first services asap
  • web site as center point
  • should have mirror sites at a certain moment (not just one portal)

• examples
  • MPI will offer deposit, annotation/LMF curation and access services
  • RACAI can offer some technology services
  • Sheffield etc. can offer services?
  • does someone have a translation service?
  • etc.

WP2 Work Distribution

• open for any suggestions
  • constraint: we have to fulfill the constraints given by the TA

• my original suggestion (see overview) – but it is TA theory
• various WG registrations until now (see overview)

• the big questions:
  • how much power is available per institute?
  • is there additional national support for you beyond the few pm?

WP2 Work Distribution

WG: 2.1 centers | 2.2 fed found | 2.3 fed build | 2.4 reg req | 2.5 reg build | 2.6 ws requ | 2.7 ws build | 2.8 basic serv | 2.9 costs

Columns per WG: PM, reg

MPI 1 X 2 X 4 X 2 X 14 X 2 X 16 X 6 X X

INL .5 .5 .5 .5 2

OTA .5 .5 1 1 3

RACAI .5 X .5 1 .5 3.5

Wroclaw X X X X

UPF X .5 X .5 1 5.5 1

ELDA X X X 2 X

ILSP .5 .5 1 X 1 6

ILC .5 .5 1 X 1 X 6

USFD X X X X 1 8 2

ULund .5 .5 .5 .5 2

DFKI 1 3 .5 3.5

CSC .5 .5 1 2

UIL-OTS X X X X

U Latvia X

U Leipzig X

K.U.Leuven X

WP2 Procedure

• start today

• forming the WGs for the first half year
• start with providing documents

• regular video conferencing
• personal meetings when necessary
• meeting at LREC if possible
• workshop with experts

• in summer, workshop together with WP5