ALIGNED Data Curation Methods and Tools

ALIGNED Data Curation Methods and Tools. Rob Brennan, ALIGNED Coordinator. SWIMing VoCamp Workshop, Dublin, 22 March 2016


Page 1: ALIGNED Data Curation Methods and Tools

ALIGNED Data Curation Methods and Tools

Rob Brennan, ALIGNED Coordinator

SWIMing VoCamp Workshop, Dublin, 22 March 2016

Page 2: ALIGNED Data Curation Methods and Tools

3/25/2016

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 644055.

This communication reflects only the author’s view and the Commission is not responsible for any use that may be made of the information it contains.

Page 3: ALIGNED Data Curation Methods and Tools

Diagram labels:

• Software Engineering Lifecycle: Implementation, Analysis, Planning, Maintenance, Design

• Data Engineering Lifecycle: Manual Revision/Author, Inter-linking/Fusing, Classify/Enrich, Quality Analysis, Evolve/Repair, Search/Browse/Explore, Extract, Store/Query

• Roles: Application Users, Data Harvesters, Dataset Domain Experts, Software Developers, System Admins, Data Architects, Dev. Managers, Software Testers, Data Consumers, Software Analysts, System Analysts

Overall Goal: How can we get these guys to talk?

To improve: Productivity, Agility, Quality?

Page 4: ALIGNED Data Curation Methods and Tools

Data Quality and Data Curation in ALIGNED

• Building high quality data-intensive systems requires high quality datasets

• But

– Datasets are now first class citizens with lifecycles that are independent of the consuming apps

– Quality still problematic

• We observe:

– Rich data models support quality engineering

– Linked Data entering the enterprise

Page 5: ALIGNED Data Curation Methods and Tools

ALIGNED Tools for Data Curation

Productivity, Agility, Quality

Diagram labels:

• Data Engineering

• Data Quality Validation

• Unified Process Governance

• Data Integrity Assurance

• Data Integration Assurance

• Semi-Supervised Data Curation

• Linked Data Extract, Transform, Load

• Taxonomy Management

• Dataset Release Automation

See: http://aligned-project.eu/open-source-tools/ and https://www.poolparty.biz/

Page 6: ALIGNED Data Curation Methods and Tools

ALIGNED Validates in Real-World, Data Intensive Systems

Global History Databank

Legal Information System

Nucleus for the Web of Data

Semantic Middleware

Page 7: ALIGNED Data Curation Methods and Tools

Example: Seshat Target System

Diagram labels: Electronic Archives; databases; Community of experts & Volunteers; Data Consumers; Seshat Databank; Collective Intelligence; High Quality Open Data; Feedback

“improve the extraction of collective intelligence from electronic archives, research communities and data consumers to improve the quality of published data”

Page 8: ALIGNED Data Curation Methods and Tools

Global History Databank Pilot Data Curation System

Diagram labels:

• Users: Seshat Contributors, Seshat Editor, Seshat Administrator, Seshat Analyst, Seshat Reader

• Tools and services: Wiki, Wiki Generation tool, Wiki Data Entry/Validation Tool, Schema Management tool, User Management, Workflow Management, Candidate Generation/Filtering tools, Data Quality Controls, Data Export Tool, Time Series Analysis, Linked Data Publication

• Data and models: RDF Triple Store, Seshat Schema Knowledge Model, Seshat Data Knowledge Model, Copy of Seshat Data, Data Dump File (TSV), Seshat Data Web, Seshat Data Web Pages, Data Visualisations, Data Transformations, Links to other Datasets

• External sources: DBpedia, External candidate source

• Flows: Read/query, Enter Data, Validate Candidate, Errors, Feedback, View Data, Read Data, generate
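
The "Enter Data", "Validate Candidate", "Errors" and "Data Dump File (TSV)" labels in this system suggest a flow in which candidate records are checked against a schema before they are exported. The following is a minimal Python sketch of that idea only. The schema, field names and function names are hypothetical and are not taken from the Seshat or Dacura code.

```python
import csv
import io

# Hypothetical schema: field name -> (type, required). Not the real Seshat model.
SCHEMA = {
    "polity": (str, True),
    "variable": (str, True),
    "value": (str, True),
    "start_year": (int, False),
    "end_year": (int, False),
}

def validate_candidate(record):
    """Return a list of error strings; an empty list means the candidate is valid."""
    errors = []
    for name, (ftype, required) in SCHEMA.items():
        if record.get(name) is None:
            if required:
                errors.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], ftype):
            errors.append(f"{name} should be {ftype.__name__}")
    start, end = record.get("start_year"), record.get("end_year")
    if isinstance(start, int) and isinstance(end, int) and start > end:
        errors.append("start_year is after end_year")
    return errors

def export_tsv(records):
    """Write only the valid records to a TSV string (a stand-in for a dump file)."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(SCHEMA), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for rec in records:
        if not validate_candidate(rec):
            writer.writerow({name: rec.get(name, "") for name in SCHEMA})
    return out.getvalue()

# Made-up candidate records for illustration.
candidates = [
    {"polity": "Example Polity", "variable": "population", "value": "50000",
     "start_year": -300, "end_year": -200},
    {"polity": "Example Polity", "variable": "population",
     "start_year": -100, "end_year": -200},
]
```

In this sketch the second candidate would be reported back with two errors (a missing value and an inverted date range) and left out of the TSV dump, while the first is exported unchanged.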

Page 9: ALIGNED Data Curation Methods and Tools

Dacura: Generic, Quality-Oriented Data Curation Process

The goal is to minimise the work required from expert users (domain expert, architect) and to ensure data quality in different dimensions at different steps in the process.
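
One way to read this goal is that automated checks run at each step, and only the records that fail a check are passed to an expert. The Python sketch below illustrates that reading under stated assumptions: the step names, quality dimensions and check functions are hypothetical, and this is not Dacura's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    data: dict
    issues: list = field(default_factory=list)

# Hypothetical checks, one per quality dimension.
def has_source(rec):
    """Completeness: the record names where it was harvested from."""
    return bool(rec.data.get("source"))

def value_in_range(rec):
    """Plausibility: the value is a non-negative number."""
    v = rec.data.get("value")
    return isinstance(v, (int, float)) and v >= 0

# Hypothetical process steps, each with the dimensions checked at that step.
STEPS = [
    ("harvest", [("completeness", has_source)]),
    ("curate", [("plausibility", value_in_range)]),
]

def run_pipeline(records):
    """Split records into auto-accepted ones and ones needing expert review."""
    accepted, review_queue = [], []
    for rec in records:
        for step, checks in STEPS:
            for dimension, check in checks:
                if not check(rec):
                    rec.issues.append(f"{step}:{dimension}")
        (review_queue if rec.issues else accepted).append(rec)
    return accepted, review_queue

# Made-up records for illustration.
records = [
    Record({"source": "archive-A", "value": 12}),
    Record({"source": "", "value": -3}),
]
accepted, review_queue = run_pipeline(records)
```

Here the first record passes every check and is accepted automatically, so the expert only sees the second record, tagged with the step and dimension of each failed check.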

Page 10: ALIGNED Data Curation Methods and Tools

Dacura Data Harvesting Interfaces

Page 11: ALIGNED Data Curation Methods and Tools

Partners

• Knowledge and Data Engineering Group/ADAPT Centre, Trinity College Dublin

• Software Engineering Group, University of Oxford

• Institute of Cognitive and Evolutionary Anthropology, University of Oxford

• Agile Knowledge Engineering and Semantic Web Group, Universität Leipzig

• Semantic Web Company GmbH

• Content Strategy and Architecture Department, Wolters Kluwer Germany, Wolters Kluwer Poland

• Institute of Prehistory, Adam Mickiewicz University at Poznan

Page 12: ALIGNED Data Curation Methods and Tools

We want to help you! The ALIGNED Consultancy Program

• Are you a business?

• Do any of these apply:

– Are you building data-intensive applications?

– Do you want to curate high quality data?

– Need help integrating Linked Data + apps?

– Want to integrate your software and data engineering teams?

Call on the ALIGNED consultancy program!

http://aligned-project.eu/aligned-consultancy-program-opportunities/

Page 13: ALIGNED Data Curation Methods and Tools

Contact: [email protected]

Web: http://www.aligned-project.eu

Twitter: @AlignedProject