Agro-Know & the European agricultural research information ecosystem

Post on 22-Jun-2015


description

Slides of my talk to members of the Agricultural Information Institute (AII) of the Chinese Academy of Agricultural Sciences (CAAS), on September 19th, 2014.

Transcript of Agro-Know & the European agricultural research information ecosystem

Agro-Know & the European agricultural research information ecosystem

Nikos Manouselis (PhD)
CEO, Agro-Know
www.agroknow.gr

ToC

• about me & Agro-Know
• our context of work
• building a European data e-infrastructure for agricultural research
• collaboration between CAAS AII & Agro-Know

about me

Nikos
• MSc, MEng, PhD
• >150 pubs
• 1 post-doc
• 1 project management position
• Agro-Know

Κρήτη (Crete)

• Crete is the largest and most populous of the Greek islands

• It forms a significant part of the economy and cultural heritage of Greece while retaining its own local cultural traits (such as its own poetry, and music)

• Crete was once the center of the Minoan civilization (circa 2700–1420 BC), which is currently regarded as the earliest recorded civilization in Europe

Minoan civilisation

• Named after King Minos

• A king of Crete, son of Zeus and Europa

Minoans: enemies with Athens

• Every nine years, King Minos of Crete made King Aegeus of Athens pick seven young boys and seven young girls to be sent to his palace, the labyrinth, to be eaten by the Minotaur, a monster that was half man, half bull

Theseus prince of Athens

princess Ariadne, daughter of Minos

so the myth is about navigating through a labyrinth

helping people navigate through agricultural information

An extraordinary company that captures, organizes and adds value to the rich information available in agricultural and biodiversity sciences, in order to make it universally accessible, useful and meaningful.

http://www.agroknow.gr

We develop and put into practice solutions that transform data into meaningful knowledge and services

We help people solve problems informed by data

[Diagram: the Agro-Know data framework]
• Input: unorganized content in local and remote sites
• Data Framework: Ingestion, Translation, Enrichment, Publication
• Stage labels: Harvesting, Cultivation, Blossom
• Output: organized and structured content in local and remote DBs (educational, bibliographic, other)
• Services: widgets, authoring services, data discovery services, analytics services

• Aggregate data from diverse sources
• Works with different types of data (educational, bibliographic)
• Prepare data for meaningful services

data aggregation & sharing solutions

working with high profile partners & clients

• Food and Agriculture Organization (FAO) of the United Nations
• World Bank Group
• UK’s Dept for International Development (DFID)
• Michigan State University (MSU)
• Wageningen University & Research (WUR)
• French Institute of Agricultural Research (INRA)
• Creative Commons

context

CIARD
• “towards a Knowledge Commons on Agricultural Research for Development”
• “agricultural knowledge is freely accessible and contributes to reducing hunger and poverty”
• “open knowledge makes it easier to provide better solutions”
http://www.ciard.net/about/manifesto

Open Knowledge Convening (February 2013)

• Open Knowledge for Agricultural Development Convening, hosted by MSU in February 2013

launch of RDA (March 2013)
• joint USA, EU, Australia Research Data Alliance
– “researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society”
• Interest Group on Agricultural Data Interoperability
– Wheat Data Interoperability Working Group
– Germplasm Data Interoperability Working Group
– …more
https://rd-alliance.org

G8 conference (April 2013)
“How Open Data can be harnessed to help meet the challenge of sustainably feeding nine billion people by 2050”

GODAN initiative
• “support global efforts to make agricultural and nutritionally relevant data available, accessible, and usable for unrestricted use worldwide”
• “advocate for the release and re-usability of data in support of Innovation and Economic Growth, Improved Service Delivery and Effective Governance, and Improved Environmental and Social Outcomes”
http://godan.info/statement.html

building a European data e-infrastructure for agricultural research

• Agricultural research can be broadly defined as any research activity aimed at improving productivity and quality of crops
– by genetic improvement, better plant protection, irrigation, storage methods, farm mechanization, efficient marketing, better management of resources, human development
[Loebenstein & Thottappilly, 2007]

agricultural research

• Primary data:
– Structured, e.g. datasets as tables
– Digitized: images, videos, etc.
• Secondary data (elaborations, e.g. a dendrogram)
• Provenance information, incl. authors, their organizations and projects
• Methods and procedures followed
• Reports, including papers
• Secondary documents, e.g. training resources
• Metadata about the above
• Social data, tags, ratings, etc.

agricultural research information

there is a lot of data

…but where do I start searching?

simple goal of agINFRA
• demonstrate how we can make information on European agricultural research
– more discoverable
– better linked
– interoperable & exchangeable

• focus on selected types of information (primarily bibliographic information, educational resources; also germplasm data, soil maps, …)

• collaboration cases with international partners (such as CAAS)

[Diagram: agINFRA e-infrastructure]
• Registry of Datasets and APIs
• Productivity Tools
• Registry of vocabularies and tools (VEST registry)
• LOD Vocabularies: AGROVOC, local KOSs, controlled lists (document types, data types, file formats (IANA +), protocols, audiences, licenses, etc.)
• agINFRA RDF vocabularies
• agINFRA LOD KOSs
• agINFRA data sources, agINFRA collections, agINFRA APIs, including: bibliographic, educational, germplasm, soil, datasets, APIs, etc.
• Information services
– Grid jobs, Grid workflows
– agKEA, ag@RDF, agHarvest…
– Public REST APIs
– agHarvest, agTransform, agTagger
– Cloud / SaaS tools
• Omeka, AgriDrupal, AgriOceanDSpace, VocBench
• Shared URIs

[Diagram: actors over the infrastructure]
• Actors: data providers, information systems providers, researchers, taxonomists, policy makers, developers
• Labelled interaction: Call APIs
• Infrastructure layers shown: Registry of Datasets and APIs; Productivity Tools; Registry of vocabularies and tools; LOD Vocabularies; agINFRA RDF vocabularies; agINFRA LOD KOSs; data sources, collections, APIs; Information services (Grid jobs, Grid workflows, Public REST APIs, Cloud / SaaS tools)

new agINFRA RING

moving forward

[Diagram: the case of agricultural infrastructures, NOW (2012) vs 2015 (AgINFRA)]

NOW (2012)
• OAI-PMH Service Providers #1 to #n, each with its own schema (Schema #1 to Schema #n)
• HARVESTER
• Aggregated XML Repository
• INDEXER
• Web Portals
• Open AGRIS (FAO), AgLR/GLN (ARIADNE), Organic.Edunet (UAH), VOA3R (UAH), ...
• AGRIS AP Schema, IEEE LOM Schema, DC Schema, ...

2015 (AgINFRA)
• SPARQL endpoints (Data Source #1 to Data Source #n)
• RDF Triple Store
• Common Schema
• SPARQL endpoint
• INDEXER
• Web Portals

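As a rough illustration of the harvesting step in the 2012 picture, the sketch below parses an OAI-PMH `ListRecords` response carrying Dublin Core metadata and flattens each record into a small common dictionary. The sample XML, the identifier `oai:example.org:1` and the output field names are made up for illustration; only the OAI-PMH and Dublin Core namespace URIs come from the published specifications. It is not the agINFRA harvester, just a minimal sketch of the idea.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written ListRecords response (not real provider data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Drip irrigation for olive groves</dc:title>
          <dc:subject>irrigation</dc:subject>
          <dc:subject>olives</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Turn an OAI-PMH ListRecords response into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            # Records flagged as deleted carry a header but no metadata.
            continue
        records.append(
            {
                "id": identifier,
                "title": dc.findtext("dc:title", default="", namespaces=NS),
                "subjects": [s.text for s in dc.findall("dc:subject", NS)],
            }
        )
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE_RESPONSE):
        print(record)
```

In a real aggregator this parsing would sit behind an HTTP client that follows resumption tokens, and a separate mapping would be needed for each source schema (AGRIS AP, IEEE LOM, DC), which is part of what makes the harvest-and-aggregate approach costly to scale.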

problem when scaling up

• enable the seamless federation of:
– large, live, constantly updated datasets and streams
– heterogeneous data
• involve data publishers that
– cannot or will not join a tight, centrally controlled distributed database
– cannot or will not directly and immediately make the transition to new vocabularies

the SemaGrow solution
• a SPARQL endpoint that federates several heterogeneous data sources
– client poses a query in their preferred schema
  • no need to know where to ask for what
  • no need to know the source’s schema
– by means of collecting and indexing meta-information about the data stored in each data source
• in this manner the data sources do not need to be cloned and re-hashed, and the way data is distributed among them does not need to be centrally controlled
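The idea of answering a query from indexed meta-information, rather than from cloned data, can be sketched as follows. This is a toy illustration and not SemaGrow's actual algorithm: the endpoint URLs, the predicate names and the triple counts are invented, and ranking candidates by triple count is an assumption made only to give the example a deterministic order.

```python
from collections import defaultdict

# Hypothetical per-source metadata: which predicates each endpoint holds and
# how many triples use them (a stand-in for "instance statistics").
SOURCE_STATS = {
    "http://example.org/sparql/bibliographic": {
        "dc:title": 50000,
        "dc:subject": 120000,
    },
    "http://example.org/sparql/educational": {
        "dc:title": 8000,
        "lom:educationalContext": 8000,
    },
}


def build_index(stats):
    """Invert the per-source statistics into predicate -> {source: count}."""
    index = defaultdict(dict)
    for source, predicates in stats.items():
        for predicate, count in predicates.items():
            index[predicate][source] = count
    return index


def select_sources(patterns, index):
    """Map each (subject, predicate, object) pattern to candidate sources.

    Candidates are ordered by triple count (largest first), then by name.
    Patterns whose predicate is unknown to the index get an empty list.
    """
    plan = {}
    for pattern in patterns:
        _, predicate, _ = pattern
        candidates = index.get(predicate, {})
        plan[pattern] = sorted(
            candidates, key=lambda src: (-candidates[src], src)
        )
    return plan


if __name__ == "__main__":
    index = build_index(SOURCE_STATS)
    query = [
        ("?doc", "dc:title", "?title"),
        ("?doc", "lom:educationalContext", "?ctx"),
    ]
    for pattern, sources in select_sources(query, index).items():
        print(pattern, "->", sources)
```

The client only supplies the patterns; which endpoint receives which fragment is decided from the index, so publishers keep their data where it is.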

[Diagram: SemaGrow architecture]
• Components: Client; SemaGrow SPARQL endpoint; Query Manager; Query Decomposition; Query Decomposer; Query Pattern Discovery Service; Resource Discovery; Resource Selector; Data Source(s) Selector; Query Transformation Service; Federated endpoint Wrapper; Query Results Merger; POWDER Inference Layer; P-Store; SPARQL endpoints (Data Source #1 to Data Source #n)
• Data and control labels: Query; query pattern; set of query patterns; equivalent patterns; query fragment, Source (#1) to (#n); query fragment, target Source; transformed query; SPARQL query; query request #1 to #n; query results; query results schema; transformed schema; Schema Mappings; Instance Statistics; Data Summaries; Load Info; Semantic Proximity; Reactivity parameters; Ctrl
• Candidate Source(s) List: Instance Statistics, Load Info, Semantic Proximity
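Two of the boxes in the architecture, the Query Transformation Service (driven by schema mappings) and the Query Results Merger, can be illustrated with a small sketch. Everything named here is hypothetical: the source labels `source_1` and `source_2`, the mapped property names such as `agris:title`, and the row format of the results. It is a sketch of the general technique under those assumptions, not the SemaGrow implementation.

```python
# Hypothetical mappings from the client's schema to each source's schema.
SCHEMA_MAPPINGS = {
    "source_1": {"dc:title": "agris:title", "dc:subject": "agris:subject"},
    "source_2": {"dc:title": "lom:title"},
}


def transform_pattern(pattern, source):
    """Rewrite the predicate of one triple pattern into a source's vocabulary.

    Predicates without a mapping for that source are passed through unchanged.
    """
    subject, predicate, obj = pattern
    mapping = SCHEMA_MAPPINGS.get(source, {})
    return (subject, mapping.get(predicate, predicate), obj)


def to_sparql(patterns):
    """Serialise triple patterns as a SELECT query (PREFIX lines omitted)."""
    body = " .\n  ".join(f"{s} {p} {o}" for s, p, o in patterns)
    return "SELECT * WHERE {\n  " + body + " .\n}"


def merge_results(result_sets):
    """Concatenate per-source result rows, dropping exact duplicates.

    Each row is a dict of variable name -> value; first-seen order is kept.
    """
    merged, seen = [], set()
    for rows in result_sets:
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


if __name__ == "__main__":
    client_pattern = ("?doc", "dc:title", "?title")
    for source in SCHEMA_MAPPINGS:
        fragment = [transform_pattern(client_pattern, source)]
        print(source)
        print(to_sparql(fragment))
```

A real merger would also have to translate the result schema back into the client's vocabulary and join partial results across sources; this sketch only covers the union of complete rows.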

what Semantic Web can bring into the picture
• One Data Access Point for the entire Data Cloud
– Enabling Service-Data level agreements with Data providers
• Application-level Vocabularies / Thesauri / Ontologies
– Enabling different application facets for different communities of users over the same data pool
• Going beyond existing Distributed Triple Store Implementations
– Link Heterogeneous but Semantically Connected Data
– Index Extremely Large Information Volumes (Peta Sizes)
– Improve Information Retrieval response
• Data (+Metadata) physically stored in Data Provider
– No need for harvesting
• Vocabularies / Thesauri / Ontologies of Data Provider choice
– No need for aligning according to common schemas

research challenges
• develop novel methods for querying distributed triple stores
– that can overcome the problems stemming from heterogeneity and the undetermined distribution of data over nodes
• develop scalable and robust semantic indexing algorithms
– that can serve detailed and accurate data source annotations (metadata) about extremely large datasets

what is next

similar/relevant efforts

• PubAg: forthcoming service by National Agricultural Library (NAL) for discovering USDA publications – and beyond

• LGU community of ag knowledge: forthcoming service federating institutional repositories of Land Grant Universities in the US

• CGIAR open: (to be) federating & providing access to publications and data from all CG center repositories

• …and maybe more to come

collaboration between CAAS AII & Agro-Know

a route for sharing knowledge

what happens when we are hosting?

we make a formal intro & present plans

then we eat

we do some work

we eat again

we drink a bit

we drink a bit more

and of course we eat

what will happen when you host us?

I have gotten an idea…

who is next?

thank you!

nikosm@agroknow.gr
http://blog.agroknow.gr