Semantic search within Earth Observation products databases based on automatic tagging of image...
-
Upload
gasperi-jerome -
Category
Technology
-
view
311 -
download
1
description
Transcript of Semantic search within Earth Observation products databases based on automatic tagging of image...
Jérôme Gasperi
2014 Conference on Big Data from Space Frascati - Italy - November 12th, 2014
SEMANTIC SEARCH WITHIN EARTH OBSERVATION PRODUCTS DATABASES BASED ON AUTOMATIC TAGGING OF IMAGE CONTENT
Big Data ? The data deluge
The search paradigm
iTag An EO tagging library
resto An EO product search engine
What’s next ? Conclusion and perspectives
Brett Ryder - http://www.economist.com/node/15579717
The data deluge
Earth Observation products search paradigm is to use the acquisition parameters stored in the metadata
When Where How
What ?i.e. image content
Sven Sachsalber | http://www.palaisdetokyo.com/fr/events/sven-sachsalber
Automatic tagging of Earth Observation products
iTag
Orthorectified image Characterized image
This is urban
This is water
This is forest
What we got What we need
iTag provides semantic enhancement of Earth Observation data
It uses metadata footprint to enrich metadata from exogenous data
i.e. no image processing !
Out of the box tagging sources Continents, Countries, Regions, States, Cities, Land cover, Rivers, Population count
# Polygon around Moscow$moscow = ‘POLYGON((37.1351 55.9655,38.1006 55.9640,38.0525 55.4969,37.0926 55.5171,37.1351 55.9655))’;
# Initialize iTag$iTag = new iTag();
# Tag polygon for land cover$result = $iTag->tag($moscow, array(
‘landcover’ => true));
github.com/jjrom/itag
restoToward an Earth Observation products search engine
Search, visualize and download Earth Observation data
Architecture
Query AnalyzerGazetteer
Administration
RES
T W
ebse
rvic
es
Abs
trac
t D
atab
ase
Acc
ess
Laye
r
PostgreSQL Driver
iTag 2.0
resto 2.0
Search
Visualize
Download
Users
DELETE
POST
Admin
Data
Abs
trac
t D
atab
ase
Acc
ess
Laye
r
PostgreSQL Driver
databaseresto
schema_collection1
schema_collection2
…etc…
schemaresto
schemausersmanagement
PostGIShstoreTable inheritance
Rresto
IngestSearch
POSTGET
Ingest
Query AnalyzerGazetteer
Administration
RES
T W
ebse
rvic
es
Abs
trac
t D
atab
ase
Acc
ess
Laye
r
PostgreSQL Driver
iTag 2.0
resto 2.0
Search
Visualize
Download
Users
DELETE
POST
Data
During ingestion process, resources are automatically tagged thanks to iTag library
Why to tag image first ?
Search images o
ver Russia
Bounding box !!
Search
resto provides semantic search capabilitiesIt uses a Query Analyzer to translate natural language query into a set of EO OpenSearch parameters
<with> "keyword"<without> "keyword"
"quantity" <lesser> (than) "numeric" "unit""quantity" <greater> (than) "numeric" "unit""quantity" <equal> (to) "numeric" "unit"<lesser> (than) "numeric" "unit" (of) "quantity"<greater> (than) "numeric" "unit" (of) "quantity"<equal> (to) "numeric" "unit" (of) "quantity""quantity" <between> "numeric" <and> "numeric" ("unit")<between> "numeric" <and> "numeric" "unit" (of) "quantity"
<today><yesterday><before> "date"<after> "date"<between> "date" <and> "date""numeric" "(year|day|month)" <ago><last> "(year|day|month)"<last> "numeric" "(year|day|month)""numeric" <last> "(year|day|month)""(year|day|month)" <last><since> "numeric" "(year|day|month)"<since> "month" "year"<since> "date"<since> "numeric" <last> "(year|day|month)"<since> <last> "numeric" "(year|day|month)"<since> <last> "(year|day|month)"<since> "(year|day|month)" <last>
Query string analysis algorithm is based on simple recognition of words and patterns
« Images of urban area in Russia acquired in last year with less than 5 % of cloud cover »
Example
Example
keyword location date acquisition parameter« Images of urban area in Russia acquired in last year with less than 5 % of cloud cover »
2. Each search result has an « human readable url » that can be indexed by web crawler (i.e. google robots)
1. Search parameters are derived from Natural Language query
3. Keywords on resources are links to search requests : they can be indexed by web crawler…and so on
2. Each search result has an « human readable url » that can be indexed by web crawler (i.e. google robots)
1. Search parameters are derived from Natural Language query
3. Keywords on resources are links to search requests : they can be indexed by web crawler…and so on
http://goo.gl/BCZ3z4
As of version 2.0, resto supports faceted search
PerformancesiTag / resto
Time period of 1 month within a 10x10 km2 box
SEARCH
INGEST
0.2s
0.5s
1 000 000SPOT DATABASE
New products retrieved every 3 hours from ADS catalog
Per product for a ~5000 products ingestion
Order of magnitude compute on a Dual Core 2.6 GHz | 4 Go RAM | HDD 500 To
What’s next ?Conclusion and perspectives
Need for « fresh » tagging reference databases(e.g. GLC2000 replacement)
Enhance metadata with twitter trends hashtagsAdd tags #mh370,#plane,#malaysianairline to resources acquired between 2014, march 8th and 2014, april 14th in the south of the Indian Ocean
« Linked data is the right way to do Semantic Web »Tim Berners-Lee
Update iTag JSON model to follow JSON-LD format{ "@context": "http://json-ld.org/contexts/person.jsonld", "@id": "http://dbpedia.org/resource/John_Lennon", "name": "John Lennon", "born": "1940-10-09", "spouse": "http://dbpedia.org/resource/Cynthia_Lennon" }