PANGAEA - Providing access to geoscientific data using Apache Lucene Java
8/8/2019 PANGAEA - Providing access to geoscientific data using Apache Lucene Java
1/25
PANGAEA - Providing access to geoscientific data using Apache Lucene Java
Uwe Schindler
PANGAEA / SD DataSolutions GmbH, [email protected]
2/25
My Background
- My main focus is on the development of Lucene Java.
- Implemented fast numerical search and maintain the new attribute-based text analysis API.
- Studied physics at the University of Erlangen-Nuremberg and work as a consultant and software architect for PANGAEA (Publishing Network for Geoscientific & Environmental Data) in Bremen, Germany, where I implemented the portal's geo-spatial retrieval functions with Lucene Java.
- Talks about Lucene at various international conferences such as ApacheCon EU/US, Lucene Eurocon, and Berlin Buzzwords, and at various local meetups.
- I am a committer and PMC member of Apache Lucene and Solr.
http://www.lucidimagination.com/blog/2010/03/10/state-of-spatial-support-in-apache-solr/
3/25
About PANGAEA
- since 1993: Information system for earth system science data hosted by AWI & MARUM
- 2001: Mandate of the International Council for Science (ICSU): World Data Center for Marine Environmental Sciences (WDC-MARE)
- 2007: Mandate of the World Meteorological Organisation (WMO): World Radiation Monitoring Center (WRMC)
- 2010 (certification in progress): Mandate of the World Meteorological Organisation (WMO): Data Collection and Processing Center (DCPC)
4/25
Network of World Data Centers
Geophysical Year 1957
- Nuclear Radiation: Tokyo, Japan
- WDC Co-ordination Offices: Washington DC, USA; Beijing, China
- Meteorology: Asheville NC, USA; Beijing, China; Obninsk, Russia
- Oceanography: Obninsk, Russia; Silver Spring MD, USA; Tianjin, China
- Paleoclimatology: Boulder CO, USA
- Marine Geology and Geophysics: Boulder CO, USA; Moscow, Russia
- Remotely Sensed Land Data: Sioux Falls SD, USA
- Renewable Resources and Environment: Beijing, China
- Recent Crustal Movements: Ondrejov, Czech Republic
- Airglow: Mitaka, Japan
- Astronomy: Beijing, China
- Atmospheric Trace Gases: Oak Ridge TN, USA
- Aurora: Tokyo, Japan
- Cosmic Rays: Toyokawa, Japan
- Geology: Beijing, China
- Human Interactions in the Environment: Palisades NY, USA
- Ionosphere: Tokyo, Japan
- Earth Tides: Brussels, Belgium
- Geomagnetism: Copenhagen, Denmark; Edinburgh, UK; Kyoto, Japan; Colaba, India
- Glaciology: Boulder CO, USA; Cambridge, UK; Lanzhou, China
- Marine Environmental Sciences: Bremen, Germany (2001)
- Rotation of the Earth: Obninsk, Russia; Washington DC, USA
- Satellite Information: Greenbelt MD, USA
- Rockets and Satellites: Obninsk, Russia
- Seismology: Denver CO, USA; Beijing, China
- Solar Radio Emission: Nagano, Japan
- Space Science: Beijing, China
- Space Science Satellites: Kanagawa, Japan
- Solar Activity: Meudon, France
- Soils: Wageningen, The Netherlands
- Sunspot Index: Brussels, Belgium
- Solar Terrestrial Physics: Boulder CO, USA; Didcot Oxon, UK; Moscow, Russia; Haymarket, Australia
- Solid Earth Geophysics: Beijing, China; Boulder CO, USA; Moscow, Russia
5/25
Why do we need Data Libraries?
- Good scientific practice
- Needed for verification of scientific work
- Good availability of data for large scale and complex scientific approaches
- [...] than reproduction
6/25
Geosciences before 1900
- Turin papyrus, ~1160 BC
- William Smith, 1815
- HMS Challenger, 1875
http://upload.wikimedia.org/wikipedia/commons/0/0b/Turine_Papyrus,_ca._1320_v.C..jpg
7/25
Technical Improvements
- ENIAC, 1944
- Magnetometer
8/25
Development of the global climate
[Figure panels: "The last 1300 years"; two further panels with time axis "Thousands of years before present"]
9/25
Information increase in empirical sciences
[Figure: curves "Publications" and "Data" plotted from 1970 to 2010 (vertical axis 0 to 30), with a "?" annotation]
10/25
Archiving and publication of scientific data
- Data acquisition
- Quality assurance
- Long-term availability and access
11/25
Long term archive
- Open access & non restricted data
  o Creative Commons license
- Data accepted from individual scientists, institutes, and science projects
- Long term funding for basic operation
  o hardware, software, system management & organisation
- Long term preservation of data
  o Technical: security, migration of media
  o Usability: preserving the integrity & semantics of data sets
12/25
Contents
13/25
Data Types in PANGAEA
[Figure: sediment core profiles for cores PS1389-3, PS1390-3, PS1431-1, PS1640-1 and PS1648-1, each with columns IRD (grav/10cm3), Sand (%), CaCO3 (%), TOC (%), Radio (%/sand) and Smect (%/clay), plotted against Age (kyr), max. 233.55 kyr]
[Figure: map (54°0' to 55°30' latitude, 11° to 15° longitude) with legend: World vector shore line, Grain size classes (KOLP A, KOLP B, KOEHN, KOEHN2), Geochemistry. Source: Baltic Sea Research Institute, Warnemünde]
- Profiles => doi:10.1594/PANGAEA.701299
- Time series => doi:10.1594/PANGAEA.323487
- Sea bed photos => doi:10.1594/PANGAEA.319877
- Distributed samples => doi:10.1594/PANGAEA.51749
- Complex data => doi:10.1594/PANGAEA.108079
- Air photos => doi:10.1594/PANGAEA.323540
- Audio record => doi:10.1594/PANGAEA.339110
14/25
Statistics (9/2010)
- Total number of data sets: ~1 million
- Data items: ~8 billion
[Figure: data sets by category: Sediment, Water, Corals, Atmosphere, Ice, unclassified]
15/25
Now the technical details :-)
16/25
PANGAEA - Architecture
[Diagram labels: Sybase ASE, RDB, Hard disk + tape (silo), Middleware, Webserver, Editorial system, PANGAEA search engine, Apache Lucene, Google Maps / Earth]
17/25
Indexing contents from relational database with dynamic updates
[Diagram labels: Data Set, Staffs, Projects, Data Series, Events, Update Log, XML Data Set Description (Metadata)]
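The slide only names the building blocks (database tables, an update log, and an XML metadata description per data set). As a minimal sketch of how an update log can drive incremental indexing, assuming a log of numbered entries and a plain dictionary standing in for the search index: the names `LogEntry`, `apply_update_log` and `load_metadata` are hypothetical and do not come from the PANGAEA code base.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class LogEntry:
    seq: int         # position in the update log, assumed strictly increasing
    dataset_id: int  # primary key of the changed data set
    action: str      # "upsert" or "delete" (hypothetical action names)


def apply_update_log(index: Dict[int, str],
                     log: Iterable[LogEntry],
                     last_seq: int,
                     load_metadata: Callable[[int], str]) -> int:
    """Replay log entries newer than last_seq and return the new position."""
    for entry in sorted(log, key=lambda e: e.seq):
        if entry.seq <= last_seq:
            continue  # already indexed in an earlier run
        if entry.action == "delete":
            index.pop(entry.dataset_id, None)
        else:
            # Re-read the metadata description and replace the old document.
            index[entry.dataset_id] = load_metadata(entry.dataset_id)
        last_seq = entry.seq
    return last_seq
```

Storing the returned position between runs means each run only touches data sets that changed since the previous one.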
18/25
Indexed Information
- Textual metadata: citation (authors, title), abstract, measurement parameters, methods, associated projects, comments, documentation, including field info for all XML schema element types
- Fulltext data set contents
- Geographical information: latitude/longitude/BBOX/track, dates, geological age, depth/elevation [NumericField/NumericRangeQuery]
- Soon: Fulltext of attached external documentation
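To illustrate the idea behind range search on numeric fields, here is a small sketch of a bounding-box lookup built from two range lookups. It swaps in a sorted list with binary search, which is not how Lucene's NumericField/NumericRangeQuery are implemented, and the class and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class NumericIndex:
    """Sorted (value, doc_id) pairs supporting inclusive range lookups."""

    def __init__(self, values_by_doc):
        self._pairs = sorted((v, d) for d, v in values_by_doc.items())
        self._keys = [v for v, _ in self._pairs]

    def range(self, lo, hi):
        # Binary search for both ends of the inclusive interval [lo, hi].
        start = bisect_left(self._keys, lo)
        end = bisect_right(self._keys, hi)
        return {d for _, d in self._pairs[start:end]}


def bbox_search(lat_index, lon_index, south, north, west, east):
    """Documents whose point lies inside the box (no date-line crossing)."""
    return lat_index.range(south, north) & lon_index.range(west, east)
```

A box that crosses the 180° meridian would need two longitude ranges; the sketch leaves that case out.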
19/25
Geo-Retrieval with Lucene
20/25
Using scored queries with KML regions as filters
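The combination of a scored query with a region filter can be sketched as follows, assuming the KML region has already been reduced to a single polygon ring of (longitude, latitude) pairs and each hit carries one point. The ray-casting test is planar and ignores the date line, and the function names and hit layout are hypothetical rather than taken from the portal.

```python
def point_in_polygon(lon, lat, ring):
    """Planar ray-casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def filter_hits(scored_hits, ring):
    """Keep hits inside the ring, best score first.

    Each hit is a tuple (score, lon, lat, doc_id).
    """
    kept = [h for h in scored_hits if point_in_polygon(h[1], h[2], ring)]
    return sorted(kept, key=lambda h: h[0], reverse=True)
```

The filter only decides which hits survive; their order still comes from the query score.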
21/25
Apache Lucene as fast Key-Value Store
- Lucene is used for almost every query on the web-client
- [...] of keyword terms indexed for quick retrieval of data sets
- Example: Lookup of datasets related to publications using DOI
- PANGAEA is hit by hundreds of DOI lookup queries per second from scientific publishers:
http://doi.pangaea.de/10.1016/0377-8398(92)90001-Z
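The key-value use described above amounts to an exact-match lookup from a keyword term (here a publication DOI) to the data sets that reference it. A minimal sketch, assuming an in-memory dictionary in place of the Lucene index: `KeywordIndex`, `normalize_doi` and the data set identifier in the usage line are hypothetical.

```python
from collections import defaultdict

# URL and scheme prefixes under which a DOI may arrive (assumed list).
_PREFIXES = ("http://dx.doi.org/", "http://doi.pangaea.de/", "doi:")


def normalize_doi(value):
    """Strip a known prefix and lower-case, since DOIs are case-insensitive."""
    v = value.strip()
    for prefix in _PREFIXES:
        if v.lower().startswith(prefix):
            v = v[len(prefix):]
            break
    return v.lower()


class KeywordIndex:
    """Maps each publication DOI to the data sets that cite it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, dataset_id, publication_dois):
        for doi in publication_dois:
            self._postings[normalize_doi(doi)].add(dataset_id)

    def lookup(self, publication_doi):
        key = normalize_doi(publication_doi)
        return sorted(self._postings.get(key, ()))
```

After `index.add("dataset-A", ["10.1016/0377-8398(92)90001-Z"])`, a request for the URL shown on the slide resolves through `index.lookup(...)` to the made-up identifier `dataset-A`.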
22/25
http://dx.doi.org/10.1016/S0377-8398(01)00044-5
http://doi.pangaea.de/10.1016/0377-8398(92)90001-Z
23/25
Live presentation
http://www.pangaea.de/
24/25
Contact
Uwe Schindler
PANGAEA - Publishing Network for Geoscientific & Environmental Data
MARUM, Leobener Str., 28359 Bremen, [email protected]
SD DataSolutions GmbH
Wätjenstr. 49, 28213 Bremen, Germany
25/25
Thank you!
Know more about Apache Lucene at www.lucidimagination.com
http://www.lucidimagination.com/events/revolution2010