Findability Challenge ::Taormina :: 2012-may-10 EarthServer 1
EU FP7-INFRA
EarthServer
grant #283610
Towards Comprehensive Big Data Support: the EarthServer Project
Findability Challenge Workshop Taormina, 2012-may-10
Peter Baumann
Jacobs University | rasdaman GmbH
Big Data Research
@ Jacobs U
Jacobs University:
international, multi-cultural
◦ 110 nations, English as official language
Large-Scale Scientific Information
Systems research group
◦ large-scale n-D raster services & beyond:
theory, practice, application, standardization
◦ Main outcomes:
rasdaman: n-D Array DBMS
Standards
◦ www.jacobs-university.de/lsis
Hiring
Roadmap
Motivation
Arrays in Databases
EarthServer
Conclusion
Climate Modelling
Example: ECHAM T42
◦ 50+ physical parameters („variables“):
temperature, wind speed x/y, humidity,
pressure, CO2, ...
◦ 2.5 TB per variable
DKRZ: 24-node NEC SX-6
dimension                       extent
Longitude                       128
Latitude                        64
Elevation                       17
time (24 min per time slice)    2,190,000 (200 years)
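As a rough plausibility check of the "2.5 TB per variable" figure, the extents in the table can simply be multiplied out. The cell size of 8 bytes (double precision) is an assumption; the slide does not state the cell type.

```python
# Back-of-the-envelope size of one ECHAM T42 variable, using the extents
# from the table above. BYTES_PER_CELL = 8 is an assumption (double
# precision); the slide does not give the cell type.
EXTENTS = {"longitude": 128, "latitude": 64, "elevation": 17, "time": 2_190_000}
BYTES_PER_CELL = 8


def cell_count(extents):
    """Multiply the extents of all dimensions."""
    n = 1
    for extent in extents.values():
        n *= extent
    return n


cells = cell_count(EXTENTS)
size_tb = cells * BYTES_PER_CELL / 1e12  # decimal terabytes
print(f"{cells:,} cells, about {size_tb:.2f} TB per variable")
```

Under that assumption the result lands in the same range as the figure quoted on the slide.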
„Even with multi-terabyte local disk subsystems
and multi-petabyte archives,
I/O can become a bottleneck in HPC.“
-- Jeanette Jenness,
LLNL, ASCI-Project, 1998
[MPIM]
The Geo Data Tsunami
sensors feeds
[OGC]
The Geo Data Tsunami
[OGC]
sensors feeds
coverage
server
Semantic Web
The Challenge for Data Centers
Serving data is not enough
trend: service quality as differentiating criterion
transition from data stewardship to service stewardship
Specifically with high-volume data, what can this mean?
Motivational Example: EO Image Time Series Archive
[DFD-DLR, Diederich et al]
Human Brain Imaging
goal: understand structural-functional relations in human brain
Experiments capture activity patterns (PET, fMRI)
◦ Temperature, electrical, oxygen consumption, ...
◦ lots of computations „activation maps“
Example: „a parasagittal view of all scans containing
critical Hippocampus activations, TIFF-coded.“
select tiff( ht[ $1, *:*, *:* ] )
from HeadTomograms as ht,
Hippocampus as mask
where count_cells( ht > $2 and mask )
/ count_cells( mask )
> $3
$1 = slicing position, $2 = intensity threshold value, $3 = confidence
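To make the selection condition of the brain imaging query concrete, here is a minimal plain-Python sketch of what the where clause computes for a single scan. Nested lists stand in for 3-D arrays; the helper names (`flatten`, `is_critical`, `slice_first_axis`) and the toy data are illustrative assumptions, not part of rasdaman.

```python
def count_cells(cells):
    """Count the true cells of a flat iterable of booleans."""
    return sum(1 for c in cells if c)


def flatten(volume):
    """Iterate over all cells of a 3-D nested list."""
    for plane in volume:
        for row in plane:
            yield from row


def is_critical(ht, mask, threshold, confidence):
    """count_cells( ht > $2 and mask ) / count_cells( mask ) > $3"""
    h, m = list(flatten(ht)), list(flatten(mask))
    active = count_cells(v > threshold and inside for v, inside in zip(h, m))
    region = count_cells(m)
    return region > 0 and active / region > confidence


def slice_first_axis(ht, pos):
    """ht[ $1, *:*, *:* ]: fix the first axis, keep the other two."""
    return ht[pos]


# Toy scan (2 x 2 x 2) and a boolean mask of the same shape.
ht = [[[10, 200], [220, 30]], [[5, 5], [250, 240]]]
mask = [[[False, True], [True, False]], [[False, False], [True, True]]]
print(is_critical(ht, mask, threshold=150, confidence=0.5))
print(slice_first_axis(ht, 1))
```

Only scans for which the condition holds would be sliced and encoded, which is why the query result can be far smaller than the archive it runs over.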
It's Not Always About Big Return:
Reverse Lookup Scenario
3-D volumes
1-D time series
2-D imagery
2-D tables
„all clinical trials of drug X
where patient temperature > 40 °C
within the first 48 hours.“
Who Needs Array Databases?
Sensor, image, statistics data:
◦ Life Science: Pharma/chem, healthcare / bio research, bio statistics,
genetics
◦ Geo: Geodesy, geology, hydrology, oceanography, meteorology, earth
system research, ...
◦ Space: optical astronomy, radio astronomy, cosmological simulation
◦ Engineering & research: Simulation & experimental data in
automotive/shipbuilding/ aerospace industry, turbines, process industry,
astronomy, experimental physics, high energy physics, ...
◦ Management/Controlling: Decision Support, OLAP, Data Warehousing,
census, statistics in industry and public administration, ...
Why Array Databases?
„classical“ database benefits for raster data:
◦ data integration
◦ flexibility
◦ scalability
◦ ...plus all further assets,
like off-the-shelf tool support
Unfortunately database people have been soooo conservative
◦ "images are matrices [...] which are stored as byte strings, ie, BLOBs"
◦ „this is NOT SQL!“
[Figure: App_1 ... App_n, App-Server, DBMS Server]
Array Analytics
Array Analytics := Efficient analysis on multi-dimensional arrays
of a size several orders of magnitude above main memory of
evaluation engine
◦ Array := n-D sequence (cf programming languages) := raster
◦ Cf 1st Workshop on Array Databases, Uppsala 2011
www.rasdaman.com/ArrayDatabases_Workshop
Earth, Space, Life, Social sciences; business (OLAP!)
Research issues:
◦ Concepts: modeling, access interfaces (QLs!), ...
◦ Architecture: storage, processing, optimization, ...
◦ Scalability, usability, applications, standards, ...
Related (DB) Work
Precursors
◦ Image partitioning, API access library [Tamura 1980]
◦ Fixed set of imaging operators [Chang, Fu 1980; Stucky, Menzi 1989; Neumann et al 1992]
◦ PICDMS [Chock, Cardenas 1984]
Algebra & models
◦ rasdaman model & algebra [Baumann 1991]
◦ „Call to order“ [Maier 1993]
◦ AQL [Libkin, Machlin 1996]
◦ AML [Marathe, Salem 2002]
Components & systems
◦ rasdaman [Baumann+ 1992+]
◦ tertiary storage for arrays [Sarawagi, Stonebraker 1994]
◦ ESRI ArcSDE, Oracle GeoRaster [~2004]
◦ TerraLib [Camara et al, 2003]
◦ MonetDB: RAM, SciQL [~2004]
◦ PostGIS Raster [~2007]
◦ SciDB [~2008]
The rasdaman Array DBMS
C/S Array DBMS for massive n-D raster data
◦ new attribute type:
array<celltype,extent>
◦ In operational use on dozen-TB objects
rasql = declarative array query language
◦ select img.green[x0:x1,y0:y1] > 130
from LandsatArchive as img
◦ algebraic foundation
Tile-based architecture
◦ n-D array → set of n-D tiles → DB BLOBs
◦ evaluation based on “tile streaming”
◦ extensive storage & query optimization
[Figure: table ATable with metadata columns att 1, att 2, ..., att n and an array column holding OIDs; rows key1 ... oid 1, key2 ... oid 2, key3 ... oid 3 reference stored arrays oid 1 to oid 5]
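The tile-based evaluation mentioned on this slide can be pictured with a small sketch: the array is cut into tiles, and an aggregate is computed by visiting one tile at a time so that the whole array never has to sit in memory. This is an illustration of the general idea only; the function names and the tile size are assumptions, and rasdaman's actual tiling and streaming machinery is far more elaborate.

```python
def tiles(array, tile_h, tile_w):
    """Yield (row0, col0, tile) pieces covering a 2-D nested-list array."""
    for r in range(0, len(array), tile_h):
        for c in range(0, len(array[0]), tile_w):
            yield r, c, [row[c:c + tile_w] for row in array[r:r + tile_h]]


def count_above(array, threshold, tile_h=2, tile_w=2):
    """Aggregate tile by tile; only one tile is materialized at a time."""
    total = 0
    for _, _, tile in tiles(array, tile_h, tile_w):
        total += sum(1 for row in tile for v in row if v > threshold)
    return total


# Toy "green" band; 130 is the threshold used in the query on the slide.
green = [[120, 140, 90, 200],
         [131, 130, 10, 255],
         [0, 135, 129, 131]]
print(count_above(green, 130))
```

The result does not depend on the tile size, which is what gives the storage layer freedom to choose a tiling that fits the expected access pattern.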
Some Array Query Operators
trimming & slicing
select c.img[ *:*, 100:200, *:*, 42 ]
from ClimateSimulations as c
result processing
select img * (img.green > 130)
from LandsatArchive as img
search & aggregation
select mri.img
from MRI as mri, masks as m
where some_cells( mri.img > 250 and m.img )
data format conversion
select png( c[ *:*, *:*, 100, 42 ] )
from ClimateSimulations as c
[Figure: rasdaman DB with HDF, PNG, NetCDF as input and output formats]
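For readers new to the subsetting operators, the following plain-Python sketch mimics trimming, slicing, and an induced (cell-wise) operation on a small 2-D array. Interval bounds are inclusive, as in rasql; the function names are illustrative assumptions and not a rasdaman API.

```python
def trim(img, rows, cols):
    """img[r0:r1, c0:c1]: keep both axes; bounds are inclusive as in rasql."""
    (r0, r1), (c0, c1) = rows, cols
    return [row[c0:c1 + 1] for row in img[r0:r1 + 1]]


def slice_col(img, c):
    """img[*:*, c]: fix the second axis, so the result is 1-D."""
    return [row[c] for row in img]


def mask_product(img, threshold):
    """img * (img > threshold): cells failing the test become 0."""
    return [[v * (v > threshold) for v in row] for row in img]


img = [[10 * r + c for c in range(4)] for r in range(3)]
print(trim(img, (0, 1), (1, 2)))   # [[1, 2], [11, 12]]
print(slice_col(img, 2))           # [2, 12, 22]
print(mask_product(img, 10)[1])    # [0, 11, 12, 13]
```

Trimming keeps the dimensionality and slicing reduces it by one, which is the same distinction the WCS Core makes with subset = trim | slice.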
Examples
Histogram
◦ select marray bucket in [0:255]
values count_cells( a = bucket )
from a
Matrix multiplication
◦ select marray x in sdom(a)[0], y in sdom(b)[1]
values condense +
over z in sdom(a)[1]
using a[x,z] * b[z,y]
from a, b
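The two examples read naturally as nested loops: marray builds a new array cell by cell, and condense folds an expression over an index range. A minimal Python rendering of both, with nested lists standing in for arrays (the function names are illustrative):

```python
def histogram(a, lo=0, hi=255):
    """marray bucket in [lo:hi] values count_cells( a = bucket )"""
    flat = [v for row in a for v in row]
    return [sum(1 for v in flat if v == bucket) for bucket in range(lo, hi + 1)]


def matmul(a, b):
    """marray x, y values condense + over z using a[x,z] * b[z,y]"""
    return [[sum(a[x][z] * b[z][y] for z in range(len(a[0])))
             for y in range(len(b[0]))]
            for x in range(len(a))]


print(histogram([[0, 1], [1, 3]], 0, 3))          # [1, 2, 0, 1]
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The declarative form leaves the loop order and the tiling to the optimizer, which the explicit loops above do not.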
N-D Sensor, Image, and Statistics Queries
„Raster SQL“: navigation, extraction, aggregation, analysis
Time series
Image processing
Summary data
Sensor fusion & pattern mining
Query Processing: Overview
Parsing
Normalisation
Optimization
◦ Common subexpression elimination
[Generate query plan]
Tile-based evaluation
select a < avg_cells( b + c )
from a, b, c
[Figure: operator tree of the query: <ind( a, avg( +ind( b, c ) ) )]
Optimization/1: Query Rewriting
understood:
heuristic optimization – 150 rules
in rasdaman [Ritsch 2002]
partially understood:
cost-based optimization
select avg_cells( a + b )
from a, b
≡
select avg_cells( a )
+ avg_cells( b )
from a, b
[Figure: operator trees of the two queries. avg over +ind( a, b ): tile stream, high traffic. avg( a ) + avg( b ): scalar stream, low traffic.]
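The rewriting rule rests on the linearity of the average over equally shaped arrays. A small check in plain Python, using exact fractions so that the comparison is not blurred by floating-point rounding (the arrays are made-up toy data):

```python
from fractions import Fraction


def avg_cells(arr):
    """Average over all cells of a 2-D nested list, as an exact fraction."""
    flat = [Fraction(v) for row in arr for v in row]
    return sum(flat) / len(flat)


def add_ind(a, b):
    """Induced (cell-wise) addition of two equally shaped arrays."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 41]]
lhs = avg_cells(add_ind(a, b))      # a full array flows into avg
rhs = avg_cells(a) + avg_cells(b)   # only two scalars flow into +
print(lhs, rhs)
```

With floating-point cells the two forms can differ in the last bits, so an optimizer applying the rule trades exact reproducibility for much lower data traffic.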
Optimization/2: Just-In-Time Compilation
Observation: interpreted
mode slows down
Approach:
• cluster suitable operations
• compile & dynamically bind
Benefit:
• Speed up complex, repeated operations
Variation:
• compile code for GPU
[Chart: times [ms] for 512^2 * n ops] [Jucovschi, Stancu-Mara 2008]
select x*x*...*x
from float_matrix as x
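The approach can be imitated in a few lines of Python: instead of evaluating x*x*...*x one operator at a time with an intermediate array per step, the chain is clustered into a single cell-level expression, compiled once, and bound as a callable. This is only an analogy using Python's built-in `compile`/`exec`; it is not the code generator of [Jucovschi, Stancu-Mara 2008], and the function names are assumptions.

```python
def interpreted_power(x, n):
    """Evaluate x*x*...*x (n factors, n >= 1) operator by operator,
    materializing an intermediate array after every multiplication."""
    result = x
    for _ in range(n - 1):
        result = [[r * v for r, v in zip(res_row, x_row)]
                  for res_row, x_row in zip(result, x)]
    return result


def compile_power(n):
    """Cluster the n-1 multiplications into one cell-level expression,
    compile it once, and bind it as a callable."""
    expr = "*".join(["v"] * n)
    source = f"def fused(x):\n    return [[{expr} for v in row] for row in x]\n"
    namespace = {}
    exec(compile(source, "<fused>", "exec"), namespace)
    return namespace["fused"]


x = [[1, 2], [3, 4]]
cube = compile_power(3)
print(cube(x))                  # [[1, 8], [27, 64]]
print(interpreted_power(x, 3))  # same result, two intermediate arrays
```

The compiled function pays off when the same operation chain is applied repeatedly, since the compilation cost is incurred only once.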
Optimization Does Pay Off!
Ex: real-time WMS zoom/pan/styling
◦ 1 background, 1 bathymetry, 3*RGB
◦ www.earthlook.org
SELECT png(
(marray x in [0:399,0:399] values {255c,255c,255c})
overlay
((scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) < -1300)*{0c,0c,0c}
+(-1300.000000< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1290)*{219c,0c,172c}
+(-1289.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1282)*{209c,0c,178c}
+(-1281.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1275)*{199c,0c,186c}
+(-1274.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1272.5)*{186c,0c,189c}
+(-1272.499999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1271)*{174c,0c,194c}
+(-1270.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1270.5)*{162c,0c,199c}
+(-1270.499999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1270)*{150c,0c,204c}
+(-1269.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1269.5)*{139c,0c,209c}
+(-1269.499999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1269)*{125c,0c,214c}
+(-1268.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1268.5)*{110c,0c,219c}
+(-1268.499999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1268)*{98c,0c,227c}
+(-1267.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1267.5)*{77c,0c,232c}
+(-1267.499999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1267)*{59c,0c,237c}
+(-1266.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1266.5)*{28c,0c,242c}
+(-1266.499999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1266)*{16c,12c,245c}
+(-1265.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1265.5)*{36c,32c,247c}
+(-1265.499999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1265)*{47c,47c,247c}
+(-1264.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1264.5)*{60c,63c,250c}
+(-1264.499999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1264)*{67c,74c,250c}
+(-1263.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1263.5)*{76c,90c,252c}
+(-1263.499999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1263)*{83c,103c,252c}
+(-1262.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1262)*{88c,116c,252c}
+(-1261.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1261)*{93c,128c,252c}
+(-1260.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1260)*{99c,144c,255c}
+(-1259.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1257)*{102c,158c,255c}
+(-1256.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1250)*{107c,174c,255c}
+(-1249.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1240)*{110c,190c,255c}
+(-1239.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -1230)*{112c,205c,255c}
+(-1229.999999< scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) and scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ) <= -126.5)*{115c,220c,255c}
+ (-126.5 < scale( extend( img0[269:349,0:65], [269:395,-60:65] ), [0:399,0:399] ))*{255c,255c,255c})
overlay (scale( extend( img2[124:468,0:578], [124:717,-14:578] ), [0:399,0:399] ))
overlay (scale( extend( img3[11375:11578,0:120], [11375:11968,-473:120] ), [0:399,0:399] )) )
FROM Hakon_Bathy AS img0, Hakoon_Dive1_8 AS img1, Hakoon_Dive2_8 AS img2, Hakoon_Dive2b_8 AS img3
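The query repeats the same scale( extend( ... ) ) subexpression for every colour class, which is exactly the situation common subexpression elimination targets: evaluate once, reuse everywhere. The sketch below mimics that on a shortened colour table. `expensive_scale` is a stand-in that only counts its invocations, and the class test "v <= bound" is a simplification of the interval tests in the query; both are assumptions for illustration.

```python
import bisect

calls = 0


def expensive_scale(img):
    """Stand-in for scale( extend( img0[...] ), ... ); counts invocations."""
    global calls
    calls += 1
    return img


# (upper bound, RGB) pairs; a value falls into the first class with v <= bound.
# Shortened, illustrative version of the colour table in the query.
CLASSES = [(-1300, (0, 0, 0)), (-1290, (219, 0, 172)), (-1282, (209, 0, 178))]
DEFAULT = (255, 255, 255)


def colourize(img):
    scaled = expensive_scale(img)      # evaluated once, then shared
    bounds = [bound for bound, _ in CLASSES]
    out = []
    for row in scaled:
        out_row = []
        for v in row:
            i = bisect.bisect_left(bounds, v)
            out_row.append(CLASSES[i][1] if i < len(CLASSES) else DEFAULT)
        out.append(out_row)
    return out


result = colourize([[-1400, -1295], [-1285, -1000]])
print(result, calls)
```

Without sharing, the stand-in would be invoked once or twice per colour class, as the subexpression is written out in the query text.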
EarthServer: Big Earth Data Analytics
Mission: to enable standards-based ad-hoc analytics on
the Web for Earth science data
◦ scalable to Petabyte/Exabyte volumes
◦ directly manipulate, analyze & remix any-size geospatial data
Core idea: integrated query language for all spatio-
temporal coverage data
◦ standards based
◦ Server + clients
Starting point: the rasdaman Array DBMS
EU FP7-INFRA: operational services by end of project
EarthServer Technical Approach
Starting point: rasdaman
Core DB Challenges:
◦ query distribution
◦ cloud virtualization
◦ In-situ databases
Clients, clients, clients!
◦ from mobile devices
to Web GIS
to high-end immersive VR
◦ Challenge: how to map query paradigm to user interaction?
Standards
◦ WCS & WCPS + XQuery
Standards Perspective: Open Geospatial Consortium
“coverages” based on ISO 19123, GML [OGC 09-146]
Web Coverage Service (WCS) [OGC 09-110] Core:
subset = trim | slice
further functionality in extension standards
Web Coverage Processing Service (WCPS) [OGC 08-068r2]
◦ Declarative geo raster processing language
WCPS By Example
"From MODIS scenes M1, M2, and M3, the absolute of the
difference between red and nir, in HDF-EOS"
for $c in ( M1, M2, M3 )
return
encode(
abs( $c.red - $c.nir ),
"hdf"
)
(hdfA,
hdfB,
hdfC)
WCPS By Example
"From MODIS scenes M1, M2, and M3, the absolute of the
difference between red and nir, in HDF-EOS"
◦ …but only those where nir exceeds 127 somewhere
for $c in ( M1, M2, M3 )
where
some( $c.nir > 127 )
return
encode(
abs( $c.red - $c.nir ),
"hdf"
)
(hdfA,
hdfC)
WCPS By Example
"From MODIS scenes M1, M2, and M3, the absolute of the
difference between red and nir, in HDF-EOS"
◦ …but only those where nir exceeds 127 somewhere
◦ …inside region R
for $c in ( M1, M2, M3 ),
$r in ( R )
where
some( $c.nir > 127 and $r )
return
encode(
abs( $c.red - $c.nir ),
"hdf"
)
(hdfA)
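The for / where / return structure of the three WCPS examples maps onto an ordinary loop with a filter. Below is a minimal Python sketch of the last variant (threshold inside region R), with nested lists standing in for coverages; the encoding step is left out and the toy band values are made up.

```python
def some(cells):
    """WCPS some(): true if at least one cell is true."""
    return any(v for row in cells for v in row)


def evaluate(coverages, region):
    """for $c in coverages, $r in ( region )
       where some( $c.nir > 127 and $r )
       return abs( $c.red - $c.nir )      (encode() omitted)"""
    results = {}
    for name, c in coverages.items():
        hit = [[n > 127 and inside for n, inside in zip(nir_row, mask_row)]
               for nir_row, mask_row in zip(c["nir"], region)]
        if some(hit):
            results[name] = [[abs(r - n) for r, n in zip(red_row, nir_row)]
                             for red_row, nir_row in zip(c["red"], c["nir"])]
    return results


scenes = {
    "M1": {"red": [[50, 60]], "nir": [[200, 10]]},
    "M2": {"red": [[50, 60]], "nir": [[100, 90]]},
    "M3": {"red": [[50, 60]], "nir": [[10, 130]]},
}
R = [[True, False]]
print(evaluate(scenes, R))
```

With this toy data only M1 survives the filter: M2 never exceeds 127, and M3 does so only outside R, mirroring how the result list on the slides shrinks from three items to one.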
EarthServer Lighthouse Applications
Each 100+ TB ultimately
front-end to existing archives - no new archives
Partners
Activity leaders:
Conclusion
Core Issues in Earth science information services:
◦ Ad-hoc and „long-tail“ analysis, best near-realtime
◦ Unified handling of data & metadata
Large array support with „processing & filtering“
◦ Fusion: different data types, different locations; database or in-situ
◦ Scalability: query complexity, data complexity, data volume, users
EarthServer vision: „mix & match“ multi-source, any-size
geo data
◦ integrated data/metadata QL;
scalability & query distribution; standards
◦ Operational services for all Earth science
domains
◦ www.earthserver.eu
[Dali]