Smoothing the ROI Curve for Scientific Data Management Applications
Bill Howe
David Maier
Laura Bright
Bill Howe, CMOP @ OGI @ OHSU 2
Motivation
“Physical scientists aren’t using databases!”
(the ones who don’t know Jim Gray)
ROI Shape as Success Indicator
[Figure: cumulative ROI versus time (months), with curves for single-release, multi-release, and continuous-release]
T = Time spent on non-science data tasks
ROI(X) = T(status quo) – T(X)
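As a rough illustration of the definition above, the sketch below accumulates ROI(X) = T(status quo) – T(X) month by month. The hour figures and the names `cumulative_roi`, `single_release`, and `continuous_release` are invented for illustration and do not come from the talk.

```python
from itertools import accumulate


def cumulative_roi(t_status_quo, t_with_x):
    """Cumulative ROI per month, where monthly ROI(X) = T(status quo) - T(X)."""
    monthly = [sq - x for sq, x in zip(t_status_quo, t_with_x)]
    return list(accumulate(monthly))


# Hypothetical hours per month spent on non-science data tasks.
status_quo = [40, 40, 40, 40]
single_release = [60, 60, 60, 10]      # heavy up-front cost, payoff only at the end
continuous_release = [38, 34, 30, 26]  # small savings starting in the first month

roi_single = cumulative_roi(status_quo, single_release)
roi_continuous = cumulative_roi(status_quo, continuous_release)
```

With these made-up numbers the single-release curve stays negative for all four months, while the continuous-release curve is positive from the first month, which is the "smooth" shape the slide argues for.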
Ironing the ROI Curve
Rubrics:
Pay-as-you-go (“earn as you learn”?)
Let many flowers blossom
• Postpone or obviate selection between competing solutions
Specialize to the current instance
• “Extreme schema design”
Strive for zero configuration
• Don’t replace simple programming with complex configuration
Operate on in-situ data
• Let them keep their files, at least initially
Goal: Transformative services … by 5:00 pm
Example: Environmental Observation and Forecasting System
Downloaded forcings: Atmosphere, River, Global Ocean
Observations via Sensor Networks
Circulation Models
Data Products
1M files; some DBs
• Datasets
• Scripts
• Data products
• Configuration files
• Log files
• Annotations
…/anim-sal_estuary_7.gif
Harvesting (Prop,Val) pairs
7.5M triples describing 1M files
path                        prop      value
…/anim-sal_estuary_7.gif    variable  salt     (Variable = “salt”)
…/anim-sal_estuary_7.gif    type      anim     (Type = “Animation”)
…/anim-sal_estuary_7.gif    region    estuary  (Region = “Estuary”)
…/anim-sal_estuary_7.gif    depth     7        (Depth = “7”)
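A collection script of this kind might look roughly like the sketch below. The file-name convention `<type>-<variable>_<region>_<depth>.<ext>`, the abbreviation table `VARIABLES`, and the function name `harvest` are assumptions made for illustration; the slides show only the resulting triples, not the script that produced them.

```python
import re

# Assumed naming convention: <type>-<variable>_<region>_<depth>.<ext>
FILENAME = re.compile(
    r"(?P<type>[^-_]+)-(?P<variable>[^_]+)_(?P<region>[^_]+)_(?P<depth>[^_.]+)\.\w+$"
)

# Hypothetical lookup that expands abbreviated variable names.
VARIABLES = {"sal": "salt"}


def harvest(path):
    """Return (path, prop, value) triples parsed from the file name, or []."""
    name = path.rsplit("/", 1)[-1]
    match = FILENAME.match(name)
    if match is None:
        return []  # file does not follow the assumed convention
    fields = match.groupdict()
    fields["variable"] = VARIABLES.get(fields["variable"], fields["variable"])
    return [(path, prop, value) for prop, value in fields.items()]


triples = harvest("…/anim-sal_estuary_7.gif")
```

The script only reads the path, so the file itself stays where it is, in line with the in-situ rubric.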
Example: Quarry
[Slides 7 to 11, “Example: Quarry” through “Example: Quarry (5)”: figures only]
Quarry: Summary
Browse-oriented rather than query-oriented
• narrow API (GetProperties, GetValues, a few others)
• interactive performance
No time for thorough schema design; data owners just write scripts emitting (resource, prop, value) triples
Derive a schema automatically
• Simple API insulates apps from this dynamic schema
specialize to the current instance
near-zero configuration
pay-as-you-go
in situ data
Experimental Results: Queries
3.6M triples
606k resources
149 signatures
Example: Foreman
~20 daily forecasts of coastal regions worldwide; expected to grow to 100+
“Factory” metaphor for managing the daily runs
Harvest existing log files
Permute existing inputs to add value
zero configuration
in situ data
let many flowers blossom
Bright, Maier, CIDR 2005
Bright, Maier, SSDBM 2005
Bright, Maier, Howe, SciFlow 2006
Foreman
[Figure annotations: number of timesteps doubles; cascading delays; “?”]
Other Examples
Incremental deployment of an algebra for simulation results
Automatically generated access methods for ad hoc file formats
Howe, Maier, Data Eng. Bulletin 2004
Howe, Maier, SSDBM 2005
Howe, Maier, VLDB 2004
Howe, Maier, VLDB Journal 2005
Acknowledgements
Thanks to Antonio Baptista and Paul Turner
http://www.stccmop.org
Foreman Screenshot
Experimental Results
Yet Another RDF Store (YARS)
Several B-Tree indexes:
• rpv _, pv r, vr p, etc.
Authors report good performance against Redland and Sesame
• ~3M triples, single-term queries
We investigate simple multi-term queries:
?s <p0> <o0>
?s <p1> <o1>
:
?s <pn> <on>
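To make the query shape concrete, here is a minimal in-memory sketch of a multi-term query: every term shares the subject variable ?s, so the answer is the intersection of the subject sets matching each (p, o) pair. The dictionary index stands in for a B-Tree index and is not how YARS or Quarry is implemented; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def build_pv_index(triples):
    """Map each (property, value) pair to the set of subjects carrying it."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(p, o)].add(s)
    return index


def multi_term_query(index, terms):
    """Return the subjects ?s that satisfy every (p, o) term."""
    if not terms:
        return set()
    # Start from the smallest candidate set to keep intersections cheap.
    candidates = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(candidates[0])
    for subjects in candidates[1:]:
        result &= subjects
    return result


# Hypothetical sample data.
sample = [
    ("a.gif", "type", "anim"), ("a.gif", "region", "estuary"),
    ("b.gif", "type", "anim"), ("b.gif", "region", "plume"),
]
index = build_pv_index(sample)
hits = multi_term_query(index, [("type", "anim"), ("region", "estuary")])
```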
Quarry Architecture
Components: filesystem, website
1. Collection scripts
2. triples
3. db
4. derive schema
5. publish
6. query and browse via signatures
A Narrower Interface
Specialized schema over the filesystem:
• SQL statements
• Database APIs
• Load Strategies
• Data formats/models
Generic schema over the filesystem:
• RDF triples
• Collection scripts
Computing Signatures
Input triples (resource, property, value):
r0 p0 v(0,0)
r2 p1 v(2,1)
r0 p2 v(0,2)
r0 p1 v(0,1)
r1 p3 v(1,3)
r1 p1 v(1,1)
r2 p3 v(2,3)
After External Sort and Nest (resource; properties; values; signature hash):
r0; p0, p1, p2; v(0,0), v(0,1), v(0,2); hash(S0)
r1; p1, p3; v(1,1), v(1,3); hash(S1)
r2; p1, p3; v(2,1), v(2,3); hash(S2)
Computing Signatures
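Nested rows (resource; properties; values; signature hash):
r0; p0, p1, p2; v(0,0), v(0,1), v(0,2); hash(S0)
r1; p1, p3; v(1,1), v(1,3); hash(S1)
r2; p1, p3; v(2,1), v(2,3); hash(S1)
Table “signatures” (sighash, signature):
hash(S0); p0, p1, p2
hash(S1); p1, p3
Table for hash(S0) (rsrc, p0, p1, p2):
r0; v(0,0); v(0,1); v(0,2)
Table for hash(S1) (rsrc, p1, p3):
r1; v(1,1); v(1,3)
r2; v(2,1); v(2,3)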
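The sort, nest, and hash steps on these two slides can be sketched as follows. This is an in-memory approximation under stated assumptions: `sorted` stands in for the external sort, SHA-1 is an arbitrary choice of hash, each resource is assumed to have at most one value per property, and the name `compute_signatures` is hypothetical.

```python
import hashlib
from itertools import groupby
from operator import itemgetter


def compute_signatures(triples):
    """Group triples by resource and bucket resources by their property set."""
    # Stand-in for the external sort: order by (resource, property).
    ordered = sorted(triples, key=itemgetter(0, 1))
    signatures = {}  # sighash -> tuple of property names
    tables = {}      # sighash -> rows of (rsrc, value, value, ...)
    # Nest: one group per resource.
    for rsrc, group in groupby(ordered, key=itemgetter(0)):
        rows = list(group)
        props = tuple(p for _, p, _ in rows)
        values = tuple(v for _, _, v in rows)
        sighash = hashlib.sha1("|".join(props).encode()).hexdigest()
        signatures[sighash] = props
        tables.setdefault(sighash, []).append((rsrc,) + values)
    return signatures, tables


triples = [
    ("r0", "p0", "v(0,0)"), ("r2", "p1", "v(2,1)"), ("r0", "p2", "v(0,2)"),
    ("r0", "p1", "v(0,1)"), ("r1", "p3", "v(1,3)"), ("r1", "p1", "v(1,1)"),
    ("r2", "p3", "v(2,3)"),
]
signatures, tables = compute_signatures(triples)
```

Resources r1 and r2 share the property set (p1, p3), so they land in the same table, which is how a small number of signatures can describe a large number of resources.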
Quarry API: Canonical Application
[Figure: browse tree alternating property (p) and value (v) levels]
• root: all unique properties
• p: all unique values of parent property
• v: all properties of resources satisfying p=v
Every path from a root represents a conjunctive query
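One way to picture this browsing pattern is the small sketch below. The slides name the calls GetProperties and GetValues but do not give their signatures, so the class `TripleBrowser`, its method parameters, and the sample data are all assumptions; the linear scans are for clarity only and ignore the signature tables a real implementation would use.

```python
class TripleBrowser:
    """Hypothetical browse API over (resource, prop, value) triples."""

    def __init__(self, triples):
        self.triples = list(triples)

    def _matching(self, conditions):
        # Resources that satisfy every (prop, value) condition on the path.
        resources = {r for r, _, _ in self.triples}
        for prop, value in conditions:
            resources &= {r for r, p, v in self.triples if p == prop and v == value}
        return resources

    def get_properties(self, conditions=()):
        """All unique properties of resources satisfying the conditions."""
        matched = self._matching(conditions)
        return sorted({p for r, p, _ in self.triples if r in matched})

    def get_values(self, prop, conditions=()):
        """All unique values of prop among resources satisfying the conditions."""
        matched = self._matching(conditions)
        return sorted({v for r, p, v in self.triples if r in matched and p == prop})


# Hypothetical sample data.
browser = TripleBrowser([
    ("a.gif", "type", "anim"), ("a.gif", "variable", "salt"),
    ("a.gif", "region", "estuary"),
    ("b.gif", "type", "anim"), ("b.gif", "variable", "temp"),
    ("c.txt", "type", "log"),
])
root = browser.get_properties()
types = browser.get_values("type")
under_anim = browser.get_values("variable", [("type", "anim")])
```

Each call extends the path by one step, so a path such as type=anim followed by variable=salt corresponds to the conjunctive query over both conditions.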