How to Create the Google for Earth Data (XLDB 2015, Stanford)

XLDB CONFERENCE 2015, STANFORD UNIVERSITY MAY 2015

How To Create the Google for Earth Data

Rainer SternfeldCEO & co-founder

in the example of NOAA Big Data Project

2 MAY 2015XLDB CONFERENCE 2015, STANFORD UNIVERSITY

Finding and accessing the right data is really hard Planet OS Data Discovery makes it easy

Crawl the web and index the most recent data without moving the data itself until you need it

3 August 2014

Cloud Platform for Industrial Sensor Networks

Building with: Scala, Akka RabbitMQ, Kafka HDFS, HBase, Elasticsearch Spark, Spark Streaming GIS Libraries (raster, geometry)Compressed array formats

Sensor Data Discovery & Exchange Search, Exploration Visualisation, Analytics Data Management Marketplace APIs

marinexplore.org — the biggest deployment of Planet OS 43,000+ data streams, 35 organizations, 8,000 users

Data Discovery Raster data, heatmap overlays Access with third party applications

Raster data with quiver plot overlays Rich graph visualizations Build custom datasets

5 April 2015

Marine data website Visualising Browsing Filtering Export

Built with: Python/Cython RabbitMQ, Celery Postgres, PostGIS, VerticaGIS Libraries (raster, geometry) NumPy

marinexplore.org

How to make NOAA’s large-scale weather and climate data easily discoverable and machine-readable?

Real-time weather & climate data is a global multi-billion dollar opportunity

NOAA’s Challenge

• Tens of thousands of devices deployed in the ocean, on land, and space

• Critical for the government and industries

• Hundreds scattered web services (FTPs, flat files, THREDDS/OPeNDAP API)

• Data grows 10TB+ per day

• 26,595 NOAA datasets with ISO-19139 metadata

• 4,894 NOAA datasets with OpenDAP interface

So what’s wrong with how it’s done now?

An enthusiastic young researcher starts downloading data to an

external HDD connected to his laptop — data keeps coming,

external HDDs pile up…

the beard’s getting longer and longer, and when the data is

finally downloaded, we have a middle-aged, bored professor

Technical Challenges in the NOAA Big Data Project

• Storing, processing, and indexing spatio-temporal time-series & array data

• Processing data at 10s (100s) of TB/day

• Transporting and processing archives at volumes 10s (100s) of PB

• Disseminating real-time data at latency of minutes

• Indexing 100K+ logical datasets and 100M+ technical datasets/files

• Providing uniform API/export for various data formats/protocols/projections

Potential solutions under consideration

• Indexing spatio-temporal and semantic metadata

• Indexing downsampled remote datasets acquired via OpenDAP and others

• Store chunked array data (MBs) in object store (e.g. Amazon S3)

• Provide on-demand computational infrastructure for analyzing data (e.g. Amazon)

What would make it even better?

• Incremental data compression

• BitTorrent-like data dissemination

• Sending pre-filtered data to consumers (e.g. by area)

• Computations scheduled next to the data storage

• Fast interconnect (10 GBit / Inifiniband) and GPUs

• Run analytical scripts (eg IPython Notebook, Matlab) to

work with array data in the cloud

DATA EXCHANGE

APIsPLANET OS

ALGORITHMS

3RD PARTY INTEGRATIONS

DISCOVER

COLLABORATE

VISUALIZE

ENTERPRISE DATA

ALL PUBLIC DATA

MODELS APPS

ANALYZE

We index your world

How to Create the Google for Earth Data (XLDB 2015, Stanford)

Data & Analytics

Transcript of How to Create the Google for Earth Data (XLDB 2015, Stanford)

Building Software Systems at Google and Lessons …research.google.com/people/jeff/Stanford-DL-Nov-2010.pdfBuilding Software Systems at Google and Lessons Learned Jeff Dean jeff@google.com

1 Jacek Becla, XLDB-Europe, CERN, May 2013 LSST Database Jacek Becla.

Privacy, Discovery, and Authentication for the Internet of ...for the Internet of Things David J. Wu Stanford University Ankur Taly Google Asim Shankar Google Dan Boneh Stanford University.

Privacy, Discovery, and Authentication for the Internet of ... · Ankur Taly Google Asim Shankar Google Dan Boneh Stanford University. The Internet of Things (IoT) Lots of smart devices,

Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)

1 6/29/2015 XLDB ‘09 Luke Lonergan llonergan@greenplum.com.

Personalized Social Recommendations – Accurate or Private? A. Machanavajjhala (Yahoo!), with A. Korolova (Stanford), A. Das Sarma (Google) 1.

Jacek Becla SLAC Technology Officer for Scientific Databases XLDB Chairman

PrasadL06IndexConstruction1 Index Construction Adapted from Lectures by Prabhakar Raghavan (Google) and Christopher Manning (Stanford)

Maya R. Gupta, Google - Stanford University · Multi‐Task Averaging: Theory and Practice Maya R. Gupta, Google Research, Univ. Washington 1 Sergey Feldman Univ. Washington Bela

Proﬁting from Mark-Up - Stanford NLP Groupnlp.stanford.edu/pubs/markup-slides.pdf · Spitkovsky et al. (Stanford & Google) Proﬁting from Mark-Up ACL (2010-07-14) 4 / 33. Overview

Ruth Kedar: Google, Playing Cards, and Digital Art · The Google Logo In her last year of teaching at Stanford, Ruth Kedar was commissioned by Google to design their logo and their

XLDB South America Keynote: eScience Institute and Myria

1 HBASE – THE SCALABLE DATA STORE An Introduction to HBase XLDB Europe Workshop 2013: CERN, Geneva…

Architectures for Data Commons (XLDB 15 Lightning Talk)

Google App Engine Guido van Rossum Stanford EE380 Colloquium, Nov 5, 2008.

2008 11 14 Google Oss Stanford

PrasadL1IntroIR1 Information Retrieval Adapted from Lectures by Berthier Ribeiro-Neto (Brazil), Prabhakar Raghavan (Google and Stanford) and Christopher.

Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

SANTA CLARA, CALIFORNIA · 2017-04-17 · LinkedIn Microsoft HQ Google HQ Facebook Stanford University Hewlett Packard HQ Raytheon ... 1500 N Shoreline Legacy at 101 Google in Mo˚ett