1 6/29/2015 XLDB ‘09 Luke Lonergan llonergan@greenplum.com.



04/18/23 1

XLDB ‘09

Luke Lonergan
llonergan@greenplum.com

“Big” numbers for GP today

• 70K/day – Query Rate
• 6.5PB – Dataset Size
• +100GB/s – Analysis Rate
• +3GB/s – Net Loading Rate
• 100,000/s – Transaction Rate
• 56 TB / kW, 1.6 GB/s/kW – Power Rate
• 100s – Number of Data/Compute nodes


Things I’ve Heard

• Tiered computing
  – Organizational / Political / Geographic boundaries require it

• Metadata computing for HEP
  – “10TB sounds small but it’s not easy”

• Processing for Radio Astronomy, HEP
  – Data intensive computing
  – Requires an efficient pipeline from raw to consumables


Thoughts

• A lot of plumbing! Moving data around, pipeline processing
  – Core engine should do this so the plumbing isn’t done over and over

• Need for specialized access methods and storage classes

• “Computing in data” is key to success
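One way to picture "computing in data" is a minimal Python sketch in which each node aggregates its own rows and only the small partial results are merged centrally. The `segments` data, the function names, and the three-node layout are all hypothetical and chosen for illustration; this is not Greenplum's engine or API.

```python
from collections import Counter

# Hypothetical rows, already spread across three data/compute nodes.
segments = [
    [("radio", 3), ("hep", 5)],
    [("hep", 2), ("radio", 1)],
    [("radio", 4)],
]

def local_aggregate(rows):
    """Runs where the rows live; returns one partial sum per key."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial

def merge_partials(partials):
    """Combines per-node partial sums; only these would cross the network."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return dict(total)

totals = merge_partials(local_aggregate(rows) for rows in segments)
print(totals)
```

The raw rows never leave their node in this sketch; only one small dictionary per node is shipped, which is the point of pushing the computation to the data instead of moving the data to the computation.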


GP Basic Features

• Access Methods
  – Compression, Column Store, Heap Store, External Tables, Indexes (GIST, GIN, Rtree, Bitmap, B-Tree, …)
  – Network Ingest / Export directly into parallel pipeline
  – Logical Partitioning by Range, List

• Parallel Programming Languages
  – SQL 2003 with Analytics
  – Map Reduce in Perl, Python, C, SQL, …
  – PL/R, Python, Perl, C, pgSQL, SQL, …
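As a rough illustration of logical partitioning by range and by list, the Python sketch below routes a value to a partition under each scheme. The function names, the year bounds, and the region lists are assumptions made for this example; it does not reproduce Greenplum's partitioning syntax or internals.

```python
import bisect

def range_partition(value, upper_bounds):
    """Return the index of the range partition that holds `value`.

    `upper_bounds` is a sorted list of exclusive upper bounds, so
    partition i holds upper_bounds[i-1] <= value < upper_bounds[i].
    """
    idx = bisect.bisect_right(upper_bounds, value)
    if idx == len(upper_bounds):
        raise ValueError(f"{value!r} is not below any range bound")
    return idx

def list_partition(value, value_lists):
    """Return the name of the list partition whose value set contains `value`."""
    for name, members in value_lists.items():
        if value in members:
            return name
    raise ValueError(f"{value!r} is not in any list partition")

# Hypothetical layouts: yearly ranges and region lists.
year_bounds = [2008, 2009, 2010]  # partitions: <2008, 2008, 2009
regions = {"west": {"CA", "OR"}, "east": {"NY", "MA"}}

print(range_partition(2008, year_bounds))
print(list_partition("CA", regions))
```

Range partitioning suits ordered keys such as dates, where a query over one period touches only the matching partitions; list partitioning suits small sets of discrete values such as regions.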


From Enterprise Data Clouds

• Elastic / adaptive infrastructure for data warehousing and analytics

– IT Operations deploy pools of low-cost commodity infrastructure

• Physical servers, virtual infrastructure, or onramp to public cloud

– DBAs and Analysts provision sandboxes and warehouses in minutes

• Assemble the data they need (common, private, etc) for agile analytics

Proprietary & Confidential

[Figure: diagram with the labels DBA, Analyst, Consumer Division, Packaged Goods, Finance, Warehouses, Infrastructure, and IT Operations; the per-pool node counts and free capacity are not recoverable from the transcript.]

Use Case: Big Telco Data Mart Consolidation


Goals:
• Reduce maintenance and support costs from proliferation of data mart platforms
• Reduce risks and exposure due to data in shadow IT systems
• Break down silo walls - provide a unified way to find and access all data

Approach:
• Embrace data – encourage ‘physical consolidation’ in advance of data model unification
• Provide ‘self serve’ model to bring shadow IT into the light
• Allow unified data access and pragmatic ‘logical’ data model unification incrementally

[Figure: diagram labeled Data Sources and US-West (100 nodes).]

Use Case: Big Ad Network Project Sandboxes


Goals:
• Remove IT barriers to analyst productivity and value creation
• Dramatically reduce IT resource constraints and delays – i.e. realize ideas sooner
• Combine centralized ‘EDW’ data with freshly discovered feeds and other useful sources

Approach:
• Self-serve creation of project warehouses in minutes – and elastically expand as needed
• Load new data feeds without requiring formal modeling
• Bring together any data within the EDC – even if globally distributed – and analyze

[Figure: diagram labeled US-East (100 nodes), Analyst’s New Warehouse, Analyst’s Private Data Feed, EDC, and Self-Serve Dashboard.]

GP is Software – Develop Now

• Download at:
  – Gpn.greenplum.com
  – Get the VMware image or use it on OSX, Linux, Solaris


Think Big. Think Fast.