Data Discovery on Hadoop @ Yahoo! (Hadoop Summit 2014)

Data Discovery on Hadoop - Realizing the Full Potential of Your Data. Presented by Thiruvel Thirumoolan and Sumeet Singh, June 3, 2014, at the 2014 Hadoop Summit, San Jose, California.


Page 1: Data discoveryonhadoop@yahoo! hadoopsummit2014

Data Discovery on Hadoop - Realizing the Full Potential of Your Data

PRESENTED BY Thiruvel Thirumoolan, Sumeet Singh ⎪ June 3, 2014

2014 Hadoop Summit, San Jose, California

Page 2: Data discoveryonhadoop@yahoo! hadoopsummit2014

Introduction


Sumeet Singh

Senior Director, Product Management

Hadoop and Big Data Platforms, Cloud Engineering Group

§  Manages Hadoop products team at Yahoo!

§  Responsible for Product Management, Strategy and Customer Engagements

§  Managed Cloud Services products team and headed Strategy functions for the Cloud Platform Group at Yahoo

§  M.B.A. from UCLA and M.S. from Rensselaer (RPI)

701 First Avenue, Sunnyvale, CA 94089 USA

@sumeetksingh

Thiruvel Thirumoolan

Principal Engineer

Hadoop and Big Data Platforms, Cloud Engineering Group

§  Developer in the Hive-HCatalog team, and active contributor to Apache Hive

§  Responsible for Hive, HiveServer2 and HCatalog across all Hadoop clusters and ensuring they work at scale for the usage patterns of Yahoo

§  Loves mining the trove of Hadoop logs for usage patterns and insights

§  Bachelor's degree from Anna University

701 First Avenue, Sunnyvale, CA 94089 USA

@thiruvel

Page 3: Data discoveryonhadoop@yahoo! hadoopsummit2014

Agenda

1 The Data Management Challenge

2 Apache HCatalog to Rescue

3 Data Registration and Discovery

4 Opening Up Adhoc Access to Data

5 Summary and Q&A

Page 4: Data discoveryonhadoop@yahoo! hadoopsummit2014

Hadoop Grid as the Source of Truth for Data

[Diagram, ILLUSTRATIVE. Labels: TV, PC, Phone, Tablet; Pushed Data; Pulled Data; Web Crawl, Social, Email, 3rd Party Content; Data Highway Feeds; Hadoop Grid; Data; Advertising, Content, User Profiles / No-SQL Serving Stores; Serving; BI, Reporting, Adhoc Analytics.]

Page 5: Data discoveryonhadoop@yahoo! hadoopsummit2014

Growth in HDFS (1)

34,000 servers

478 PB

1.25 billion files & dir

[Chart: raw HDFS storage (in PB, axis 0 to 600) and number of servers (axis 0 to 45,000) by year, 2006 to 2014.]

(1) Across all Hadoop (16 clusters, 32,500 servers, 455 PB) and HBase (7 clusters, 1,500 servers, 23 PB) clusters, May 23, 2014

Page 6: Data discoveryonhadoop@yahoo! hadoopsummit2014

Processing and Analyzing Data with Hadoop…Then


HDFS

MapReduce (YARN)

Pig Hive Java MR APIs

InputFormat/ OutputFormat

Load / Store SerDe

MetaStore Client

Hive MetaStore

Hadoop Streaming

Oozie

Page 7: Data discoveryonhadoop@yahoo! hadoopsummit2014

Processing and Analyzing Data with HBase…Then


HDFS

HBase

Pig Hive Java MR APIs

TableInputFormat/ TableOutputFormat

HBaseStorage MetaStore Client

Hive MetaStore

HBaseStorage Handler

Oozie

Page 8: Data discoveryonhadoop@yahoo! hadoopsummit2014

Hadoop Jobs on the Platform Today

Job Distribution (May 1 – May 26, 2014)

All Jobs: 100% (21.5 M)

Pig: 45%

Oozie Launcher: 31%

Java MR: 10%

Hive: 9%

GDM: 4%

Streaming, distcp, Spark: 1%
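To get a feel for the absolute numbers behind the job distribution, the percentages can be applied to the 21.5 M total. The sketch below assumes the percentages pair with the job types in descending order (Pig 45%, Oozie Launcher 31%, Java MR 10%, Hive 9%, GDM 4%, the rest 1%); that pairing is read off the chart labels and is an assumption, and the function name is purely illustrative.

```python
# Approximate per-tool job counts from the chart's percentages.
# ASSUMPTION: percentages map to job types in descending order.
TOTAL_JOBS = 21_500_000  # "100% (21.5 M)" for May 1 - May 26, 2014

SHARES_PCT = {
    "Pig": 45,
    "Oozie Launcher": 31,
    "Java MR": 10,
    "Hive": 9,
    "GDM": 4,
    "Streaming, distcp, Spark": 1,
}


def approx_job_counts(total, shares_pct):
    """Return an approximate job count per tool (integer arithmetic)."""
    return {name: total * pct // 100 for name, pct in shares_pct.items()}


counts = approx_job_counts(TOTAL_JOBS, SHARES_PCT)
for name, count in counts.items():
    print(f"{name}: ~{count:,} jobs")
```

Since the chart percentages are rounded, the counts are only rough estimates.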

Page 9: Data discoveryonhadoop@yahoo! hadoopsummit2014

Challenges in Managing Data on Multi-tenant Platforms


Data Producers

Platform Services

Data Consumers

§  Data shared across tools such as MR, Pig, and Hive

§  Schema and semantics knowledge across the company

§  Support for schema evolution and downstream change communication

§  Fine-grained access controls (row / column) vs. all or nothing

§  Clear ownership of data

§  Data lineage and integrity

§  Audits and compliance (e.g. SOX)

§  Retention, duplication, and waste

Data Economy Challenges

Apache HCatalog & Data Discovery

Page 10: Data discoveryonhadoop@yahoo! hadoopsummit2014

Apache HCatalog in the Technology Stack at Yahoo


Compute

Services

Storage

Infrastructure Services

Hive Pig Oozie HDFS Proxy GDM

YARN MapReduce

HDFS HBase

Zookeeper Support Shop Monitoring Starling Messaging Service

HCatalog

Storm Spark Tez

Page 11: Data discoveryonhadoop@yahoo! hadoopsummit2014

HCatalog Facilitates Interoperability…Now


HDFS

MapReduce (YARN)

Pig Hive Java MR APIs

InputFormat/ OutputFormat

SerDe & Storage Handler MetaStore Client

HCatalog MetaStore

HCatInputFormat / HCatOutputFormat

HCatLoader/ HCatStorer

HDFS

HBase Notifications

Oozie

Page 12: Data discoveryonhadoop@yahoo! hadoopsummit2014


Data Model

Database (namespace)

Table (schema)

Table (schema)

Partitions Partitions

Buckets Buckets

Skewed Unskewed

Optional per table

Partitions, buckets, and skews facilitate faster, more direct access to data

Note on Buckets

§  It is hard to guess the right number of buckets, which can also change over time, and hard to coordinate and align buckets for joins

§  Community is working on dynamic bucketing that would have the same benefit without the need for static partitioning

Page 13: Data discoveryonhadoop@yahoo! hadoopsummit2014

Sample Table Registration


Select project database

USE xyz;

Create table

CREATE EXTERNAL TABLE search (
  bcookie string COMMENT 'Standard browser cookie',
  time_stamp int COMMENT 'DD-MON-YYYY HH:MI:SS (AM/PM)',
  uid string COMMENT 'User id',
  ip string COMMENT '...',
  pg_spaceid string COMMENT '...',
  ...)
PARTITIONED BY (
  locale string COMMENT 'Country of origin',
  datestamp string COMMENT 'Date in YYYYMMDD format')
STORED AS ORC
LOCATION '/projects/search/...';

Add partitions manually (if you choose to)

ALTER TABLE search ADD PARTITION (locale='US', datestamp='20130201')
LOCATION '/projects/search/...';

All your company’s data (metadata) can be registered with HCatalog irrespective of the tool used.

Page 14: Data discoveryonhadoop@yahoo! hadoopsummit2014

Getting Data into HCatalog – DML and DDL


LOAD files into tables

Load operations are copy/move operations from HDFS or the local filesystem that move data files into the locations corresponding to HCat tables. The file format must agree with the table format.

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
  [PARTITION (partcol1=val1, partcol2=val2 ...)];

INSERT data from a query into tables

Query results can be inserted into tables or file system directories by using the insert clause.

INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]]
  select_statement1 FROM from_statement;

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)]
  select_statement1 FROM from_statement;

HCat also supports multiple inserts in the same statement and dynamic partition inserts.

ALTER TABLE ADD PARTITION

You can use ALTER TABLE ADD PARTITION to add partitions to a table. The location must be a directory inside which the data files reside. If new partitions are added directly to HDFS, HCat will not be aware of them.

ALTER TABLE table_name ADD PARTITION (partCol = 'value1') LOCATION 'loc1';
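Because partitions written straight to HDFS stay invisible until they are registered, a small helper can turn partition directories into ADD PARTITION statements. The sketch below assumes a hypothetical `key=value` directory layout under the table location (the slides do not specify one) and does no quoting or escaping of values; the function names are illustrative only.

```python
# Build ALTER TABLE ... ADD PARTITION statements from partition directories.
# ASSUMPTION: directories follow a key=value layout under the table location.


def partition_spec_from_path(table_location, partition_dir):
    """Parse 'key=value' path components found below the table location."""
    if not partition_dir.startswith(table_location):
        raise ValueError(f"{partition_dir} is not under {table_location}")
    relative = partition_dir[len(table_location):].strip("/")
    spec = []
    for component in relative.split("/"):
        key, sep, value = component.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"not a key=value component: {component!r}")
        spec.append((key, value))
    return spec


def add_partition_statement(table, table_location, partition_dir):
    """Return the DDL that registers one partition directory."""
    spec = partition_spec_from_path(table_location, partition_dir)
    columns = ", ".join(f"{key}='{value}'" for key, value in spec)
    return (f"ALTER TABLE {table} ADD PARTITION ({columns}) "
            f"LOCATION '{partition_dir}';")


stmt = add_partition_statement(
    "search", "/projects/search",
    "/projects/search/locale=US/datestamp=20130201")
print(stmt)
```

The generated statements would still have to be submitted through Hive or the HCat CLI; the helper only prepares the text.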

Page 15: Data discoveryonhadoop@yahoo! hadoopsummit2014

Getting Data into HCatalog – HCat APIs


Pig

HCatLoader is used with Pig scripts to read data from HCatalog-managed tables, and HCatStorer is used with Pig scripts to write data to HCatalog-managed tables.

A = load '$DB.$TABLE' using org.apache.hcatalog.pig.HCatLoader();
B = FILTER A BY $FILTER;
C = foreach B generate foo, bar;
store C into '$OUTPUT_DB.$OUTPUT_TABLE' USING org.apache.hcatalog.pig.HCatStorer('$OUTPUT_PARTITION');

MapReduce

HCatInputFormat is used with MapReduce jobs to read data from HCatalog-managed tables. HCatOutputFormat is used with MapReduce jobs to write data to HCatalog-managed tables.

Map<String, String> partitionValues = new HashMap<String, String>();
partitionValues.put("a", "1");
partitionValues.put("b", "1");
HCatTableInfo info = HCatTableInfo.getOutputTableInfo(dbName, tblName, partitionValues);
HCatOutputFormat.setOutput(job, info);

Page 16: Data discoveryonhadoop@yahoo! hadoopsummit2014

HCatalog Integration with Data Mgmt. Platform (GDM)


HCatalog MetaStore

Cluster 1 - Colo 1 HDFS

Cluster 2 – Colo 2 HDFS

Grid Data Management

Feed Acquisition

Feed Replication

HCatalog MetaStore

Feed datasets as partitioned external tables

Growl extracts schema for backfill

On each cluster (shown once per cluster in the diagram): HCatClient.addPartitions(…), Mark LOAD_DONE, add_partition event notification

Partitions are dropped with HCatClient.dropPartitions(…) after retention expiration, with a drop_partition notification

Page 17: Data discoveryonhadoop@yahoo! hadoopsummit2014

HCatalog Notification


Namespace: e.g. "hcat.thebestcluster"

JMS Topic: e.g. "<dbname>.<tablename>"

Sample JMS Notification

{
  "timestamp" : 1360272556,
  "eventType" : "ADD_PARTITION",
  "server" : "thebestcluster-hcat.dc1.grid.yahoo.com",
  "servicePrincipal" : "hcat/thebestcluster-[email protected]",
  "db" : "xyz",
  "table" : "search",
  "partitions": [
    { "locale" : "US", "datestamp" : "20140602" },
    { "locale" : "UK", "datestamp" : "20140602" },
    { "locale" : "IN", "datestamp" : "20140602" }
  ]
}

§  HCatalog uses JMS (ActiveMQ) notifications that can be sent for add_database, add_table, add_partition, drop_partition, drop_table, and drop_database

§  Notifications can be extended for schema change notifications (proposed)
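A subscriber typically only needs the event type, the table, and the partition keys from such a message. The following is a minimal sketch of that parsing step, using a trimmed copy of the sample notification; the function name is hypothetical, and the JMS/ActiveMQ transport itself is deliberately left out.

```python
import json

# Trimmed copy of the sample notification (server and principal fields omitted).
SAMPLE_MESSAGE = """
{
  "timestamp": 1360272556,
  "eventType": "ADD_PARTITION",
  "db": "xyz",
  "table": "search",
  "partitions": [
    {"locale": "US", "datestamp": "20140602"},
    {"locale": "UK", "datestamp": "20140602"},
    {"locale": "IN", "datestamp": "20140602"}
  ]
}
"""


def new_partitions(message_text):
    """Return (qualified_table, partition_spec) pairs for ADD_PARTITION events.

    Any other event type yields an empty list.
    """
    message = json.loads(message_text)
    if message.get("eventType") != "ADD_PARTITION":
        return []
    qualified_table = f'{message["db"]}.{message["table"]}'
    return [(qualified_table, dict(spec))
            for spec in message.get("partitions", [])]


for table, spec in new_partitions(SAMPLE_MESSAGE):
    print(table, spec)
```

Filtering on `eventType` first keeps the same handler safe to attach to a topic that also carries drop events.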

HCat Client

HCat MetaStore

ActiveMQ Server

Register Channel Publish to listener channels

Subscribers

Page 18: Data discoveryonhadoop@yahoo! hadoopsummit2014

Oozie, HCatalog, and Messaging Integration


Components: Oozie, Message Bus, HCatalog, Data Producer, HDFS

1. Query/Poll Partition

2. Register Topic

3. Push notification <New Partition>

4. Notify New Partition

Start workflow

Produce data (distcp, pig, M/R..)

/data/click/2014/06/02

Update metadata (ALTER TABLE click ADD PARTITION(data='2014/06/02') location 'hdfs://data/click/2014/06/02')
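The four numbered steps amount to "check first, otherwise wait for a push". The sketch below models that logic in memory only; the class and method names are hypothetical and are not Oozie or HCatalog APIs, and the message bus is reduced to a direct method call.

```python
# In-memory model of the poll-then-subscribe trigger described in the steps.
# ASSUMPTION: names are illustrative; no real Oozie/HCatalog/JMS calls are made.


class PartitionTrigger:
    def __init__(self, known_partitions, start_workflow):
        self.known = set(known_partitions)   # partitions already registered
        self.waiting = {}                    # (table, partition) -> workflows
        self.start_workflow = start_workflow

    def wait_for(self, table, partition, workflow):
        """Step 1: query the partition; step 2: register interest if missing."""
        key = (table, partition)
        if key in self.known:
            self.start_workflow(workflow, key)
            return True
        self.waiting.setdefault(key, []).append(workflow)
        return False

    def on_notification(self, table, partition):
        """Steps 3-4: a new-partition message arrives; start waiting workflows."""
        key = (table, partition)
        self.known.add(key)
        for workflow in self.waiting.pop(key, []):
            self.start_workflow(workflow, key)


started = []
trigger = PartitionTrigger(set(), lambda wf, key: started.append(wf))
trigger.wait_for("click", "2014/06/02", "daily-report")   # not there yet
trigger.on_notification("click", "2014/06/02")            # producer registered it
print(started)
```

Polling once before subscribing avoids waiting forever on a partition that was registered before the workflow asked for it.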

Page 19: Data discoveryonhadoop@yahoo! hadoopsummit2014

Data Discovery with HCatalog


§  HCatalog instances become a unifying metastore for all data at Yahoo

§  Discovery is about

o  Browsing / inspecting metadata

o  Searching for datasets

§  It helps to solve

o  Schema knowledge across the company

o  Schema evolution

o  Lineage

o  Ownership

o  Data type – dev or prod

Page 20: Data discoveryonhadoop@yahoo! hadoopsummit2014

Data Discovery Physical View

[Diagram, ILLUSTRATIVE: a Discovery UI provides a global view of all data in HCatalog. Data Center 1 shows clusters DC1-C1, DC1-C2, ..., DCn-Cn; Data Center 2 shows clusters DC2-C1, DC2-C2, ..., DCm-Cm. Each cluster is labeled with an HCat REST (Templeton) endpoint.]

Page 21: Data discoveryonhadoop@yahoo! hadoopsummit2014

Data Discovery Features


§  Browsing

o  Tables / Databases

o  Schema, format, properties

o  Partitions and metadata about each partition

§  Searches for tables

o  Table name (regex) or Comments

o  Column name or comments

o  Ownership, File format

o  Location

o  Properties (Dev/Prod)
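The search criteria listed above can be pictured as filters over table metadata records. The following sketch is an assumption-laden illustration: the `TableMeta` record, its fields, and the sample catalog are hypothetical stand-ins for whatever the metastore returns, not the Discovery UI's actual implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    """Hypothetical metadata record for one table."""
    database: str
    name: str
    comment: str = ""
    owner: str = ""
    file_format: str = ""
    columns: dict = field(default_factory=dict)  # column name -> comment


def search_tables(tables, name_regex=None, text=None, owner=None, file_format=None):
    """Return tables matching every criterion that was supplied."""
    matches = []
    for table in tables:
        if name_regex and not re.search(name_regex, table.name):
            continue
        if owner and table.owner != owner:
            continue
        if file_format and table.file_format.lower() != file_format.lower():
            continue
        if text:
            # Look in the table comment, column names, and column comments.
            haystack = [table.comment, *table.columns.keys(), *table.columns.values()]
            if not any(text.lower() in item.lower() for item in haystack):
                continue
        matches.append(table)
    return matches


CATALOG = [
    TableMeta("audience_db", "page_clicks", "Hourly clickstream table", "alice", "ORC",
              {"bcookie": "Standard browser cookie", "uid": "User id"}),
    TableMeta("audience_db", "ad_clicks", "Hourly ad clicks table", "bob", "RCFile",
              {"uid": "User id"}),
    TableMeta("audience_db", "user_info", "User registration info", "alice", "ORC",
              {"uid": "User id"}),
]

print([t.name for t in search_tables(CATALOG, name_regex=r"_clicks$")])
```

Criteria are combined with AND, so adding a filter can only narrow the result.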

Page 22: Data discoveryonhadoop@yahoo! hadoopsummit2014

Discovery UI

GLOBAL HCATALOG DASHBOARD (ILLUSTRATIVE)

Search Tables [Search] (search the HCat tables)

The Best Cluster (browse the DBs by cluster)

Available Databases

audience_db

tumblr_db

user_db

adv_warehouse

flickr_db

Available Tables (audience_db) (search results or browse db results)

page_clicks   Hourly clickstream table

ad_clicks   Hourly ad clicks table

user_info   User registration info

session_info   Session feed info

audience_info   Primary audience table

Page 23: Data discoveryonhadoop@yahoo! hadoopsummit2014

Table Display UI

GLOBAL HCATALOG DASHBOARD (ILLUSTRATIVE)

HCat Instance: The Best Cluster

Database: audience_db

Table: page_clicks

Owner: Awesome Yahoo

Schema

Column      Type     Description
bcookie     string   Standard browser cookie
timestamp   string   DD-MON-YYYY HH:MI:SS (AM/PM)
uid         string   User id
. . .

…more table information and properties (e.g. data format etc.)

Partitions

…list of partitions

Page 24: Data discoveryonhadoop@yahoo! hadoopsummit2014

Data Discovery Design Approach

§  A single web interface connects to all HCatalog instances (same and cross-colo)

§  Select an appropriate HCat instance and browse all metadata

o  Each HCatalog instance runs a webserver (Templeton / WebHCat) to read metadata

o  All reads audited

o  ACLs apply

§  Search functionality will be added to Templeton and HCatalog

o  New Thrift interface to support search

o  All searches audited

o  ACLs apply

§  Long term design

o  Read and Write HCatalog instances
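Browsing metadata through Templeton / WebHCat comes down to issuing GET requests against its DDL resources. The sketch below only builds the request URLs for describing a table and listing its partitions; the host name and user are made up, 50111 is WebHCat's usual default port, and how the Discovery UI actually calls the service is not stated in the slides.

```python
from urllib.parse import quote, urlencode

# Build WebHCat (Templeton) DDL URLs without performing any network call.
# ASSUMPTION: host and user values are placeholders for illustration.


def webhcat_ddl_url(host, *segments, user, port=50111):
    """Join path segments under /templeton/v1/ddl and add the user.name parameter."""
    path = "/".join(quote(segment, safe="") for segment in segments)
    query = urlencode({"user.name": user})
    return f"http://{host}:{port}/templeton/v1/ddl/{path}?{query}"


def describe_table_url(host, database, table, user):
    return webhcat_ddl_url(host, "database", database, "table", table, user=user)


def list_partitions_url(host, database, table, user):
    return webhcat_ddl_url(host, "database", database, "table", table,
                           "partition", user=user)


print(describe_table_url("hcat.example.com", "audience_db", "page_clicks", "alice"))
print(list_partitions_url("hcat.example.com", "audience_db", "page_clicks", "alice"))
```

A single web interface could keep one such base URL per HCatalog instance and fan requests out to the instance the user selects.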

Page 25: Data discoveryonhadoop@yahoo! hadoopsummit2014

Data Discovery Going Forward

§  Lineage

o  Source datasets

o  Derived datasets

§  Data Quality

o  Statistics help in heuristics instead of running a job

Table 1 / Partition 1

HBase

ORC Table Partition 1

Dimension Table

Statistics/ Agg. Table

Daily Stats Table

Copied by distcp / external registrar

Hourly

ILLUSTRATIVE

Page 26: Data discoveryonhadoop@yahoo! hadoopsummit2014

Data Discovery Going Forward (cont’d)

ILLUSTRATIVE

Schema

Column      Type     Description
bcookie     string   Standard browser cookie
timestamp   string   DD-MON-YYYY HH:MI:SS (AM/PM)
uid         string   User id
. . .

File Format: ORC

Table Properties

Compression: zlib

Type: External

§  User 'awesome_yahoo' added 'foo string' to the table on May 29, 2014 at '1:10 AM'

§  User 'me_too' added table properties 'orc.compress=ZLIB' on May 30, 2014 at '9:00 AM'

§  User 'me_too' changed the file format from 'RCFile' to 'ORC' on Jun 1, 2014 at '10:30 AM'

. . .
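A change history like the one shown can be derived by comparing successive schema snapshots. The following is a minimal sketch under that assumption: schemas are modeled as plain column-to-type dictionaries and the function name is hypothetical; the slides do not say how the history would actually be recorded.

```python
# Derive human-readable schema changes from two snapshots.
# ASSUMPTION: a schema is modeled as {column name: type}.


def schema_changes(old, new):
    """List added, retyped, and dropped columns between two schema versions."""
    changes = []
    for column, col_type in new.items():
        if column not in old:
            changes.append(f"added '{column} {col_type}'")
        elif old[column] != col_type:
            changes.append(f"changed '{column}' from {old[column]} to {col_type}")
    for column in old:
        if column not in new:
            changes.append(f"dropped '{column}'")
    return changes


before = {"bcookie": "string", "timestamp": "string", "uid": "string"}
after = {"bcookie": "string", "timestamp": "string", "uid": "string", "foo": "string"}

for change in schema_changes(before, after):
    print(change)
```

Pairing each detected change with the user and time of the metastore update would give entries in the style of the mock-up.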

Page 27: Data discoveryonhadoop@yahoo! hadoopsummit2014

HCatalog is Part of a Broader Solution Set

Product: Role in the Grid Stack

Hive

§  Data warehousing software that facilitates querying and managing large datasets in HDFS

§  Provides a mechanism to project structure onto HDFS data and query the data using a SQL-like language called HiveQL

HiveServer2

§  Server process (Thrift-based RPC interface) to support concurrent clients connecting over ODBC/JDBC

§  Provides authentication and enforces authorization for ODBC/JDBC clients for metadata access

HCatalog

§  Table and storage management layer that enables users with different tools (Pig, M/R, and Hive) to more easily share data

§  Presents a relational view of data in HDFS, abstracts where or in what format data is stored, and enables notifications of data availability

Starling

§  Hadoop log warehouse for analytics on grid usage (job history, tasks, job counters etc.)

§  1 TB of raw logs processed / day, 24 TB of processed data

Page 28: Data discoveryonhadoop@yahoo! hadoopsummit2014


Deployment Layout

Tez and MapReduce

on YARN +

HDFS Oracle DBMS

LoadBalancer

HCatalog

Thrift HS2

ODBC/JDBC Launcher Gateway

LoadBalancer

Data Out Client

Client/ CLI

HiveQL

M/R Jobs Pig M/R

Cloud Messaging

ActiveMQ notifications

HiveServer2

Hadoop

Hive

HCatalog


Page 29: Data discoveryonhadoop@yahoo! hadoopsummit2014


Hive for Both Batch and Interactive Adhoc Analytics

Improved Latency and Throughput

Tez

§  Computation expressed as a dataflow graph with reusable primitives

§  No intermediate outputs to HDFS

§  Built on top of YARN

§  Hive generates Tez plans for lower latency

Query Engine Improvements

§  Cost-based optimizations

§  In-memory joins

§  Caching hot tables

§  Vectorized processing

Better Columnar Store

§  ORCFile with predicate pushdown

§  Built for both speed and storage efficiency

Tez Service

§  Always-on pool of AMs / container re-use

Extended Analytical Ability

Analytics Functions

§  SQL 2003 Compliant

§  OVER with PARTITION BY and ORDER BY

§  Wide variety of windowing functions:

o  RANK

o  LEAD/LAG

o  ROW_NUMBER

o  FIRST_VALUE

o  LAST_VALUE

o  Many more

§  Aligns well with BI ecosystem

Improving SQL Coverage

§  Non-correlated sub-queries using IN in WHERE

§  Expanded SQL types including DATETIME, VARCHAR, etc.
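To make the windowing functions concrete, the sketch below runs RANK and LAG with OVER (PARTITION BY ... ORDER BY ...) on a toy table. It uses Python's bundled SQLite purely as a stand-in engine so the example runs anywhere (SQLite 3.25 or newer is needed for window functions); the table, its columns, and the data are invented, and nothing here is specific to Hive.

```python
import sqlite3

# Window functions on a toy table; SQLite stands in for the query engine.
# ASSUMPTION: table name, columns, and rows are made up for illustration.
WINDOW_QUERY = """
    SELECT locale, datestamp, clicks,
           RANK() OVER (PARTITION BY locale ORDER BY clicks DESC) AS click_rank,
           LAG(clicks) OVER (PARTITION BY locale ORDER BY datestamp) AS prev_clicks
    FROM daily_clicks
    ORDER BY locale, datestamp
"""


def rank_and_lag(rows):
    """Load rows into an in-memory table and evaluate the window query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE daily_clicks (locale TEXT, datestamp TEXT, clicks INTEGER)")
        conn.executemany("INSERT INTO daily_clicks VALUES (?, ?, ?)", rows)
        return conn.execute(WINDOW_QUERY).fetchall()
    finally:
        conn.close()


SAMPLE_ROWS = [
    ("US", "20140601", 100), ("US", "20140602", 150), ("US", "20140603", 120),
    ("UK", "20140601", 40), ("UK", "20140602", 30),
]

for row in rank_and_lag(SAMPLE_ROWS):
    print(row)
```

Each locale is ranked independently, and `prev_clicks` is empty for the first day of each locale because there is no earlier row in that partition.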

Page 30: Data discoveryonhadoop@yahoo! hadoopsummit2014

HiveServer2 as ODBC / JDBC Endpoint

§  Gateway that Hive clients can talk to

§  Supports concurrent clients

§  User/ global session/configuration information

§  Support for secure clusters and encryption

§  DoAs support allows Hive queries to run as the requester


Page 31: Data discoveryonhadoop@yahoo! hadoopsummit2014


Data to Desktop (D2D) – BI and Reporting on ODBC

HiveServer2

Hive

Hadoop

Desktop Web

Intelligence Server

Metadata Database

Grid ODBC driver

Page 32: Data discoveryonhadoop@yahoo! hadoopsummit2014


DataOut – Data to Any Off-Grid Destination on JDBC

[Diagram: HiveServer2, one master (M), and three slaves (S), each slave paired with a HiveSplit and an FS/DB destination. Steps shown: Execute Query, Prepare Splits, Fetch Splits.]

Legend: M – Master, S – Slave, FS/DB – Filesystem/Database

§  DataOut is an efficient method of moving data off the grid

§  Advantages:

o  API based on well-known JDBC interface

o  Works with HCatalog / Hive

o  Agnostic to the underlying storage format

o  Parts of the whole data can be pulled in parallel
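The idea of pulling parts of a result in parallel can be sketched as "prepare splits, then fetch each split on its own worker". The code below is an in-memory illustration only: the function names are hypothetical, a Python list stands in for the query result, and this is not DataOut's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

# Split a result set into contiguous ranges and fetch the ranges concurrently.
# ASSUMPTION: a list stands in for the query result; names are illustrative.


def prepare_splits(num_rows, num_splits):
    """Return half-open (start, end) row ranges that together cover num_rows."""
    base, extra = divmod(num_rows, num_splits)
    splits, start = [], 0
    for index in range(num_splits):
        size = base + (1 if index < extra else 0)
        splits.append((start, start + size))
        start += size
    return splits


def fetch_split(rows, split):
    """Stand-in for one worker fetching its share of the result."""
    start, end = split
    return rows[start:end]


def pull_in_parallel(rows, num_workers):
    """Fetch every split on its own worker thread, preserving split order."""
    splits = prepare_splits(len(rows), num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(lambda split: fetch_split(rows, split), splits))


parts = pull_in_parallel(list(range(10)), 3)
print(parts)
```

Because each worker owns a disjoint range, the pieces can be written to separate files or tables without coordination.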

Page 33: Data discoveryonhadoop@yahoo! hadoopsummit2014

SQL-based Authorization for Controlled Access


§  SQL-compliant authorization model (Users, Roles, Privileges, Objects)

§  Fine-grain authorization and access control patterns (row and column in conjunction with views)

§  Can be used in conjunction with storage-based authorization

Privileges

§  Objects consist of databases, tables, and views

§  Privileges are GRANTed on objects

o  SELECT: read access to an object

o  INSERT: write (insert) access to an object

o  UPDATE: write (update) access to an object

o  DELETE: delete access for an object

o  ALL PRIVILEGES: all privileges

Access Control

§  Roles can be associated with objects

§  Privileges are associated with roles

§  CREATE, DROP, and SET ROLE statements manipulate roles and membership

§  SUPERUSER role for databases can grant access control to users or roles (not limited to HDFS permissions)

§  PUBLIC role includes all users

§  Prevents undesirable operations on objects by unauthorized users
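The users, roles, privileges, and objects model can be pictured as two lookups: which roles a user holds, and which privileges those roles hold on an object. The sketch below is a toy in-memory model under that reading; the class is hypothetical, it ignores SUPERUSER and role hierarchies, and it is not how Hive implements authorization.

```python
# Toy model of role-based privilege checks (users -> roles -> privileges on objects).
# ASSUMPTION: illustrative only; SUPERUSER and nested roles are not modeled.
ALL_PRIVILEGES = frozenset({"SELECT", "INSERT", "UPDATE", "DELETE"})


class Authorizer:
    def __init__(self):
        self.members = {}  # role -> set of users
        self.grants = {}   # (role, object) -> set of privileges

    def create_role(self, role):
        self.members.setdefault(role, set())

    def grant_role(self, role, user):
        self.members[role].add(user)  # KeyError if the role was never created

    def grant(self, privilege, obj, role):
        privileges = ALL_PRIVILEGES if privilege == "ALL PRIVILEGES" else {privilege}
        if not set(privileges) <= ALL_PRIVILEGES:
            raise ValueError(f"unknown privilege: {privilege}")
        self.grants.setdefault((role, obj), set()).update(privileges)

    def roles_of(self, user):
        roles = {role for role, users in self.members.items() if user in users}
        roles.add("PUBLIC")  # PUBLIC includes all users
        return roles

    def is_allowed(self, user, privilege, obj):
        return any(privilege in self.grants.get((role, obj), ())
                   for role in self.roles_of(user))


auth = Authorizer()
auth.create_role("analyst")
auth.grant_role("analyst", "alice")
auth.grant("SELECT", "audience_db.page_clicks", "analyst")
auth.grant("SELECT", "audience_db.public_view", "PUBLIC")
print(auth.is_allowed("alice", "SELECT", "audience_db.page_clicks"))
print(auth.is_allowed("bob", "SELECT", "audience_db.page_clicks"))
```

Granting SELECT on a view rather than on the underlying table is the pattern the slide refers to for row- and column-level control.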

Page 34: Data discoveryonhadoop@yahoo! hadoopsummit2014

Starling (Log Warehouse) for Historical Analysis and Trends


Cluster 1 Cluster 2 Cluster 3 Cluster N

Oozie

HCatalog HDFS

Hive

Starling Dashboard

Discovery Portal

Query Server

Source Clusters

Warehouse Clusters

Page 35: Data discoveryonhadoop@yahoo! hadoopsummit2014


SQL on Hadoop the Fastest Growing Product on Grid

[Chart: all grid jobs (in millions, axis 0 to 30) and Hive jobs as a percentage of all jobs (axis 0.0% to 10.0%), by month from Mar-13 to May-14. Callout: 2.5 million queries.]

Page 36: Data discoveryonhadoop@yahoo! hadoopsummit2014

In Summary

Data shared across tools such as MR, Pig, and Hive: Apache HCatalog

Schema and semantics knowledge across the company: Data Discovery

Support for schema evolution and downstream change communication: Apache HCatalog

Fine-grained access controls (row / column) vs. all or nothing: SQL-based Authorization

Clear ownership of data: Data Discovery

Data lineage and integrity: Data Discovery / Starling

Audits and compliance (e.g. SOX): Data Discovery / Starling

Retention, duplication, and waste: Data Discovery / Starling

Page 37: Data discoveryonhadoop@yahoo! hadoopsummit2014

Acknowledgments


1 Apache Hive (and HiveServer2, HCatalog) Community

http://hive.apache.org/people.html

2 HCatalog and Hive Development Team at Yahoo

Olga Natkovich Annie Lin Fangyue Wang

Chris Drome Jin Sun Selina Zhang

Mithun Radhakrishnan Viraj Bhat

3 Oozie Development Team

Rohini Palaniswamy Ryota Egashira Purshotam Shah

Mona Chitnis Michelle Chiang

4 Grid Data Management (GDM) Team

Mark Holderbaugh Aaron Gresch Lawrence Prem Kumar

Scott Preece Yan Braun

5 Service Engineering and Data Operations

Rob Realini David Kuder Chuck Sheldon

Rajiv Chittajallu Vineeth Vadrevu Andy Rhee

6 Product Management

Sid Shaik Amrit Lal Kimsukh Kundu

Page 38: Data discoveryonhadoop@yahoo! hadoopsummit2014

Thank You @thiruvel @sumeetksingh

We are hiring! Stop by Kiosk P9 or reach out to us at [email protected].