BIPD Tech Tuesday Presentation - Qubole

Page 1: BIPD Tech Tuesday Presentation - Qubole

Qubole: Click to Query Your Big Data on the Cloud

Page 2: BIPD Tech Tuesday Presentation - Qubole

A company like Facebook provides data infrastructure as a service to its employees (Facebook's was built by the founders of Qubole)

- More than 30% of the company uses this infrastructure every month

- Users range from developers and analysts to business analysts and business users

- Manages over an Exabyte of data

- Has made the company more data driven and agile with data use

- It took the founders a team of over 30 people to build this infrastructure; the team managing it today has more than 100 people

[Diagram: roles served by the Data Infrastructure: Operations Analyst, Marketing Ops, Analyst, Data Architect, Business Users, Product Support, Customer Support, Developer, Sales Ops, Product Managers]

QUBOLE VISION DATA FOR ALL CLICK-TO-QUERY

Page 3: BIPD Tech Tuesday Presentation - Qubole

~ 170+ PB of data processed per month

Clusters of 10 to 3,000 nodes on a daily basis

300,000 machines per month

20,000 jobs on a daily basis

AGILITY TIME-TO-INSIGHT CLICK-TO-QUERY

Page 4: BIPD Tech Tuesday Presentation - Qubole

Industries and Use Cases

Industries: Media & Advertising, Oil & Gas, Retail, Life Sciences, Financial Services, Security, Social Networking & Gaming

Targeted Advertising

Seismic Analysis

Image and Video Processing

Customer Profile

Transaction Analysis

Genome Analysis

Monte Carlo Simulations

Risk Analysis

Fraud Detection

Anti-virus

Image Recognition

In-game Metrics

Usage Analysis

User Demographics

Predefined Reporting

Ad Hoc Analytics

Statistical Analytics

Predictive Analytics

Machine Learning MapReduce Streaming

Workload Classifications

Match Your Processing Engines to Your Workload Parameters

Processing engines: SQL, Data Pipeline, MapReduce, Spark, NoSQL Store

Page 5: BIPD Tech Tuesday Presentation - Qubole

AGILITY TIME-TO-INSIGHT CLICK-TO-QUERY

- 10-1000+ nodes in <5 min

- Flexible: different nodes for different loads

- Data For All: usable by many

- Low TCO: only ON when needed

- Extensive planning required: inflexible and static

- Not built for cloud

- Need Hadoop experts to install, maintain and use

- High TCO: always ON
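The TCO contrast between "Only ON when needed" and "Always ON" comes down to node-hours. A minimal sketch of that arithmetic follows; the cluster size, daily hours of use, and hourly price are purely illustrative assumptions, not figures from the presentation.

```python
def monthly_cost(nodes: int, hours_on_per_day: float,
                 price_per_node_hour: float, days: int = 30) -> float:
    """Cost of a cluster that is ON for `hours_on_per_day` hours each day."""
    return nodes * hours_on_per_day * days * price_per_node_hour


# Hypothetical inputs: 100 nodes at $0.50 per node-hour.
always_on = monthly_cost(nodes=100, hours_on_per_day=24, price_per_node_hour=0.50)

# Same cluster, but only running for the 6 hours a day it is actually used.
on_demand = monthly_cost(nodes=100, hours_on_per_day=6, price_per_node_hour=0.50)

# Fraction of the always-on bill that is avoided by shutting down when idle.
savings_fraction = 1 - on_demand / always_on

print(f"always on: ${always_on:,.0f}  on demand: ${on_demand:,.0f}  "
      f"saved: {savings_fraction:.0%}")
```

Under these assumed numbers the saving is simply the share of the day the cluster sits idle; real bills also depend on storage, data transfer, and instance pricing, which this sketch ignores.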

Page 6: BIPD Tech Tuesday Presentation - Qubole

[Architecture diagram. Labels: User Access (Qubole UI via browser, SDK, ODBC); REST API (HTTPS); SSH; Qubole's AWS Account; Customer's AWS Account; Ephemeral Web Tier (web servers); Encrypted Result Cache; RDS (Qubole user and account configurations, encrypted credentials); Default Hive Metastore; (optional) Custom Hive Metastore; Ephemeral Hadoop Clusters, Managed by Qubole (master and slave nodes, encrypted HDFS); Amazon S3 with S3 Server Side Encryption (no HDFS load); (optional) other RDS, Redshift; Data Flow within Customer's AWS]

Encryption Options:

a) Qubole can encrypt the result cache

b) Qubole supports encryption of the ephemeral drives used for HDFS

c) Qubole supports S3 Server Side Encryption

BUILT FOR CLOUD PERFORMANCE COST-EFFICIENT

Ephemeral Clusters:

- Auto-Scaling: both up and down

- Spot Instances: data management and back-fill

- VMs deployed with awareness of time
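To make "Auto-Scaling - both up and down" concrete, here is a minimal sketch of one way a scaling decision could be computed. This is not Qubole's algorithm: the function names, the tasks-per-node capacity, and the 10 to 1,000 node bounds are all hypothetical choices for illustration, and spot instances and time awareness are left out.

```python
import math


def target_cluster_size(pending_tasks: int, tasks_per_node: int,
                        min_nodes: int, max_nodes: int) -> int:
    """Nodes needed for the pending work, clamped to the allowed range."""
    needed = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, needed))


def scaling_delta(current_nodes: int, target_nodes: int) -> int:
    """Positive: add that many nodes. Negative: remove that many. Zero: hold."""
    return target_nodes - current_nodes


# Hypothetical bounds and per-node capacity.
MIN_NODES, MAX_NODES, TASKS_PER_NODE = 10, 1000, 10

busy = target_cluster_size(950, TASKS_PER_NODE, MIN_NODES, MAX_NODES)
idle = target_cluster_size(0, TASKS_PER_NODE, MIN_NODES, MAX_NODES)

print(scaling_delta(current_nodes=40, target_nodes=busy))  # scale up
print(scaling_delta(current_nodes=40, target_nodes=idle))  # scale down
```

The clamp keeps the cluster from shrinking below a floor or growing without limit; a production scaler would also need to drain nodes safely before removing them.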

Page 7: BIPD Tech Tuesday Presentation - Qubole

Demo

Page 8: BIPD Tech Tuesday Presentation - Qubole

Why Qubole?

“Qubole has enabled more users within Pinterest to get to the data and has made the data platform lot more scalable and stable”

Mohammad Shahangian - Lead, Data Science and Infrastructure

Pinterest moved to Qubole from Amazon EMR for stability, and rapidly expanded big data usage by giving users beyond developers access to data.

Rapid expansion of big data beyond developers (240 users in a 600-person company)

Use Cases

User and Query Growth

Rapid expansion in use cases, including ETL, search, ad hoc querying, and product analytics

Rock-solid infrastructure sees 50% fewer failures than Amazon Elastic MapReduce

Enterprise scale processing and data access

Page 9: BIPD Tech Tuesday Presentation - Qubole

Why Qubole?

“We needed something that was reliable and easy to learn, setup, use and put into production without the risk and high expectations that comes with committing millions of dollars in upfront investment. Qubole was that thing.”

Marc Rosen - Sr. Director, Data Analytics

Moved to big data on the cloud (from internal Oracle clusters) because getting to analysis was much quicker than operating the infrastructure themselves. Qubole is used to answer client queries and power client dashboards.

Use Cases

[Chart: # Commands Per Month (number of queries), Aug-13 through Feb-14; y-axis from 0 to 5,000]

Segment audiences based on their behavior, including topics such as user pathway and multi-dimensional recency analysis

Build customer profiles (both univariate and multivariate) across thousands of first-party (i.e., client CRM files) and third-party (i.e., demographic) segments

Simplify attribution insights showing the effects of upper-funnel prospecting on lower-funnel remarketing media strategies