Critical Breakthroughs and Challenges in Big Data and Analytics

Critical Breakthroughs and Technical Challenges in Big Data Driven Innovation. Paolo Spreafico, Head of EMEA Data Solution Engineers, Google Cloud Platform

Transcript of Critical Breakthroughs and Challenges in Big Data and Analytics

Page 1: Critical Breakthroughs and Challenges in Big Data and Analytics

Critical Breakthroughs and Technical Challenges in Big Data Driven Innovation

Paolo Spreafico

Head of EMEA Data Solution Engineers, Google Cloud Platform

Page 2: Critical Breakthroughs and Challenges in Big Data and Analytics

Google Cloud Platform 2

Google’s Mission: Organize the world’s information and make it universally accessible and useful.


Page 3: Critical Breakthroughs and Challenges in Big Data and Analytics

#cloudconf2016

Page 4: Critical Breakthroughs and Challenges in Big Data and Analytics

#cloudconf2016

Page 5: Critical Breakthroughs and Challenges in Big Data and Analytics


By 2020, there will be 8 billion connected smartphones, 2X more than today, and 32 billion connected “IoT” devices, 6X more than today.

Source: Boston Consulting Group, The Mobile Revolution: How Mobile Technologies Drive a Trillion-Dollar Impact; IDC, 2015

Page 6: Critical Breakthroughs and Challenges in Big Data and Analytics

Building what’s next 6

10x increase in data (4ZB to 45ZB)

35B connected devices

40% of data “touched” by the cloud

Source: IDC

Page 7: Critical Breakthroughs and Challenges in Big Data and Analytics

Data is key (among others)

Organisation, Data, Questions, Technology

“Companies in the top third of their industry in the use of data-driven decision making were, on average, 5% more productive and 6% more profitable than their competitors.”

Andrew McAfee and Erik Brynjolfsson, MIT

Page 8: Critical Breakthroughs and Challenges in Big Data and Analytics

What does Cloud 3.0 look like?

Page 9: Critical Breakthroughs and Challenges in Big Data and Analytics


Storage Processing Memory Network

Single-node computing

“Some assembly required”

True, on-demand cloud

An actual, global elastic cloud

Cloud 3.0

Invest your energy in great apps

Colocation

Your kit, someone else’s building.

Yours to manage.

Cloud 1.0

Today's Cloud:

Virtualized Data Centers

Standard virtual kit, for rent. Still yours to manage.

Cloud 2.0

Automation

Google Cloud Platform Vision

Messaging Big Data Containers NoSQL

Page 10: Critical Breakthroughs and Challenges in Big Data and Analytics

http://googleasiapacific.blogspot.se/2015/06/growing-our-data-center-in-singapore.html

For the past 17 years, Google has been building out the fastest, most powerful, highest quality cloud infrastructure on the planet.

Page 11: Critical Breakthroughs and Challenges in Big Data and Analytics

Edge locations in virtually every country in the world

Our Network

Page 12: Critical Breakthroughs and Challenges in Big Data and Analytics

77 peering locations

Page 13: Critical Breakthroughs and Challenges in Big Data and Analytics

10+ Years of Tackling Big Data Problems


Google Papers

[Timeline figure, 2002 to 2015]

Google Papers: GFS, MapReduce, BigTable, Dremel, PubSub, Flume Java, Millwheel

Open Source: Apache Beam, TensorFlow

Google Cloud Products: BigQuery, Pub/Sub, Dataflow, Bigtable

Page 14: Critical Breakthroughs and Challenges in Big Data and Analytics
Page 15: Critical Breakthroughs and Challenges in Big Data and Analytics

Google’s Data Services for everyone

Page 16: Critical Breakthroughs and Challenges in Big Data and Analytics

Confidential + Proprietary

Storage and Databases

Cloud Storage

The Google Cloud data toolbox

Cloud SQL

Cloud Bigtable

Cloud Datastore

Big Data and Analytics

BigQuery

Cloud Pub/Sub

Cloud Dataflow

Cloud Dataproc

Cloud Datalab

Machine Learning

Cloud Machine Learning

Cloud Translate API

Cloud Vision API

Cloud Speech API

Page 17: Critical Breakthroughs and Challenges in Big Data and Analytics


A common configuration: draw conclusions

Events, metrics, etc.

Stream

Batch

Spreadsheets

BI Tools

Coworkers

Applications and Reports

Cloud Datalab

Visualization and BI

Co-workers

Batch


Raw logs, files, assets, Google

Analytics data etc.

A serverless big data stack that scales automatically

Page 18: Critical Breakthroughs and Challenges in Big Data and Analytics


Complexities of Big Data Processing

Programming

Resource provisioning

Performance tuning

Monitoring

Reliability

Deployment & configuration

Handling growing scale

Utilization improvements

Time to Understanding

Typical Big Data Processing

Page 19: Critical Breakthroughs and Challenges in Big Data and Analytics


Spend Time on ‘What’ not ‘How’

Time to Understanding

Big Data Processing with Google Cloud Platform

Programming

More time to dig into your data

Page 20: Critical Breakthroughs and Challenges in Big Data and Analytics

Cloud 3.0 Big Data Lifecycle

Cloud Logs

Google App Engine

Google Analytics Premium

Cloud Pub/Sub

BigQuery Storage (tables)

Cloud Bigtable (NoSQL)

Cloud Storage (files)

Cloud Dataflow

BigQuery Analytics (SQL)

Capture Store Analyze

Batch

Process

Stream

Cloud Monitoring

Real-time analytics

Cloud Dataflow

Cloud ML

Real-time dashboard

Real-time alerts

Use

Data Scientists

Analysts

Smart apps

Catalog & Data Lifecycle Automation

Cloud Datalab

Cloud Dataproc

Data Studio

Page 21: Critical Breakthroughs and Challenges in Big Data and Analytics


Emerging Big Data Challenges

Real-time data ingestion

Machine learning at scale

Batch or streaming?

Analytics at the speed of thought

Page 22: Critical Breakthroughs and Challenges in Big Data and Analytics

Batch or Streaming? Why do you have to choose?

Breakthrough #1

Page 23: Critical Breakthroughs and Challenges in Big Data and Analytics


“We don’t really use MapReduce anymore”

Urs Hölzle, SVP Technical Infrastructure, Google

Page 24: Critical Breakthroughs and Challenges in Big Data and Analytics


A common configuration: capturing input

Cloud Pub/Sub: Reliable, many-to-many, asynchronous messaging

Cloud Storage: Powerful, simple and cost-effective object storage

Raw logs, files, assets, Google

Analytics data etc.

Events, metrics, etc.
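
The "many-to-many, asynchronous" description of Cloud Pub/Sub on this slide can be illustrated with a small in-memory sketch. This is not the Cloud Pub/Sub client API; the class InMemoryBroker and the topic and subscription names are hypothetical, chosen only to show that each subscription on a topic receives its own copy of every message and reads it at its own pace.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy broker (hypothetical, not the Cloud Pub/Sub API).

    Any publisher can write to a topic, and every subscription
    attached to that topic gets its own copy of each message.
    """

    def __init__(self):
        # topic name -> {subscription name -> queue of pending messages}
        self._subscriptions = defaultdict(dict)

    def subscribe(self, topic, subscription):
        self._subscriptions[topic][subscription] = deque()

    def publish(self, topic, message):
        # The publisher returns immediately; messages wait in each
        # subscription's queue until that subscriber pulls them.
        for queue in self._subscriptions[topic].values():
            queue.append(message)

    def pull(self, topic, subscription, max_messages=10):
        queue = self._subscriptions[topic][subscription]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


broker = InMemoryBroker()
broker.subscribe("events", "dashboard")
broker.subscribe("events", "archiver")

# Two independent publishers writing to the same topic.
broker.publish("events", {"source": "web", "metric": "clicks", "value": 3})
broker.publish("events", {"source": "mobile", "metric": "clicks", "value": 5})

# Each subscriber sees both messages, independently of the other.
dashboard_batch = broker.pull("events", "dashboard")
archiver_batch = broker.pull("events", "archiver")
print(len(dashboard_batch), len(archiver_batch))
```

A managed service adds durability, acknowledgements and retries on top of this idea; the sketch leaves all of that out.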

Page 25: Critical Breakthroughs and Challenges in Big Data and Analytics


A common configuration: process and transform

Events, metrics, etc.

Cloud Dataflow: Data processing engine for batch and stream processing

Stream

Batch

Raw logs, files, assets, Google

Analytics data etc.
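
The idea of one engine for both batch and stream processing can be sketched in plain Python: the same counting logic runs over a bounded list and over a generator standing in for an unbounded source. This is an illustration only, not the Cloud Dataflow SDK; the function names, the 60-second window size, and the assumption that events arrive in timestamp order are all hypothetical simplifications.

```python
from collections import Counter


def count_by_key(events):
    """The 'what': count events per key. Unaware of batch vs. stream."""
    return Counter(key for key, _timestamp in events)


def fixed_windows(events, size_seconds):
    """Group (key, timestamp) events into fixed event-time windows.

    Assumes events arrive in timestamp order (a simplification).
    """
    window_start, window = None, []
    for key, ts in events:
        start = ts - ts % size_seconds
        if window_start is not None and start != window_start:
            yield window_start, window
            window = []
        window_start = start
        window.append((key, ts))
    if window:
        yield window_start, window


def run(events, size_seconds=60):
    """Emit one result per window as soon as that window closes."""
    for start, window in fixed_windows(events, size_seconds):
        yield start, count_by_key(window)


# Bounded input (batch): a list that is fully available up front.
batch = [("click", 5), ("view", 20), ("click", 65), ("click", 70)]


def stream():
    # Stand-in for an unbounded source; events are produced one at a time.
    for event in batch:
        yield event


batch_result = dict(run(batch))
stream_result = dict(run(stream()))
print(batch_result == stream_result)
```

Because run is a generator, a truly unbounded source could be consumed window by window without waiting for the input to end.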

Page 26: Critical Breakthroughs and Challenges in Big Data and Analytics


A common configuration: process and transform

Events, metrics, etc.

Cloud Dataflow: Data processing engine for batch and stream processing

Stream

Batch

Cloud Dataproc: Managed Spark and Hadoop

Batch

Raw logs, files, assets, Google

Analytics data etc.

Page 27: Critical Breakthroughs and Challenges in Big Data and Analytics


A common configuration: analyze and store

Events, metrics, etc.

Stream

Batch

BigQuery: Extremely fast and cheap on-demand analytics engine

Bigtable: High performance NoSQL database for large workloads

Batch

Raw logs, files, assets, Google

Analytics data etc.

Page 28: Critical Breakthroughs and Challenges in Big Data and Analytics


A common configuration: draw conclusions

Events, metrics, etc.

Stream

Batch

Spreadsheets

BI Tools

Coworkers

Applications and Reports

Cloud Datalab

Visualization and BI

Co-workers

Batch


Raw logs, files, assets, Google

Analytics data etc.

Page 29: Critical Breakthroughs and Challenges in Big Data and Analytics

Real-time data ingestion (and at scale)

Breakthrough #2

Page 30: Critical Breakthroughs and Challenges in Big Data and Analytics

Google confidential │ Do not distribute

Overview:
Data to process: data in the Consolidated Audit Trail (CAT), a data repository of all equities and options orders, quotes, and events

Challenges:
How to process the CAT and organize 100 billion market events into an “order lifecycle” in a 4-hour window
Store 6 years (~30PB) of data

Cloud Bigtable to process and run queries and tolerate volume increases

6 billion market events written per hour
1.7 gigabytes per second
6 terabytes per hour

Bursts of 10 billion market events written per hour
10 terabytes per hour
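
The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted on the slide and assumes decimal units (1 TB = 1000 GB), which is an assumption about how the slide rounds.

```python
SECONDS_PER_HOUR = 3600

# 100 billion market events must be organized within a 4-hour window.
events_total = 100e9
window_hours = 4
events_per_second_needed = events_total / (window_hours * SECONDS_PER_HOUR)

# 6 TB written per hour, expressed per second (decimal units assumed).
tb_per_hour = 6
gb_per_second = tb_per_hour * 1000 / SECONDS_PER_HOUR

# ~30 PB retained over 6 years, as an average per year.
retained_pb = 30
retained_years = 6
pb_per_year = retained_pb / retained_years

print(f"{events_per_second_needed:,.0f} events/s needed")  # about 6.9 million
print(f"{gb_per_second:.2f} GB/s written")                 # about 1.67
print(f"{pb_per_year:.1f} PB stored per year")             # 5.0
```

The 6 TB per hour figure works out to roughly 1.67 GB per second, which is consistent with the 1.7 gigabytes per second quoted on the slide.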

Page 31: Critical Breakthroughs and Challenges in Big Data and Analytics


https://www.youtube.com/watch?v=fqOpaCS117Q

Page 32: Critical Breakthroughs and Challenges in Big Data and Analytics

Analytics at the speed of thought

(and at scale)

Breakthrough #3

Page 33: Critical Breakthroughs and Challenges in Big Data and Analytics


Scales automatically

No setup or administration

Stream up to 100,000 rows per second

Easily integrates with third-party software

Google BigQuery makes complex data analysis simple

Page 34: Critical Breakthroughs and Challenges in Big Data and Analytics


Google BigQuery Performance Example

Running an inefficient regular expression over 100 billion rows in less than 60 seconds

Source: https://cloud.google.com/blog/big-data/2016/01/anatomy-of-a-bigquery-query
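
To put the performance example in perspective, the minimum scan rate implied by the slide can be worked out directly. Only the two figures from the slide are used; the result is a lower bound because the slide says "less than 60 seconds".

```python
rows = 100_000_000_000   # 100 billion rows (from the slide)
max_seconds = 60         # "less than 60 seconds" (from the slide)

# Lower bound on rows that must be evaluated per second.
min_rows_per_second = rows / max_seconds

print(f"at least {min_rows_per_second:,.0f} rows per second")
```

That is more than 1.6 billion regular-expression evaluations per second, a rate that implies the work is spread across many machines in parallel.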

Page 35: Critical Breakthroughs and Challenges in Big Data and Analytics

Before: 1000-core Hadoop cluster = 2.5 hours

After: making ad hoc queries with BigQuery < 5 min

● 500+ Games
● Hundreds of Analysts
● Terabytes of Data Daily
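
The before and after timings on this slide translate into a minimum speedup factor. The calculation below uses only the two durations from the slide and treats 5 minutes as an upper bound on the BigQuery time.

```python
hadoop_minutes = 2.5 * 60   # 2.5 hours on the 1000-core Hadoop cluster
bigquery_minutes = 5        # upper bound: the slide says "< 5 min"

# Because the BigQuery time is an upper bound, this is a minimum speedup.
min_speedup = hadoop_minutes / bigquery_minutes

print(f"at least {min_speedup:.0f}x faster")
```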

Page 36: Critical Breakthroughs and Challenges in Big Data and Analytics

Google BigQuery: The Power of Google Dremel for everyone

Storage Compute

Fast Ingest Query

Terabit Network

Page 37: Critical Breakthroughs and Challenges in Big Data and Analytics
Page 38: Critical Breakthroughs and Challenges in Big Data and Analytics

“Right at the start of the partnership we were able to reduce time to insight from 96 hours to 30 minutes by using BigQuery, allowing us to react in real time to customer needs and provide better service.”

Gary Sanders, Head of the bank's digital analytics function

https://www.finextra.com/newsarticle/28566/lloyds-partners-google-on-data-analytics

Page 39: Critical Breakthroughs and Challenges in Big Data and Analytics

Machine learning for everyone

Breakthrough #4

Page 40: Critical Breakthroughs and Challenges in Big Data and Analytics


"Machine learning is a core, transformative way by which we're rethinking everything we're doing … we're thoughtfully applying it across all our products, be it search, ads, YouTube or Play."

Page 41: Critical Breakthroughs and Challenges in Big Data and Analytics


Applications that can see, hear and understand

Page 42: Critical Breakthroughs and Challenges in Big Data and Analytics


TensorFlow

Deep Learning technology currently powering over 100 Google services

Generalizable to vision, sound, text, video and other data

Runs on CPUs or GPUs, desktop, server, or mobile computing platforms.

Distributed via Apache 2.0 OSS license

Page 43: Critical Breakthroughs and Challenges in Big Data and Analytics

Use your own data to train models

Page 44: Critical Breakthroughs and Challenges in Big Data and Analytics


What Cloud Machine Learning Can Do

● Fully managed service

● Train using a custom TensorFlow graph

● Batch and online predictions, at scale

● Integrated Datalab experience

● Regression and classification tasks

Page 45: Critical Breakthroughs and Challenges in Big Data and Analytics

Fully trained, easy to use Machine Learning models

Cloud Translate API

Cloud Vision API

Cloud Speech API

Page 46: Critical Breakthroughs and Challenges in Big Data and Analytics

Cloud Vision API

Label Detection

Landmark Detection

OCR

Logo Detection

Face Detection

Explicit Content Detection

Page 47: Critical Breakthroughs and Challenges in Big Data and Analytics

{"landmarkAnnotations": [
  {"description": "Arc de Triomphe",
   "locations": [{"latLng": {"latitude": 48.873667, "longitude": 2.295134}}],
   "score": 0.94231218}
]}
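
A response like the one on this slide can be consumed with a few lines of standard-library Python. This is a minimal sketch, assuming the fragment is a single annotation object inside the landmarkAnnotations array; the helper best_landmark and its min_score threshold are hypothetical, and no call to the Vision API itself is made.

```python
import json

# The slide's response, written out as valid JSON.
response_text = """
{"landmarkAnnotations": [
  {"description": "Arc de Triomphe",
   "locations": [{"latLng": {"latitude": 48.873667, "longitude": 2.295134}}],
   "score": 0.94231218}
]}
"""


def best_landmark(text, min_score=0.5):
    """Return (description, latitude, longitude, score) for the
    highest-scoring landmark, or None if nothing meets min_score.

    min_score is an illustrative threshold, not an API parameter.
    """
    annotations = json.loads(text).get("landmarkAnnotations", [])
    candidates = [a for a in annotations if a.get("score", 0.0) >= min_score]
    if not candidates:
        return None
    top = max(candidates, key=lambda a: a["score"])
    lat_lng = top["locations"][0]["latLng"]
    return (top["description"], lat_lng["latitude"],
            lat_lng["longitude"], top["score"])


landmark = best_landmark(response_text)
print(landmark)
```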

Page 48: Critical Breakthroughs and Challenges in Big Data and Analytics

Cloud Speech API

Recognizes over 80 languages and variants

Can return text in real-time

Highly accurate, even in noisy environments

Access from any device

Powered by Google’s machine learning

Page 50: Critical Breakthroughs and Challenges in Big Data and Analytics

Machine Learning Use Cases

Structured Data

Classification / Regression
● Customer Churn Analysis
● Product Diagnostics
● Forecasting

Recommendation
● Content Personalization
● Product X-Sells/Up-sells

Anomaly Detection
● Fraud Detection
● Asset Sensor Diagnostics
● Log Metric Anomalies

Unstructured Data

Image Analytics
● Identify damaged shipments
● Explicit Content Classification

Text Analytics
● Call Center log analysis
● Language Identification
● Topic Classification
● Sentiment Analysis

Page 51: Critical Breakthroughs and Challenges in Big Data and Analytics

cloud.google.com

Page 52: Critical Breakthroughs and Challenges in Big Data and Analytics


Page 53: Critical Breakthroughs and Challenges in Big Data and Analytics

Google’s Approach to

Cloud Security & Compliance

Page 54: Critical Breakthroughs and Challenges in Big Data and Analytics

● Tens of thousands of custom built, homogenous systems
● Dozens of datacenters for redundancy
● Data encryption in transit and at rest
● Secure software development process
● External security verifications
● 500+ security engineers
● 160+ academic research papers on security
● Vulnerability Reward Program

We store our own data in this environment

Page 55: Critical Breakthroughs and Challenges in Big Data and Analytics

Certifications

Product | SSAE-16 SOC 1 | SSAE-16 SOC 2 | SSAE-16 SOC 3 | ISO 27001 | HIPAA (BAA) | PCI DSS v3.0 | FISMA | FedRAMP
GAE | Complete | Complete | Complete | Complete | H2 15 | Complete | FISMA (Moderate) | H2 15
GCS | Complete | Complete | Complete | Complete | Complete | Complete | n/a | H2 15
GCE | Complete | Complete | Complete | Complete | Complete | Complete | n/a | H2 15
Datastore | Complete | Complete | Complete | Complete | H2 15 | Complete | n/a | H2 15
BigQuery | Complete | Complete | Complete | Complete | Complete | Complete | n/a | H2 15
Cloud SQL | Complete | Complete | Complete | Complete | Complete | Complete | n/a | H2 15
Genomics | Complete | Complete | Complete | Complete | Complete | n/a | n/a | H2 15
Apps | Complete | Complete | Complete | Complete | Complete | n/a | GAFG only | H2 15

Page 56: Critical Breakthroughs and Challenges in Big Data and Analytics


https://cloud.google.com/solutions/machine-learning-with-financial-time-series-data

Demo: Predicting the NYSE daily outcome

Page 57: Critical Breakthroughs and Challenges in Big Data and Analytics


Get more info: Google Cloud for Financial Services

https://cloud.google.com/solutions/finserv/