Post on 12-Feb-2018

Big Data & Rocket Fuel

Dr Raj Subramani, HSBC

Reza Rokni, Google Cloud, Solutions Architect

Adrian Poole, Google Cloud, Financial Services

Eight cloud products with ONE BILLION users

Organize the world’s information and make it universally accessible and useful

Google’s Mission

18 years of Google R&D / Investment

Increasing Marginal Cost of Change

[Chart: marginal cost of change ($) plotted against increasing complexity of systems and processes. Curves: Traditional Architectures and Google Cloud Native Architectures (GCP). The top of the cost axis is labelled "Prohibitively Expensive".]

Containers at Google

[Chart, 2004 to 2016: Core Ops Team compared with Number of running jobs]

Enabled Google to grow our fleet over 10x faster than we grew our ops team

Google’s innovation in data

[Timeline; axis marks: 2002, 2004, 2006, 2008, 2010, 2012, 2013, 2016]

GFS, MapReduce, Bigtable, Colossus, Dremel, Flume, Megastore, Spanner, Millwheel, Pub/Sub, F1, Dataflow, TensorFlow

Proprietary + Confidential

Google’s innovation in data

Now available on Google Cloud Platform

[Timeline; axis marks: 2002, 2004, 2006, 2008, 2010, 2012, 2013, 2016; labelled with Google Cloud Platform products]

GCS, Dataproc, Bigtable, BigQuery, Dataflow, Datastore, Pub/Sub, NoSQL, Spanner, Cloud ML

Compute: Compute Engine, App Engine, Container Engine

Storage & Databases: Storage, Cloud SQL, Bigtable, Spanner, Datastore

Big Data: BigQuery, Pub/Sub, Dataflow, Dataproc, Datalab

Machine Learning: Machine Learning, Speech API, Translate API, Vision API

● Democratise ML

● Big datasets beat fancy algorithms

● Good Models

● Lots of compute

Lesson of the last 10 years...

Google BigQuery

BigQuery is Google's fully managed, petabyte scale, low cost enterprise data warehouse for analytics. BigQuery is serverless. There is no infrastructure to manage and you don't need a database administrator, so you can focus on analyzing data to find meaningful insights using familiar SQL. BigQuery is a powerful Big Data analytics platform used by all types of organizations, from startups to Fortune 500 companies.

Simple: Fully Managed and Serverless

Convenient: MB to PB scale, fast, with the convenience of SQL

Secure: Encrypted, Durable and Highly Available

What is Cloud Dataflow?

Intelligently scales to millions of QPS

Open source programming model

Unified batch and streaming processing

Fully managed, no-ops data processing

Google Cloud Dataflow
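
The "unified batch and streaming" point can be illustrated without any cloud dependency. The sketch below is plain Python, not the Dataflow SDK: the names `windowed_sums`, `bounded_source` and `unbounded_source` are hypothetical, the data is made up, and it assumes events arrive in event-time order (a real streaming engine uses watermarks to cope with late or out-of-order data). It only shows the idea that one transform can consume either a finite collection or an endless stream.

```python
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[int, str, float]  # (event time in seconds, key, value)


def windowed_sums(events: Iterable[Event], window_s: int) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Yield (window_start, {key: sum}) for fixed, non-overlapping windows.

    Simplifying assumption: events arrive in event-time order, so a window
    is emitted as soon as the first event of a later window is seen.
    """
    current_start = None
    sums: Dict[str, float] = defaultdict(float)
    for ts, key, value in events:
        start = ts - ts % window_s
        if current_start is not None and start != current_start:
            yield current_start, dict(sums)
            sums = defaultdict(float)
        current_start = start
        sums[key] += value
    if current_start is not None:
        # Only reached for a bounded input: flush the final window.
        yield current_start, dict(sums)


def bounded_source() -> List[Event]:
    """Hypothetical batch input: a finite list of events."""
    return [(0, "USD", 1.0), (30, "EUR", 2.0), (61, "USD", 3.0), (125, "USD", 4.0)]


def unbounded_source() -> Iterator[Event]:
    """Hypothetical streaming input: one event every 20 seconds, forever."""
    ts = 0
    while True:
        yield ts, "USD", 1.0
        ts += 20


# The same transform runs over both kinds of input.
batch_result = list(windowed_sums(bounded_source(), window_s=60))
# The stream never ends, so take only the first two completed windows.
stream_result = list(islice(windowed_sums(unbounded_source(), window_s=60), 2))
```

The only difference between the two calls is how the caller consumes the output: the bounded input is drained completely, while the unbounded one is read window by window for as long as results are wanted.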

Big Data at HSBC Scale

Dr Raj Subramani, HSBC

Fundamental Review of the Trading Book

Fundamental Review of the Trading Book (FRTB)

● Basel Committee on Banking Supervision (BCBS) conducted two assessments (The Regulatory Consistency Assessment Programme - February and December 2013) of capital charges for market risks in the trading books of institutions with approved internal models

● The significant differences in capital charges confirmed that the market risk framework was in need of reform

The regulations, in their final form, were published in January 2016

National supervisors are expected to finalize implementation by January 2019

Banks are expected to report under the new standards by end of 2019

Fundamental Review of the Trading Book

FRTB

● Trading Book and Banking Book Boundary

● Treatment of Credit (securitised vs non-securitised)

● Approach to Risk Management (VaR to Expected Shortfall)

● Incorporation of liquidity horizons

● Treatment of Hedging and Diversification

● Relationship between Internal Model (IM) and Standardized Approach (SA)
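
The move from VaR to Expected Shortfall can be made concrete with a small calculation. FRTB replaces the 99% VaR measure with Expected Shortfall at 97.5%. The sketch below is a minimal historical-simulation illustration, not the regulatory calculation: the P&L vector is made up, the function names are hypothetical, and the tail is taken as the worst round((1 - confidence) x n) observations, which is one of several conventions in use.

```python
from typing import List, Sequence


def worst_losses(pnl: Sequence[float], confidence: float) -> List[float]:
    """Return the tail of the loss distribution, largest loss first.

    Assumed convention: the tail holds round((1 - confidence) * n)
    observations, with a minimum of one.
    """
    losses = sorted((-x for x in pnl), reverse=True)  # positive number = loss
    tail_size = max(1, round((1.0 - confidence) * len(losses)))
    return losses[:tail_size]


def historical_var(pnl: Sequence[float], confidence: float) -> float:
    """VaR: the smallest loss inside the tail (the tail's threshold)."""
    return worst_losses(pnl, confidence)[-1]


def expected_shortfall(pnl: Sequence[float], confidence: float) -> float:
    """Expected Shortfall: the average loss inside the tail."""
    tail = worst_losses(pnl, confidence)
    return sum(tail) / len(tail)


# Made-up sample: 200 daily P&L values running from +49 down to -150.
pnl = [50.0 - i for i in range(1, 201)]

var_99 = historical_var(pnl, 0.99)        # threshold of the worst 2 outcomes
es_975 = expected_shortfall(pnl, 0.975)   # mean of the worst 5 outcomes
```

VaR reports only where the tail starts, whereas Expected Shortfall averages over everything inside the tail, so it responds to how severe the extreme losses are.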

Working in the Cloud – the tradeoffs

Technology outcomes

● Business-focused IT solution

● Access to latest technology

● Rapid prototyping

● Quicker time to market

Cost outcomes

● Reduced capacity lag

● Scalability and performance

● Reduced total cost of ownership

Governance risks

● Internal security clearance

● Regulatory approval

● Data sharing across borders

● Geo-political issues

Public cloud risks

● Data security risks

● Lock-in risks

● Third party dependency risks

Cloud Dataflow

[Diagram labels: SOURCE, SINK]

Compute and storage: unbounded, bounded

Resource management: resource auto-scaler, dynamic work rebalancer, work scheduler

Monitoring: log collection, graph optimization, auto-healing, intelligent watermarking

Trade & Market Data

Transferred to the Cloud (batch or stream)

[Diagram: Market Data and Trade Data enter through Pub/Sub (unbounded) or Storage (bounded), Dataflow runs the Analytics, and BigQuery handles Post Processing]

Stages: transfer data, run analytics, store results, post-process

The Anatomy of a Risk Engine

Data distribution and workflow across the analytics

● 2 million (dummy) plain vanilla mono-currency interest rate swaps in 12 currencies

● Dummy interest rate market data built from Bonds, Futures and Swaps

● Analytics was open source QuantLib (C++ compiled on Linux)
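
As a rough picture of the per-trade work such an engine distributes, here is a minimal plain-Python sketch. It is not QuantLib and not the engine described in the talk: it assumes a single flat, continuously compounded curve per currency, annual fixed payments and made-up rates, and the names `Swap`, `payer_swap_npv`, `par_rate` and `price_portfolio` are hypothetical. Trades are grouped by currency so that each group shares one set of market data.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Swap:
    trade_id: str
    currency: str
    notional: float
    fixed_rate: float    # annual fixed coupon rate
    maturity_years: int  # one fixed payment per year


def discount(rate: float, t: float) -> float:
    """Discount factor under a flat, continuously compounded rate."""
    return math.exp(-rate * t)


def annuity(rate: float, maturity_years: int) -> float:
    """Sum of discount factors on the annual payment dates."""
    return sum(discount(rate, t) for t in range(1, maturity_years + 1))


def par_rate(rate: float, maturity_years: int) -> float:
    """Fixed rate that makes the swap worth zero on the flat curve."""
    return (1.0 - discount(rate, maturity_years)) / annuity(rate, maturity_years)


def payer_swap_npv(swap: Swap, rate: float) -> float:
    """NPV to the fixed-rate payer: floating leg minus fixed leg (single curve)."""
    fixed_leg = swap.notional * swap.fixed_rate * annuity(rate, swap.maturity_years)
    floating_leg = swap.notional * (1.0 - discount(rate, swap.maturity_years))
    return floating_leg - fixed_leg


def price_portfolio(swaps: Iterable[Swap], curves: Dict[str, float]) -> Dict[str, float]:
    """Group trades by currency, then price each group with its own curve."""
    by_currency: Dict[str, List[Swap]] = defaultdict(list)
    for swap in swaps:
        by_currency[swap.currency].append(swap)
    results: Dict[str, float] = {}
    for currency, group in by_currency.items():
        rate = curves[currency]  # market data is looked up once per group
        for swap in group:
            results[swap.trade_id] = payer_swap_npv(swap, rate)
    return results


# Made-up market data and trades.
curves = {"USD": 0.03, "EUR": 0.01}
swaps = [
    Swap("T1", "USD", 1_000_000.0, par_rate(0.03, 5), 5),  # struck at par
    Swap("T2", "EUR", 1_000_000.0, 0.0, 5),                # zero fixed coupon
]
npvs = price_portfolio(swaps, curves)
```

Each currency group is independent of the others, which is what makes this kind of workload straightforward to spread across many workers.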

Dataflow as Risk Engine - Scale and Performance

JVM running C++

● Performance gains do not always come straight out of the box

● Applying domain knowledge and expertise helps tease out the best performance

Dataflow as Risk Engine - Stateful Analytics

The Cloud Journey

• Bring the business problem not a technical solution

• Beware the frog in the well

• At Google, Big Data is just data; separating the data from the processing allows clever combinations that address both scenarios

What next?

• Sign up for a Google Cloud account - first $300 free!

• Google Cloud courses @ https://www.coursera.org/ including Qwiklabs

• Contact Ian O’Shea ( ianoshea@google.com ) for further info.

Thank you

Dr Raj Subramani, HSBC

Reza Rokni, Google Cloud, Solutions Architect

Adrian Poole, Google Cloud, Financial Services
