(FIN401) Seismic Shift: Nasdaq's Migration to Amazon Redshift | AWS re:Invent 2014


Jason Timmes led the migration of the primary data warehouse for Nasdaq's Transaction Services U.S. business unit (which operates Nasdaq's U.S. equity and options exchanges) from a traditional on-premises MPP database to Amazon Redshift. The project significantly reduced operational expenses. Jason, an Associate Vice President of Software Development at Nasdaq, describes how his team migrated a warehouse that loads approximately 7 billion rows a day into the cloud, satisfied several security and regulatory audits, optimized read and write performance, ensured high availability, and orchestrated the other back-office activities that depend on the warehouse's daily loads completing. Along with sharing several technical lessons learned, Jason discusses Nasdaq's roadmap for integrating Redshift with more AWS services, as well as with more Nasdaq products, to offer even greater benefit to clients (internal and external) in the months ahead.

Transcript of (FIN401) Seismic Shift: Nasdaq's Migration to Amazon Redshift | AWS re:Invent 2014


We make the world's capital markets move faster, more efficient, more transparent.

Public company in the S&P 500. We develop and run markets globally in all asset classes, and we provide technology, trading, intelligence and listing services, with an intense operational focus on efficiency and competitiveness.

We provide the infrastructure, tools and strategic insight to help our customers navigate the complexity of global capital markets and realize their capital ambitions.

Get to know us: we have uniquely transformed our business from predominantly a U.S. equities exchange to a global provider of corporate, trading, technology and information solutions.


Leading index provider with 41,000+ indexes across asset classes and geographies.

Over 10,000 corporate clients in 60 countries.

Our technology powers over 70 marketplaces, regulators, CSDs and clearinghouses in over 50 countries.

100+ data product offerings supporting 2.5+ million investment professionals and users in 98 countries.

26 markets, 3 clearing houses, 5 central securities depositories.

Lists more than 3,500 companies in 35 countries, representing more than $8.8 trillion in total market value.

Our warehouse can be used to analyze market share and client activity, support surveillance, power our billing, and more…

• Idempotence: a quality of an action such that repetitions of the action have no further effect on the outcome.
– In other words, f(x) = f(f(x)) = f(f(f(x))), etc.
• The ingest process is designed as a workflow engine, with each step in each workflow being idempotent.
• Failures are easily recovered by repeating the failed step after resolving the root cause of the failure.

• Use a manifest file inside a transaction with a table lock, and keep a record of completed ingests.
• If the S3 COPY (insert) fails, roll back the transaction.
• If the insert succeeds, write a record of the completed ingest, and commit the transaction.
• Idempotence: start a transaction, lock the destination table, check for a prior successful ingest, and only start the insert if the data hasn't already been loaded today (see the sketch after this list).
• Pay close attention to the mandatory flag in the manifest (example after this list)!
• Redshift UNLOAD always sets this flag to false!!!
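
As a minimal sketch of the transaction pattern above, assuming a hypothetical ingest_status tracking table (table_name, load_date, status) and the nbbo table from the COPY example later in this deck:

begin;
lock nbbo;  -- serialize loaders on the destination table

-- Idempotence check: has today's data already been loaded?
select count(*) from ingest_status
where table_name = 'nbbo' and load_date = '2014-09-17' and status = 'complete';

-- If the count is 0, run the manifest-driven COPY (shown later in the deck).
-- On COPY failure, issue ROLLBACK and repair the root cause before retrying.
-- On COPY success, record the completed ingest and commit:
insert into ingest_status values ('nbbo', '2014-09-17', 'complete');
commit;

Because the status check, the COPY, and the status insert all sit inside one transaction under a table lock, repeating the step after a failure cannot double-load the data.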
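
For reference, a COPY manifest is a small JSON file listing the S3 objects to load; the bucket and keys below are placeholders. With "mandatory": true, the COPY fails if a listed file is missing, which is usually what you want for ingest (and why manifests written by UNLOAD, which set it to false, deserve scrutiny before reuse):

{
  "entries": [
    {"url": "s3://my_ingest/2014-09-17/nbbo.gz.0000", "mandatory": true},
    {"url": "s3://my_ingest/2014-09-17/nbbo.gz.0001", "mandatory": true}
  ]
}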

• TableIngestStatus
– We originally put this table in Redshift itself.
– It turns out Redshift is not efficient on really small data sets.
– This significantly impacted performance and increased concurrency contention.
• Solution: moved TableIngestStatus to a separate transactional RDBMS (MySQL).
– We were already using a MySQL instance to persist workflow states.
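
The deck doesn't show the table's schema, so the following MySQL sketch is purely illustrative of what such a tracking table might look like:

-- Hypothetical TableIngestStatus schema; InnoDB provides the
-- transactional semantics the ingest workflow relies on.
create table TableIngestStatus (
  table_name varchar(128) not null,
  load_date date not null,
  status enum('running', 'complete', 'failed') not null,
  updated_at timestamp not null default current_timestamp on update current_timestamp,
  primary key (table_name, load_date)
) engine = InnoDB;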

• Multiple layers of security:
– Direct Connect (private lines)
– VPC
– HTTPS/SSL/TLS (encryption in flight)
– AES-256 (encryption at rest in S3)
– Redshift encryption (encryption at rest in Redshift)
– HSM integration (Redshift master key managed on premises)
– CloudTrail/STL_CONNECTION_LOG to monitor for unauthorized DB connections

• Direct Connect
– No company data travels over internet circuits.
• VPC
– Isolates our Redshift servers from other tenants and from internet connectivity.
– Security groups restrict inbound/outbound connectivity.
• All AWS API calls are made over HTTPS.
• All Redshift JDBC connections must use SSL/TLS.
– Parameter group: require_ssl = true
– Use the Redshift cluster SSL certificate to verify cluster identity.
• See http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-ssl-support.html for details.
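
As an illustration (not from the deck), a client connection string for the PostgreSQL-compatible JDBC driver used with Redshift at the time might look like the following; the endpoint is a placeholder, and identity verification assumes the published Redshift SSL certificate has been imported into the client's truststore:

jdbc:postgresql://examplecluster.abc123xyz.us-east-1.redshift.amazonaws.com:5439/dev?ssl=true

With require_ssl = true in the parameter group, the cluster rejects any connection that does not negotiate SSL.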

• All Redshift load files staged in S3 are AES-256 encrypted (client side, not S3 SSE).
– The key is provided to Redshift in the S3 COPY command (shown below).
• Enable cluster encryption on Redshift.
– It can only be specified during cluster creation and cannot be changed later.
– It applies to backups/snapshots as well.
– There is a performance penalty, but encryption was not optional for Nasdaq.

copy nbbo from 's3://my_ingest/2014-09-17/nbbo.manifest'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>;master_symmetric_key=<master_key>'
manifest encrypted gzip;

• Redshift will store the cluster key in a single customer-premises HSM (or CloudHSM).
– SafeNet Luna SA HSM; the firmware version should match CloudHSM's.
– Requires a certificate exchange between the cluster and the HSM.
– Requires the cluster to have an EIP.
• On our side, this required a static 1-to-1 NAT of the HSM's private IP.
• VPC security groups still apply; the cluster can still be isolated from others.
– The encrypted database key is decrypted in the HSM, passed over an encrypted channel to the cluster on startup, and stored in memory to decrypt the data encryption (block) keys.
– If running an HSM HA group, keys must be synchronized after creation.
• HSM integration was critical to Nasdaq adoption.
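
The deck doesn't show the AWS-side registration steps; the AWS CLI sketch below outlines them, with all identifiers, addresses, and file paths as placeholders:

# Register a client certificate and the on-premises HSM with Redshift.
aws redshift create-hsm-client-certificate \
  --hsm-client-certificate-identifier my-hsm-client-cert

aws redshift create-hsm-configuration \
  --hsm-configuration-identifier my-hsm-config \
  --description "On-premises Luna SA" \
  --hsm-ip-address 203.0.113.10 \
  --hsm-partition-name my-partition \
  --hsm-partition-password <partition-password> \
  --hsm-server-public-certificate file://luna-server-cert.pem

# Reference both identifiers when creating the encrypted cluster:
aws redshift create-cluster --encrypted \
  --hsm-client-certificate-identifier my-hsm-client-cert \
  --hsm-configuration-identifier my-hsm-config ...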

• Monitor cluster access and react to any unauthorized connections.
– STL_CONNECTION_LOG
• Query the system table on a timed basis and alert on any unexpected access (a query sketch follows this list).
– CloudTrail to Splunk for Redshift connection and user logs.
• Captures all API calls, not activity inside Redshift.
– STL_DDLTEXT
• Audits all schema changes in the cluster.
• In response to an alert, Redshift/HSM connectivity is severed and the cluster is immediately shut down.
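
A minimal sketch of such a timed check, assuming clients only ever connect from a known on-premises range (the 10.x filter below is a placeholder):

-- Flag sessions initiated from outside the expected address range.
select recordtime, username, dbname, remotehost
from stl_connection_log
where event = 'initiating session'
and remotehost not like '10.%'
order by recordtime desc;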

• With validation, data integrity, and security requirements met, the challenge remains to optimize ingest.
• Why?
– Concurrency is a huge performance factor; we can't afford to still be loading yesterday's data when clients are running queries.

[Chart: "S3 (over HTTPS) Multithreaded Throughput": throughput (MB/sec, 0 to 140) versus concurrent threads (1 to 18).]

[Architecture diagram spanning three scopes (on premise; AWS regional, Multi-AZ; AWS US-East, primary AZ/VPC): RMS input sources (multiple systems), the data ingest process, MySQL, the HSM key appliance cluster, S3 holding Redshift load files/manifests and Redshift snapshots/backups, the SNS "Data Loaded" topic, and the Redshift database cluster.]

Please give us your feedback on this session.

Complete session evaluations and earn re:Invent swag.

http://bit.ly/awsevals