AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Tigran Khrimian, VP of Data Platforms
Mark Ryland, Chief Architect, AWS Worldwide Public Sector
December 1, 2016
ENT313
FINRA in the Cloud
The Big Data Enterprise
What to Expect from the Session
• FINRA’s Enterprise Class cloud architecture
• Business Value FINRA has realized from cloud migration
• Technology skillsets required
• Tools (data management) and processes required
• Other (unexpected) benefits from cloud migration
• View from AWS: partnership and platform evolution
Data Is Central to Our Mission
Reconstruct the market from trillions of events
• Data from broker-dealers and exchanges
• Equities, Options, Fixed Income
• Build a graph of market order events
Analyze the data looking for financial fraud
• Insider trading, layering, cross-product
manipulation, front running & many more
• Looking for a needle in a haystack
Volume Challenges
Market volumes are volatile and
steadily increasing
Exchanges are dynamically evolving
Regulatory landscape is changing
Market manipulators innovate
Legacy Architecture
[Diagram: Tier 1, Tier 2, and Tier 3 storage on SAN and NAS]
Pain Points
Does not scale well as volumes and
workloads increase
Duplication of effort in data management
(data lifecycle, retention, versioning, etc.)
Data sync issues – manual effort to keep
data in sync
Challenges to run analytics across
fragmented data
Costly system maintenance and upgrades
Summary of Cloud Drivers: The Problems
• Fast-growing data volumes YoY
• High cost of pre-building for peak
• Escalating costs of in-house technology infrastructure
• Appliance platforms were facing obsolescence and end-of-life
as a result of new Big Data technologies
Keep spending on infrastructure or redirect
dollars to core business (financial regulation)?
AWS Architecture
Where Is My Data?
One location of master data, security, versioning,
availability, cross-region data replication, etc…
How Do I Access the Data?
FINRA’s AWS Architecture
[Architecture diagram. Components shown:
• On-premises data center: NAS, FTP, incoming files
• Processing stages: Validation, Normalization, Linkage, Data Management, Data Analytics, Machine Learning
• AWS: VPC, Amazon EC2, Amazon S3, Amazon Glacier, Amazon Redshift, Amazon EMR, Amazon RDS, AWS KMS]
FINRA Usage Statistics on AWS
30k+ EC2 nodes per day
93%+ of EC2 usage is EMR-based (mostly Spot)
20 PB+ storage (Amazon S3, Amazon Glacier)
60% PROD, 25% QC/UAT,
15% DEV
Node lifecycle:
o 50%: under 2h
o 35%: 2h to 5h
o 15%: over 5h
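As a rough illustration of what the lifecycle split means in node counts, the sketch below applies the percentages above to a round figure of 30,000 nodes per day. The 30,000 figure is an assumption taken as a floor from the slide's "30k+", and the function name is hypothetical.

```python
# Illustrative arithmetic only: apply the slide's node-lifecycle percentages
# to an assumed round figure of 30,000 EC2 nodes per day.
NODES_PER_DAY = 30_000  # assumption: the slide says "30k+", so this is a floor

LIFECYCLE_SHARE_PCT = {
    "under 2h": 50,
    "2h to 5h": 35,
    "over 5h": 15,
}


def nodes_per_bucket(total_nodes, share_pct):
    """Return the approximate node count in each lifetime bucket."""
    return {bucket: total_nodes * pct // 100 for bucket, pct in share_pct.items()}


buckets = nodes_per_bucket(NODES_PER_DAY, LIFECYCLE_SHARE_PCT)
for bucket, count in buckets.items():
    print(f"{bucket}: ~{count:,} nodes/day")
```

Under these assumptions, roughly half of the daily nodes live for less than two hours, which is consistent with the mostly Spot, EMR-based usage described on the slide.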
[Chart: Node Distribution for June 19-25 (~32k/day). Series: Hadoop/Spark; Redshift; Web, App & RDS. Y-axis: 0 to 40,000. Data labels shown: 31,044; 35,444; 32,919; 36,916; 29,330; 25,935; 20,523]
Information Security
FINRA’s Use of VPC is Highly Secure and Auditable
• Network security even more tightly controlled than
traditional data centers (i.e., “micro-segmentation”)
• Encrypt non-public data both in-motion and at-rest
• AWS IAM function with fine-grained entitlements and
SoD integrated with FINRA’s existing IAM processes
• Comprehensive audit trail – AWS CloudTrail & Amazon
CloudWatch
• Custom AWS compliance reporting system to ensure
“identity perimeter”
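One common way to express "encrypt in motion and at rest" for S3 is a bucket policy that denies non-TLS requests and uploads that do not request KMS encryption. The sketch below builds such a policy as plain JSON. It is an illustrative pattern, not FINRA's actual policy, and the bucket name and function name are hypothetical.

```python
import json

BUCKET = "example-nonpublic-data"  # hypothetical bucket name


def encryption_policy(bucket):
    """Build a bucket policy that rejects plaintext transport and unencrypted uploads."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    objects_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # In motion: deny any request that does not use TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, objects_arn],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # At rest: deny uploads that do not ask for SSE-KMS.
                "Sid": "DenyUploadsWithoutKmsEncryption",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
                },
            },
        ],
    }


policy = encryption_policy(BUCKET)
print(json.dumps(policy, indent=2))
```

Explicit Deny statements are used because they override any Allow granted elsewhere, which keeps the control auditable in one place.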
AWS Compliance & Certifications
[Diagram: AWS Foundation Services (Compute, Storage, Database, Networking) on AWS Global Infrastructure (Regions, Availability Zones, Edge Locations). Compliance programs listed: GxP, ISO 13485, AS9100, ISO/TS 16949]
Source: Amazon Web Services
Benefits
Improved performance (from minutes to seconds)
Ability to expand and contract (up to 40K EC2 instances
provisioned daily)
No more tech refreshes, patching, etc.
Lower cost of DR & Reg SCI testing
Superior data protection compared to in-house solution
Redirect focus and dollars to core business
Other (Unexpected) Benefits
Easier Data Access – no silos
• All data in one place
• Faster data discovery
• New forms of data exploration
Innovation & Engaged Staff
• Transformation from infrastructure ops to DevOps
• New technologies, new skills, challenging yet very clear goals
• Easier to try new things and innovate
Technology Skillsets Required
Fail fast, fail cheap
Innovation
Automation
Curiosity
FINRA’s Future Plans
• Migrate the remaining applications to the Cloud by 2018
• Hundreds of relational databases
• Hundreds of applications
• High degree of inter-application connectivity (messaging,
workflow, data replication)
• Shut down data center operation by end of 2018
Key Takeaways
• Develop a compelling business case: sell to your
stakeholders; sell to your team
• Make sure to get security right
• Focus on your data strategy
• Pay attention to variable infrastructure cost
• Partner with Cloud/Big Data vendors for staffing needs
• Innovate and transform as part of Cloud journeys
Summary
• The Cloud’s original promise to FINRA (cost & performance)
has been realized
• Other unplanned benefits
• superior data protection
• democratization of data
• catalyst for innovation
• Migrating the remainder of portfolio by end of 2018
AWS Perspective – Enterprise Account
Management
Enterprise Account Engagement Model
• AWS Account Team Role:
o Assist FINRA in architecting AWS services for Cloud
o Support Proof of Concepts (POCs) to accelerate migration
o Help FINRA understand and influence product roadmaps
• AWS Teams Engaged:
o Account Management
o Solutions Architecture
o Support and Technical Account Management (TAMs)
o Technical Delivery Management (TDM)
o Professional Services
o AWS Service Teams / Engineers
AWS Services That FINRA Has Requested
• Broad impact across multiple services
• Identity and access management
o Long-lived federation tokens
• Cross-region data replication (CRR) for S3:
o Copy important data to another region for catastrophic DR
o FINRA requested Data Encryption, other enhancements
• Database Migration Service (DMS):
o Input on DMS roadmap / features
o Early adopter for Oracle-Postgres migration (session DAT302)
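The cross-region replication item above (copying data to another region, with encryption) can be sketched as an S3 replication configuration document. The structure below follows the shape accepted by the S3 PutBucketReplication API as far as I am aware; every ARN, ID, and name is a hypothetical placeholder, and nothing here is taken from FINRA's setup.

```python
import json

# All values below are hypothetical placeholders, not values from the talk.
ROLE_ARN = "arn:aws:iam::111122223333:role/example-replication-role"
DEST_BUCKET_ARN = "arn:aws:s3:::example-dr-bucket"
DEST_KMS_KEY_ARN = "arn:aws:kms:us-west-2:111122223333:key/example-key-id"


def replication_configuration(role_arn, dest_bucket_arn, dest_kms_key_arn):
    """Build a replication configuration that copies all objects, including
    KMS-encrypted ones, to a bucket in another region."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-all-for-dr",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: every object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                # Opt in to replicating objects encrypted with SSE-KMS.
                "SourceSelectionCriteria": {
                    "SseKmsEncryptedObjects": {"Status": "Enabled"}
                },
                "Destination": {
                    "Bucket": dest_bucket_arn,
                    # Replicas are re-encrypted with a key in the destination region.
                    "EncryptionConfiguration": {"ReplicaKmsKeyID": dest_kms_key_arn},
                },
            }
        ],
    }


config = replication_configuration(ROLE_ARN, DEST_BUCKET_ARN, DEST_KMS_KEY_ARN)
print(json.dumps(config, indent=2))
```

Replication requires versioning on both the source and destination buckets, so a real setup would enable that first.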
Biggest Impact: EMR Enhancements
• Enhanced Hive / EMRFS support
• Presto performance improvements within EMR
• HBase on S3 (STG308 session):
o Separate storage & compute – data in S3 vs. persistent HS1 cluster
o Improved resiliency (RTO for cluster restart, S3 backup/replication)
o Improved cost performance (run less expensive nodes, no longer
storage constrained)
o Scale cluster up and down with demand
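To make the storage/compute separation above concrete, the sketch below builds the EMR configuration classifications that, per the EMR documentation, point HBase at S3 instead of cluster-local storage. The S3 path and function name are hypothetical, and this is a minimal sketch rather than the configuration FINRA used.

```python
import json

HBASE_ROOT = "s3://example-bucket/hbase-root/"  # hypothetical S3 location


def hbase_on_s3_configurations(root_dir):
    """Return EMR configuration classifications that store HBase data in S3."""
    return [
        {
            # Tell EMR's HBase to use S3 as its storage layer.
            "Classification": "hbase",
            "Properties": {"hbase.emr.storageMode": "s3"},
        },
        {
            # Point the HBase root directory at the S3 location.
            "Classification": "hbase-site",
            "Properties": {"hbase.rootdir": root_dir},
        },
    ]


configurations = hbase_on_s3_configurations(HBASE_ROOT)
print(json.dumps(configurations, indent=2))
```

Because the data lives in S3, a cluster started later with the same root directory can pick up the existing tables, which is what allows the cluster to be resized or restarted independently of storage.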
Thank you!
Remember to complete
your evaluations!
Related Sessions
FINRA Sessions:
• BDM203 – Building a Secure Data Science Platform
• DAT302 – Best Practices for Migrating to RDS / Aurora
• CMP316 – Aligning Billions of Time Ordered Events with
Spark
• SVR202 – What’s new with AWS Lambda
• STG308 – FINRA’s Scalable Big Data Architecture on S3