Moving the Elephant in the Room: Data Migration at Scale

WHO AM I?

· BDPA Los Angeles Chapter
  · 4-year HSCC participant
· Columbia University, CC ’14
· Conductor, Inc.
· linkedin.com/in/calltyrone

CONDUCTOR, INC.

· Web presence management
· SaaS
· Big data
· Collect 6 TB of raw web data a week
· Scalable collection & ETL pipelines
· Final product: reports
· 6 years running
· Tons of data!

WHY WE CARE ABOUT SCALABILITY

· Growth
  · More users
  · More data
· Systems have to keep up!

HORIZONTAL SCALING
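
Horizontal scaling spreads data and load across more machines ("scaling out"), rather than upgrading one machine ("scaling up"). As a minimal sketch, assuming simple hash-modulo sharding over hypothetical node names, the code below shows each added node taking a smaller share of the keys. It is an illustration of the idea, not a description of Conductor's systems.

```python
import hashlib
from collections import Counter


def node_for_key(key: str, nodes: list[str]) -> str:
    """Map a key to one node using a stable hash (hash-modulo sharding)."""
    # md5 is used only as a stable, evenly spread hash, not for security;
    # Python's built-in hash() is randomized per process for strings.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest, "big") % len(nodes)]


def distribute(keys: list[str], nodes: list[str]) -> Counter:
    """Count how many keys land on each node."""
    return Counter(node_for_key(k, nodes) for k in keys)


if __name__ == "__main__":
    keys = [f"report-{i}" for i in range(10_000)]  # hypothetical record keys
    for n in (2, 4, 8):
        nodes = [f"node-{i}" for i in range(n)]  # hypothetical node names
        busiest = max(distribute(keys, nodes).values())
        print(f"{n} nodes -> busiest node holds {busiest} keys")
```

Plain modulo sharding relocates a large share of keys whenever the node count changes; consistent hashing is the usual refinement when nodes are added or removed often.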

VERTICAL SCALING

SCALABILITY IN THE REAL WORLD

· Yesterday’s solution is tomorrow’s problem
· Under-prioritized
· It’s hard!
  · Can require massive changes
  · No cure-all

WHY REPLACE AN UNSCALABLE SYSTEM?

· Save money
· Improve performance
· Clear the way for progress

WHY NOT?

· If it ain’t broke…
· Significant resource investment
  · Time
  · Money
· Software downtime
· Data quality concerns

YOUR TASK, AT A GLANCE

1. Identify an unscalable system
2. Discover and vet a suitable successor
3. Replace the legacy system with the new system

All while minimizing risk and cost.

Simple, no?

TALKING ABOUT THE ELEPHANT
Identifying an Unscalable System

CASE STUDY: LEGACY REPORTING DATABASE (Overview)

· MySQL
· Normalized data model
  · Helpful for initial modeling of our problem space
· Hosted by a single, very powerful machine

CASE STUDY: LEGACY REPORTING DATABASE (Unsustainable)

· Powerful hardware isn’t cheap
· Vertical scaling
· Obsolete schema
· Difficult to back up
· Queries aren’t getting any faster

SEE FOR YOURSELF

· If your solution…
  · Scales vertically
  · Prevents progress
  · Can’t perform at scale
  · Is difficult, slow, or expensive to upgrade

…it’s time for a change!

FINDING A BIGGER ROOM
Vetting Scalable Alternatives

WHAT TO LOOK FOR

· Price-efficient
· Easy to maintain
· Scales horizontally

CASE STUDY: AWS S3 DATASTORE (Our Use Case)

· Write once, read many
· Denormalized reports
· High storage capacity
· High availability

CASE STUDY: AWS S3 DATASTORE (Technical Overview)

· Write once, read many
  · Decent write performance, great read performance
· Denormalized reports
  · Flat files
· High storage capacity
  · No defined space limit
· High availability
  · Configurable file replication
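
To illustrate what "denormalized reports as flat files, written once and read many times" can look like, here is a minimal sketch. The normalized table shapes, the key layout `reports/<account>/<period>.json`, and the in-memory `WriteOnceStore` standing in for an S3 bucket are all hypothetical; no AWS SDK calls are used, and the write-once rule is enforced by the sketch's own `put`, not by S3.

```python
import json

# Hypothetical normalized rows, as they might come out of a relational schema.
accounts = {1: {"name": "Acme"}}
keywords = {10: {"account_id": 1, "text": "running shoes"},
            11: {"account_id": 1, "text": "trail shoes"}}
rankings = [
    {"keyword_id": 10, "period": "2015-01", "rank": 3},
    {"keyword_id": 11, "period": "2015-01", "rank": 7},
]


def build_report(account_id: int, period: str) -> dict:
    """Do the join once, at write time, producing one self-contained report."""
    rows = []
    for r in rankings:
        kw = keywords[r["keyword_id"]]
        if kw["account_id"] == account_id and r["period"] == period:
            rows.append({"keyword": kw["text"], "rank": r["rank"]})
    return {"account": accounts[account_id]["name"], "period": period, "rows": rows}


class WriteOnceStore:
    """In-memory stand-in for an object store used write-once, read-many."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        if key in self._objects:
            raise FileExistsError(f"{key} already written")  # never overwrite
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = WriteOnceStore()
report = build_report(1, "2015-01")
store.put("reports/1/2015-01.json", json.dumps(report).encode("utf-8"))
# Reads need no joins: fetch one flat file and parse it.
print(json.loads(store.get("reports/1/2015-01.json"))["rows"][0])
```

The trade-off is duplication: keyword text is copied into every report that uses it, in exchange for reads that touch a single object.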

CASE STUDY: AWS S3 DATASTORE (Benefits)

· Cheap
· Cloud-based
· Architecture facilitates testing
· Easy to back up

CASE STUDY: AWS S3 DATASTORE (Caveats)

· “Eventual consistency”
· Switching to non-relational storage is nontrivial
  · Application code must change
  · Migration path gets complicated
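
Under eventual consistency, a read issued right after a write may not see it yet, so readers typically retry. Below is a minimal sketch of read-with-exponential-backoff, assuming a hypothetical `fetch` callable that raises `KeyError` while an object is not yet visible; the `LaggyStore` class only simulates that lag and is not an AWS API. (Amazon S3 has provided strong read-after-write consistency since December 2020, so today this pattern matters mostly for other eventually consistent stores and caches.)

```python
import time
from typing import Callable


def read_with_retry(fetch: Callable[[str], bytes], key: str,
                    attempts: int = 5, base_delay: float = 0.01) -> bytes:
    """Retry a read with exponential backoff until the object becomes visible."""
    for attempt in range(attempts):
        try:
            return fetch(key)
        except KeyError:
            if attempt == attempts - 1:
                raise  # give up: still not visible after the last attempt
            time.sleep(base_delay * 2 ** attempt)  # 0.01 s, 0.02 s, 0.04 s, ...
    raise ValueError("attempts must be at least 1")


class LaggyStore:
    """Fake store whose writes become visible only after a few reads."""

    def __init__(self, lag_reads: int) -> None:
        self.lag_reads = lag_reads
        self.data: dict[str, bytes] = {}
        self.reads = 0

    def put(self, key: str, body: bytes) -> None:
        self.data[key] = body

    def get(self, key: str) -> bytes:
        self.reads += 1
        if self.reads <= self.lag_reads:
            raise KeyError(key)  # simulate a stale read
        return self.data[key]


store = LaggyStore(lag_reads=2)
store.put("reports/1/2015-01.json", b"{}")
print(read_with_retry(store.get, "reports/1/2015-01.json"))
```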

MOVING THE ELEPHANT
Migrating Legacy Data to the New System

INITIAL CONSIDERATIONS

· Time frame
· Scheduling constraints
· Operational cost
· Resource constraints
· Standards for data parity
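
Time frame and operational cost trade off against each other through parallelism. A back-of-the-envelope sketch follows; every input (data volume, per-worker throughput, hourly price, fixed per-worker overhead) is a hypothetical placeholder, not a figure from Conductor's migration, and the model assumes the work splits evenly across workers.

```python
def estimate(total_gb: float, gb_per_worker_hour: float, workers: int,
             price_per_worker_hour: float,
             overhead_hours: float = 2.0) -> tuple[float, float]:
    """Return (wall-clock hours, total cost) for a migration split across workers."""
    # Each worker handles an equal share of the data, then pays a fixed
    # overhead (e.g. startup and coordination) regardless of its share.
    hours = total_gb / (gb_per_worker_hour * workers) + overhead_hours
    cost = hours * workers * price_per_worker_hour
    return hours, cost


# Hypothetical inputs: 20 TB to move, 5 GB per worker-hour, $0.50 per worker-hour.
for workers in (10, 20, 40):
    hours, cost = estimate(20_000, 5, workers, 0.50)
    print(f"{workers:>2} workers: {hours:5.0f} h (~{hours / 24:4.1f} days), ${cost:,.2f}")
```

With these placeholder numbers, 10 workers need 402 hours for $2,010, while 40 workers need 102 hours for $2,040: more workers finish sooner but pay the fixed overhead more times.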

CASE STUDY: OUR UPFRONT PLANNING

· Two-month finish line
· Developed COGS (cost of goods sold) models
· Built data validation software
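
One common way to check data parity is to compare, per partition, the row count and an order-independent checksum on both the legacy and new side. A minimal sketch, assuming both sides can be read as lists of dicts; the sum-of-SHA-256-prefixes fingerprint is an illustrative choice, not a description of Conductor's validation software. A matching fingerprint is strong evidence of parity, not a proof.

```python
import hashlib
import json

MOD = 2 ** 64  # keep the running sum within 64 bits


def row_digest(row: dict) -> int:
    """Stable 64-bit hash of one row; sort_keys makes field order irrelevant."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return int.from_bytes(hashlib.sha256(canonical.encode("utf-8")).digest()[:8], "big")


def fingerprint(rows: list[dict]) -> tuple[int, int]:
    """(row count, sum of row digests mod 2**64), independent of row order."""
    total = 0
    for row in rows:
        total = (total + row_digest(row)) % MOD
    return len(rows), total


def check_parity(legacy: list[dict], migrated: list[dict]) -> bool:
    return fingerprint(legacy) == fingerprint(migrated)


legacy = [{"keyword": "running shoes", "rank": 3}, {"keyword": "trail shoes", "rank": 7}]
migrated = [{"rank": 7, "keyword": "trail shoes"}, {"rank": 3, "keyword": "running shoes"}]
print(check_parity(legacy, migrated))  # True: same rows, different order
migrated[0]["rank"] = 8
print(check_parity(legacy, migrated))  # False: one value drifted
```

Summing (rather than XOR-ing) the digests keeps duplicated rows from cancelling each other out.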

IDEAL MIGRATION SOFTWARE CHARACTERISTICS

· Can be scaled up or down
  · Speed up to save time
  · Slow down to save resources
· Can be run in a testing capacity
  · Configurable data sources/sinks
  · Configurable hardware resource use
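
These characteristics can be captured by making the source, the sink, the worker count, and a dry-run switch plain parameters. A minimal sketch in plain Python, assuming hypothetical `read_partition`/`write_partition` callables; it illustrates the shape of such a tool and is not the Oozie and Hive setup Conductor used.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable

Rows = list[dict]


@dataclass
class MigrationConfig:
    read_partition: Callable[[str], Rows]         # hypothetical data source
    write_partition: Callable[[str, Rows], None]  # hypothetical data sink
    workers: int = 4       # more workers: faster; fewer: cheaper
    dry_run: bool = False  # testing capacity: read and count, skip writes


def migrate(partitions: Iterable[str], cfg: MigrationConfig) -> dict[str, int]:
    """Copy each partition from source to sink; return rows handled per partition."""

    def copy_one(partition: str) -> tuple[str, int]:
        rows = cfg.read_partition(partition)
        if not cfg.dry_run:
            cfg.write_partition(partition, rows)
        return partition, len(rows)

    with ThreadPoolExecutor(max_workers=cfg.workers) as pool:
        return dict(pool.map(copy_one, partitions))


source = {"2015-01": [{"rank": 3}], "2015-02": [{"rank": 7}, {"rank": 9}]}
sink: dict[str, Rows] = {}
cfg = MigrationConfig(read_partition=source.__getitem__,
                      write_partition=sink.__setitem__, workers=2, dry_run=True)
print(migrate(source, cfg))  # {'2015-01': 1, '2015-02': 2}; sink stays empty
```

Swapping in QA sources and sinks, or turning `dry_run` off, changes behavior without touching the migration logic itself.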

OUR MIGRATION SOFTWARE

· Oozie and Hive
· Controllable time/resource trade-off
· Testable in a QA environment

AN INCREMENTAL MIGRATION: PARTITIONING DATA

· Easy to track progress
· Enables concurrency
· Dilutes failure risk
· e.g., Conductor “Time Periods”
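
Partitioning by time period makes each slice of the migration independently trackable and retryable. A minimal sketch, assuming monthly periods and a hypothetical `migrate_period` function; the granularity of Conductor's actual “Time Periods” isn't stated here, so monthly is only an example.

```python
from datetime import date
from typing import Callable


def monthly_periods(start: date, end: date) -> list[str]:
    """List 'YYYY-MM' labels from start's month through end's month, inclusive."""
    periods = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        periods.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return periods


def run_incrementally(periods: list[str], migrate_period: Callable[[str], None],
                      status: dict[str, str]) -> dict[str, str]:
    """Migrate each pending period; a failure is recorded for that period only."""
    for p in periods:
        if status.get(p) == "done":
            continue  # already migrated on a previous run: skip it
        try:
            migrate_period(p)
            status[p] = "done"
        except Exception as exc:  # isolate the failure to this one partition
            status[p] = f"failed: {exc}"
    return status


periods = monthly_periods(date(2014, 11, 1), date(2015, 2, 1))
print(periods)  # ['2014-11', '2014-12', '2015-01', '2015-02']


def flaky(period: str) -> None:
    if period == "2015-01":
        raise RuntimeError("source timeout")


status = run_incrementally(periods, flaky, {})
print(status["2015-01"])  # failed: source timeout
```

Re-running with the same `status` dict retries only the periods not yet marked done, and counting `"done"` entries gives a progress measure.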

AN INCREMENTAL RELEASE

· Limit client exposure to subtler bugs
· Incorporate customer feedback
· Demonstrate progress early
· e.g., Conductor Searchlight 3.0 Beta Program
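
An incremental release can route only opted-in beta accounts to the new datastore while everyone else stays on the legacy one. A minimal sketch, assuming a hypothetical allowlist of account IDs and two hypothetical reader functions; the source doesn't describe how the Searchlight 3.0 beta was actually gated.

```python
from typing import Callable

ReportReader = Callable[[int, str], dict]


def make_router(beta_accounts: set[int], new_reader: ReportReader,
                legacy_reader: ReportReader) -> ReportReader:
    """Return a reader that sends beta accounts to the new system."""

    def read_report(account_id: int, period: str) -> dict:
        if account_id in beta_accounts:
            return new_reader(account_id, period)
        return legacy_reader(account_id, period)

    return read_report


def read_legacy(account_id: int, period: str) -> dict:
    return {"source": "legacy", "account": account_id, "period": period}


def read_new(account_id: int, period: str) -> dict:
    return {"source": "s3", "account": account_id, "period": period}


read_report = make_router({7, 42}, new_reader=read_new, legacy_reader=read_legacy)
print(read_report(42, "2015-01")["source"])  # s3
print(read_report(3, "2015-01")["source"])   # legacy
```

Growing the allowlist widens exposure gradually, and removing an account sends it back to the legacy path.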

YOU CAN DO IT!

QUESTIONS?
Thanks for listening!

(We’re Hiring!)