From Oracle to MongoDB

56
Pablo Enfedaque [email protected] 06.10.2012 A real use case at Telefónica PDI From Oracle to MongoDB

description

From Oracle to MongoDB, a real use case at Telefónica R&D The talk will cover the use case of the Personalisation Server, a master customers profile storage for the companies of the Telefonica Group (Telefonica, O2…). It provides real-time (ReST API) and batch interfaces to update, retrieve and share customers profile. Initially the PS used Oracle, but due to scalability and cost issues we implemented a new version with MongoDB. In the talk we will see the problems that made us move to MongoDB and all the benefits that we obtained (with real performance figures, ofc). Right now the Oracle version is being used at UK and Ireland (aprox. 30M user profiles stored) and the NoSQL version is being deployed at Mexico (18M customers) and other Latam countries.

Transcript of From Oracle to MongoDB

Page 1: From Oracle to MongoDB

Pablo Enfedaque [email protected] 06.10.2012

A real use case at Telefónica PDI

From Oracle to MongoDB

Page 2: From Oracle to MongoDB

01

02

03

04

Introduction • Telefónica PDI. Who? • Personalisation Server. Why? What?

The SQL version • Data model and architecture • Integrations, problems and improvements

The NoSQL version • Data model and architecture • Performance boost • The bad

Conclusions • Conclusions • Personal thoughts

Content

Page 3: From Oracle to MongoDB

Título del capítulo Máximo 3 líneas

01 Introduction

Page 4: From Oracle to MongoDB

4 Telefónica PDI

Telefónica PDI. Who?

• Telefónica §  Fifth largest telecommunications company in the world §  Operations in Europe (7 countries), the United States and Latin America

(15 countries)

• Telefónica Digital §  Web and mobile digital contents and services division

• Product Development and Innovation unit §  Formerly Telefónica R&D §  Product & service development, platforms development, research,

technology strategy, user experience and deployment & operation §  Around 70 different on going projects at all time.

01

Page 5: From Oracle to MongoDB

5 Telefónica PDI

Personalisation Server. What?

• User profiling system

• Machine learning

• Recommendations

• Customer’s profile storage

01

Page 6: From Oracle to MongoDB

6 Telefónica PDI

Opt-in and profile module. Why?

• Users data, profile and permissions, was scattered across different storages

01

• Gender • Film and music preferences IPTV service

• Permission to contact by SMS? • Gender

Mobile service

• Address • Music preferences

Music tickets service

• Address • Permission to contact by SMS?

Location based offers

So you want to know my address… AGAIN?!

Page 7: From Oracle to MongoDB

7 Telefónica PDI

Opt-in and profile module. Why?

• Users data, profile and permissions, was scattered across different storages

01

• Gender • Film and music preferences IPTV service

• Permission to contact by SMS? • Gender

Mobile service

• Address • Music preferences

Music tickets service

• Address • Permission to contact by SMS?

Location based offers

Page 8: From Oracle to MongoDB

8 Telefónica PDI

Opt-in and profile module. Why?

• Provide a module to become master customer’s data storage

01

•  Gender •  Film and music

preferences •  Permission to contact

by SMS? •  Address

IPTV service

Mobile service

Music tickets service

Location based offers

Page 9: From Oracle to MongoDB

9 Telefónica PDI

Opt-in and profile module. What?

• Features:

§  Flexible profile definition, classified in services

§  Profile sharing options between different services

§  Real time API

§  Supplementary offline batch interface

§  Authorization system

§  High availability

§  Inexpensive solution & hardware

01

Page 10: From Oracle to MongoDB

Título del capítulo Máximo 3 líneas

02 The SQL solution

Page 11: From Oracle to MongoDB

11 Telefónica PDI

Data model

• Services defined a set of attributes (their profile), with default value and data type

• Users were registered in services • Users defined values for some of the services attributes • Each attribute value had an update date to avoid overwriting newer changes through batch loads

Services, users and their profile

02

Page 12: From Oracle to MongoDB

12 Telefónica PDI

Data model

• Services could access attributes declared inside other services • There were sharing rights for read or read and write • The user had to be registered in both services

Services profile sharing matrix

02

Page 13: From Oracle to MongoDB

13 Telefónica PDI

Data model

• Everything that could be accessed in the PS was a resource • Roles defined access rights (read or read and write) of resources • Auth users had roles • Roles could include other roles

Authorization system

02

Page 14: From Oracle to MongoDB

14 Telefónica PDI

Data model

• Multiple IDS: §  Users profile could be accessed with different equivalent IDs depending

on the service §  Each user ID was defined by an ID type (phone number, email, portal ID,

hash…) and the ID value

Bonus features!

02

Page 15: From Oracle to MongoDB

15 Telefónica PDI

High level logical architecture

§  Everything running on Red Hat EL 5.4 64 bits

02

Page 16: From Oracle to MongoDB

16 Telefónica PDI

High level logical architecture

§  Everything running on Red Hat EL 5.4 64 bits

02

Page 17: From Oracle to MongoDB

17 Telefónica PDI

Integration

• PS replaces all customers profile and permissions DBs

• All systems access this data through PS real time API

•  In special cases, some PS-consumers could use the batch interface.

• The same way new services could be added quite easily

Planned integration

02

Page 18: From Oracle to MongoDB

18 Telefónica PDI

Integration

• Budget restrictions: adapt all services to use the API was too expensive

• Keep independent systems DBs and synchronize PS through batch

• Use DBs built-in massive extraction feature to generate daily batch files

• However… in most cases those DBs were not able to generate Delta (only changes) extractions §  Provide full daily snapshots!

Problems arise

02

Page 19: From Oracle to MongoDB

19 Telefónica PDI

First version performance

• 1.8M customers, 180 profile attributes, 6 services

• Sizes §  Tables + indexes size: 65Gb §  30% of the size were indexes

• Batch §  Full DWH customer’s profile import: > 24 hours §  Delta extractions: 4 - 6 hours §  Loads and extractions performance proportional to data size

• API: §  Response time with average traffic: 110ms

02

Ireland

Page 20: From Oracle to MongoDB

Título del capítulo Máximo 3 líneas

03 The SQL solution Second version

Page 21: From Oracle to MongoDB

21 Telefónica PDI

Second version

• New approach: batch processes access directly DB

03

High level logical architecture

Page 22: From Oracle to MongoDB

22 Telefónica PDI

Second version

• Batch processes had to

§  Validate authentication and authorization

§  Verify user, service and attribute existence

§  Check equivalent IDs

§  Validate sharing matrix rights

§  Validate values data type

§  Check the update date of the existing values

03

Batch processes

Page 23: From Oracle to MongoDB

23 Telefónica PDI

Second version 03

DB Batch processing

Our DBAs

Page 24: From Oracle to MongoDB

24 Telefónica PDI

Second version

• Preprocess incoming batch file in BE servers §  Validate format, services and attributes existence and values data types §  Generate intermediate file with structure like target DB table

• Load intermediate file (Oracle’s SQL*Loader) to a temporal table

• Switch DB to “deferred writing”, storing all incoming modifications

• Merge temporal table and final table, checking values update date

• Replace old users attributes values table with merge result

• Apply deferred writing operations

03

New DB-based batch loading process

Page 25: From Oracle to MongoDB

25 Telefónica PDI

Second version

• Generate a temporal DB table with format similar to final batch file. Two loops over users attributes values table required: §  Select format of the table; number and order of columns / attributes §  Fill the new table

• Loop the whole temporal table for final formatting (empty fields…)

• From batch side loop across the whole table (SELECT * FROM …) • Write each retrieved row as a line in the resulting file

03

New batch extraction process

Page 26: From Oracle to MongoDB

26 Telefónica PDI

Second version performance

• Batch time window: 3:30 hours §  Full DWH load §  Two Delta loads §  Three Delta extractions

• API: §  Ireland requirement: < 500ms

03

Ireland performance requirements

Page 27: From Oracle to MongoDB

27 Telefónica PDI

Second version performance

• 1.8M customers, 180 profile attributes, 6 services • Sizes §  Tables + indexes size: 65Gb §  30% of the size were indexes §  Temporal tables size increases almost exponentially: 15Gb and above §  Intermediate file size: from 700Mb to 7Gb • Batch §  Full DWH customer’s profile import: 2:30 hours §  Delta extractions: 1:00 hour §  Loads performance worsened quickly (almost exp): 6:00 hours §  Extractions performance proportional to data size §  Concurrent batch processes may halt the DB • API: §  Response time with average traffic: 80ms §  Response time while loading was unpredictable: >300ms

03

Ireland

Page 28: From Oracle to MongoDB

Título del capítulo Máximo 3 líneas

04 The SQL solution Third version

Page 29: From Oracle to MongoDB

29 Telefónica PDI

Third version 04

Speed up DB Batch processes

Our DBAs (again)

Page 30: From Oracle to MongoDB

30 Telefónica PDI

Third version

• Minor preprocessing of incoming batch file in BE servers §  Just validate format §  No intermediate file needed!

• Load validated file (Oracle’s SQL*Loader) to a temporal table • Loop the temporal table merging the values into final table, checking

values update date and data types §  Use several concurrent writing jobs

• Store results on real table, no need to replace!

• No “deferred writing”!

04

New (second) DB-based batch loading process

Page 31: From Oracle to MongoDB

31 Telefónica PDI

Third version

• Optimized loops to generate temporal output table. §  Use several concurrent writing jobs §  We achieved a speed-up of between 1.5 and 2

• Loop the whole temporal table for final formatting (empty fields…) • Download and write lines directly inside Oracle’s sqlplus

• No SELECT * FROM … query from Batch side!

04

Enhancements to extraction process

Page 32: From Oracle to MongoDB

32 Telefónica PDI

Our DBAs

F**K YEAH

Third version performance

• 1.8M customers, 180 profile attributes, 6 services • Sizes §  Tables + indexes size: 65Gb §  30% of the size were indexes §  Temporal tables: 15Gb

• Batch §  Full DWH customer’s profile import: 1:10 hours (vs. 2:30 hours) §  Three Delta extractions: 2:15 hours (vs. 3:00 hours) §  Loads and extractions performance proportional to data size §  Concurrent batch processes not so harmful

• API: §  Response time with average traffic: 110ms §  Response time while loading: 400ms

04

Ireland

Page 33: From Oracle to MongoDB

33 Telefónica PDI

Our DBAs

F**K YEAH

Third version performance

• 25M customers, 150 profile attributes, 15 services • Sizes §  Tables + indexes size: 700Gb §  40% of the size were indexes

• Batch §  Two Delta imports: < 2:00 hours §  Two Delta extractions: < 2:00 hours §  Loads and extractions performance proportional to data size

• API: §  Response time with average traffic: 90ms

04

United Kingdom

Page 34: From Oracle to MongoDB

34 Telefónica PDI

Our DBAs

F**K YEAH

Third version performance 04

Ireland 3rd version 2nd version

DB size 65Gb + 15Gb (temp) 65Gb + > 15Gb

Full DWH load 1:10 hours 2:30 hours

Three Delta exports 2:15 hours 3:00 hours

Batch stability Stable, linear Unstable, exponential

API response time 110ms 110ms

API while loading 400ms Unpredictable

United Kingdom 3rd version

DB size 700Gb

Two Delta loads < 2:00 hours

Three Delta exports < 2:00 hours

API response time 90ms

Page 35: From Oracle to MongoDB

35 Telefónica PDI

Third version performance

• 20 database tables • API: several queries with up to 35 joins and even some unions • Authorization: 5 joins to validate auth users access • Batch: §  Load: 1700 lines of PL/SQL §  Extraction: 1200 of PL/SQL

04

DB stats

Page 36: From Oracle to MongoDB

36 Telefónica PDI

Mission completed? 04

Page 37: From Oracle to MongoDB

37 Telefónica PDI

Third version performance

• 20M customers, 200 profile attributes, 10 services

• Mexico time window: 4:00 hours §  Full DWH load! §  Additional Delta feeds loads §  At least two Delta extractions

04

Mexico

Our DBAs

Page 38: From Oracle to MongoDB

Título del capítulo Máximo 3 líneas

05 The NoSQL solution

Page 39: From Oracle to MongoDB

39 Telefónica PDI

MongoDB Data Model Services and their profile + sharing matrix

05

{ _id : 7, service_name : "root", id_type : 1, default_values: false, owned_attribs : [ { attrib_id : 70005, attrib_nane : “marketing.consent", attrib_data_type : 1, attrib_def_value : "no", attrib_status : 1 }, ... ], shared_attribs : [ {attrib_id : 20144, sharing_mode : 0}, ... ] }

attrib_id = service_id * 10000 + num attribs + 1

attrib_id = service_id * 10000 + num attribs + 1

Page 40: From Oracle to MongoDB

40 Telefónica PDI

MongoDB Data Model Users and their profile + multiple IDs

05

{ _id : "011234" services_list : [ { service_id : 1, reg_date : {"$date" : 1318040693000} }, ... ], user_values : [ { attrib_id : 10140, attrib_value : "Open", update_date : {"$date" : 1317110161000} }, ... ] }

Equivalent ID document: { _id : “05abcd" ue : "011234" }

_id = “id type” + “user ID”

attrib_id = service_id * 10000 + num attribs + 1

_id = “id type” + “user ID”

Page 41: From Oracle to MongoDB

41 Telefónica PDI

MongoDB Data Model Authorization system

05

AUTH USERS COLLECTION: { _id: "admin" auth_pswd: ”XXX", auth_roles: ['PS_ADMIN_ROLE’, …], auth_uris: [ {uri_path: "/**", method: 'R'}, {uri_path: "/stats/**", method: 'RW'}, {uri_path: "/kpis/**", method: ’IMPORT'}, ... ] } RESOURCES COLLECTION:

{ _id: "admin.**", role_uri: "/**" }

ROLES COLLECTION: { _id: 'PS_ADMIN_ROLE', roles_resources: [ { resource_id: "admin.**”, method: 'R' }, { resource_id: "stats.**”, method: 'IMPORT' }, ... ] }

Replicate uris (from resources) and methods (from roles)

Page 42: From Oracle to MongoDB

42 Telefónica PDI

MongoDB Data Model

• Only 5 collections • API: typically 2 accesses (services and users collections) • Authorization: access only 1 collection to grant access • Batch: all processing done outside DB

05

DB stats

Page 43: From Oracle to MongoDB

43 Telefónica PDI

NoSQL version

§  Everything running on Red Hat EL 6.2 64 bits

05

High level logical architecture

Page 44: From Oracle to MongoDB

44 Telefónica PDI

NoSQL version performance

• 1.8M customers, 180 profile attributes, 6 services • Sizes §  Collections + indexes size: 20Gb (vs. 65Gb) §  < 5% of the size are indexes (vs. 30%)

• Batch §  Full DWH customer’s profile import: 0:12 hours (vs. 1:10 hours) §  Three Delta extractions: 0:40 hours (vs. 2:15 hours) §  Loads and extractions performance proportional to data size §  Concurrent batch processes without performance affection

• API: §  Response time with average traffic: < 10ms (vs. 110ms) §  Response time while loading: the same §  High load (600 TPS) response time while loading: 300ms

05

Ireland (at PDI lab)

Page 45: From Oracle to MongoDB

45 Telefónica PDI

NoSQL version performance

• 25M customers, 150 profile attributes, 15 services • Sizes §  Collections + indexes size: 210Gb (vs. 700Gb) §  < 5% of the size were indexes

• Batch §  Two Delta imports: < 0:40 hours (vs. 2:00 hours) §  Loads and extractions performance proportional to data size

05

United Kingdom (at PDI lab)

Page 46: From Oracle to MongoDB

46 Telefónica PDI

NoSQL version performance

• 20M customers, 200 profile attributes, 15 services • Sizes §  Collections + indexes size: 320Gb §  Indexes size: 1.2Gb

• Batch §  Initial Full import (20M, 40 attributes): 2:00 hours §  Small Full import (20M, 6 attributes): 0:40 hours

• API: §  Response time with average traffic: < 10ms (vs. 90ms) §  Response time while loading: the same §  High load (500 TPS) response time while loading: 270ms

05

Mexico

Page 47: From Oracle to MongoDB

47 Telefónica PDI

Our DBAs

NoSQL version performance 04

Ireland NoSQL version SQL version

DB size 20Gb 80Gb

Full DWH load 0:12 hours 1:10 hours

Three Delta exports 0:40 hours 2:15 hours

API while loading < 10ms 400ms

API 600TPS + loading 300ms Timeout / failure

United Kingdom NoSQL version SQL version

DB size 210Gb 700Gb

Two Delta loads < 0:40hours < 2:00 hours

Mexico NoSQL version

DB size 320Gb

Initial Full load (40 attr) 2:00 hours

Daily Full load (6 attr) 0:40 hours

API while loading < 10ms

API 500TPS + loading 270ms

Page 48: From Oracle to MongoDB

48 Telefónica PDI

Mission completed? 05

Page 49: From Oracle to MongoDB

49 Telefónica PDI

The bad

• Batch load process was too fast §  To keep secondary nodes synched we needed oplog of 16 or 24Gb §  We had to disable journaling for the first migrations

• Labels of documents fields take up disc space §  Reduced them to just 2 chars: “attribute_id” -> “ai”

• Respect the unwritten law of at least 70% of size in RAM

• Take care with compound indexes, order matters §  You can save one index… or you can have problems §  Put most important key (never nullable) the first one

• DBAs whining and complaining about NoSQL §  “If we had enough RAM for all data, Oracle would outperform MongoDB”

05

Page 50: From Oracle to MongoDB

50 Telefónica PDI

The ugly

• Second migration once the PS is already running §  Full import adding 30 new attributes values: 10:00 hours §  Full import adding 150 new attributes values: 40:00 hours

•  Increase considerably documents size (i.e. adding lots of new values to the users) makes MongoDB rearrange the documents, performing around 5 times slower §  That’s a problem when you are updating 10k documents per second

• Solutions? §  Avoid this situation at all cost. Run away! §  Normalize users values; move to a new individual collection §  Prealloc the size with a faux field

•  You could waste space! §  Load in new collection, merge and swap, like we did in Oracle

05

Page 51: From Oracle to MongoDB

Título del capítulo Máximo 3 líneas

06 Título del capítulo Máximo 3 líneas Conclusions

Page 52: From Oracle to MongoDB

52 Telefónica PDI

Conclusions & personal thoughts

• Awesome performance boost §  But not all use cases fit in a MongoDB / NoSQL solution!

• New technology, different limitations

• Fear of the unknown §  SSDs performance? §  Long term performance and stability?

• Python + MongoDB + pymongo = fast development §  I mean, really fast

• MongoDB Monitoring Service (MMS)

• 10gen people were very helpful

06

Page 53: From Oracle to MongoDB

53 Telefónica PDI

Questions? 06

Page 54: From Oracle to MongoDB
Page 55: From Oracle to MongoDB

55 Telefónica PDI

SQL Physical architecture

§  Scale horizontally adding more BE or DB servers or disks in the SAN §  Virtualized or physical servers depending on the deployment

0X

Page 56: From Oracle to MongoDB

56 Telefónica PDI

MongoDB Physical architecture

§  MongoDB arbiters running on BE servers §  Scale horizontally adding more BE servers or disks in the SAN §  Sharding may already be configured to scale adding more replica sets

0X