Reltio: Powering Enterprise Data-driven Applications with Cassandra

Post on 08-Jan-2017


Transcript of Reltio: Powering Enterprise Data-driven Applications with Cassandra

Powering Enterprise Data-driven Applications with Cassandra

Be Right Faster with Reliable Data, Relevant Insights, Recommended Actions™

#DataManagement

#BigData

#ML

© 2015. All Rights Reserved.

Anastasia Zamyshlyaeva
VP Platform Product Management and Co-founder @ Reltio

• 2011 – started working with C*
• 2012 – selected C* as the persistence store for creating a hybrid Columnar & Graph data-store
• Since 2012 – running in production to support:
  – 24/7 uptime with 99.995% availability
  – Multi-tenancy across customers
  – Both operational and analytical workloads

stasia@reltio.com
www.linkedin.com/in/azamyshlyaeva


“If you focus on the smallest details, you never get the big picture right”

~ Leroy Hood


[Diagram: Sales, Web site, Support, Supply, Marketing]


Enterprise Applications Ecosystem

Is data up-to-date?

Is data correct?

Is data complete?


Data Unification Application (based on Relational Databases)

• Fixed structure
• No big data
• Expensive
• Hard to support graphs and complex attributes
• Single point of failure (often)

[Diagram: Sales, Web site, Supply, Marketing, Support, and the Data Unification Application]


[Diagram: Sales, Web site, Supply, Marketing, Support (based on Cassandra)]

Why Cassandra?

✓ High performance
✓ Fault tolerance
✓ Linear scalability
✓ Multi-datacenter


Reltio Metadata-driven Model and Operations


Doctors and Hospitals Schema

configure: UI, REST API, Analytics


Oil & Gas Schema

configure: UI, REST API, Analytics


Asset Catalog Schema

configure: UI, REST API, Analytics

Cassandra is a primary datastore


Metadata-driven Documents in Columnar storage

Simple metadata-driven attributes in Cassandra (Thrift API)

Metadata:
Entity type: Individual
- Name: String
- Email: List
- Address: Complex
  - State: String
  - Type: List

Entity:
ID: doc1
Type: Individual
Name: John
Email: john@gmail.com, john@yahoo.com
Address: CA, shipping
         NY, billing

Row key   <Name>.1   …
doc1      John       …


Multi-value metadata-driven attributes in Cassandra (Thrift API)

Row key   …   <Email>.1        <Email>.2        …
doc1      …   john@gmail.com   john@yahoo.com   …

Entity:
ID: doc1
Type: Individual
Name: John
Email: john@gmail.com, john@yahoo.com
Address: CA, shipping (1)
         NY, billing (2)


Complex metadata-driven attributes in Cassandra (Thrift API)

Row key   …   <Address>.1.<State>.1   <Address>.1.<Type>.1   <Address>.2.<State>.1   …
doc1      …   CA                      shipping               NY                      …
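The three slides above map one nested document onto flat column names. A minimal Python sketch of that flattening follows. It is an illustration under assumptions, not Reltio's implementation: the `flatten` function, the dictionary layout of `metadata` and `doc1`, and the rule that every attribute is treated as a list are all hypothetical; only the column naming pattern (`<Attribute>.<index>`) and the sample values come from the slides.

```python
def flatten(document, metadata):
    """Flatten a nested document into {column_name: value} pairs.

    Hypothetical helper: column names follow the <Attribute>.<index>
    pattern shown on the slides; everything else is an assumption.
    """
    columns = {}

    def walk(prefix, attrs, schema):
        for name, kind in schema.items():
            if name not in attrs:
                continue
            raw = attrs[name]
            # Treat every attribute as multi-valued; a scalar is a list of one.
            values = raw if isinstance(raw, list) else [raw]
            for index, value in enumerate(values, start=1):
                path = f"{prefix}<{name}>.{index}"
                if isinstance(kind, dict):
                    # Complex attribute: recurse into its sub-attributes.
                    walk(path + ".", value, kind)
                else:
                    columns[path] = value

    walk("", document, metadata)
    return columns


# Metadata and entity from the slides, in an assumed dictionary form.
metadata = {
    "Name": "String",
    "Email": "List",
    "Address": {"State": "String", "Type": "List"},
}
doc1 = {
    "Name": "John",
    "Email": ["john@gmail.com", "john@yahoo.com"],
    "Address": [
        {"State": "CA", "Type": ["shipping"]},
        {"State": "NY", "Type": ["billing"]},
    ],
}

columns = flatten(doc1, metadata)
for name in sorted(columns):
    print(name, "=", columns[name])
```

Each resulting key would be one column in the wide row for `doc1`, so adding an attribute to the metadata needs no change to the table structure.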


Metadata-driven Documents – CQL wide rows

    CREATE TABLE ENTITIES (
        doc_id int,
        attribute_name text,
        attribute_value text,
        …
        PRIMARY KEY (doc_id, attribute_name)
    );

    -- select all addresses
    SELECT *
    FROM ENTITIES
    WHERE doc_id = 1
      AND attribute_name >= 'Address.0'
      AND attribute_name <= 'Address.9';
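The query above works because `attribute_name` is the clustering column, so the attributes of one document are stored sorted by name and a range of names is one contiguous slice. The sketch below imitates that slice in plain Python on a sorted list. The `slice_columns` helper and the sample names are hypothetical, and Python's string ordering stands in for Cassandra's ordering of `text` values, which should agree for ASCII names like these.

```python
from bisect import bisect_left, bisect_right


def slice_columns(sorted_names, start, end):
    """Return names n with start <= n <= end from an already sorted list."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_right(sorted_names, end)
    return sorted_names[lo:hi]


# Attribute names of one document, kept in sorted order like a wide row.
names = sorted([
    "Address.1.State.1", "Address.1.Type.1",
    "Address.2.State.1", "Address.2.Type.1",
    "Email.1", "Email.2", "Name.1",
])

addresses = slice_columns(names, "Address.0", "Address.9")
print(addresses)

# A longer name that starts with the upper bound sorts after it.
with_ninth = sorted(names + ["Address.9.State.1"])
print("Address.9.State.1" in slice_columns(with_ninth, "Address.0", "Address.9"))
```

Under these assumptions, a name such as `Address.9.State.1` falls outside the range because it sorts after the bound `Address.9`, which is worth keeping in mind when choosing range bounds for prefix-style queries.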


Hybrid Graphs - linked entities with infinite attribution

[Diagram: graph with nodes John, Dwight, DunderMifflin, CopyPaper; labels Employee, Individual, Product, Organization]

Cassandra
- Records storage across datacenters

Reltio
- Metadata-driven graphs
- Rich model for entities, relations
- Partitioning
- Effective joins
- Graph operations

Reltio de-duplication

John Smith

Jon Smith
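The slide only shows two near-identical names, so how Reltio actually matches records is not stated. As a rough illustration of the idea, the sketch below scores name similarity with Python's standard `difflib.SequenceMatcher`. The function names and the 0.85 threshold are assumptions chosen for the example, not values from the talk.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_probable_duplicate(a, b, threshold=0.85):
    """Flag two names as likely duplicates (threshold is an assumed value)."""
    return name_similarity(a, b) >= threshold


print(is_probable_duplicate("John Smith", "Jon Smith"))
print(is_probable_duplicate("John", "Dwight"))
```

A production matcher would normally compare several attributes (email, address) rather than a single name string.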


Hybrid Search – without documents!

Cassandra + Elasticsearch* = Hybrid search
* excluded documents

[Charts comparing Elasticsearch with hybrid search (Elasticsearch + Cassandra): data volume in Elasticsearch index (TB); Elasticsearch indexing performance (OPS); search performance on large documents (sec)]
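One way to read "without documents" is that the search index holds only searchable tokens and document ids, while the full records are fetched from Cassandra by id. The sketch below shows that pattern with in-memory stand-ins. The dictionaries, `build_index`, `hybrid_search`, and the `doc2` record are hypothetical, and no real Cassandra or Elasticsearch client calls are used.

```python
from collections import defaultdict

# Stand-in for Cassandra: full records keyed by document id.
records = {
    "doc1": {"Name": "John", "Email": ["john@gmail.com", "john@yahoo.com"]},
    "doc2": {"Name": "Dwight"},  # hypothetical second record
}


def build_index(records, fields):
    """Stand-in for the search index: token -> set of document ids only."""
    index = defaultdict(set)
    for doc_id, record in records.items():
        for field in fields:
            values = record.get(field, [])
            if not isinstance(values, list):
                values = [values]
            for value in values:
                for token in str(value).lower().split():
                    index[token].add(doc_id)
    return index


def hybrid_search(term, index, records):
    """Look up ids in the index, then fetch full records from the store."""
    doc_ids = sorted(index.get(term.lower(), set()))
    return [records[doc_id] for doc_id in doc_ids]


index = build_index(records, ["Name"])
hits = hybrid_search("john", index, records)
print(hits)
```

Because the index stores ids rather than whole documents, it stays small, which fits the smaller index volume the charts above are labeled with.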

Reltio Cloud Data Components

[Diagram: Spark, AWS, AWS Redshift, Cassandra, Elasticsearch]

Reltio Use Cases


Thank you