Cassandra advanced data modeling

Posted on 21-Apr-2017



Cassandra Advanced Data Modeling

Lyon Cassandra Users
Romain Hardouin
2016-05-31

$ who
Romain

$ pgrep -fl work
Cassandra architect

$ whatis teads
No.1 Video Advertising Marketplace

I. Introduction

II. Key principles

III. Chebotko methodology

IV. Time handling

Data modeling

I. Introduction

Theory


Chebotko diagrams

E&R

II. Key principles

Know your data

Denormalize
Know your queries

Key Principles

Nest Data
Duplicate Data

Know your domain

Conceptual Data Model (E&R):
● Entities
● Relationships
● Attributes / Keys
● Cardinalities
● Constraints
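To make the E&R vocabulary concrete, here is a minimal Python sketch of a conceptual model, assuming a hypothetical video/actor domain (the `Video`, `Actor`, `ActsIn` and `observed_cardinality` names are illustrative, not from the talk). Entities and the relationship become plain records carrying their key attributes, and the cardinality of the relationship is checked against sample data.

```python
from __future__ import annotations

from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Video:
    """Entity; key attribute: video_id."""
    video_id: UUID
    title: str


@dataclass(frozen=True)
class Actor:
    """Entity; key attribute: actor_name (assumed unique in this sketch)."""
    actor_name: str


@dataclass(frozen=True)
class ActsIn:
    """Many-to-many relationship between Video and Actor, with its own attribute."""
    video_id: UUID
    actor_name: str
    character_name: str


def observed_cardinality(links: list[ActsIn]) -> tuple[int, int]:
    """Return (max actors per video, max videos per actor) seen in the data."""
    if not links:
        return (0, 0)
    per_video: dict[UUID, set[str]] = {}
    per_actor: dict[str, set[UUID]] = {}
    for link in links:
        per_video.setdefault(link.video_id, set()).add(link.actor_name)
        per_actor.setdefault(link.actor_name, set()).add(link.video_id)
    return (max(map(len, per_video.values())), max(map(len, per_actor.values())))


v1, v2 = uuid4(), uuid4()
cast = [ActsIn(v1, "Alice", "Hero"), ActsIn(v1, "Bob", "Villain"), ActsIn(v2, "Alice", "Mentor")]
print(observed_cardinality(cast))  # (2, 2): many actors per video, many videos per actor
```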


Entities & relationships


Query-driven model

Application Workflow

New needs?
● New queries => new tables
● Is ALTER TABLE possible?


Goal: one partition per query

Anti-patterns:
● Table scans
● Client-side joins (a.k.a. multi-table queries)
● Secondary indexes
● ALLOW FILTERING
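The "one partition per query" goal can be illustrated with a toy in-memory table. This Python sketch is not the Cassandra driver: `ToyTable` and its methods are hypothetical, and the model only counts partitions touched. A read that supplies the partition key touches one partition, while filtering on another column has to visit all of them, which is roughly what a table scan or `ALLOW FILTERING` amounts to.

```python
from __future__ import annotations

from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table: partition key -> list of rows.

    Hypothetical helper, not a driver API: it only counts how many partitions
    each read has to touch.
    """

    def __init__(self, partition_key: str) -> None:
        self.partition_key = partition_key
        self.partitions: defaultdict[object, list[dict]] = defaultdict(list)
        self.partitions_read = 0

    def insert(self, row: dict) -> None:
        self.partitions[row[self.partition_key]].append(row)

    def select_by_partition(self, key: object) -> list[dict]:
        # Goal pattern: the partition key is given, so exactly one partition is read.
        self.partitions_read += 1
        return list(self.partitions.get(key, []))

    def select_filtering(self, column: str, value: object) -> list[dict]:
        # Anti-pattern (table scan / ALLOW FILTERING): every partition is read.
        self.partitions_read += len(self.partitions)
        return [row for rows in self.partitions.values() for row in rows
                if row.get(column) == value]


videos = ToyTable("video_id")
for video_id, actor in [(1, "Alice"), (2, "Bob"), (3, "Alice")]:
    videos.insert({"video_id": video_id, "actor_name": actor})

videos.select_by_partition(1)                   # touches 1 partition
videos.select_filtering("actor_name", "Alice")  # touches all 3 partitions
print(videos.partitions_read)                   # 4
```

With three partitions the difference is small, but the filtering read grows with the whole table while the keyed read does not.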


Nest Data

Clustering columns

Collection columns

UDT columns


Nest Data


CREATE TABLE actors_by_video (
  video_id uuid,
  actor_name text,
  character_name text,
  PRIMARY KEY ((video_id), actor_name, character_name)
);
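The following Python sketch models how `actors_by_video` lays out its data, under the simplifying assumption that a partition is just a sorted in-memory list (`ActorsByVideo` is a hypothetical name). `video_id` picks the partition, and `actor_name`, `character_name` nest the rows inside it in clustering order.

```python
from __future__ import annotations

import bisect
from uuid import UUID, uuid4


class ActorsByVideo:
    """Toy model of PRIMARY KEY ((video_id), actor_name, character_name).

    video_id selects the partition; inside it, rows are kept sorted by the
    clustering columns (actor_name, character_name), ascending.
    """

    def __init__(self) -> None:
        self.partitions: dict[UUID, list[tuple[str, str]]] = {}

    def insert(self, video_id: UUID, actor_name: str, character_name: str) -> None:
        rows = self.partitions.setdefault(video_id, [])
        key = (actor_name, character_name)
        i = bisect.bisect_left(rows, key)
        # Every column is part of the primary key, so re-inserting the same
        # actor/character pair targets the same row and adds nothing new.
        if i == len(rows) or rows[i] != key:
            rows.insert(i, key)

    def select(self, video_id: UUID) -> list[tuple[str, str]]:
        # One partition, returned in clustering order.
        return list(self.partitions.get(video_id, []))


video = uuid4()
table = ActorsByVideo()
for actor, character in [("Zoe", "Queen"), ("Adam", "King"), ("Adam", "Ghost"), ("Adam", "King")]:
    table.insert(video, actor, character)
print(table.select(video))  # [('Adam', 'Ghost'), ('Adam', 'King'), ('Zoe', 'Queen')]
```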

Duplicate data

Writes are cheap: « Joins on write »

Duplication occurs at different levels:
● Table: materialized views
● Partition
● Rows
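"Joins on write" means paying at write time so that every read stays a single-partition lookup. A minimal Python sketch, assuming a hypothetical second query table `videos_by_actor` next to `actors_by_video` (both modeled as dictionaries, `add_casting` is illustrative): the same casting fact is written to both.

```python
from __future__ import annotations

from collections import defaultdict

# One logical fact ("actor X plays character Y in video Z") stored in two
# query tables. videos_by_actor is a hypothetical companion table.
actors_by_video: defaultdict[str, set[tuple[str, str]]] = defaultdict(set)
videos_by_actor: defaultdict[str, set[tuple[str, str]]] = defaultdict(set)


def add_casting(video_id: str, actor_name: str, character_name: str) -> None:
    """'Join on write': write the fact once per table that needs to serve it."""
    actors_by_video[video_id].add((actor_name, character_name))
    videos_by_actor[actor_name].add((video_id, character_name))


add_casting("v1", "Alice", "Hero")
add_casting("v1", "Bob", "Villain")
add_casting("v2", "Alice", "Mentor")

# Each read is now a single-partition lookup; no join at read time.
print(sorted(videos_by_actor["Alice"]))  # [('v1', 'Hero'), ('v2', 'Mentor')]
```

In CQL, such multi-table writes are commonly grouped in a logged `BATCH`; a materialized view lets Cassandra maintain the duplicate table instead.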


III. Chebotko Methodology

From « A Big Data Modeling Methodology for Apache Cassandra »

Application workflow

Query workflow
Query list


Chebotko Diagram

actors_by_video

video_id        uuid  K
actor_name      text  C↑
character_name  text  C↑

CREATE TABLE actors_by_video ( video_id uuid, actor_name text, character_name text, PRIMARY KEY ((video_id), actor_name, character_name));
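The diagram-to-CQL step is mechanical enough to script. Here is a small Python sketch, assuming each column is written as `(name, type, marker)` using only the `K`, `C↑` and `C↓` markers shown in the diagram; `chebotko_to_cql` is a hypothetical helper, not part of KDM or any Cassandra tool. For `actors_by_video` it reproduces the statement above, up to whitespace.

```python
from __future__ import annotations


def chebotko_to_cql(table: str, columns: list[tuple[str, str, str]]) -> str:
    """Render a Chebotko logical table as CQL.

    Each column is (name, cql_type, marker) with marker "K" (partition key),
    "C↑" / "C↓" (clustering, ascending / descending) or "" (regular column).
    Other diagram markers (e.g. static columns) are not handled in this sketch.
    """
    partition = [name for name, _, marker in columns if marker == "K"]
    clustering = [(name, marker) for name, _, marker in columns if marker in ("C↑", "C↓")]
    if not partition:
        raise ValueError("a table needs at least one K column")

    column_defs = ", ".join(f"{name} {cql_type}" for name, cql_type, _ in columns)
    primary_key = "(" + ", ".join(partition) + ")"
    if clustering:
        primary_key += ", " + ", ".join(name for name, _ in clustering)
    cql = f"CREATE TABLE {table} ({column_defs}, PRIMARY KEY ({primary_key}))"

    # Ascending is the default clustering order, so only emit the clause when needed.
    if any(marker == "C↓" for _, marker in clustering):
        order = ", ".join(f"{name} {'DESC' if marker == 'C↓' else 'ASC'}"
                          for name, marker in clustering)
        cql += f" WITH CLUSTERING ORDER BY ({order})"
    return cql + ";"


print(chebotko_to_cql("actors_by_video", [
    ("video_id", "uuid", "K"),
    ("actor_name", "text", "C↑"),
    ("character_name", "text", "C↑"),
]))
```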


Chebotko mapping rules

MR 1: Entities & relationships

MR 2: Equality search attributes (=)

MR 3: Inequality search attributes (<, >)

MR 4: Ordering attributes (↑, ↓)

MR 5: Key attributes, uniqueness
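Rules MR 2 to MR 5 can be read as a recipe for ordering primary-key columns. The Python sketch below is a simplified interpretation, not the paper's exact algorithm: `derive_primary_key` is a hypothetical helper, MR 1 (choosing which entity or relationship the table is about) is left to the modeler, and every equality attribute is assumed to go into the partition key.

```python
from __future__ import annotations


def derive_primary_key(
    equality: list[str],
    inequality: list[str],
    ordering: list[tuple[str, str]],
    key_attributes: list[str],
) -> tuple[list[str], list[tuple[str, str]]]:
    """Simplified reading of MR 2-MR 5 for one query.

    Assumption: every equality attribute goes into the partition key (the
    methodology also allows some of them to become clustering columns).
    Returns (partition key columns, [(clustering column, "ASC" | "DESC")]).
    """
    partition = list(equality)                      # MR 2: equality search
    used = set(partition)
    directions = dict(ordering)                     # MR 4: requested sort order
    clustering: list[tuple[str, str]] = []

    def add(name: str) -> None:
        if name not in used:
            clustering.append((name, directions.get(name, "ASC")))
            used.add(name)

    for name in inequality:                         # MR 3: range predicates
        add(name)
    for name, _ in ordering:                        # MR 4: ordering attributes
        add(name)
    for name in key_attributes:                     # MR 5: guarantee row uniqueness
        add(name)
    return partition, clustering


# "Readings of a sensor between two timestamps, newest first"
print(derive_primary_key(["sensor_id"], ["ts"], [("ts", "DESC")], ["sensor_id", "ts"]))
# (['sensor_id'], [('ts', 'DESC')])
```

Applied to "actors of a video, sorted by name" with key attributes `video_id`, `actor_name`, `character_name`, the same function yields the `actors_by_video` key: partition `video_id`, clustering `actor_name`, `character_name`.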


Internet of Things demo

Kashlev Data Modeler

IV. Time handling
- Tombstones
- TTL
- UPSERTs

Tombstones

Eventual consistency

No instant deletes

Deletes are writes

SSTables are immutable files

Writes are spread across many files
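Because SSTables are immutable, a DELETE cannot erase anything in place: it writes a tombstone into a newer file, and reads must merge every file that holds the key until compaction combines them. A toy Python model of that merge (the `TOMBSTONE` sentinel and `read` function are hypothetical, and real cell reconciliation has more rules, for example on equal timestamps):

```python
from __future__ import annotations

from types import MappingProxyType
from typing import Mapping, Optional, Sequence

TOMBSTONE = object()  # what a DELETE writes: a marker with a timestamp, not an absence


def read(key: str, sstables: Sequence[Mapping[str, tuple[int, object]]]) -> Optional[object]:
    """Merge one key across immutable SSTables (toy model).

    Each SSTable maps key -> (write timestamp, value or TOMBSTONE). The newest
    cell wins, so a later tombstone shadows an older value in another file.
    On equal timestamps this sketch keeps the first cell it saw.
    """
    newest: Optional[tuple[int, object]] = None
    for sstable in sstables:
        cell = sstable.get(key)
        if cell is not None and (newest is None or cell[0] > newest[0]):
            newest = cell
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]


# Files are never modified in place; a DELETE lands in a newer file.
sstable_1 = MappingProxyType({"sensor-1": (100, 21.5)})
sstable_2 = MappingProxyType({"sensor-1": (200, TOMBSTONE)})

print(read("sensor-1", [sstable_1]))             # 21.5
print(read("sensor-1", [sstable_1, sstable_2]))  # None: the tombstone is newer
```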

Goal: avoid reading too many* tombstones


* See tombstone_warn_threshold and tombstone_failure_threshold in cassandra.yaml
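The two thresholds act on how many tombstones a single read scans. This Python sketch mimics that behavior with hypothetical names (`scan_partition`, `TooManyTombstones`); the constants are the cassandra.yaml defaults, while the warning text is made up and is not Cassandra's log format.

```python
from __future__ import annotations

import warnings

# Defaults shipped in cassandra.yaml; this sketch only mimics their effect.
TOMBSTONE_WARN_THRESHOLD = 1_000
TOMBSTONE_FAILURE_THRESHOLD = 100_000


class TooManyTombstones(Exception):
    """Toy counterpart of the read failure Cassandra reports past the failure threshold."""


def scan_partition(cells, limit, warn_at=TOMBSTONE_WARN_THRESHOLD,
                   fail_at=TOMBSTONE_FAILURE_THRESHOLD):
    """Read up to `limit` live cells in clustering order; None stands for a tombstone.

    Every tombstone met before `limit` live cells are found has to be scanned.
    """
    live, tombstones = [], 0
    for cell in cells:
        if cell is None:
            tombstones += 1
            if tombstones > fail_at:
                raise TooManyTombstones(f"scanned more than {fail_at} tombstones")
        else:
            live.append(cell)
            if len(live) == limit:
                break
    if tombstones > warn_at:
        warnings.warn(f"read {len(live)} live cells and {tombstones} tombstones")
    return live, tombstones


# 5 deleted cells sit in front of the live ones; tiny thresholds for the demo.
live, dead = scan_partition([None] * 5 + [1, 2, 3], limit=2, warn_at=3, fail_at=10)
print(live, dead)  # [1, 2] 5  (plus a warning, since 5 > 3)
```

Deleting from the head of a partition and then reading from the head is the classic way to hit these limits.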


TTLs

Data must be designed to be TTL'ed

tombstones

Why?

What do we add?

TIME dimension
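One common way to add the time dimension is to put a time bucket into the partition key, so that data written with a TTL expires partition by partition and queries on recent data never touch the expired buckets. A Python sketch, assuming hypothetical daily buckets per sensor and a 7-day TTL (in CQL a TTL is set per write with `USING TTL`, or per table with `default_time_to_live`); `partition_key`, `partitions_for_range` and `is_expired` are illustrative names.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Hypothetical retention period; in CQL: INSERT ... USING TTL 604800 (7 days in seconds).
TTL = timedelta(days=7)


def partition_key(sensor_id: str, ts: datetime) -> tuple[str, str]:
    """Put time in the partition key: one partition per sensor per day."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))


def partitions_for_range(sensor_id: str, start: datetime, end: datetime) -> list[tuple[str, str]]:
    """Daily partitions a [start, end] query touches; older buckets are never read."""
    keys = []
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while day <= end:
        keys.append(partition_key(sensor_id, day))
        day += timedelta(days=1)
    return keys


def is_expired(written_at: datetime, now: datetime, ttl: timedelta = TTL) -> bool:
    """In this sketch, a cell written with a TTL is gone once the TTL has elapsed."""
    return now >= written_at + ttl


utc = timezone.utc
print(partitions_for_range("s1", datetime(2016, 5, 29, 22, tzinfo=utc),
                           datetime(2016, 5, 31, 1, tzinfo=utc)))
# [('s1', '2016-05-29'), ('s1', '2016-05-30'), ('s1', '2016-05-31')]
```

Since every cell in a daily bucket has roughly the same age, a whole old bucket expires together, and a query on the last few days never scans its tombstones.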


UPSERTs

Same INSERT over and over again?

UPSERTs hide this behavior

What if… one day you want to add time?
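A CQL INSERT does not check whether the row already exists, so repeating it simply overwrites the same primary key and the history is silently lost. A Python sketch with hypothetical tables (`upsert`, `status_by_sensor` and `status_history` are illustrative names) shows the difference once time, here a `ts` clustering column, becomes part of the key:

```python
from __future__ import annotations


def upsert(table: dict, primary_key: tuple, values: dict) -> None:
    """CQL INSERT semantics in miniature: no existence check, and the given
    columns of an existing row with the same primary key are overwritten."""
    table.setdefault(primary_key, {}).update(values)


# Without time: PRIMARY KEY ((sensor_id)) keeps only the latest status.
status_by_sensor: dict = {}
# With time: PRIMARY KEY ((sensor_id), ts) keeps one row per write (hypothetical schema).
status_history: dict = {}

for ts, status in [(1, "ok"), (2, "ok"), (3, "down")]:
    upsert(status_by_sensor, ("s1",), {"status": status})
    upsert(status_history, ("s1", ts), {"status": status})

print(status_by_sensor)      # {('s1',): {'status': 'down'}}
print(len(status_history))   # 3
```

Changing the key later means a new table and a data migration, which is why the question is worth asking up front.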

Questions?

Resources
● « A Big Data Modeling Methodology for Apache Cassandra », Artem Chebotko, Andrey Kashlev & Shiyong Lu, www.cs.wayne.edu/andrey/papers/TR-BIGDATA-05-2015-CKL.pdf
● KDM (Kashlev Data Modeler), Andrey Kashlev, kdm.dataview.org