Wayne State University & DataStax: World's best data modeling tool for Apache Cassandra
World’s Best Data Modeling Tool
for Apache Cassandra
1 © 2015. All Rights Reserved.
Artem Chebotko
Andrey Kashlev
1 Cassandra Data Modeling Methodology
2 The KDM Tool
3 Live Demo: IoT
4 Live Demo: Media Cataloguing
5 Future Work
Data Modeling Process
• Data requirements
• Application requirements
• Schema Design
• Optimization
Cassandra Data Modeling Methodology
Conceptual Data Model + Application Workflow → (Mapping) → Logical Data Model → (Optimization) → Physical Data Model
Methodology Models
Model                        Representation
Conceptual Data Model        ERD
Application Workflow Model   Graph
Logical Data Model           Chebotko Diagram
Physical Data Model          Chebotko Diagram, CQL
Methodology Protocols
• Conceptual-to-logical mapping
– Mapping rules
– Mapping patterns
• Physical optimizations
– Partition size analysis
– Duplication factor analysis
– Keys, aggregation, transactions, …
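Partition size analysis can be sketched with the value-count estimate commonly taught alongside this methodology, N_v = N_r × (N_c − N_pk − N_s) + N_s. The slides do not spell the formula out, so treat this as an assumption; the function name and the row count below are hypothetical.

```python
def partition_values(rows: int, columns: int,
                     primary_key_columns: int, static_columns: int = 0) -> int:
    """Estimate how many values (cells) one partition stores.

    Assumed formula: N_v = N_r * (N_c - N_pk - N_s) + N_s
      rows                 N_r  rows in the partition
      columns              N_c  all columns of the table
      primary_key_columns  N_pk partition key + clustering columns
      static_columns       N_s  static columns (stored once per partition)
    """
    regular_columns = columns - primary_key_columns - static_columns
    if regular_columns < 0:
        raise ValueError("key and static columns exceed the column count")
    return rows * regular_columns + static_columns


# Hypothetical table: 5 columns, 4 of them in the primary key, no statics,
# and one row per second for a day in each partition.
print(partition_values(rows=86_400, columns=5, primary_key_columns=4))
```

An estimate like this is only a first check; a partition that grows without bound usually calls for an extra partition key column, which is one of the physical optimizations listed above.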
Example
SELECT timestamp, value FROM …
WHERE location = ? AND parameter = ? AND timestamp > ?
ORDER BY timestamp DESC
Conceptual data model: Sensor (1) records (n) Measurement; attributes: id, location, parameter, timestamp, value
Mapping steps:
1 Mapping Entity and Relationship Types
2 Mapping Equality Search Attributes (location K, parameter K)
3 Mapping Inequality Search Attributes (timestamp C↑)
4 Mapping Ordering Attributes (timestamp C↓)
5 Mapping Key Attributes (id C↑)
Resulting table (K = partition key column, C↑/C↓ = clustering column in ascending/descending order):
sensor_data
location K
parameter K
timestamp C↓
id C↑
value
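The five mapping steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not KDM's actual algorithm: the `AccessPattern` class, the `map_to_table` function, and the choice of `id` as the key attribute are hypothetical names and inputs chosen to reproduce the sensor_data table from the example.

```python
from dataclasses import dataclass


@dataclass
class AccessPattern:
    """A query described by the attribute roles the mapping steps inspect."""
    equality: list[str]        # attributes compared with =
    inequality: list[str]      # attributes compared with >, <, ...
    ordering: dict[str, str]   # attribute -> "ASC" or "DESC"
    key: list[str]             # key attributes that keep rows unique
    selected: list[str]        # attributes the query returns


def map_to_table(pattern: AccessPattern) -> list[tuple[str, str]]:
    """Return (column, role) pairs; role is 'K', 'C↑', 'C↓' or '' (regular)."""
    roles: dict[str, str] = {}
    # Step 1 (entity and relationship types) picks the table itself;
    # here the caller has already chosen the attributes involved.
    # Step 2: equality search attributes become partition key columns.
    for name in pattern.equality:
        roles[name] = "K"
    # Step 3: inequality search attributes become clustering columns.
    for name in pattern.inequality:
        roles.setdefault(name, "C↑")
    # Step 4: ordering attributes fix the clustering order.
    for name, direction in pattern.ordering.items():
        if roles.get(name) != "K":
            roles[name] = "C↓" if direction == "DESC" else "C↑"
    # Step 5: key attributes join the primary key to guarantee uniqueness.
    for name in pattern.key:
        roles.setdefault(name, "C↑")
    # Everything else the query returns is a regular column.
    for name in pattern.selected:
        roles.setdefault(name, "")
    return list(roles.items())


pattern = AccessPattern(
    equality=["location", "parameter"],
    inequality=["timestamp"],
    ordering={"timestamp": "DESC"},
    key=["id"],
    selected=["timestamp", "value"],
)
for column, role in map_to_table(pattern):
    print(column, role)
```

Applying the steps in this fixed order is what makes the result predictable: equality attributes always come first in the key, and uniqueness is only patched in at the end.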
Methodology Pros and Cons
• Pros: correctness, completeness
• Cons: complexity, time investment
Human Errors Happen …
Automation
Complexity
Time investment
Human Error
The KDM Tool
• Streamlines the methodology
• Guides the user
• Automates data modeling tasks:
– Conceptual-to-logical mapping
– Physical optimization
– CQL generation
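As an illustration of the CQL generation task, the sketch below renders a CREATE TABLE statement from a table description. It is not KDM's generator: `generate_cql` is a hypothetical helper, and the column types are assumptions, since the slides show the sensor_data table without types.

```python
def generate_cql(table: str, columns: dict[str, str],
                 partition_key: list[str],
                 clustering: list[tuple[str, str]]) -> str:
    """Render a CREATE TABLE statement.

    columns:    column name -> CQL type
    clustering: (column name, "ASC" or "DESC") in clustering order
    """
    lines = [f"  {name} {cql_type}," for name, cql_type in columns.items()]
    key_parts = ["(" + ", ".join(partition_key) + ")"]
    key_parts += [name for name, _ in clustering]
    lines.append(f"  PRIMARY KEY ({', '.join(key_parts)})")
    cql = f"CREATE TABLE {table} (\n" + "\n".join(lines) + "\n)"
    if clustering:
        order = ", ".join(f"{name} {direction}" for name, direction in clustering)
        cql += f" WITH CLUSTERING ORDER BY ({order})"
    return cql + ";"


schema = generate_cql(
    table="sensor_data",
    # Types are assumed for illustration only.
    columns={
        "location": "text",
        "parameter": "text",
        "timestamp": "timestamp",
        "id": "uuid",
        "value": "double",
    },
    partition_key=["location", "parameter"],
    clustering=[("timestamp", "DESC"), ("id", "ASC")],
)
print(schema)
```

Keeping the key structure as data and rendering CQL as a final step mirrors the separation the tool draws between the logical model and the physical schema.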
KDM Automation Workflow
• Step 1 (Solution architect): Design Conceptual Data Model
• Step 2 (Solution architect): Specify Access Patterns
• Automated (KDM): Generate Logical Data Models
• Step 3 (Solution architect): Select Logical Data Model
• Automated (KDM): Generate Physical Data Model
• Step 4 (Solution architect): Configure Physical Data Model
• Automated (KDM): Generate Physical Schema
• Step 5 (Solution architect): Download CQL Script
Live Demo: IoT
Live Demo: Media Cataloguing
Summary
• KDM:
– automates the most complex tasks
– eliminates human error
– simplifies data modeling
– guides the user
– is a general-purpose tool
How Can KDM Help You?
• build new data models
• verify existing data models
• teach/learn data modeling
Future Work
• Materialized views
• User Defined Types
• Analysis and physical optimization
• Support for application workflow design
• Support for Chebotko Diagrams
Sign up for KDM – it’s FREE!
• KDM: kdm.dataview.org
• Methodology: academy.datastax.com
• Planet Cassandra blog posts:
– KDM: An Automated Data Modeling Tool for Apache
Cassandra, Pt. 1, Pt. 2
• Artem Chebotko, Andrey Kashlev, Shiyong Lu,
“A Big Data Modeling Methodology for Apache Cassandra”,
IEEE International Congress on Big Data, 2015.
Acknowledgements
• Andrey Kashlev would like to thank:
– Dr. Shiyong Lu
– Anthony Piazza
• Artem Chebotko would like to thank:
– Anthony Piazza
– Patrick McFadin
– Jonathan Ellis
– Tim Berglund
Thank you