NoSQL Database- cassandra column Base DB

NoSQL, Part 2: CAP Theorem & Column Oriented. Mohammad Sadegh Salehi, Dr. Baraani. Winter 2015, Sheikh Bahaie University

Transcript of NoSQL Database- cassandra column Base DB

Page 1: NoSQL Database- cassandra column Base DB

NoSQL, Part 2: CAP Theorem & Column Oriented

Mohammad Sadegh Salehi

Dr. Baraani

Winter 2015, Sheikh Bahaie University

Page 2: NoSQL Database- cassandra column Base DB

NoSQL (part 2) - CAP Theorem & Column Oriented

Winter 2015

Agenda

- Review NoSQL
- Dynamo and BigTable
- NoSQL Classification
- Key-value Stores
- Column Oriented
- Cassandra
- Why Cassandra
- Questions

Page 3: NoSQL Database- cassandra column Base DB

What is NoSQL? (Review)

• Stands for Not Only SQL
• Class of non-relational data storage systems
• Usually do not require a fixed table schema, nor do they use the concept of joins
• All NoSQL offerings relax one or more of the ACID properties (more on this under the CAP theorem)

Page 4: NoSQL Database- cassandra column Base DB

Dynamo and BigTable

Three major papers were the seeds of the NoSQL movement:

• BigTable (Google)
• Dynamo (Amazon)
  - Gossip protocol (discovery and error detection)
  - Distributed key-value data store
  - Eventual consistency
• CAP Theorem (discussed shortly)

Page 5: NoSQL Database- cassandra column Base DB

Page 6: NoSQL Database- cassandra column Base DB

What Kinds of NoSQL? (Review)

NoSQL solutions fall into two major areas:

• Key/Value, or "the big hash table"
  - Amazon Dynamo
  - Voldemort
  - Scalaris
• Schema-less, which comes in multiple flavors: column-based, document-based or graph-based
  - Cassandra (column-based)
  - CouchDB (document-based)
  - Neo4J (graph-based)
  - HBase (column-based)

Page 7: NoSQL Database- cassandra column Base DB

Key-Value Stores

Extremely simple interface
• Data model: (key, value) pairs
• Operations:
  - Insert(key, value)
  - Fetch(key)
  - Update(key)
  - Delete(key)

Implementation: efficiency, scalability, fault-tolerance
• Records distributed to nodes based on key
• Replication
• Single-record transactions, "eventual consistency"
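The interface and the "records distributed to nodes based on key" idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `KeyValueCluster`, the use of MD5 for placement, the "next node holds the replica" rule, and an `update` that takes the new value are all hypothetical choices, not taken from the slides or from any particular product.

```python
import hashlib


class KeyValueCluster:
    """Toy key-value store: records are placed on nodes by hashing the key."""

    def __init__(self, node_count=3, replicas=2):
        # Each "node" is just an in-memory dict in this sketch.
        self.nodes = [dict() for _ in range(node_count)]
        self.replicas = min(replicas, node_count)

    def _owners(self, key):
        # Hash the key to pick a first node, then use the following nodes as replicas.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        first = int(digest, 16) % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def insert(self, key, value):
        # Write the record to every node that owns the key.
        for i in self._owners(key):
            self.nodes[i][key] = value

    def update(self, key, value):
        # Assumption: update takes the new value and fails for unknown keys.
        if self.fetch(key) is None:
            raise KeyError(key)
        self.insert(key, value)

    def fetch(self, key):
        # Read from the first owning node that has the key.
        for i in self._owners(key):
            if key in self.nodes[i]:
                return self.nodes[i][key]
        return None

    def delete(self, key):
        for i in self._owners(key):
            self.nodes[i].pop(key, None)


store = KeyValueCluster()
store.insert("user:42", {"language": "en"})
store.update("user:42", {"language": "fa"})
print(store.fetch("user:42"))
```

Every operation touches a single record identified by its key, which is why such stores can spread data across nodes so easily and why multi-record transactions are usually out of scope.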

Page 8: NoSQL Database- cassandra column Base DB

Key-Value Data Stores: Suitable Use Cases

• Storing session information
• User profiles, preferences: almost every user has a unique userID as well as preferences such as language, color, timezone, which products the user has access to, and so on.

Page 9: NoSQL Database- cassandra column Base DB

Key-Value Data Stores: Shopping Cart Data

Because shopping carts should be available all the time, across browsers, machines, and sessions, all the shopping information can be stored in the value, with the userID as the key.
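A rough sketch of that shopping-cart pattern, assuming a plain Python dict stands in for the key-value store and the cart is serialized as JSON. The names `cart_store`, `save_cart`, and `load_cart`, and the item fields, are made up for illustration.

```python
import json

# A plain dict stands in for the key-value store in this sketch.
cart_store = {}


def save_cart(user_id, items):
    # The whole cart is serialized into one value stored under the user's key.
    cart_store[user_id] = json.dumps(items)


def load_cart(user_id):
    # A missing key simply means the user has an empty cart.
    raw = cart_store.get(user_id)
    return json.loads(raw) if raw is not None else []


save_cart("user-1001", [{"sku": "book-17", "qty": 2}])
cart = load_cart("user-1001")
cart.append({"sku": "pen-03", "qty": 1})
save_cart("user-1001", cart)
```

Because the cart is a single opaque value, any browser or machine that knows the userID can fetch it with one lookup; the trade-off is that the store cannot query inside the cart.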

Page 10: NoSQL Database- cassandra column Base DB

Key-Value Data Stores: When Not to Use

• Relationships among data
• Multi-operation transactions
• Query by data
• Operations by sets

Page 11: NoSQL Database- cassandra column Base DB

Column-oriented: Intro

• Store data in column order
• Allow key-value pairs to be stored (and retrieved on key) in a massively parallel system
  - Data model: families of attributes defined in a schema; new attributes can be added
  - Storing principle: big hashed distributed tables
  - Properties: partitioning (horizontally and/or vertically), high availability, etc., completely transparent to the application
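To make "store data in column order" concrete, here is a small, generic Python illustration of pivoting row-ordered records into per-attribute columns. The sample records and the helper `to_columns` are invented for this sketch, and it assumes every record has the same attributes; it does not describe how any specific system lays out data on disk.

```python
# Row order: each record keeps all of its attributes together.
rows = [
    {"id": 1, "name": "Ada", "state": "TX"},
    {"id": 2, "name": "Bob", "state": "CA"},
    {"id": 3, "name": "Cyd", "state": "TX"},
]


def to_columns(records):
    """Pivot row-ordered records into one list of values per attribute."""
    columns = {}
    for record in records:
        for attr, value in record.items():
            columns.setdefault(attr, []).append(value)
    return columns


# Column order: all values of one attribute are stored together.
columns = to_columns(rows)

# A query on one attribute scans only that column, not whole records.
tx_count = columns["state"].count("TX")
```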

Page 12: NoSQL Database- cassandra column Base DB

Page 13: NoSQL Database- cassandra column Base DB

Cassandra

Apache Cassandra™ is a free, distributed, high-performance, extremely scalable, fault-tolerant (i.e. no single point of failure) post-relational database solution.

Cassandra can serve both as a real-time datastore and as a read-intensive database.

Thrift compiles to: C++, Java, PHP, Ruby, Erlang, Perl, ...

Page 14: NoSQL Database- cassandra column Base DB

Cassandra: Infographic

Page 15: NoSQL Database- cassandra column Base DB

Cassandra

• Originally developed at Facebook
• Follows the BigTable data model: column-oriented
• Uses the Dynamo eventual consistency model
• Written in Java
• Open-sourced and exists within the Apache family
• Uses Apache Thrift as its API
• Some of its myriad users: [logos in original slide]

Page 16: NoSQL Database- cassandra column Base DB

Cassandra: Data Model

• keyspace: usually the name of the application, e.g. 'Twitter', 'Wordpress'
• column family: structure containing an unlimited number of rows
  - Simple
  - Super (nested column families)
• column: a tuple with name, value and timestamp
• key: name of record
• super column: contains more columns
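The nesting described above (keyspace, column family, row key, column) can be modeled loosely with nested dictionaries. This is a sketch, not Cassandra's actual storage format: the `Users` column family, the `write_column` helper, and the rule that the newest timestamp wins are assumptions added for illustration.

```python
import time
from collections import namedtuple

# A column is a (name, value, timestamp) tuple, as on the slide.
Column = namedtuple("Column", ["name", "value", "timestamp"])

# keyspace -> column family -> row key -> column name -> Column
keyspace = {"Users": {}}


def write_column(column_family, row_key, name, value, timestamp=None):
    ts = timestamp if timestamp is not None else time.time()
    row = keyspace[column_family].setdefault(row_key, {})
    current = row.get(name)
    # Assumed rule: the write with the newest timestamp wins.
    if current is None or ts >= current.timestamp:
        row[name] = Column(name, value, ts)


write_column("Users", "alice", "state", "TX", timestamp=100)
write_column("Users", "alice", "state", "CA", timestamp=90)  # older, ignored
write_column("Users", "alice", "email", "alice@example.com", timestamp=101)
```

Rows in the same column family do not need the same set of columns, which is what gives the model its flexible schema.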

Page 17: NoSQL Database- cassandra column Base DB

Cassandra – Data Model

[Diagram: a keyspace (with settings) contains column families (each with settings), which contain columns (name, value, timestamp).]

Page 18: NoSQL Database- cassandra column Base DB

Cassandra: Column Family & Super Column Family

Page 19: NoSQL Database- cassandra column Base DB

Cassandra: Architecture Overview

Cassandra was designed with the understanding that system/hardware failures can and do occur.

• Peer-to-peer, distributed system
• All nodes the same
• Data partitioned among all nodes in the cluster
• Custom data replication to ensure fault tolerance
• Read/write-anywhere design
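One common way to partition data among identical peers is a hash ring, where each key is stored on the first node clockwise from its hash and copied to the following nodes. The sketch below illustrates that general idea only; the `Ring` class, the node names, MD5 as the hash, and the replica placement rule are hypothetical and simplified.

```python
import bisect
import hashlib


def token(text):
    # Map any string to a position on the ring.
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, node_names, replication_factor=2):
        self.replication_factor = min(replication_factor, len(node_names))
        # Sort nodes by token so the ring can be searched with bisect.
        self.entries = sorted((token(name), name) for name in node_names)
        self.tokens = [t for t, _ in self.entries]

    def replicas_for(self, key):
        # First node clockwise from the key's token, then its successors.
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.entries)
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas_for("user:42")
```

Any node can run the same calculation, so a client can contact any peer and that peer knows which nodes hold the data. That is one way to read the "read/write-anywhere" bullet.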

Page 20: NoSQL Database- cassandra column Base DB

Cassandra: Architecture Overview

• Nodes communicate with each other through the Gossip protocol, which exchanges information across the cluster every second.
• A commit log is used on each node to capture write activity, so data durability is assured.
• Data is also written to an in-memory structure (memtable) and then flushed to disk (as an SSTable) once the memory structure is full.
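The write path in the last two bullets (commit log, then memtable, then a flush to an SSTable) can be sketched as a toy in-memory model. The `Node` class, the `memtable_limit` parameter, and the choice to clear the commit log after a flush are assumptions made for this illustration; real SSTables are files on disk, here they are just sorted lists.

```python
class Node:
    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory structure for recent writes
        self.sstables = []     # immutable, sorted tables standing in for disk files
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # 1. Log first so the write can be replayed after a crash.
        self.commit_log.append((key, value))
        # 2. Apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted, immutable table.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # Assumption: flushed writes no longer need the commit log.
        self.commit_log.clear()

    def read(self, key):
        # Recent writes live in the memtable; otherwise the newest SSTable wins.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


node = Node(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)  # memtable is full, so this triggers a flush
node.write("a", 3)  # newer value stays in the memtable for now
```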

Page 21: NoSQL Database- cassandra column Base DB

Why Cassandra?

• Gigabyte to petabyte scalability
• Linear performance gains through adding nodes
• No single point of failure
• Easy replication / data distribution
• Multi-data center and cloud capable
• No need for separate caching layer
• Tunable data consistency
• Flexible schema design
• Data compression
• CQL language (like SQL)
• Support for key languages and platforms
• No need for special hardware or software
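As one way to read "tunable data consistency": in replicated stores of this kind, a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The helper names below are hypothetical and the slides do not spell out this rule; it is included only as a hedged illustration of the trade-off.

```python
def quorum(replication_factor):
    # A majority of the replicas.
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    # The read set and the write set must share at least one replica.
    return read_replicas + write_replicas > replication_factor


rf = 3
strong = reads_see_latest_write(rf, quorum(rf), quorum(rf))  # quorum writes and reads
weak = reads_see_latest_write(rf, 1, 1)                      # fastest, eventually consistent
```

Lower read and write counts favor latency and availability; higher counts favor consistency.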

Page 22: NoSQL Database- cassandra column Base DB

Why Cassandra? Big Data Scalability

• Capable of comfortably scaling to petabytes
• New nodes = linear performance increases
• Add new nodes online

[Figure: Double Throughput Capabilities]

Page 23: NoSQL Database- cassandra column Base DB

Why Cassandra? No Single Point of Failure

• All nodes the same
• Customized replication affords tunable data redundancy
• Read/write from any node
• Can replicate data among different physical data center racks

Page 24: NoSQL Database- cassandra column Base DB

Why Cassandra? No Need for Caching Software

• Peer-to-peer architecture removes the need for a special caching layer and the programming that goes with it
• The database cluster uses the memory from all participating nodes to cache the data assigned to each node
• No inconsistencies arise between a memory cache and the database

[Figure: Application Servers, Memcached Servers, Database Server; Writes and Reads]

Page 25: NoSQL Database- cassandra column Base DB

Why Cassandra? Data Compression

• Uses Google's Snappy data compression algorithm
• Compresses data on a per-column-family level
• Internal tests at DataStax show up to 80%+ compression of raw data
• No performance penalty (and some increases in overall performance due to less physical I/O)

[Figure: Portfolio keyspace, Customer column family]

Page 26: NoSQL Database- cassandra column Base DB

Why Cassandra? CQL Language

• Very similar to RDBMS SQL syntax
• Create objects via DDL (e.g. CREATE…)
• Core DML commands supported: INSERT, UPDATE, DELETE
• Query data with SELECT

SELECT *
FROM USERS
WHERE STATE = 'TX';

Page 27: NoSQL Database- cassandra column Base DB

Comparison with MySQL (> 50 GB data)

            Writes (average)   Reads (average)
MySQL       ~300 ms            ~350 ms
Cassandra   0.12 ms            15 ms

Stats provided by the authors, using Facebook data.

Page 28: NoSQL Database- cassandra column Base DB

Cassandra Tools

[Video demo in original presentation: noSqlCassandra-sadegh.mp4]

Page 29: NoSQL Database- cassandra column Base DB

Where to get Cassandra?

• Go to www.datastax.com
• DataStax makes free smart start installers available for Cassandra that include:
  - The most up-to-date Cassandra version that is production quality
  - A version of DataStax OpsCenter, which is a visual, browser-based management tool for managing and monitoring Cassandra
  - Drivers and connectors for popular development languages
  - Sample database and application
  - Automatic configuration assistance for ensuring optimal performance and setup for either stand-alone or cluster implementations
  - Getting Started Guide

Page 30: NoSQL Database- cassandra column Base DB

Where Can I Learn More?

www.datastax.com

• Free online documentation
• User/customer case studies
• Technical white papers
• Software downloads
• Technical articles
• User forums
• Videos
• Tutorials
• FAQs
• Blogs

Page 31: NoSQL Database- cassandra column Base DB

Resources: Sites

• Cassandra
  - http://cassandra.apache.org
• NoSQL news websites
  - http://nosql.mypopescu.com
  - http://www.nosqldatabases.com
• "A Practical Guide to NoSQL", posted by Denise Miura on March 17, 2011
  - http://blogs.marklogic.com/2011/03/17/a-practical-guide-to-nosql/

Page 32: NoSQL Database- cassandra column Base DB

Resources: Books

• "Cassandra: The Definitive Guide", O'Reilly Media, Nov. 2013
• "Cassandra Essential Tutorial", DataStax, 2014
• "Professional NoSQL", Wrox, 2011
• "NoSQL Distilled", Martin Fowler, 2013

Page 33: NoSQL Database- cassandra column Base DB

Questions

Page 34: NoSQL Database- cassandra column Base DB

Mohammad Sadegh Salehi

Thank You