
Apache Cassandra tour (lab exercise)

COSC430—Advanced Databases David Eyers


Learning objectives

• You should be able to:
  • understand the architecture of Cassandra and its replication strategies
  • explain how a distributed database works, using Cassandra as an example
  • understand the installation and configuration of Cassandra
  • understand why Cassandra can provide high availability with no single point of failure

• There is no assessment for this lab

COSC430 Apache Cassandra lab exercise, 2020


What is Apache Cassandra?

• Apache Cassandra is a free and open-source distributed NoSQL DBMS designed to handle vast amounts of data across large clusters of commodity servers, providing high availability with no single point of failure


Cassandra uses a peer-to-peer architecture

Elements in Cassandra:
• Cluster
• Data centre(s)
• Rack(s)
• Server(s)
• Node(s)

• All nodes play the same role; there is no master node
• Uses a gossip protocol for communication between nodes
• Cassandra Query Language (CQL), which has many similarities to SQL
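The gossip bullet can be made concrete with a small simulation. This is a simplified sketch, not Cassandra's actual gossip implementation: it assumes each node keeps a hypothetical map from node name to a heartbeat counter, and that in every round each node bumps its own heartbeat, picks one random peer, and both sides keep the highest heartbeat seen for each node (a push-pull exchange). Membership information spreads without any central coordinator.

```python
from __future__ import annotations

import random


def merge(mine: dict[str, int], theirs: dict[str, int]) -> dict[str, int]:
    """Combine two views of the cluster, keeping the highest heartbeat seen for each node."""
    merged = dict(mine)
    for node, heartbeat in theirs.items():
        if heartbeat > merged.get(node, -1):
            merged[node] = heartbeat
    return merged


def gossip_round(views: dict[str, dict[str, int]], rng: random.Random) -> None:
    """One round: each node bumps its own heartbeat, then exchanges state with one random peer."""
    nodes = list(views)
    for node in nodes:
        views[node][node] += 1  # "I am still alive"
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        combined = merge(views[node], views[peer])
        # Push-pull: both sides end up with everything either of them knew.
        views[node] = dict(combined)
        views[peer] = dict(combined)


def converged(views: dict[str, dict[str, int]]) -> bool:
    """True once every node has heard about every other node."""
    everyone = set(views)
    return all(set(view) == everyone for view in views.values())


if __name__ == "__main__":
    rng = random.Random(42)
    # Initially each node knows only about itself.
    views = {name: {name: 0} for name in ["n1", "n2", "n3", "n4", "n5"]}
    rounds = 0
    while not converged(views):
        gossip_round(views, rng)
        rounds += 1
    print(f"full membership known everywhere after {rounds} round(s)")
```

Because each exchange only ever raises heartbeat values, views can be merged in any order and still agree; a node whose heartbeat stops increasing is the signal that a real failure detector would act on.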


Data replication

• Nodes are logically structured in a ring topology

• Each data item is replicated at N nodes, where N is the replication factor
• Two replication strategies:
  • SimpleStrategy
    • use only for a single data centre and one rack
    • the first replica goes to the node that owns the data item's position on the ring; further replicas are placed on the next nodes clockwise, without considering topology (i.e., rack or data centre location)
  • NetworkTopologyStrategy
    • the cluster can be deployed across multiple data centres, with a replication factor set per data centre
    • attempts to place replicas on distinct racks, because nodes in the same rack (or similar physical grouping) often fail at the same time
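The two placement rules can be sketched over a token ring. This is a minimal illustration under stated assumptions, not Cassandra's code: the node names, tokens and rack labels are hypothetical, a data item's ring position is given directly as a precomputed token (Cassandra derives it by hashing the partition key), and `rack_aware` is a simplified single-data-centre version of the NetworkTopologyStrategy idea.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    token: int  # the node's position on the ring
    rack: str


def walk_ring(ring: list[Node], key_token: int) -> list[Node]:
    """Every node in clockwise order, starting at the first node whose token >= key_token."""
    ordered = sorted(ring, key=lambda n: n.token)
    tokens = [n.token for n in ordered]
    start = bisect_left(tokens, key_token) % len(ordered)  # wrap around past the largest token
    return ordered[start:] + ordered[:start]


def simple_strategy(ring: list[Node], key_token: int, rf: int) -> list[str]:
    """SimpleStrategy idea: the next rf nodes clockwise, ignoring racks."""
    return [n.name for n in walk_ring(ring, key_token)[:rf]]


def rack_aware(ring: list[Node], key_token: int, rf: int) -> list[str]:
    """Simplified NetworkTopologyStrategy idea for one data centre: walk clockwise,
    preferring nodes on racks that do not yet hold a replica."""
    chosen: list[Node] = []
    skipped: list[Node] = []
    used_racks: set[str] = set()
    for node in walk_ring(ring, key_token):
        if len(chosen) == rf:
            break
        if node.rack in used_racks:
            skipped.append(node)  # same rack as an existing replica; keep as a fallback
        else:
            chosen.append(node)
            used_racks.add(node.rack)
    # Fewer racks than rf: top up with the skipped nodes, in ring order.
    for node in skipped:
        if len(chosen) == rf:
            break
        chosen.append(node)
    return [n.name for n in chosen]


if __name__ == "__main__":
    ring = [
        Node("A", 0, "rack1"),
        Node("B", 25, "rack1"),
        Node("C", 50, "rack2"),
        Node("D", 75, "rack2"),
    ]
    # A data item whose token is 30 lands on C first.
    print(simple_strategy(ring, 30, 2))  # ['C', 'D']: both replicas on rack2
    print(rack_aware(ring, 30, 2))       # ['C', 'A']: one replica per rack
```

With SimpleStrategy both copies end up on rack2, so a single rack failure loses the item; the rack-aware walk skips D and uses A instead, which is exactly the motivation given for NetworkTopologyStrategy.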


Apache Cassandra’s data model

COSC430 Lecture 6: Apache Cassandra Tour

Data Model

• Keyspace (with its settings)
  • Column family (with its settings)
    • Column: name, value, timestamp
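The keyspace, column family and column hierarchy can be pictured as nested maps. The sketch below is an in-memory illustration only: the class and method names are hypothetical, the replication settings are just an illustrative dictionary, and it models a single property of Cassandra columns, namely that each value carries a write timestamp and, when two writes to the same column conflict, the one with the later timestamp wins (last-write-wins).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Column:
    name: str
    value: Any
    timestamp: int  # write time supplied by the writer (illustrative units)


@dataclass
class ColumnFamily:
    name: str
    settings: dict[str, Any] = field(default_factory=dict)
    rows: dict[str, dict[str, Column]] = field(default_factory=dict)  # row key -> column name -> Column

    def write(self, row_key: str, column: Column) -> None:
        """Store a column unless a write with a later timestamp is already present."""
        row = self.rows.setdefault(row_key, {})
        current = row.get(column.name)
        # Last write wins. On an exact timestamp tie this sketch keeps the existing
        # value; it does not model Cassandra's own tie-breaking rule.
        if current is None or column.timestamp > current.timestamp:
            row[column.name] = column

    def read(self, row_key: str, column_name: str) -> Optional[Any]:
        column = self.rows.get(row_key, {}).get(column_name)
        return None if column is None else column.value


@dataclass
class Keyspace:
    name: str
    settings: dict[str, Any] = field(default_factory=dict)  # e.g. replication options
    column_families: dict[str, ColumnFamily] = field(default_factory=dict)

    def column_family(self, name: str) -> ColumnFamily:
        """Return the named column family, creating it on first use."""
        if name not in self.column_families:
            self.column_families[name] = ColumnFamily(name)
        return self.column_families[name]


if __name__ == "__main__":
    ks = Keyspace("lab", settings={"replication": {"class": "SimpleStrategy", "replication_factor": 1}})
    users = ks.column_family("users")
    users.write("alice", Column("email", "alice@old.example.com", timestamp=100))
    users.write("alice", Column("email", "alice@new.example.com", timestamp=200))
    users.write("alice", Column("email", "stale@example.com", timestamp=150))  # older write arriving late
    print(users.read("alice", "email"))  # alice@new.example.com
```

Keeping the timestamp inside each column is what lets replicas that received writes in different orders settle on the same value.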


Virtualisation: abstracting over resources

Before virtualisation:

• Single OS per machine

• Software and hardware tightly coupled

• Underutilised resources

• Inflexible and costly infrastructure

After virtualisation:

• OS and applications are independent of the hardware

• Virtual machines can be provisioned to any system

• OS and application can be managed as a single unit by encapsulating them into virtual machines


Virtualisation
• Separation of a resource from the underlying hardware
• An abstraction layer on top of the hardware

[Figure: "Virtualization", from a VMware white paper. Before virtualisation, one application and operating system run directly on x86 hardware (CPU, memory, NIC, disk); after virtualisation, several application and operating system pairs run on a VMware virtualization layer above the same hardware.]



Docker and Vagrant

• Docker
  • Provides OS-level virtualisation, also known as containerisation
  • Packages an application and its dependencies in a virtual container that can be installed and run on any Linux server
  • Lightweight: a single server or virtual machine can run a large number of containers simultaneously

• Vagrant
  • An open-source software platform for managing virtual software development environments
  • Vagrant sits as a layer on top of virtualisation software


Apache Cassandra lab exercise

• You can view a formatted version of the Markdown file containing the instructions at the following URL: https://altitude.otago.ac.nz/cosc430/cassandra-intro/-/blob/master/README.md

• In the past, a PDF version of the instructions was provided; however, some people’s PDF viewers copied commands with extra spaces, so I have removed the PDF version
