Toronto jaspersoft meetup
©2012 DataStax
Move. Faster.
Toronto Jaspersoft User Group
Patrick McFadin, Principal Solution Architect
@PatrickMcFadin
About Me/Moi?
• Principal Solution Architect at DataStax, THE Cassandra company
• Cassandra user since 0.7
• Prior
- Chief Architect at Hobsons
- Started a software services company. Link-11
• Follow me here: @PatrickMcFadin
Who is DataStax?
• We employ most of the Cassandra committers
• 24/7 support
• Consulting
• DataStax Enterprise
And beer!
And cupcakes! (??)
Our Solution
DataStax Enterprise allows you to focus on your Big Data applications instead of battling your underlying infrastructure:
•Velocity
•Volume
•Variety
•Complexity
•Distribution
DataStax Enterprise also includes…
•Log4j application log integration
•A single graphical management tool
•World-class support
Cassandra as real-time foundation
•Continuous availability
•Extreme scale
•Multi-datacenter support
•Cloud enablement
•Operational simplicity
Hadoop in the same system:
•Batch analytics
•Reduced data movement, fewer ETL operations
•No complex architectures
•Integrated Mahout, Sqoop, Hive, Pig, etc.
And we integrate Solr:
•Enterprise search
•Always indexed data
•Scalable performance
•Mission-critical dependability
Can we just talk about Cassandra
... and aliens.
Roots
Dynamo
BigTable
Core concepts Shared Nothing
Core concepts Replicated
Core concepts WAN Replication
Core concepts Scaling
• Need more write throughput? - add nodes
• Need more read throughput? - add nodes
• Cassandra scales in a linear fashion
• Massive number of ops/sec
Core concepts Scaling
Source: "Solving big data challenges for enterprise application performance management," Proceedings of the VLDB Endowment, Volume 5, Issue 12, August 2012, Pages 1724-1735
Core concepts CAP Theorem
Consistency - Eventual, but Cassandra will not lose your data.
Partition - Nodes can't see each other but cluster is still up
Availability - Max uptime for clients
Cassandra lives here
...and sometimes lives here
It's your choice!
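The "your choice" part is usually exercised through per-request consistency levels. A minimal sketch, assuming the standard quorum rule (a read is guaranteed to overlap the latest acknowledged write when read replicas + write replicas > replication factor). The function names are hypothetical helpers, not a Cassandra API:

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas in a QUORUM: a strict majority."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


rf = 3  # assumed replication factor for the example

# ONE/ONE favors availability and latency; a read may briefly see stale data.
print("ONE/ONE overlaps:", overlaps(1, 1, rf))

# QUORUM/QUORUM gives up some availability so reads see the latest write.
print("QUORUM/QUORUM overlaps:", overlaps(quorum(rf), quorum(rf), rf))
```

Lower levels keep clients served while nodes are down; higher levels push toward consistency at the cost of needing more replicas to answer.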
Core concepts Availability
Continuous Availability > High Availability
Your infrastructure will fail ...deal with it.
Data Model Basics
Data Model Basics Cluster
Cluster - Multiple Nodes acting together. Even over WAN.
Keyspace - Logical collection of Column Families. Stores replication strategy.
Column Family (Table) - Stores rows of data
Data Model Basics Rows
• Unique in column family
• Hashed
• Randomly assigned to node*
• Indexed for speed
*You pick the partitioner. Please pick random. Please. Please. Please
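A rough sketch of what "hashed" and "randomly assigned to node" mean in practice. It assumes an MD5-based token in the spirit of the random partitioner and a hypothetical four-node ring with evenly spaced tokens. The node names and helper functions are illustrative only:

```python
import bisect
import hashlib

TOKEN_SPACE = 2 ** 127  # assumed token range for this sketch


def token(row_key: str) -> int:
    """Hash a row key to a token; similar keys land far apart on the ring."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def owner(row_key: str, ring: list) -> str:
    """Return the node holding the first token >= the key's token, wrapping around."""
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token(row_key)) % len(ring)
    return ring[index][1]


# Hypothetical ring: each node owns one evenly spaced token.
nodes = ["node1", "node2", "node3", "node4"]
ring = sorted((i * TOKEN_SPACE // len(nodes), name) for i, name in enumerate(nodes))

for key in ["ctodd", "pmcfadin", "jdoe"]:
    print(key, "->", owner(key, ring))
```

Because placement depends only on the hash, load spreads across nodes regardless of how the keys themselves are distributed.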
Data Model Basics Columns
• Assigned to a row
• Column Name: 64k ByteArray
• Column Value: 2G ByteArray (!!)
• Timestamp of when set
• Optional: Expire TTL
• Dynamic
[Diagram: a Row holding columns, each with Column Name, Column Value, Timestamp, TTL ...]
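The column structure can be sketched as a small data type. This is a simplified model, not Cassandra's implementation: it assumes last-write-wins on the timestamp and, as a simplification, measures TTL expiry from the column's own timestamp. The `Column`, `reconcile`, and `put` names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    name: bytes
    value: bytes
    timestamp: int              # microseconds, supplied by the writer
    ttl: Optional[int] = None   # seconds; None means the column never expires

    def is_live(self, now_micros: int) -> bool:
        """A column with a TTL stops being readable once the TTL has elapsed."""
        if self.ttl is None:
            return True
        return now_micros < self.timestamp + self.ttl * 1_000_000


def reconcile(a: Column, b: Column) -> Column:
    """Last write wins: keep whichever column carries the higher timestamp."""
    return a if a.timestamp >= b.timestamp else b


def put(row: dict, column: Column) -> None:
    """Rows are dynamic: any column name can be added to any row at any time."""
    existing = row.get(column.name)
    row[column.name] = column if existing is None else reconcile(existing, column)


row = {}
put(row, Column(b"email", b"old@example.com", timestamp=1_000_000))
put(row, Column(b"email", b"new@example.com", timestamp=2_000_000, ttl=10))
print(row[b"email"].value)
```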
Data Model Basics Wide Rows
• How wide? 2 Billion columns!!!
• No schema needed
• Row key, many columns
• Add columns as needed per row
Data Model Basics Data Access
Thrift
• Cassandra's client API is built entirely on top of Thrift
• Provides for manipulation of Data Model and Data
• Almost all current clients implement this API
CQL
• Cassandra Query Language
• New binary driver as of 1.2
• Extends functionality beyond Thrift
Data Model Basics Data Access
More about CQL
• Rapidly evolving spec
- Version 1 since Cassandra 0.8
- Version 2 since Cassandra 1.0
- Version 3 since Cassandra 1.1
- Final cut in 1.2
• Offers more features than Thrift
• DataStax Drivers
Data Model Basics Fixed schema
CREATE TABLE users (
  username varchar,
  firstname varchar,
  lastname varchar,
  email varchar,
  password varchar,
  created_date timestamp,
  PRIMARY KEY (username)
);
• Similar to a RDBMS table. Fairly fixed columns
• This example: Row key = username and is unique
• Use secondary indexes on firstname and lastname for lookup
• Adding columns with Cassandra is super easy (no downtime)
CREATE INDEX user_firstname ON users (firstname);
CREATE INDEX user_lastname ON users (lastname);
Data Model Basics One-to-many
CREATE TABLE comments (
  videoid uuid,
  username varchar,
  comment_ts timestamp,
  comment varchar,
  PRIMARY KEY (videoid, username, comment_ts)
);
• Videos have many comments
• Comments have many users
• Order is by username, then comment_ts (reversible if needed)
• Use getSlice() to pull some or all of the comments
Wide row
Time ordered
Data Model Basics One-to-many pt2
• Underlying storage model is still wide rows
• CQL presents as a table
• username and comment_ts are filterable
SELECT comment FROM comments
WHERE username = 'ctodd' AND comment_ts > '2012-07-12 10:30:00';
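To make "still wide rows underneath" concrete, here is a small in-memory sketch. It assumes one storage row per videoid, with cells sorted by the composite name (username, comment_ts), and it reads from a single videoid. The dictionary and function names are hypothetical stand-ins, not driver calls:

```python
import bisect

# One wide row per videoid; each cell is ((username, comment_ts), comment),
# kept sorted by the composite name.
wide_rows = {}


def insert_comment(videoid, username, comment_ts, comment):
    cells = wide_rows.setdefault(videoid, [])
    bisect.insort(cells, ((username, comment_ts), comment))


def comments_after(videoid, username, after_ts):
    """Slice one wide row: comments by `username` strictly later than `after_ts`."""
    cells = wide_rows.get(videoid, [])
    names = [name for name, _ in cells]
    start = bisect.bisect_right(names, (username, after_ts))
    result = []
    for (user, _ts), comment in cells[start:]:
        if user != username:
            break  # sorted order means this user's cells are contiguous
        result.append(comment)
    return result


insert_comment("video-1", "ctodd", "2012-07-12 10:45:00", "Nice demo")
insert_comment("video-1", "ctodd", "2012-07-12 09:00:00", "First!")
insert_comment("video-1", "pmcfadin", "2012-07-12 11:00:00", "Thanks")

print(comments_after("video-1", "ctodd", "2012-07-12 10:30:00"))
```

The filter becomes a contiguous slice of sorted cells, which is why this kind of query stays cheap on a wide row.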
Data Model Basics Query Tables
• No joins in Cassandra
• Filtering and scans can be expensive
• Tag is unique regardless of video
• Great for “List videos with X tag”
• Tags have to be updated in Video and Tag at the same time
• Index integrity is maintained in app logic
CREATE TABLE tag_index (
  tag varchar,
  videoid varchar,
  timestamp timestamp,
  PRIMARY KEY (tag, videoid)
);
Powerful performance tool!
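"Index integrity is maintained in app logic" means the application writes to both places itself. A minimal in-memory sketch of that bookkeeping, where the two dictionaries stand in for a video table and the tag_index table; all names here are hypothetical:

```python
videos = {}     # videoid -> {"tags": set of tags}
tag_index = {}  # tag -> set of videoids (the query table)


def set_tags(videoid, new_tags):
    """Update the video's tags and keep the tag index in step with them."""
    new_tags = set(new_tags)
    old_tags = videos.get(videoid, {}).get("tags", set())

    for tag in old_tags - new_tags:      # tags removed from the video
        tag_index[tag].discard(videoid)
    for tag in new_tags - old_tags:      # tags added to the video
        tag_index.setdefault(tag, set()).add(videoid)

    videos.setdefault(videoid, {})["tags"] = new_tags


def videos_with_tag(tag):
    """The 'List videos with X tag' lookup: a single read of one index row."""
    return sorted(tag_index.get(tag, set()))


set_tags("video-1", ["cassandra", "demo"])
set_tags("video-2", ["cassandra"])
set_tags("video-1", ["cassandra"])  # "demo" is dropped from video-1

print(videos_with_tag("cassandra"))
print(videos_with_tag("demo"))
```

In a real cluster both writes would go out together, for example in one batch, so that the table and its index do not drift apart.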
Data Model Basics Loading data
> 1 Million rows
• BI Tools - Talend, Pentaho, JasperSoft
• Custom code - My personal favorite
• sstable loader - Only for specific file types
sstableloader -d 10.0.0.100 /home/pmcfadin/dbfiles
Requires files to be in sstable format
Data Model Basics Loading data
< 1 Million rows
• Everything that worked for 1 Million +
• CQL copy command
• Loads a delimited file into a table
COPY customers(Card_ID, Registration_Date, Gender, Birth_Date)
  FROM 'Customers_File.txt'
  WITH HEADER=true AND DELIMITER=',';
Cassandra 1.2 Data Access
•Collections (maps, sets, lists)
•Support for virtual nodes (vnodes)
•Query Profiler
•Atomic batches
•Enhanced JBOD support
•Native binary CQL transport (no Thrift)
•Parallel leveled compactions
•Off-heap bloom filters
Collections
•Structure to column values
•Insert and update
•Map
•List
•Set
cqlsh> CREATE TABLE users (
         user_id text PRIMARY KEY,
         first_name text,
         last_name text,
         emails set<text>
       );
http://www.datastax.com/dev/blog/cql3_collections
Request tracing
•Automatically stored for 24h
•Full path trace
•Includes node info
cqlsh> tracing on;
Now tracing requests.
cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

 activity                            | timestamp    | source    | source_elapsed
-------------------------------------+--------------+-----------+----------------
 execute_cql3_query                  | 00:02:37,015 | 127.0.0.1 |              0
 Parsing statement                   | 00:02:37,015 | 127.0.0.1 |             81
 Preparing statement                 | 00:02:37,015 | 127.0.0.1 |            273
 Determining replicas for mutation   | 00:02:37,015 | 127.0.0.1 |            540
 Sending message to /127.0.0.2       | 00:02:37,015 | 127.0.0.1 |            779
 Messsage received from /127.0.0.1   | 00:02:37,016 | 127.0.0.2 |             63
 Applying mutation                   | 00:02:37,016 | 127.0.0.2 |            220
 Acquiring switchLock                | 00:02:37,016 | 127.0.0.2 |            250
 Appending to commitlog              | 00:02:37,016 | 127.0.0.2 |            277
 Adding to memtable                  | 00:02:37,016 | 127.0.0.2 |            378
 Enqueuing response to /127.0.0.1    | 00:02:37,016 | 127.0.0.2 |            710
 Sending message to /127.0.0.1       | 00:02:37,016 | 127.0.0.2 |            888
 Messsage received from /127.0.0.2   | 00:02:37,017 | 127.0.0.1 |           2334
 Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 |           2550
 Request complete                    | 00:02:37,017 | 127.0.0.1 |           2581
http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
Virtual Nodes (vnodes)
•Many nodes per JVM
•Tokens are auto-assigned (!!!)
•Faster...
✓repair
✓bootstrap
✓decommission
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
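A rough sketch of the vnode idea: instead of one hand-picked token, each physical node takes many randomly chosen tokens, and ownership of the ring is summed across them. The 256 tokens per node, the seed, and the node names are assumptions for illustration, and the helpers are not Cassandra code:

```python
import random

TOKEN_SPACE = 2 ** 127  # assumed token range for this sketch


def assign_tokens(nodes, tokens_per_node, seed=42):
    """Auto-assign random tokens: no manual token calculation per node."""
    rng = random.Random(seed)
    ring = [(rng.randrange(TOKEN_SPACE), node)
            for node in nodes
            for _ in range(tokens_per_node)]
    return sorted(ring)


def ownership(ring):
    """Fraction of the token space owned by each physical node."""
    owned = {}
    previous = ring[-1][0] - TOKEN_SPACE  # wrap around from the last token
    for tok, node in ring:
        owned[node] = owned.get(node, 0) + (tok - previous)
        previous = tok
    return {node: size / TOKEN_SPACE for node, size in owned.items()}


nodes = ["node1", "node2", "node3"]
for tokens_per_node in (1, 256):
    shares = ownership(assign_tokens(nodes, tokens_per_node))
    print(tokens_per_node, {n: round(s, 3) for n, s in sorted(shares.items())})
```

With many small ranges per node, a joining or leaving node exchanges data with many peers at once, which is the intuition behind the faster repair, bootstrap, and decommission listed above.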
Data Model Basics Data Access
DEMO