Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data
Uploaded by yahoo-developer-network (Category: Technology)
sqrrl data, INC.
Secure. Scale. Adapt.
[email protected] | @sqrrl_inc | 617.520.4375 sqrrl data, INC., All Rights Reserved
Adam Fuchs, Chief Technology Officer
Who We Are
sqrrl data, Inc. is the commercial provider of:
- Mature Database Technology: Apache Accumulo
- Fine-Grained Access Controls: Data Integration and Sharing
- Proven Performance: Petabytes and Beyond
- Advanced Analytics: Search, Statistics, and Graphs
Contents
Core Philosophy
Technology
Techniques
Application APIs
Apache Accumulo Perspective
Integration across:
- Multiple business lines
- Multiple data sets
- Multiple applications
- Multiple security, privacy, legal, policy, regulatory, and compliance constraints
- New demands
[Figure: several applications sharing several data sets]
Accumulo Design Drivers
1. Cell-Level Security: Express common security requirements in the infrastructure, not just in the application. A data-centric approach encourages secure sharing.
2. Scalability: Near-linear performance improvements at thousands of nodes. Durable and reliable under the increased failures that come with scale.
3. Diverse, Interactive Analytics: The sorted key/value core performs well in a diverse set of domains: information retrieval, statistics, graph analysis, geo indexing, and more.
4. Flexible, Adaptive Schema: Start with universal structures and indexing, then refine the schema over time.
Accumulo Key Structure
An Accumulo key is a 5-tuple, consisting of:
- Row: controls atomicity
- Column Family: controls locality
- Column Qualifier: controls uniqueness
- Visibility Label: controls access
- Timestamp: controls versioning
Accumulo Key/Value Example
Row       Col. Fam.     Col. Qual.     Visibility   Timestamp  Value
John Doe  Notes         PCP            PCP_JD       20120912   Patient suffers from an acute …
John Doe  Test Results  Cholesterol    JD|PCP_JD    20120912   183
John Doe  Test Results  Mental Health  JD|PSYCH_JD  20120801   Pass
John Doe  Test Results  X-Ray          JD|PHYS_JD   20120513   1010110110100…
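To make the key order concrete, here is a minimal Python sketch (not Accumulo's Java `Key` class) that models the five key parts as a tuple and sorts the example rows. It assumes, as Accumulo does, that row, family, qualifier, and visibility sort ascending while timestamps sort descending, so the newest version of a cell is read first. Real Accumulo compares raw bytes, which matches Python string order for these ASCII values.

```python
from typing import NamedTuple

class Key(NamedTuple):
    """The five parts of an Accumulo key, modeled as plain Python values."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int

def sort_key(k: Key):
    # Ascending on row, family, qualifier, visibility; descending on timestamp,
    # so the newest version of a cell sorts first.
    return (k.row, k.family, k.qualifier, k.visibility, -k.timestamp)

entries = [
    (Key("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513), "1010110110100…"),
    (Key("John Doe", "Notes", "PCP", "PCP_JD", 20120912), "Patient suffers from an acute …"),
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912), "183"),
    (Key("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801), "Pass"),
]

for key, value in sorted(entries, key=lambda kv: sort_key(kv[0])):
    print(key.family, "/", key.qualifier, "->", value)
```

Because every cell for "John Doe" shares a row, all of them land in the same tablet, which is what lets the row control atomicity.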
Visibility Syntax & Semantics
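The labels in the key/value example (such as `JD|PCP_JD`) follow Accumulo's boolean visibility syntax: authorization terms combined with `&` (and) and `|` (or), grouped with parentheses. The sketch below is a hypothetical, simplified evaluator, not Accumulo's `ColumnVisibility` implementation. It omits quoted terms and treats an empty label as visible to everyone. It does mirror one rule of the real syntax: mixing `&` and `|` at the same level requires parentheses.

```python
import re

# Simplified term syntax (assumption): letters, digits, and _ - . : /
TOKEN = re.compile(r"\s*([A-Za-z0-9_.:/-]+|[&|()])")

def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos} in {expr!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def can_see(expr, auths):
    """True if the set of authorizations satisfies the visibility expression."""
    if not expr.strip():
        return True  # an empty label is visible to every user
    tokens = tokenize(expr)
    value, pos = _expr(tokens, 0, auths)
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r} in {expr!r}")
    return value

def _expr(tokens, pos, auths):
    # expr := term (op term)*, where every op at one nesting level must match.
    value, pos = _term(tokens, pos, auths)
    op = None
    while pos < len(tokens) and tokens[pos] in ("&", "|"):
        if op is not None and tokens[pos] != op:
            raise ValueError("mixing & and | needs parentheses")
        op = tokens[pos]
        rhs, pos = _term(tokens, pos + 1, auths)
        value = (value and rhs) if op == "&" else (value or rhs)
    return value, pos

def _term(tokens, pos, auths):
    # term := "(" expr ")" | authorization name
    if pos >= len(tokens):
        raise ValueError("expression ends too early")
    tok = tokens[pos]
    if tok == "(":
        value, pos = _expr(tokens, pos + 1, auths)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("missing closing parenthesis")
        return value, pos + 1
    if tok in ("&", "|", ")"):
        raise ValueError(f"unexpected {tok!r}")
    return tok in auths, pos + 1

# A user holding only the PCP_JD authorization:
print(can_see("JD|PCP_JD", {"PCP_JD"}))
print(can_see("JD|PSYCH_JD", {"PCP_JD"}))
```

With these labels, the primary care physician sees the cholesterol result but not the mental health result, and the security decision lives in the data rather than in each application.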
Tablets
- Collections of key/value pairs form tables.
- Tables are partitioned into tablets.
- Metadata tablets hold information about other tablets, forming a three-level hierarchy.
- A tablet is a unit of work for a tablet server.
[Figure: the three-level tablet hierarchy. The root tablet (-∞ to ∞), whose location is kept at a well-known location in ZooKeeper, points to Metadata Tablet 1 (-∞ to “Encyclopedia:Ocelot”) and Metadata Tablet 2 (“Encyclopedia:Ocelot” to ∞). These in turn point to the data tablets of three tables: Adam’s Table (-∞ : thing, thing : ∞), Encyclopedia (-∞ : Ocelot, Ocelot : Yak, Yak : ∞), and Foo (-∞ to ∞).]
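Because each tablet covers a contiguous row range identified by its end row, finding the tablet for a row is a binary search over sorted end rows, and the same kind of search applies at each level of the hierarchy (root, then metadata, then data). The sketch below shows a single level using the Encyclopedia table's split points. The `TabletIndex` class is hypothetical: it ignores client-side caching and the actual metadata row format.

```python
import bisect
from typing import List, Optional

class TabletIndex:
    """Hypothetical one-level tablet locator built from sorted end rows.

    Tablet i covers rows after end_rows[i - 1] up to and including end_rows[i];
    the last tablet's end row is None, meaning it extends to +infinity.
    """

    def __init__(self, end_rows: List[Optional[str]]):
        if not end_rows or end_rows[-1] is not None:
            raise ValueError("the last tablet must extend to +infinity")
        self.end_rows = end_rows
        self._splits = end_rows[:-1]  # the finite, sorted split points

    def locate(self, row: str) -> int:
        # First end row >= row; end rows are inclusive, so that tablet holds the row.
        return bisect.bisect_left(self._splits, row)

    def describe(self, i: int) -> str:
        start = "-∞" if i == 0 else self.end_rows[i - 1]
        end = "∞" if self.end_rows[i] is None else self.end_rows[i]
        return f"{start} : {end}"

encyclopedia = TabletIndex(["Ocelot", "Yak", None])
for row in ["Aardvark", "Ocelot", "Penguin", "Zebra"]:
    print(row, "->", encyclopedia.describe(encyclopedia.locate(row)))
```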
Accumulo Architecture
[Figure: Accumulo architecture. Applications read from and write to tablet servers, each hosting tablets. The master assigns tablets to tablet servers and balances them. Hadoop (HDFS) stores and replicates the data. A three-node ZooKeeper ensemble is shown, with arrows labeled “Delegate Authority”.]
Tablet Data Flow
[Figure: tablet data flow. Writes go to the write-ahead log (for recovery) and to the in-memory map. A minor compaction passes the in-memory map through an iterator tree and writes it out as a sorted, indexed file; merging (major) compactions combine several such files through an iterator tree. Reads (scans) merge the in-memory map and the files through an iterator tree.]
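The flow above is a log-structured merge design. This toy Python model is illustrative only: its keys carry no timestamps, so "newest source wins" stands in for the timestamp ordering and versioning iterator a real tablet would use, and the write-ahead log is just an in-memory list.

```python
import heapq

class Tablet:
    """Toy tablet: write-ahead log, in-memory map, and immutable sorted files."""

    def __init__(self):
        self.wal = []      # write-ahead log, replayed after a crash
        self.memory = {}   # in-memory map: key -> value
        self.files = []    # sorted files, oldest first; each a sorted list of (key, value)

    def write(self, key, value):
        self.wal.append((key, value))  # log first so the write survives a crash
        self.memory[key] = value

    def minor_compaction(self):
        """Flush the in-memory map to a new sorted file; its log entries are no longer needed."""
        if self.memory:
            self.files.append(sorted(self.memory.items()))
        self.memory = {}
        self.wal = []

    def major_compaction(self):
        """Merge all files into one, keeping only the newest value for each key."""
        if self.files:
            self.files = [list(self._merge(self.files[::-1]))]

    def scan(self):
        """Read the merged view of the in-memory map and all files."""
        return self._merge([sorted(self.memory.items())] + self.files[::-1])

    @staticmethod
    def _merge(sources):
        # Sources are ordered newest first; tag each entry with its source's age
        # so that, for equal keys, heapq.merge yields the newest entry first.
        tagged = [[(k, age, v) for k, v in src] for age, src in enumerate(sources)]
        last = object()  # sentinel that equals no real key
        for k, _age, v in heapq.merge(*tagged):
            if k != last:
                yield k, v
                last = k

t = Tablet()
t.write("apple", 1)
t.write("pear", 2)
t.minor_compaction()
t.write("apple", 3)
print(list(t.scan()))
```

Files are never modified in place; compactions only write new files, which keeps writes sequential and makes the merge at read time a simple k-way merge.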
Hierarchical Decomposition
[Figure: hierarchical decomposition of a customer record. Row: <person>. Column families: attribute, purchases, returns. Column qualifiers: an attribute name (such as age or discount) or an item name (such as hat or sneakers). Values: the attribute value (such as <age>) or the item's <cost>.]
Materialized Table
[Figure: the materialized table, in which each cell is one key/value pair. Row george has column families attribute, purchases, and returns, with age = 27 and entries for hat and sneakers carrying the costs $83 and $42. Row bill has column families attribute and purchases, with age = 49, discount = 40%, and sneakers = $100.]
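Decomposition amounts to flattening nested structure into (row, family, qualifier) keys. A minimal sketch, assuming each record arrives as nested dictionaries; the `decompose` function is hypothetical and the sample values are illustrative, modeled loosely on the `bill` row.

```python
def decompose(records):
    """Flatten {row: {family: {qualifier: value}}} into sorted key/value pairs."""
    pairs = []
    for row, families in records.items():
        for family, qualifiers in families.items():
            for qualifier, value in qualifiers.items():
                pairs.append(((row, family, qualifier), value))
    return sorted(pairs)

# Illustrative record, modeled loosely on the "bill" row above.
customers = {
    "bill": {
        "attribute": {"age": "49", "discount": "40%"},
        "purchases": {"sneakers": "$100"},
    },
}

for key, value in decompose(customers):
    print(key, value)
```

Adding a new attribute or item later just adds another key; no table-wide schema change is needed, which is the "refine the schema over time" design driver in practice.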
Forward and Inverted Index
Table           Row     Column Family     Column Qualifier  Value
Forward Index   <UUID>  <Type>            <Field>           <Term>
Inverted Index  <Term>  <Type> + <Field>  <UUID>            <Digest of Event>
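A hedged sketch of the two tables: for each field of an event, the forward index keys on the event's UUID and the inverted index keys on the term, so looking up a term becomes a scan over a single row. The `:` separator joining type and field in the inverted index's column family is an assumption (the slide only says <Type> + <Field>), as are the function names and sample events.

```python
import bisect

def index_event(uuid, event_type, fields, digest):
    """Build forward and inverted index entries as ((row, family, qualifier), value)."""
    forward, inverted = [], []
    for field, term in fields.items():
        # Forward index: row = UUID, family = type, qualifier = field, value = term.
        forward.append(((uuid, event_type, field), term))
        # Inverted index: row = term, family = type + field (":" is an assumed
        # separator), qualifier = UUID, value = digest of the event.
        inverted.append(((term, f"{event_type}:{field}", uuid), digest))
    return forward, inverted

def lookup(inverted_table, term):
    """Return (uuid, family, digest) for every entry whose row equals term."""
    # The table is sorted, so all entries for one term are contiguous.
    start = bisect.bisect_left(inverted_table, ((term,),))
    hits = []
    for (row, family, uuid), digest in inverted_table[start:]:
        if row != term:
            break
        hits.append((uuid, family, digest))
    return hits

fwd1, inv1 = index_event("e-001", "email", {"from": "alice", "to": "bob"}, "digest-1")
fwd2, inv2 = index_event("e-002", "email", {"from": "bob"}, "digest-2")
inverted_table = sorted(inv1 + inv2)
print(lookup(inverted_table, "bob"))
```

Storing a digest in the inverted index lets a query return a summary of each hit without a second lookup into the forward index.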
Graph Analysis
Table        Row        Column Family  Column Qualifier (Tuples)  Value
Graph Table  <Node ID>  “Node Info”    <Field>                    <Value>
                        “Out Edges”    (<Node ID>, <Edge ID>)     <Edge Info>
                        “In Edges”     (<Node ID>, <Edge ID>)     <Edge Info>
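With this layout, a node's out-edges sit next to each other in sorted order, so expanding a node is one seek plus a short scan, and a breadth-first traversal is a sequence of such scans. The sketch below models the table as a sorted Python list; the function names and sample edges are hypothetical, and the "Node Info" family is left out for brevity.

```python
import bisect
from collections import deque

def build_graph_table(edges):
    """Return sorted ((node, family, (other_node, edge_id)), edge_info) entries."""
    entries = []
    for src, dst, edge_id, info in edges:
        entries.append(((src, "Out Edges", (dst, edge_id)), info))
        entries.append(((dst, "In Edges", (src, edge_id)), info))
    return sorted(entries)

def out_neighbors(table, node):
    # Sorted order keeps a node's "Out Edges" entries contiguous, so one
    # seek plus a short forward scan finds them.
    start = bisect.bisect_left(table, ((node, "Out Edges"),))
    neighbors = []
    for (row, family, (dst, _edge_id)), _info in table[start:]:
        if row != node or family != "Out Edges":
            break
        neighbors.append(dst)
    return neighbors

def bfs(table, start, max_hops):
    """Breadth-first search returning {node: hop count} up to max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue
        for nbr in out_neighbors(table, node):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return seen

edges = [
    ("a", "b", "e1", "knows"),
    ("b", "c", "e2", "knows"),
    ("c", "d", "e3", "knows"),
    ("a", "c", "e4", "emailed"),
]
table = build_graph_table(edges)
print(bfs(table, "a", max_hops=2))
```

Writing each edge twice (once under each endpoint) costs storage but makes both forward and backward traversal a row scan.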
Geospatial Queries
Table      Row        Column Family  Column Qualifier  Value
Geo Index  <GeoHash>  <Event Type>   <UUID>            <Digest of Event>

Latitude:    10110101001
Longitude:   00111010010
Depth:       11010110110
Interleaved: 101001110111010101011100001011100
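The bit strings above show the idea behind the geo row key: each coordinate is repeatedly halved into a binary code, and the codes are interleaved one bit at a time, so points close in space tend to share a key prefix and a region query becomes a small number of range scans. The sketch below reproduces the slide's three-dimensional interleaving (latitude, longitude, depth). The helper names are hypothetical. The standard GeoHash encoding interleaves only longitude and latitude, starting with longitude, and renders the bits in base 32.

```python
def interleave(*dims: str) -> str:
    """Interleave equal-length bit strings, taking one bit from each dimension in turn."""
    if len({len(d) for d in dims}) != 1:
        raise ValueError("all dimensions need the same number of bits")
    return "".join("".join(bits) for bits in zip(*dims))

def quantize(value: float, lo: float, hi: float, bits: int) -> str:
    """Halve [lo, hi) repeatedly, recording which half `value` falls in at each step."""
    out = []
    for _ in range(bits):
        mid = (lo + hi) / 2
        if value >= mid:
            out.append("1")
            lo = mid
        else:
            out.append("0")
            hi = mid
    return "".join(out)

lat, lon, depth = "10110101001", "00111010010", "11010110110"
print(interleave(lat, lon, depth))
```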
Document Partitioning
Table        Row             Column Family  Column Qualifier (Tuples)  Value
Shard Table  <Partition ID>  “Docs”         (<UUID>, <Field>)          <Value>
                             “Inv. Index”   (<Term>, <UUID>)
                             “Field Index”  (<Field:Term>, <UUID>)
                             “Geo”          (<Hash>, <UUID>)
Intersecting Iterator
Example query: ‘foo’ and (‘bar’ or ‘baz’)
[Figure: the query is evaluated within each <Partition ID> row of the shard table, using its “Docs” and “Inv. Index” column families.]
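Within one partition, each term's “Inv. Index” entries form a sorted list of document UUIDs, so `and` becomes a merge that repeatedly seeks the lagging list forward and `or` becomes a sorted union. Accumulo ships an `IntersectingIterator` for the `and` case; the Python below is not that iterator's code, only a sketch of the same merge idea over hypothetical posting lists.

```python
import bisect
import heapq

def union(*posting_lists):
    """OR: merge sorted UUID lists into one sorted list without duplicates."""
    out = []
    for uuid in heapq.merge(*posting_lists):
        if not out or out[-1] != uuid:
            out.append(uuid)
    return out

def intersect(a, b):
    """AND: step through two sorted UUID lists, seeking the one that lags behind."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = bisect.bisect_left(a, b[j], i)  # jump ahead, like an iterator seek
        else:
            j = bisect.bisect_left(b, a[i], j)
    return out

def foo_and_bar_or_baz(postings):
    """Evaluate 'foo' and ('bar' or 'baz') against one partition's inverted index."""
    return intersect(postings.get("foo", []),
                     union(postings.get("bar", []), postings.get("baz", [])))

# Hypothetical per-partition posting lists (term -> sorted document UUIDs).
partitions = {
    "p00": {"foo": ["d1", "d3", "d7"], "bar": ["d3"], "baz": ["d2", "d7"]},
    "p01": {"foo": ["d4"], "bar": ["d5"]},
}
# Partitions are independent, so each tablet server can evaluate its own in parallel.
print({pid: foo_and_bar_or_baz(p) for pid, p in partitions.items()})
```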
acorn
Key/Value pairs are great! How do I construct a document partitioning key again?
- Techniques should be built into an API.
- Give people polyglot access: Lucene, SQL, SPARQL, JAQL, Matlab (not just Key, Value, Range).
Combined IR + Graph Search
Schema-less Stats
Get Involved
http://accumulo.apache.org
Help us make Accumulo even better!
Contact
Adam Fuchs, CTO
sqrrl data, Inc. | 617-520-4375
www.sqrrl.com | @sqrrl_inc