Post on 15-Jan-2015
Security approaches in BigTable-like storage systems
22951 Research Seminar: Information Security and Privacy July 2014
Open University of Israel
Grisha Weintraub
Abstract
• BigTable – Google’s scalable storage system.
• Designed for internal (i.e. trusted) use.
• Open-source implementations (e.g. HBase).
• Can be deployed in a public cloud (i.e. DBaaS).
• However, one may not trust the public cloud provider.
• Our focus is on approaches to making BigTable-like systems secure.
Outline
• BigTable
• Security approaches :
Integrity(iBigTable)
Encryption(BigSecret)
Access Control(Accumulo)
BigTable - Introduction
• Fay Chang et al., Bigtable: A Distributed Storage System for Structured Data, OSDI 2006 (Best Paper)
• Distributed storage system for managing structured data that is designed to scale to a very large size.
BigTable – Data Model
• BigTable is a sparse, distributed, persistent multidimensional sorted map.
• The map is indexed by a row key, column key, and a timestamp.
• (row_key, column_key, time) → string
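BigTable – Data Model
BigTable approach (each row stores only the columns it actually has):
  user_id (row key)   name                    phone    email
  15                  John                    178145
  29                  Bob (t1), Robert (t2)            bob@gmail.com
• A cell is addressed by row_key, column_key, and timestamp:
(29, name, t2) → “Robert”
RDBMS approach (fixed schema, missing values stored as null):
  user_id   name   phone    email
  15        John   178145   null
  29        Bob    null     bob@gmail.com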
BigTable – Data Model
• Columns are grouped into column families:
– A column key has the form family:qualifier, where the qualifier is optional.
BigTable approach (contactInfo is a column family with the qualifiers email and phone; name has no qualifier):
  user_id (row key)   name:   contactInfo:phone   contactInfo:email
  15                  John    17814552            john@yahoo.com
RDBMS approach (two tables):
  user_id   name
  15        John

  user_id   type    value
  15        phone   178145
  15        email   john@yahoo.com
BigTable – Data Model
• Each cell is a key-value pair:
– Key: Row-Key, Column (Family, Qualifier), Timestamp
– Value
• Sorting order:
– Row-Key → Family → Qualifier → Timestamp
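The key structure and sorting order can be illustrated with a small in-memory sketch. It is a toy under stated assumptions, not BigTable's implementation: the names `cells`, `put`, `scan`, and `get_latest` are hypothetical, and timestamps are ordered newest-first, as described in the Bigtable paper.

```python
# Toy model of the data model: a map from
# (row, family, qualifier, timestamp) to a string value.
# All names here are illustrative, not part of any real API.
cells = {}

def put(row, family, qualifier, ts, value):
    cells[(row, family, qualifier, ts)] = value

def sort_key(key):
    row, family, qualifier, ts = key
    # Row, family and qualifier ascend; timestamps descend so that
    # the newest version of a cell comes first.
    return (row, family, qualifier, -ts)

def scan():
    """Return all cells in storage order."""
    return [(k, cells[k]) for k in sorted(cells, key=sort_key)]

def get_latest(row, family, qualifier):
    """Return the value with the highest timestamp, or None."""
    versions = [(k[3], v) for k, v in cells.items()
                if k[:3] == (row, family, qualifier)]
    return max(versions)[1] if versions else None

put("29", "name", "", 1, "Bob")
put("29", "name", "", 2, "Robert")
put("15", "contactInfo", "phone", 1, "178145")
```

Here `get_latest("29", "name", "")` resolves the cell (29, name, t2), and `scan()` lists row 15 before row 29 because row keys sort lexicographically.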
BigTable – Data Model
• Tablets:
– Large tables are broken into tablets at row boundaries.
– Each tablet holds a contiguous range of rows.
– Approximately 100-200 MB of data per tablet.
Figure: a table sorted by id; rows with ids 15000 to 20000 form Tablet 1 and rows with ids 20001 to 25000 form Tablet 2.
BigTable – API
• Metadata operations:
– Creating and deleting tables and column families, modifying access control rights.
• Client operations:
– Write/delete values
– Read values
– Scan row ranges
// Open the table
Table *T = OpenOrDie("/bigtable/users");
// Update name and delete a phone
RowMutation r1(T, "29");
r1.Set("name:", "Robert");
r1.Delete("contactInfo:phone");
Operation op;
Apply(&op, &r1);
BigTable – System Structure
• Three major components:
– Client library
– Master (exactly one):
  • Assigns tablets to tablet servers.
  • Detects the addition and expiration of tablet servers.
  • Balances tablet-server load.
  • Garbage-collects files in GFS.
  • Handles schema changes such as table and column family creations.
– Tablet servers (multiple, dynamically added):
  • Each manages 10-1000 tablets.
  • Handles read and write requests to its tablets.
  • Splits tablets that have grown too large.
BigTable – Tablet Location
• Three-level hierarchy analogous to that of a B+ tree to store tablet location information.
• Client library caches tablet locations.
BigTable – Tablet Serving
• Writes:
– Updates are committed to a commit log.
– Recently committed updates are stored in memory, in the memtable.
– Older updates are stored in a sequence of SSTables.
• Reads:
– A read is executed on a merged view of the sequence of SSTables and the memtable.
– Since the SSTables and the memtable are sorted, the merged view can be formed efficiently.
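The write and read paths can be sketched as follows. This is a simplified illustration, not the tablet server's code: `write`, `merged_view`, and the sample data are hypothetical, the memtable is a plain dict, and each SSTable is a sorted list of (key, value) pairs with the newest table first.

```python
import heapq

commit_log = []   # append-only log (kept in memory here for simplicity)
memtable = {}     # recent writes
sstables = [      # immutable sorted runs, newest first
    [("a", "old-a"), ("c", "c1")],
    [("a", "older-a"), ("b", "b0")],
]

def write(key, value):
    commit_log.append((key, value))   # 1. log the update first
    memtable[key] = value             # 2. then apply it in memory

def merged_view(memtable, sstables):
    """Yield (key, value) in key order; newer sources shadow older ones."""
    sources = [sorted(memtable.items())] + sstables   # newest first
    # Tag entries with the age of their source so that, for equal keys,
    # the entry from the newest source is merged first.
    tagged = [[(k, age, v) for k, v in src] for age, src in enumerate(sources)]
    last = object()
    for k, _age, v in heapq.merge(*tagged):
        if k != last:
            last = k
            yield k, v

write("a", "new-a")
write("d", "d1")
view = list(merged_view(memtable, sstables))
```

Because every source is already sorted, `heapq.merge` produces the merged view in a single pass without re-sorting.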
BigTable - Compactions
• Minor compaction:
– Converts the memtable into an SSTable.
– Reduces memory usage.
– Reduces log reads during recovery.
• Major compaction:
– Merging compaction that results in a single SSTable.
– No deletion records, only live data.
– Good place to apply the policy “keep only N versions”.
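A minimal sketch of the two compaction types, under stated assumptions: SSTables are sorted lists of (key, value) pairs kept newest first, a deletion record is modelled as the value `None`, and the "keep only N versions" policy is not modelled. The function names are hypothetical.

```python
TOMBSTONE = None  # deletion record (an assumption of this sketch)

def minor_compaction(memtable, sstables):
    """Freeze the memtable into a new SSTable and free the memory."""
    sstables.insert(0, sorted(memtable.items()))   # newest table first
    memtable.clear()

def major_compaction(sstables):
    """Merge all SSTables into one that holds only live data."""
    merged = {}
    for table in reversed(sstables):   # oldest first, newer values overwrite
        merged.update(table)
    live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
    sstables[:] = [live]

memtable = {"b": TOMBSTONE, "c": "c2"}
sstables = [[("a", "a1"), ("b", "b1")], [("c", "c1")]]

minor_compaction(memtable, sstables)
tables_after_minor = len(sstables)
major_compaction(sstables)
```

After the major compaction the deleted key `b` and the shadowed value `c1` are gone, and a single SSTable remains.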
Outline
• BigTable √
• Security approaches :
Integrity(iBigTable)
Encryption(BigSecret)
Access Control(Accumulo)
iBigTable - Introduction
• Wei Wei, Ting Yu, Rui Xue: iBigTable: practical data integrity for bigtable in public cloud. CODASPY 2013
• Enhancement of BigTable that provides scalable data integrity assurance.
iBigTable – System Model
Figure: the Data Owner writes to BigTable; Clients read from BigTable.
iBigTable - Goals
• Correctness:
– Returned records have not been modified in any way.
• Completeness:
– No answers have been omitted from the result.
• Freshness:
– Results are based on the most current version of the data.
iBigTable – System Design
• Basic idea:
– Build a Merkle Hash Tree based Authenticated Data Structure for each tablet.
• Verification Object (VO): data returned along with the result and used to authenticate it.
• Example: the VO for Data block 1 is {Hash 0-1, Hash 1}.
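The VO example can be worked through with a four-block binary Merkle tree. This is only an illustration of the idea: iBigTable itself uses a Merkle B+ tree, and the choice of SHA-256, the block contents, and the `verify` helper are assumptions of this sketch.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

blocks = [b"block-1", b"block-2", b"block-3", b"block-4"]

# Leaf hashes (Hash 0-0, Hash 0-1, Hash 1-0, Hash 1-1)
h00, h01, h10, h11 = (h(b) for b in blocks)
h0 = h(h00 + h01)      # Hash 0
h1 = h(h10 + h11)      # Hash 1
root = h(h0 + h1)      # root hash, kept by the verifier

# VO for Data block 1: the sibling hashes on the path to the root.
vo = [(h01, "right"), (h1, "right")]

def verify(block, vo, trusted_root):
    """Recompute the root from the block and its VO and compare."""
    node = h(block)
    for sibling, side in vo:
        node = h(node + sibling) if side == "right" else h(sibling + node)
    return node == trusted_root
```

With the trusted root hash, `verify(blocks[0], vo, root)` succeeds, while any modified block yields a different root and is rejected.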
iBigTable – System Design
Merkle B+ Tree
iBigTable – System Design
Figure (first design): one authenticated structure spans the Root Tablet, the Meta Tablet, and the User Tablets; the Data Owner keeps a single root hash.
• Pros:
– Only one hash to maintain for all data
• Cons:
– Requires update propagation
– Concurrent updates could cause issues
iBigTable – System Design
Figure (second design): the Root Tablet, the Meta Tablet, and each User Tablet has its own root hash; the Data Owner keeps one root hash per tablet.
iBigTable – Reads
Step 1: Client and the tablet server serving the ROOT tablet
1.1 Client: getMetaTabletLocation(table name, row key)
1.2 Server: generate VO
1.3 Server: meta tablet location, VO
1.4 Client: verify
Step 2: Client and the tablet server serving the META tablet
2.1 Client: getUserTabletLocation(table name, row key)
2.2 Server: generate VO
2.3 Server: user tablet location, VO
2.4 Client: verify
Step 3: Client and the tablet server serving the USER tablet
3.1 Client: getRow(row key)
3.2 Server: generate VO
3.3 Server: row data, VO
3.4 Client: verify
iBigTable – Updates
Data Owner and the tablet server serving the USER tablet:
3.1 Data Owner: new/updated row
3.2 Server: generate PT-VO
3.3 Server: PT-VO
3.4 Data Owner: verify and update tablet root hash
Partial Tree Verification Object (PT-VO): unlike a VO, a PT-VO contains keys along with hashes.
iBigTable – Updates
Figure: Initial MB+ row tree of a tablet in a tablet server. The root holds keys 30 and 60, and the leaves hold keys 0, 10, 20, 30, 40, 50, 60, 70, 80, 90.
iBigTable – Updates
Figure: Insert a row with key 45 into the partial tree VO. The partial tree contains the root (keys 30 and 60) and the path to the leaf holding keys 30, 40, 50.
Figure: Partial tree VO after 45 is inserted.
iBigTable – Authenticated Data Structure
• Projected range queries - expensive to generate and verify VOs.
SL-MBT: A single-level Merkle B+ tree
iBigTable – Authenticated Data Structure
TL-MBT: A two-level Merkle B+ tree.
Outline
• BigTable √
• Security approaches :
Integrity(iBigTable) √
Encryption(BigSecret)
Access Control(Accumulo)
BigSecret - Introduction
• Erman Pattuk et al., BigSecret: A Secure Data Management Framework for Key-Value Stores. IEEE CLOUD 2013
• A secure data management framework for BigTable-like storage systems.
BigSecret – System Model
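Figure: Clients → BigSecret → BigTable. BigSecret translates the client call get(“Bob”, “email”) into Get(“A4Vc”, “Zx$23”) on BigTable. BigTable returns “DF77Xs9”, which BigSecret decrypts to “bob@gmail.com” for the client.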
BigSecret – Goals
• Secure storage of data on untrusted servers.
• Efficient query execution on encrypted data.
• Supported queries:
– Put
– Get
– Delete
– Scan
BigSecret – Preliminaries
• Key:
– row||fam||qua||ts
• Symmetric encryption:
– E(p) → c // encryption
– D(c) → p // decryption
• Pseudo-random functions (PRF):
– H(m) → h // deterministic, random-looking output
• Bucketization:
– Partitions p1, p2, … of domain Z.
– Ident: a function that assigns a unique random identifier to each partition.
– Map: a function that takes a partitioned domain and a value v from the domain, and returns Ident(p), where v belongs to p.
BigSecret – Bucketization
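Figure: the domain 0 to 10000 is split at 2000, 4000, 6000, and 8000 into five partitions:
  Partition    0-2000   2000-4000   4000-6000   6000-8000   8000-10000
  Identifier   34       97          123         266         771
Map(100) = 34, Map(6451) = 266
Order-preserving mapping: x < y ⇒ Map(x) ≤ Map(y)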
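The mapping in this example can be sketched with a binary search over the partition boundaries. The split points and identifiers are taken from the slide; the function name `bucket_map` and the treatment of boundary values (a split point belongs to the partition on its right) are assumptions.

```python
import bisect

BOUNDARIES = [2000, 4000, 6000, 8000]    # split points of the domain 0..10000
IDENTIFIERS = [34, 97, 123, 266, 771]    # one identifier per partition

def bucket_map(v):
    """Return the identifier of the partition that contains v."""
    # Assumption: a value equal to a split point falls in the right partition.
    return IDENTIFIERS[bisect.bisect_right(BOUNDARIES, v)]
```

Since the identifiers increase with the partitions, `bucket_map` never reverses the order of two values, but values in the same partition become indistinguishable: 200 and 300 both map to 34.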
BigSecret – Encryption Models
Naive approach – encrypt values only
  Client → BigSecret: Put(row, fam, qua, ts, value)
  BigSecret → BigTable: Put(row, fam, qua, ts, E(value))
  On reads, BigTable returns E(value), and BigSecret returns D(E(value)) to the client.
– All operations are supported.
– Relatively good performance.
– Only minor changes to the system are required.
– Poor privacy.
BigSecret – Encryption Models
Model-1 – bucketization for all key parts
  Client → BigSecret: Put(row, fam, qua, ts, value)
  BigSecret → BigTable: Put(Map(row), Map(fam), Map(qua)||E(key), Map(ts), E(value))
– All operations are supported.
– Relatively bad performance.
– Privacy-performance trade-off.
Scan example:
  Client → BigSecret: Scan(row_from, row_to, fam), e.g. Scan(200, 300, contactInfo)
  BigSecret → BigTable: Scan(Map(row_from), Map(row_to), Map(fam)), e.g. Scan(34, 34, 452)
BigSecret – Encryption Models
Model-2 – PRF for all key parts
  Client → BigSecret: Put(row, fam, qua, ts, value)
  BigSecret → BigTable: Put(H(row), H(fam), H(qua)||E(key), H(ts), E(value))
– Scan is not supported.
– Relatively good performance.
– Frequency-based attacks.
Get example:
  Client → BigSecret: Get(row, fam, qua), e.g. Get(200, contactInfo, email)
  BigSecret → BigTable: Get(H(row), H(fam), H(qua)), e.g. Get(Az54Et, q8dj8, qWd29h)
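The Get translation in Model-2 can be sketched with a keyed hash standing in for the PRF. HMAC-SHA256, the demo key, the 12-character truncation, and the name `translate_get` are all assumptions of this sketch, not choices confirmed by the BigSecret paper; the encrypted part E(key) of the qualifier is not modelled.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-not-for-production"   # hypothetical client-side key

def H(part: str) -> str:
    """Deterministic keyed token for one key part (stand-in PRF)."""
    digest = hmac.new(SECRET_KEY, part.encode(), hashlib.sha256).hexdigest()
    return digest[:12]   # truncated only to keep the example readable

def translate_get(row, fam, qua):
    """Rewrite a client Get into the tokens sent to the server."""
    return (H(row), H(fam), H(qua))

tokens = translate_get("200", "contactInfo", "email")
```

The same input always yields the same token, which is what makes exact-match Get possible. Neighbouring row keys such as 200 and 201 map to unrelated tokens, which is why range scans are not supported.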
BigSecret – Encryption Models
Frequency-based attacks (Damiani et al. 2003)
Plaintext table:
  id   name    city
  19   Alice   Tel-Aviv
  24   Bob     New York
  32   Carol   Paris
  38   Alice   New York
Stored table (deterministic tokens):
  id   name   city
  j    27     $
  a    14     &
  t    23     *
  z    27     &
An attacker who knows the value frequencies can infer 27 = “Alice” and & = “New York”, and hence that Alice lives in NY.
Possible solutions:
• Decreasing the range of the PRFs.
• Model-3
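The inference in this example can be reproduced in a few lines. The sketch assumes the attacker knows how often each plaintext value occurs and simply matches the most frequent token in a column with the most frequent plaintext value; the helper name `most_common` is hypothetical.

```python
from collections import Counter

# Plaintext columns, whose frequencies the attacker is assumed to know
plain_names = ["Alice", "Bob", "Carol", "Alice"]
plain_cities = ["Tel-Aviv", "New York", "Paris", "New York"]

# Token columns as stored on the server
token_names = ["27", "14", "23", "27"]
token_cities = ["$", "&", "*", "&"]

def most_common(values):
    return Counter(values).most_common(1)[0][0]

# Deterministic tokens preserve frequencies, so the top values line up.
guess = {
    most_common(token_names): most_common(plain_names),
    most_common(token_cities): most_common(plain_cities),
}

# The stored row ("27", "&") now reveals a name-city pair.
leaked = [(guess[n], guess[c])
          for n, c in zip(token_names, token_cities)
          if n in guess and c in guess]
```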
BigSecret – Encryption Models
Model-3 – PRF only for row-key
  Client → BigSecret: Put(row, fam, qua, ts, value)
  BigSecret → BigTable: Put(H(row), 0, E(key), 1, E(value))
– Scan is not supported.
– Relatively good privacy.
– Performance?
Get example:
  Client → BigSecret: Get(row, fam, qua), e.g. Get(200, contactInfo, email)
  BigSecret → BigTable: Get(H(row), 0, null), e.g. Get(Az54Et, 0, null)
Outline
• BigTable √
• Security approaches :
Integrity(iBigTable) √
Encryption(BigSecret) √
Access Control(Accumulo)
Accumulo – Introduction
• Adam Fuchs, Apache Accumulo: Extensions to Google's Bigtable Design, 2012, lecture conducted from Morgan State University
• An extension of BigTable that provides cell-level access control.
Accumulo – System Model
Figure: a BigTable table holding Bob's data.
  Row   Family        Qualifier        Value
  14    name                           Bob
  14    contactInfo   email            bob@g.com
  14    healthData    blood test       sodium : 137 …
  14    healthData    doctor’s notes   Patient suffers from .…
  …     …             …                …
Access shown in the figure: Bob – email, blood test; a second user – blood test, notes.
Accumulo – System Model
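Figure: query flow.
1. The client sends its credentials and a query.
2. The user is looked up and the user's authorization set is retrieved.
3. The authorization set and the query are passed to BigTable.
4. BigTable returns the data, which is passed back to the client.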
Accumulo – Data Model
• BigTable key:
– Row-Key, Column (Family, Qualifier), Timestamp → Value
• Accumulo key:
– Row-Key, Column (Family, Qualifier, Visibility), Timestamp → Value
• Visibility holds security labels (e.g. A|(B&C)).
Accumulo – Visibility
• Syntax:
– A&B – both A and B required
– A|B – must have either A or B
– A|(B&C) – must have A or both B and C
• Examples:
– Admin|(Manager & Sales)
– Citizen & Adult
– Secret | Top Secret
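How such an expression is checked against a user's authorization set can be sketched with a small recursive-descent evaluator. This is not Accumulo's implementation: the function names are hypothetical, labels are limited to letters, digits, and underscores, `&` is assumed to bind tighter than `|`, and an empty expression is treated as visible to everyone.

```python
import re

def tokenize(expr):
    # Labels, operators and parentheses; whitespace is ignored.
    return re.findall(r"[A-Za-z0-9_]+|[&|()]", expr)

def is_visible(expr, auths):
    """Return True if the authorization set satisfies the expression."""
    tokens = tokenize(expr)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            right = parse_and()        # always parse, then combine
            result = result or right
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            right = parse_term()
            result = result and right
        return result

    def parse_term():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            pos += 1                   # skip the closing ")"
            return result
        return tok in auths

    if not tokens:
        return True                    # assumption: no label means no restriction
    return parse_or()
```

For example, `is_visible("Admin|(Manager & Sales)", {"Manager", "Sales"})` holds, while a user with only `Manager` is refused.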
Accumulo – Visibility
  Row   Family        Qualifier        Visibility     Value
  14    name                                          Bob
  14    contactInfo   email            bob14          bob@g.com
  14    healthData    blood test       bob14|doctor   sodium : 137 …
  14    healthData    doctor’s notes   doctor         Patient suffers from .…
  …     …             …                …              …
Accumulo – Visibility
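Figure: Bob queries his health data.
1. Bob sends (bob, ***) and a query for health data.
2. The user lookup returns the authorization set {bob14}.
3. {bob14} and the query are passed to BigTable.
4. Only the blood test is returned to Bob:
  Family       Qualifier    Visibility     Returned
  healthData   blood test   bob14|doctor   yes
  healthData   notes        doctor         no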
Accumulo – Iterators
Figure: Iterator
Outline
• BigTable √
• Security approaches :
Integrity(iBigTable) √
Encryption(BigSecret) √
Access Control(Accumulo) √
References
• Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, Robert Gruber: Bigtable: A Distributed Storage System for Structured Data (Awarded Best Paper!). OSDI 2006:205-218
• Wei Wei, Ting Yu, Rui Xue: iBigTable: practical data integrity for bigtable in public cloud. CODASPY 2013:341-352
• Erman Pattuk, Murat Kantarcioglu, Vaibhav Khadilkar, Huseyin Ulusoy, Sharad Mehrotra: BigSecret: A Secure Data Management Framework for Key-Value Stores. IEEE CLOUD 2013:147-154
• http://accumulo.apache.org/