Security approaches in BigTable-like storage systems

Posted on 15-Jan-2015


BigTable is Google's distributed storage system, designed to manage large-scale structured data. It was designed for internal (i.e. trusted) use, so no security considerations were taken into account. Since 2006, following the publication of the paper that describes BigTable's architecture, several open-source BigTable-like systems have been developed (e.g. HBase, Hypertable). One of the primary uses of such systems is cloud storage: a service that gives users access to data without the need to manage hardware or software. However, users may not trust the cloud provider, so appropriate security techniques should be applied. This seminar reviews three different security approaches for BigTable-like systems:
1. iBigTable: an enhancement of BigTable that provides scalable data integrity assurance.
2. BigSecret: a secure data management framework for BigTable-like storage systems.
3. Accumulo: an extension of BigTable that provides cell-level access control.


22951 Research Seminar: Information Security and Privacy, July 2014

Open University of Israel

Grisha Weintraub

Abstract

• BigTable: Google's scalable storage system.
• Designed for internal (i.e. trusted) use.
• Open-source implementations exist (e.g. HBase).
• Can be deployed in a public cloud (i.e. DBaaS).
• However, one may not trust the public cloud provider.
• Our focus is on approaches that make BigTable-like systems secure.

Outline

• BigTable

• Security approaches:
– Integrity (iBigTable)
– Encryption (BigSecret)
– Access Control (Accumulo)

BigTable - Introduction

• Fay Chang et al., Bigtable: A Distributed Storage System for Structured Data, OSDI 2006 (Best Paper Award)

• A distributed storage system for managing structured data, designed to scale to a very large size.

BigTable – Data Model

• BigTable is a sparse, distributed, persistent multidimensional sorted map.

• The map is indexed by a row key, column key, and a timestamp.

• (row_key, column_key, time) → string

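BigTable – Data Model

BigTable approach (each row stores only the columns it actually has):
• Row user_id = 15: phone = 178145, name = John
• Row user_id = 29: email = bob@gmail.com, name = Bob (timestamp t1) and Robert (timestamp t2)

Each cell is addressed by (row_key, column_key, timestamp), e.g. (29, name, t2) → “Robert”

RDBMS approach (fixed schema, missing values stored as null):
email | phone | name | user_id
null | 178145 | John | 15
bob@gmail.com | null | Bob | 29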

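BigTable – Data Model

• Columns are grouped into column families:
– column key = family:optional qualifier

Example row (row key user_id = 15):
contactInfo:email = john@yahoo.com | contactInfo:phone = 178145 | name: = John

In contactInfo:email, contactInfo is the column family and email is the optional qualifier. The column name: has no qualifier.

RDBMS approach (two tables):
name | user_id
John | 15

value | type | user_id
178145 | phone | 15
john@yahoo.com | email | 15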

BigTable – Data Model

• Key: Row-Key, Column (Family, Qualifier), Timestamp → Value

• Sorting order: Row-Key, then Family, then Qualifier, then Timestamp
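The C++17 sketch below shows the data model as an in-memory sorted map, which makes the ordering concrete. The names CellKey, CellKeyLess, Table and ReadAt are hypothetical. Row, family and qualifier are assumed to be strings, and timestamps 64-bit integers. Timestamps sort in decreasing order. This follows the BigTable paper, which stores versions that way so the most recent one is read first.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>

// Cell key from the slide: Row-Key, Column (Family, Qualifier), Timestamp.
struct CellKey {
    std::string row;
    std::string family;
    std::string qualifier;
    std::int64_t timestamp;
};

// Sort by row, family and qualifier ascending, and by timestamp descending,
// so the newest version of a cell comes first.
struct CellKeyLess {
    bool operator()(const CellKey& a, const CellKey& b) const {
        return std::tie(a.row, a.family, a.qualifier, b.timestamp) <
               std::tie(b.row, b.family, b.qualifier, a.timestamp);
    }
};

// The "sparse, sorted map": only cells that exist are stored.
using Table = std::map<CellKey, std::string, CellKeyLess>;

// Returns the newest value of row / family:qualifier with timestamp <= asOf.
std::optional<std::string> ReadAt(const Table& t, const std::string& row,
                                  const std::string& family,
                                  const std::string& qualifier,
                                  std::int64_t asOf) {
    // Timestamps are descending, so lower_bound lands on the first version
    // of this cell whose timestamp is <= asOf (or on a different cell).
    auto it = t.lower_bound(CellKey{row, family, qualifier, asOf});
    if (it == t.end() || it->first.row != row || it->first.family != family ||
        it->first.qualifier != qualifier) {
        return std::nullopt;
    }
    return it->second;
}
```

Take the earlier example with t1 = 1 and t2 = 2. Here ReadAt(t, "29", "name", "", 2) returns "Robert" and ReadAt(t, "29", "name", "", 1) returns "Bob".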

BigTable – Data Model

• Tablets:
– Large tables are broken into tablets at row boundaries.
– A tablet holds a contiguous range of rows.
– Approximately 100-200 MB of data per tablet.

Example (rows sorted by id): Tablet 1 holds ids … 15000 … 20000; Tablet 2 holds ids 20001 … 25000.
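A tablet holds a contiguous row range, so finding the tablet for a row key takes a single ordered-map search. This is a minimal sketch under two assumptions: each tablet is identified by its last (inclusive) row key, and ids are fixed-width strings such as "20000", so lexicographic order matches numeric order. TabletIndex and its methods are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical index from row ranges to tablets. Each tablet is keyed by the
// last row key it holds (inclusive). Tablets cover contiguous, non-overlapping
// ranges, and the first tablet starts at the beginning of the key space.
class TabletIndex {
public:
    void AddTablet(const std::string& endRow, const std::string& tablet) {
        byEndRow_[endRow] = tablet;
    }

    // The tablet serving `row` is the first one whose end row is >= row.
    // Returns an empty string if the row lies past the last tablet.
    std::string Locate(const std::string& row) const {
        auto it = byEndRow_.lower_bound(row);
        return it == byEndRow_.end() ? std::string() : it->second;
    }

    // Split at a row boundary: rows <= splitRow go to `newTablet`, and rows
    // after it stay in the tablet that previously held them.
    void Split(const std::string& splitRow, const std::string& newTablet) {
        assert(byEndRow_.count(splitRow) == 0);  // must not already be a boundary
        byEndRow_[splitRow] = newTablet;
    }

private:
    std::map<std::string, std::string> byEndRow_;
};
```

With Tablet 1 ending at "20000" and Tablet 2 at "25000", row "20000" maps to Tablet 1 and row "20001" to Tablet 2. A split only adds one boundary and moves no unrelated entries.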

BigTable – API

• Metadata operations:
– Creating and deleting tables and column families, modifying access control rights.

• Client operations:
– Writing/deleting values
– Reading values
– Scanning row ranges

// Open the table
Table *T = OpenOrDie("/bigtable/users");

// Update name and delete a phone
RowMutation r1(T, "29");
r1.Set("name:", "Robert");
r1.Delete("contactInfo:phone");
Operation op;
Apply(&op, &r1);
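Google's client library is internal, so the snippet above cannot run as written. The hypothetical in-memory mimic below shows its semantics in C++17. Table, RowMutation and its methods are stand-ins modelled on the snippet, not the real API. The paper's separate Operation/Apply(&op, &r1) step is folded into a RowMutation::Apply() member.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-ins for the calls above.
using Row = std::map<std::string, std::string>;  // "family:qualifier" -> value
using Table = std::map<std::string, Row>;        // row key -> row

class RowMutation {
public:
    RowMutation(Table* table, std::string rowKey)
        : table_(table), rowKey_(std::move(rowKey)) {}

    void Set(const std::string& column, const std::string& value) {
        ops_.push_back({column, value, false});
    }
    void Delete(const std::string& column) { ops_.push_back({column, "", true}); }

    // Changes are only buffered until Apply, which writes all of them to the
    // one row together, mirroring BigTable's atomic single-row mutations.
    // (Single-threaded sketch: no locking is shown.)
    void Apply() {
        Row& row = (*table_)[rowKey_];
        for (const Op& op : ops_) {
            if (op.isDelete) {
                row.erase(op.column);
            } else {
                row[op.column] = op.value;
            }
        }
        ops_.clear();
    }

private:
    struct Op {
        std::string column;
        std::string value;
        bool isDelete;
    };
    Table* table_;
    std::string rowKey_;
    std::vector<Op> ops_;
};
```

Until Apply() is called, row "29" still reads name: = "Bob". Afterwards it reads "Robert", and contactInfo:phone is gone.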

BigTable – System Structure

• Three major components:

– Client library

– Master (exactly one):
• Assigns tablets to tablet servers.
• Detects the addition and expiration of tablet servers.
• Balances tablet-server load.
• Garbage-collects files in GFS.
• Handles schema changes such as table and column family creations.

– Tablet servers (multiple, dynamically added):
• Each manages 10-1000 tablets.
• Handles read and write requests to its tablets.
• Splits tablets that have grown too large.


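BigTable – Tablet Location

• A three-level hierarchy, analogous to a B+ tree, stores tablet location information:
– A file in Chubby holds the location of the root tablet.
– The root tablet holds the locations of the METADATA tablets.
– Each METADATA tablet holds the locations of a set of user tablets.

• The client library caches tablet locations.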
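The lookup walks down the hierarchy one level at a time and then caches the result. The hypothetical C++17 model below sketches this. Each level is a sorted map from the last key covered by a child tablet to that child's location, and keys are assumed to be "table/row" strings. Cluster, Step and LocationClient are invented names. The cache is simplified to one entry per row key, whereas the real client caches tablet locations (key ranges).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of: Chubby file -> root tablet -> METADATA -> user tablets.
using LocationMap = std::map<std::string, std::string>;  // end key -> location

struct Cluster {
    std::string rootTabletLocation;              // what the Chubby file stores
    std::map<std::string, LocationMap> tablets;  // location -> tablet contents
};

// One step down the hierarchy: the first child whose end key is >= key.
std::string Step(const LocationMap& tablet, const std::string& key) {
    auto it = tablet.lower_bound(key);
    return it == tablet.end() ? std::string() : it->second;
}

class LocationClient {
public:
    explicit LocationClient(const Cluster& cluster) : cluster_(cluster) {}

    std::string Locate(const std::string& table, const std::string& row) {
        const std::string key = table + "/" + row;
        auto hit = cache_.find(key);
        if (hit != cache_.end()) return hit->second;  // served from the cache
        ++hierarchyWalks_;
        const std::string meta =
            Step(cluster_.tablets.at(cluster_.rootTabletLocation), key);
        const std::string user = Step(cluster_.tablets.at(meta), key);
        cache_[key] = user;
        return user;
    }

    int hierarchyWalks() const { return hierarchyWalks_; }

private:
    const Cluster& cluster_;
    std::map<std::string, std::string> cache_;  // simplified per-key cache
    int hierarchyWalks_ = 0;
};
```

A second lookup of the same row costs no walk of the hierarchy, which is the point of client-side caching.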

BigTable – Tablet Serving

• Writes:
– Updates are committed to a commit log.
– Recently committed updates are stored in memory, in the memtable.
– Older updates are stored in a sequence of SSTables.

• Reads:
– A read operation is executed on a merged view of the sequence of SSTables and the memtable.
– Since the SSTables and the memtable are sorted, the merged view can be formed efficiently.
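The C++17 sketch below shows why the merged view is cheap to build: every source is sorted, so a single k-way merge with a heap suffices. Its simplifications are loud. Each source is a std::map from row key to value, std::nullopt stands in for a deletion marker, and the newest source wins for duplicate keys. Run, Rows and MergedView are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// A sorted run: row key -> value, where std::nullopt is a deletion marker.
using Run = std::map<std::string, std::optional<std::string>>;
using Rows = std::vector<std::pair<std::string, std::string>>;

// runs[0] is the memtable (newest), followed by SSTables from newest to oldest.
Rows MergedView(const std::vector<Run>& runs) {
    struct Head {
        Run::const_iterator it;
        std::size_t age;  // index into runs; 0 = newest
    };
    // priority_queue keeps the "largest" element on top, so this comparator
    // puts the smallest key on top and, for equal keys, the newest run.
    auto lowerPriority = [](const Head& a, const Head& b) {
        if (a.it->first != b.it->first) return a.it->first > b.it->first;
        return a.age > b.age;
    };
    std::priority_queue<Head, std::vector<Head>, decltype(lowerPriority)> heap(lowerPriority);
    for (std::size_t i = 0; i < runs.size(); ++i) {
        if (!runs[i].empty()) heap.push({runs[i].begin(), i});
    }

    Rows view;
    std::optional<std::string> lastKey;
    while (!heap.empty()) {
        Head h = heap.top();
        heap.pop();
        const auto& [key, value] = *h.it;
        if (lastKey != key) {  // first time we see a key = its newest version
            lastKey = key;
            if (value) view.emplace_back(key, *value);  // hide deleted rows
        }
        if (++h.it != runs[h.age].end()) heap.push(h);
    }
    return view;
}
```

As a design note, each source is consumed exactly once in key order, so the cost grows with the total number of entries times log(number of sources). Nothing has to be re-sorted.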

BigTable – Compactions

• Minor compaction:
– Converts the memtable into an SSTable.
– Reduces memory usage.
– Reduces the amount of commit log that must be read during recovery.

• Major compaction:
– A merging compaction that results in a single SSTable.
– The result contains no deletion records, only live data.
– A good place to apply a policy such as “keep only N versions”.
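The two compactions are sketched below in C++17 under simplifying assumptions. Keys are (row, timestamp) with newer timestamps first. A deletion record, std::nullopt, hides all older versions of its row. Only the SSTables take part in the major compaction. TabletStore, MinorCompaction and MajorCompaction are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Key = (row, timestamp); within a row, newer timestamps sort first.
using Key = std::pair<std::string, std::int64_t>;
struct KeyLess {
    bool operator()(const Key& a, const Key& b) const {
        if (a.first != b.first) return a.first < b.first;
        return a.second > b.second;
    }
};
// std::nullopt is a deletion record that hides all older versions of a row.
using SSTable = std::map<Key, std::optional<std::string>, KeyLess>;

struct TabletStore {
    SSTable memtable;
    std::vector<SSTable> sstables;  // oldest first

    // Minor compaction: freeze the memtable into a new SSTable, start a new one.
    void MinorCompaction() {
        sstables.push_back(std::move(memtable));
        memtable.clear();
    }

    // Major compaction: rewrite all SSTables as one that keeps at most
    // maxVersions live versions per row and contains no deletion records.
    void MajorCompaction(std::size_t maxVersions) {
        SSTable merged;
        for (const SSTable& t : sstables) {  // newer tables override older ones
            for (const auto& [k, v] : t) merged[k] = v;
        }
        SSTable result;
        std::optional<std::string> row;
        std::size_t kept = 0;
        bool hidden = false;
        for (const auto& [k, v] : merged) {  // newest version of each row first
            if (row != k.first) { row = k.first; kept = 0; hidden = false; }
            if (!v) { hidden = true; continue; }  // drop the deletion record itself
            if (!hidden && kept < maxVersions) { result[k] = v; ++kept; }
        }
        sstables.clear();
        sstables.push_back(std::move(result));
    }
};
```

With maxVersions = 2, a row written at timestamps 1, 2 and 3 keeps only versions 3 and 2. A row whose newest entry is a deletion record disappears entirely.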

Outline

• BigTable √

• Security approaches:
– Integrity (iBigTable)
– Encryption (BigSecret)
– Access Control (Accumulo)

iBigTable - Introduction

• Wei Wei, Ting Yu, Rui Xue: iBigTable: practical data integrity for bigtable in public cloud. CODASPY 2013

• Enhancement of BigTable that provides scalable data integrity assurance.

iBigTable – System Model

[Figure: the Data Owner writes to BigTable; Clients read from BigTable.]

iBigTable – Goals

• Correctness: returned records have not been modified in any way.

• Completeness: no answers have been omitted from the result.

• Freshness: results are based on the most current version of the data.

iBigTable – System Design

• Basic idea:
– Build a Merkle hash tree-based authenticated data structure for each tablet.

• Verification Object (VO): data returned along with the result and used to authenticate the result.

• Example: the VO for data block 1 is {Hash 0-1, Hash 1}. These are the sibling hashes the client needs to recompute the root hash.
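The toy C++17 version below builds a binary Merkle tree over four data blocks, produces the VO for one block and verifies it on the client. Block index 0 plays the role of data block 1, so its VO is the slide's {Hash 0-1, Hash 1}. This is a plain binary tree, not the Merkle B+ tree iBigTable builds per tablet. H, BuildTree, MakeVO and Verify are hypothetical names, and std::hash only stands in for a cryptographic hash.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in hash for illustration only: std::hash is NOT collision resistant.
// A real authenticated data structure would use e.g. SHA-256.
std::string H(const std::string& data) {
    return std::to_string(std::hash<std::string>{}(data));
}

using Levels = std::vector<std::vector<std::string>>;

// Builds the tree bottom-up: levels[0] holds leaf hashes, levels.back()[0] the
// root hash. Assumes a non-zero power-of-two number of blocks.
Levels BuildTree(const std::vector<std::string>& blocks) {
    Levels levels(1);
    for (const auto& b : blocks) levels[0].push_back(H(b));
    while (levels.back().size() > 1) {
        std::vector<std::string> next;
        const auto& prev = levels.back();
        for (std::size_t i = 0; i + 1 < prev.size(); i += 2) {
            next.push_back(H(prev[i] + prev[i + 1]));
        }
        levels.push_back(std::move(next));
    }
    return levels;
}

// VO for block i: the sibling hash at each level, from the leaf upward.
std::vector<std::string> MakeVO(const Levels& levels, std::size_t i) {
    std::vector<std::string> vo;
    for (std::size_t l = 0; l + 1 < levels.size(); ++l, i /= 2) {
        vo.push_back(levels[l][i ^ 1]);
    }
    return vo;
}

// Client: recompute the root from the returned block and its VO, then compare
// it with the trusted root hash obtained from the data owner.
bool Verify(const std::string& block, std::size_t i,
            const std::vector<std::string>& vo, const std::string& trustedRoot) {
    std::string h = H(block);
    for (const auto& sibling : vo) {
        h = (i % 2 == 0) ? H(h + sibling) : H(sibling + h);
        i /= 2;
    }
    return h == trustedRoot;
}
```

A tampered block changes the recomputed root, so Verify fails, barring collisions in the stand-in hash.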

iBigTable – System Design

[Figure: Merkle B+ tree]

iBigTable – System Design

Option 1: one authenticated structure over the whole tablet hierarchy (Root tablet → Meta tablets → User tablets); the Data Owner keeps a single root hash.
• Pros: only one hash is maintained for all data.
• Cons: updates must be propagated up the hierarchy; concurrent updates could cause issues.

Option 2: a separate authenticated structure per tablet (Root tablet, Meta tablets, User tablets, …); the Data Owner keeps one root hash per tablet.


iBigTable – Reads

1.1 Client → tablet server serving the ROOT tablet: getMetaTabletLocation(table name, row key)
1.2 Tablet server: generate VO
1.3 Tablet server → Client: meta tablet location, VO
1.4 Client: verify

2.1 Client → tablet server serving the META tablet: getUserTabletLocation(table name, row key)
2.2 Tablet server: generate VO
2.3 Tablet server → Client: user tablet location, VO
2.4 Client: verify

3.1 Client → tablet server serving the USER tablet: getRow(row key)
3.2 Tablet server: generate VO
3.3 Tablet server → Client: row data, VO
3.4 Client: verify

iBigTable – Updates

3.1 Data Owner → tablet server serving the USER tablet: new/updated row
3.2 Tablet server: generate PT-VO
3.3 Tablet server → Data Owner: PT-VO
3.4 Data Owner: verify the PT-VO and update the tablet's root hash

Partial Tree Verification Object (PT-VO): a PT-VO contains keys along with hashes, while a VO does not. The keys let the Data Owner apply the update to the partial tree and compute the new root hash itself.

iBigTable – Updates

[Figure: Initial MB+ row tree of a tablet in a tablet server, over row keys 0 to 90.]

iBigTable – Updates

[Figure: Insert a row with new key 45 into the partial tree VO.]

[Figure: Partial tree VO after 45 is inserted.]
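The update flow can be sketched in C++17 under heavy simplifications. The tablet is a single level of leaves under a root rather than a multi-level MB+ tree, and node splits are not handled. The sketch still shows the slide's point: because the PT-VO carries keys, the data owner can apply the insert locally and compute the new root hash without trusting the server. PTVO, LeafHash, RootHash and VerifyAndInsert are hypothetical names, and std::hash stands in for a cryptographic hash.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Stand-in hash for illustration only (not cryptographic).
std::string H(const std::string& s) {
    return std::to_string(std::hash<std::string>{}(s));
}

// Simplified two-level tree: root hash = H(leaf hashes in order), and each
// leaf maps row keys to row hashes. No internal nodes and no node splits.
using Leaf = std::map<int, std::string>;

std::string LeafHash(const Leaf& leaf) {
    std::string acc;
    for (const auto& [key, rowHash] : leaf) {
        acc += std::to_string(key) + ":" + rowHash + ";";  // keys are hashed too
    }
    return H(acc);
}

std::string RootHash(const std::vector<std::string>& leafHashes) {
    std::string acc;
    for (const auto& h : leafHashes) acc += h + "|";
    return H(acc);
}

// PT-VO from the tablet server: full contents (keys and row hashes) of the
// leaf that receives the new row, plus the hashes of all leaves in order.
struct PTVO {
    std::size_t position;                 // index of the target leaf
    Leaf target;                          // keys included, unlike a plain VO
    std::vector<std::string> leafHashes;  // entry at `position` is recomputed
};

// Data owner: check the PT-VO against the stored root hash, insert the row
// into the partial tree, and return the new root hash ("" if the check fails).
std::string VerifyAndInsert(const PTVO& vo, const std::string& storedRoot,
                            int newKey, const std::string& newRow) {
    std::vector<std::string> hashes = vo.leafHashes;
    hashes[vo.position] = LeafHash(vo.target);
    if (RootHash(hashes) != storedRoot) return "";  // tampered or stale PT-VO
    Leaf updated = vo.target;
    updated[newKey] = H(newRow);
    hashes[vo.position] = LeafHash(updated);
    return RootHash(hashes);
}
```

In the slide's example, the target leaf would hold keys 30 and 40, and the owner inserts 45 into it before recomputing the root.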

iBigTable – Authenticated Data Structure

• Projected range queries: VOs are expensive to generate and verify.

[Figure: SL-MBT, a single-level Merkle B+ tree]

iBigTable – Authenticated Data Structure

[Figure: TL-MBT, a two-level Merkle B+ tree]

Outline

• BigTable √

• Security approaches:
– Integrity (iBigTable) √
– Encryption (BigSecret)
– Access Control (Accumulo)

BigSecret - Introduction

• Erman Pattuk et al., BigSecret: A Secure Data Management Framework for Key-Value Stores. IEEE CLOUD 2013

• A secure data management framework for BigTable-like storage systems.

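BigSecret – System Model

[Figure: Clients → BigSecret → BigTable]

The client calls get("Bob", "email"). BigSecret translates it into Get("A4Vc", "Zx$23") for BigTable, receives the encrypted value "DF77Xs9", and returns "bob@gmail.com" to the client.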

BigSecret – Goals

• Secure storage of data on untrusted servers.

• Efficient query execution on encrypted data.

• Supported queries: Put, Get, Delete, Scan.

BigSecret – Preliminaries

• Key: row||fam||qua||ts (|| denotes concatenation)

• Symmetric encryption: E(p) → c (encryption), D(c) → p (decryption)

• Pseudo-random functions (PRF): H(m) → h (deterministic, random-looking output)

• Bucketization:
– Partitions p1, p2, … of a domain Z.
– An Ident function that assigns a unique random identifier to each partition.
– A Map function that takes a partitioned domain and a value v from the domain, and returns Ident(p), where v belongs to p.

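BigSecret – Bucketization

Example: the domain 0 to 10000 is split at 2000, 4000, 6000 and 8000 into five partitions with identifiers 34, 97, 123, 266 and 771.

Map(100) = 34, Map(6451) = 266

Order-preserving mapping: x < y implies Map(x) ≤ Map(y). Values in the same partition share an identifier, so equality is possible.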
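A minimal C++17 sketch of Ident and Map, using the partitions of the example, appears below. It assumes each partition includes its lower bound; the slide does not say which side the boundaries belong to. Bucketizer is a hypothetical name. Order preservation comes from choosing identifiers that increase with the partition bounds, as 34 < 97 < 123 < 266 < 771 do.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <stdexcept>
#include <utility>

// Ident is stored as: partition lower bound -> partition identifier.
class Bucketizer {
public:
    explicit Bucketizer(std::map<std::int64_t, std::int64_t> ident)
        : ident_(std::move(ident)) {}

    // Map(v) = Ident(p), where p is the partition that contains v.
    // (The domain's upper bound is not checked in this sketch.)
    std::int64_t Map(std::int64_t v) const {
        auto it = ident_.upper_bound(v);  // first partition starting after v
        if (it == ident_.begin()) throw std::out_of_range("value below the domain");
        return std::prev(it)->second;
    }

private:
    std::map<std::int64_t, std::int64_t> ident_;
};
```

Map(200) and Map(300) both return 34. This is why a range of plaintext values becomes a (possibly smaller) range of bucket identifiers.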

BigSecret – Encryption Models

Naive approach: encrypt values only

BigSecret → BigTable: Put(row, fam, qua, ts, value) becomes Put(row, fam, qua, ts, E(value))

BigTable → BigSecret: E(value); BigSecret returns D(E(value)) = value

– All operations are supported.
– Relatively good performance.
– Only minor changes to the system are required.
– Poor privacy (keys remain in plaintext).

BigSecret – Encryption Models

Model-1: bucketization for all key parts

BigSecret → BigTable: Put(row, fam, qua, ts, value) becomes Put(Map(row), Map(fam), Map(qua)||E(key), Map(ts), E(value))

– All operations are supported.
– Relatively bad performance.
– Privacy-performance trade-off.

Example: Scan(row_from, row_to, fam) becomes Scan(Map(row_from), Map(row_to), Map(fam)), e.g. Scan(200, 300, contactInfo) becomes Scan(34, 34, 452).
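Scanning buckets returns a superset of the requested rows. The sketch below shows how appending E(key) lets the client side filter the extras out. It is a simplified C++17 illustration that handles row keys only, with no families, qualifiers, timestamps or values. E and D are a toy XOR placeholder, not encryption, and Map uses the partitions of the bucketization example. Put, Scan and Server are hypothetical names. Within a bucket the sketch returns rows in insertion order and does not re-sort them.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Placeholder "cipher": XOR with a fixed byte. NOT secure; it only stands in
// for a real symmetric cipher E/D so that the sketch runs.
std::string E(const std::string& p) {
    std::string c = p;
    for (char& ch : c) ch = static_cast<char>(ch ^ 0x5A);
    return c;
}
std::string D(const std::string& c) { return E(c); }  // XOR is its own inverse

// Order-preserving Map for row keys, using the partitions of the example.
std::int64_t Map(std::int64_t row) {
    assert(row >= 0 && row <= 10000);  // rows must lie in the example domain
    static const std::map<std::int64_t, std::int64_t> ident{
        {0, 34}, {2000, 97}, {4000, 123}, {6000, 266}, {8000, 771}};
    return std::prev(ident.upper_bound(row))->second;
}

// Server side, rows only: bucket id -> E(original row key).
using Server = std::multimap<std::int64_t, std::string>;

void Put(Server& s, std::int64_t row) { s.emplace(Map(row), E(std::to_string(row))); }

// Scan(from, to): the server scans buckets Map(from)..Map(to), which may hold
// rows outside [from, to]; the client decrypts the keys and drops those.
std::vector<std::int64_t> Scan(const Server& s, std::int64_t from, std::int64_t to) {
    std::vector<std::int64_t> rows;
    const auto end = s.upper_bound(Map(to));
    for (auto it = s.lower_bound(Map(from)); it != end; ++it) {
        const std::int64_t row = std::stoll(D(it->second));
        if (row >= from && row <= to) rows.push_back(row);
    }
    return rows;
}
```

For Scan(200, 300), the server returns all of bucket 34, which also holds 150 and 1500 in this example. The filtering step then discards those two. This extra transfer and decryption is one concrete source of the model's performance cost.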

BigSecret – Encryption Models

Model-2: PRF for all key parts

BigSecret → BigTable: Put(row, fam, qua, ts, value) becomes Put(H(row), H(fam), H(qua)||E(key), H(ts), E(value))

– Scan is not supported.
– Relatively good performance.
– Vulnerable to frequency-based attacks.

Example: Get(row, fam, qua) becomes Get(H(row), H(fam), H(qua)), e.g. Get(200, contactInfo, email) becomes Get(Az54Et, q8dj8, qWd29h).
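Get works in this model because H is deterministic: the client can recompute the exact server-side key. The C++17 sketch below illustrates this with a toy keyed std::hash in place of a real PRF such as HMAC-SHA256. The secret string is a hypothetical placeholder. Timestamps and the ||E(key) suffix are left out, and the stored value is assumed to be E(value) already. H, Put, Get and Server are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <tuple>

// Toy keyed "PRF": std::hash over secret || message. NOT a secure PRF;
// only determinism matters for this illustration.
std::string H(const std::string& m) {
    static const std::string secret = "client-secret";  // hypothetical key
    return std::to_string(std::hash<std::string>{}(secret + m));
}

// Server table keyed by (H(row), H(fam), H(qua)).
using Server = std::map<std::tuple<std::string, std::string, std::string>, std::string>;

void Put(Server& s, const std::string& row, const std::string& fam,
         const std::string& qua, const std::string& encryptedValue) {
    s[{H(row), H(fam), H(qua)}] = encryptedValue;
}

// The client recomputes the same tokens, so an exact-match lookup succeeds.
std::optional<std::string> Get(const Server& s, const std::string& row,
                               const std::string& fam, const std::string& qua) {
    auto it = s.find({H(row), H(fam), H(qua)});
    if (it == s.end()) return std::nullopt;
    return it->second;
}
```

The same property explains the limitations listed above. The server orders entries by H(row), which bears no relation to the order of the plaintext row keys, so a range scan has nothing to work with. Equal plaintexts also always produce equal tokens.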

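BigSecret – Encryption Models

Frequency-based attacks (Damiani et al. 2003)

Possible solutions:
• Decreasing the range of the PRFs.
• Model-3.

Plaintext table:
city | name | id
Tel-Aviv | Alice | 19
New York | Bob | 24
Paris | Carol | 32
New York | Alice | 38

Stored table (each value replaced by its deterministic PRF output):
city | name | id
$ | 27 | j
& | 14 | a
* | 23 | t
& | 27 | z

An attacker who matches frequent tokens to frequent values (27 = “Alice”, & = “New York”) learns from the last row that Alice lives in NY.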
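The attack itself needs nothing more than counting. This minimal C++17 sketch assumes the attacker knows which plaintext values are most common, for instance that New York is the most frequent city and Alice the most frequent name in this data. The most frequent token in each column is then matched to that plaintext. MostFrequent is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Frequency analysis on one column of deterministic tokens: returns the token
// that occurs most often (ties go to the smallest token).
std::string MostFrequent(const std::vector<std::string>& column) {
    std::map<std::string, int> counts;
    for (const auto& token : column) ++counts[token];
    std::string best;
    int bestCount = 0;
    for (const auto& [token, n] : counts) {
        if (n > bestCount) {
            best = token;
            bestCount = n;
        }
    }
    return best;
}
```

On the stored table above, MostFrequent picks & for city and 27 for name. Both appear in the last row. Decreasing the range of the PRF counters this by making distinct plaintexts collide, which flattens these counts.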

BigSecret – Encryption Models

Model-3: PRF only for the row key

BigSecret → BigTable: Put(row, fam, qua, ts, value) becomes Put(H(row), 0, E(key), 1, E(value))

– Scan is not supported.
– Relatively good privacy.
– Performance?

Example: Get(row, fam, qua) becomes Get(H(row), 0, null), e.g. Get(200, contactInfo, email) becomes Get(Az54Et, 0, null).
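Under Model-3 the server sees only H(row) and opaque qualifiers, so the client must fetch the whole row and find the requested cell itself. The C++17 sketch below assumes key = row||fam||qua with "|" as a hypothetical separator and leaves out the fixed timestamp 1. H, E and D are toy placeholders (a keyed std::hash and a one-byte XOR), not real cryptography. FullKey, Put, Get and Server are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>

// Toy stand-ins, NOT secure: H is a keyed std::hash, E/D are a one-byte XOR.
std::string H(const std::string& m) {
    return std::to_string(std::hash<std::string>{}("client-secret|" + m));
}
std::string E(const std::string& p) {
    std::string c = p;
    for (char& ch : c) ch = static_cast<char>(ch ^ 0x2F);
    return c;
}
std::string D(const std::string& c) { return E(c); }

// Server layout: H(row) -> (E(key) -> E(value)), all in family "0".
using Server = std::map<std::string, std::map<std::string, std::string>>;

std::string FullKey(const std::string& row, const std::string& fam,
                    const std::string& qua) {
    return row + "|" + fam + "|" + qua;
}

void Put(Server& s, const std::string& row, const std::string& fam,
         const std::string& qua, const std::string& value) {
    s[H(row)][E(FullKey(row, fam, qua))] = E(value);
}

// Get(row, fam, qua) becomes Get(H(row), 0, null): the whole row comes back,
// and the client decrypts each qualifier to find the requested cell.
std::optional<std::string> Get(const Server& s, const std::string& row,
                               const std::string& fam, const std::string& qua) {
    auto r = s.find(H(row));
    if (r == s.end()) return std::nullopt;
    const std::string wanted = FullKey(row, fam, qua);
    for (const auto& [encKey, encValue] : r->second) {
        if (D(encKey) == wanted) return D(encValue);
    }
    return std::nullopt;
}
```

Because the client filters after decryption, E need not be deterministic in this model. That denies the server equal ciphertexts to count; the sketch's XOR is deterministic only for brevity. The price in this sketch is that every Get transfers the entire row, one way to read the slide's open question about performance.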


Outline

• BigTable √

• Security approaches:
– Integrity (iBigTable) √
– Encryption (BigSecret) √
– Access Control (Accumulo)

Accumulo – Introduction

• Adam Fuchs, Apache Accumulo: Extensions to Google's Bigtable Design, 2012, lecture conducted from Morgan State University

• An extension of BigTable that provides cell-level access control.

Accumulo – System Model

Bob's record stored in BigTable:
Row | Family | Qualifier | Value
14 | name | | Bob
14 | contactInfo | email | bob@g.com
14 | healthData | blood test | sodium: 137 …
14 | healthData | doctor’s notes | Patient suffers from …
… | … | … | …

Who should see what: Bob: email, blood test; doctor: blood test, notes.

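Accumulo – System Model

1. The user sends credentials and a query.
2. Accumulo looks up the user and retrieves the user's authorization set.
3. The query is run against BigTable together with the authorization set (auth, query).
4. BigTable returns the data, which is passed back to the user.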

Accumulo – Data Model

Accumulo key: Row-Key, Column (Family, Qualifier, Visibility), Timestamp → Value

BigTable key: Row-Key, Column (Family, Qualifier), Timestamp → Value

The Visibility field holds security labels (e.g. A|(B&C)).

Accumulo – Visibility

• Syntax:
– A&B: both A and B are required.
– A|B: either A or B is required.
– A|(B & C): A, or both B and C, is required.

• Examples:
– Admin|(Manager & Sales)
– Citizen & Adult
– Secret | Top Secret
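Checking a cell against a user's authorization set amounts to evaluating its label as a boolean expression. The recursive-descent sketch below does this in C++17. It is a simplified grammar: unquoted tokens only, written without spaces (e.g. "Admin|(Manager&Sales)"). As in Accumulo, mixing & and | at one level (A&B|C) requires parentheses. An empty label is treated as visible to everyone. VisibilityEvaluator and CanSee are hypothetical names, not Accumulo's Java API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>

class VisibilityEvaluator {
public:
    VisibilityEvaluator(const std::string& expr, const std::set<std::string>& auths)
        : s_(expr), auths_(auths) {}

    bool Evaluate() {
        if (s_.empty()) return true;  // no label: visible to everyone
        const bool result = ParseExpr();
        if (pos_ != s_.size()) throw std::invalid_argument("unexpected character");
        return result;
    }

private:
    // expr := term ('&' term)* | term ('|' term)*
    bool ParseExpr() {
        bool value = ParseTerm();
        char op = 0;
        while (pos_ < s_.size() && (s_[pos_] == '&' || s_[pos_] == '|')) {
            if (op != 0 && s_[pos_] != op) {
                throw std::invalid_argument("mixed & and | need parentheses");
            }
            op = s_[pos_++];
            const bool rhs = ParseTerm();  // always parsed, to consume the input
            value = (op == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    // term := '(' expr ')' | token
    bool ParseTerm() {
        if (pos_ < s_.size() && s_[pos_] == '(') {
            ++pos_;
            const bool value = ParseExpr();
            if (pos_ >= s_.size() || s_[pos_] != ')') throw std::invalid_argument("missing )");
            ++pos_;
            return value;
        }
        const std::size_t start = pos_;
        while (pos_ < s_.size() && IsTokenChar(s_[pos_])) ++pos_;
        if (start == pos_) throw std::invalid_argument("expected a token");
        return auths_.count(s_.substr(start, pos_ - start)) > 0;
    }

    static bool IsTokenChar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '-' ||
               c == '.' || c == ':' || c == '/';
    }

    std::string s_;
    const std::set<std::string>& auths_;
    std::size_t pos_ = 0;
};

bool CanSee(const std::string& visibility, const std::set<std::string>& auths) {
    return VisibilityEvaluator(visibility, auths).Evaluate();
}
```

Take a user whose authorization set is {bob14}. That user passes bob14 and bob14|doctor but not doctor, so a scan of row 14 returns the name, email and blood test cells and withholds the doctor's notes.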

Accumulo – Visibility

Bob's record with visibility labels:
• Row 14, name → “Bob” (no visibility label)
• Row 14, contactInfo:email, visibility bob14 → “bob@g.com”
• Row 14, healthData:blood test, visibility bob14|doctor → “sodium: 137 …”
• Row 14, healthData:doctor’s notes, visibility doctor → “Patient suffers from …”
• …

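Accumulo – Visibility

Example query:
1. Bob sends his credentials (bob, ***) and a query for health data.
2. Accumulo looks up Bob's authorization set: {bob14}.
3. The query runs with {bob14} against the healthData cells: blood test (visibility bob14|doctor) and doctor notes (visibility doctor).
4. Only the blood test cell is visible with {bob14}, so Bob receives the blood test.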

Accumulo – Iterators

[Figure: Accumulo iterators]

Outline

• BigTable √

• Security approaches:
– Integrity (iBigTable) √
– Encryption (BigSecret) √
– Access Control (Accumulo) √

References

• Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, Robert Gruber: Bigtable: A Distributed Storage System for Structured Data. OSDI 2006:205-218 (Best Paper Award)

• Wei Wei, Ting Yu, Rui Xue: iBigTable: practical data integrity for bigtable in public cloud. CODASPY 2013:341-352

• Erman Pattuk, Murat Kantarcioglu, Vaibhav Khadilkar, Huseyin Ulusoy, Sharad Mehrotra: BigSecret: A Secure Data Management Framework for Key-Value Stores. IEEE CLOUD 2013:147-154

• Apache Accumulo: http://accumulo.apache.org/