Auditing Big Data for Privacy, Security and · PDF file2013 Fall Conference –...

Post on 27-Mar-2018

216 views 0 download

Transcript of Auditing Big Data for Privacy, Security and · PDF file2013 Fall Conference –...

CRISC

CGEIT

CISM

CISA2013 Fall Conference – “Sail to Success”

Auditing Big Data for

Privacy, Security and Compliance

Davi Ottenheimer @daviottenheimer

Senior Director of Trust, EMC

In-Depth Seminars – D21

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Introduction

• Davi Ottenheimer (@daviottenheimer)

– 19th Year InfoSec

– CISM, Platinum ISACA (1997)

– Ex-Big 5 Auditor, Ex-PCI QSA/PA-QSA

– Co-Author “Securing the Virtual Environment”

• @EMCTrustedIT

2

PivotalPivotal

VMwareVMware

EMCEMC

RS

AR

SA

Big

DataCloud

Trust

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Agenda

• Big Data

• Operations

• Auditing

3

10/3/2013 4

CRISC

CGEIT

CISM

CISA2013 Fall Conference – “Sail to Success”

BIG DATA

4

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

The Big Data Dilemma

“We have massive amounts of data. We know

who you are. We know what your history has

been on the airline. We can customize our

offerings.”

- Delta CEO

5

http://m.apnews.com/ap/db_289563/contentdetail.htm?contentguid=InpMBxrL

“Airlines have yet to find the right balance

between being helpful and being creepy.”

- Associated Press

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Big Data Definitions• Many Long-Standing Examples

– Astronomy (“Billions and billions”)

– Meteorology (Storms)

– Geology (Quakes)

– Anatomy (Disease)

– Economics (Fraud)

– Espionage (Echelon SIGINT, PRISM…)

• Three V’s (Volume, Variety, Velocity)

6

http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Structured + Unstructured Data = Big

7

Internet of Things

Telemetry, Location-Based, etc.

Non-Enterprise

Structured in

Relational Databases

Managed, Unmanaged

& Unstructured

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Global Flight Analysis

8

http://www.spatialanalysis.ca/2011/global-connectivity-mapping-out-flight-routes/

http://www.computerweekly.com/news/2240176248/GE-uses-big-data-to-power-machine-services-business

� 60,000 Total Routes

� 1 Tb/day Data Each Gas Turbine Engine

� 400K gal/yr Saved by AA Paperless Pilot (-35lb)

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

What is So Pinterest-ing?

• Lists

– Users you follow

– Boards (and related users) you follow

– Your followers

– People who follow your boards

– Boards you follow

– Boards you unfollowed after following a user

• Followers and unfollowers of each board

9

http://blog.gopivotal.com/case-studies-2/using-redis-at-pinterest-for-billions-of-relationships, http://engineering.pinterest.com/post/55272557617/building-a-follower-model-from-scratch

Stored and Ready on Login for 70m Users

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

What Makes Data “Ready”?

10

Where’s the Line for Service and Surveillance?

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Should Users Trust?

11

Credit Score

Contacts

Behavior

Preferences

Credit Score

Contacts

Behavior

Preferences

Location / MovementLocation / Movement

IdentifierIdentifier

Connection Relationship Affiliation

Connection Relationship Affiliation

Picture / Video

Picture / Video

Browsing History

Browsing History

FinancesFinances

https://www.unboundid.com/blog/2013/09/05/the-value-of-identity-data-and-identity-etiquette/

Cu

sto

me

r C

on

cern

(2) What will you do with it?(1

) Is

it

safe

?

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Should Users Trust?

12

http://money.cnn.com/2013/09/05/pf/acxiom-consumer-data/index.html

(1)

Is i

t sa

fe?

“…a 26-year-old mother of two teenagers…is just

about biologically impossible”

“…a 26-year-old mother of two teenagers…is just

about biologically impossible”

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Finding Value in Data

13

“…we know the estimated numbers of people being

served by each waste water treatment plant, we can

back-calculate daily [drug] loads…”- Dr Kasprzyk-Hordern

Chicago and suburbs = 1.5b

gal/day wastewater

• Disease

• Drugs

• Environmental Risk

http://gizmodo.com/5844925/chicagos-stickney-wastewater-treatment-plant-is-the-crappiest-place-on-earth, http://planetearth.nerc.ac.uk/news/story.aspx?id=1185&cookieConsent=A

http://www.treehugger.com/natural-sciences/fish-near-water-treatment-plants-are-harmed-by-human-drugs.html

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Finding Errors in Data

14

http://pewinternet.org/Reports/2013/Anonymity-online.aspx, http://www.connecture.com/the-connecture-difference/

of Internet users have

taken steps online to

remove or mask their

digital footprints

10/3/2013 15

CRISC

CGEIT

CISM

CISA2013 Fall Conference – “Sail to Success”

OPERATIONS

15

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Transformation of IT

16

Mobile

Virtual

Cloud

Big

Data

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

New Wave of Big Data Technology

17

Business

Objectives

InsightsAnalytics

SciPy

Mahout

MATLAB

Revolution R

SPSS

AMPL

SAS

Machine Learning

Behavior Analysis

Sentiment Analysis

Predictive Models

Network Analysis

Visualization

Simulation

Data

HadoopHadoop

Vertica

MapReduce

Esper

kdb

Pivotal

Netezza

TeradataECL

ETL

Hive

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

(Over?) Emphasis on Performance

• Nodes Distributed

• Data Shared

• Access Controls Open

• Networks Open

• Clients Unauthenticated

• Web Services Open

18

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Hadoop Machine Roles

19

1. Clients

2. Masters

3. Slaves1. Storage Management (Name Node -> HDFS)2. Compute Management (Job Tracker -> MapReduce)

1. Storage (Data Node) 2. Compute (Task Tracker)

1. Load Data to Cluster2. Submit MapReduce Jobs3. View Results

?

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Hadoop Machine Roles

20

Clients

Job Tracker Name NodeSecondary Name Node

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Slaves

Masters

HDFSMapReduce

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Hadoop Cluster

21

switch switch switch switch

name node job tracker 2ndry name node client

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

switch

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Data Node & Task Tracker

Rack 1 Rack 2 Rack 3 Rack 4 Rack n

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Typical Cluster Environment

• Petabytes

• 10,000s Slots Per Cluster

• Shared or Dedicated

• Production and Non-Production

• Mixed Software and Uses

22

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Google MapReduce: 2004

23

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

MapReduce Today

24

Output Files

Task/Data

Reduce HDFS

BlockTask/Data

Reduce HDFS

Block

Task/Data Task/Data Task/Data

Map Map Map HDFS

Block

HDFS

Block

HDFS

Block

Output Files

SplitsSplitsSplitsSplitsSplitsSplits

SplitsSplitsSplits

Task

Job TrackerJSON

RPC Read

Data

NameNode

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

MapReduce Today

25

Node0 Node1

Local File

System

HDFS

Name and

Data Node

output

tmp

input

MapReduce

Job and

Task Tracker reduce

map

dir owner perm

name hdfs 700

data hdfs 700

dir owner perm

HDFS hdfs 775

MAPRED mapred 775

$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Job Perimeter

26

ProductionAd-Hoc

• User Accounts

• Ticket at Login

• Tickets Expire

• Batch Accounts

• Tickets from Keytab

• Tickets AutoRenew

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Domain Perimeter

27

Name Node

Data Node & Task Tracker

Data Node & Task Tracker

ADAD

Hadoop AuthenticationIT Auth

LDAP

SSH

hdfs/nn@company hdfs/nn@hadoop

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Domain Perimeters

28

Job Tracker

Name Node

Data Node & Task Tracker

Data Node & Task Tracker

ADAD

Bastion

Hadoop AuthenticationIT Auth

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Authentication Perimeters?

29

NNWebHDFS

ClientSNN

Service UsersUsers

HDFS Client

MR Client JT

DN / TTDN / TT

DN / TT

HTTPFS

Map Task

Reduce Task

usernames: hdfs, httpfs, mapredend user

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Authentication Perimeters?

30

ServicesClients

service user

end user

Clients

Hadoop

Zookeeper

Hue

Oozie

Flume

Impala

Hbase

Hive

Metastore

Hbase

Oozie

www

Flume

Impala

Zookeeper

WebHdfs

Pig

Crunch

Cascading

Sqoop

Hve

Map

Red

RPC

HTTP

HTTPHTTP

Avro

RPC

ThriftRPC

RPC

Thrift

RPC

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 201331

Moment of Reflection

10/3/2013 32

CRISC

CGEIT

CISM

CISA2013 Fall Conference – “Sail to Success”

AUDITING

32

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Big Data Security Today

33

Authentication Authorization Encryption

Policy: “Cluster accessible only to trusted personnel”

Caveats– All or Nothing: Nodes without security cannot communicate with secure nodes

– Rolling upgrades to enable cluster security are impossible

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Authentication

• End User to Service

• Service to Service

• Service to Service, for a User

• Job Task to Service, for a User

34

“Big Data Security Configuration is a PITA. Do only

what you really need.” -- 2013 Hadoop Summit

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Authorization

• Data

HDFS, Hbase, Hive Metastore, Zookeeper

• Jobs

Hadoop, MR, Pig, Oozie, Hue…

• Queries

Impala, Drill

35

https://hadoop.apache.org/docs/stable/Secure_Impersonation.html

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Encryption

• Stored Data (Emerging)

– Data Nodes Only

– Keys Stored Separate from Data Node Storage

– Keys for “Secure Process” Not Users

• Transmitted Data

– RPC Supports Only SASL

– HTTPS

36

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Typical “Security” Setup

Access Controls

1. Authentication (Kerberos)

2. Role Based Authorization (ACLs)

3. Identity Management (AD or LDAP)

37

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Strong Authentication?

38

user process

hdfs namenode, datanode, secondary namenode

mapred jobtracker, tasktracker, child tasks

group users

hadoop hdfs, mapred

$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp

Older versions ran all daemons as single user (hadoop)

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Trying to Get to Compliance

1. Node Access (Authentication)

2. Node API Authentication

3. Store Data (Disk Encryption)

4. Transmit Data (Net Encryption)

5. RBAC (Job ACLs)

6. Logs

Ea

se

MitM?

root?NIC control?

sudo -u hdfs hadoop fs -rmr /

39

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Threat Model Thoughts

• Confidentiality

– Regulated Data

– Intellectual Property

• Integrity

– Analysis

– Output

• Availability

– Data Load

– Jobs, Analysis

40

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Future

41

Business

Objectives

InsightsAnalyticsData

Confidentiality

Integrity

Availability

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Regulated Data Example

• Confidentiality

– Stored

– Transmitted

• Integrity

– Backup

– Restore

• Availability

42

Cost Calculations

• Error

• Load Time

• Drop Time

• Re-Load Time

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Regulated Data Example: Sqoop2

43

Bulk Data Transfer Tool – Import/Export

https://blogs.apache.org/sqoop/entry/apache_sqoop_highlights_of_sqoop

“Sqoop2 Does Not

Support Security”

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Future DirectionsBuilding Controls for Risks

Data

Acquire

StoreQuery

� MapReduce

� Pig (simple query language)

� Hive (SQL queries)

� Cascading (workflow)

� Mahout (machine learning)

� Hama (scientific compute)

� Drill (ad-hoc query)

� Hbase (column-orient DB)

� Zookeeper (coordination)

� HDFS

44

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Acquire

• Data Validation

• Aggregation

– DOB

– Age Group (18-24, 25-36, etc.)

• Salt (Noise)

• Imitation (Synthetic)

• Replacement (Swapping)

• Wiping (Suppression)

Data Acquire

45

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Store

• Encryption

• Unique Users (RBAC)

• API Authentication

• “Hardened Clusters”

• Logs

• Forensics

Data Store

46

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Query

• Encryption

• Unique Users (RBAC)

• Load Restrictions

• Add, Remove, Change Jobs

• Job Alerts

• Scheduling

• Secure Code Practices / Review

Data Query

47

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

CSA “Securing Big Data”

SQL Security – Old NoSQL Security – New

Config Management Cell-Level Access Labels

Multi-Factor Authentication Keberos-Based Authentication

Data Classification ACLs

Data Encryption

Consolidated Audit/Report

Database Firewall

Vulnerability-Scanner

48

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Example: Accumulo

• NoSQL (HBase) perf…with security

• Timeline

– 2006 – Google BigTable

– 2008 – NSA

– 2011 – accumulo.apache.org

Cell-Level Access – Distributed Key/Value Store

// specify which visibilities we are allowed to seeAuthorizations auths = new Authorizations("public"); Scanner scan =

conn.createScanner("table", auths); scan.setRange(new Range("alexander","snowden")); scan.fetchFamily("attributes"); for(Entry<Key,Value> entry : scan) {

String row = entry.getKey().getRow(); Value value = entry.getValue();

}

https://accumulo.apache.org/1.5/accumulo_user_manual.html

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013

Conclusions

• Big Data

– Performance Pressure

– Customer Concern / Backlash

• Operations Risks

– Perimeter Holes

– Business Logic Error

– Weak Policies

• Auditing

– Trusted Personnel

– Perimeters

50

CRISC

CGEIT

CISM

CISA2013 Fall Conference – “Sail to Success”

Auditing Big Data for

Privacy, Security and Compliance

Davi Ottenheimer @daviottenheimer

Senior Director of Trust, EMC

In-Depth Seminars – D21

THANK YOU!