BigData and Privacy webinar at Brighttalk

48
Bridging the Gap Between Privacy and Big Data Ulf Mattsson, CTO Protegrity ulf.mattsson AT protegrity.com

description

BigData and Privacy webinar at Brighttalk

Transcript of BigData and Privacy webinar at Brighttalk

Page 1: BigData and Privacy webinar at Brighttalk

Bridging the Gap Between Privacy and Big Data

Ulf Mattsson, CTO

Protegrity

ulf.mattsson AT protegrity.com

Page 2: BigData and Privacy webinar at Brighttalk

2

20 years with IBM

• Research & Development & Global Services

Inventor

• Encryption, Tokenization & Intrusion Prevention

Member of

• PCI Security Standards Council (PCI SSC)

• American National Standards Institute (ANSI) X9

• Encryption & Tokenization

• International Federation for Information Processing

• IFIP WG 11.3 Data and Application Security

Ulf Mattsson, CTO Protegrity

Page 3: BigData and Privacy webinar at Brighttalk

Agenda

Big Data adoption rate

Security holes and threats to data

• Privacy regulations

New data protection techniques

Best practices for protecting big data

3

Page 4: BigData and Privacy webinar at Brighttalk

4

Bridging the Gap Between Privacy and Big Data

Page 5: BigData and Privacy webinar at Brighttalk

BIG DATA ADOPTION AND HOLES

5

Page 6: BigData and Privacy webinar at Brighttalk

6

Source: Gartner

Has Your Organization Already Invested in Big Data?

Page 7: BigData and Privacy webinar at Brighttalk

In 2012, Gartner predicted Big Data to be majority adopted in 2 to 5 years.

In 2013, Gartner updated this prediction to 5 to 10 years…

Why has adoption been slower than expected?

Big Data Adoption Statistics

7

Are they investing?

Source: Gartner

Page 8: BigData and Privacy webinar at Brighttalk

Holes in Big Data…

8

Source: Gartner

Page 9: BigData and Privacy webinar at Brighttalk

9

DataMonetization

CustomerSupport

CustomerProfiles

Sales &Marketing

SocialMedia

BusinessImprovement

Big Data

Regulations& Breaches Increased

profits

Increased profits

Increased profits

Increased profits

Increased profits

Balancing Security and Data Insight

Page 10: BigData and Privacy webinar at Brighttalk

Many ways to hack Big Data

Many regulations on sensitive data

Privileged User Threats

Encryption• Data Size

• Data type

• Data format

• Performance (SLA)

Analytics

Inter-node data movement

What’s the Problem with Securing Big Data?

10

Page 11: BigData and Privacy webinar at Brighttalk

Many Ways to Hack Big Data

Source: http://nosql.mypopescu.com/post/1473423255/apache-hadoop-and-hbase

11

HDFS(Hadoop Distributed File System)

MapReduce (Job Scheduling/Execution System)

Hbase (Column DB)

Pig (Data Flow) Hive (SQL) Sqoop

ETL Tools BI Reporting RDBMS

Avr

o (S

eria

lizat

ion)

Zoo

keep

er (

Coo

rdin

atio

n)

Hackers

PrivilegedUsers

UnvettedApplications

OrAd Hoc

Processes

Page 12: BigData and Privacy webinar at Brighttalk

Big Data and The Insider Threat

12

Page 13: BigData and Privacy webinar at Brighttalk

BALANCING SECURITY AND DATA INSIGHT

13

Page 14: BigData and Privacy webinar at Brighttalk

Balancing Security and Data Insight

Tug of war between security and data insight

Big Data is designed for access

Privacy regulations require de-identification

Granular data-level protection

Traditional security don’t allow for seamless data use

14

Page 15: BigData and Privacy webinar at Brighttalk

Reduction of Pain with New Protection Techniques

15

1970 2000 2005 2010

High

Low

Pain& TCO

Strong EncryptionAES, 3DES

Format Preserving EncryptionDTP, FPE

Vault-based Tokenization

Vaultless Tokenization

Input Value: 3872 3789 1620 3675

!@#$%a^.,mhu7///&*B()_+!@

8278 2789 2990 2789

8278 2789 2990 2789

Format Preserving

Greatly reduced Key Management

No Vault

8278 2789 2990 2789

Page 16: BigData and Privacy webinar at Brighttalk

10 000 000 -

1 000 000 -

100 000 -

10 000 -

1 000 -

100 -

Transactions per second*

I

Format

Preserving

Encryption

Speed of Different Protection Methods

I

Vaultless

Data

Tokenization

I

AES CBC

Encryption

Standard

I

Vault-based

Data

Tokenization

*: Speed will depend on the configuration

16

Page 17: BigData and Privacy webinar at Brighttalk

HOW IS TOKENIZATION BRIDGING THE GAP?

17

Page 18: BigData and Privacy webinar at Brighttalk

De-Identified Sensitive Data Field Real Data Tokenized / Pseudonymized

Name Joe Smith csu wusoj

Address 100 Main Street, Pleasantville, CA 476 srta coetse, cysieondusbak, CA

Date of Birth 12/25/1966 01/02/1966

Telephone 760-278-3389 760-389-2289

E-Mail Address [email protected] [email protected]

SSN 076-39-2778 076-28-3390

CC Number 3678 2289 3907 3378 3846 2290 3371 3378

Business URL www.surferdude.com www.sheyinctao.com

Fingerprint Encrypted

Photo Encrypted

X-Ray Encrypted

Healthcare Data – Primary Care Data

Dr. visits, prescriptions, hospital stays and discharges, clinical, billing, etc.

Protection methods can be equally applied to the actual healthcare data, but not needed with de-identification

18

Page 19: BigData and Privacy webinar at Brighttalk

Research Brief

“Tokenization Gets Traction”

Aberdeen has seen a steady increase in enterprise use of tokenization

Alternative to encryption

47% using tokenization for something other than cardholder data (PII and PHI)

Tokenization users had 50% fewer security-related incidents

19 Author: Derek Brink, VP and Research Fellow, IT Security and IT GRC

Page 20: BigData and Privacy webinar at Brighttalk

Protection Granularity: Encryption of Fields

20

Production SystemsEncryption •Reversible•Policy Control (authorized / Unauthorized Access)•Lacks Integration Transparency•Complex Key Management•Example !@#$%a^.,mhu7///&*B()_+!@

Page 21: BigData and Privacy webinar at Brighttalk

Protection Granularity: Encryption vs. Masking

21

Production Systems

Non-Production Systems

Encryption of fields•Reversible•Policy Control (authorized / Unauthorized Access)•Lacks Integration Transparency•Complex Key Management•Example

Masking of fields•Not reversible•No Policy, Everyone can access the data•Integrates Transparently•No Complex Key Management•Example 0389 3778 3652 0038

!@#$%a^.,mhu7///&*B()_+!@

Page 22: BigData and Privacy webinar at Brighttalk

Protection Granularity: Tokenization of Fields

22

Production Systems

Non-Production Systems

Vaultless Tokenization / Pseudonymization

•No Complex Key Management•Business Intelligence

•Reversible •Policy Control (Authorized / Unauthorized Access)

•Not Reversible•Integrates Transparently

Credit Card: 0389 3778 3652 0038

Page 23: BigData and Privacy webinar at Brighttalk

CLASSIFY DATA

23

Page 24: BigData and Privacy webinar at Brighttalk

Financial Services

Healthcare and Pharmaceuticals

Infrastructure and Energy

Federal Government

Select US Regulations for Security and Privacy

24

Page 25: BigData and Privacy webinar at Brighttalk

Privacy Laws

54 International Privacy Laws

30 United States Privacy Laws

25

Page 26: BigData and Privacy webinar at Brighttalk

1. Classify: Examples of Sensitive Data

26

Sensitive Information Compliance Regulation / Laws

Credit Card Numbers PCI DSS

Names HIPAA, State Privacy Laws

Address HIPAA, State Privacy Laws

Dates HIPAA, State Privacy Laws

Phone Numbers HIPAA, State Privacy Laws

Personal ID Numbers HIPAA, State Privacy Laws

Personally owned property numbers HIPAA, State Privacy Laws

Personal Characteristics HIPAA, State Privacy Laws

Asset Information HIPAA, State Privacy Laws

Page 27: BigData and Privacy webinar at Brighttalk

PCI DSS Build and maintain a secure network.

1. Install and maintain a firewall configuration to protect data

2. Do not use vendor-supplied defaults for system passwords and other security parameters

Protect cardholder data. 3. Protect stored data4. Encrypt transmission of cardholder data and

sensitive information across public networks

Maintain a vulnerability management program.

5. Use and regularly update anti-virus software6. Develop and maintain secure systems and

applications

Implement strong access control measures.

7. Restrict access to data by business need-to-know

8. Assign a unique ID to each person with computer access

9. Restrict physical access to cardholder data

Regularly monitor and test networks.

10. Track and monitor all access to network resources and cardholder data

11. Regularly test security systems and processes

Maintain an information security policy.

12. Maintain a policy that addresses information security

27

Page 28: BigData and Privacy webinar at Brighttalk

Protection of cardholder data in memory

Clarification of key management dual control and split knowledge

Recommendations on making PCI DSS business-as-usual and best practices

Security policy and operational procedures added

Increased password strength

New requirements for point-of-sale terminal security

More robust requirements for penetration testing

PCI DSS 3.0

28

Page 29: BigData and Privacy webinar at Brighttalk

HIPAA PHI: List of 18 Identifiers

1. Names

2. All geographical subdivisions smaller than a State

3. All elements of dates (except year) related to individual

4. Phone numbers

5. Fax numbers

6. Electronic mail addresses

7. Social Security numbers

8. Medical record numbers

9. Health plan beneficiary numbers

10. Account numbers

29

11. Certificate/license numbers

12. Vehicle identifiers and serial numbers

13. Device identifiers and serial numbers

14. Web Universal Resource Locators (URLs)

15. Internet Protocol (IP) address numbers

16. Biometric identifiers, including finger prints

17. Full face photographic images

18. Any other unique identifying number

Page 30: BigData and Privacy webinar at Brighttalk

Make business associates of covered entities directly liable for compliance

Strengthen the limitations on the use and disclosure of protected health information for marketing purposes

Prohibit the sale of protected health information without individual authorization.

Expand individuals’ rights to receive electronic copies of their health information

HIPAA Omnibus Final Rule – Some Examples

30

Page 31: BigData and Privacy webinar at Brighttalk

PROTECT DATA

31

Page 32: BigData and Privacy webinar at Brighttalk

Protection Beyond Kerberos

32

Volume Encryption

Field level data protection for existing and new data.

API enabled Field level data protection

API enabled Field level data protection

HDFS

(Hadoop Distributed File System)

MapReduce (Job Scheduling/Execution System)

Hbase (Column DB)

Pig (Data Flow) Hive (SQL) Sqoop

ETL Tools BI Reporting RDBMS

Page 33: BigData and Privacy webinar at Brighttalk

Volume Encryption

33

HDFS

MapReduce

Protected with Volume Encryption

Entire file is in the clear when analyzed

Page 34: BigData and Privacy webinar at Brighttalk

File Encryption – Authorized User

34

HDFS

MapReduce

Protected with File Encryption

Entire file is in the clear when analyzed

Page 35: BigData and Privacy webinar at Brighttalk

File Encryption – Non Authorized User

35

HDFS

MapReduce

Protected with File Encryption

Entire file is in unreadable when analyzed

Page 36: BigData and Privacy webinar at Brighttalk

Volume Encryption + Gateway Field Protection

36

HDFS

MapReduce

Kerberos AccessControl

Granular Field Level Protection

Protected with Volume Encryption

Data Protection File Gateway

Page 37: BigData and Privacy webinar at Brighttalk

Volume Encryption + Internal MapReduce Field Protection

37

HDFS

MapReduce

Kerberos Access Control

Granular Field Level Protection

Protected with Volume Encryption

Hadoop Staging

MapReduce

Analytics

Page 38: BigData and Privacy webinar at Brighttalk

ENFORCE DATA PROTECTION

38

Page 39: BigData and Privacy webinar at Brighttalk

A Data Security Policy

39

What is the sensitive data that needs to be protected. Data Element.

How you want to protect and present sensitive data. There are several methods for protecting sensitive data. Encryption, tokenization, monitoring, etc.

Who should have access to sensitive data and who should not. Security access control. Roles & Members.

When should sensitive data access be granted to those who have access. Day of week, time of day.

Where is the sensitive data stored? This will be where the policy is enforced. At the protector.

Audit authorized or un-authorized access to sensitive data. Optional audit of protect/unprotect.

What

Who

When

Where

How

Audit

Page 40: BigData and Privacy webinar at Brighttalk

Volume Encryption + Field Protection + Policy Enforcement

40

HDFSProtected with

Volume Encryption

MapReduce

Data Protection Policy

Page 41: BigData and Privacy webinar at Brighttalk

Volume Encryption + Field Protection + Policy Enforcement

41

HDFSProtected with

Volume Encryption

MapReduce

Data Protection Policy

Page 42: BigData and Privacy webinar at Brighttalk

4. Authorized User Example

42

Presentation to requestorName: Joe SmithAddress: 100 Main Street, Pleasantville, CA

Selected data displayed (least privilege)

Protection at restName: csu wusojAddress: 476 srta coetse, cysieondusbak, CA

Data Scientist, Business Analyst

PolicyEnforcement

Does the requestor have the authority to access the protected data?

Authorized

Request

Response

Page 43: BigData and Privacy webinar at Brighttalk

4. Un-Authorized User Example

43

Request

Response

Presentation to requestorName: csu wusojAddress: 476 srta coetse, cysieondusbak, CA

Protection at restName: csu wusojAddress: 476 srta coetse, cysieondusbak, CA

Privileged Used, DBA, System Administrators, Bad Guy

PolicyEnforcement

Does the requestor have the authority to access the protected data?

Not Authorized

Page 44: BigData and Privacy webinar at Brighttalk

MONITOR DATA PROTECTION

44

Page 45: BigData and Privacy webinar at Brighttalk

Best Practices for Protecting Big Data

Start early

Granular protection

Select the optimal protection

Enterprise coverage

Protection against insider threat

Protect highly sensitive data in a way that is mostly transparent to the analysis process

Policy based protection

Record data access events

45

Page 46: BigData and Privacy webinar at Brighttalk

Proven enterprise data security software and innovation leader

• Sole focus on the protection of data

Growth driven by risk management and compliance

• PCI (Payment Card Industry)

• PII (Personally Identifiable Information)

• PHI (Protected Health Information) – HIPAA

• State and Foreign Privacy Laws, Breach Notification Laws

Successful across many key industries

Introduction to Protegrity

46

Page 47: BigData and Privacy webinar at Brighttalk

How Protegrity Can Help

47

1

2

3

4

5

We can help you Classify the sensitive data

We can help you Discover where the sensitive data sits

We can help you Protect your sensitive data in a flexible way

We can help you Enforce policies that will enable business functions and preventing sensitive data from the wrong hands.

We can help you Monitor sensitive data to gain insights on abnormal behaviors.