Transcript
Page 1: Security bigdata

Big Data Security

Top 5 Security Risks and Best Practices

Jitendra Chauhan

Head R&D, iViZ Security

[email protected]

Page 2: Security bigdata

Agenda

• Key Insights of Big Data Architecture

• Top 5 Big Data Security Risks

• Top 5 Best Practices

Page 3: Security bigdata

Key Insights of Big Data Architecture

Page 4: Security bigdata

Distributed Architecture (Hadoop as example)

Data Partition, Replication and Distribution

Auto-tiering

Move the Code

Page 5: Security bigdata

Real Time, Streaming and Continuous Computation

Integration Patterns

Real time

Variety of Input Sources

Ad hoc Queries

Page 6: Security bigdata

Parallel & Powerful Programming Framework

Example:

• 10 TB Data

• 128 MB Chunks

• 82000 Maps
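The map count in an example like this follows from dividing the input size by the chunk size. The following is a minimal sketch, assuming one map task per chunk and binary units (1 TB = 1024^4 bytes); the function name `estimate_map_tasks` is hypothetical and not part of any framework.

```python
MIB = 1024 ** 2  # bytes in one megabyte (binary units, an assumption)
TIB = 1024 ** 4  # bytes in one terabyte (binary units, an assumption)


def estimate_map_tasks(input_bytes: int, chunk_bytes: int) -> int:
    """Estimate the number of map tasks, assuming one task per chunk.

    Uses integer ceiling division so a partial final chunk still
    gets its own map task.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return (input_bytes + chunk_bytes - 1) // chunk_bytes


if __name__ == "__main__":
    print(estimate_map_tasks(10 * TIB, 128 * MIB))
    print(estimate_map_tasks(16 * TIB, 128 * MIB))
```

With 128 MB chunks, 10 TB of input gives 81,920 map tasks (roughly 82,000), while 16 TB gives 131,072.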

Java vs SQL / PL/SQL

Frameworks:

• MapReduce

• Storm Topology (Spouts & Bolts)

Page 7: Security bigdata

Big Data Architecture: No Single Silver Bullet

• Hadoop is already unsuitable for many Big Data problems

• Real-time analytics

o Cloudscale, Storm

• Graph computation

o Giraph and Pregel (examples of graph computation include Shortest Paths, Degrees of Separation, etc.)

• Low-latency queries

o Dremel

Page 8: Security bigdata

Top 5 Security Risks

Page 9: Security bigdata

Insecure Computation

Sensitive Info

• Information Leak

• Data Corruption

• DoS

Health Data

Untrusted Computation program

Page 10: Security bigdata

Input Validation and Filtering

• Input Validation

o What kind of data is untrusted?

o What are the untrusted data sources?

• Data Filtering

o Filter rogue or malicious data

• Challenges

o GBs or TBs of continuous data

o Signature-based data filtering has limitations

o How to filter the behavioral aspect of data?
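The validation and filtering steps above can be sketched as a lazy pipeline that never holds the whole stream in memory. This is only an illustration: the record layout, the `TRUSTED_SOURCES` allow-list and the two `SIGNATURES` patterns are hypothetical, and a real deployment would need far richer rules.

```python
import re
from typing import Iterable, Iterator

# Hypothetical allow-list of data sources considered trusted.
TRUSTED_SOURCES = {"sensor-a", "sensor-b"}

# Toy signature list; real signature sets are much larger.
SIGNATURES = [
    re.compile(r"<script", re.IGNORECASE),
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
]

MAX_PAYLOAD_LEN = 1024  # hypothetical size bound


def is_valid(record: dict) -> bool:
    """Input validation: known source, string payload, bounded size."""
    payload = record.get("payload")
    return (
        record.get("source") in TRUSTED_SOURCES
        and isinstance(payload, str)
        and len(payload) <= MAX_PAYLOAD_LEN
    )


def matches_signature(payload: str) -> bool:
    """Data filtering: True if the payload matches a known-bad pattern."""
    return any(sig.search(payload) for sig in SIGNATURES)


def filter_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Lazily yield only records that pass validation and filtering."""
    for record in records:
        if is_valid(record) and not matches_signature(record["payload"]):
            yield record
```

Because `filter_stream` is a generator, it processes one record at a time, which suits continuous input. It still only catches known patterns, so it does not address the behavioral aspect of data.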

Page 11: Security bigdata

Granular Access Controls

• Big Data systems are designed for performance, with almost no security in mind

• Security in Big Data is still ongoing research

• Table-, row- or cell-level access control is missing

• Ad hoc queries pose additional challenges

• Access Control is disabled by default
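To make cell-level access control concrete, here is a minimal sketch that filters a row down to the columns a role may read and denies unknown roles by default. The `ROLE_COLUMNS` matrix, the role names and the column names are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical role-to-column privilege matrix.
ROLE_COLUMNS: Dict[str, Set[str]] = {
    "analyst": {"age", "zip"},
    "doctor": {"age", "zip", "diagnosis"},
}


def read_row(role: str, row: Dict[str, str]) -> Dict[str, str]:
    """Return only the cells the role may see.

    Roles missing from the matrix get an empty result, so access is
    denied by default rather than allowed by default.
    """
    allowed = ROLE_COLUMNS.get(role, set())
    return {col: val for col, val in row.items() if col in allowed}
```

The same check would have to sit in front of ad hoc queries as well, since they can touch any column.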

Page 12: Security bigdata

Insecure Data Storage

• Data is spread across various nodes, so authentication, authorization and encryption are challenging

• Auto-tiering moves cold data to a less secure medium

o What if cold data is sensitive?

• Encryption of real-time data can have performance impacts

• Secure communication among nodes, middleware and end users is disabled by default
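One way to handle sensitive cold data is to make the tiering decision aware of sensitivity. The sketch below is an assumption-laden illustration: the `Block` type, the tier names and the 30-day threshold are hypothetical and do not come from Hadoop or any storage product.

```python
from dataclasses import dataclass

COLD_AFTER_DAYS = 30  # hypothetical threshold for "cold" data


@dataclass
class Block:
    name: str
    days_since_access: int
    sensitive: bool


def choose_tier(block: Block) -> str:
    """Pick a storage tier without demoting sensitive data to a weaker medium."""
    if block.days_since_access < COLD_AFTER_DAYS:
        return "hot"
    # Cold data: sensitive blocks go to an encrypted cold tier instead
    # of the cheaper, less protected one.
    return "cold-encrypted" if block.sensitive else "cold"
```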

Page 13: Security bigdata

Privacy Concerns in Data Mining and Analytics

• Monetization of Big Data generally involves Data Mining and Analytics

• Sharing of results involves multiple challenges

o Invasion of Privacy

o Invasive Marketing

o Unintentional Disclosure of Information

• Examples

o AOL release of anonymized search logs: users could easily be identified

o Netflix faced a similar problem
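The slides do not name a specific technique, but one common pre-release check is k-anonymity: every combination of quasi-identifiers should be shared by at least k records. The sketch below assumes records are plain dictionaries; the function name `risky_groups` and the field names in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def risky_groups(
    records: Sequence[Dict[str, object]],
    quasi_ids: Sequence[str],
    k: int,
) -> List[Tuple[object, ...]]:
    """Return quasi-identifier combinations shared by fewer than k records.

    An empty result means the data set satisfies k-anonymity for the
    chosen quasi-identifiers; it says nothing about other attacks.
    """
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return [combo for combo, n in counts.items() if n < k]
```

For example, calling `risky_groups(records, ["zip", "age"], 2)` lists every (zip, age) pair that appears only once, which are the rows most likely to identify a single user.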

Page 14: Security bigdata

Top 5 Best Practices

• Secure your Computation Code

o Implement access control, code signing and dynamic analysis of computational code

o Have a strategy to protect data in case of untrusted code

• Implement Comprehensive Input Validation and Filtering

o Implement validation and filtering of input data, from internal or external sources

o Evaluate the input validation and filtering of your Big Data solution
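As an illustration of checking computation code before it runs, the sketch below verifies an HMAC-SHA256 tag over the job code. This is a simplification: real code signing normally uses asymmetric signatures and a certificate chain, and the shared key and function names here are hypothetical.

```python
import hashlib
import hmac


def sign_code(code: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the code (stand-in for a signature)."""
    return hmac.new(key, code, hashlib.sha256).hexdigest()


def is_code_trusted(code: bytes, signature: str, key: bytes) -> bool:
    """Return True only if the code matches the tag it was submitted with."""
    expected = sign_code(code, key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)
```

A scheduler could call `is_code_trusted` on each submitted job and refuse to distribute any code whose tag does not verify.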

Page 15: Security bigdata

Top 5 Best Practices

• Implement Granular Access Control

o Review the Role and Privilege Matrix

o Review permissions to execute ad hoc queries

o Enable Access Control

• Secure your Data Storage and Computation

o Sensitive data should be segregated

o Enable data encryption for sensitive data

o Audit administrative access on Data Nodes

o API Security

Page 16: Security bigdata

Top 5 Best Practices

• Review and Implement Privacy Preserving Data Mining and Analytics

o Analytics data should not disclose sensitive information

o Get the Big Data audited

Page 17: Security bigdata

Thank You

[email protected]

http://www.ivizsecurity.com/blog/

Page 18: Security bigdata

Big Data Architecture: Key Insights

• Distributed Architecture & Auto-tiering

• Real Time, Streaming and Continuous Computation

• Ad hoc Queries

• Parallel and Powerful Computation Language

• Move the Code, Not the data

• Non Relational Data

• Variety of Input Sources

Page 19: Security bigdata

Top 5 Security Risks

• Insecure Computation

• End Point Input Validation and Filtering

• Granular Access Control

• Insecure Data Storage and Communication

• Privacy Preserving Data Mining and Analytics