Starting the Avalanche - DEF CON CON 25/DEF CON 25... · Jeremy Heffner Senior Security Software...

Starting the Avalanche: Application DoS In Microservice Architectures Scott Behrens Jeremy Heffner

Page 1:

Starting the Avalanche: Application DoS In Microservice Architectures

Scott Behrens

Jeremy Heffner

Page 2:

Introductions

Jeremy Heffner

● Senior Security Software Engineer

● Developing and securing things for 20+ years

Scott Behrens

● Netflix senior application security engineer

● Breaking and building for 8+ years

● Contributor to a variety of open source projects (github.com/sbehrens)

Page 3:

DoS focused on application layer logic

Page 4:

Photo of a battering ram by Flickr user Patrick Denker; License: https://creativecommons.org/licenses/by/2.0/

Page 7:

Microservice Primer: High Level View

Architecture

Client Libraries and API Gateway

Circuit Breakers / Failover

Cache

Page 8:

Microservice Primer: Architecture

GOOD

● Scale

● Service independence

● Fault isolation

● Eliminates stack debt

BAD

● Distributed system complexity

● Deployment complexity

● Cascading service failures if things aren’t set up right

Page 9:

Simplified Microservice API Architecture

[Diagram: INTERNET → EDGE tier (Zuul proxies, Core API, website) → Middle tier (Middle Tier Services) → Backend tier (Backend Tier Services)]

Page 10:

Microservice Primer: API Gateways and Client Libraries

Interface for middle tier services

Services provide client libraries to API Gateway

Diagrams provided by microservices.io

Page 11:

Microservice Primer: Circuit Breaker

Helps with handling service failures

How do you know what timeout to choose?

How long should the breaker be triggered?

Diagrams provided by microservices.io
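The two questions on this slide (which timeout, and how long the breaker stays open) map onto two tunables in any breaker. What follows is a minimal sketch, not the implementation used in the talk or in any particular client library: `CircuitBreaker`, `failure_threshold`, and `reset_after_s` are illustrative names, and the clock is injectable only so the behavior can be checked without sleeping.

```python
import time


class CircuitBreaker:
    """Open after N consecutive failures, then allow a trial call after a cooldown."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after_s = reset_after_s          # how long the breaker stays open
        self.clock = clock                          # injectable for testing (assumption)
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        # While open, skip the remote call entirely and serve the fallback.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()
            # Cooldown elapsed: let one trial call through. The failure count is
            # kept, so a single failed trial re-opens the breaker immediately.
            self.opened_at = None
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

A short `reset_after_s` sends trial traffic back to a struggling service sooner; a long one keeps users on the fallback experience after the service has recovered.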

Page 12:

Microservice Primer: Cache

Speeds up response time

Reduces load on services fronted by cache

Reduces the number of servers needed to handle requests

https://github.com/netflix/evcache
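To make the load-reduction point concrete, here is a small read-through cache sketch. It is an in-process dictionary standing in for a distributed cache and is not EVCache's API; `ReadThroughCache` and `load_title` are hypothetical names.

```python
class ReadThroughCache:
    """Serve repeated keys from memory; call the backing service only on a miss."""

    def __init__(self, loader):
        self.loader = loader  # function that calls the (expensive) backing service
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = self.loader(key)  # every miss becomes a backend request
        self.store[key] = value
        return value


backend_calls = []


def load_title(title_id):
    # Hypothetical backend lookup; records each call so the load is visible.
    backend_calls.append(title_id)
    return f"title-{title_id}"


cache = ReadThroughCache(load_title)
for _ in range(5):
    cache.get(42)            # one miss, then four hits
for title_id in range(100, 110):
    cache.get(title_id)      # ten distinct keys: ten misses, ten backend calls
print(cache.hits, cache.misses, len(backend_calls))
```

Requests for keys that were never cached all reach the backing service, which is why traffic that forces cache misses is more expensive than its request count suggests.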

Page 13:

Old School Application DoS

CPU

Mem

Cache

Disk

Network

Page 14:

New School Application DoS

CPU

Mem

Cache

Disk

Network

Queueing

Client Library Timeouts

Healthchecks

Connection Pool

Hardware Operations (HSMs)

Page 16:

Difference Between Old School and New School App DoS

Old School Application DoS

Often 1 to 1

New School Application DoS

Often 1 to Many

Page 17:

Simple Web Application Architecture

Page 18:

Old School Application DoS Attack

> perl create_many_profiles.pl

POST /create_profile HTTP/1.1
…
profile_name=$counter + "hacker"

300 requests per second

HTTP Timeouts

Image credits:

https://www.teachprivacy.com/the-funniest-hacker-stock-photos/

https://openclipart.org/image/2400px/svg_to_png/241842/sad_panda.png

http://www.funnyordie.com/lists/f64f7beefd/brent-rambo-approves-of-these-gifs

Page 20:

New School Microservice API DoS

[Diagram: EDGE tier (Zuul proxies, Core API, website), Middle tier (Middle Tier Services), Backend tier (Backend Tier Services); the client receives a fallback or site error]

> python grizzly.py

POST /recommendations HTTP/1.1
…
{"recommendations": {"range": [0,10000]}}

Core API making many client requests

Middle tier services making many calls to backend services

Backend service queues filling up with expensive requests

Client timeouts, circuit breakers triggered, fallback experience triggered
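The amplification on this slide can be put in rough numbers. The sketch below assumes a simple multiplicative fan-out model; only the `[0,10000]` range comes from the slide, while the four middle tier services, the two backend calls per item, and the baseline range of 10 are made-up values for illustration.

```python
def backend_calls_per_request(range_size, middle_tier_services, backend_calls_per_item):
    """Backend RPCs caused by one edge request under the assumed fan-out model."""
    return range_size * middle_tier_services * backend_calls_per_item


# Hypothetical baseline: a normal client asks for 10 items.
normal = backend_calls_per_request(range_size=10, middle_tier_services=4,
                                   backend_calls_per_item=2)

# Same endpoint, with the range widened as in the slide's payload.
attack = backend_calls_per_request(range_size=10000, middle_tier_services=4,
                                   backend_calls_per_item=2)

print(normal, attack, attack // normal)  # 80 80000 1000
```

Under these assumptions a single request does the backend work of a thousand normal ones, while an edge rate limiter that counts requests still sees one request.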

Page 21:

Workflow for Identifying Application DoS - Part 1

Identify the most latent service calls

Investigate if latent calls allow for manipulation

Tune payload to fly under WAF/Rate Limiting

Test hypothesis

Scale your test using Cloudy Kraken (orchestrator) and Repulsive Grizzly (attack framework)

Page 23:

Identifying Latent Service Calls
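The slide image is not in this transcript, so the following is only one possible way to find latent calls: time each candidate call a few times and sort by median latency. `rank_by_latency` and `send_request` are hypothetical names; `send_request` stands for whatever function issues the call, and the clock is injectable so the sketch can be exercised without a network.

```python
import statistics
import time


def rank_by_latency(endpoints, send_request, samples=3, clock=time.perf_counter):
    """Return (endpoint, median_seconds) pairs, slowest first."""
    results = []
    for endpoint in endpoints:
        timings = []
        for _ in range(samples):
            start = clock()
            send_request(endpoint)           # caller-supplied request function
            timings.append(clock() - start)
        results.append((endpoint, statistics.median(timings)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

The slowest entries are the candidates to inspect for parameters that make them slower still.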

Page 25:

Microservice Application DoS: Attack Patterns

Range

Object Out per Object in

Request Size

All of the Above
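The slides illustrating each pattern are images that did not survive in this transcript, so the sketch below is one plausible reading and not the authors' payloads. Only the `recommendations` and `range` keys appear in the deck; `items`, `page_size`, `languages`, and `build_payload` are hypothetical names for illustration.

```python
import json


def build_payload(range_end=10, object_ids=(1,), page_size=20, languages=("en",)):
    """Build a request body where each argument scales one kind of work."""
    return json.dumps({
        "recommendations": {
            "range": [0, range_end],                    # Range: widen the window
            "items": [{"id": i} for i in object_ids],   # Object out per object in
            "page_size": page_size,                     # Request size (assumed meaning)
            "languages": list(languages),               # Extra dimension to multiply by
        }
    })


# "All of the above": raise several knobs in the same request.
combined = build_payload(range_end=10000, object_ids=range(3), languages=("en", "fr"))
print(combined)
```

Each knob multiplies the work of the others, which is why combining patterns is listed as its own category.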

Page 26:

Application DoS Technique: Range

Page 27:

Application DoS Technique: Object Out Per Object In

Page 28:

Application DoS Technique: Request Size

Page 29:

Application DoS Technique: All of the Above

<--What about N languages?

<--What about more object fields?

Page 30:

[Chart: Logical Work Per Request plotted against #Req, with regions labeled Service Healthy, Service Impact, Rate Limited, and Service Auto-Scaling/Healthy]

Page 31:

New School Application DoS Attack: Case Study

Page 32:

Making the call more expensive

Page 33:

Workflow for Identifying Application DoS - Part 2

Identify the most latent service calls

Investigate if latent calls allow for range, object out/object in, request size, or other manipulation

Tune payload to fly under WAF/Rate Limiting while causing the most application instability

Test hypothesis on a smaller scale using Repulsive Grizzly

Scale your test using Cloudy Kraken

Page 34:

Repulsive Grizzly

Skunkworks application DoS framework

Written in Python3

Eventlet for high concurrency

Uses AWS SNS for logging analysis

Easily configurable

Page 35:

Repulsive Grizzly: Command File

Page 36:

Repulsive Grizzly: Payload and Header Files

Provide payloads in any format you want

Headers are provided as a JSON key/value hash

Use $$AUTH$$ placeholder to tell grizzly where to place tokens
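A sketch of how a `$$AUTH$$` placeholder in a JSON header file could be filled in. This is not Repulsive Grizzly's own code: `render_headers`, the `Cookie` header, and the token value are illustrative, and the sketch assumes every header value is a string.

```python
import json

AUTH_PLACEHOLDER = "$$AUTH$$"


def render_headers(headers_json, token):
    """Parse a JSON key/value header hash and substitute the token for the placeholder."""
    headers = json.loads(headers_json)
    return {name: value.replace(AUTH_PLACEHOLDER, token)
            for name, value in headers.items()}


# Hypothetical header file contents.
template = '{"Content-Type": "application/json", "Cookie": "session=$$AUTH$$"}'
rendered = render_headers(template, "abc123")
print(rendered)
```

Keeping the token out of the header file lets the same file be reused with any number of sessions.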

Page 37:

Repulsive Grizzly: Bypass Rate Limiter with Sessions

http://nerdprint.com/wp-content/uploads/2014/09/Dumle-Mountain-Cookies-Advertising2.png
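Assuming the rate limiter counts requests per session, spreading requests round-robin over several sessions lowers the count each session accumulates. The sketch below shows only that arithmetic; `assign_sessions`, the session names, and the numbers are hypothetical and do not come from Repulsive Grizzly.

```python
from collections import Counter
from itertools import cycle


def assign_sessions(request_count, session_tokens):
    """Pair each request number with a session token, round-robin."""
    tokens = cycle(session_tokens)
    return [(n, next(tokens)) for n in range(request_count)]


# Hypothetical numbers: 300 requests spread over 10 sessions.
sessions = [f"session-{i}" for i in range(10)]
plan = assign_sessions(300, sessions)
per_session = Counter(token for _, token in plan)
print(max(per_session.values()))  # 30
```

A limiter keyed on the session alone sees 30 requests from each of ten clients, not 300 from one.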

Page 38:

Repulsive Grizzly: Single Node

Page 40:

Cloudy Kraken Overview

Page 41:

Update Config

● Push the latest configuration file and attack scripts to S3.

● Reset the DynamoDB state.

Build Environment

● Configure the VPCs in each region

Start up Attack Nodes

● Launch instances

Collect data

● Wait for data to come through SNS

Tear-Down

● Tear down and reset the environment in each region

Page 42:

Cloudy Kraken Configuration

S3 Bucket

DynamoDB Table

Zip File and Configuration

Reset State

Page 43:

Cloudy Kraken: Key AWS Deployment Building Blocks

Region => AWS Geographical Region

VPC => VLAN

ASG => Automatically starts identical nodes

AZ/Subnet => Localized nodes / Subnet

Launch Config => Initial configuration

[Diagram: a Region containing a VPC with two AZ/Subnets, and an Auto-Scaling Group whose Nodes are spread across them]

Page 44:

Cloudy Kraken Deployment Phase

[Diagram: Region A and Region B, each containing a VPC, a Security Group, and an Auto Scaling Group]

Page 45:

Cloudy Kraken Workers

Each worker node is a single EC2 instance

Each worker runs many threads

EC2 gives you access to Enhanced Networking Driver

Minimal overhead with launch config and ASG

Page 46:

Cloudy Kraken Execution Phase

On startup, each worker node runs a cloud-init script

Enables ssh access for monitoring and debugging

Downloads and runs main config script

Downloads ZIP file with attack script

Spins up attack worker

Waits for coordinated time to start
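The last step, waiting for a coordinated start, can be sketched as below. This assumes the start time is shared with every worker as a Unix epoch timestamp and that worker clocks are reasonably synchronized; neither detail is stated on the slide, and `wait_for_start` is a hypothetical name.

```python
import time


def wait_for_start(start_epoch, sleep=time.sleep, clock=time.time):
    """Sleep until the shared start time; return how long this worker waited."""
    delay = max(0.0, start_epoch - clock())
    if delay > 0:
        sleep(delay)   # workers that booted early idle here
    return delay       # 0.0 means the worker was late and starts immediately
```

Workers that boot at different times still begin sending within a narrow window, so the target sees the full load at once instead of a slow ramp.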

Page 47:

Cloudy Kraken Kill-Switch

Script to set the kill switch, and bring it all down
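On the worker side, a kill switch only helps if the send loop checks it. A minimal sketch follows, with `kill_switch_is_set` as a hypothetical callable standing in for wherever the switch is stored; the transcript does not say how Cloudy Kraken stores or polls it.

```python
def run_worker(send_request, kill_switch_is_set, max_requests, check_every=10):
    """Send requests until done or until the kill switch is observed; return the count sent."""
    sent = 0
    while sent < max_requests:
        # Poll the shared flag periodically, not on every request.
        if sent % check_every == 0 and kill_switch_is_set():
            break
        send_request()
        sent += 1
    return sent
```

The polling interval bounds how many extra requests go out after the switch is set, at the cost of more reads of the shared state.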

Page 48:

Cloudy Kraken Tear-Down

Terminates all the instances

Removes ASGs and Launch Configs

Removes VPC, Security group, and Instance Profiles

Page 49:

We scaled up, time to run the test!

Tested against prod

Multi-region and multi-agent

Conducted two 5 minute attacks

Monitored for success

Page 50:

Results of Test

80% Error Rate

Page 51:

$1.71

5 minute outage for a single AWS region

Page 52:

So What Failed?

Expensive API calls could be invoked with non-member cookies

Expensive traffic resulted in many RPCs per request

WAF/Rate Limiter was unable to monitor middle tier RPCs

Missing fallback experience when cache missed

Page 53:

Demo

● Test app

● Launching and scaling attack with Cloudy Kraken

Page 54:

Microservice Application DoS: Mitigations

Understand which microservices impact customer experience

Page 55:

Microservice Application DoS: Mitigations

Rate limiter (WAF) should monitor middle tier signals or cost of request*
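One way to act on "cost of request" is to charge each client for the downstream work a request triggers, not for the request itself. The sketch below is a fixed-window budget under that assumption; `CostBudget` is a hypothetical name, and the cost unit (for example, middle tier RPCs per request) is a choice the slide does not specify.

```python
class CostBudget:
    """Per-client budget charged by estimated request cost, not request count."""

    def __init__(self, budget_per_window):
        self.budget_per_window = budget_per_window
        self.spent = {}  # client id -> cost used in the current window

    def allow(self, client_id, cost):
        used = self.spent.get(client_id, 0)
        if used + cost > self.budget_per_window:
            return False                  # would exceed the budget: reject
        self.spent[client_id] = used + cost
        return True

    def reset_window(self):
        self.spent.clear()                # call once per time window


limiter = CostBudget(budget_per_window=1000)
cheap_ok = all(limiter.allow("client-a", cost=10) for _ in range(50))  # 500 units
expensive_ok = limiter.allow("client-a", cost=10000)  # one wide-range request
print(cheap_ok, expensive_ok)  # True False
```

Fifty cheap requests pass while a single expensive one is rejected, which a per-request counter would not distinguish.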

Page 56:

Microservice Application DoS: Mitigations

Middle tier services should provide context on abnormal behavior

Page 57:

Microservice Application DoS: Mitigations

Rate limiter (WAF) should monitor volume of cache misses*

Page 58:

Microservice Application DoS: Mitigations

Prioritize authenticated traffic over unauthenticated
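A sketch of one way to prioritize authenticated traffic: keep two lanes, serve the authenticated lane first, and shed unauthenticated requests first when the queue is full. `TwoLaneQueue` and its eviction policy are assumptions for illustration, not a design described in the talk.

```python
from collections import deque


class TwoLaneQueue:
    """Bounded request queue that favors authenticated traffic."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.authenticated = deque()
        self.unauthenticated = deque()

    def __len__(self):
        return len(self.authenticated) + len(self.unauthenticated)

    def offer(self, request, is_authenticated):
        if len(self) >= self.capacity:
            if not is_authenticated or not self.unauthenticated:
                return False            # full: shed this request
            self.unauthenticated.pop()  # drop the newest anonymous request to make room
        lane = self.authenticated if is_authenticated else self.unauthenticated
        lane.append(request)
        return True

    def take(self):
        # Authenticated requests are always served before anonymous ones.
        if self.authenticated:
            return self.authenticated.popleft()
        if self.unauthenticated:
            return self.unauthenticated.popleft()
        return None
```

Under overload, anonymous requests absorb the rejections while authenticated users keep being served.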

Page 59:

Microservice Application DoS: Mitigations

Configure reasonable client library timeouts

Page 60:

Microservice Application DoS: Mitigations

Trigger fallback experiences when cache or lookups fail
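A minimal sketch of a fallback path, assuming a precomputed generic response is available to serve. `recommendations_with_fallback`, `cache_get`, `lookup`, and `fallback_rows` are hypothetical names; a cache that is down is treated here as a reason to fall back immediately, which is one possible policy.

```python
def recommendations_with_fallback(user_id, cache_get, lookup, fallback_rows):
    """Return (rows, source); never raise because of a cache or backend failure."""
    try:
        cached = cache_get(user_id)
    except Exception:
        return fallback_rows, "fallback"   # cache tier unavailable
    if cached is not None:
        return cached, "cache"
    try:
        return lookup(user_id), "lookup"   # cache miss: ask the backing service
    except Exception:
        return fallback_rows, "fallback"   # lookup failed or timed out
```

The caller always gets something to render, so a failing dependency degrades the experience instead of producing a site error.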

Page 61:

Thanks!

https://github.com/netflix-skunkworks/repulsive-grizzly

https://github.com/netflix-skunkworks/cloudy-kraken

@helloarbit