
Flexible Hardware Acceleration for Instruction-Grain Program Monitoring

Joint work with

Michael Kozuch¹, Theodoros Strigkos², Babak Falsafi³, Phillip B. Gibbons¹, Todd C. Mowry¹,², Vijaya Ramachandran⁴,

Olatunji Ruwase², Michael Ryan¹, Evangelos Vlachos²

Shimin Chen

¹Intel Research Pittsburgh ²CMU ³EPFL ⁴UT Austin


Flexible Hardware Acceleration for Instruction-Grain Program Monitoring, Shimin Chen, 2

Instruction-Grain Monitoring

• Software often contains bugs
  – Memory corruptions, data races, …, crashes
  – Security attacks are often designed to exploit bugs

• Instruction-grain lifeguards can help
  – Dynamic monitoring: during application execution
  – Instruction-grain: e.g., memory access, data flow

• Enables a wide range of powerful lifeguards

[Figure: an application monitored by a lifeguard]


Example Instruction-Grain Lifeguards

• AddrCheck [Nethercote’04]
  – Monitor malloc/free and memory accesses
  – Check that all memory accesses visit allocated memory regions

• MemCheck [Nethercote & Seward ’03 ’07]: AddrCheck + check for uninitialized values
  – Copying partially uninitialized structures is not an error
  – Lazy error detection avoids many false positives
  – Track propagation of uninitialized values

• TaintCheck [Newsome & Song’05]: detect overwrite-based security exploits
  – Tainted data: data from network or disk
  – Track propagation of tainted data to detect violations

• LockSet [Savage et al.’97]: detect data races in parallel programs
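To make the AddrCheck description concrete, here is a minimal sketch under loudly stated assumptions: a toy 64 KiB application address space, one shadow byte per application byte, and hypothetical handler names (`addrcheck_on_malloc`, `addrcheck_on_free`, `addrcheck_on_access`). A real lifeguard would record each block's size at malloc time instead of receiving it on free; the size parameter there is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy address space: 64 KiB of application memory,
   shadowed by one "allocated" byte per application byte. */
#define APP_MEM_SIZE (1u << 16)

static uint8_t allocated[APP_MEM_SIZE];

static bool in_range(uint32_t addr, uint32_t size) {
    return addr < APP_MEM_SIZE && size <= APP_MEM_SIZE - addr;
}

/* Rare event: malloc(size) returned addr. */
void addrcheck_on_malloc(uint32_t addr, uint32_t size) {
    if (in_range(addr, size))
        memset(&allocated[addr], 1, size);
}

/* Rare event: free(addr). The size is passed in only to keep the
   sketch short; a real lifeguard would look it up. */
void addrcheck_on_free(uint32_t addr, uint32_t size) {
    if (in_range(addr, size))
        memset(&allocated[addr], 0, size);
}

/* Frequent event: a load/store of size bytes at addr.
   Returns true if every byte touched lies in an allocated region;
   false means the lifeguard would report an error. */
bool addrcheck_on_access(uint32_t addr, uint32_t size) {
    if (!in_range(addr, size))
        return false;
    for (uint32_t i = 0; i < size; i++)
        if (!allocated[addr + i])
            return false;
    return true;
}
```

For example, after `addrcheck_on_malloc(0x100, 16)`, a 4-byte access at 0x10c passes, but one at 0x10e fails because it spills past the end of the block.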


Design Space of Support Platform

[Figure: support platforms plotted by performance (poor to good) against generality (specific lifeguard to general purpose: wide range of lifeguards)]

• Dynamic binary instrumentation (DBI): 10-100X slowdowns [Bruening’04] [Luk et al’05] [Nethercote’04]

• General-purpose HW improving DBI: 3-8X slowdowns [Chen et al’06] [Corliss’03]

• Lifeguard-specific hardware [Crandall & Chong’04], [Dalton et al’07], [Shetty et al’06], [Shi et al’06], [Suh et al’04], [Venkataramani’07], [Venkataramani’08], [Zhou et al’07]

• This paper


Outline

• Introduction

• Background

• Three Hardware Acceleration Techniques

• Experimental Evaluation

• Conclusion


Example Lifeguard: TaintCheck [Newsome & Song’05]

Purpose: detect overwrite-based security exploits
  – Metadata kept for application memory and registers
  – Tainted data: data from network or disk
  – Track taint propagation
  – Detect violation: e.g., tainted jump target address

Application            TaintCheck Lifeguard
mov %eax ← A           taint(%eax) = taint(A)
mov B ← %eax           taint(B) = taint(%eax)
add %ebx ← D           taint(%ebx) |= taint(D)
jmp *(F)               if (taint(F) == 1) error;

Detect the exploit before the attack code takes control.
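The propagation rules in the table above can be sketched as four event handlers. This is a simplified model, not the lifeguard's actual code: it assumes one taint byte per register and per modeled memory location (rather than per application byte), a small hypothetical register enum, and hypothetical handler names, and it does no bounds checking.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: each register and each modeled memory location carries
   one taint byte (1 = tainted). Sizes and register set are illustrative. */
enum reg { EAX, EBX, ECX, EDX, NUM_REGS };
#define NUM_LOCS 1024

static uint8_t reg_taint[NUM_REGS];
static uint8_t mem_taint[NUM_LOCS];

/* Data read from the network or disk becomes tainted. */
void taint_source(uint32_t loc) { mem_taint[loc] = 1; }

/* mov %r <- loc :  taint(%r) = taint(loc) */
void on_mem_to_reg(enum reg r, uint32_t loc) { reg_taint[r] = mem_taint[loc]; }

/* mov loc <- %r :  taint(loc) = taint(%r) */
void on_reg_to_mem(uint32_t loc, enum reg r) { mem_taint[loc] = reg_taint[r]; }

/* add %r <- loc :  taint(%r) |= taint(loc) */
void on_reg_op_mem(enum reg r, uint32_t loc) { reg_taint[r] |= mem_taint[loc]; }

/* jmp *(loc) : a tainted jump target is a violation; true means error. */
bool on_indirect_jump(uint32_t loc) { return mem_taint[loc] != 0; }
```

With A tainted as input, `on_mem_to_reg(EAX, A)` followed by `on_reg_to_mem(B, EAX)` marks B tainted, so a later `jmp *(B)` would be flagged before control transfers.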


TaintCheck w/ Detailed Tracking

• TaintCheck:
  – Detect violation
  – 1 taint bit / application byte

• TaintCheck w/ detailed tracking:
  – Construct taint propagation trail
  – More detailed metadata per application location:
     • PC of the instruction that tainted this location
     • “Tainted from” address

• Not supported by previous lifeguard-specific HW

[Figure: taint propagation trail from Input to Violation] [Newsome & Song’05]
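As an illustration of why detailed tracking needs richer metadata, the sketch below stores a PC and a "tainted from" location per location and walks those links back from a violation to the input. It is a hypothetical model: registers and memory share one small location space, only plain copies are handled, and overwriting a location that other trails pass through is not dealt with.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical detailed-tracking metadata: instead of 1 taint bit, each
   location records which instruction tainted it and where the taint came
   from. Registers and memory share one toy location space here. */
#define NUM_LOCS 256
#define NO_SOURCE (-1)

struct taint_meta {
    bool     tainted;
    uint32_t pc;      /* PC of the instruction that tainted this location */
    int32_t  from;    /* "tainted from" location; NO_SOURCE for raw input */
};

static struct taint_meta meta[NUM_LOCS];

/* Input from network/disk, read by the instruction at pc, taints loc. */
void on_input(int32_t loc, uint32_t pc) {
    meta[loc] = (struct taint_meta){ true, pc, NO_SOURCE };
}

/* Data movement dst <- src performed by the instruction at pc. */
void on_copy(int32_t dst, int32_t src, uint32_t pc) {
    if (meta[src].tainted)
        meta[dst] = (struct taint_meta){ true, pc, src };
    else
        meta[dst] = (struct taint_meta){ false, 0, NO_SOURCE };
}

/* After a violation at loc, follow "tainted from" links back toward the
   input, storing each PC. Returns the trail length (0 if untainted). */
int taint_trail(int32_t loc, uint32_t *pcs, int max_len) {
    int n = 0;
    while (loc != NO_SOURCE && meta[loc].tainted && n < max_len) {
        pcs[n++] = meta[loc].pc;
        loc = meta[loc].from;
    }
    return n;
}
```

Each location now carries a PC and a source link, not just one bit. That is the kind of metadata fixed-function taint hardware does not maintain.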


Instruction-Grain Lifeguard Metadata Characteristics

• Organization varies
  – Per application byte/word
  – Size, format, and semantics vary greatly

• Frequently updated
  – e.g., propagation tracking

• Frequently checked
  – e.g., memory accesses


Lifeguard Support

[Figure: events from the unmodified application pass through event capture and delivery to the software lifeguard’s event handlers, which update and check metadata]

• Events
  – Rare: e.g., malloc/free, system calls
  – Frequent: e.g., memory access, data movement

• Event capture and delivery: general-purpose HW improving DBI

• Performance bottlenecks: (1) metadata mapping, (2) metadata updates, (3) metadata checks


Our Contributions

[Figure: the same lifeguard-support diagram, with M-TLB, IT, and IF added to the lifeguard’s metadata paths]

• Metadata-TLB for metadata mapping

• Inheritance Tracking for metadata updates

• Idempotent Filters for metadata checks


Outline

• Introduction

• Background

• Three Hardware Acceleration Techniques
  – Metadata-TLB
  – Inheritance Tracking
  – Idempotent Filters

• Experimental Evaluation

• Conclusion


Metadata-TLB: Motivation

• Metadata per app byte/word
  – Element size may vary

• Two-level structure:
  – Robustness & space efficiency

• Mapping: application address → metadata address
  – Frequently used in almost every handler
  – Can be very costly

[Figure: a level-1 index pointing to level-2 metadata chunks]
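A two-level mapping like the one pictured can be sketched in a few lines. The sketch assumes 32-bit application addresses and one metadata byte per 4-byte word, which matches the address arithmetic in the TaintCheck handler shown next (level-1 index `addr >> 16`, level-2 index `(addr & 0xffff) >> 2`). The lazy allocation and the name `metadata_addr` are illustrative assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: 32-bit application addresses, 1 metadata byte per
   4-byte application word. */
#define L1_ENTRIES (1u << 16)          /* one entry per 64 KiB region */
#define L2_ENTRIES ((1u << 16) >> 2)   /* one byte per 4-byte word    */

typedef uint8_t UChar;

/* Level-1 index: NULL until the region's level-2 chunk is first needed. */
static UChar *level1_index[L1_ENTRIES];

/* Map an application address to the address of its metadata byte.
   Chunks are allocated lazily and zeroed (0 = "clean"), so space grows
   with the memory the application actually touches. NULL on OOM. */
UChar *metadata_addr(uint32_t app_addr) {
    UChar **slot = &level1_index[app_addr >> 16];
    if (*slot == NULL) {
        *slot = calloc(L2_ENTRIES, sizeof(UChar));
        if (*slot == NULL)
            return NULL;
    }
    return *slot + ((app_addr & 0xffffu) >> 2);
}
```

Even without the allocation branch, every lookup needs a shift, an index load, a mask, and another shift before the metadata itself can be read. That is the per-handler cost the next slide measures.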


Example (TaintCheck)

void dest_reg_op_mem_4B(UINT32 src_addr /* %eax */, UINT32 dest_reg /* %edx */)
// app instruction type: dest_reg ← dest_reg op mem(src_addr)
// handler operation:    reg_taint(dest_reg) |= mem_taint(src_addr)
{
    map *mp = level1_index[src_addr >> 16];    // mov    %eax, %ecx
                                               // shr    $16, %ecx
                                               // mov    level1_index(,%ecx,4), %ecx
    int idx = (src_addr & 0xffff) >> 2;        // and    $0xffff, %eax
                                               // shr    $2, %eax
    UChar mem_taint = mp[idx];                 // movzbl (%ecx,%eax,1), %eax
    reg_taint[dest_reg] |= mem_taint;          // or     %al, reg_taint(%edx)
    nlba();                                    // nlba
}

Metadata mapping takes 5 of the 8 instructions!


Our Solution: Metadata-TLB

• A TLB-like HW associative lookup table

• LMA (Load Metadata Address) instruction:
  – Application address → lifeguard metadata address

• Managed by (user-mode) lifeguard software
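The M-TLB is described here only as an associative table queried by LMA and filled by user-mode lifeguard software. The following is a software model of that behavior, not the paper's hardware interface. The entry count, the tag (one 64 KiB application region, matching the two-level layout), round-robin replacement, and the names `mtlb_lma` and `mtlb_fill` are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of a Metadata-TLB. Assumed: 8 fully-associative entries,
   each caching the translation from one 64 KiB application region
   (tag = addr >> 16) to its level-2 metadata chunk; round-robin refill. */
#define MTLB_ENTRIES 8

struct mtlb_entry {
    bool     valid;
    uint32_t tag;     /* application address >> 16          */
    uint8_t *chunk;   /* base of the level-2 metadata chunk */
};

struct mtlb {
    struct mtlb_entry e[MTLB_ENTRIES];
    unsigned next_victim;
};

/* LMA-like lookup: on a hit, *meta gets chunk + word offset
   (1 metadata byte per 4-byte word) and true is returned. On a miss,
   false is returned and lifeguard software must call mtlb_fill(). */
bool mtlb_lma(const struct mtlb *t, uint32_t app_addr, uint8_t **meta) {
    uint32_t tag = app_addr >> 16;
    for (int i = 0; i < MTLB_ENTRIES; i++) {
        if (t->e[i].valid && t->e[i].tag == tag) {
            *meta = t->e[i].chunk + ((app_addr & 0xffffu) >> 2);
            return true;
        }
    }
    return false;
}

/* Miss handling done by (user-mode) lifeguard software. */
void mtlb_fill(struct mtlb *t, uint32_t app_addr, uint8_t *chunk) {
    struct mtlb_entry *v = &t->e[t->next_victim];
    t->next_victim = (t->next_victim + 1) % MTLB_ENTRIES;
    v->valid = true;
    v->tag = app_addr >> 16;
    v->chunk = chunk;
}
```

On a hit, the whole level-1/level-2 walk becomes one lookup. Keeping misses in software leaves the metadata layout under the lifeguard's control.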


Example (TaintCheck) w/ M-TLB

void dest_reg_op_mem_4B(UINT32 src_addr /* %eax */, UINT32 dest_reg /* %edx */)
// app instruction type: dest_reg ← dest_reg op mem(src_addr)
// handler operation:    reg_taint(dest_reg) |= mem_taint(src_addr)
{
    UChar *p = LMA_macro(src_addr);            // LMA    %eax, %ecx
    UChar mem_taint = *p;                      // mov    (%ecx), %al
    reg_taint[dest_reg] |= mem_taint;          // or     %al, reg_taint(%edx)
    nlba();                                    // nlba
}

One LMA instruction replaces the five mapping instructions, cutting the handler from 8 to 4 instructions!


Inheritance Tracking: Motivation

• Propagation tracking is expensive
  – Metadata updates for almost every app instruction

• Previous hardware solutions track propagation
  – Automatically update metadata in hardware
  – Problem: only support simple metadata semantics
     • e.g., do not support TaintCheck w/ detailed tracking

• Our goal: flexibility AND performance

• Idea: inheritance structure is common, so let’s track inheritance in hardware!

[Figure: propagation trails from Input to Violation]


Problem with General Inheritance Tracking

Application        Propagation Tracking         Inheritance Tracking
mov %eax ← A       taint(%eax) = taint(A)       %eax inherits from A
mov B ← %eax       taint(B) = taint(%eax)       B inherits from %eax
add %ebx ← D       taint(%ebx) |= taint(D)      insert D into %ebx’s inherit-from list

Problem: state explosion for binary operations!


Unary Inheritance Tracking

• Many lifeguards can take advantage of unary IT:
  – MemCheck
  – TaintCheck

• Large performance improvements if used
  – Can be disabled if unary IT does not match the lifeguard

[Figure: handling of operations under unary IT; diagram labels “check”, “check”, “known”]


Tracking Register Inheritance

[Figure: an original event is looked up in the IT table for registers; a state transition determines the event to deliver, and a transformed event is delivered; labels IT(%rs), IT(%rd)]

Transformed event

More details in the paper:

• IT table and state transition table details

• Conflict detection


Example

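Application:
  mov %eax ← A
  mov B ← %eax
  mov %ebx ← C
  add %ebx ← D
  mov E ← %ebx

[Figure: metadata-update events before and after inheritance tracking. Before: mem_to_reg, reg_to_mem for the first two instructions; mem_to_reg, dest_reg_op_mem, reg_to_mem for the last three. With inheritance tracking, the first pair becomes a single mem_to_mem event; an imm_to_mem event also appears.]

Can significantly reduce metadata update events!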
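The register-inheritance idea from the last two slides can be sketched as a small state machine. This is a deliberately simplified model, not the paper's IT table or state-transition table. Each register either holds metadata the lifeguard already knows or inherits unchanged from one memory location; binary operations force the inheritance to be delivered; and, as a simple form of conflict detection, a register is materialized before the memory it inherits from is overwritten. All names are hypothetical.

```c
#include <stdint.h>

/* Simplified unary inheritance tracking for registers (illustrative). */
enum ev_kind { MEM_TO_REG, REG_TO_MEM, MEM_TO_MEM, DEST_REG_OP_MEM };
struct event { enum ev_kind kind; uint32_t dst, src; };

#define NUM_REGS 8
#define MAX_EVENTS 64

/* inherits != 0: register's metadata equals that of memory location from */
static struct { int inherits; uint32_t from; } it[NUM_REGS];
static struct event out[MAX_EVENTS];   /* events delivered to the lifeguard */
static int n_out;

static void deliver(enum ev_kind k, uint32_t dst, uint32_t src) {
    if (n_out < MAX_EVENTS)
        out[n_out++] = (struct event){ k, dst, src };
}

/* Tell the lifeguard about a pending inheritance: reg r <- memory from. */
static void materialize(uint32_t r) {
    if (it[r].inherits) {
        deliver(MEM_TO_REG, r, it[r].from);
        it[r].inherits = 0;
    }
}

/* Conflict: memory location a is about to be overwritten. */
static void resolve_conflicts(uint32_t a) {
    for (uint32_t r = 0; r < NUM_REGS; r++)
        if (it[r].inherits && it[r].from == a)
            materialize(r);
}

/* mov %r <- A : record the inheritance, deliver nothing. */
void it_mem_to_reg(uint32_t r, uint32_t a) {
    it[r].inherits = 1;
    it[r].from = a;
}

/* mov B <- %r : deliver B <- A directly if %r still inherits from A. */
void it_reg_to_mem(uint32_t b, uint32_t r) {
    resolve_conflicts(b);
    if (it[r].inherits)
        deliver(MEM_TO_MEM, b, it[r].from);
    else
        deliver(REG_TO_MEM, b, r);
}

/* add %r <- D : binary, so unary IT cannot absorb it. */
void it_reg_op_mem(uint32_t r, uint32_t d) {
    materialize(r);
    deliver(DEST_REG_OP_MEM, r, d);
}
```

Run on the slide's five instructions (with %eax = 0, %ebx = 1, A..E = 100..116), this model delivers 4 events instead of 5. The first pair collapses into one mem_to_mem event, as on the slide. The event stream for the last three instructions need not match the slide, whose transformed stream comes from the paper's fuller design.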


Idempotent Filters: Idea

• Typically, metadata checks give the same result if
  – Event parameters are the same, and
  – Metadata are the same

• Idea: filter out idempotent (redundant) events

• For example:
  – AddrCheck:
     • After checking that a memory location is allocated,
     • subsequent loads/stores to the same location are safe
     • until the next free() event
  – LockSet (surprisingly):
     • In between synchronization events (e.g., lock/unlock),
     • check only the first load to a location
     • and the first store to a location
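An idempotent filter can be modeled as a small set-associative table of check events already delivered. The sketch assumes 16 sets x 4 ways, a key of (check type, address), and round-robin replacement. It flushes the whole table when metadata may change (free() for AddrCheck, synchronization events for LockSet). The geometry, names, and whole-table flush are illustrative choices, not the paper's design.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative idempotent filter: 16 sets x 4 ways of recent check events. */
#define IF_SETS 16
#define IF_WAYS 4

enum check_type { CHECK_LOAD, CHECK_STORE };

struct if_entry { bool valid; enum check_type type; uint32_t addr; };

struct idem_filter {
    struct if_entry set[IF_SETS][IF_WAYS];
    unsigned victim[IF_SETS];   /* round-robin replacement pointer per set */
};

/* Returns true if an identical check was already delivered (so this one
   is redundant and can be dropped); otherwise records it and returns
   false so the check reaches the lifeguard. */
bool if_filter(struct idem_filter *f, enum check_type type, uint32_t addr) {
    unsigned s = (addr >> 2) % IF_SETS;          /* index by word address */
    for (int w = 0; w < IF_WAYS; w++) {
        const struct if_entry *e = &f->set[s][w];
        if (e->valid && e->type == type && e->addr == addr)
            return true;
    }
    struct if_entry *v = &f->set[s][f->victim[s]];
    f->victim[s] = (f->victim[s] + 1) % IF_WAYS;
    *v = (struct if_entry){ true, type, addr };
    return false;
}

/* Metadata may have changed: AddrCheck would call this on free(),
   LockSet on lock/unlock and other synchronization events. */
void if_invalidate_all(struct idem_filter *f) {
    memset(f, 0, sizeof *f);
}
```

Keying on the check type gives LockSet its "first load" and "first store" per location between synchronization events. The choice of 4 ways echoes the later PIN result that 4-way filters work as well as fully-associative ones.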


Outline

• Introduction

• Background

• Three Hardware Acceleration Techniques

• Experimental Evaluation
  – Log-Based Architectures (LBA)
  – Simulation Study (w/ reduced input sets)
  – PIN-based Analysis (w/ full inputs)

• Conclusion


Log-Based Architectures

[Figure: the lifeguard-support diagram, with event capture and delivery provided by a Log-Based Architecture (LBA)]


Idea: Exploiting Chip Multiprocessors

[Figure: a chip multiprocessor with many cores (P), highlighting the LBA components]


Simulation Setup: Dual-Core LBA System

[Figure: dual-core LBA system. Core 1 runs the application: capture, compress. Log transport (e.g., L2 cache). Core 2 runs the lifeguard: decompress, dispatch, IT & IF, M-TLB.]

• Operating system: Fedora Core 5
• Application and lifeguard are processes
• Application is stalled when the log buffer is full
• Model a 2-level cache hierarchy
• Extend Virtutech Simics
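The rule that the application stalls when the log buffer is full is ordinary bounded-buffer behavior. Below is a single-threaded toy model with fixed-size records and hypothetical names. The real LBA log is compressed and carried through the cache hierarchy, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the log between the application core and the lifeguard
   core: fixed-size records, one producer and one consumer, no compression. */
#define LOG_CAPACITY 4

struct log_record { uint32_t pc, addr; };

struct log_buffer {
    struct log_record rec[LOG_CAPACITY];
    unsigned head;    /* next record to dispatch */
    unsigned tail;    /* next free slot          */
    unsigned count;   /* records currently held  */
};

/* Capture side: false means the buffer is full and the application
   must stall until the lifeguard drains some records. */
bool log_append(struct log_buffer *b, struct log_record r) {
    if (b->count == LOG_CAPACITY)
        return false;
    b->rec[b->tail] = r;
    b->tail = (b->tail + 1) % LOG_CAPACITY;
    b->count++;
    return true;
}

/* Dispatch side: false means there is nothing to deliver yet. */
bool log_consume(struct log_buffer *b, struct log_record *out) {
    if (b->count == 0)
        return false;
    *out = b->rec[b->head];
    b->head = (b->head + 1) % LOG_CAPACITY;
    b->count--;
    return true;
}
```

A slow lifeguard therefore slows the application only through back-pressure, once the log fills. Until then the two cores run in parallel.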


Overall Performance: TaintCheck

[Figure: TaintCheck slowdowns for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and their average (Avg), comparing LBA baseline with LBA optimized; y-axis: slowdowns (0 to 5); annotation: 1.36X]

$\text{Slowdown} = \dfrac{\text{application execution time w/ lifeguard}}{\text{application execution time w/o lifeguard}}$
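Slowdown and overhead are easy to confuse. Overhead is the extra time relative to the unmonitored run, so it equals slowdown minus one. A 1.36X slowdown is a 36% overhead, and the conclusion's 2-51% overheads correspond to slowdowns of 1.02X to 1.51X. The helper names below are illustrative.

```c
/* Slowdown: execution time with the lifeguard divided by execution
   time without it (both in the same units). */
double slowdown(double t_with_lifeguard, double t_without_lifeguard) {
    return t_with_lifeguard / t_without_lifeguard;
}

/* Overhead in percent: extra time relative to the unmonitored run. */
double overhead_percent(double slowdown_x) {
    return (slowdown_x - 1.0) * 100.0;
}
```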


Applying Our Techniques One by One

[Figure: average slowdowns as the techniques are applied one by one. AddrCheck: BASE, MTLB, MTLB+IF. TaintCheck: BASE, MTLB, MTLB+IT, MTLB+IT+IF. TaintCheck w/ detailed tracking: BASE, MTLB, MTLB+IT. MemCheck: BASE, MTLB, MTLB+IT. LockSet: BASE, MTLB, MTLB+IF. y-axis: average slowdowns (0 to 10).]

• IT, IF, and M-TLB are indeed complementary

• Achieve dramatically better performance


PIN-Based Analysis: IT

[Figure: reduced update events (%) under IT for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr; y-axis: 0 to 100]

• IT removes 35.8% to 82.0% of the propagation events


PIN-Based Analysis: IF

[Figure: two panels, AddrCheck and LockSet; x-axis: number of filter entries (8, 16, 32, 64, 128, 256); y-axis: reduced check events (%), 0 to 80; series: fully-assoc, 16-way, 8-way, 4-way, 2-way, 1-way]

• IF can effectively reduce check events

• A 4-way set-associative filter works as well as a fully-associative one


Conclusion

• Our focus: Instruction-Grain Lifeguards

• Three complementary hardware techniques:
  – Metadata-TLB (M-TLB)
  – Inheritance Tracking (IT)
  – Idempotent Filters (IF)

• Flexible to support a wide range of lifeguards
  – Reducing overheads by 2-3X in our experiments
  – Achieving 2-51% overheads for all but MemCheck


Thank you!


People Working on LBA Project

Intel Research:
• Shimin Chen
• Phillip B. Gibbons
• Mike Kozuch
• Michael Ryan

University Faculty:
• Babak Falsafi (EPFL)
• Todd C. Mowry (CMU)
• Vijaya Ramachandran (UT Austin)

CMU Students:
• Michelle Goodstein
• Olatunji Ruwase
• Theodoros Strigkos
• Evangelos Vlachos

Previous Contributors:
• Limor Fix (IRP)
• Steve Schlosser (IRP)
• Anastasia Ailamaki (CMU)
• Greg Ganger (CMU)
• Bin Lin (Northwestern)
• Radu Teodorescu (UIUC)