LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) :...

32
Soyeon Park, Sangho Lee, Wen Xu, Hyungon Moon and Taesoo Kim LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY PROTECTION KEYS (INTEL MPK)

Transcript of LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) :...

Page 1: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

Soyeon Park, Sangho Lee, Wen Xu, Hyungon Moon and Taesoo Kim

LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY PROTECTION KEYS (INTEL MPK)

Page 2: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

INTRODUCTION

SECURITY CRITICAL MEMORY REGIONS NEED PROTECTION

▸ JIT page“To achieve code execution, we can simply locate one of these RWX JIT pages and overwrite it with our own shellcode.” - [1]

▸ Personal information ▸ Password ▸ Private key

“We confirmed that all individuals used only the Heartbleed exploit to obtain the private key.” - [2]

[1] Amy Burnett, et al. “Weaponization of a Javascriptcore vulnerability” RET2 Systems Engineering Blog [2] Nick Sullivan “The Results of the CloudFlare Challenge” CloudFlare Blog

Page 3: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

INTRODUCTION

EXAMPLE 1 - HEARTBLEED ATTACK

Private key

Heartbleed request

Leaked data including private key Web Server

Reply “HELLO” (1000 bytes)

1000 bytes

H E L L O · · ·

· · · Private key ·

· ·

Page 4: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

INTRODUCTION

EXAMPLE 1 : EXISTING SOLUTION TO PROTECT MEMORY▸ Process separation

Process

MEMORY

[1] Song, Chengyu, et al. "Exploiting and Protecting Dynamic Code Generation”, NDSS 2015. [2] Litton, James, et al. "Light-Weight Contexts: An OS Abstraction for Safety and Performance”, OSDI 2016.

Process

MEMORY

Page 5: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

INTRODUCTION

EXAMPLE 2 - EXISTING SOLUTION TO PROTECT JIT PAGE

Process

mprotect(W)

Write code

mprotect(RX)Code Cache

Execute WriteWrite

▸ JIT page W^X protection

Page 6: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

INTRODUCTION

PROBLEMS OF EXISTING SOLUTIONS

▸ Process Separation

▸ W^X Protection

High overhead to spawn new process and synch data

Race condition due to permission synchronization

Multiple cost to change permission of multiple pages

This talk: utilizing a hardware mechanism, Intel Memory Protection Key (MPK), to address these challenges

Page 7: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

OUTLINE

OUTLINE▸ Introduction

▸ Intel MPK Explained ▸ Challenges ▸ Design ▸ Implementation ▸ Evaluation ▸ Discussion ▸ Related Work ▸ Conclusion

Page 8: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

INTEL MPK EXPLAINED

OVERVIEW

▸ Support fast permission change for page groups with single instruction ▸ Fast single invocation ▸ Fast permission change for multiple pages

Kernel

mprotect Intel MPK

Userspace

Late

ncy

(ms)

0

4.5

9

13.5

18

Number of pages1000 6000 11000 16000 21000 26000 31000 36000

mprotect (contiguous)mprotect (sparse)

Page 9: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

INTEL MPK EXPLAINED

UNDERLINE IMPLEMENTATION

Kernel

pkey 2 <- R/W page 120 -> R/W

pkey 2 <- R page 120 -> R

page # pkey ··· perm.120 2 ··· R/W···

< Page table>

WRPKRU

RDPKRUpkey_mprotect

32-bit register

16 pkeys

R W R W

▸ Permissions per cpu ▸ 32-bit PKRU register contains keys/perm ▸ WRPKRU: write key/perm ▸ RDPKRU: read key/perm

Page 10: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

RWX

function init() pkey = pkey_alloc() pkey_mprotect(code_cache, len, RWX, pkey)

function JIT() WRPKRU(pkey, W) ... write code cache ... WRPKRU(pkey, R)

function fini() pkey_free(pkey)

INTEL MPK EXPLAINED

EXAMPLE - JIT PAGE W^X PROTECTION

Write code in code cache

Grant permission

Revoke permission

pkey = 1

CODE CACHERWX

R1

W

PKRU Register1

Page 11: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

INTEL MPK EXPLAINED

EXAMPLE : EXECUTABLE-ONLY MEMORY

function init() pkey = pkey_alloc() pkey_mprotect(code_cache, len, RWX, pkey)

function JIT() WRPKRU(pkey, W) ... write code cache ... WRPKRU(pkey, R)

function fini() pkey_free(pkey)

pkey

CODE CACHERWX

Page 12: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

INTEL MPK EXPLAINED

function init() pkey = pkey_alloc() pkey_mprotect(code_cache, len, RWX, pkey)

function JIT() WRPKRU(pkey, W) ... write code cache ... WRPKRU(pkey, None)

function fini() pkey_free(pkey)

CODE CACHERWX RWX

pkey

EXAMPLE : EXECUTABLE-ONLY MEMORY

Page 13: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

OUTLINE

OUTLINE▸ Introduction ▸ Intel MPK Explained

▸ Challenges ▸ Non-scalable Hardware Resource ▸ Asynchronous Permission Change

▸ Design ▸ Implementation ▸ Evaluation ▸ Discussion ▸ Related Work ▸ Conclusion

Page 14: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

CHALLENGES

NON-SCALABLE HARDWARE RESOURCE▸ Only 16 keys are provided

Process

Write code cache 1

W

R

1 2 3 4

5 … 16 17

Write code cache 16

W

R

Write code cache 17

W

R

pkey 1

pkey 2

pkey 3

pkey 4

pkey 5

pkey 16

pkey 1

pkey 16

pkey ?

Page 15: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

Process

CHALLENGES

ASYNCHRONOUS PERMISSION CHANGE - PROS▸ Permission change with MPK is per-thread intrinsically

Code Cache

Write Code Cache

W

R

RXRX

RX

RX

RX

Page 16: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

Process

CHALLENGES

ASYNCHRONOUS PERMISSION CHANGE - PROS

Code Cache

Write Code Cache

W

R

pkey

RXRX

RX

RX

W

Write

▸ Permission change with MPK is per-thread intrinsically

Page 17: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

CHALLENGES

ASYNCHRONOUS PERMISSION CHANGE - CONS▸ Permission synchronization is necessary in some context

Process

Code Cache

Write Code Cache

W

None pkey

RXRX

RX

RX

X

Page 18: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

CHALLENGES

Process

Code Cache

Write Code Cache

W

None pkey

RXRX

RX

RX

X

Read

ASYNCHRONOUS PERMISSION CHANGE - CONS▸ Permission synchronization is necessary in some context

Page 19: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

DESIGN

REVISIT : CHALLENGES

▸ Non-scalable Hardware Resources

▸ Asynchronous Permission Change

Key virtualization solve by key indirection.

libmpk provide permission synchronization API

Page 20: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

Library

Application

DESIGN

KEY VIRTUALIZATION

▸ Decoupling physical keys from user interface ▸ Key indirection working like cache

Write code

W

R

pkey 1 pkey 16 pkey ?vkey 1 vkey 16 vkey 17

pkey 1 pkey 16

Write code

W

R

Write code

W

R

😱

😊

Evicted

Page 21: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

➊ call mpk_mprotect()

pkey_sync

DESIGN

INTER-THREAD PERMISSION SYNCHRONIZATION

Userspace

Kernel

THREAD A STATE : RUNNING

THREAD B STATE : RUNNING

➍ return➌ interrupt

SLEEP

➋ add hooks

task_work

WRPKRU

➎ update PKRU (rescheduled)

RUNNING

RX RXX X

Page 22: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

IMPLEMENTATION

IMPLEMENTATION▸ libmpk is written in C/C++ ▸ Userspace library : 663 LoC ▸ Kernel support : 1K LoC ▸ Permission Synchronization ▸ Kernel module for managing metadata ▸ Userspace cannot fabricate metadata

‣ We open source at https://github.com/sslab-gatech/libmpk

Page 23: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

function init() vkey = libmpk_mmap(&code_cache, len, RWX)

function JIT() libmpk_begin(vkey, W) ... write code cache ... libmpk_end(vkey) libmpk_mprotect(vkey, X)

CODE CACHERWX

IMPLEMENTATION

USE CASE - JIT PAGE W^X PROTECTION

RWX

XX

XX

Key virtualization

Permission synchronization

Page 24: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

OUTLINE

OUTLINE▸ Introduction ▸ Intel MPK Explained ▸ Challenges ▸ Design ▸ Implementation ▸ Evaluation ▸ Usability ▸ Checking overhead occurred by design ▸ Use cases - applying for memory isolation and protection

▸ Discussion ▸ Related Work ▸ Conclusion

Page 25: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

EVALUATION

LIBMPK IS EASY TO ADOPT

▸ OpenSSL (83 LoC) : protecting private key ▸ Memcached (117 LoC) : protecting slabs ▸ Chakracore (10 LoC) : protecting JIT pages

Page 26: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

EVALUATION

LATENCY - KEY VIRTUALIZATION

▸ Cache miss costs overhead due to eviction

0.0

0.8

1.5

2.3

3.0

0 25 50 75 100Hit rate

Tim

e (μ

s)

HitMiss

mprotect

Reasonable overhead while providing similar functionality.

Page 27: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

EVALUATION

LATENCY - INTER-THREAD PERMISSION SYNCHRONIZATION

▸ Performance ▸ 1,000 pages : 3.8x ▸ Single page : 1.7x

Late

ncy (μs

)

0

10

20

30

40

Number of threads

1 5 10 15 20 25 30 35 40

mpk_mprotectmprotect (4KB)mprotect (4000KB)

libmpk outperform mprotect regardless of the number of pages.

Page 28: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

EVALUATION

FAST MEMORY ISOLATION - OPENSSL & MEMCACHED▸ OpenSSL ▸ request/sec: 0.53%

slowdown

requ

est/

sec

0

375

750

1125

1500

Size of each request (KB)1 2 4 8 16 32 64 128 256 512 1024

original libmpk

Kbyt

e/se

c

0

125

250

375

500

Number of connections

250 500 750 1000

original mpk_inthreadmpk_synch mprotect

▸ For 1GB protection : ▸ original vs mpk_inthread :

0.01% ▸ mpk_synch vs mprotect :

8.1x

Page 29: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

EVALUATION

FAST AND SECURE W⊕X - JIT COMPILATION▸ Chakracore ▸ mprotect-based protection ▸ Allows race-condition attack

▸ 4.39% performance improvement (31.11% at most)

Norm

alize

d Sc

ore

0.70

0.85

1.00

1.15

1.30

RICH

ARDS

DELT

ABLU

ECR

YPTO

RAYT

RACE

EARL

EYBO

YER

REGE

XP

SPLA

YSP

LAYL

ATEN

CYNA

VIER

STOK

ESPD

FJS

MANDR

EEL

MANDR

EELL

ATEN

CYGA

MEBOY

CODE

LOAD

BOX2

D

ZLIB

TYPE

SCRI

PT

mprotect libmpk

Page 30: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

DISCUSSION

DISCUSSION▸ Rogue data cache load (Meltdown) ▸ MPK is also affected by the Meltdown attack ▸ Hardware or software-level mitigation

▸ Code reuse attack ▸ Arbitrary executed WRPKRU may break the security ▸ Applying sandboxing or control-flow integrity

▸ Protection key use-after-free ▸ pkey_free does not perfectly free the protection key ▸ Pages are still associated with the pkey after free

Page 31: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

RELATED WORK

RELATED WORK▸ ERIM [1] : Secure wrapper of MPK ▸ Shadow Stack [2] : Shadow stack protected by MPK ▸ XOM-Switch [3] : Code-reuse attack prevention with

execute-only memory supported by MPK

[1] Anjo Vahldiek-Oberwagner, et al. “ERIM: Secure, Efficient In-Process Isolation with Memory Protection Keys”, Security 2019 [2] Nathan Burow, et al. “Shining Light on Shadow Stacks”, Oakland 2019 [3] Mingwei Zhang, et al. “XOM-Switch: Hiding Your Code From Advanced Code Reuse Attacks in One Shot”, Black Hat Asia 2018

Page 32: LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY …libmpk-slides.pdf · Chakracore (10 LoC) : protecting JIT pages . EVALUATION LATENCY - KEY VIRTUALIZATION Cache miss costs overhead

CONCLUSION

CONCLUSION▸ libmpk is a secure, scalable, and

synchronizable abstraction of MPK for supporting fast memory protection and isolation with little effort.

THANKS!https://github.com/sslab-gatech/libmpk