AutoSSD: an Autonomic SSD Architecture
Bryan S. Kim (Seoul National University, Korea)
Hyun Suk Yang (Hongik University, Korea)
Sang Lyul Min (Seoul National University, Korea)
Flash memory ubiquity
[Figure: NAND flash memory inside flash-based storage, and the devices and applications that use it. Images from various sources via Google search.]
Performance issues with SSDs
• http://www.zdnet.com/article/why-ssds-dont-perform/
• "The Tail at Scale," Communications of the ACM, vol. 56, no. 2
• https://storagemojo.com/2015/06/03/why-its-hard-to-meet-slas-with-ssds/
• http://appleinsider.com/articles/11/07/25/performance_variation_found_in_ssds_shipping_with_new_macbook_airs
Performance issues with SSDs
[Figure: graph from the SNIA Solid State Storage Performance Test Specification.]
All because of garbage collection?
Dean et al., CACM 2013: "[GC] … can increase read latency by a factor of 100 …"
Kim et al., USENIX FAST 2015: "… garbage collection is the source of this problem …"
Kang et al., ACM TECS 2017: "GC induces a long-latency problem …"
Yan et al., USENIX FAST 2017: "The core problem … is the well-known and notorious garbage collection."
FTL tasks as a necessary evil
[Figure: flash memory quirks and the FTL tasks that manage them.]
Flash memory quirks: no in-place update, asymmetric granularity, error & disturbance, bad blocks.
FTL tasks: mapping, garbage collection, wear-leveling, read scrubbing, read retry, bad block management.
Existing approaches
Implement the FTL in the host
The FTL is implemented in the host system
• Fusion-io, Baidu's Software-Defined Flash, Open-Channel SSD, etc.
• Drawback: exposes flash memory quirks to the host
Add a control interface
The host explicitly manages FTL tasks running inside the device
• NVMe's advanced background operations / predictable latency mode
• Drawback: ad-hoc interface extensions and suboptimal host-centric decisions
Exploit workload idleness
The SSD schedules background tasks while the host is idle
• HIOS (ISCA '14), ITEX (TC '14), RL-assisted GC (TECS '17)
• Drawback: heavy dependence on the host workload
Reconstruct data using redundancy
When blocked by background tasks, reconstruct data using RAID-like parity
• ttFlash (FAST '17)
• Drawback: increased internal traffic and reduced storage efficiency
Our approach
Autonomic SSD
[Figure: AutoSSD architecture. The host system sends requests to the flash translation layer, whose tasks (host request handling, garbage collection, read scrubbing, and other management tasks) each place requests in their own task queue. The scheduling subsystem dispatches these requests to the flash memory subsystem's queues (multiple flash channels, each with multiple flash chips). A share controller observes key system states (# of clean blocks, read count, etc.) and sets each task's share weight.]
Design goals:
1. Make device-centric decisions
2. Work under sustained I/O
3. Be FTL implementation-agnostic
Autonomic SSD: three key ideas
• Virtualization of flash memory resources
• A simple and effective scheduler
• Dynamic share control through feedback
SSD architecture that self-manages FTL tasks
Virtualization of the flash memory subsystem
Each FTL task is given the illusion of a dedicated flash memory subsystem
• Decouples algorithms from scheduling
• Makes each task oblivious to the others
• Allows seamless integration of new FTL tasks
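As a rough illustration of this virtualization idea, here is a minimal Python sketch (not the paper's implementation; all names are hypothetical) in which each FTL task submits flash requests to its own queue and never sees the other tasks:

```python
from collections import deque

class VirtualFlashSubsystem:
    """Per-task view of the flash memory subsystem (hypothetical sketch).

    Each FTL task enqueues requests here as if it owned the flash
    alone; the shared scheduler later decides what actually runs.
    """
    def __init__(self, name):
        self.name = name
        self.queue = deque()  # this task's private request queue

    def submit(self, op, chip, page):
        # The task only submits work; scheduling is decoupled from its
        # algorithm, so new tasks integrate without changes here.
        self.queue.append((op, chip, page))

# One virtual subsystem per FTL task: host handling, GC, read scrubbing.
views = {name: VirtualFlashSubsystem(name) for name in ("host", "gc", "rs")}
views["gc"].submit("read", chip=2, page=100)  # GC is oblivious to "host"/"rs"
```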
Share enforcement with debit scheduling
Each task is given a share that limits the number of resources it can use simultaneously
• Simple: no complex computation or bookkeeping
• Approximates fairness without explicitly tracking time
• Enables share-based resource reservation
Feedback control of share
Each task's share is adjusted reactively to changes in system states
• The number of free blocks represents the urgency of the garbage collection task
• The maximum read count represents the urgency of the read scrubbing task
Debit scheduling
Debit scheduler:
• Limits the number of outstanding requests per task
• Increments on issuing a request; decrements on receiving a response
• The debit limit is proportional to the share
[Figure: debit scheduling example with four chips and two tasks. Task A's debit limit is 5 and Task B's is 3. Left: Task A's request is issued to chip 2, incrementing its debit from 1 to 2; chip 0's queue is full, and Task B is at its maximum debit. Right: one of Task B's requests completes, decrementing its debit, so Task B's next request is issued to chip 1, raising its debit from 2 back to 3.]
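To make the rule concrete, here is a minimal Python sketch of debit scheduling under stated assumptions (the class and the chip-queue model are ours, not the paper's):

```python
import queue

class DebitScheduler:
    """Sketch of debit scheduling (hypothetical names).

    Each task's debit counts its outstanding flash requests; the
    debit limit is proportional to the task's share, so a task at
    its limit must wait for a completion before issuing again.
    """
    def __init__(self, debit_limits):
        self.limit = dict(debit_limits)           # e.g. {"A": 5, "B": 3}
        self.debit = {task: 0 for task in debit_limits}

    def try_issue(self, task, chip_queue):
        # Issue only if the task is below its debit limit and the
        # target chip's queue has room (cf. "chip 0 queue full").
        if self.debit[task] >= self.limit[task] or chip_queue.full():
            return False
        self.debit[task] += 1                     # increment on issue
        return True

    def on_complete(self, task):
        self.debit[task] -= 1                     # decrement on response

# Mirrors the walkthrough above: Task A's limit is 5, Task B's is 3.
sched = DebitScheduler({"A": 5, "B": 3})
chip2 = queue.Queue(maxsize=4)
if sched.try_issue("A", chip2):                   # A below its limit: debit 0 -> 1
    chip2.put(("A", "program"))
# ... later, when the chip responds:
sched.on_complete("A")                            # debit 1 -> 0; A may issue again
```

Note that no timestamps or virtual times are kept anywhere; fairness falls out of the per-task debit limits alone, which is what makes the scheme cheap enough for firmware.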
Share controller
Share controller:
• Monitors key system states: # of free blocks for GC, max read count for RS
• Adjusts shares to maintain stable states
[Figure: the share controller observes system states (# of free blocks, max read count) from the debit scheduler and feeds back the host, GC, and RS shares; the scheduler uses them to dispatch host, GC, and RS requests.]
Control function:
S_A[t] = P_A · e_A[t] + I_A · S_A[t−1]
where S_A[t] is task A's share at time t, e_A[t] is the error at time t, P_A is the proportional coefficient (0 ≤ P_A), and I_A is the integral coefficient (0 ≤ I_A < 1).
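Transcribing the formula directly, a minimal share-controller sketch might look as follows (the target value and gains are illustrative assumptions, not values from the paper):

```python
class ShareController:
    """PI-style share control: S[t] = P * e[t] + I * S[t-1].

    e[t] is how far a monitored state (e.g. # of free blocks for GC)
    has drifted from its target; P >= 0 and 0 <= I < 1 as above.
    """
    def __init__(self, p, i, target):
        assert p >= 0 and 0 <= i < 1
        self.p, self.i, self.target = p, i, target
        self.share = 0.0                        # S[t-1], initially zero

    def update(self, observed):
        error = self.target - observed          # e[t]
        self.share = self.p * error + self.i * self.share
        return max(0.0, self.share)             # a share cannot go negative

# Illustrative use: the GC share grows as free blocks run low.
gc = ShareController(p=0.002, i=0.9, target=100)  # target: 100 free blocks
print(gc.update(observed=40))    # large deficit -> larger GC share (0.12)
print(gc.update(observed=100))   # at target -> share decays toward zero
```

Because I_A < 1, the integral term alone always decays, so the share settles back down once the monitored state returns to its target.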
Evaluation environment & methodology
Storage system configuration
• 200GB storage with 28% over-provisioning
• Garbage collection: reclaims space for writes
• Read scrubbing: preventively migrates data before it is lost
• Map caching: selectively keeps mapping data in memory
Workload configuration
• Synthetic I/O
• 8 real-world I/O traces collected from MS production servers
  • with the original dispatch times
  • with half the original dispatch times (2x intensity)
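For concreteness, the storage configuration implies the following raw-capacity arithmetic (a sketch; the field names are ours, and defining over-provisioning relative to user capacity is an assumption):

```python
# Hypothetical summary of the evaluated configuration (values from the
# slide; field names are ours).
USER_CAPACITY_GB = 200
OVER_PROVISIONING = 0.28   # assumed to be relative to user capacity

# Extra flash reserved for GC, wear-leveling, and other FTL tasks:
raw_capacity_gb = USER_CAPACITY_GB * (1 + OVER_PROVISIONING)  # 256 GB
print(f"raw flash: {raw_capacity_gb:.0f} GB")
```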
I/O trace results
Compared schemes:
• Vanilla: schedule in order of arrival
• RAIN: RAID-like parity + prioritized scheduling
• QoSFC: WFQ + P control
• AutoSSD: debit + PI control
[Figure: normalized 99.9999% (six nines) QoS and normalized average response time for eight MS production traces (DAP-DS, DAP-PS, DTRS, LM-TBE, MSN-CFS, MSN-BEFS, RAD-AS, RAD-BE) and their geomean. AutoSSD reduces the six-nines QoS by up to 77% (RAD-AS), by 51% for LM-TBE, and by 43% on average, and reduces the average response time by up to 18% (MSN-BEFS).]
Microscopic view
[Figure: microscopic view of RAD-AS and LM-TBE. For each trace, average response time (ms) over time under RAIN, QoSFC, and AutoSSD, shown alongside the monitored state and the controlled share: the number of free blocks with the GC share, and the max read count (x1000) with the RS share, illustrating how AutoSSD adjusts shares as the monitored states change.]
Static share vs. dynamic share
[Figure: cumulative probability of response time (ms) for MSN-BEFS in the tail above the 99.5th percentile, comparing a dynamic GC share against static GC shares of 5%, 10%, and 20%.]
Conclusion
SSD architecture that self-manages FTL tasks
• Virtualization of flash memory resources
• Share enforcement with debit scheduling
• Feedback control of share
The Autonomic SSD architecture:
• reduces average response time by up to 18.0%
• reduces the 99.9% QoS by up to 67.2%
• reduces the 99.9999% QoS by up to 77.6%
Thank you!