Linux on eMMC- Optimizing for Performance

37
CONFIDENTIAL INFORMATION Intrinsyc Software Linux on eMMC Optimizing for Performance Ken Tough Principal Engineer [email protected]

description

Linux on eMMC- Optimizing for Performance

Transcript of Linux on eMMC- Optimizing for Performance

Page 1: Linux on eMMC- Optimizing for Performance

CONFIDENTIAL INFORMATION

Intrinsyc Software

Linux on eMMC

Optimizing for Performance

Ken Tough

Principal Engineer

[email protected]

Page 2: Linux on eMMC- Optimizing for Performance

2CONFIDENTIAL INFORMATION

What is eMMC?

Solid state storage device on MMC bus

Chip on PCB

NAND flash based

Page 3: Linux on eMMC- Optimizing for Performance

3CONFIDENTIAL INFORMATION

Why eMMC matters

Popular on embedded devices

Cheap

Flexible

Page 4: Linux on eMMC- Optimizing for Performance

4CONFIDENTIAL INFORMATION

eMMC characteristics

Fast read access

Fast read seek times

Acceptable sequential write performance

Poor random write performance

Page 5: Linux on eMMC- Optimizing for Performance

CONFIDENTIAL INFORMATION

MMC

Micro-Controller

Slower NANDFlash

(Erase Blocks)

Slower NAND Flash

(Erase Blocks)

Slower NAND Flash

(Erase Blocks)

Slower NAND Flash

(Erase Blocks)

SRAM

Fast CacheFlash

MMC

Bus

Inside

Firmware

Page 6: Linux on eMMC- Optimizing for Performance

6CONFIDENTIAL INFORMATION

Inside the eMMC

NAND flash arranged in pages

Controller with temporary storage

Wear levelling

Free space management

Page 7: Linux on eMMC- Optimizing for Performance

7CONFIDENTIAL INFORMATION

Discard (TRIM)

eMMC TRIM command

Tells controller what is free

TRIM blocks on format

Page 8: Linux on eMMC- Optimizing for Performance

8CONFIDENTIAL INFORMATION

eMMC scenarios

Tablets, smart phones with lots of DRAM

Netbooks with lots of DRAM

Multimedia players, USB memory sticks

Page 9: Linux on eMMC- Optimizing for Performance

9CONFIDENTIAL INFORMATION

eMMC spec performance

Typically emphasizes sequential write performance

Random accesses hit eMMCs internal pipelines

Frequently limited by eMMC’s Random IOPs limit

Minimum OP time regardless of OP size

Not often data BW limited

~200 IOPs (e.g. 4kB per OP)

Analyze application’s eMMC read/writes patterns

Page 10: Linux on eMMC- Optimizing for Performance

10CONFIDENTIAL INFORMATION

Cache is King

Alleviates write performance issues

Improves read times even further

Reduces NAND wear

Page 11: Linux on eMMC- Optimizing for Performance

11CONFIDENTIAL INFORMATION

Areas of Focus

User space

Filesystem type

Filesystem layout

IO Scheduler

Block IO & Cache

MMC bus driver

EMMC

MMC/Block Device

Block Device

IO Scheduler

Filesystem Filesystem

User User User

Page 12: Linux on eMMC- Optimizing for Performance

12CONFIDENTIAL INFORMATION

MMC driver

Maximum bandwidth enabled (8-bit, 50MHz)

Enable DMA if option

Power management

Trim / vendor command support

Benchmarking Log

Page 13: Linux on eMMC- Optimizing for Performance

13CONFIDENTIAL INFORMATION

Analysis at MMC/Block Level

0

5000

10000

15000

20000

250001 2 4 8 16 32 64

128

256

512

1024

204

8

No

rmal

ize

d C

ou

nt

Sectors per chunk

Histogram of chunk sizes

Reader

Surfing

Random

Page 14: Linux on eMMC- Optimizing for Performance

14CONFIDENTIAL INFORMATION

eMMC Read Times

0

5

10

15

20

25

30

35

0 200 400 600 800 1000 1200

mill

sec

Read Chunk Size (sectors)

Page 15: Linux on eMMC- Optimizing for Performance

15CONFIDENTIAL INFORMATION

eMMC Write Times

0

500

1000

1500

2000

2500

0 200 400 600 800 1000 1200

mil

lise

c

Write Chunk Size (sectors)

Page 16: Linux on eMMC- Optimizing for Performance

16CONFIDENTIAL INFORMATION

Wide variation in read/write times

Big dependency on internal eMMC firmware

Power Class support

Geometry / technology

Trim support

Vendor Performance

Page 17: Linux on eMMC- Optimizing for Performance

17CONFIDENTIAL INFORMATION

Allows reads to bypass long writes

Useful in very specific applications

Small RAM

Page/Block cache and IO Scheduler

Internal eMMC Pipelines blocked anyway

Multimedia apps and “long” buffering

MMC v4 High Priority Interrupt

Page 18: Linux on eMMC- Optimizing for Performance

18CONFIDENTIAL INFORMATION

Filesystems

Focus on write performance

Tests run using fsbench (3.0 kernel, OMAP3 aka Nook Color)

Various low-level and high-level scenarios modelled

EXT4, BTRFS, NILFS2 tested

Page 19: Linux on eMMC- Optimizing for Performance

19CONFIDENTIAL INFORMATION

Filesystem Benchmarks

Page 20: Linux on eMMC- Optimizing for Performance

20CONFIDENTIAL INFORMATION

Page 21: Linux on eMMC- Optimizing for Performance

21CONFIDENTIAL INFORMATION

Page 22: Linux on eMMC- Optimizing for Performance

22CONFIDENTIAL INFORMATION

Page 23: Linux on eMMC- Optimizing for Performance

23CONFIDENTIAL INFORMATION

Page 24: Linux on eMMC- Optimizing for Performance

24CONFIDENTIAL INFORMATION

EXT4 - a write

Journal write (usually ~16K)

inode update (usually 4K)

Data goes into page cache

Page 25: Linux on eMMC- Optimizing for Performance

25CONFIDENTIAL INFORMATION

BTRFS - a write

Update non-sync very fast

Sync write puts tree leaves on eMMC

Sync write is 4 non-sequential writes

Page 26: Linux on eMMC- Optimizing for Performance

26CONFIDENTIAL INFORMATION

NILFS2 - a write

Log structured filesystem

Stores the ‘update’

One large (40K+) write

Eventually “snapshot” needs flushing

Initialization

Recovery

Page 27: Linux on eMMC- Optimizing for Performance

27CONFIDENTIAL INFORMATION

EXT4 w/o journal

Not too dangerous on embedded systems with battery

Good performance due to improved sequentiality

Page 28: Linux on eMMC- Optimizing for Performance

28CONFIDENTIAL INFORMATION

BTRFS

If not using a lot of fsync/fdatasync

Great large write performance

Terrible on small/medium sync writes

Good performance on multiple writes

Page 29: Linux on eMMC- Optimizing for Performance

29CONFIDENTIAL INFORMATION

NILFS2

Consistent performance

Potentially much faster if eMMC part has fast sequential performance

Should theoretically be the fastest :-)

Page 30: Linux on eMMC- Optimizing for Performance

30CONFIDENTIAL INFORMATION

EXT4 with journal

If journaling is needed, consider RAM journal device

Again RAM journal not as dangerous as you think

Better than BTRFS on small/medium sync writes

Page 31: Linux on eMMC- Optimizing for Performance

31CONFIDENTIAL INFORMATION

I/O schedulers

CFQ, noop, deadline

Results are similar within ~10% range

QOS considerations are more important than throughput

Page 32: Linux on eMMC- Optimizing for Performance

32CONFIDENTIAL INFORMATION

Filesystem layout

No swap

Align partitions to erase block boundaries

Extents match erase blocks

System design (multiple storage devices)

Page 33: Linux on eMMC- Optimizing for Performance

33CONFIDENTIAL INFORMATION

User space

Avoid synchronization on files

Avoid sync/fsync/fdatasync/etc

Avoid small writes to files, better to buffer

Don’t be afraid to read, be afraid to write!

Page 34: Linux on eMMC- Optimizing for Performance

34CONFIDENTIAL INFORMATION

Future

Linaro project (www.linaro.org) working on improving eMMC experience

eMMC 4.5 brings METADATA

Page 35: Linux on eMMC- Optimizing for Performance

35CONFIDENTIAL INFORMATION

Summary

User space

Filesystem type

Filesystem layout

IO Scheduler

Block IO & Cache

MMC bus driver

EMMC

MMC/Block Device

Block Device

IO Scheduler

Filesystem Filesystem

User User User

Page 36: Linux on eMMC- Optimizing for Performance

36CONFIDENTIAL INFORMATION

Conclusion

EXT4 (discard, ram/no journal) is probably your best bet

Try out a couple of configurations for the eMMC you are targeting

Benchmark per Vendor

Avoid writes! :-)

Page 37: Linux on eMMC- Optimizing for Performance

37CONFIDENTIAL INFORMATION

Questions?