Dziuba OSCON 2011 Data Fsync File Io

download Dziuba OSCON 2011 Data Fsync File Io

of 26

Transcript of Dziuba OSCON 2011 Data Fsync File Io

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    1/26

    Not proprietary or confidential. In fact, youre risking a career by listening to me.

    What Every Data ProgrammerNeeds to Know about Disks

    Ted Dziuba

    @dozba

    [email protected]

    OSCON Data July, 2011 - Portland

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    2/26

    Who are you and why are you talking?

    A few years ago: Technical troll for The Register.

    Recently: Co-founder of Milo.com, local shopping engine.

    Present: Senior Technical Staff for eBay Local

    First job: Like college but they pay you to go.

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    3/26

    The Linux Disk Abstraction

    Volume

    /mnt/volume

    File System

    xfs, ext

    Block Device

    HDD, HW RAID array

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    4/26

    What happens when you read from a file?

    f = open(/home/ted/not_pirated_movie.avi, rb)

    avi_header = f.read(56)

    f.close()

    user

    buffer

    page

    cache

    Disk

    controllerplatter

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    5/26

    What happens when you read from a file?

    user

    buffer

    page

    cache

    Disk

    controller platter

    Main memory lookup

    Latency: 100 nanoseconds

    Throughput: 12GB/sec on good hardware

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    6/26

    What happens when you read from a file?

    user

    buffer

    page

    cache

    Disk

    controller platter

    Needs to actuate a physical device

    Latency: 10 milliseconds

    Throughput: 768 MB/sec on SATA 3(Faster if you have a lot of money)

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    7/26

    Sidebar: The Horror of a 10ms Seek Latency

    A disk read is 100,000 times slower than a memory read.

    100 nanoseconds

    Time it takes you to write a really clever tweet

    10 milliseconds

    Time it takes to write a novel, working full time

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    8/26

    What happens when you write to a file?

    f = open(/home/ted/nosql_database.csv, wb)

    f.write(key)

    f.write(,)f.write(value)

    f.close()

    user

    buffer

    page

    cache

    Disk

    controllerplatter

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    9/26

    What happens when you write to a file?

    f = open(/home/ted/nosql_database.csv, wb)

    f.write(key)

    f.write(,)f.write(value)

    f.close()

    user

    buffer

    page

    cache

    Disk

    controllerplatter

    You need to make this

    part happen

    Mark the page dirty,

    call it a day and go have a smoke.

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    10/26

    Aside: Stick your finger in the Linux Page Cache

    Clear your page cache: echo 1 > /proc/sys/vm/drop_caches

    Dirty pages: grep i dirty /proc/meminfo

    Pre-Linux 2.6 used pdflush, now per-Backing Device Info (BDI) flush threads

    /proc/sys/vm Love:

    dirty_expire_centisecs : flush old dirty pages

    dirty_ratio: flush after some percent of memory is useddirty_writeback_centisecs: how often to wake up and start flushing

    Crusty sysadmins hail-Mary pass: sync; sync; sync

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    11/26

    Fsync: force a flush to disk

    f = open(/home/ted/nosql_database.csv, wb)

    f.write(key)

    f.write(,)f.write(value)

    os.fsync(f.fileno())

    f.close()

    user

    buffer

    page

    cache

    Disk

    controllerplatter

    Also note, fsync() has a cousin, fdatasync() that does not sync metadata.

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    12/26

    Aside: point and laugh at MongoDB

    Mongos fsync command:

    > db.runCommand({fsync:1,async:true});

    wat.

    Also supports journaling, like a WAL in the SQL world, however

    It only fsyncs() the journal every 100msfor performance.

    Its not enabled by default.

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    13/26

    Fsync: bitter lies

    f = open(/home/ted/nosql_database.csv, wb)

    f.write(key)

    f.write(,)f.write(value)

    os.fsync(f.fileno())

    f.close()

    user

    buffer

    page

    cache

    Disk

    controllerplatter

    Drives will lie to you.

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    14/26

    Fsync: bitter lies

    page

    cache

    Disk

    controller

    its a cache!

    Two types of caches: writethrough and writebackWritebackis the demon

    platter

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    15/26

    (Just dropped in) to see what condition your caches are in

    Disk

    controller platter

    No controller cache Writeback cache on disk

    A Typical Workstation

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    16/26

    (Just dropped in) to see what condition your caches are in

    Disk

    controller platter

    Writethrough cache

    on controller

    Writethrough cache on disk

    A Good Server

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    17/26

    (Just dropped in) to see what condition your caches are in

    Disk

    controller platter

    Battery-backed writeback

    cache on controller

    Writethrough cache on disk

    An Even Better Server

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    18/26

    (Just dropped in) to see what condition your caches are in

    Disk

    controller platter

    Battery-backed writeback

    cache or

    Writethrough cache

    Writeback cache on disk

    The Demon Setup

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    19/26

    Disks in a virtual environment

    The Trail of Tears to the Platter

    user

    buffer

    page

    cache

    Virtual

    controller

    platterHost

    page

    cache

    Physical

    controller

    Hypervisor

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    20/26

    Disks in a virtual environment

    Why EC2 I/O is Slow and Unpredictable

    Image Credit: Ars Technica

    Shared Hardware

    Physical DiskEthernet ControllersSouthbridge

    How are the caches configured?How big are the caches?

    How many controllers?How many disks?RAID?

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    21/26

    Aside: Amazon EBS

    Please stop doing this.

    MySQL Amazon EBS

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    22/26

    Whats Killing That Box?

    ted@u235:~$ iostat -x

    Linux 2.6.32-24-generic (u235) 07/25/2011 _x86_64_ (8 CPU)

    avg-cpu: %user %nice %system %iowait %steal %idle

    0.15 0.14 0.05 0.00 0.00 99.66

    Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz %util

    sda 0.00 3.27 0.01 2.38 0.58 45.23 19.21 0.24

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    23/26

    Cool Hardware Tricks

    Beginner Hardware Trick: SSD Drives

    0 1 2 3

    SATA

    SSD

    $/GB

    $2.50/GB vs 7.5c/GBNegligible seek time vs 10ms seek timeNot a lot of space

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    24/26

    Cool Hardware Tricks

    Intermediate Hardware Trick: RAID Controllers

    Standard RAID ControllerSSD as writeback cacheBattery-backed

    Adaptec MaxIQ$1,200

    Image Credit: Toms Hardware

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    25/26

    Cool Hardware Tricks

    Advanced Hardware Trick: FusionIO

    SSD Storage on the Northbridge (PCIe)6.0 GB/sec throughput. Gigabytes.30 microsecond latency (30k ns)Roughly $20/GB

    Top-line card > $100,000 for around 5TB

  • 8/2/2019 Dziuba OSCON 2011 Data Fsync File Io

    26/26

    Questions

    Thank Youhttp://teddziuba.com/

    @dozba

    Questions & Heckling