
  • Using TCP/IP Traffic shaping to achieve iSCSI service predictability

    Paper presentation

    Jarle Bjørgeengen

    University of Oslo / USIT

    November 11, 2010

  • Outline

    About resource sharing in storage devices
    Lab setup / job setup
    Experiment illustrating the problem
    One half of the solution: the throttle
    Live demo: the throttle
    Part two of the solution: the controller
    How the controller works
    Conclusion and future work

  • General problem of sharing resources

    [Figure: consumers (behind QoS bridges) access virtual disks over a
    SAN; the virtual disks are provisioned from a centralized storage
    pool of shared physical resources.]

    Free competition causes unpredictable I/O performance for any given
    consumer.
    Remaining capacity affects performance.
    Storage is managed by sysadmins.
    Sysadmins are unable to make keepable promises about performance.

  • Lab setup

    [Figure: lab setup. An HP SC10 shelf (10 x 36GB 10k disks) backs
    volume group vg_perc, which holds striped logical volumes
    lv_b2-lv_b5 (64KB stripe size across 10 disks). The iSCSI target
    (iet) exports them as iqn.iscsilab:perc_b2 - iqn.iscsilab:perc_b5,
    reaching initiators b2-b5 (each seeing /dev/iscsi_0) over TCP
    connections through the TCP/IP and block layers. bm is the
    measurement host and b1 runs Argus.]
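    The striped volumes in the figure could be created along these
    lines; the stripe parameters are from the slide, while the 20G size
    is an assumption:

      # One logical volume per blade, striped 64KB-wide across all 10
      # disks in vg_perc.
      for n in 2 3 4 5; do
          lvcreate --stripes 10 --stripesize 64 --size 20G --name lv_b$n vg_perc
      done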

  • Is read response time affected by write activity?

    [Figure: same lab setup; bm issues random reads at a fixed rate of
    256 kB/s while three of the consumers run sequential writes at full
    speed against their volumes (lv_b2-lv_b5).]
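    One way to reproduce this job mix with standard tools; the talk
    does not name its load generator, so fio and all its parameters
    here are assumptions:

      # Sequential writer at full speed on each interfering blade.
      fio --name=seqwrite --filename=/dev/iscsi_0 --rw=write --bs=1M \
          --direct=1 --time_based --runtime=500

      # Rate-limited random reader on the measurement host bm.
      fio --name=randread --filename=/dev/iscsi_0 --rw=randread --bs=4k \
          --direct=1 --rate=256k --time_based --runtime=500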

  • The answer is yes

    Long response times adversely affect application service
    availability.

    [Plot: wait time (ms) over 0-500 s for the small read job, with no
    interference and with 1 thread (1 machine), 3 threads (3 machines)
    and 12 threads (3 machines) of write interference.]

  • Throttling method

    [Diagram: two TCP timelines between initiator and target. Without
    delay, the connection handshake (SYN, SYN+ACK, ACK) is followed by
    writes whose ACKs return immediately, so writes follow each other
    closely. With a throttling delay inserted before each ACK returned
    to the initiator, the spacing between consecutive writes grows and
    the effective write rate drops.]
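    The principle can be tried out with a plain netem delay on the
    target's egress interface; a minimal sketch, assuming eth0 carries
    the iSCSI traffic (the talk's setup, shown later, delays only
    marked flows rather than everything):

      # Delay every packet leaving eth0 by 5 ms, then inspect and
      # remove the qdisc again.
      tc qdisc add dev eth0 root netem delay 5ms
      tc -s qdisc show dev eth0
      tc qdisc del dev eth0 root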

  • Relation between packet delay and average rate

    [Plots: time to read 200MB of data (s) and time to write 200MB of
    data (s) versus introduced delay (ms), for delays from 0 to
    9.6 ms.]

    Write rate: 15 MB/s down to 2.5 MB/s. Read rate: 22 MB/s down to
    5 MB/s.
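    The shape of these curves follows from TCP flow control: with at
    most $W$ bytes in flight, throughput is bounded by the window
    divided by the round-trip time, so an added ACK delay $d$ shrinks
    the bound (a back-of-the-envelope relation, not a formula from the
    talk):

      \text{rate} \le \frac{W}{\mathrm{RTT} + d}

    For example, with $W = 64$ kB and a base RTT of 0.5 ms, a delay of
    9.6 ms cuts the bound from roughly 128 MB/s to roughly 6.3 MB/s
    (illustrative numbers only).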

  • Managing consumers

    Need to operate on sets of consumers
    (throttlable={10.0.0.243,10.0.0.244}). Ipset: one rule to match
    them all:

      # Create the set, add the throttlable initiators, and mark
      # packets headed for them (mangle table, so the mark is set
      # before egress classification).
      ipset -N $throttlable ipmap --network 10.0.0.0/24
      ipset -A $throttlable 10.0.0.243
      ipset -A $throttlable 10.0.0.244
      iptables -t mangle -A POSTROUTING -m set --match-set $throttlable dst \
               -j MARK --set-mark $mark

    The mark is a step in the range of available packet delays.
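    A sketch of how the mark can select a delay step, assuming fw mark
    $mark corresponds to HTB class 1:$mark in the delay ladder shown
    later:

      # Steer packets carrying fw mark $mark into class 1:$mark, whose
      # child netem qdisc imposes that step's delay.
      tc filter add dev eth0 parent 1: protocol ip handle $mark fw flowid 1:$mark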

  • Live demonstration

    Manual throttling and QoS specification
    An automatic QoS policy and automated throttling

  • Dynamic throttling decision

    Figure: Block diagram of a PID controller. Created by
    SilverStar(at)en.wikipedia. Licensed under the terms of Creative
    Commons Attribution 2.5 Generic.

  • Modified PID function

    [Flowchart: at every sampling instant the three terms are
    calculated:

      U_p = K_p \, e_k
      U_i = U_{i,k-1} + \frac{T_s K_p}{T_i} e_k
      U_d = K_p T_d \, \frac{e_k - e_{k-1}}{T_s}

    The integral term is then forced into 0 < U_i < U_kmax: if
    U_i < 0 it is reset to 0, and if U_i > U_kmax it is frozen at
    U_{i,k-1} (anti-windup). The sum U_k = U_p + U_i + U_d is likewise
    clamped into 0 < U_k < U_kmax (U_k < 0 gives U_k = 0, U_k > U_kmax
    gives U_k = U_kmax), and the throttling mark is
    mark = int(ceil(U_k)).]
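    A minimal sketch of one regulator step in shell, with awk doing the
    floating-point work; the gains, sampling interval, state file and
    Ukmax are illustrative assumptions, not the talk's pid_reg.pl:

      #!/bin/bash
      # One step of the modified PID regulator; prints the mark to use.
      Kp=0.5; Ti=2; Td=0.1; Ts=1; Ukmax=21           # assumed tuning
      ek=$1                                          # current error sample
      read ek1 Ui1 2>/dev/null < state || { ek1=0; Ui1=0; }
      out=$(awk -v kp=$Kp -v ti=$Ti -v td=$Td -v ts=$Ts -v umax=$Ukmax \
                -v ek=$ek -v ek1=$ek1 -v ui1=$Ui1 'BEGIN {
          up = kp * ek                               # proportional
          ui = ui1 + ts * kp / ti * ek               # integral
          ud = kp * td * (ek - ek1) / ts             # derivative
          if (ui < 0)    ui = 0                      # anti-windup: clamp,
          if (ui > umax) ui = ui1                    # or freeze the integral
          uk = up + ui + ud
          if (uk < 0)    uk = 0                      # clamp to mark range
          if (uk > umax) uk = umax
          mark = (int(uk) == uk) ? uk : int(uk) + 1  # ceil(uk)
          printf "%d %s\n", mark, ui
      }')
      set -- $out
      echo "$ek $2" > state                          # persist e(k-1), Ui(k-1)
      echo "mark=$1"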

  • The completely automated approach

    [Diagram: the automated pipeline. set_maintaner.pl reads
    /proc/net/iet/sessions and /proc/net/iet/volumes, creates the
    ISCSIMAP shared-memory segment, and creates and maintains the
    members of the IP-sets; it also runs lvs and reads its output.
    perf_maintainer.pl reads saturation indicators from
    /proc/diskstats and maintains the PDATA shared-memory segment.
    pid_reg.pl reads ISCSIMAP and PDATA and spawns one pid_thread per
    resource; the pid_threads apply the throttles and publish the
    throttle values in the CMEM segment, which perf_server.pl reads
    and plots with gnuplot. Legend distinguishes files, shared memory
    and processes, with arrows for run-command, dependency and
    read-output relations.]
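    A minimal sketch of the kind of mapping set_maintaner.pl needs;
    the proc file layout (one "ip:<addr>" token per connection line)
    is assumed from typical iet output, not taken from the talk:

      # List the initiator IP addresses currently connected to the
      # iSCSI target.
      grep -o 'ip:[0-9.]*' /proc/net/iet/sessions | cut -d: -f2 | sort -u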

  • Impact

    The packet delay throttle is very efficient. It solves the
    throttling need completely for iSCSI (likely other TCP-based
    storage networks too).

    The modified PID controller consistently keeps response time low
    in spite of rapidly changing load interference. The concept is
    widely applicable.


  • Future work

    [Figure: proposed deployment. QoS bridges sit between the
    consumers and the Ethernet switch in front of the iSCSI disk
    array, using resource/consumer maps and virtual disk latencies
    obtained through an array-specific plugin (SNMP GET).]

    Packet delay throttle with other algorithms.
    PID controller with other throttles.

  • Thanks for your attention!

  • Overhead

    Negligible overhead introduced by tc filters. Differences measured
    20 times; a t-test at 99% confidence shows 0.4% / 1.7% overhead
    for read / write (worst case).

  • Is response time improved by throttling?

    [Plot: small-job average wait time (ms, left axis) and aggregated
    interference throughput (kB/s, right axis) over 0-500 s, showing a
    throttling period with 4.6 ms delay and one with 9.6 ms delay.]

  • Automatically controlled wait time

    [Plot: average wait time (ms) over 0-500 s with no regulation and
    with automatic regulation at 20 ms, 15 ms and 10 ms thresholds.]

  • The throttled rates

    [Plot: aggregate write rate (kB/s) over 0-500 s with no regulation
    and with 20 ms, 15 ms and 10 ms thresholds (smoothed).]

  • Exposing the throttling value

    [Plot over 0-200 s: vg_aic read wait time with automatic
    regulation (threshold = 15 ms), the packet delay introduced to
    writers (ms), and the aggregated write rate (kB/s).]

  • Effect of the packet delay throttle: Reads

    [Plot: read rate (kB/s) for b2, b3, b4 and b5 over 0-300 s.]

  • Effect of the packet delay throttle: Writes

    [Plot: write rate (kB/s) for b2, b3, b4 and b5 over 0-300 s.]

  • The tc delay queues

    Root qdisc 1: htb (r2q 10, default 1, direct_packets_stat 4399042,
    ver 3.17). Twenty HTB classes 1:2-1:21 (prio 0, quantum 200000,
    rate 8000Mbit, ceil 8000Mbit, burst/cburst 0b/8, mpu 0b, overhead
    0b, level 0) each hold a netem child qdisc (limit 1000) that only
    adds a fixed delay:

      class  netem  delay    class  netem  delay
      1:2    12:    99us     1:12   112:   5.1ms
      1:3    13:    598us    1:13   113:   5.6ms
      1:4    14:    1.1ms    1:14   114:   6.1ms
      1:5    15:    1.6ms    1:15   115:   6.6ms
      1:6    16:    2.1ms    1:16   116:   7.1ms
      1:7    17:    2.6ms    1:17   117:   7.6ms
      1:8    18:    3.1ms    1:18   118:   8.1ms
      1:9    19:    3.6ms    1:19   119:   8.6ms
      1:10   110:   4.1ms    1:20   120:   9.1ms
      1:11   111:   4.6ms    1:21   121:   9.6ms
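    A sketch of how such a ladder could be built, assuming eth0 and
    reading the 0.5 ms step size off the dump above:

      dev=eth0
      tc qdisc add dev $dev root handle 1: htb default 1
      for i in $(seq 2 21); do
          # Delay steps: 100us, 600us, 1.1ms ... 9.6ms.
          delay=$((100 + 500 * (i - 2)))us
          tc class add dev $dev parent 1: classid 1:$i htb rate 8000mbit
          tc qdisc add dev $dev parent 1:$i handle 1$i: netem limit 1000 delay $delay
          # fw mark $i selects delay step $i.
          tc filter add dev $dev parent 1: protocol ip handle $i fw flowid 1:$i
      done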

  • The tc bandwidth queues

    Root qdisc 1: htb (r2q 10, default 1, direct_packets_stat 4665509,
    ver 3.17) with parent class 1:1 (rate 1000Mbit, ceil 1000Mbit,
    burst/cburst 130875b/8, level 7). The leaf classes 1:2-1:25
    (prio 0, quantum 200000 except 1:24 at 187500 and 1:25 at 62500;
    burst/cburst proportional to rate, mpu 0b, overhead 0b, level 0)
    cap rate and ceil in descending steps:

      class  rate = ceil     class  rate = ceil
      1:2    950000Kbit      1:14   350000Kbit
      1:3    900000Kbit      1:15   300000Kbit
      1:4    850000Kbit      1:16   250000Kbit
      1:5    800000Kbit      1:17   200000Kbit
      1:6    750000Kbit      1:18   150000Kbit
      1:7    700000Kbit      1:19   100000Kbit
      1:8    650000Kbit      1:20   50000Kbit
      1:9    600000Kbit      1:21   45000Kbit
      1:10   550000Kbit      1:22   35000Kbit
      1:11   500000Kbit      1:23   25000Kbit
      1:12   450000Kbit      1:24   15000Kbit
      1:13   400000Kbit      1:25   5000Kbit
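    A corresponding sketch for the bandwidth ladder, with the rates
    read off the dump above (interface name assumed):

      dev=eth0
      tc qdisc add dev $dev root handle 1: htb default 1
      tc class add dev $dev parent 1: classid 1:1 htb rate 1000mbit
      i=2
      for mbit in 950 900 850 800 750 700 650 600 550 500 450 400 350 \
                  300 250 200 150 100 50 45 35 25 15 5; do
          # Class 1:$i caps both rate and ceil at $mbit Mbit/s.
          tc class add dev $dev parent 1:1 classid 1:$i htb \
             rate ${mbit}mbit ceil ${mbit}mbit
          tc filter add dev $dev parent 1: protocol ip handle $i fw flowid 1:$i
          i=$((i + 1))
      done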

  • Input signal

    [Plot: wait time (ms) over 0-500 s. Red: exponential weighted
    moving average (EWMA). Green: moving median.]

    L(t) = l(t)·α + L(t−1)·(1−α)

    EWMA is also called a low pass filter.
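    A minimal sketch of the EWMA over a column of wait-time samples
    (the α value here is an assumption):

      # Print the running EWMA of the values arriving on stdin.
      awk -v alpha=0.2 '{ L = (NR == 1) ? $1 : alpha*$1 + (1-alpha)*L; print L }'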

  • The PID controller equations

    Continuous form:

      u(t) = \underbrace{K_p e(t)}_{\text{Proportional}}
           + \underbrace{\frac{K_p}{T_i} \int_0^t e(\tau)\,d\tau}_{\text{Integral}}
           + \underbrace{K_p T_d \, e'(t)}_{\text{Derivative}}

    Incremental form (T is the sampling interval):

      u_k = \underbrace{u_{k-1}}_{\text{Previous}}
          + \underbrace{K_p \Bigl(1 + \frac{T}{T_i}\Bigr) e_k - K_p e_{k-1}
          + \frac{K_p T_d}{T} (e_k - 2e_{k-1} + e_{k-2})}_{\text{Delta}}

    Absolute form:

      u_k = \underbrace{K_p e_k}_{\text{Proportional}}
          + \underbrace{u_{i(k-1)} + \frac{K_p T}{T_i} e_k}_{\text{Integral}}
          + \underbrace{\frac{K_p T_d}{T} (e_k - e_{k-1})}_{\text{Derivative}}
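    The absolute form keeps the integral contribution as explicit
    state (the U_i of the flowchart above), which is what allows it to
    be clamped for anti-windup; differencing the absolute form
    recovers the incremental one:

      u_k - u_{k-1} = K_p (e_k - e_{k-1}) + \frac{K_p T}{T_i} e_k
                    + \frac{K_p T_d}{T} (e_k - 2e_{k-1} + e_{k-2})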