Chalmers 2011

download Chalmers 2011

of 146

Transcript of Chalmers 2011

  • 8/13/2019 Chalmers 2011

    1/146

    Eliminating the Hardware/Software Divide

    Satnam Singh, Microsoft Research Cambridge, UK

  • 8/13/2019 Chalmers 2011

    2/146

    !

    IRQ, NMI

  • 8/13/2019 Chalmers 2011

    3/146

  • 8/13/2019 Chalmers 2011

    4/146

  • 8/13/2019 Chalmers 2011

    5/146

  • 8/13/2019 Chalmers 2011

    6/146

  • 8/13/2019 Chalmers 2011

    7/146

  • 8/13/2019 Chalmers 2011

    8/146

  • 8/13/2019 Chalmers 2011

    9/146

  • 8/13/2019 Chalmers 2011

    10/146

  • 8/13/2019 Chalmers 2011

    11/146

  • 8/13/2019 Chalmers 2011

    12/146

  • 8/13/2019 Chalmers 2011

    13/146

  • 8/13/2019 Chalmers 2011

    14/146

    locks

    monitors

    condition variablesspin locks

    priority inversion

  • 8/13/2019 Chalmers 2011

    15/146

  • 8/13/2019 Chalmers 2011

    16/146

  • 8/13/2019 Chalmers 2011

    17/146

  • 8/13/2019 Chalmers 2011

    18/146

  • 8/13/2019 Chalmers 2011

    19/146

  • 8/13/2019 Chalmers 2011

    20/146

    multiple

    independent

    multi-ported

    memories

    fine-grain

    parallelism

    and

    pipelining

    hard and softembedded

    processors

  • 8/13/2019 Chalmers 2011

    21/146

    LUTs are just higher order functions

    i o

    lut1

    oi1

    i0

    lut2 lut3

    i0

    i1

    i2

    o

    lut4

    i0

    i1

    i2i3

    o

    inv = lut1not

    and2 = lut2(&&)

    mux = lut3(ls d0 d1 . ifs thend1 elsed0)

  • 8/13/2019 Chalmers 2011

    22/146

    XC6VLX760 758,784 logic cells, 864 DSP blocks,

    1,440 dual ported 18Kb RAMs

    32-bit

    integer

    Adder

    (32/474,240)

    >700MHz

    332x1440

    14820 sim-adds

    1,037,400,000,000additions/second

  • 8/13/2019 Chalmers 2011

    23/146

  • 8/13/2019 Chalmers 2011

    24/146

    XD2000i FPGA in-socketaccelerator for Intel FSB XD2000F FPGA in-socketaccelerator for AMD socket F XD1000 FPGA co-processormodule for socket 940

  • 8/13/2019 Chalmers 2011

    25/146

  • 8/13/2019 Chalmers 2011

    26/146

    Case StudySpam Filtering(Alessandro Forin, MSR Redmond)

    Benchmark

    ~50,000 regular expressions fromForefront Team (snapshot from

    their Exchange server in Aug 09)

    Performance Up to 6000x faster than standard Intel processors

    Capable of processing at line rate of gigabit Ethernet

    Power Requirement

    7

    10 watts rather than 200++ watts

  • 8/13/2019 Chalmers 2011

    27/146

    27

    Software Version FPGA Version

    ~6000 Messages/Sec~1 Message/Sec

  • 8/13/2019 Chalmers 2011

    28/146

  • 8/13/2019 Chalmers 2011

    29/146

  • 8/13/2019 Chalmers 2011

    30/146

    Ren Mller (ETH)

    FPGAs + SQL [VLDB]

  • 8/13/2019 Chalmers 2011

    31/146

    CPU FPGA

  • 8/13/2019 Chalmers 2011

    32/146

  • 8/13/2019 Chalmers 2011

    33/146

  • 8/13/2019 Chalmers 2011

    34/146

    Speed Grade -1 -2 -3With Layout 270MHz 320MHz 362MHzWithout Layout 210MHz 260MHz 280MHz

    541 seconds

    1896 seconds

  • 8/13/2019 Chalmers 2011

    35/146

    opportunityscientific computingdata mining

    search

    image processing

    financial analytics

    challenge

  • 8/13/2019 Chalmers 2011

    36/146

    The Accidental Semi-colon

  • 8/13/2019 Chalmers 2011

    37/146

    publicstaticint[] SequentialFIRFunction(int[] weights, int[] input)

    {

    int[] window = newint[size];

    int[] result = newint[input.Length];

    // Clear to window of x values to all zero.

    for(intw = 0; w < size; w++)

    window[w] = 0;

    // For each sample...

    for(inti = 0; i < input.Length; i++){

    // Shift in the new x value

    for(intj = size - 1; j > 0; j--)

    window[j] = window[j - 1];

    window[0] = input[i];

    // Compute the result value

    intsum = 0;

    for(intz = 0; z < size; z++)

    sum += weights[z] * window[z];

    result[i] = sum;

    }

    returnresult;

    }

  • 8/13/2019 Chalmers 2011

    38/146

    PLDI 1998

  • 8/13/2019 Chalmers 2011

    39/146

    PLDI 2003

    1

    2

    3

    4

    5

  • 8/13/2019 Chalmers 2011

    40/146

    PLDI 2010

    0

    10

    20

    30

    40

    50

    60

    70

    80

    1 2 3 4 5 6

    Series1

    Series2

  • 8/13/2019 Chalmers 2011

    41/146

    POPL 1998

  • 8/13/2019 Chalmers 2011

    42/146

    POPL 2002

  • 8/13/2019 Chalmers 2011

    43/146

    POPL 2010

  • 8/13/2019 Chalmers 2011

    44/146

  • 8/13/2019 Chalmers 2011

    45/146

    ray of light

    Signal

    Liquid

    Metal

    PRET-C

    Bluespec

    Feldspar

    Accelerator

    RapidMind /Ct

    Streams-C

    Esterel

    SHIM

  • 8/13/2019 Chalmers 2011

    46/146

    universallanguage?

    embeddedhigh level

    software

    FPGA

    GPU

    DSP

    machine

    learning

    grand unification

    theorypolygots

    Gannet

    DSLs

  • 8/13/2019 Chalmers 2011

    47/146

    Our High Level Synthesis Projects

    Kiwi: concurrent

    C# programs for

    control-oriented

    applications

    [David Greaves,

    Univ. Cambridge]

    shape analysis:

    synthesis of

    dynamic data

    structures (C)[MPI and CMU]

    Accelerator/FPGA:

    synthesis of data

    parallel programs in

    C++/C#/F#[MSR Redmond]

    HLINQ

    eDSLs

    [Gavin Bierman]

    + compilation of self-recursive Haskell functions to FPGA circuits!

  • 8/13/2019 Chalmers 2011

    48/146

    Redmond Accelerator TeamBarry Bond

    Kerry Hammil

    Lubomir Litchev

  • 8/13/2019 Chalmers 2011

    49/146

    Effort vs. Reward

    loweffort

    low

    reward

    higheffort

    high

    reward

    mediumeffort

    medium

    reward

    CUDA

    OpenCL

    HLSL

    DirectComputeAccelerator

  • 8/13/2019 Chalmers 2011

    50/146

    Accelerator

  • 8/13/2019 Chalmers 2011

    51/146

    Application.EXE Accelerator.DLL

    Windows on Intel/AMD

    processor

    DX9

    GPU

    (ATI, Nvidia, )

  • 8/13/2019 Chalmers 2011

    52/146

    openSystemopenMicrosoft.ParallelArraysletmain(args) =

    letx = newFloatParallelArray (Array.map float32 [|1; 2; 3; 4; 5|])lety = newFloatParallelArray (Array.map float32 [|6; 7; 8; 9; 10|])letz = x + yusedx9Target = newDX9Target()letzv = dx9Target.ToArray1D(z)printf "%A\n"zv

    0

  • 8/13/2019 Chalmers 2011

    53/146

    openSystemopenMicrosoft.ParallelArraysletmain(args) =

    letx = newFloatParallelArray (Array.map float32 [|1; 2; 3; 4; 5|])lety = newFloatParallelArray (Array.map float32 [|6; 7; 8; 9; 10|])letz = x + yusesse3Target = newX64MulticoreTarget()letzv = sse3Target.ToArray1D(z)printf "%A\n"zv

    0

  • 8/13/2019 Chalmers 2011

    54/146

    openSystemopenMicrosoft.ParallelArrays

    letmain(args) =letx = newFloatParallelArray (Array.map float32 [|1; 2; 3; 4; 5|])lety = newFloatParallelArray (Array.map float32 [|6; 7; 8; 9; 10|])letz = x + yusefpgaTarget = newFPGAMulticoreTarget()fpgaTarget.ToArray1D(z)0

  • 8/13/2019 Chalmers 2011

    55/146

  • 8/13/2019 Chalmers 2011

    56/146

  • 8/13/2019 Chalmers 2011

    57/146

    [1; 2; 3; 4; 5]

    FloatParallelArray

    CPU Address Space

    F# Array

    GPU Address Space

    Encapsulated

    Data-parallel

    array

    x

    [6; 7; 8; 9; 10]

    FloatParallelArray

    y

    100010101101011010

    x+yGPU code

    GPU memory

    GPU code

    y

    [7; 9; 11; 13; 15]

    F# Array

  • 8/13/2019 Chalmers 2011

    58/146

    usingSystem;

    usingMicrosoft.ParallelArrays;

    namespaceAddArraysPointwise{

    classAddArraysPointwiseDX9{

    staticvoidMain(string[] args)

    {varx = newFloatParallelArray(new[] {1.0F, 2, 3, 4, 5});vary = newFloatParallelArray(new[] {6.0F, 7, 8, 9, 10});vardx9Target = newDX9Target();varz = x + y;foreach(variindx9Target.ToArray1D (z))

    Console.Write(i+ " ");Console.WriteLine();}

    }}

  • 8/13/2019 Chalmers 2011

    59/146

    moduleMainwhere

    importAccelerator

    x = fpa [1.0, 2.0, 3.0, 4.0, 5.0]y = fpa [6.0, 7.0, 8.0, 9.0, 10.0]

    z = x + y

    main= dodx9Target

  • 8/13/2019 Chalmers 2011

    60/146

  • 8/13/2019 Chalmers 2011

    61/146

  • 8/13/2019 Chalmers 2011

    62/146

  • 8/13/2019 Chalmers 2011

    63/146

  • 8/13/2019 Chalmers 2011

    64/146

  • 8/13/2019 Chalmers 2011

    65/146

    rX *

    pa

    Shift

    (0,0)k[0]

    +

    +

    *

    Shift

    (0,1) k[1]

    +

    letrecconvolve (shifts : int ->int [])

    (kernel : float32 []) i(a : FloatParallelArray)= lete = kernel.[i] * ParallelArrays.Shift(a, shifts i)ifi = 0thene

    else

    e + convolve shifts kernel (i-1) a

  • 8/13/2019 Chalmers 2011

    66/146

  • 8/13/2019 Chalmers 2011

    67/146

  • 8/13/2019 Chalmers 2011

    68/146

  • 8/13/2019 Chalmers 2011

    69/146

  • 8/13/2019 Chalmers 2011

    70/146

    publicstaticint[] SequentialFIRFunction(int[] weights, int[] input)

    {

    int[] window = newint[size];int[] result = newint[input.Length];

    // Clear to window of x values to all zero.

    for(intw = 0; w < size; w++)

    window[w] = 0;

    // For each sample...

    for(inti = 0; i < input.Length; i++){

    // Shift in the new x value

    for(intj = size - 1; j > 0; j--)

    window[j] = window[j - 1];

    window[0] = input[i];

    // Compute the result value

    intsum = 0;for(intz = 0; z < size; z++)

    sum += weights[z] * window[z];

    result[i] = sum;

    }

    returnresult;

    }

  • 8/13/2019 Chalmers 2011

    71/146

  • 8/13/2019 Chalmers 2011

    72/146

  • 8/13/2019 Chalmers 2011

    73/146

    shift (x, 0) = [7, 2, 5, 9, 3, 8, 6, 4] = x

    shift (x, -1) = [7, 7, 2, 5, 9, 3, 8, 6]

    shift (x, -2) = [7, 7, 7, 2, 5, 9, 3, 8]

  • 8/13/2019 Chalmers 2011

    74/146

    y = [y[0], y[1], y[2], y[3], y[4], y[5], y[6], y[7]]

    = a[0] * [x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7]] +a[1]* [x[-1], x[0], x[1], x[2], x[3], x[4], x[5], x[6]] +

    a[2] * [x[-2], x[-1], x[0], x[1], x[2], x[3], x[4], x[5]] +

    a[3]* [x[-3], x[-2], x[-1], x[0], x[1], x[2], x[3], x[4]] +

    a[4]* [x[-4], x[-3], x[-2], x[-1], x[0], x[1], x[2], x[3]]

    y= a[0] * shift (x, 0) +

    a[1] * shift (x, -1) +

    a[2] * shift (x, -2) +

    a[3] * shift (x, -3) +a[4] * shift (x, -4)

  • 8/13/2019 Chalmers 2011

    75/146

    usingMicrosoft.ParallelArrays;usingA= Microsoft.ParallelArrays.ParallelArrays;namespaceAcceleratorSamples{

    publicclassConvolver{

    publicstaticfloat[] Convolver1D(TargetcomputeTarget,

    float[] a, float[] x){

    varxpar = newFloatParallelArray(x);varn = x.Length;varypar= newFloatParallelArray(0.0f, new[] { n });for(inti= 0; i< a.Length; i++)

    ypar+= a[i] * A.Shift(xpar, -i);float[] result = computeTarget.ToArray1D(ypar);returnresult;

    }}

    }

    for(inti= 0; i< a.Length; i++)ypar+= a[i] * A.Shift(xpar, -i);

  • 8/13/2019 Chalmers 2011

    76/146

  • 8/13/2019 Chalmers 2011

    77/146

    usingSystem;usingSystem.Linq;usingMicrosoft.ParallelArrays;namespaceAcceleratorSamples{

    staticclassConvolver2D{

    staticFloatParallelArrayconvolve(thisFloatParallelArraya,Func shifts, float[] kernel)

    {returnkernel

    .Select((k, i) => k * ParallelArrays.Shift(a, shifts(i)))

    .Aggregate((a1, a2) => a1 + a2);}staticFloatParallelArrayconvolveXY(thisFloatParallelArrayinput, float[] kernel){

    returninput

    .convolve(i => new[] { -i, 0}, kernel)

    .convolve(i => new[] { 0, -i }, kernel);}staticvoidMain(string[] args){

    constintinputSize = 10;varrandom = newRandom(42);varinputData = newfloat[inputSize, inputSize];for(introw= 0; row< inputSize; row++)

    for(intcol= 0; col< inputSize; col++)inputData[row, col] = (float)random.NextDouble() * random.Next(1, 100);

    vartestKernel = new[] { 2F, 5, 7, 4, 3};vardx9Target = newDX9Target();varinputArray = newFloatParallelArray(inputData);varresult = dx9Target.ToArray2D(inputArray.convolveXY(testKernel));for(varrow= 0; row< inputSize; row++){

    for(intcol= 0; col< inputSize; col++)Console.Write("{0} ", result[row, col]);

    Console.WriteLine();}

    }}

    }

    staticFloatParallelArrayconvolve(thisFloatParallelArraya,Func shifts,float[] kernel)

    {returnkernel.Select((k, i) => k * ParallelArrays.Shift(a, shifts(i)))

    .Aggregate((a1, a2) => a1 + a2);}

    staticFloatParallelArrayconvolveXY(thisFloatParallelArrayinput,float[] kernel)

    { returninput

    .convolve(i => new[] { -i, 0}, kernel)

    .convolve(i => new[] { 0, -i }, kernel);}

  • 8/13/2019 Chalmers 2011

    78/146

    openSystemopenMicrosoft.ParallelArrays[]letmain(args) =

    // Declare a filter kernel for the convolution lettestKernel = Array.map float32 [| 2; 5; 7; 4; 3|]// Specify the size of each dimension of the input arrayletinputSize = 10// Create a pseudo-random number generatorletrandom = Random (42)

    // Declare a psueduo-input data arraylettestData = Array2D.init inputSize inputSize (funi j ->float32 (random.NextDouble() *

    float (random.Next(1, 100))))

    // Create an Accelerator float parallel array for the F# input array usetestArray = new FloatParallelArray(testData)// Declare a function to convolve in the X or Y direction letrecconvolve (shifts : int ->int []) (kernel : float32 []) i (a : FloatParallelArray)

    = lete = kernel.[i] * ParallelArrays.Shift(a, shifts i)ifi = 0then

    eelse

    e + convolve shifts kernel (i-1) a// Declare a 2D convolverletconvolveXY kernel input

    = // First convolve in the X direction and then in the Y directionletconvolveX = convolve (funi ->[| -i; 0|]) kernel (kernel.Length - 1) inputletconvolveY = convolve (funi ->[| 0; -i |]) kernel (kernel.Length - 1) convolveXconvolveY

    // Create a DX9 target and use it to convolve the test input usedx9Target = newDX9Target()letconvolveDX9 = dx9Target.ToArray2D (convolveXY testKernel testArray)printfn "DX9: -> \r\n%A"convolveDX90

    letconvolveXY kernel input= // First convolve in the X direction and then in Y

    letconvolveX = convolve (funi ->[| -i; 0|]) kernel(kernel.Length - 1) inputletconvolveY = convolve (funi ->[| 0; -i |]) kernel

    (kernel.Length - 1) convolveXconvolveY

  • 8/13/2019 Chalmers 2011

    79/146

    0

    5

    10

    15

    20

    25

    0 5 10 15 20 25 30 35 40 45

    speedupo

    veronec

    ore

    kernel size

    x64 multicore target benchmark for 2D convolver

    (24 core server Xeon E7540)

    6 core speedup

    12 core speedup

    18 core speedup

    24 core speedup

  • 8/13/2019 Chalmers 2011

    80/146

  • 8/13/2019 Chalmers 2011

    81/146

    Convolver

  • 8/13/2019 Chalmers 2011

    82/146

  • 8/13/2019 Chalmers 2011

    83/146

    8.249ns max delay

    3 x DSP48Es

    63 slice registers

    24 slice LUTs

  • 8/13/2019 Chalmers 2011

    84/146

  • 8/13/2019 Chalmers 2011

    85/146

  • 8/13/2019 Chalmers 2011

    86/146

  • 8/13/2019 Chalmers 2011

    87/146

  • 8/13/2019 Chalmers 2011

    88/146

  • 8/13/2019 Chalmers 2011

    89/146

    DRAM

  • 8/13/2019 Chalmers 2011

    90/146

    technology node 130nm CMOS

    (2006)

    45nm CMOS

    (2008)

    transfer 32b

    across-chip

    20 computations 57 computations

    transfer 32boff-chip 260 computations 1300 computations

    Power of Computation vs. Communication

    numbers derived from work by W. Dally, Stanford

  • 8/13/2019 Chalmers 2011

    91/146

  • 8/13/2019 Chalmers 2011

    92/146

  • 8/13/2019 Chalmers 2011

    93/146

    Virtex-6 XC6VLX240T-1

    317MHz

    Area 6%

    108 DSP48E1 (out of a768)

    2.7W (at 25C)

    1,110mega-samples per second

    cf. CUDA version on 470GTX at

    552 mega-samples per second

    (single precision)

  • 8/13/2019 Chalmers 2011

    94/146

    Kiwi Thesis

    thread

    2

    thread

    3

    thread

    1

  • 8/13/2019 Chalmers 2011

    95/146

  • 8/13/2019 Chalmers 2011

    96/146

    http://upload.wikimedia.org/wikipedia/fr/b/b4/Mono_project_logo.svg
  • 8/13/2019 Chalmers 2011

    97/146

    http://www.amazon.com/gp/product/images/0470052627/ref=dp_image_0?ie=UTF8&n=283155&s=books
  • 8/13/2019 Chalmers 2011

    98/146

  • 8/13/2019 Chalmers 2011

    99/146

  • 8/13/2019 Chalmers 2011

    100/146

    http://msdn.microsoft.com/en-us/devlabs/dd491992.aspx
  • 8/13/2019 Chalmers 2011

    101/146

    Kiwi

    structural imperative (C)parallel

    imperative

    gate-level

    VHDL/Verilog KiwiC-to-

    gates

    &0

    0

    0

    Q

    QSET

    CLR

    S

    R

    ;

    ;

    ;

    jpeg.cthread

    2

    thread

    3

    thread

    1

  • 8/13/2019 Chalmers 2011

    102/146

  • 8/13/2019 Chalmers 2011

    103/146

  • 8/13/2019 Chalmers 2011

    104/146

  • 8/13/2019 Chalmers 2011

    105/146

    Kiwi

    Library

    Kiwi.cs

    circuit

    model

    JPEG.cs

    Visual Studio

    multi-thread simulation

    debuggingverification

    Kiwi Synthesis

    circuit

    implementation

    JPEG.v

  • 8/13/2019 Chalmers 2011

    106/146

    parallel

    program

    C#

    Thread 1

    Thread 2

    Thread 3

    Thread 3

    C to

    gates

    C to

    gates

    C to

    gates

    C to

    gates

    circuit

    circuit

    circuit

    circuit

    Verilog

    for system

  • 8/13/2019 Chalmers 2011

    107/146

    Ports and Clockspublicst ticclass I2C

    { [OutputBitPort("scl")]

    st ticbool scl;[InputBitPort("sda_in")]

    st ticbool sda_in;[OutputBitPort("sda_out")]

    st ticbool sda_out;[OutputBitPort("rw")]

    st ticbool rw;

    circuit portsidentified by

    custom attribute

  • 8/13/2019 Chalmers 2011

    108/146

    publicst ticint max2(int a, int b){ int result;if(a > b)

    result = a;elseresult = b;

    returnresult;}

    .method public hidebysig staticint32max2(int32 a,

    int32 b) cil managed{// Code size 12 (0xc).maxstack 2

    .locals init ([0] int32 result)IL_0000: ldarg.0IL_0001: ldarg.1IL_0002: ble.s IL_0008

    IL_0004: ldarg.0

    IL_0005: stloc.0IL_0006: br.s IL_000a

    IL_0008: ldarg.1IL_0009: stloc.0IL_000a: ldloc.0IL_000b: ret

    }

    max2(3, 7)

    stack

    local memory

    0

    3

    7

    7

    7

  • 8/13/2019 Chalmers 2011

    109/146

    Writing to a Channel

    publicclassChannel

    {

    T datum;

    boolempty = true;

    publicvoidWrite(T v)

    {

    lock(this)

    {

    while(!empty)

    Monitor.Wait(this);

    datum = v;empty = false;

    Monitor.PulseAll(this);

    }

    }

  • 8/13/2019 Chalmers 2011

    110/146

    Reading from a Channel

    publicT Read()

    {

    T r;

    lock(this)

    {

    while(empty)

    Monitor.Wait(this);

    empty = true;

    r = datum;

    Monitor.PulseAll(this);}

    returnr;

    }

  • 8/13/2019 Chalmers 2011

    111/146

  • 8/13/2019 Chalmers 2011

    112/146

    systems level concurrency constructs

    threads, events, monitors, condition variables

    rendezvous join patternstransactional

    memory

    data

    parallelism

    userapplications

    domain specificlanguages

  • 8/13/2019 Chalmers 2011

    113/146

  • 8/13/2019 Chalmers 2011

    114/146

    Filter Example

    thread one-place

    channel

  • 8/13/2019 Chalmers 2011

    115/146

    Transposed Filter

  • 8/13/2019 Chalmers 2011

    116/146

  • 8/13/2019 Chalmers 2011

    117/146

    Inter-thread Communication and

    Synchronization

    // Create the channels to link together the taps

    for(intc = 0; c < size; c++){

    Xchannels[c] = newKiwi.Channel();

    Ychannels[c] = newKiwi.Channel();

    Ychannels[c].Write(0); // Pre-populate y-channel registers with zeros

    }

  • 8/13/2019 Chalmers 2011

    118/146

    // Connect up the taps for a transposed filter

    for(inti = 0; i < size; i++)

    {

    intj = i; // Quiz: why do we need the local j?

    ThreadtapThread = newThread(delegate() { Tap(j, weights[j],

    Xchannels[j],

    Ychannels[j],

    Ychannels[j+1]); });

    tapThread.Start();}

  • 8/13/2019 Chalmers 2011

    119/146

  • 8/13/2019 Chalmers 2011

    120/146

  • 8/13/2019 Chalmers 2011

    121/146

  • 8/13/2019 Chalmers 2011

    122/146

    t ti bli id h ()

  • 8/13/2019 Chalmers 2011

    123/146

    staticpublicvoidecho(){

    tx_sof_n = !false; // We are not at the start of a frametx_src_rdy_n = !false;

    tx_eof_n = !false; // We are not at the end of a frameboolstart = !rx_sof_n && !rx_src_rdy_n; // The start conditioninti, j;booldoneReading;

    while(true) // Process packets indefinately{

    // Wait for SOF and SRC_RDYwhile(!start)

    {Kiwi.Pause(); // Wait for a clock tickstart = !rx_sof_n && !rx_src_rdy_n; // Check for start of frame

    }// Read in the entire framei = 0;doneReading = false;

    // Read the remaining bytes

    while(!doneReading){

    if(!rx_src_rdy_n){

    buffer[i] = rx_data;i++;

    }doneReading = !rx_eof_n;Kiwi.Pause();

    }

  • 8/13/2019 Chalmers 2011

    124/146

  • 8/13/2019 Chalmers 2011

    125/146

    C#

    softprocessor

  • 8/13/2019 Chalmers 2011

    126/146

  • 8/13/2019 Chalmers 2011

    127/146

    fib :: Int -> Intfib 0 = 0fib 1 = 1

    fib n= n1 + n2wheren1 = fib (n

    1)n2 = fib (n - 2)

  • 8/13/2019 Chalmers 2011

    128/146

    STATE 1FREEPRECASE

    ds1 := dsCASEds1WHEN1 =>RETURN 1

    WHEN0 =>RETURN0

    WHENothers =>v0 := ds1 - 2RECURSE[v0] 2 [ds1]

    ENDCASESTATE 3 FREE n2n1 := resultInt

    v2 := n1 + n2RETURNv2

    STATE 2 FREE ds1n2 := resultIntv1 := ds1 - 1RECURSE[v1] 3 [n2]

  • 8/13/2019 Chalmers 2011

    129/146

  • 8/13/2019 Chalmers 2011

    130/146

  • 8/13/2019 Chalmers 2011

    131/146

  • 8/13/2019 Chalmers 2011

    132/146

  • 8/13/2019 Chalmers 2011

    133/146

  • 8/13/2019 Chalmers 2011

    134/146

  • 8/13/2019 Chalmers 2011

    135/146

  • 8/13/2019 Chalmers 2011

    136/146

    relocation viavirtualization???

  • 8/13/2019 Chalmers 2011

    137/146

    + encryption + virtualization +

    data-processing

    no standard ABIno FPGA-kernel-userspace model

    The cloud is just an extension ofexisting OS paradigms FPGAs getleft behind they lack abstractionboundaries

    Split Trust

  • 8/13/2019 Chalmers 2011

    138/146

    Split Trust

    managingphysical devicevs.

    usinga physical devicemanagement

    domain?

  • 8/13/2019 Chalmers 2011

    139/146

    FPGAs Improve Cloud Security

  • 8/13/2019 Chalmers 2011

    140/146

  • 8/13/2019 Chalmers 2011

    141/146

  • 8/13/2019 Chalmers 2011

    142/146

  • 8/13/2019 Chalmers 2011

    143/146

  • 8/13/2019 Chalmers 2011

    144/146

  • 8/13/2019 Chalmers 2011

    145/146

  • 8/13/2019 Chalmers 2011

    146/146