Mapper Machine Model for the R-Stream Compiler For Software Version 3.3.3


Preface

Contacting Reservoir Labs
Call: 212.780.0527
Fax: 212.780.0542
Email: [email protected]
Inquiry form: https://www.reservoir.com/support/r-stream-support/
To report a bug: https://www.reservoir.com/cgi-bin/bugzilla3/

Copyrights

Copyright © 2009-2013 Reservoir Labs, Inc. All rights reserved.

Disclaimer

The content of this document is provided for informational use only, is subject to change without notice, and should not be construed as a commitment by Reservoir Labs, Inc.

Mapper Machine Model

This document describes R-Stream’s hierarchical, heterogeneous mapper machine model, version 2.1.

Introduction

Since version 3 [LLM+08], the behaviour of the R-Stream compiler is driven by a mapper machine model (MMM) that supports both heterogeneous machines and hierarchical mapping. This new MMM serves two major roles in the mapper:

(i) it describes the target system architecture in terms of the relevant part of the system components, and

(ii) it describes the programming model(s) and system capabilities that are to be exploited by the mapper.

The latter point is particularly important: an architecture may support multiple execution models, but this is not apparent from the system description alone. Thus, to guide the operation of the mapper, the execution models must be stated explicitly. The MMM provides an association between the physical elements of the system and the possible mapping(s).

Overview

The machine model represents a tree of abstract processing entities (also called “abstract processors”). For instance, a PC with a multicore x86 and a GPU device is represented as a tree in Figure 1. The processing entities that are non-leaf in the tree are abstract, while the leaves are concrete (they represent an actual type of processor).

Figure 1: The machine model viewed as a tree of abstract processing entities.

Each abstract processing entity itself defines a graph that represents the structure of the underlying physical machine: its memories, communication links and processors, and how they are connected.

Consider the Cell processor, whose architecture is depicted in Figure 2. In the machine model, the Cell chip (an abstract processor) defines the way eight processors (called SPUs) and one host processor (called PPU) are connected to memories (L1 cache, L2 cache, the SPUs’ scratchpads and global memory) and DMA links, as represented in Figure 3. In this example, the Cell abstract processor itself is not represented, and both the SPUs and the PPU are concrete processors.

Details of how memories, data transfer links and processors are defined are given in the “Basic entities” section.

In addition to declaring the entities, a machine model must define how they are connected within an abstract processor and how the abstract processor is programmed. A further aspect is the role that the various processors play in the mapping process. In Cell for instance, the PPU acts as a host, which can spawn threads on the SPUs (which play the role of “processing elements” (PEs)), and on which one would typically execute all the sequential code. In the current machine model, special entities called “morphs” are responsible for describing both the machine model topology and the processor “roles.”

XML format

The external textual representation of the MMM is in Java’s properties XML format. It is basically a set of type/key/value triplets.

The rules are as follows:

• All attributes are mandatory.

• XML input files are composable and overridable into a single machine model.

• R-Stream also allows overriding of specific XML entries via the command line.
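The composition rules above can be illustrated with a minimal Python sketch. It is not R-Stream's reader: the functions `read_entries` and `compose`, the rule that later files win, and the `key=value` form of the overrides are all assumptions made for illustration, since this document does not specify the precedence order or the command-line syntax.

```python
import xml.etree.ElementTree as ET

def read_entries(xml_text):
    """Return the key/value pairs of one <properties> document as a dict."""
    root = ET.fromstring(xml_text)
    return {e.get("key"): (e.text or "").strip() for e in root.iter("entry")}

def compose(xml_texts, overrides=()):
    """Merge documents in order, then apply key=value overrides.

    Assumption: a later document overrides an earlier one, and overrides
    (standing in for command-line entries) are applied last.
    """
    model = {}
    for text in xml_texts:
        model.update(read_entries(text))
    for item in overrides:
        key, _, value = item.partition("=")
        model[key] = value
    return model

base = """<properties>
  <entry key="name">Core2Duo</entry>
  <entry key="proc.cpu.geometry">[8]</entry>
</properties>"""

site = """<properties>
  <entry key="proc.cpu.geometry">[4]</entry>
</properties>"""

model = compose([base, site], overrides=["mem.L1.size=[32K]"])
print(model)
```

Keeping the merged model as a flat dictionary mirrors the key/value nature of the properties format and keeps overriding a one-line operation.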


Figure 2: CELL Architecture.

Basic entities

The basic entities in the new machine model are (abstract) processors, memories and communication links (links for short).

Processor entities stand for computation engines, including scalar processors, SIMD processors and multiprocessors. These are usually structured as a multi-dimensional grid called its geometry. An abstract processor represents a system that may be made of several entities as a notional processor entity. It is mainly defined by its geometry and the processors it “includes”. Processors are a subclass of abstract processors and hence inherit all their attributes.

Memory entities stand for data caches, instruction caches, combined data and instruction caches, and scratchpad memories.

Link entities stand for explicit communication protocols that require software control, such as DMA.

All physical entities in a machine model are named, and multiple attributes may be attached to them. Attributes can be of the types byte, char, short, int, long, float, double, String and arrays of these. Arrays in written form are quoted by [ and ] and delimited by commas.
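As an illustration of the written array form, the following Python sketch splits an array value into its items and converts them to a requested type. The helper names `parse_array` and `parse_typed_array` are hypothetical, and treating whitespace around items as insignificant is an assumption (array values appear both with and without spaces after the commas).

```python
def parse_array(text):
    """Split a written array such as "[MEM,INT,FP4,FP8]" into its items."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("not an array: %r" % text)
    inner = text[1:-1].strip()
    # An empty pair of brackets denotes an empty array.
    return [item.strip() for item in inner.split(",")] if inner else []

def parse_typed_array(text, convert):
    """Parse an array and convert every item, e.g. with int or float."""
    return [convert(item) for item in parse_array(text)]

print(parse_array("[MEM, INT, FP4, FP8]"))
print(parse_typed_array("[4.0,4.0,4.0,0.25]", float))
print(parse_array("[]"))
```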

Figures 4 through 7 show the supported attributes for machine model entities.

When different processor entities have access to the same memory, it is sometimes convenient for the user to specify that this memory should be used as the “working memory” of one particular processor entity (potentially leaving the other one with none “of its own”). The proper_mems attribute of abstract processors specifies this.

parameter_passing defines whether parameters are passed implicitly (through the function call that spawns a thread on p) or explicitly (through a separate mechanism).


Figure 3: CELL machine model.

Key                          Attribute                                        Type
aproc.p.geometry             Processor geometry                               int[]
aproc.p.included_procs       Set of aprocs directly included in this aproc    AbstractProcessor
aproc.p.proper_mems          “Proper memories” for p                          Memory[]
aproc.p.parameter_passing    Parameter passing mode                           implicit or explicit
aproc.p.time_sharing         Whether p can time-share threads                 boolean

Figure 4: Abstract processor (aproc) attributes.

Key                             Attribute                             Type
proc.p.SIMD_width               SIMD width in bits                    int
proc.p.SIMD_alignment           SIMD alignment in bits                int
proc.p.int_registers            Number of integer registers           int
proc.p.fp_registers             Number of fp registers                int
proc.p.funit_types              Types of functional units             String[]
proc.p.funit_issues_per_cycle   Issue rates of functional units       double[]
proc.p.instruction_size         Instruction size in bytes             int
proc.p.addressable_unit         Smallest addressable unit in bytes    int

Figure 5: Processor (proc) attributes.

Key                      Attribute                                        Type
mem.m.size               Size of memory (banks) in bytes                  long[]
mem.m.bank_names         Names of the memory banks                        String[]
mem.m.cache_level        Cache level                                      int
mem.m.cache_line_size    Cache line size in bytes                         int
mem.m.tlb_miss_cost      TLB miss cost for caches                         int
mem.m.speed              A notion of memory speed                         int
mem.m.data               Whether it stores data                           boolean
mem.m.instructions       Whether it stores instructions                   boolean
mem.m.options            Used to define non-standard memory attributes    String[]

Figure 6: Memory (mem) attributes.

Supported options for memories:

host_allocated: specifies that data residing in this memory must be allocated by the host.

cuda_constant: specifies that the memory is an NVIDIA device constant memory.

Links are connections among processing elements and memories which require the use of an explicit communication protocol to operate. This includes DMA engines, higher level abstractions such as MPI [For03], or anything in between. The capabilities of such links are specified in a Link object. Currently supported link attributes are listed in Figure 7.

Key                          Attribute                         Type
link.l.strided_overhead      Overhead per message              int
link.l.strided_bandwidth     Bandwidth in bytes/cycle          int
link.l.indexed_overhead      Overhead per message              int
link.l.indexed_bandwidth     Bandwidth in bytes/cycle          int
link.l.has_local_strides     Whether it can scatter locally    boolean

Figure 7: Link attributes.

Machine model topology

In the current machine model, the “morph” entities are responsible for defining how the entities of the machine model are connected (its topology). (2)

The entities of the machine model are connected through a set of (directed) edges of different types:

data edges: specify which memories an entity gets its data from. They can go from a processor to a memory, between memories, between a memory and the data transfer link it uses to get (and put) its data, and between a data transfer link and the memory it gets (and puts) its data from.

thread edges: specify that a processor can start the execution of code (“threads”) on another processor.

control edges: specify which processor controls (i.e., has to execute commands to) a data transfer link.

Data edges are the default type of edge.
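To make the edge types concrete, here is a small Python sketch that parses written edges into structured records. The edge grammar is not specified in this document; the pattern below is inferred from the written forms `a -> b`, `a =>(thread) b`, `a =>(control) b` and annotated edges such as `L2 -> many-1 global`, so it is an assumption rather than R-Stream's actual parser. The names `Edge`, `parse_edge` and the field `ratio` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class Edge(NamedTuple):
    source: str
    kind: str              # "data", "thread" or "control"
    ratio: Optional[str]   # annotation such as "2-1" or "many-1", if any
    target: str

# "->" is a plain (data) edge; "=>(kind)" names the edge type explicitly.
_EDGE = re.compile(
    r"^(\w+)\s*(?:->|=>\((\w+)\))\s*(?:(\w+-\w+)\s+)?(\w+)$")

def parse_edge(text):
    match = _EDGE.match(text.strip())
    if match is None:
        raise ValueError("unrecognized edge: %r" % text)
    source, kind, ratio, target = match.groups()
    # Data edges are the default type when no kind is written.
    return Edge(source, kind or "data", ratio, target)

for written in ["spu -> local", "spu =>(control) dma",
                "ppu =>(thread) spu", "TLB ->2-1 L2"]:
    print(parse_edge(written))
```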

(2) In future versions, the topology will likely be part of abstract processors.

Driving the R-Stream mapper (Morphs)

Morphs drive the way programs get parallelized by R-Stream onto the machine represented by the machine model.

A “backend” is specified through a keyword that implicitly defines the programming model (API, language, various quirks of the target that can’t be expressed in the regular entities of the machine model). For instance, OpenMP, CUDA, and SWARM are supported backend names.

Also, part of the execution model is specified by assigning three special roles to abstract or concrete processors:

root: this is the abstract processor that includes everything related to this morph. This is what links a morph to its abstract processor.

processing element (PE): this is the set of processors on which parallel execution of code will happen.

host: this is the processor on which everything that doesn’t get parallelized to the processing elements runs. Very often, the host also spawns threads on the processing elements, but not necessarily.

Three different models can be specified by setting the host to:

• A processor that is neither the root nor a PE. This is the most straightforward model, where there is actually a separate host. Example: Cell.

• One of the PE(s). This specifies that there is no separate host processor, but one of the PEs can run a master thread as well as a slave thread. Example: multicore x86.

• The root. This specifies that there is no host processor, and that the non-parallelized code has to run on one of the PEs.
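The three cases above amount to a simple classification of the host setting. The following Python sketch spells it out; the function name `execution_model`, the returned labels, and the entity names `board` and `sm` in the last call are hypothetical and chosen only for illustration. Using `cell` as the root of the Cell model is likewise an assumption.

```python
def execution_model(root, host, pes):
    """Classify a morph by where its host role is assigned."""
    if host == root:
        # No host processor: sequential code runs on one of the PEs.
        return "no host"
    if host in pes:
        # One PE runs the master thread as well as a slave thread.
        return "host is one of the PEs"
    # The host is a distinct processor, as on Cell.
    return "separate host"

print(execution_model(root="cell", host="ppu", pes=["spu"]))
print(execution_model(root="PC", host="cpu", pes=["cpu"]))
print(execution_model(root="board", host="board", pes=["sm"]))
```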

Finally, morphs are structured in a hierarchy (they can have “submorphs”), which provides the aforementioned details needed by R-Stream to apply its mapping process hierarchically.

Figure 8 describes the attributes associated with a morph.

Key                             Attribute                           Type
morph.m.backend                 Name of the backend                 String
morph.m.host                    Name of host processor              String
morph.m.PEs                     Names of processing elements        String[]
morph.m.host_can_synchronize    Whether the host can synchronize    boolean
morph.m.PEs_can_synchronize     Whether the PEs can synchronize     boolean[]
morph.m.topology                A list of edges                     String[]
morph.m.options                 List of options                     String[]
morph.m.submorphs               List of submorphs                   String[]

Figure 8: Morph attributes.

At this point, R-Stream does not make any use of the options field of morphs.

In hierarchical mapping, the set of morphs forms a tree, with submorphs representing the submapping problems that remain after the higher-level mapping problems have been completed.

Modular MMM design

MMM files support the inclusion of other files through include entries. For instance, the following entry includes XML files that define a GPU’s CUDA cores (cc), streaming multiprocessors (sm) and a GPU board (expressed as an abstract processor).

    <entry key="include">[cc-proc.xml,sm-proc.xml,cuda-device-proc.xml]</entry>

An unenforced naming convention is that abstract processors (along with their relevant memories and links) are defined in files ending in -proc.xml, while whole machine models have the -mapper.xml ending.
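The include mechanism can be sketched in a few lines of Python. This is an illustration under stated assumptions, not R-Stream's implementation: `load_model` is a hypothetical name, files are represented by an in-memory dictionary instead of the file system, `example-mapper.xml` is an invented file name, and the rule that the including file overrides the files it includes is assumed.

```python
import xml.etree.ElementTree as ET

def parse_array(text):
    """Split a written array such as "[a.xml,b.xml]" into its items."""
    inner = text.strip()[1:-1].strip()
    return [item.strip() for item in inner.split(",")] if inner else []

def load_model(name, files, seen=None):
    """Load `name` from `files` (file name -> XML text), following includes."""
    seen = set() if seen is None else seen
    if name in seen:                 # guard against cyclic includes
        return {}
    seen.add(name)
    root = ET.fromstring(files[name])
    own = {e.get("key"): (e.text or "").strip() for e in root.iter("entry")}
    model = {}
    for included in parse_array(own.pop("include", "[]")):
        model.update(load_model(included, files, seen))
    model.update(own)                # assumed: the including file wins
    return model

FILES = {
    "PC-proc.xml": """<properties>
      <entry key="aproc.PC.geometry">[1]</entry>
      <entry key="aproc.PC.included_procs">[cpu]</entry>
    </properties>""",
    "example-mapper.xml": """<properties>
      <entry key="include">[PC-proc.xml]</entry>
      <entry key="aproc.PC.included_procs">[cpu,gpu]</entry>
    </properties>""",
}

model = load_model("example-mapper.xml", FILES)
print(model)
```

Resolving includes depth-first before applying the including file's own entries lets a -mapper.xml file specialize a reusable -proc.xml file by restating only the keys it needs to change.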

Examples

In this section we describe some example MMMs.

CELL

The first example is an STI CELL system [MD05, GHF+05, GHF+06] containing a host processor and 8 PEs (see Figure 2).

The system architecture is abstracted into the entity graph as shown in Figure 3.

In terms of XML, the MMM can be specified as follows. First, we declare the name of the machine model and its version number.

    <entry key="name">Cell</entry>
    <entry key="version">2.1</entry>

A typical system has only one PowerPC processor (ppu). The processor’s attributes and its memory and caches are defined as follows:

    <!-- 1 ppu -->
    <entry key="proc.ppu.geometry">[1]</entry>
    <entry key="proc.ppu.SIMD_width">128</entry>
    <entry key="proc.ppu.SIMD_alignment">128</entry>
    <entry key="proc.ppu.int_registers">32</entry>
    <entry key="proc.ppu.fp_registers">32</entry>
    <entry key="proc.ppu.funit_types">[MEM,INT,FP4,FP8]</entry>
    <entry key="proc.ppu.funit_issues_per_cycle">[1,1,1,1]</entry>
    <entry key="proc.ppu.instruction_size">4</entry>
    <entry key="proc.ppu.addressable_unit">8</entry>
    <entry key="proc.ppu.parameter_passing">[implicit]</entry>
    <entry key="proc.ppu.time_sharing">true</entry>
    <entry key="proc.ppu.proper_mems">[L1]</entry>

    <!-- global memory -->
    <entry key="mem.global.size">[64G]</entry>
    <entry key="mem.global.bank_names">[none]</entry>
    <entry key="mem.global.cache_line_size">0</entry>
    <entry key="mem.global.assoc">1</entry>
    <entry key="mem.global.tlb_miss_cost">0</entry>
    <entry key="mem.global.cache_level">-1</entry>
    <entry key="mem.global.speed">1</entry>
    <entry key="mem.global.data">true</entry>
    <entry key="mem.global.instructions">true</entry>
    <entry key="mem.global.options">[]</entry>

    <!-- ppu’s L1 cache -->
    <entry key="mem.L1.cache_line_size">32</entry>
    <entry key="mem.L1.bank_names">[none]</entry>
    <entry key="mem.L1.tlb_miss_cost">0</entry>
    <entry key="mem.L1.cache_level">1</entry>
    <entry key="mem.L1.assoc">4</entry>
    <entry key="mem.L1.size">[16K]</entry>
    <entry key="mem.L1.speed">10</entry>
    <entry key="mem.L1.data">true</entry>
    <entry key="mem.L1.instructions">false</entry>
    <entry key="mem.L1.options">[]</entry>

    <!-- ppu’s L2 cache -->
    <entry key="mem.L2.size">[128K]</entry>
    <entry key="mem.L2.bank_names">[none]</entry>
    <entry key="mem.L2.cache_level">2</entry>
    <entry key="mem.L2.cache_line_size">32</entry>
    <entry key="mem.L2.assoc">4</entry>
    <entry key="mem.L2.tlb_miss_cost">0</entry>
    <entry key="mem.L2.speed">5</entry>
    <entry key="mem.L2.data">true</entry>
    <entry key="mem.L2.instructions">false</entry>
    <entry key="mem.L2.options">[]</entry>

The CELL SPUs can be specified as follows. In this example, we assume double precision floating point is not fully pipelined. Thus it has a slower issue rate.

    <!-- 8 spus -->
    <entry key="proc.spu.geometry">[8]</entry>
    <entry key="proc.spu.SIMD_width">128</entry>
    <entry key="proc.spu.SIMD_alignment">128</entry>
    <entry key="proc.spu.int_registers">128</entry>
    <entry key="proc.spu.fp_registers">128</entry>
    <entry key="proc.spu.funit_types">[MEM,INT,FP4,FP8]</entry>
    <!-- Assume double precision floating point is not fully pipelined. -->
    <entry key="proc.spu.funit_issues_per_cycle">[4.0,4.0,4.0,0.25]</entry>
    <entry key="proc.spu.instruction_size">4</entry>
    <entry key="proc.spu.addressable_unit">8</entry>
    <entry key="proc.spu.parameter_passing">[implicit]</entry>
    <entry key="proc.spu.time_sharing">false</entry>
    <entry key="proc.spu.proper_mems">[local]</entry>

    <!-- spu’s local memory -->
    <entry key="mem.local.size">[256K]</entry>
    <entry key="mem.local.bank_names">[none]</entry>
    <!-- cache level=-1 means scratchpad -->
    <entry key="mem.local.cache_level">-1</entry>
    <entry key="mem.local.cache_line_size">0</entry>
    <entry key="mem.local.assoc">1</entry>
    <entry key="mem.local.tlb_miss_cost">0</entry>
    <entry key="mem.local.speed">5</entry>
    <entry key="mem.local.data">true</entry>
    <entry key="mem.local.instructions">true</entry>
    <entry key="mem.local.options">[]</entry>
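The effect of the SPU issue rates can be checked with a short Python calculation. It assumes that funit_types and funit_issues_per_cycle are matched by position and that FP8 denotes double precision floating point (suggested by the pipelining remark above, but not stated explicitly); the function name `cycles_per_issue` is hypothetical.

```python
# Values copied from the SPU description above.
funit_types = ["MEM", "INT", "FP4", "FP8"]
issues_per_cycle = [4.0, 4.0, 4.0, 0.25]

def cycles_per_issue(types, rates):
    """Invert each issue rate to get the cycles between two issues."""
    return {unit: 1.0 / rate for unit, rate in zip(types, rates)}

spu = cycles_per_issue(funit_types, issues_per_cycle)
print(spu["FP4"])  # 0.25 cycles per issue, i.e. four issues per cycle
print(spu["FP8"])  # 4.0 cycles per issue
```

Under these assumptions, an issue rate of 0.25 means that an FP8 operation can be issued only once every four cycles, sixteen times less often than an FP4 operation.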

The DMA engine and its capabilities are specified as follows. Currently, our runtime system can bypass many of the Cell DMA hardware limitations; therefore such constraints are currently not specified in the MMM.

    <!-- DMA engine -->
    <entry key="link.dma.strided_overhead">256</entry>
    <entry key="link.dma.strided_bandwidth">1024</entry>
    <entry key="link.dma.indexed_overhead">256</entry>
    <entry key="link.dma.indexed_bandwidth">1024</entry>
    <entry key="link.dma.preferred_size_multiple">16</entry>
    <entry key="link.dma.preferred_alignment">16</entry>
    <entry key="link.dma.has_local_strides">false</entry>
    <entry key="link.dma.asynchronous">true</entry>
    <entry key="link.dma.tag_type">int</entry>
    <entry key="link.dma.tag_memory">local</entry>
    <entry key="link.dma.options">[]</entry>

The abstract processor made of the SPUs, the PPU and their memories is also declared so that other machine models could include this one (as part of a larger machine model).

    <!-- Entire cell chip represented as an abstract processor -->
    <entry key="aproc.cell.geometry">[1]</entry>
    <entry key="aproc.cell.included_procs">[spu,ppu]</entry>
    <entry key="aproc.cell.proper_mems">[global]</entry>
    <entry key="aproc.cell.parameter_passing">[explicit]</entry>
    <entry key="aproc.cell.time_sharing">false</entry>

Finally, we define the Cell morph. In this case only one level of mapping is required. The topology specifies that in this system the SPU should command the DMA engine, while the PPU controls the threading on the SPUs. Note that on the Cell it is possible for the PPU to initiate DMA transfers; however, this capability is omitted from this MMM.

    <!--
      Mapping strategy.
      Only one level of mapping is needed.
    -->
    <entry key="morph.cell.backend">Cell</entry>
    <entry key="morph.cell.host">ppu</entry>
    <entry key="morph.cell.PEs">[spu]</entry>
    <entry key="morph.cell.host_can_synchronize">true</entry>
    <entry key="morph.cell.PEs_can_synchronize">[true]</entry>
    <entry key="morph.cell.topology">[
        spu -> local,
        ppu -> L1,
        L1 -> L2,
        L2 -> global,
        global -> dma,
        local -> dma,
        <!-- SPU controls the dma -->
        spu =>(control) dma,
        <!-- PPU commands the SPU threads -->
        ppu =>(thread) spu
    ]</entry>
    <entry key="morph.cell.options">[]</entry>
    <entry key="morph.cell.submorphs">[]</entry>

Intel Core2 Duo

The next example shows the MMM for a system consisting of four dual-core Intel processors. The graphical representation of the MMM is shown in Figure 9.

Figure 9: Core2Duo machine model.

The XML equivalent of the graph is represented in three files:

    <!-- core2duo-mapper.xml -->
    <properties>
    <comment/>
    <entry key="name">Core2Duo</entry>
    <entry key="version">2.1</entry>

    <!-- use a core2duo processor -->
    <entry key="include">[core2duo-proc.xml,PC-proc.xml]</entry>

    <!-- The machine that contains the multi-core CPU -->
    <entry key="aproc.PC.included_procs">[cpu]</entry>

    <!--
      Mapping strategy: only one level of mapping is needed.
      We called this mapping strategy "smp".
      The host processor and the processing elements are the same.
    -->
    <entry key="morph.smp.backend">OpenMP</entry>
    <entry key="morph.smp.host">cpu</entry>
    <entry key="morph.smp.root">PC</entry>
    <entry key="morph.smp.PEs">[cpu]</entry>
    <entry key="morph.smp.host_can_synchronize">true</entry>
    <entry key="morph.smp.PEs_can_synchronize">[true]</entry>
    <entry key="morph.smp.submorphs">[]</entry> <!-- no submappings needed -->
    <entry key="morph.smp.topology">[
        cpu -> L1,
        cpu -> I,
        L1 -> TLB,
        TLB ->2-1 L2, <!-- Two cores share the L2 cache -->
        I -> 2-1 L2, <!-- Two cores share the L2 cache -->
        L2 -> many-1 global,
        PC -> global
    ]</entry>
    <entry key="morph.smp.options">[smp]</entry>
    </properties>

    <!-- core2duo-proc.xml -->
    <properties>
    <comment/>
    <entry key="name">Core2Duo</entry>
    <entry key="version">2.1</entry>

    <!-- CPUs -->
    <entry key="proc.cpu.geometry">[8]</entry>
    <entry key="proc.cpu.int_registers">32</entry> <!-- Intel hack -->
    <entry key="proc.cpu.fp_registers">32</entry> <!-- Intel hack -->
    <entry key="proc.cpu.instruction_size">4</entry>
    <entry key="proc.cpu.SIMD_width">128</entry>
    <entry key="proc.cpu.SIMD_alignment">128</entry>
    <entry key="proc.cpu.funit_types">[MEM, INT, FP4, FP8]</entry>
    <entry key="proc.cpu.funit_issues_per_cycle">[2, 2, 2, 2]</entry>
    <entry key="proc.cpu.addressable_unit">8</entry>
    <entry key="proc.cpu.parameter_passing">[implicit]</entry>
    <entry key="proc.cpu.time_sharing">true</entry>
    <entry key="proc.cpu.proper_mems">[L1]</entry>

    <!-- L1 cache -->
    <entry key="mem.L1.cache_level">1</entry>
    <entry key="mem.L1.cache_line_size">32</entry>
    <entry key="mem.L1.tlb_miss_cost">0</entry>
    <entry key="mem.L1.size">[16K]</entry>
    <entry key="mem.L1.assoc">8</entry>
    <entry key="mem.L1.bank_names">[none]</entry>
    <entry key="mem.L1.speed">5</entry>
    <entry key="mem.L1.data">true</entry>
    <entry key="mem.L1.instructions">false</entry>
    <entry key="mem.L1.options">[]</entry>

    <!-- L2 cache for both data and instructions -->
    <entry key="mem.L2.cache_level">2</entry>
    <entry key="mem.L2.size">[6M]</entry> <!-- 4 x 6M L2 cache -->
    <entry key="mem.L2.bank_names">[none]</entry>
    <entry key="mem.L2.cache_line_size">64</entry>
    <entry key="mem.L2.assoc">24</entry>
    <entry key="mem.L2.tlb_miss_cost">192</entry>
    <entry key="mem.L2.data">true</entry>
    <entry key="mem.L2.speed">10</entry>
    <entry key="mem.L2.instructions">true</entry>
    <entry key="mem.L2.options">[]</entry>

    <!-- I-cache -->
    <entry key="mem.I.cache_level">1</entry>
    <entry key="mem.I.cache_line_size">32</entry>
    <entry key="mem.I.tlb_miss_cost">0</entry>
    <entry key="mem.I.size">[16K]</entry> <!-- 16K -->
    <entry key="mem.I.assoc">8</entry>
    <entry key="mem.I.bank_names">[none]</entry>
    <entry key="mem.I.speed">10</entry>
    <entry key="mem.I.data">false</entry>
    <entry key="mem.I.instructions">true</entry>
    <entry key="mem.I.options">[]</entry>

    <!-- TLB, configured with 4KB pages -->
    <entry key="mem.TLB.cache_level">3</entry>
    <entry key="mem.TLB.cache_line_size">4096</entry>
    <entry key="mem.TLB.tlb_miss_cost">0</entry>
    <entry key="mem.TLB.size">[1M]</entry> <!-- 16K -->
    <entry key="mem.TLB.assoc">4</entry>
    <entry key="mem.TLB.bank_names">[none]</entry>
    <entry key="mem.TLB.speed">10</entry>
    <entry key="mem.TLB.data">false</entry>
    <entry key="mem.TLB.instructions">true</entry>
    <entry key="mem.TLB.options">[]</entry>
    </properties>

    <!-- PC-proc.xml -->
    <properties>
    <comment/>
    <entry key="name">Generic PC box</entry>
    <entry key="version">2.1</entry>

    <!-- The abstract processor -->
    <entry key="aproc.PC.geometry">[1]</entry>
    <!-- this field is usually what you want to change when including this file -->
    <entry key="aproc.PC.included_procs">[cpu]</entry>
    <entry key="aproc.PC.proper_mems">[global]</entry>
    <entry key="aproc.PC.parameter_passing">[explicit]</entry>
    <entry key="aproc.PC.time_sharing">true</entry>

    <!-- Global memory -->
    <entry key="mem.global.size">[4G]</entry> <!-- 4GB -->
    <entry key="mem.global.bank_names">[none]</entry>
    <entry key="mem.global.cache_level">-1</entry>
    <entry key="mem.global.cache_line_size">0</entry>
    <entry key="mem.global.assoc">1</entry>
    <entry key="mem.global.tlb_miss_cost">0</entry>
    <entry key="mem.global.speed">1</entry>
    <entry key="mem.global.data">true</entry>
    <entry key="mem.global.instructions">true</entry>
    <entry key="mem.global.options">[]</entry>
    </properties>

FPGA

The final example shows a system with an Intel x86 processor attached to a single FPGA device (see Figure 10). In this system, the cpu is responsible both for controlling the DMA transfers between the host memory and the FPGA memories, and for threading control of the FPGA device.

Figure 10: FPGA machine model.

The XML equivalent of the graph is:

    <entry key="name">Dummy FPGA</entry>
    <entry key="version">2.0</entry>

    <!-- The FPGA computation part -->
    <entry key="proc.fpga.geometry">[1]</entry>
    <!--
      These are actually not quite applicable to the fpga,
      but we are faking it right now.
    -->
    <entry key="proc.fpga.int_registers">256</entry>
    <entry key="proc.fpga.fp_registers">256</entry>
    <entry key="proc.fpga.instruction_size">4</entry>
    <entry key="proc.fpga.SIMD_width">128</entry>
    <entry key="proc.fpga.SIMD_alignment">128</entry>
    <entry key="proc.fpga.funit_types">[MEM, INT, FP4, FP8]</entry>
    <entry key="proc.fpga.funit_issues_per_cycle">[1, 1, 1, 1]</entry>

    <!-- The memory on board of the fpga -->
    <entry key="mem.fpga_mem.size">[128K,128K]</entry>
    <entry key="mem.fpga_mem.bank_names">[ram0,ram1]</entry>
    <entry key="mem.fpga_mem.cache_level">-1</entry>
    <entry key="mem.fpga_mem.cache_line_size">0</entry>
    <entry key="mem.fpga_mem.tlb_miss_cost">0</entry>
    <entry key="mem.fpga_mem.data">true</entry>
    <entry key="mem.fpga_mem.instructions">false</entry>

    <!-- The host processor -->
    <entry key="proc.cpu.geometry">[1]</entry>
    <entry key="proc.cpu.int_registers">32</entry>
    <entry key="proc.cpu.fp_registers">32</entry>
    <entry key="proc.cpu.instruction_size">4</entry>
    <entry key="proc.cpu.SIMD_width">128</entry>
    <entry key="proc.cpu.SIMD_alignment">128</entry>
    <entry key="proc.cpu.funit_types">[MEM, INT, FP4, FP8]</entry>
    <entry key="proc.cpu.funit_issues_per_cycle">[2, 2, 2, 2]</entry>

    <!-- Global memory -->
    <entry key="mem.global_mem.size">[4G]</entry> <!-- 4GB -->
    <entry key="mem.global_mem.bank_names">[none]</entry>
    <entry key="mem.global_mem.cache_level">-1</entry>
    <entry key="mem.global_mem.cache_line_size">0</entry>
    <entry key="mem.global_mem.tlb_miss_cost">0</entry>
    <entry key="mem.global_mem.data">true</entry>
    <entry key="mem.global_mem.instructions">true</entry>

    <!-- L1 cache -->
    <entry key="mem.L1.cache_level">1</entry>
    <entry key="mem.L1.cache_line_size">32</entry>
    <entry key="mem.L1.tlb_miss_cost">0</entry>
    <entry key="mem.L1.size">[16K]</entry>
    <entry key="mem.L1.bank_names">[none]</entry>
    <entry key="mem.L1.data">true</entry>
    <entry key="mem.L1.instructions">false</entry>

    <!-- L2 cache for both data and instructions -->
    <entry key="mem.L2.cache_level">2</entry>
    <entry key="mem.L2.size">[24M]</entry> <!-- 4 x 6M L2 cache -->
    <entry key="mem.L2.bank_names">[none]</entry>
    <entry key="mem.L2.cache_line_size">32</entry>
    <entry key="mem.L2.tlb_miss_cost">0</entry>
    <entry key="mem.L2.data">true</entry>
    <entry key="mem.L2.instructions">true</entry>

    <!-- I-cache -->
    <entry key="mem.I.cache_level">1</entry>
    <entry key="mem.I.cache_line_size">32</entry>
    <entry key="mem.I.tlb_miss_cost">0</entry>
    <entry key="mem.I.size">[16K]</entry> <!-- 16K -->
    <entry key="mem.I.bank_names">[none]</entry>
    <entry key="mem.I.data">false</entry>
    <entry key="mem.I.instructions">true</entry>

    <!-- DMA engine -->
    <entry key="link.dma.strided_overhead">256</entry>
    <entry key="link.dma.strided_bandwidth">1024</entry>
    <entry key="link.dma.indexed_overhead">256</entry>
    <entry key="link.dma.indexed_bandwidth">1024</entry>
    <entry key="link.dma.has_local_strides">false</entry>

    <!-- Mapping strategy. -->
    <entry key="morph.fpga.backend">MPA</entry>
    <entry key="morph.fpga.host">cpu</entry>
    <entry key="morph.fpga.PEs">[fpga]</entry>
    <entry key="morph.fpga.host_can_synchronize">true</entry>
    <entry key="morph.fpga.PEs_can_synchronize">[false]</entry>
    <entry key="morph.fpga.options">[FPGA]</entry>
    <entry key="morph.fpga.topology">[
        cpu -> L1,
        cpu -> I,
        L1 -> L2,
        I -> L2,
        L2 -> global_mem,
        fpga -> fpga_mem,
        global_mem -> dma,
        fpga_mem -> dma,
        <!-- cpu controls the dma engine -->
        cpu =>(control) dma,
        <!-- cpu controls the fpga -->
        cpu =>(thread) fpga
    ]</entry>
    <entry key="morph.fpga.submorphs">[]</entry>

Java interface

The machine model’s Java interface includes a reader for these XML files, classes for each entity and tools for walking and inspecting the machine model.

References

[For03] Message Passing Interface Forum. MPI-2: Extensions to the message-passing interface. Technical report, November 2003.

[GHF+05] M. Gschwind, P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and T. Yamazaki. A novel SIMD architecture for the CELL heterogeneous chip-multiprocessor. In Hot Chips 17: A Symposium on High Performance Chips, August 2005.

[GHF+06] M. Gschwind, P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and T. Yamazaki. Synergistic processing in CELL’s multicore architecture. IEEE Micro, pages 10–24, March 2006.

[LLM+08] R. Lethin, A. Leung, B. Meister, P. Szilagyi, N. Vasilache, and D. Wohlford. Final report on the R-Stream 3.0 compiler. DARPA/AFRL Contract # F03602-03-C-0033, DTIC AFRL-RI-RS-TR-2008-160. Technical report, Reservoir Labs, Inc., May 2008.

[MD05] Dominic Mallison and Mark Deloura. CELL: A new platform for digital entertainment. http://research.scea.com/research/html/cellgdc05/index.html, 2005.