
MESH

User’s Manual

The MESH Group
Carnegie Mellon University

Pittsburgh, PA 15213
USA

http://www.ece.cmu.edu/~mesh

September 21, 2006

© 2003-2006 Carnegie Mellon University.

Contents

1 Installing MESH 4
1.1 Requirements 4
1.2 Installing MESH 4
1.2.1 Installing From Source 4
1.2.2 Installing a Binary Distribution 4
1.3 Installing the MESH Viewer 5
2 Tutorial 6
2.1 Application: Matrix Multiplication 7
2.2 Creating an Initial MESH Model 7
2.2.1 Basic Building Blocks 7
2.2.2 Annotations for Software Timing: Consume Calls 9
2.2.3 Running the Simulation and Viewing the Results 11
2.3 Design Exploration: Changing the Model 13
2.3.1 Adding an Extra Processor 13
2.3.2 Simulating Architecture Heterogeneity 16
2.3.3 Creating Multiple Schedulers 19
2.3.4 Simulating Polling Behavior 20
2.3.5 Custom Scheduling Strategies 22
2.4 Interrupt Modeling and Pre-emptive Scheduling 25
2.4.1 Interrupt Modeling 26
2.4.2 Pre-emptive Scheduling 28
2.4.3 Lightweight Consume Calls 34
2.4.4 Custom Interrupt Controller and DMA Example 36
2.5 Shared Resource Modeling 43
2.5.1 Modeling Communication in MESH 43
2.5.2 Simple Bus-Based Example 45
2.5.3 Changing Blocking Modes 50
2.6 Conclusion 51
3 The MESH Viewer 54
3.1 Using the MESH Viewer 54
3.2 Viewer Features 55
3.3 Known Issues 56
4 API Reference 57
4.1 mesh_kernel.h 62
4.2 mesh_syscalls.h 73
4.3 mesh_comm.h 87
4.4 mesh_testbench.h 89
4.5 mesh_utils.h 90
4.6 mesh_def_interrupts.h 92
4.7 mesh_def_resources.h 93
4.8 mesh_def_schedulers.h 94
4.9 mesh_trace.h 96
4.10 mesh_energy.h 97
4.11 mesh_def_energy_resources.h 103
4.12 mesh_def_energy_schedulers.h 104

1 Installing MESH

1.1 Requirements

The MESH framework relies on the glib-1.2 utility library. This library is installed by default on nearly all Linux distributions and is available on many UNIX installations as well. If your site does not have this library, it can be found at ftp://ftp.gtk.org/pub/gtk/v1.2/

1.2 Installing MESH

To install MESH, first unpack the distribution tarball:

tar xzvf mesh-version.tar.gz

1.2.1 Installing From Source

Move into the newly created directory and run the configure script with any site-specific options necessary. If you wish to use the mesh_init(1) function call to turn on detailed debug information, you must pass the --enable-verbose=yes flag to the configure script. Configuring with the --enable-verbose flag on will slow down the simulation significantly, even if mesh_init(0) is used. Verbosity is turned off by default.

cd mesh-version
./configure [--enable-verbose=yes]

At this point the configure script will determine the location of the necessary tools and libraries and create Makefiles to build the framework. Finally, to build the framework:

make

The framework and several example applications should be built. Once this is done, the MESH build can be tested by executing the “playground” example binary:

examples/playground/playground

If the program executes with no errors, your MESH compilation is complete. To use this new installation, include the header files in include/ and link with the library src/libmesh.a.

1.2.2 Installing a Binary Distribution

When installing from a binary distribution, there is no need to run the configure script. A Makefile for your platform will still exist, but it will build only the examples and the supporting model elements; the pre-built libraries are provided inside the src directory instead. libmesh.a is the default version of the simulator, whereas libmesh-debug.a is a version configured with --enable-verbose=yes. Use libmesh-debug.a for debugging and libmesh.a for speed of execution.

To test the binary distribution, build the examples as described in the “Installing From Source” section and try out the “playground” example.

1.3 Installing the MESH Viewer

The MESH distribution includes a standalone, portable simulation trace viewer written in Java. The MESH Viewer requires a Sun Java VM 1.4 to run. The ./configure script looks for the appropriate JVM version and warns if this requirement is not met.

The MESH Viewer is run using the viewer script located inside the viewer directory of the MESH distribution. To make the MESH Viewer runnable from anywhere, copy the viewer script to any directory in your PATH and set the MESH_VIEWER environment variable to the viewer directory:

Using the bash shell:
export MESH_VIEWER=[mypath]/mesh-version/viewer

Using the csh shell:
setenv MESH_VIEWER [mypath]/mesh-version/viewer

Note: mypath is the full path to the directory where the MESH distribution was unpacked/installed.

For more information on the MESH Viewer, see section 3.

2 Tutorial

The MESH (Modeling Environment for Software and Hardware) Simulator is a tool for high-level (above instruction-set simulation) modeling and performance estimation of heterogeneous Systems on Chips (SoCs). The MESH modeling environment grew out of the concepts of Frequency Interleaving, a previous research effort of this group. The goal of the MESH modeling environment is to facilitate design space exploration by helping the designer answer common early design questions such as: “Do I have enough processing resources for the application at hand?”, “What kind of processors, and how many, should I put in my system?”, and “How does the choice of interconnect affect the performance of the system?”

The MESH simulator is a compiled simulator written in C, i.e., the simulated architecture description, the application, and the simulator kernel itself are all compiled into a single executable (Figure 2.1). As such, the application and the architecture to be simulated are specified by the designer within a .c file with the help of several MESH API function calls. Note that the application specification can include the application's source code, but does not have to. At compile time, the application and architecture specifications are linked with the MESH simulation kernel and the models necessary for execution.

Figure 2.1: MESH Simulator tool flow. (The architecture specification and the application specification, each a .c file, are compiled and linked with the pre-compiled MESH simulation kernel and the pre-compiled resource and scheduler model libraries into an executable simulation.)

This tutorial uses the example of a simple parallel matrix multiplication application and shows how to build a MESH model of this application's performance on a heterogeneous multiprocessor. After creating the initial model, the tutorial shows how a MESH model can be manipulated in order to answer key questions about the design. However, this tutorial does not show how to obtain computation consumption values for the model, nor does it suggest a design methodology. Keep in mind that matrix multiplication was chosen because of its widespread familiarity and does not represent a typical application, or set of applications, that MESH was designed to simulate. We would like to show more meaningful examples, such as speech recognition or MPEG encoding; however, the limited scope of this tutorial does not allow for a detailed and complicated application specification.

2.1 Application: Matrix Multiplication

Matrix multiplication is a good tutorial example because it is well known and easily parallelizable. It is not, however, a perfect example, since it is too finely grained to represent real-world MESH models. Early in this tutorial we will consider a simple multiplication of a 4×4 matrix by a 4×1 matrix (a column vector):

$$
\begin{pmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\
a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \\
a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\
a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4}
\end{pmatrix}
\times
\begin{pmatrix} b_{1,1} \\ b_{2,1} \\ b_{3,1} \\ b_{4,1} \end{pmatrix}
=
\begin{pmatrix}
a_{1,1} b_{1,1} + a_{1,2} b_{2,1} + a_{1,3} b_{3,1} + a_{1,4} b_{4,1} \\
a_{2,1} b_{1,1} + a_{2,2} b_{2,1} + a_{2,3} b_{3,1} + a_{2,4} b_{4,1} \\
a_{3,1} b_{1,1} + a_{3,2} b_{2,1} + a_{3,3} b_{3,1} + a_{3,4} b_{4,1} \\
a_{4,1} b_{1,1} + a_{4,2} b_{2,1} + a_{4,3} b_{3,1} + a_{4,4} b_{4,1}
\end{pmatrix}
$$

The parallel version of this algorithm follows a boss-worker model, in which the boss thread creates a worker thread for each row of the A matrix. Each worker thread is then responsible for calculating the result for its row and returning that result to the boss.
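To make the decomposition concrete, here is a minimal sequential sketch in plain C, with no MESH calls. The names worker_row and boss, the row-major a[row][col] layout, and the MAXROWS/MAX_COLS values of 4 are illustrative assumptions, not the contents of matrix_mul1.c. The per-row function is what each worker thread computes; the boss loop stands in for spawning one worker per row and collecting the results.

```c
#include <assert.h>

#define MAXROWS  4  /* assumed: rows of A, one worker per row */
#define MAX_COLS 4  /* assumed: columns of A, entries of b */

/* Hypothetical input data, stored row-major as a[row][col]. */
static int a[MAXROWS][MAX_COLS];
static int b[MAX_COLS];

/* The work of one worker: dot product of row `row` of A with b. */
static int worker_row(int row)
{
    assert(row >= 0 && row < MAXROWS);
    int result = 0;
    for (int col = 0; col < MAX_COLS; col++)
        result += a[row][col] * b[col];   /* one MUL and one ADD per step */
    return result;
}

/* The boss: one worker per row, results collected in row order.
 * Here the "workers" are ordinary calls that run one after another. */
static void boss(int answer[MAXROWS])
{
    for (int row = 0; row < MAXROWS; row++)
        answer[row] = worker_row(row);
}
```

In the MESH model that follows, each worker becomes a separate software thread, and the time it takes is described only by its consume calls.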

2.2 Creating an Initial MESH Model

To represent the above application within the MESH modeling environment, we must map the application and the underlying architecture onto the layered model employed by MESH. Figure 2.2 shows the MESH layered view of the matrix multiplication application, consisting of a software layer on top of hardware resources with a scheduling layer in between. The functionality of the application is modeled through the boss thread and 4 worker threads within the software (or logical thread) layer. These threads are scheduled for execution by a simple round-robin scheduler. Finally, all these threads run on top of a single physical hardware resource. After building this simple model, we will add hardware resources to the system and observe their impact on performance.

2.2.1 Basic Building Blocks

To start building the model described by Figure 2.2, install the MESH package as described in Section 1. Under the examples/tutorial directory you will find the matrix_mul1.c file, which contains the code used in this tutorial example. You can build the example by typing make in the tutorial directory and run it by typing matrix_mul1. Let us start by looking at the main routine within matrix_mul1.c:

63 int main()
64 {
65     mesh_feature_list *cfl;
66
67     mesh_init(0);
68
69     cfl = mesh_create_feature_list();
70     cfl = mesh_feature_add(cfl, "ADD", 1);
71     cfl = mesh_feature_add(cfl, "MUL", 0.5);
72
73     default_sched = mesh_create_scheduler("default_sched",
74                                           mesh_scheduler_rr);
75
76     mesh_create_thread("boss", default_sched, boss, NULL);
77
78     mesh_create_resource("resource1", cfl,
79                          default_sched, mesh_resource_default,
80                          1);
81
82     mesh_trace_init(0, 1000);
83
84     mesh_kernel();
85
86     mesh_trace_print("out");
87
88     mesh_cleanup();
89
90     return (0);
91 }

Figure 2.2: Layered view of the parallelized matrix multiplication application. (Software layer: Boss, Worker 1 ... Worker 4; scheduler layer: Round Robin Scheduler; hardware resource layer: Execution Resource.)

Before any MESH constructs are used, mesh.h must be included, and a call must be made to the mesh_init() routine (line 67; details on pg. 69), which initializes the simulator data structures and sets several defaults. The mesh_init() routine takes a verbosity argument in the form of an integer. Set verbosity to 1 to turn on printing of debug information during the simulation. Printing debugging output significantly decreases the speed of simulation. Note that debugging output support must be enabled when MESH is installed (see Section 1.2.1 for more detail). We will skip the description of lines 69-71 for now.

The MESH scheduler layer allocates the software tasks for execution onto hardware resources. It also serves as the layer that ties the software and hardware threads together (mesh_create_scheduler(): details on pg. 64). In line 73, a new scheduler named “default_sched” is created; it is of type mesh_scheduler_rr and implements round-robin scheduling of software threads onto hardware.

Next, in line 76, we create the boss software thread (mesh_create_thread(): details on pg. 66). When any software thread is created, it must be tied to a scheduler, which allows the software thread to be placed onto a hardware resource and executed. In this case, we tie the boss thread to the only scheduler available, default_sched. Since the boss thread will assess the size of the input matrices and spawn off worker threads, it is not necessary to create any other threads at this time.

The next step in creating a simple MESH model is to create a model of an execution resource (i.e., a processor) on which our software threads will run (line 78; mesh_create_resource(): details on pg. 64). In this case, we name our resource “resource1” and tie it to default_sched as well. mesh_resource_default is a function pointer to a resource handler function that resolves software thread computational complexity into physical time. Further discussion of the behavior and creation of resource handler functions is beyond the scope of this tutorial. The last argument of mesh_create_resource represents the default computational power of the resource. We will say more about computational power and how it is consumed in the next section.

We will skip the description of lines 82 and 86, leaving them for later in this tutorial. At this point, we have shown how to place the basic building blocks of the model. The subsequent call to mesh_kernel starts the simulation kernel and runs the simulation. The mesh_cleanup function frees the memory used during the simulation.

2.2.2 Annotations for Software Timing: Consume Calls

One of the key features of high-level concurrent modeling of hardware and software is properly estimating the timing of software running on various hardware resources. We introduce the concept of a “consume call”, which is a designer-inserted annotation stating the amount of computation consumed by a software thread since the last consume call. The designer can insert consume calls at any point in the software code, effectively separating the program execution into regions and specifying the complexity of each region. Each of these annotation regions is treated atomically by the MESH kernel during simulation. It is important to realize that the only information about software threads passed through the simulator is contained within consume calls. Therefore, it is possible to build MESH models in the absence of full source code; instead, the software thread functionality is represented by consume calls with no code in between.

Let us look at the boss thread implementation within matrix_mul1.c:

26 void *boss(void *arg)
27 {
28     int i;
29     void *ret;
30     char name[255];
31     int answer[MAXROWS];
32     mesh_thread *workers[MAXROWS];
33
34     //spawn off a worker thread for each row
35     for (i = 0; i < MAXROWS; i++) {
36         sprintf(name, "worker%d", i);
37         workers[i] = mesh_create_thread(name,
38                                         default_sched,
39                                         worker,
40                                         (void *) i);
41         //overhead of spawning a thread
42         mesh_consume_str("5");
43     }
44
45     //wait for results
46     for (i = 0; i < MAXROWS; i++) {
47         mesh_thread_join(workers[i], &ret);
48         answer[i] = (int) ret;
49     }
50
51     //print results
52     for (i = 0; i < MAXROWS; i++)
53         printf("row %d: %d\n", i, answer[i]);
54
55     //overhead of joining threads and
56     //output of results
57     mesh_consume_str("10");
58
59     return NULL;
60 }

The first thing to note here is that every software thread must be encapsulated in a function taking a void * as an argument and returning a void *. In lines 36 through 40, the boss thread creates a name for each worker thread and spawns off a new one for each row of matrix A. Each worker thread is passed an integer with the row number it should work on.

After each worker thread is created, the designer inserts a mesh_consume_str call (mesh_consume_str(): details on pg. 74) that specifies the overhead of all computation executed since the previous consume call. In this case, the consume call specifies that spawning off a single thread consumes 5 default computational units. Notice that when creating resource1 we specified the default computational power of the resource to be 1 (line 80). This means that resource1 can compute 1 computational unit per simulation cycle; i.e., the mesh_consume_str on line 42 will tie up resource1 for 5 simulation cycles per thread spawned. Since no units are associated with computational power or complexity, the designer is free to select any level of granularity appropriate to the application. The relation between the computational complexity included within consume calls and the computational power of the execution resource is usually determined via profiling or designer intuition.

After starting the worker threads, the boss thread waits for them to complete using the mesh_thread_join function (line 47; details on pg. 82). After all the results are collected from the worker threads, the application outputs the final result and consumes a final value of 10 (line 57).

However, representing the computational complexity of software as a single scalar value is not enough to distinguish between different types of computation, some of which certain processors handle better than others. In this example, the code for the worker threads uses such “multidimensional” or “multi-feature” consume calls:

12 void *worker(void *arg) {
13     int i;
14     int row = (int) arg;
15     int result = 0;
16
17     //perform matrix multiplication for this row
18     for (i = 0; i < MAX_COLS; i++) {
19         result += matrixA[i][row] * matrixB[i];
20         mesh_consume_str("ADD=1:MUL=1");
21     }
22
23     return (void *) result;
24 }

The consume call on line 20 uses 2 features, consuming one addition and one multiplication. Unlike the consume calls in the boss thread, it is easy to see how these values can be set through designer intuition, since the code on line 19 performs exactly one multiplication and one addition.


Even though the worker threads are the first to use multi-feature consume calls, they do not show how to specify which features exist in the system, or how resources differentiate between them. For that information, it is necessary to go back to the code we skipped within the main routine:

69     cfl = mesh_create_feature_list();
70     cfl = mesh_feature_add(cfl, "ADD", 1);
71     cfl = mesh_feature_add(cfl, "MUL", 0.5);

On line 69 a new feature list is created (mesh_create_feature_list(): details on pg. 63). This creates a blank feature list to which individual features are added. mesh_feature_add() (details on pg. 68) takes three arguments: the already created feature list, the name of the feature to add, and the power of the feature. The feature power value describes the amount of computational complexity that the processor can handle during one unit of simulation time. In this case, we are creating a processor that can perform one addition per cycle and one multiplication every two cycles. On line 78 this feature list is passed to the resource creation function.

Therefore, each processor may have its own feature list that specifies the computational power of the processor on a per-feature basis. The “default feature”, such as the one used by the consume calls in the boss thread, is meant for simple definitions of computational complexity that do not require new feature lists. The designer is free to use the default feature, a custom feature list, or both in conjunction to specify the computational complexity of software and its timing.
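A hypothetical sketch of how a consume string could be resolved against a feature list is shown below. It assumes entries separated by ':', each either NAME=amount for a named feature or a bare number for the default feature, and it sums amount / power over all entries; this reproduces the tutorial's arithmetic (one ADD at power 1 plus one MUL at power 0.5 gives 3 cycles). struct feature, feature_power and consume_cycles are illustrative names; this is not MESH's actual parser or its mesh_feature_list type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a feature list entry: a feature name and
 * how many units of it the resource can consume per cycle. */
struct feature {
    const char *name;
    double power;
};

/* Power of the feature whose name is the first `len` chars of `name`;
 * returns a negative value if the resource does not define it. */
static double feature_power(const struct feature *list, size_t n,
                            const char *name, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (strlen(list[i].name) == len && strncmp(list[i].name, name, len) == 0)
            return list[i].power;
    return -1.0;
}

/* Resolve a consume string such as "ADD=1:MUL=1" or "5" into cycles:
 * "NAME=amount" costs amount / power(NAME), a bare number costs
 * amount / default_power, and the costs of all entries are summed. */
static double consume_cycles(const char *spec, const struct feature *list,
                             size_t n, double default_power)
{
    assert(default_power > 0.0);
    double total = 0.0;
    const char *p = spec;
    while (*p != '\0') {
        const char *end = strchr(p, ':');
        if (end == NULL)
            end = p + strlen(p);                 /* last entry */
        const char *eq = memchr(p, '=', (size_t)(end - p));
        if (eq == NULL) {
            total += strtod(p, NULL) / default_power;   /* default feature */
        } else {
            double power = feature_power(list, n, p, (size_t)(eq - p));
            assert(power > 0.0);   /* feature must exist on this resource */
            total += strtod(eq + 1, NULL) / power;
        }
        p = (*end == ':') ? end + 1 : end;
    }
    return total;
}
```

Applied to the worker loop, four iterations of "ADD=1:MUL=1" on resource1 give 4 × 3 = 12 cycles per worker thread.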

2.2.3 Running the Simulation and Viewing the Results

Now that the basic MESH model is complete, we will run our first simulation in order to understand how to view its results.

MESH Simulation Kernel - compiled on Dec 2 2003 19:08:30

row 0: 0
row 1: 0
row 2: 0
row 3: 0

MESH Final Simulation Time = 78.000000

Resource usage:
resource1: 78.000000
Thread usage:
worker3: 12.000000, contended 0.000000
boss: 30.000000, contended 0.000000
worker0: 12.000000, contended 0.000000
worker1: 12.000000, contended 0.000000
worker2: 12.000000, contended 0.000000

The output between the “MESH Simulation Kernel” and “MESH Final Simulation Time” lines is printed by the boss thread. In this case, the answers are all 0 since we did not provide any input values to the application. When we deal with the construction of testbenches later in this tutorial, we will feed this example with data. Below the answers is the total simulation time, followed by the utilization time of every resource in the system. In this case we have only one resource, so its utilization is equal to the total simulation time. Below the resource(s) are the runtimes of every software thread within the system. Since we have only one processor, every thread is executed sequentially; therefore, the total simulation time is the sum of all thread runtimes. The contended field signifies the amount of time the thread spent waiting to access a shared resource. We will ignore the contended field until the later section on shared resource modeling.

To get a better picture of what is going on within the simulation, we will use the MESH simulator's ability to export simulation traces and view them with the Java-based MESH Viewer. Going back to the main routine, we skipped the description of several lines of code:

82     mesh_trace_init(0, 1000);
83
84     mesh_kernel();
85
86     mesh_trace_print("out");

The mesh_trace_init function (details on pg. 96) must be called before the simulation kernel is run. Since traces of large simulations can be extremely large, the arguments of mesh_trace_init specify the start and stop times between which the simulation trace will be collected.

After the simulation trace is collected, mesh_trace_print (details on pg. 96) creates a text file readable by the Java-based MESH Viewer. The only argument to mesh_trace_print specifies the name of the output file. To view the trace output file, follow the installation instructions for the Java MESH Viewer found in Section 1. Provided that you set your paths and environment variables as described there, the command viewer out should show a graph similar to the one in Figure 2.3.

Figure 2.3: Output of the simulation shown in the MESH Viewer.

Figure 2.3 shows the simulation output, representing the interleaving of the various threads on resource1 as well as their physical timings. In the previous section we saw that the boss thread consumes 5 default computational complexity points for spawning off each worker thread. Since the default computational power of resource1 is 1, the physical timing of this section of code resolves to 5 simulation time units. Therefore, the first red section of the boss thread in Figure 2.3 is drawn between times 0 and 5. The simulation timing is labeled by the yellow timeline ruler above the trace.


Note that the scale slider is set to 10⁻¹; therefore, all ruler values should be divided by 10. At each consume call, the scheduler is given an opportunity to run and make a scheduling decision. At t = 5, default_sched has two threads to choose from: the boss thread and the newly created worker0 thread. Since default_sched is a round-robin scheduler and the boss thread just ran, the scheduler schedules worker0 to run on resource1.

The thread worker0 runs until its first consume call is reached, consuming a single MUL processor feature and a single ADD processor feature. The resource1 powers for the MUL and ADD features are 0.5 and 1, respectively. Therefore, it takes 2 simulation cycles to consume a MUL and a single simulation cycle to consume an ADD, setting the total runtime of this annotation region to 3 simulation cycles. In Figure 2.3 this is drawn as the blue worker0 block between t = 5 and t = 8. The scheduler runs again at t = 8, and this time it schedules the boss thread.

The second execution of the boss thread creates the worker1 thread, which is considered and scheduled at the next scheduling decision (t = 13). The simulation continues, interleaving the execution of all threads until t = 38, at which point the boss thread has created all of the worker threads and lies dormant until all of them complete. This happens at t = 68, when the boss thread is woken up; the simulation completes 10 cycles later.

2.3 Design Exploration: Changing the Model

Now that we understand the basics of model creation, we will exercise the most important and powerful feature of the MESH modeling environment: efficient high-level design exploration.

2.3.1 Adding an Extra Processor

One of the questions commonly asked early in the design cycle is “How does the addition of extra hardware resources affect the performance of the system as a whole?” We will modify our example from the previous section, duplicating the resource1 processor by creating resource2 (Figure 2.4). For now, we will assume that the processors communicate via shared memory and that communication is latency- and contention-free. Later in the tutorial we will show how to create MESH models for various interconnect methods.


Figure 2.4: Adding an additional hardware resource. (The Round Robin Scheduler in the scheduler layer now dispatches threads to Resource 1 and Resource 2 in the hardware resource layer.)


Since both hardware resources still receive their software threads from the same scheduler, we don't need to make any changes to the software or scheduler layers. The only action required is to duplicate the resource creation code. We have done so in matrix_mul2.c in the examples/tutorial directory:

78     mesh_create_resource("resource1", cfl,
79                          default_sched, mesh_resource_default,
80                          1);
81
82     mesh_create_resource("resource2", cfl,
83                          default_sched, mesh_resource_default,
84                          1);

Let us compile matrix_mul2.c and run the simulation to determine the impact of the second resource on the overall system runtime.

MESH Simulation Kernel - compiled on Dec 2 2003 19:08:30

row 0: 0
row 1: 0
row 2: 0
row 3: 0

Resource usage:
resource1: 32.000000
resource2: 46.000000
Thread usage:
worker3: 12.000000, contended 0.000000
boss: 30.000000, contended 0.000000
worker0: 12.000000, contended 0.000000
worker1: 12.000000, contended 0.000000
worker2: 12.000000, contended 0.000000

Figure 2.5: Output of the simulation with an extra homogeneous processor. (matrix_mul2.c)


As can be seen from the simulation output above, the thread usage has not changed, but since there are now two resources in the system, the resource usage is dramatically different. Even though we did not simulate the communication overhead between the two processors, the overall system runtime was not cut in half (single-processor runtime = 78.0). Additionally, one of the processors is more heavily utilized than the other (resource1 runs for 32.0 cycles and resource2 for 46.0 cycles). What exactly is going on? This is where the trace output in Figure 2.5 becomes especially useful.

Immediately at t = 0 we notice that something is not right. At t = 0 only the boss thread should be eligible to run. Instead, worker0 gets to run on resource1. Remember, the boss thread is supposed to create the worker threads at a delay of 5 time units per thread. So how can worker0 be scheduled for execution when the boss thread that creates it has not yet finished its first region? The answer to this question brings up a fundamental property of consume calls: all functionality placed in front of a consume call executes in zero time, before the delay of the consumption is applied. Figure 2.6 attempts to show this graphically.

Figure 2.6: Graphical representation of consume regions and code execution.

Therefore, even though the consume call is meant to represent the computational delay of the functionality, this functionality executes before the delay is applied to the system. Let us look back at the way the boss thread spawns new workers:

34     //spawn off a worker thread for each row
35     for (i = 0; i < MAXROWS; i++) {
36         sprintf(name, "worker%d", i);
37         workers[i] = mesh_create_thread(name,
38                                         default_sched,
39                                         worker,
40                                         (void *) i);
41         //overhead of spawning a thread
42         mesh_consume_str("5");
43     }

Since the consume call of 5 units (line 42) happens after mesh_create_thread is called (line 37), the system is aware of the thread creation before the 5 units of computation are consumed. Therefore, the worker0 thread is visible to the system at t = 0.0, worker1 at t = 5.0, worker2 at t = 10.0, and so on. However, what if the actual behavior of the system required new threads to become available at the end of each boss thread region? In that case, we must use a different version of the thread creation function, mesh_create_thread_delayed (details on pg. 66). This function moves the thread creation to after the consume call advances system time, even though the thread creation call is placed before the consume call. Just like thread creation, many other system calls can be moved to after the consume call advances system time; they all carry the _delayed suffix (see the API reference for a full list). To implement the change described above, we create a new tutorial file, matrix_mul3.c, and adjust the boss behavior:

34     //spawn off a worker thread for each row
35     for (i = 0; i < MAXROWS; i++) {
36
37         sprintf(name, "worker%d", i);
38         workers[i] = mesh_create_thread_delayed(name,
39                                                 default_sched,
40                                                 worker,
41                                                 (void *) i);
42         mesh_consume_str("5");
43     }

The output in Figure 2.7 shows the effect of this change on the system.

Figure 2.7: Output of the simulation with thread creation events moved to the end of the consume block. (matrix_mul3.c)

Unlike in Figure 2.5, the worker0 thread is not available for scheduling until t = 5.0, worker1 until t = 10.0, worker2 until t = 15.0, and so on. Although the introduction of _delayed system calls does not change the performance of this system significantly (matrix_mul3.c is three simulation cycles slower), the designer must still be aware of this behavior, because it may have a significant impact on performance depending on the particular situation.
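The difference between the two creation calls can be summarized in a small model. It is a sketch under the assumption that the boss runs its spawn regions back to back without interruption, as the visibility times above imply; worker_visible_at is a hypothetical helper, not a MESH function.

```c
#include <assert.h>

/* Hypothetical model: spawn region i of the boss starts at i * spawn_cycles.
 * - mesh_create_thread: the call executes in zero time at the start of
 *   the region, so worker i becomes visible when its region starts.
 * - mesh_create_thread_delayed: creation is deferred until the consume
 *   call has advanced time, so worker i becomes visible when its region ends. */
static double worker_visible_at(int i, double spawn_cycles, int delayed)
{
    assert(i >= 0 && spawn_cycles >= 0.0);
    double region_start = i * spawn_cycles;
    return delayed ? region_start + spawn_cycles : region_start;
}
```

With 5-cycle spawn regions this gives 0, 5, 10, ... for mesh_create_thread and 5, 10, 15, ... for mesh_create_thread_delayed, the two sequences described for Figures 2.5 and 2.7.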

Now that we have a correct simulation, we can spend some time analyzing the system. The main reason this code cannot be perfectly distributed across multiple processors is that the boss thread is sequential. Because of this, resource1 sits idle at the beginning and at the end of the simulation. Therefore, the performance of the above system is limited by the parallelism available in the application.
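One rough way to quantify this limit, which is not a MESH feature, is a lower bound on the finishing time: no schedule on P identical resources of power 1 can finish before the larger of (total work / P) and the application's critical path, which here is the boss's spawn regions, the last worker's 12 cycles, and the boss's final 10-cycle region. The sketch below is a hypothetical helper that ignores scheduling and communication overheads, so simulated runs can only be equal to or longer than the value it returns.

```c
#include <assert.h>

/* Hypothetical lower bound on the final simulation time of the
 * boss/worker application on `nres` identical resources of power 1.
 * spawn: cycles per spawn region, worker: cycles per worker thread,
 * finish: cycles of the boss's final region, delayed: nonzero if
 * workers are created with the _delayed call. */
static double lower_bound(int nrows, double spawn, double worker,
                          double finish, int nres, int delayed)
{
    assert(nrows > 0 && nres > 0);
    double total_work = nrows * spawn + nrows * worker + finish;

    /* Critical path: the last worker cannot start before it is visible,
     * the boss cannot join before it has finished spawning, and the
     * final region follows the join. */
    double last_visible = (nrows - 1) * spawn + (delayed ? spawn : 0.0);
    double spawn_done = nrows * spawn;
    double last_worker_done = last_visible + worker;
    double join_done = last_worker_done > spawn_done ? last_worker_done
                                                     : spawn_done;
    double critical_path = join_done + finish;

    double work_bound = total_work / nres;
    return work_bound > critical_path ? work_bound : critical_path;
}
```

With one resource the bound is the full 78 cycles; with two it is 39 cycles for immediate creation and 42 with _delayed creation. The 46 cycles that resource2 is busy in the matrix_mul2.c output already exceed the two-resource bound, so the actual schedule adds further delay beyond what the application structure alone imposes.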

2.3.2 Simulating Architecture Heterogeneity

Until now, we have modeled systems that are either uniprocessor or homogeneous, a task many other simulators can perform. One of the MESH simulator's strengths is its ability to easily handle systems with a varying number of heterogeneous processing elements. We extend the system view from matrix_mul3.c to the one in Figure 2.8 by adding two resources: a slower (lower power) processor, and a DSP processor containing a multiply-accumulate (MAC) instruction. With these changes we can get an idea of how physical resources with varying computational powers and features affect the system runtime.


Figure 2.8: Creating a heterogeneous system. (A single round robin scheduler controls four hardware resources: Resource 1, Resource 2, a slow resource, and a resource with MAC.)

Starting with the matrix_mul3.c description, we add two additional processors (creating matrix_mul4.c):

86  mesh_create_resource("resource1", cfl,
87                       default_sched, mesh_resource_default,
88                       1);
89
90  mesh_create_resource("resource2", cfl,
91                       default_sched, mesh_resource_default,
92                       1);
93
94  mesh_create_resource("slow_resource", cfl_slow,
95                       default_sched, mesh_resource_default,
96                       0.5);
97
98  mesh_create_resource("resource_MAC", cfl_mac,
99                       default_sched, mesh_resource_default,
100                      1);

Let us look at slow_resource first. We define slow_resource to be identical to resource1 and resource2, except that its operating frequency is halved. Therefore, we halve all of slow_resource's computational powers, both the default computational power (line 96) and the powers in its custom feature list. To do the latter, we define a new feature list named cfl_slow:

75 cfl_slow=mesh_feature_add(cfl_slow, "ADD", 0.5);
76 cfl_slow=mesh_feature_add(cfl_slow, "MUL", 0.25);

The second heterogeneous resource added is resource_MAC, which features a multiply-accumulate (MAC) instruction. The MAC instruction is commonly found in DSP processors; it multiplies two values and adds the product to an accumulator, all in one cycle. Other than the MAC feature, resource_MAC is identical to resource1 and resource2. We create a new feature list named cfl_mac that defines the MAC instruction on resource_MAC:


79 cfl_mac=mesh_feature_add(cfl_mac, "MAC", 1);

Additionally, the logical threads must use the new MAC feature during computation. Therefore, we change the worker threads to include the new feature:

17 //perform matrix multiplication for this row
18 for (i=0; i<MAX_COLS; i++) {
19   result += matrixA[i][row] * matrixB[i];
20   mesh_consume_str("ADD=1:MUL=1:MAC=1");
21 }

With this change (line 20), resource_MAC will only understand the “MAC=1” part of the consume call, ignoring ADD and MUL since it has no entries in its feature list for those. Similarly, all other processors will ignore the “MAC=1” part of the consume call since they do not include a definition for the MAC feature. Therefore, by using multi-dimensional consume calls and different feature lists for individual processors, it is possible to model a wide range of heterogeneous architectures, each with its own strengths and weaknesses.

Figure 2.9: Output of the heterogeneous architecture simulation for an 8x8 matrix. (matrix_mul4.c)

Let us look at the performance impact of this change, shown in Figure 2.9. Since the overall compute power of the system has increased significantly, we also increase the algorithm workload, making the A matrix 8 × 8 (this change is visible at the top of the matrix_mul4.c file). This increases the number of worker threads accordingly and lengthens the part of the boss thread that spawns new threads.

The slower operating frequency of slow_resource is visible in the length of the thread annotation blocks executing on it: a boss annotation block that takes 5 simulation time units on resource1 takes 10 on slow_resource. On the other hand, the MAC instruction of resource_MAC allows it to complete a single worker loop iteration in only one simulation cycle, as opposed to the 3 cycles needed by resource1. The breaks in the processor traces show the times when processors are idle because no thread is available to run.
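The cycle counts above are consistent with a simple cost model in which each resource charges amount/power for every feature in the consume string that appears in its own feature list, ignores the rest, and charges bare numbers such as "5" against its default power. The sketch below is a hypothetical reconstruction of that model, not MESH's implementation; feature, resource_model, and consume_time are invented names. The cfl_slow and cfl_mac values follow the listings above, while the powers for resource1 (ADD = 1, MUL = 0.5) are assumed: they reproduce the 3 cycles quoted for resource1 and are exactly double the cfl_slow values.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of one feature and its computational power. */
typedef struct {
    const char *name;
    double power;
} feature;

/* Hypothetical model of a resource: default power plus a feature list. */
typedef struct {
    double default_power;
    const feature *features;
    int n_features;
} resource_model;

/* cfl_slow and cfl_mac follow the listings; cfl_r1 powers are ASSUMED. */
static const feature cfl_r1[]   = {{"ADD", 1.0}, {"MUL", 0.5}};
static const feature cfl_slow[] = {{"ADD", 0.5}, {"MUL", 0.25}};
static const feature cfl_mac[]  = {{"MAC", 1.0}};

const resource_model resource1     = {1.0, cfl_r1, 2};
const resource_model slow_resource = {0.5, cfl_slow, 2};
const resource_model resource_MAC  = {1.0, cfl_mac, 1};

/* Power of the feature named by the first len chars of name, or 0 if absent. */
static double feature_power(const resource_model *r, const char *name, size_t len) {
    for (int i = 0; i < r->n_features; i++)
        if (strlen(r->features[i].name) == len &&
            strncmp(r->features[i].name, name, len) == 0)
            return r->features[i].power;
    return 0.0;
}

/* Time charged for a consume string such as "5" or "ADD=1:MUL=1:MAC=1". */
double consume_time(const resource_model *r, const char *spec) {
    assert(r != NULL && spec != NULL);
    if (strchr(spec, '=') == NULL) {              /* plain amount: default power */
        assert(r->default_power > 0.0);
        return strtod(spec, NULL) / r->default_power;
    }
    double total = 0.0;
    const char *p = spec;
    while (*p != '\0') {
        const char *eq = strchr(p, '=');
        if (eq == NULL)
            break;                                /* malformed tail: stop */
        char *end;
        double amount = strtod(eq + 1, &end);
        double power = feature_power(r, p, (size_t)(eq - p));
        if (power > 0.0)                          /* features the resource lacks are ignored */
            total += amount / power;
        p = (*end == ':') ? end + 1 : end;        /* move to the next NAME=VALUE item */
    }
    return total;
}
```

Under these assumptions the model charges 5 and 10 units for a boss block on resource1 and slow_resource, and 3 units versus 1 for one worker iteration on resource1 versus resource_MAC (6 units on slow_resource).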

2.3.3 Creating Multiple Schedulers

As can be seen in Figure 2.9, the task distribution onto resources is not optimal; by changing the scheduling strategy we can squeeze more performance out of this system. Notice that at times during execution, the boss thread is scheduled onto resource_MAC. While resource_MAC is executing the boss thread, other resources are executing worker threads. Ideally, the situation should be reversed: resource1 or resource2 should execute the boss thread (because they are just as fast at it as resource_MAC), and resource_MAC should execute only worker threads, where it can take advantage of its MAC unit. Additionally, since the boss thread is a bottleneck to parallel execution, it should not be placed onto slow_resource.

To solve this problem, we designate resource1 as a “control” processor whose job is to run the boss thread, which spawns the worker threads for the other processors. We separate our heterogeneous system shown in Figure 2.8 into two scheduling domains, each with its own round robin scheduler. This conceptually simulates a multiprocessor system in which two different operating systems make scheduling decisions for separate parts of the architecture. The new layered view of the model is shown in Figure 2.10.



Figure 2.10: Heterogeneous system with two schedulers. (Hardware resources: Resource 1, Resource 2, a slow resource, and a resource with MAC.)

The matrix_mul5.c file contains the previous example with the addition of an extra scheduler. We name this scheduler control_sched (for the scheduler running on the control processor) in line 81 below:

81 control_sched=mesh_create_scheduler("control_sched",
82                                     mesh_scheduler_rr);
83
84 default_sched=mesh_create_scheduler("default_sched",
85                                     mesh_scheduler_rr);
86
87 mesh_create_thread("boss", control_sched, boss, NULL);
88
89 mesh_create_resource("resource1", cfl,
90                      control_sched, mesh_resource_default,
91                      1);


Figure 2.11: Performance of the split-scheduled system (matrix_mul5.c)

Notice that the boss thread and resource1 are now connected to the control_sched scheduler (lines 87 and 90, respectively). All the worker threads remain on default_sched, as in the previous examples.

Figure 2.11 shows the output of the system with two schedulers. The operation of the first round robin scheduler on resource1 is trivial, since it has only one task to consider. The other scheduler runs on the remaining three resources, scheduling tasks on resource_MAC, resource2, and slow_resource. Because we statically split the scheduling decisions between two schedulers, neither scheduler can see the idle resources of the other, even when it has threads available to run. This situation causes an under-utilization of resource1 from t = 40.0 to t = 55.0. It would have been useful to place some of the worker tasks on resource1 during this period to further decrease the runtime of the system, a solution we will consider in a later section. Still, splitting the schedulers did provide a small increase in system performance, speeding it up by 1 time unit.

2.3.4 Simulating Polling Behavior

Up to this point, the example system assumed an infrastructure through which one thread can signal its completion to another thread. The boss thread relies on this assumption when it learns about worker thread completion through the mesh_thread_join construct. In real systems this is often not the case; instead, processors poll flags in memory to determine whether a certain task has completed. For example, polling is commonly used to determine whether a shared resource is available or whether data in input ports is ready to be read. In matrix_mul6.c, we adjust the example from the previous section to implement polling for worker thread completion.

We introduce a shared done variable for each worker thread, which will be used to signal thread completion. When the worker thread is done with all of its computation, it will set the done flag, write its final answer, and exit:


15 void *worker(void *arg) {
16   int i, temp;
17   int row = (int) arg;
18   int result=0;
19
20   //perform matrix multiplication for this row
21   for (i=0; i<MAX_COLS; i++) {
22     result += matrixA[i][row] * matrixB[i];
23     //when done, set the flag and propagate answer
24     if (i==(MAX_COLS-1)) {
25       //sets done[row] flag to 1
26       temp = 1;
27       mesh_memcpy_delayed(&done[row], &temp, sizeof(int));
28
29       //sets the result of the worker thread
30       mesh_memcpy_delayed(&answer[row], &result, sizeof(int));
31     }
32     mesh_consume_str("ADD=1:MUL=1:MAC=1");
33   }
34   return NULL;
35 }

As in Section 2.3.1, where the created threads were visible to the scheduler before the computation penalty for their creation was applied, we must make sure that the value of the done flag is updated after the computation penalty of the consume call is applied. In that section, thread creation was delayed until the end of the consume call timing block using the function mesh_create_thread_delayed. Similarly, in this case we need to delay the propagation of the done and answer values until the consume call has been applied. This situation is analogous to double buffering in traditional HDLs, where the value of a gate must be set after the gate's signal propagation delay.

In lines 27 and 30, the mesh_memcpy_delayed function (details on pg. 77) schedules a copy of memory to occur at the end of the next consume call delay. For example, in line 27 the function copies the value of temp (which is set to 1) to the appropriate done location, essentially setting the done flag.

Instead of using mesh_thread_join, the boss thread continuously polls the done variable at a constant period. The polling period is set by inserting a consume call that acts as a delay between consecutive polls. In this case we set the consume value to 1, which makes the polling period 1 simulation time unit (since the default resource power of resource1 is 1). The following lines of the boss thread have been changed to implement the polling behavior:

60 //wait for results
61 for (i=0; i<MAXROWS; i++) {
62   while (done[i]==0)
63     mesh_consume_str("1"); //polling period
64 }

As seen in Figure 2.12, the result is identical to the one in Figure 2.11 (matrix_mul5.c), except that between t = 40.0 and t = 55.0 the control processor (resource1) is busy polling for the completion of the worker threads. This result would be much more interesting if resource1 were not dedicated to the boss thread. If other threads shared resource1, those threads might be running at the moment the done flag is set, so the boss thread would be unable to act on the signal immediately, increasing the total execution time. Such effects are important in polling systems and should be considered during modeling.
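The cost of polling can be estimated with a small calculation. Assume, hypothetically, that the boss first checks the flag at time t0, re-checks every period P, and that a check made exactly when the flag is set already sees it (how MESH orders two events at the same instant is not covered here). Completion at time d is then noticed at the first poll time t0 + kP that is not earlier than d, so the detection lag is always less than one period. The helpers below (poll_detect_time, poll_latency) only illustrate that arithmetic and are not part of MESH.

```c
#include <assert.h>

/* Time at which a poller that checks a flag at t0, t0 + P, t0 + 2P, ...
   first sees a flag that was set at time d. Assumes P > 0 and that a
   check made exactly at time d already sees the new value. */
double poll_detect_time(double t0, double period, double d) {
    assert(period > 0.0);
    if (d <= t0)
        return t0;                          /* already set at the first check */
    long k = (long)((d - t0) / period);     /* whole periods elapsed (truncated) */
    if (t0 + (double)k * period < d)
        k++;                                /* round up to the next poll */
    return t0 + (double)k * period;
}

/* Delay between the flag being set and the poller noticing it. */
double poll_latency(double t0, double period, double d) {
    return poll_detect_time(t0, period, d) - d;
}
```

With the 1-unit polling period used in matrix_mul6.c, the lag stays below one time unit; if the boss shared resource1 with other threads, the effective time between polls, and therefore the lag, could grow.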


Figure 2.12: Results of the polling implementation (matrix_mul6.c).

2.3.5 Custom Scheduling Strategies

However, many systems provide some kind of interrupt capability that allows signaling between threads in the manner used by matrix_mul5.c through mesh_thread_join(). Let us assume such a system again and return to the single-scheduler strategy of Section 2.3.2. Unlike Section 2.3.3, where we attempted to increase system performance by splitting the scheduling of the system in two, we will explore creating a single custom scheduler that performs similarly. Since resource_MAC is much more proficient at executing worker tasks, let us adjust the scheduling strategy to steer the boss thread onto resource1 and resource2, allowing resource_MAC to execute worker threads exclusively. Also, since the boss thread is the sequential bottleneck of our application, we will forbid it from running on slow_resource as well.

To accomplish this, we create a new scheduling algorithm within matrix_mul7.c, naming the new scheduling function my_scheduler. The job of a scheduling algorithm is to look at all the idle resources and the threads eligible to run at the current simulation time, and to match threads to resources. Within the MESH simulation environment, this functionality is defined by a function that takes in a scheduler data structure (of type mesh_scheduler) and outputs a list of resource-thread pairs (of type mesh_resource_thread_pair). This is illustrated in Figure 2.13.

All lists are implemented using the GLib 1.2 API (http://www.gtk.org/api/). Since my_scheduler is an extension of the basic round-robin scheduler, we will describe the round-robin functionality first, and then deal with the custom additions.

Figure 2.13: Applying the scheduling algorithm. (The scheduling algorithm reads the mesh_scheduler data, i.e. the idle resources and the eligible threads, and produces a GSList of mesh_resource_thread_pair elements, each pairing one resource with one thread.)

13 // Every scheduler function takes in a pointer to itself and
14 // returns a list of resource-thread pairings to be executed.
15 GSList *my_scheduler(mesh_scheduler *cs)
16 {
17   //rt_pair is an individual resource-thread pairing
18   mesh_resource_thread_pair *rt_pair;
19   //rt_pair_list is a list of resource-thread pairings
20   GSList *rt_pair_list = NULL;
21   GSList *idle_res, *to_free, *iter;
22   mesh_resource *resource;
23   mesh_thread *thread;
24
25   //check that a valid mesh_scheduler pointer is passed
26   g_assert(cs!=NULL);
27
28   //find idle resources controlled by this scheduler
29   idle_res=mesh_sched_find_idle_resources(cs);

83   //match up remaining threads in arbitrary fashion
84   while ((idle_res!=NULL)&&(cs->eligible_threads!=NULL)) {
85     rt_pair = g_malloc(sizeof(mesh_resource_thread_pair));
86     rt_pair->thread=cs->eligible_threads->data;
87     rt_pair->resource=idle_res->data;
88
89     //remove the current matched thread and resource
90     to_free=cs->eligible_threads;
91     cs->eligible_threads=
92       g_slist_remove_link(cs->eligible_threads, to_free);
93     g_slist_free_1(to_free);
94
95     to_free=idle_res;
96     idle_res=g_slist_remove_link(idle_res, to_free);
97     g_slist_free_1(to_free);
98
99     //place this resource-thread pairing onto the list
100    rt_pair_list=g_slist_prepend(rt_pair_list, rt_pair);
101  }
102
103  //destroy the remaining items on idle resource list
104  g_slist_free(idle_res);
105  return rt_pair_list;
106 }

The mesh_scheduler data structure passed to the scheduler contains all the information necessary to make scheduling decisions, such as the lists of idle resources and eligible threads. Calling mesh_sched_find_idle_resources() (details on pg. 94) returns a GSList of resources that are ready to receive tasks (line 29). The break in the code between lines 29 and 83 contains the changes to the round-robin scheduler, which are discussed later in this section. Starting on line 84, the scheduler steps through each idle resource and assigns an eligible thread to run on it. A GSList of eligible threads (threads that are neither running nor blocked) is available in cs->eligible_threads. Once the resource and its task are packaged in the rt_pair structure (lines 86 and 87), the resource and the selected task are removed from the idle_res and cs->eligible_threads lists, respectively (lines 90-97). Finally, the rt_pair structure is added to the list (line 100) that is returned to the simulation kernel (line 105). Lines 31-81 extend this round-robin behavior:

31   //find resource_MAC in the idle resource list and assign a thread
32   //other than boss
33   idle_res=mesh_get_resource_by_name("resource_MAC",
34                                      idle_res,
35                                      &resource);
36   if (resource != NULL) {
37     for (iter=cs->eligible_threads; iter;
38          iter=g_slist_next(iter)) {
39       thread=iter->data;
40       //check if not boss
41       if (strcmp(thread->name, "boss")){
42         rt_pair = g_malloc(sizeof(mesh_resource_thread_pair));
43         rt_pair->thread=thread;
44         rt_pair->resource=resource;
45
46         //place this resource-thread pairing onto the list
47         rt_pair_list=g_slist_prepend(rt_pair_list, rt_pair);
48
49         //remove the thread from the eligible list
50         cs->eligible_threads=
51           g_slist_remove_link(cs->eligible_threads, iter);
52         g_slist_free_1(iter); break; //free the unlinked node and stop
53       }
54     }
55   }
56
57   //find slow_resource in the idle resource list and assign a thread
58   //other than boss
59   idle_res=mesh_get_resource_by_name("slow_resource",
60                                      idle_res,
61                                      &resource);
62   if (resource != NULL) {
63     for (iter=cs->eligible_threads; iter;
64          iter=g_slist_next(iter)) {
65       thread=iter->data;
66       //check if not boss
67       if (strcmp(thread->name, "boss")){
68         rt_pair = g_malloc(sizeof(mesh_resource_thread_pair));
69         rt_pair->thread=thread;
70         rt_pair->resource=resource;
71
72         //place this resource-thread pairing onto the list
73         rt_pair_list=g_slist_prepend(rt_pair_list, rt_pair);
74
75         //remove the thread from the eligible list
76         cs->eligible_threads=
77           g_slist_remove_link(cs->eligible_threads, iter);
78         g_slist_free_1(iter); break; //free the unlinked node and stop
79       }
80     }
81   }

The round-robin behavior is modified so that resource_MAC and slow_resource are assigned threads other than the boss thread before the round-robin scheduling occurs. In line 33, resource_MAC is removed from the idle list. If it is not present in the idle list (i.e., the resource is not idle), mesh_get_resource_by_name() (details on pg. 94) places NULL in the resource pointer; if resource_MAC is idle, the function places a pointer to it in resource. The code then steps through each thread in the eligible thread list, matching resource_MAC with the first thread not named boss (lines 37-54). The same process repeats for slow_resource in lines 59-81. When the round-robin code is reached in line 84, resource_MAC and slow_resource have already been removed from the idle resource list. Finally, we change the default_sched created in lines 177-178 to use my_scheduler instead of mesh_scheduler_rr.

177 default_sched=mesh_create_scheduler("default_sched",
178                                     my_scheduler);


Figure 2.14: Behavior of the new scheduler. (matrix_mul7.c)

With the new scheduler in place, we run the matrix_mul7.c example, generating the output in Figure 2.14. As can be seen in the figure, changing the scheduling strategy to avoid running the boss thread on resource_MAC and slow_resource reduces the overall system runtime to 64.0, down from 66.0 in Figure 2.9. However, the empty periods in the trace output show that we are still far from an optimal schedule for this system. As system complexity grows, and with data-dependent thread runtimes (unlike in this simple example), finding good scheduling strategies becomes more difficult. Although trivial, this example illustrates the options available for creating custom scheduling strategies within the MESH modeling environment. For more information, please see the custom scheduler creation section in the MESH user's manual.
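Stripped of the GLib list handling, the policy implemented by my_scheduler fits in a few lines. The sketch below is a hypothetical, array-based restatement of it, not MESH code: rt_pair, match_threads, and the use of plain name strings for resources and threads are assumptions for illustration. It first gives resource_MAC and slow_resource the first eligible thread that is not boss, then pairs whatever remains in list order, as the round-robin loop does.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *resource;
    const char *thread;
} rt_pair;

/* Removes element i from an array of n strings by shifting the tail left. */
static void remove_at(const char **v, int *n, int i) {
    memmove(&v[i], &v[i + 1], (size_t)(*n - i - 1) * sizeof v[0]);
    (*n)--;
}

/* Returns the index of name in v, or -1 if absent. */
static int find(const char **v, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(v[i], name) == 0)
            return i;
    return -1;
}

/* Matches idle resources to eligible threads, writing pairs to out (which
   must have room for the smaller of n_idle and n_eligible). Both input
   arrays are modified in place. Returns the number of pairs made. */
int match_threads(const char **idle, int n_idle,
                  const char **eligible, int n_eligible, rt_pair *out) {
    static const char *const special[] = {"resource_MAC", "slow_resource"};
    int n_out = 0;
    assert(idle != NULL && eligible != NULL && out != NULL);

    /* Step 1: special resources only ever receive non-boss threads. As in
       my_scheduler, a special resource is taken off the idle list even if
       no suitable thread exists, so it simply stays idle this round. */
    for (int s = 0; s < 2; s++) {
        int r = find(idle, n_idle, special[s]);
        if (r < 0)
            continue;                          /* resource is not idle */
        const char *res = idle[r];
        remove_at(idle, &n_idle, r);
        for (int t = 0; t < n_eligible; t++) {
            if (strcmp(eligible[t], "boss") != 0) {
                out[n_out].resource = res;
                out[n_out].thread = eligible[t];
                n_out++;
                remove_at(eligible, &n_eligible, t);
                break;
            }
        }
    }

    /* Step 2: plain in-order matching of whatever remains. */
    while (n_idle > 0 && n_eligible > 0) {
        out[n_out].resource = idle[0];
        out[n_out].thread = eligible[0];
        n_out++;
        remove_at(idle, &n_idle, 0);
        remove_at(eligible, &n_eligible, 0);
    }
    return n_out;
}
```

With all four resources idle and boss, worker0, and worker1 eligible, this yields resource_MAC with worker0, slow_resource with worker1, and resource1 with boss; if boss is the only eligible thread, resource_MAC stays idle rather than running it.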

2.4 Interrupt Modeling and Pre-emptive Scheduling

So far our simple scheduling models have all lacked an important feature of real schedulers: the ability to interrupt and pre-empt running threads. These schedulers implement “cooperative multitasking.” In other words, they rely on the software threads to create scheduling points by making consume calls at regular intervals. This is a problem because the scheduler is only partly responsible for scheduling in this system: threads are explicitly aware of scheduling, and consume calls are used to mark scheduling points. Coupling software tasks with scheduling strategies hampers scheduler design space exploration by reducing flexibility. It also prevents us from accurately modeling external events such as I/O.

Real systems, on the other hand, use interrupts to force unanticipated scheduling points and often implement some form of pre-emption; threads in these systems are not aware of most scheduling decisions because those decisions are triggered by external interrupts originating from other processors or from peripherals such as timers.


2.4.1 Interrupt Modeling

Interrupts are first-class citizens in the MESH framework that enhance the ability of threads to influence schedulers in the system. Up to this point the systems we have modeled have been purely synchronous. Consume calls mark scheduling points, and nothing is assumed to happen between consume call boundaries. While this assumption holds for single-resource systems with no external inputs, asynchronous events can and do arise in systems with multiple processing resources or I/O. It is important to properly capture these asynchronous events because they can have a profound impact on the scheduling behavior of the system.

Figure 2.15: Interrupt modeling in MESH. (Layers: software threads (Th) and an ISR, schedulers (Sched), and hardware resources P1 to Pn, plus an interrupt controller and a testbench. Numbered steps: 1 Raise, 2 Pending, 3 Fire, 4 Active.)

Figure 2.15 shows how MESH models interrupts. The process begins when some thread in the system raises an interrupt on a resource, adding it to a list of pending interrupts (1). Both normal MESH threads and testbench threads (discussed later in this example) can raise interrupts. An interrupt controller selects which (if any) of these pending interrupts should fire, and when (2). Once an interrupt fires on a resource, it instantly cuts off the currently scheduled consume call and returns the interrupted resource to the idle list, allowing its scheduler to run (3). The remaining fraction of the consume call is saved for the next time the thread is scheduled. Additionally, each interrupt can have an interrupt handler, or interrupt service routine (ISR), associated with it that runs whenever the interrupt fires (4). This ISR is a MESH thread that executes outside the control of the resource's scheduler; the MESH kernel schedules active ISR threads in LIFO order. Finally, MESH returns to normal scheduling once there are no more active interrupts on a resource.

MESH represents interrupts as resource-name pairs that can be in any of three states:

• Inactive. During normal system execution all interrupts are inactive and scheduling proceeds as usual.

• Pending. The interrupt has been raised but the interrupt controller has not fired it yet. Interrupt priorities, lack of support for nested interrupts, and interrupt masking can all prevent a pending interrupt from firing right away.

• Active. The interrupt has fired but the ISR has not finished executing yet. The interrupt controller determines when, or whether, an ISR can be pre-empted by other interrupts.
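The lifecycle of a single interrupt, together with the LIFO order in which the kernel runs active ISRs, can be sketched as a tiny state machine and stack. This is a hypothetical model for illustration only: irq_state, irq_event, interrupt_step, and isr_stack are not MESH API, and events that do not apply in the current state are simply assumed to leave it unchanged.

```c
#include <assert.h>

typedef enum { IRQ_INACTIVE, IRQ_PENDING, IRQ_ACTIVE } irq_state;
typedef enum { EV_RAISE, EV_FIRE, EV_ISR_DONE } irq_event;

/* Next state of one interrupt after event ev (hypothetical model). */
irq_state interrupt_step(irq_state s, irq_event ev) {
    switch (s) {
    case IRQ_INACTIVE:
        return ev == EV_RAISE ? IRQ_PENDING : s;     /* a thread or testbench raises it */
    case IRQ_PENDING:
        return ev == EV_FIRE ? IRQ_ACTIVE : s;       /* the controller decides to fire */
    case IRQ_ACTIVE:
        return ev == EV_ISR_DONE ? IRQ_INACTIVE : s; /* the ISR has finished */
    }
    assert(0 && "unknown interrupt state");
    return s;
}

/* Active ISRs run newest-first (LIFO). */
#define MAX_ACTIVE 8
typedef struct {
    int ids[MAX_ACTIVE];
    int top;
} isr_stack;

void isr_push(isr_stack *st, int id) {
    assert(st->top < MAX_ACTIVE);
    st->ids[st->top++] = id;
}

int isr_next(const isr_stack *st) {      /* ISR that runs next */
    assert(st->top > 0);
    return st->ids[st->top - 1];
}

void isr_pop(isr_stack *st) {            /* called when that ISR finishes */
    assert(st->top > 0);
    st->top--;
}
```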

In this example (matrix_mul8.c) we show how to set up interrupt modeling in MESH and demonstrate how interrupts affect the matrix-matrix multiply operation we have been working with.

By default, interrupt modeling is disabled in MESH, so we must set up interrupt controllers for the four resources in our system. To reduce repeated code, we create a helper function to create resources and set up the necessary interrupt mechanisms on them:

199 // Helper function to create an interruptible resource
200 mesh_resource *
201 create_resource(gchar *name, mesh_feature_list *cfl, mesh_scheduler *cs) {
202   mesh_resource *cr;
203
204   // create the resource as usual
205   cr = mesh_create_resource(name, cfl, cs, &mesh_resource_default, 1);
206
207   // assign an interrupt controller and create an interrupt
208   mesh_set_interrupt_controller(cr, &mesh_default_interrupt_controller, NULL);
209   mesh_create_interrupt(cr, MY_INTERRUPT, FALSE, &my_isr, cr);
210
211   return cr;
212 }

The resource is created in the usual way, using the feature list and scheduler passed in by the caller. However, before returning the resource, the helper assigns an interrupt controller and creates an interrupt.

The interrupt controller is responsible for deciding whether, and when, a pending interrupt should fire. It can be set by calling mesh_set_interrupt_controller() (details on pg. 71). For now we choose to use mesh_default_interrupt_controller (line 208; details on pg. 92), which simply activates one pending interrupt every time it is invoked. If more specialized behavior is required, a custom interrupt controller can be used instead. Section 2.4.4 shows how this can be done.

Interrupts are created with calls to mesh_create_interrupt() (details on pg. 63); they can have any string name. In addition, each interrupt can have a handler, or interrupt service routine (ISR), associated with it if desired. This ISR will be passed one user-supplied argument every time MESH invokes it. We name the interrupt “my_interrupt” (line 209) and assign it my_isr(), which takes a pointer to the resource as its argument:

169 // ISR
170 void *my_isr(void *arg) {
171   mesh_resource *cr = arg;
172   mesh_time now = mesh_getTime();
173
174   // just a touch of overhead
175   printf("%4lf: Interrupt on '%s'\n", now, cr->name);
176   mesh_consume_str_and_exit("1");
177
178   // unreachable
179   return NULL;
180 }

In this case the ISR is trivial: it just prints a message and imposes a small overhead before exiting. Other situations may or may not require an ISR. Interrupts that are only used to trigger scheduling decisions do not usually need an explicit ISR. Peripherals, on the other hand, often demand some sort of immediate processing in response to the interrupt. This “driver code” is often best executed by an ISR.

Now that we have set up interrupts on the resources in our system, we need something to raise them. MESH provides special “testbench” threads for this purpose. Testbench threads model external influences, such as I/O, that affect the system but do not execute within it. They differ from normal threads because they do not execute on resources; therefore they do not make consume calls to specify computational complexity, and they do not require timing resolution. Instead, they call mesh_tb_wait_for() (details on pg. 89) to specify physical time delays between events.

In our case the testbench thread raises interrupts on different resources with mesh_raise_interrupt() (details on pg. 70) at certain times throughout the simulation:

182 // Test bench thread to raise interrupts
183 void *tb_thread(void *arg) {
184   // interrupt resource2 at t=15
185   mesh_tb_wait_for(15);
186   mesh_raise_interrupt(resource2, MY_INTERRUPT);
187
188   // interrupt slow_resource at t=20
189   mesh_tb_wait_for(5);
190   mesh_raise_interrupt(slow_resource, MY_INTERRUPT);
191
192   // interrupt resource1 at t=50
193   mesh_tb_wait_for(30);
194   mesh_raise_interrupt(resource1, MY_INTERRUPT);
195
196   return NULL;
197 }

Finally, we need to instantiate the testbench thread with a call to mesh_create_tb_thread() (details on pg. 89). It acts as an external source of input to the system. Passing FALSE prevents the testbench thread from being displayed in the MESH viewer:

245 mesh_create_tb_thread("tb_thread", &tb_thread, NULL, FALSE);

Figure 2.16 shows the MESH viewer output from running matrix_mul8.c. Because each interrupt handler invocation appears as a separate thread by default, it is helpful to switch to name-based thread coloring instead (View → Color by Name). At t = 15, an interrupt arrives on resource1 between two consume calls, visible in the timeline as a small consume call labeled “my_interrupt”. It has the side effect of pushing the boss thread to resource2. At t = 20, thread worker1 executing on slow_resource is interrupted in the middle of a consume call. Again this causes a migration, this time to resource_MAC. Note how the rest of the interrupted consume call is completed there. Finally, at t = 50 an interrupt arrives on the (idle) resource1 with no impact on scheduling.

This example demonstrated how interrupts work in MESH. Next, we will use the interrupt modeling abilities of MESH in a more realistic setting.

2.4.2 Pre-emptive Scheduling

Figure 2.16: Output of running matrix_mul8.c. Note how interrupts can occur at any time, independent of the presence or progress of consume calls on the resource.

This example explores using interrupts to implement a simple pre-emptive scheduler: a time-slicing round-robin scheduler. This can be achieved by implementing a custom scheduler, interrupt controller, and simulated timer peripheral that all work together to enforce the scheduling policy.

Figure 2.17 shows how these pieces fit together.

Each time the timer model wakes up, it checks whether it has reached the next scheduled timeout. If so, it raises an interrupt on the resource, notifies the scheduler of the interrupt, schedules the next timeout, and goes back to sleep. If the scheduler restarted the timer while the timer was sleeping, the timer wakes before the newly scheduled timeout and goes back to sleep without raising any interrupts. This behavior delivers the regular “timer tick” interrupts the scheduler needs to perform time slicing.

Meanwhile, a custom scheduler implements time slicing by changing thread-resource mappings only under two circumstances:

1. A thread blocks or terminates before its time slice expires.

2. The timer “tick” interrupt marks the end of the current time slice.

When the first case occurs, the scheduler must restart the timer so that the next thread gets a full time slice. In both cases the scheduler picks the least-recently scheduled (“LRS”) thread in the system to assign to the resource, which enforces round-robin behavior. If neither case holds, the scheduler leaves the thread-resource assignment unchanged by reapplying the previous assignment. We describe the scheduler in more detail below.

State creation. First, we define a new struct to encapsulate the state shared by the timer(s) and the scheduler:

20 typedef struct {
21   mesh_resource *cr;
22   gboolean interrupted;
23   double interval;
24   double next_timeout;
25 } timer_state;


Figure 2.17: Components of a time-slicing MESH scheduler. (Timer: wait; on timeout, update the timeout, raise an interrupt, and set the interrupted flag. Scheduler: for each idle resource r, if the last thread is eligible and r was not interrupted, reschedule the last thread; otherwise schedule the least-recently scheduled (LRS) thread and reset the timer.)

The fields of the struct track resource-specific state shared by each timer and the scheduler that controls them all. They are used as follows:

• cr. Pointer to the resource paired with this timer. Each resource has its own timer to allow flexible time slicing.

• interrupted. Flag set by the timer when it raises an interrupt, since there is no reliable way to infer interrupted status from thread or resource state.

• interval. Time-slicing interval. This tells the timer how often to raise tick interrupts.

• next_timeout. Next scheduled timeout (tick).

We also extend the resource-creation helper function to accept the “tick” interval as an argument, which it uses to initialize the state for the resource and set its time-slicing interval.

272     // instantiate a state struct
273     state = g_new0(timer_state, 1);
274     state->interval = interval;
275     state->cr = cr;
276     state->next_timeout = interval;

Resource-local storage. After creating the resource and state struct, the helper function places the state in resource-local storage (see mesh_resource_create_entry(), details on pg. 77) so the scheduler can access and update the timer state of each resource as necessary.

278     // store the state in the resource
279     mesh_resource_create_entry(cr, TIMER_KEY, state);

MESH allows the user to associate local key-value pairs with resources, schedulers and threads as a way to flexibly extend these data types without changing their definitions. Each key-value pair must be initialized before use by calling the appropriate function (mesh_resource_create_entry() in this case). After initialization, values can be retrieved and changed using mesh_resource_get_entry() and mesh_resource_set_entry(), respectively (described later). The API reference contains complete descriptions of these functions and the corresponding ones for threads and schedulers.
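To make the create-before-use rule concrete, here is a minimal sketch of per-object key-value storage offering the same four operations the manual names (create, has, get, set). It is a hypothetical stand-in, not MESH’s implementation: the local_store type, the store_* function names, the fixed capacity of eight entries, and the choice to reject a second create for the same key are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-object key-value store (not the MESH API).
 * Capacity and duplicate-create behavior are assumptions. */
#define MAX_ENTRIES 8

typedef struct {
    const char *keys[MAX_ENTRIES];  /* keys must outlive the store */
    void *values[MAX_ENTRIES];
    int count;
} local_store;

/* Slot index for key, or -1 if the key was never created. */
static int find_slot(const local_store *s, const char *key) {
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->keys[i], key) == 0)
            return i;
    return -1;
}

/* Must be called once per key before get/set (like *_create_entry()).
 * Fails if the store is full or the key already exists (assumed policy). */
bool store_create_entry(local_store *s, const char *key, void *value) {
    if (s->count == MAX_ENTRIES || find_slot(s, key) >= 0)
        return false;
    s->keys[s->count] = key;
    s->values[s->count] = value;
    s->count++;
    return true;
}

/* Reports whether key exists and, if so, copies out its value
 * (like *_has_entry()). */
bool store_has_entry(const local_store *s, const char *key, void **value) {
    int i = find_slot(s, key);
    if (i < 0)
        return false;
    if (value)
        *value = s->values[i];
    return true;
}

/* Returns NULL for keys that were never created. */
void *store_get_entry(const local_store *s, const char *key) {
    int i = find_slot(s, key);
    return i < 0 ? NULL : s->values[i];
}

/* Refuses (returns false) to set a key that was never created. */
bool store_set_entry(local_store *s, const char *key, void *value) {
    int i = find_slot(s, key);
    if (i < 0)
        return false;
    s->values[i] = value;
    return true;
}
```

In this sketch a set on an uncreated key fails instead of silently creating it, which is one way to enforce the initialize-before-use requirement described above.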

Timer model. Next, we replace the testbench thread from the last example with a timer model. It raises the periodic timer interrupt the scheduler needs to time-slice the threads it controls:

228 // A test bench thread that acts as a hardware timer and raises
229 // periodic interrupts. The timer's state struct (which holds the
230 // timeout interval) is supplied as the argument to the thread
231 void *timer(void *arg) {
232     timer_state *state = arg;
233     mesh_time now, delta;
234
235     while (1) {
236         // wait for the next timeout
237         now = mesh_getTime();
238         delta = state->next_timeout - now;
239         if (delta < -EPSILON)
240             g_error("Timer woke up late!");
241         else if (delta > EPSILON) {
242             // scheduler must have acted while I was asleep
243             printf("%4lf: %s timer sleeping until %4lf\n",
244                    now, state->cr->name, state->next_timeout);
245             mesh_tb_wait_for(delta);
246         }
247         else {
248             // raise the interrupt and update the timeout
249             state->interrupted = TRUE;
250             state->next_timeout = now + state->interval;
251             mesh_raise_interrupt(state->cr, MY_INTERRUPT);
252             mesh_yield();
253
254         }
255     }
256
257     // unreachable
258     return NULL;
259 }

The timer uses the state struct to determine how long to wait between interrupts. If the wait between now and the next scheduled timeout is greater than some EPSILON (necessary because of accuracy issues in floating-point numbers), the thread sleeps (lines 241 – 246). This can happen several times in a row if the scheduler pushes back the next timeout in the meantime. Once the wait is within ±EPSILON, the timer raises a tick interrupt on the resource to trigger scheduling. It then immediately begins a new timer interval, sleeping until the next potential time slice ends (lines 247 – 254). If the timeout lies more than EPSILON in the past, the timer has missed its deadline and aborts with an error (lines 239 – 240).
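The three-way test in the timer loop (lines 239 – 247) can be isolated as a pure function to make the role of EPSILON concrete. This is a minimal sketch, not part of MESH: the timer_decide() name, the timer_action enum, and the EPSILON value of 1e-9 are assumptions for illustration.

```c
/* Hypothetical helper (not MESH code) isolating the timer loop's test.
 * EPSILON's value is an assumption; the manual does not state it. */
#define EPSILON 1e-9

typedef enum {
    TIMER_LATE,   /* deadline already passed: the listing calls g_error() */
    TIMER_SLEEP,  /* deadline still ahead: sleep for the remaining time */
    TIMER_FIRE    /* within +/-EPSILON of the deadline: raise the tick */
} timer_action;

/* Decide what the timer does when it wakes at `now`. On TIMER_SLEEP,
 * *sleep_for receives the remaining wait. */
timer_action timer_decide(double now, double next_timeout, double *sleep_for) {
    double delta = next_timeout - now;
    if (delta < -EPSILON)
        return TIMER_LATE;
    if (delta > EPSILON) {
        *sleep_for = delta;
        return TIMER_SLEEP;
    }
    return TIMER_FIRE;
}
```

For example, a timeout computed as 0.1 + 0.2 differs from a current time of 0.3 by about 5.6e-17 in IEEE double precision, so an exact equality test would not fire at that instant, while the ±EPSILON window does.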

Because each resource must have its own timer model, we move the testbench thread instantiation inside the create_int_res function:

285     // use a test bench thread to represent a hardware timer
286     timer_name = g_strdup_printf("%s_timer", name);
287     mesh_create_tb_thread(timer_name, &timer, state, FALSE);
288     g_free(timer_name);

Time-slicing scheduler. The time-slicing scheduler makes thread-resource assignments in two phases.

93      // try to reschedule the same thread unless its time slice expired
94      for (iter = idle_resources; iter && idle_threads; iter = g_slist_next(iter)) {
95          mesh_resource_thread_pair *rt_pair;
96          mesh_resource *cr = iter->data;
97          timer_state *state = mesh_resource_get_entry(cr, TIMER_KEY);
98          mesh_thread *ct = cr->last_thread;
99
100         // time slice expired?
101         if (state->interrupted) {
102             printf("Interrupted!\n");
103             continue;
104         }
105
106         // unavailable?
107         if (!g_slist_find(idle_threads, ct))
108             continue;
109
110         // reschedule
111         rt_pair = g_new(mesh_resource_thread_pair, 1);
112         rt_pair->thread = ct;
113         rt_pair->resource = cr;
114         rt_pair_list = g_slist_prepend(rt_pair_list, rt_pair);
115
116         // update state
117         idle_threads = g_slist_remove(idle_threads, ct);
118         cr->idle = FALSE;
119     }

First, it tries to reschedule each resource’s previous thread if that thread’s time slice has not yet expired (lines 100 – 104). The scheduler restarts the timer for a resource every time it changes the thread assignment, so threads are interrupted only when their time slice has expired. The “interrupted” status is stored in the shared state struct, which the scheduler retrieves using mesh_resource_get_entry() (details on pg. 78).

The scheduler then checks whether the thread is still eligible to run by searching the list of idle threads (lines 106 – 108).

Once the scheduler has determined that the thread should be rescheduled, it makes a thread-resource pair and adds it to the list of assignments, as described in Section 2.3.5. It then removes the thread from the idle list and marks the resource as busy (lines 116 – 118).

During the second phase of scheduling, the scheduler pairs up the remaining idle threads and resources in least-recently scheduled order to enforce round-robin scheduling across time slices.

121     // schedule idle threads on available resources
122     for (iter = idle_resources; iter && idle_threads; iter = g_slist_next(iter)) {
123         mesh_resource_thread_pair *rt_pair;
124         mesh_resource *cr = iter->data;
125         mesh_thread *ct = idle_threads->data;
126         timer_state *state = mesh_resource_get_entry(cr, TIMER_KEY);
127
128         // already taken?
129         if (!cr->idle)
130             continue;
131
132         // assign the next thread
133         rt_pair = g_new(mesh_resource_thread_pair, 1);
134         rt_pair->resource = cr;
135         rt_pair->thread = ct;
136         rt_pair_list = g_slist_prepend(rt_pair_list, rt_pair);
137         printf("%s running %s at %4lf\n", cr->name, ct->name, now);
138
139         // update state
140         if (cr->last_thread != ct) {
141             set_last_scheduled(ct, now);
142
143             if (!state->interrupted)
144                 state->next_timeout = now + state->interval;
145         }
146
147         // remove the thread from the list
148         idle_threads = g_slist_remove(idle_threads, ct);
149     }


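It first checks whether the resource was already scheduled in the first phase (lines 128 – 130). Then, if the resource is receiving a different thread, it updates the last-scheduled time of that thread (line 141). Finally, if the previous time slice ended prematurely (the thread blocked or terminated rather than being interrupted), the scheduler pushes back the next timeout so the timer restarts and the new thread gets a full slice (lines 143 – 144).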
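The two phases can be modeled outside MESH with plain arrays. In this hypothetical sketch (not MESH code), threads are small integers, every resource passed in is idle, and the idle-thread list is assumed to arrive already sorted least-recently scheduled first; the timer bookkeeping of lines 140 – 145 is left out.

```c
#include <stdbool.h>

/* Hypothetical model of the two-phase time-slicing scheduler (not MESH
 * code). Assumes idle_threads[] is sorted least-recently scheduled first. */
#define NO_THREAD (-1)

typedef struct {
    int last_thread;   /* thread that last ran here, or NO_THREAD */
    bool interrupted;  /* set by the timer when the slice expired */
    bool idle;         /* still needs an assignment this round */
} res_state;

/* Remove thread t from the idle list, keeping order; return the new length. */
static int remove_thread(int *idle_threads, int n, int t) {
    int j = 0;
    for (int i = 0; i < n; i++)
        if (idle_threads[i] != t)
            idle_threads[j++] = idle_threads[i];
    return j;
}

/* Fill assign[r] with the thread chosen for resource r, or NO_THREAD. */
void time_slice_schedule(res_state *res, int n_res,
                         int *idle_threads, int n_idle, int *assign) {
    for (int r = 0; r < n_res; r++)
        assign[r] = NO_THREAD;

    /* Phase 1: keep the previous thread if its slice has not expired
     * and it is still waiting to run. */
    for (int r = 0; r < n_res && n_idle > 0; r++) {
        int t = res[r].last_thread;
        if (res[r].interrupted || t == NO_THREAD)
            continue;
        int before = n_idle;
        n_idle = remove_thread(idle_threads, n_idle, t);
        if (n_idle == before)
            continue;              /* previous thread blocked or finished */
        assign[r] = t;
        res[r].idle = false;
    }

    /* Phase 2: give each still-idle resource the least-recently
     * scheduled remaining thread (the head of the list). */
    for (int r = 0; r < n_res && n_idle > 0; r++) {
        if (!res[r].idle)
            continue;
        int t = idle_threads[0];
        assign[r] = t;
        res[r].idle = false;
        n_idle = remove_thread(idle_threads, n_idle, t);
    }
}
```

For example, suppose resource 0 last ran thread 5 and its slice is still running, resource 1 last ran thread 7 and its timer has fired, and the idle list is {3, 7, 5}. Phase 1 keeps thread 5 on resource 0, and phase 2 gives resource 1 thread 3, leaving thread 7 to wait for the next free slot.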

The scheduler creates an entry in thread-local storage to track when each thread was last scheduled. Every time a thread begins a new time slice, the scheduler stores the current simulation time in the thread-local entry. It later uses this information to find the least-recently scheduled thread to assign:

31  void set_last_scheduled(mesh_thread *ct, double when) {
32      // keeps gcc happy about "type-punned" pointer access
33      union { void *v; double *d; } entry;
34      if (!mesh_thread_has_entry(ct, TIMER_KEY, &entry.v)) {
35          entry.d = g_new(double, 1);
36          mesh_thread_create_entry(ct, TIMER_KEY, entry.d);
37      }
38
39      *(entry.d) = when;
40  }

This introduces a new function, mesh_thread_has_entry() (details on pg. 82), which checks whether a thread-local value already exists; this is needed because there is no reliable way to determine whether a thread is newly created. The entry union is necessary because double** and void** are incompatible types: the union lets the same storage be passed to mesh_thread_has_entry() as a void** and then be used as a double*. After testing for the entry’s existence, the code creates the entry if necessary and then stores the new timestamp.
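Selecting the least-recently scheduled thread from those timestamps is a simple minimum search. The sketch below is hypothetical (not MESH code): the thread_info record stands in for the thread-local entry, and two policy details the manual leaves open are assumptions, namely that threads never scheduled rank ahead of all others and that ties go to the lowest index. The MESH scheduler used in this tutorial makes no such ordering guarantee for ties.

```c
#include <stdbool.h>

/* Hypothetical per-thread record standing in for the thread-local entry
 * that set_last_scheduled() creates on first use (not MESH code). */
typedef struct {
    bool has_entry;        /* false until the thread is first scheduled */
    double last_scheduled; /* valid only when has_entry is true */
} thread_info;

/* Record a new time slice, creating the entry on first use. */
void record_scheduled(thread_info *t, double when) {
    t->has_entry = true;
    t->last_scheduled = when;
}

/* Index of the least-recently scheduled thread, or -1 if n == 0.
 * Assumptions: never-scheduled threads rank first; ties keep the lowest index. */
int pick_lrs(const thread_info *threads, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (best < 0) {
            best = i;
            continue;
        }
        const thread_info *a = &threads[i];
        const thread_info *b = &threads[best];
        if (!a->has_entry && b->has_entry)
            best = i;                          /* never-scheduled wins */
        else if (a->has_entry && b->has_entry &&
                 a->last_scheduled < b->last_scheduled)
            best = i;                          /* older timestamp wins */
    }
    return best;
}
```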

Running the example. Now that we’ve set up the infrastructure for time slicing, a few small changes complete the example. The main() function for this example is very similar to the last one. The only difference is that we now pass the time-slicing interval (6 in this case) to the resource-creation helper function (see page 29), which uses the value to initialize the timer state for each resource.

314     resource1 = create_resource("resource1", cfl,
315                                 default_sched, 6);
316     resource2 = create_resource("resource2", cfl,
317                                 default_sched, 6);
318     slow_resource = create_resource("slow_resource",
319                                     cfl_slow, default_sched, 6);
320     resource_MAC = create_resource("resource_MAC",
321                                    cfl_mac, default_sched, 6);

Because the timer testbench thread runs in an infinite loop, we must also modify the boss thread to terminate the simulation when it completes, with a call to mesh_exit() (details on pg. 75):

202     //print results
203     for (i = 0; i < MAXROWS; i++)
204         printf("row %d: %d\n", i, answer[i]);
205
206     //overhead of joining threads and
207     //output of results
208     mesh_consume_str("10");
209     mesh_exit(0);
210
211     return NULL;
212 }

Now we can run the example, found in matrix_mul9.c. Figure 2.18 shows the resulting output in the MESH viewer. Note how consume calls no longer directly correspond to scheduling decisions. A single time slice might consist of only part of a single consume call (e.g., slow_resource at time 200) or of many small consume calls together (e.g., resource_MAC, also at time 200). Also note how the timer restarts when a different thread begins executing, ensuring that all threads get their full time slice. Finally, we can see how the timer tick can split consume calls at arbitrary points (again resource_MAC at time 200), allowing interrupt handling to occur at exactly the right time.

Figure 2.18: Time-sliced program execution

Our new scheduler successfully decouples scheduling from consume calls, though threads can still force scheduling decisions by terminating or blocking. This opens a new set of design variables to experiment with. Possibilities include varying the time slice length (perhaps different lengths for different processors) or implementing purely pre-emptive schedules where the highest-priority ready task always runs. Finally, interrupt modeling, combined with test bench threads, makes it possible to model external devices and their driver code (a timer in this case).

2.4.3 Lightweight Consume Calls

Now that scheduling behavior is decoupled from the consume calls made by software tasks, we would like to eliminate any consume calls that are no longer necessary. In practice, time-slicing intervals are orders of magnitude longer than the intervals between consume calls made at basic block (or even function) boundaries, but each consume call causes a context switch between the software thread and the MESH kernel for scheduling. The vast majority of the time, these context switches are unnecessary because the scheduler will simply reassign the same threads to the same resources.

Fortunately, MESH provides “lightweight” consume call functions (see mesh_lightweight_consume_str(), details on pg. 77) that report computational complexity without causing expensive context switches. Instead, the software thread accumulates “traces” of these lightweight consumes that are delivered to MESH much less frequently. This drastically reduces the overhead of fine-grained consume calls; threads that do not synchronize or otherwise interact with other threads might not deliver any consume information to MESH until after they complete. In practice, using lightweight consumes can result in a 20-50x speedup with no other changes to the code.
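The difference between the two kinds of consume calls can be illustrated with a toy cost accountant. This is a hypothetical model, not the MESH implementation: costs are plain numbers rather than MESH’s consume strings, and consume(), lightweight_consume(), and sync_point() are invented names. A heavyweight call enters the kernel every time, while lightweight calls only grow a local trace that is delivered once, at the next synchronization point.

```c
/* Hypothetical model of consume-call accounting (not the MESH API).
 * Costs are plain numbers; MESH itself uses consume strings. */
typedef struct {
    double pending_cost;   /* accumulated locally, not yet reported */
    double reported_cost;  /* total delivered to the kernel */
    int kernel_switches;   /* context switches into the kernel */
} trace_model;

/* Deliver the accumulated trace: one context switch. */
static void deliver(trace_model *m) {
    m->reported_cost += m->pending_cost;
    m->pending_cost = 0.0;
    m->kernel_switches++;
}

/* Heavyweight: report now, which also gives the scheduler a chance to run. */
void consume(trace_model *m, double cost) {
    m->pending_cost += cost;
    deliver(m);
}

/* Lightweight: append to the local trace only. */
void lightweight_consume(trace_model *m, double cost) {
    m->pending_cost += cost;
}

/* Synchronization point (e.g. thread exit or join): flush any trace. */
void sync_point(trace_model *m) {
    if (m->pending_cost > 0.0)
        deliver(m);
}
```

With 1000 calls of cost 3.0 each followed by a synchronization point, both variants report a total cost of 3000, but the heavyweight version enters the kernel 1000 times and the lightweight version once; the reduced number of kernel entries is where the savings come from.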

In our example there are three kinds of threads that make consume calls: boss, worker and ISR. Every consume call in the boss thread forms a synchronization point (because it joins on worker threads), so lightweight consumes would not provide any benefit. The ISR thread makes only one consume call in its lifetime, so it offers a similar lack of opportunity. However, the worker threads make many consume calls inside a loop; these consume calls are necessary only to capture data-dependent behavior and do not mark a need for scheduling. This makes them excellent candidates for lightweight consumes:

211 void *worker(void *arg) {
212     int i;
213     int row = (int)(intptr_t) arg;  // portable pointer-to-int conversion
214     int result = 0;
215
216     //perform matrix multiplication for this row
217     for (i = 0; i < MAX_COLS; i++) {
218         result += matrixA[i][row] * matrixB[i];
219         mesh_lightweight_consume_str("ADD=1:MUL=1:MAC=1");
220     }
221
222     return (void *)(intptr_t) result;
223 }

Figure 2.19: Time-sliced program execution with lightweight consume calls

After compiling and running matrix_mul10.c, we can see in Figure 2.19 how lightweight consumes change the viewer’s output. The worker threads are no longer made up of many small consume calls. Instead, each tick interrupt imposes a break at the appropriate places; the rest of the time, simulation proceeds without the distraction of context switches imposed by consume calls. After a short time we also note that scheduling diverges from the previous example. This is because we modeled a multiprocessor system, and the scheduler implementation makes no guarantees about the order in which it schedules threads that are all “least-recently scheduled.” Under a deterministic scheduling algorithm the behavior of the system would not have changed.

As a rule of thumb, lightweight consume calls should always be used to provide MESH with computational complexity. The only time “heavyweight” (normal) consume calls are necessary is to mark explicit scheduling points in the code. Examples might include coroutines for cooperative multithreading, synchronization points, etc. At those places it is necessary to use normal consume calls or mesh_yield() to force the scheduler to run.

2.4.4 Custom Interrupt Controller and DMA Example

[Figure: software threads (Boss, Timer Tbench, DMA ISR, Worker 0 ... Worker 7, DMA Engine) are mapped through schedulers onto hardware resources (Resource 1, Resource 2, Slow Resource, Resource w/ MAC, and a dedicated DMA Resource).]

Figure 2.20: Changed system architecture for this example, including DMA engine

The matrix multiplication example we have been using so far assumes that all memory addresses are freely accessible by all processors in the system. In this example we will change the system so that each processor must transfer its working set to a private local memory before beginning calculations. We will offload these transfers to a hardware DMA engine that raises an interrupt when it completes each transfer. Figure 2.20 shows the updated system configuration.

MESH represents hardware peripheral models (like DMA engines) using the same layered approach that governs software. In this case, however, the “software thread” actually represents a hardware state machine that executes continuously. As the figure shows, the DMA engine thread executes on a dedicated “resource” that is always uncontended.

For the moment we assume that there is no contention for memory; Section 2.5 introduces contention modeling in MESH.

Because each DMA transfer will invariably unblock a thread when it completes, we would like the notification interrupt to have higher priority than the timer tick interrupt that drives scheduling. Unfortunately, the default interrupt controller provided by MESH does not provide this functionality, so we must define a custom controller instead.

We will begin by designing our DMA engine model to implement the behavior depicted in Figure 2.21. When a thread initiates a DMA transfer (seen on the right), it first pays a fixed overhead, then blocks until the transfer completes. The processor is free to run other threads while the DMA engine performs the memory copy operation (seen on the left). When the transfer completes, the engine raises an interrupt to notify the system. The scheduler then unblocks the waiting thread, allowing it to proceed normally (again on the right).


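[Figure: left, the DMA engine (dma_engine) waits on req_cond and checks request availability (no: wait again; yes: process the request, then raise an interrupt). Right, executing on processor resources: the requesting thread (dma_transfer) performs setup, notifies the engine, and waits; the DMA ISR (dma_done_isr) notifies done_cond to release it.]

Figure 2.21: DMA engine block diagram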

Because the DMA engine is a peripheral, it does not know about operating systems or threading; the scheduler must track which thread to wake up when a DMA transfer is complete.

DMA Engine. Though it could be modeled using a testbench thread like the timer peripheral, we will use a dedicated MESH resource running a service thread instead, because DMA does consume processing resources in a way. In addition, we would not be able to extend the model later to account for contention unless it were based on a resource. Again, the MESH thread actually represents a hardware state machine; the software ISR and driver code will execute on the normal processors in the system.

99  // service thread modeling a DMA engine
100 void *dma_engine(void *arg) {
101     dma_state *state = arg;
102
103     while (1) {
104         dma_request *request;
105
106         // wait for a DMA request
107         while (!state->requests)
108             signal_wait(&state->signal);
109
110         // "process" the request
111         request = state->requests->data;
112         mesh_consume_str("%d", request->size);
113
114         // raise the "DMA complete" interrupt
115         mesh_raise_interrupt(request->cr, DMA_DONE);
116         mesh_yield();
117     }
118
119     // unreachable
120     return NULL;
121 }

As seen on the left side of Figure 2.21, the DMA engine is very simple: it waits for one or more requests to arrive (lines 106 – 108), then processes them in order (lines 110 – 112), raising an interrupt each time it finishes a request (lines 114 – 116). At a high level the DMA and timer models are quite similar. Both loop indefinitely, waiting for some event to occur, performing some action, then going back to waiting. They also both communicate with the rest of the system using a struct that encapsulates shared data. This struct is shown below and maintains a list of waiting requests as well as synchronization state (described later).

40  typedef struct {
41      GSList *requests;
42      signal_state signal;
43  } dma_state;

The state consists of a list of requests and a signal_state object, which we will discuss later. Each request is also encapsulated by a struct:

34  typedef struct {
35      int size;
36      mesh_resource *cr;
37      signal_state signal;
38  } dma_request;

DMA requests contain everything the scheduler and/or DMA engine need to service them. They consist of the size of the transfer, a pointer to the resource to copy data to, and another signal_state object, which is used to wake the requesting thread when the transfer completes.

MESH Synchronization Primitives. Unlike the timer tick, DMA transfers are sporadic and unpredictable. These unpredictable arrival times make mesh_tb_wait_for() unattractive because we don’t know how long to sleep while waiting for the next DMA request. Instead, we would like the DMA engine to sleep when idle, then use some mechanism that allows us to “wake it up” when new work arrives. This handshaking mechanism is represented by the dotted lines in Figure 2.21.

MESH supports a full set of synchronization primitives, including a pair that provides exactly the functionality we need: the mutex and the condition variable. These primitives work in any MESH thread, including testbench threads.

Mutexes enforce mutual exclusion. When a thread holds a mutex (by “locking” it), it is guaranteed that no other thread currently holds that same mutex. Concurrent computations can safely access shared data by protecting it with one or more mutexes.

Condition variables provide a means for one thread to notify other threads that some event has occurred. Threads waiting for an event block on the condition variable until the notification arrives, at which point one or more waiting threads will wake up and continue executing.

Unfortunately, timing races make condition variables tricky to use properly. When a thread sends a notification, it wakes up only threads that are currently waiting for it. Threads that do not begin waiting until after the notification will not wake up until the next notification, which may never arrive. Solving this problem requires two additional pieces: a mutex and a status variable.

Associating a boolean status variable with each condition variable allows us to handle threads that attempt to wait after the notification has been sent. The thread that sends the notification also sets the status to “ready”; threads check the status to decide whether to block on the condition variable or return immediately (because the event has already occurred).

Thread A                      Thread B

if (!ready) {
                              ready = TRUE
                              signal()
    wait()
}

Figure 2.22: Example condition variable race condition

Because multiple threads might concurrently access the status variable, we still face the potential timing race shown in Figure 2.22. Thread “A” might check the status and decide to sleep, only to have thread “B” set the status (and send the notification) before “A” actually blocks. In order to eliminate this race we must protect both the condition and status variables with a mutex. This way, a thread that decides to sleep will be able to do so before any other thread can send a notification, and the thread that sends the notification will always update the status in a safe way.

The signal_state struct and four helper functions exist to encapsulate this behavior:

28  typedef struct {
29      mesh_thread_mutex lock;
30      mesh_thread_cond cond;
31      gboolean ready;
32  } signal_state;

The struct contains the mutex, condition variable and status flag we need to properly send event notifications.

69  // initialize a signal
70  void signal_init(signal_state *signal) {
71      mesh_thread_mutex_init(&signal->lock, NULL);
72      mesh_thread_cond_init(&signal->cond, NULL);
73      signal->ready = FALSE;
74  }

signal_init() prepares the struct for use. It clears the status flag and initializes the mutex and condition variable using calls to mesh_thread_mutex_init() (details on pg. 84) and mesh_thread_cond_init() (details on pg. 80). These two MESH API calls are wrappers around the underlying pthreads library on which MESH is built. They initialize MESH state as well as allocating the necessary resources from the operating system.

76  // destroy a signal
77  void signal_destroy(signal_state *signal) {
78      mesh_thread_mutex_destroy(&signal->lock);
79      mesh_thread_cond_destroy(&signal->cond);
80  }

signal_destroy() cleans up the system resources tied up by this struct, freeing the mutex and condition variable with calls to mesh_thread_mutex_destroy() and mesh_thread_cond_destroy() (see pages 80 and 84 for more details).


51  // make the current thread wait on a signal, blocking if necessary
52  void signal_wait(signal_state *signal) {
53      mesh_thread_mutex_lock(&signal->lock);
54      while (!signal->ready)
55          mesh_thread_cond_wait(&signal->cond, &signal->lock);
56
57      signal->ready = FALSE;
58      mesh_thread_mutex_unlock(&signal->lock);
59  }

signal_wait() allows a thread to wait for a signal. First it locks the mutex (mesh_thread_mutex_lock(), details on pg. 85) to ensure atomicity. Then it enters a loop that repeatedly tests the status variable before blocking on the condition variable. mesh_thread_cond_wait() (details on pg. 81) atomically releases the mutex before blocking and reacquires it before returning when notified. This allows the notifier to acquire the mutex while the waiting thread is blocked. The loop is a safety measure against any spurious wakeups that may arrive. While unlikely, they are not prohibited by the threading packages of modern operating systems. The loop also covers the case where multiple threads are waiting for an event but only one will process it: a thread that wakes to find the flag already cleared simply waits again. Once the status flag is set, the thread (possibly without ever blocking) clears it, unlocks the mutex (mesh_thread_mutex_unlock(), details on pg. 85), and returns from the function.

61  // signals a (potentially blocked) thread
62  void signal_notify(signal_state *signal) {
63      mesh_thread_mutex_lock(&signal->lock);
64      signal->ready = TRUE;
65      mesh_thread_cond_signal(&signal->cond);
66      mesh_thread_mutex_unlock(&signal->lock);
67  }

Like the wait operation, the notify operation must protect itself with the mutex. It then sets the status flag, sends the notification (mesh_thread_cond_signal(), details on pg. 80), and unlocks the mutex before returning.

These four helper functions will be used in several places throughout this example.
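For experimentation outside the simulator, the same four helpers can be written directly against POSIX threads, since the MESH calls above wrap pthreads. This sketch substitutes the standard pthread_mutex_* and pthread_cond_* functions for the mesh_thread_* wrappers, so it runs in real time with no MESH scheduling; the job, job_worker, and run_job names are hypothetical and exist only to demonstrate a notification between two threads.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Plain-POSIX analogue of the signal_state helpers (a sketch, not MESH code):
 * pthread_mutex_* and pthread_cond_* replace the mesh_thread_* wrappers. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool ready;              /* status flag: the event has occurred */
} signal_state;

void signal_init(signal_state *s) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->ready = false;
}

void signal_destroy(signal_state *s) {
    pthread_mutex_destroy(&s->lock);
    pthread_cond_destroy(&s->cond);
}

/* Wait for the event, blocking only if it has not happened yet,
 * then consume it by clearing the flag. */
void signal_wait(signal_state *s) {
    pthread_mutex_lock(&s->lock);
    while (!s->ready)        /* re-check after spurious or stolen wakeups */
        pthread_cond_wait(&s->cond, &s->lock);
    s->ready = false;
    pthread_mutex_unlock(&s->lock);
}

/* Record the event and wake one waiting thread, if any. */
void signal_notify(signal_state *s) {
    pthread_mutex_lock(&s->lock);
    s->ready = true;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Hypothetical demo: a helper thread produces a value, then notifies. */
typedef struct {
    signal_state done;
    int result;
} job;

static void *job_worker(void *arg) {
    job *j = arg;
    j->result = 42;          /* written before the notify, read after the wait */
    signal_notify(&j->done);
    return NULL;
}

/* Start the helper and wait for it; correct whether the helper notifies
 * before or after this thread reaches signal_wait(). */
int run_job(void) {
    job j;
    pthread_t tid;
    signal_init(&j.done);
    j.result = 0;
    if (pthread_create(&tid, NULL, job_worker, &j) != 0) {
        signal_destroy(&j.done);
        return -1;
    }
    signal_wait(&j.done);
    int r = j.result;
    pthread_join(tid, NULL);
    signal_destroy(&j.done);
    return r;
}
```

Because signal_wait() consumes the flag, each notification releases at most one waiter and repeated notifications before a wait collapse into one, so the helper behaves much like a binary semaphore; a POSIX sem_t would be an alternative outside MESH.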

DMA transfer process. Now that we have a means for threads to block on events and send event notifications, we can implement the actual DMA transfer process.

307 void dma_transfer(dma_state *state, int size) {
308     dma_request request;
309     mesh_thread *ct = mesh_current_thread();
310
311     // initialize
312     printf("%4lf: '%s' requesting a DMA transfer\n",
313            mesh_getTime(), ct->name);
314     signal_init(&request.signal);
315     request.size = size;
316     request.cr = ct->last_run_on;
317
318     // enqueue the request
319     mesh_consume_str("2");
320     state->requests = g_slist_append(state->requests, &request);
321     state->signal.ready = 1;
322     signal_notify(&state->signal);
323
324     // wait for it to complete
325     signal_wait(&request.signal);
326
327     // clean up
328     printf("%4lf: '%s' DMA transfer complete\n",
329            mesh_getTime(), ct->name);
330     signal_destroy(&request.signal);
331 }


Threads can call this function to request a DMA transfer. They first initialize the request and its signal struct (lines 311 – 316), then pay the setup overhead, enqueue the request, and notify the DMA engine (lines 318 – 322), and finally block until the transfer is complete (line 325). Referring back to the DMA engine thread, we can see that this notification allows the engine to begin processing the request (lines 107 – 108).

When the transfer is complete, the DMA engine raises an interrupt, which invokes the DMA ISR function:

82  // ISR for DMA transfer complete interrupt
83  void *dma_done_isr(void *arg) {
84      dma_state *state = arg;
85      dma_request *request;
86
87      // remove the completed request from the queue
88      request = state->requests->data;
89      state->requests = g_slist_remove(state->requests, request);
90
91      // notify the waiting thread
92      signal_notify(&request->signal);
93
94      // apply overhead and return
95      mesh_consume_str_and_exit("2");
96      return NULL;
97  }

The ISR removes the (completed) request from the queue, then notifies the waiting thread that it can continue executing. When the thread unblocks (in dma_transfer(), line 325), it cleans up the signal struct and returns to the caller.

Defining the custom interrupt controller. As we mentioned at the beginning of this example, we want the DMA interrupt to have a higher priority than the time slice interrupt so that newly unblocked threads can run as soon as possible. To achieve this, we define the following function to act as an interrupt controller:

388 // prioritized interrupt controller that favors DMA over others
389 char *my_interrupt_controller(void *arg, GSList **pending) {
390     mesh_resource *cr = arg;
391     char *to_fire;
392     char *head;
393
394     // Should never have an empty pending interrupt list
395     g_assert(*pending);
396
397     // prefer the DMA interrupt
398     head = (*pending)->data;
399     if (g_slist_find(*pending, DMA_DONE)) {
400         if (strcmp(head, DMA_DONE))
401             printf("DMA takes priority over %s\n", head);
402         to_fire = DMA_DONE;
403     }
404
405     // allow other interrupts if none are active
406     else if (!cr->interrupt_controller.active)
407         to_fire = head;
408
409     // don't fire anything
410     else
411         return NULL;
412
413     // remove the winner from the list and return it
414     *pending = g_slist_remove(*pending, to_fire);
415     return to_fire;
416 }


There are several details to note here. Every MESH interrupt controller takes a list of pending interrupt names and returns the one that should fire, if any, after removing it from the list (hence the double pointer). Because MESH uses normal strings to identify interrupts, it is straightforward to compare and sort the items in the pending list if needed. A user-supplied argument (a pointer to the resource in this case) can be used to store state or customize behavior as needed.

Our controller first checks for a pending DMA interrupt (lines 397 – 403), which it always fires, interrupting the timer tick ISR if necessary. If no DMA interrupt is pending, it then checks for currently active interrupts by accessing the interrupt controller state of its resource (lines 405 – 407); if any are active, it fires nothing for now (lines 409 – 411). The active field is a list of ISR threads (pointers to mesh_thread) currently in progress and must not be modified. In this function we check only whether any other interrupts are active, to prevent the timer tick from interrupting the DMA ISR. More complicated control might require evaluating the set of currently executing ISR threads to determine the correct course of action.

Finally, the chosen interrupt is removed from the list and returned to MESH for firing (lines 413 – 415).
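The controller’s policy can also be expressed without GLib, which makes it easy to test in isolation. In this hypothetical sketch (not the MESH API), pending interrupts are an array of names, n_active stands in for the length of cr->interrupt_controller.active, and the string value of DMA_DONE is an assumption. Unlike the listing, which uses g_slist_find() and therefore compares pointers, this version matches names with strcmp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical array-based version of the prioritized controller policy.
 * The value of DMA_DONE is an assumption for illustration. */
#define DMA_DONE "dma_done"

/* Pick the interrupt to fire from pending[0..*n-1] and remove it,
 * preserving the order of the rest. n_active is the number of ISRs
 * already running on the resource. Returns NULL to fire nothing. */
const char *pick_interrupt(const char **pending, int *n, int n_active) {
    assert(*n > 0);                       /* never called with nothing pending */
    int chosen = -1;

    for (int i = 0; i < *n; i++)          /* DMA always wins, even mid-ISR */
        if (strcmp(pending[i], DMA_DONE) == 0) {
            chosen = i;
            break;
        }

    if (chosen < 0) {
        if (n_active > 0)
            return NULL;                  /* don't nest a tick inside an ISR */
        chosen = 0;                       /* otherwise fire the head of the list */
    }

    const char *fire = pending[chosen];
    for (int i = chosen; i < *n - 1; i++) /* remove the winner */
        pending[i] = pending[i + 1];
    (*n)--;
    return fire;
}
```

For example, with pending names {"timer_tick", "dma_done", "uart"} and one ISR already active, the DMA interrupt fires first; a second call with an ISR still active returns NULL, and once no ISR is active the head of the list, "timer_tick", fires.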

Running the example. In order to expose the effects of prioritized interrupts in this short example, we change the time slice interval to five time units instead of six. Inside the resource-creation helper function, we assign our custom interrupt controller and add the DMA interrupt to each resource.

Finally, we instantiate the DMA engine and its state in main(), initializing a globally declared instance of dma_state as part of the process. Note that, unlike the timer, the DMA engine runs on a dedicated hardware resource and scheduler:

484     // set up the DMA engine
485     signal_init(&global_dma_state.signal);
486     global_dma_state.requests = NULL;
487     dma_sched = mesh_create_scheduler("dma_sched", &mesh_scheduler_rr);
488     mesh_create_resource("dma", NULL, dma_sched, &mesh_resource_default, 1);
489     mesh_create_thread("dma_engine", dma_sched, &dma_engine, &global_dma_state);

The only other task is to update the interrupt controller assignment and run the simulation (matrix_mul11.c):

439     mesh_set_interrupt_controller(cr, &my_interrupt_controller, cr);

Figure 2.23 shows the simulation results for this example. We doubled the overhead of both DMA and timer ticks in order to highlight the effects of prioritized interrupt nesting. This can be seen on resource2 between times 55 and 60, when DMA interrupts the timer tick ISR as it executes.

Following the timeline for the DMA resource, we can see that each time a transfer completes (marked by a consume call), the DMA interrupt fires on the processor that originally made the request. For example, the first DMA request originated on resource2 and the second on resource1. The custom interrupt controller’s priority policy becomes visible whenever a transfer completes just before or during a timer tick. Once the interrupt completes, the blocked thread can resume executing as soon as the scheduler finds an idle resource for it.


Figure 2.23: Results of running matrix_mul11.c. DMA interrupts take priority over timer ticks.

While trivial, this example demonstrates how to define custom interrupt controllers in MESH. In practice, arbitrary interrupt handling can be achieved by consolidating, ignoring, prioritizing, and/or nesting interrupts.
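As a minimal sketch of the prioritizing and nesting policy, assuming just two interrupt sources, the decision a custom controller makes reduces to comparing the priority of a newly raised interrupt with that of the ISR currently running. The enum, the priority values, and should_nest are hypothetical and do not reproduce the callback signature that mesh_set_interrupt_controller expects.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interrupt sources; IRQ_NONE means no ISR is running. */
typedef enum { IRQ_NONE = 0, IRQ_TIMER, IRQ_DMA } irq_source;

/* Larger value wins. DMA completions outrank timer ticks, as in the example. */
static int irq_priority(irq_source s) {
    switch (s) {
    case IRQ_DMA:   return 2;
    case IRQ_TIMER: return 1;
    default:        return 0;
    }
}

/* Should `raised` preempt (nest inside) the ISR for `running` right away?
   If not, the controller defers it until the running ISR completes. */
static bool should_nest(irq_source running, irq_source raised) {
    assert(raised != IRQ_NONE);
    return irq_priority(raised) > irq_priority(running);
}
```

Adding a third outcome at the same decision point, such as discarding a timer tick that arrives while another tick is still pending, would be one way to cover the ignoring and consolidating cases.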

2.5 Shared Resource Modeling

Until now, we've assumed that our threads can access memory for free and that all communication between threads happens in zero time with no overhead. In this section, we will add some interconnect models to see the important role they play in system design.

2.5.1 Modeling Communication in MESH

As seen before in Figure 2.2, the MESH layered view separates design entities into hardware resources, schedulers, and software threads. To model communication or memory, let us consider a separate three-layer stack dedicated to communication only, as shown on the right in Figure 2.24. The separate communications stack has many parallels to the execution stack on the left. Although its hardware resource layer contains busses, queues, routers, links, etc. instead of processors, the concepts for their simulation are the same: resources resolve timing by handling consume requests from the layers above.

The communications stack contains schedulers in the middle layer, just like the execution stack does. Often, the entities that allocate communication resources are called arbiters. Even though we make the distinction between the "schedulers" in the execution stack and "arbiters" in the communication stack, these are identical in form and implementation within the MESH simulation: they receive consume calls from the software layer above and place them onto resources.

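Figure 2.24: A view of the previously presented execution stack (left) and the communications stack (right). Each stack has Software, Schedulers, and Hardware Resources layers: threads (Th) run through a scheduler (Sched) on processors P1 … Pn, while communication threads (Comm Th.) run through an arbiter on communication resources (Comm. R).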

The largest difference between the communication and execution stacks is in the top (software) layer. Note that the name "software" layer might be misleading for the communication stack, because some behaviors modeled at this layer may be implemented in hardware in a real system. This will become clear once we show some examples in the following sections. Just like software threads in the execution stack, communication threads execute concurrently, and annotations are used to consume the appropriate amount of hardware resources. The big difference is in how communication threads are used.

As seen from the arrows in Figure 2.24, communication threads can only be invoked by execution threads. In other words, communication behavior cannot arise on its own: a hardware block or a piece of software running on a processor must start the communication process. A communication event must be started through a read/write interface and must specify the name of the appropriate communication template.

Figure 2.25 shows, through an example, how execution threads (red blocks) instantiate a communication thread (yellow blocks) from a communication template. Let us say that at time t1 the execution thread running on P1 needs to grab a value from shared memory in order to continue running. The execution thread issues a mesh_comm_read call, which among other things contains information about which communication template to use, which memory address to access, and what the execution thread should do while waiting for the value from memory. Communication templates can be seen as descriptions of tasks that must be completed in order for the mesh_comm_read call to succeed. For example, in the case of this memory access, the bus must first be accessed to send the request to the memory bank. Next, the memory bank is used to access the value. Finally, the value is sent back to the requesting processor over the bus. These three steps are shown in the "Access Memory" template box in Figure 2.25.


Figure 2.25 contents: a Comm. Template Library (Access Memory, Access Disk, Access Wireless); an execution stack with P1 and a communication stack with Bus and Memory, drawn against a time axis t0 to t4, during which the execution thread is shown as "blocked on comm." The template and the call shown in the figure are:

"Access Memory" template:
//access bus to get to memory
consume("bus_arbiter{2}");
//perform memory read
consume("memory_arbiter{4}");
//return read value through the bus
consume("bus_arbiter{4}");

mesh_comm_read(- use "Access Memory" template
               - memory address to access
               - block thread until value received);

Figure 2.25: Example illustrating the relationship between execution threads (red) and communication threads (yellow).

These service times can be annotated via consume calls in a similar way to how computational complexity is annotated in the execution stack. Note that it is necessary to explicitly specify the scheduler/arbiter that will service each consume call: unlike in the execution stack, we must be able to distinguish between various classes of communication resources, such as memory banks or busses. In our example, the first consume call within the "Access Memory" template results in bus usage from t1 to t2. The second consume call, representing the actual access to the memory bank, occupies the memory from t2 to t3. Finally, the data is returned to the processor via the third consume call, lasting from t3 to t4. There is no contention for either the bus or the memory in this example because we are illustrating only one read. In the presence of multiple simultaneous reads, multiple communication threads using the "Access Memory" template would be spawned, creating contention for the bus and the shared memory.
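In the uncontended case, the boundaries t1 to t4 in Figure 2.25 follow from adding up the template's consume amounts in order. The sketch below assumes that consuming n units on a resource with computation power p takes n/p time units and that both the bus and the memory have power 1; Figure 2.25 states neither, so treat the numbers as illustrative. template_step and run_uncontended are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One step of a communication template: the arbiter that serves it and the
   number of units it consumes (values taken from the "Access Memory" box). */
typedef struct {
    const char *arbiter;
    double amount;
} template_step;

/* End time of each step when nothing else contends for the resources.
   `power` is the assumed computation power of every resource involved.
   Returns the completion time of the whole template (t4 in the figure). */
static double run_uncontended(const template_step *steps, size_t n,
                              double start, double power, double ends[]) {
    assert(power > 0.0);
    double t = start;
    for (size_t i = 0; i < n; i++) {
        t += steps[i].amount / power;  /* step occupies its resource */
        ends[i] = t;
    }
    return t;
}
```

Starting the read at t1 = 1.0 places t2, t3, and t4 at 3.0, 7.0, and 11.0. With several concurrent reads, the steps would instead queue at the arbiters, which this sketch deliberately ignores.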

In the following section we will put these newly introduced model elements to use by incorporating a simple bus into our example.

2.5.2 Simple Bus-Based Example

Let us start with a simple system of two processors, much like the one from Figure 2.4, and connect the processors to a single shared memory bank via a bus (Figure 2.26). During the computation of the matrix product, each processor has to access the shared memory via the bus in order to get the matrix data. We assume that program instructions reside locally with each processor.


Figure 2.26: Bus-based interconnect for our architecture: Resource 1 and Resource 2 share a single Shared Memory through a bus.

So, how does this system view fit into the split-stack MESH view presented in the previous section? Figure 2.27 shows the amended view of the system. Note that all the worker threads can access a top-layer behavior (i.e. a communication thread) named Bus Access. Therefore, whenever a bus access is required, this communication thread is instantiated and runs until the access is satisfied. Since a thread that gains exclusive access to the bus gains exclusive access to the memory as well, we model these two as a single resource (see the hardware layer of the comm. stack). For the same reason, the bus arbiter also serves as a de facto memory arbiter, deciding which thread receives access to both the bus and memory. Similarly, the Bus Access thread's consume calls model both the bus and memory delay at once. In a later example, we will split the bus and the memory into separate shared resources with their own arbiters in order to model more complex situations.



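Figure 2.27: Modeling a shared bus and memory. The execution stack (Resource 1 and Resource 2 with their scheduler) is shown next to the comm. stack, where the Bus Access communication thread is placed by the Bus Arbiter onto a single Bus/Memory hardware resource.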

We make our changes to matrix_mul3.c to create matrix_mul12.c, which implements our bus model. First, we create an arbiter for the bus/memory resource together with a communication thread that models the behavior of a bus access:

103 //create schedulers
104 default_sched = mesh_create_scheduler("default_sched",
105                                       mesh_scheduler_rr);
106
107 bus_arbiter = mesh_create_scheduler("bus_arbiter",
108                                     mesh_scheduler_rr);
109
110 //create communication threads
111 mesh_create_comm_thread("bus_access", bus_arbiter, bus_access);

In line 107 we create a scheduler named bus_arbiter using the well-known mesh_create_scheduler call. Just as with custom scheduling strategies for execution resources, a custom arbitration strategy can be created here as described in Section 2.3.5. To keep this example simple, we use the default round-robin behavior.

The mesh_create_comm_thread function (details on pg. 88) behaves very similarly to the function for creating execution threads. Just like mesh_create_thread, mesh_create_comm_thread takes arguments for the thread name, the default scheduler, and a function pointer to the thread behavior. What is missing is the optional void * argument that every execution thread can receive at creation. As we will see later, optional arguments can instead be passed when calling communication threads through the read/write interface.

116 //create execution resources
117 mesh_create_resource("resource1", cfl,
118                      default_sched, mesh_resource_default,
119                      1);
120
121
122 mesh_create_resource("resource2", cfl,
123                      default_sched, mesh_resource_default,
124                      1);
125
126 //create communication resources
127 mesh_create_resource("bus", cfl,
128                      bus_arbiter, mesh_resource_default,
129                      1);

As seen in the code above, we create a bus resource along with the execution resources. Note that communication resource creation is identical to execution resource creation. The simulator does not distinguish between communication and execution resources; all resources have a certain "computation power" that allows them to be consumed and thus generate system timing. Whether the consumes come from execution or communication entities does not matter. Note that in line 128 we set bus_arbiter to be the bus resource's scheduler.

The next issue to consider is how to perform the reading of memory from within the worker threads. For this purpose, we consider only memory accesses of the matrix data, ignoring the memory accesses brought about by instruction loading, thread spawning, etc. Since all the matrix data accesses happen within the worker threads, let us return to the piece of code that actually performs the addition and multiplication operations:

//perform matrix multiplication for this row
for (i = 0; i < MAX_COLS; i++) {
    //get necessary data
    mesh_comm_read("bus_access", NULL, NULL,
                   BLOCK_RESOURCE);
    mesh_comm_read("bus_access", NULL, NULL,
                   BLOCK_RESOURCE);

    //perform calculations
    result += matrixA[i][row] * matrixB[i];
    mesh_consume_str("ADD=1:MUL=1");
}

//write result back to shared memory
mesh_comm_write("bus_access", NULL, NULL,
                BLOCK_RESOURCE);

The big change here is the addition of the mesh_comm_read API call, which grabs the data values from the A and B matrices needed to perform the calculation. The mesh_comm_read call (details on pg. 87), along with its counterpart mesh_comm_write, is the main method for execution threads to instantiate and use communication threads. Remember that in line 111 we created a communication thread called bus_access. Through mesh_comm_read, the worker thread instantiates the bus_access communication thread, just as shown by the arrows between threads in Figure 2.27. The mesh_comm_read argument BLOCK_RESOURCE means that the worker thread blocks the processor it is running on until the mesh_comm_read request is complete (i.e. until the bus access completes). There are several other blocking options available, which will be discussed later in this tutorial. We also postpone discussion of the other two mesh_comm_read arguments (set to NULL in this example), which allow data and parameters to be passed to the communication threads.

Finally, we need to describe what happens when control is passed from the worker threads to the bus_access thread. This part of the simulation should model an access to the shared bus and memory, applying the appropriate delay. Let us look at the code for bus_access:

13 //bus access comm thread
14 void *bus_access(void *arg)
15 {
16     mesh_comm_thread_data *comm_thread_data = (mesh_comm_thread_data *) arg;
17
18     // if write is requested
19     if (comm_thread_data->func == WRITE_ACCESS)
20         mesh_lightweight_consume(2, 0);
21
22     // if read is requested
23     if (comm_thread_data->func == READ_ACCESS)
24         mesh_lightweight_consume(1, 0);
25
26     mesh_free_delayed(comm_thread_data);
27     return NULL;
28 }

The first thing to notice is that every communication thread receives a mesh_comm_thread_data structure as its input argument. This structure contains several useful pieces of information for the communication thread, one of which we use in this example: the func field, which specifies whether the thread was started as the result of a read or a write request. Depending on the type of request, we consume a different amount of resources (lines 19 through 24), resulting in different delays for reads and writes. In this example, writes take twice as long as reads, as seen from the values passed to the consume calls. A related call, mesh_consume_str_and_exit (details on pg. 75), is a slightly different type of consume call: it notifies the simulator that it is the last consume call in the thread and that the thread can be terminated immediately. The time at which the bus_access thread terminates is very important; after all, the worker thread that started the memory access is blocked until the bus_access thread completes. Additionally, note that every communication thread is responsible for freeing its own mesh_comm_thread_data structure (line 26).

The above implementation of the bus_access thread is a very simple example of writing communication threads. In several of the following examples in this tutorial, we will show other, more advanced features of communication threads. Now that we have set up a bus interconnect model, we run the matrix_mul12 model and receive the following output:

MESH Simulation Kernel - compiled on Sep 25 2004 13:20:37

row 0: 0
row 1: 0
row 2: 0
row 3: 0

Resource usage:
resource1: 32.000000, for sched: 0.000000
resource2: 46.000000, for sched: 0.000000
bus: 40.000000, for sched: 0.000000
Thread usage:
worker3: 12.000000, contended 0.000000
boss: 30.000000, contended 0.000000
worker0: 12.000000, contended 0.000000
worker1: 12.000000, contended 0.000000
worker2: 12.000000, contended 0.000000

The output above tells us that the total runtime of the system was 78.0 time units and that the bus resource, busy for 40.0 of them, was utilized a little above half the time. Let us look at the trace graph for more information:

Figure 2.28: System simulation including overhead due to memory accesses (matrix_mul12.c).

Figure 2.28 features two separate windows showing the execution stack trace on the top and the communication stack trace on the bottom. The only resource in the communication stack is the bus resource. Running on it are the read and write varieties of the bus_access behavior, shown in two different colors. At time t = 5.0 the worker0 thread attempts to run on resource1. However, before it performs its work and consumes computation from its resource, it must first get the appropriate data using the mesh_comm_read calls. Therefore, at t = 5.0 and t = 6.0 the bus_access thread utilizes the bus to grab the necessary data. Remember that we set the consume value of a bus access read to 1 and that the default computational power of the bus resource is 1 as well. Therefore, in the absence of contention (as is the case here), it takes two simulation time units for worker0 to get both pieces of data it needs. At t = 7.0 worker0 executes on resource1. However, because the bus reads were of type BLOCK_RESOURCE, no other task could run on resource1 while the data was being fetched.
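The bus total reported in the output can be cross-checked by hand. Each worker issues two reads per column and one write at the end; reads consume 1 unit and writes 2 on a bus of power 1. The sketch assumes four workers (rows 0 to 3 in the output) and MAX_COLS = 4; MAX_COLS is not shown in this section and is assumed here only because it is the value that reproduces the reported 40.0. bus_busy_time and utilization are hypothetical helpers.

```c
#include <assert.h>

/* Total bus busy time for the matrix_mul12 access pattern:
   every worker performs 2 reads per column plus 1 final write. */
static double bus_busy_time(int workers, int cols,
                            double read_cost, double write_cost) {
    assert(workers >= 0 && cols >= 0);
    return workers * (2.0 * cols * read_cost + write_cost);
}

/* Fraction of the run during which a resource was busy. */
static double utilization(double busy, double total_runtime) {
    assert(total_runtime > 0.0);
    return busy / total_runtime;
}
```

Under these assumptions bus_busy_time(4, 4, 1.0, 2.0) returns 4 × (2 × 4 × 1 + 2) = 40.0, and 40.0 / 78.0 ≈ 0.51 agrees with the bus being busy a little above half the time.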

At t = 20.0 we have a case where both worker1 and worker2 (in dark and light green, respectively) attempt to run on separate processors. They simultaneously attempt to access the bus. Remember that we set the bus_arbiter scheduler's behavior to mesh_scheduler_rr, i.e. round robin. At t = 20.0, bus_arbiter arbitrarily allows worker2 to access the bus first. At t = 21.0, the round-robin algorithm gives access to worker1. At t = 22.0, worker2 gets a chance to fetch its second data value. Once this is done, worker2 is free to execute on resource2 at t = 23.0. worker1 is delayed by one cycle by its second bus access at t = 23.0 and gets to execute on resource1 at t = 24.0. This scenario illustrates how contention for a shared resource (a bus and memory in this case), as well as the arbitration strategy, can be modeled within the MESH simulation.
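The round-robin sequence at t = 20.0 can be reproduced with a small simulation of an arbiter that grants one unit-length read per time step and then rotates to the next requester. This is a simplified, hypothetical model (rr_arbitrate is not MESH's mesh_scheduler_rr): it assumes every read takes exactly 1 time unit and that the rotation starts with worker2, mirroring the arbitrary first choice described above.

```c
#include <assert.h>

#define RR_MAX_REQUESTERS 8

/* Simplified round-robin bus arbiter: every read occupies the bus for exactly
   1 time unit, and after each grant the arbiter moves on to the next
   requester in rotation order (index 0 is served first).
   reads[i]  number of reads requester i must perform
   ready[i]  output: time at which requester i has all of its data */
static void rr_arbitrate(int n, const int reads[], double start, double ready[]) {
    int left[RR_MAX_REQUESTERS];
    int remaining = 0;
    assert(n > 0 && n <= RR_MAX_REQUESTERS);
    for (int i = 0; i < n; i++) {
        assert(reads[i] >= 0);
        left[i] = reads[i];
        remaining += reads[i];
        ready[i] = start;              /* no reads needed: ready immediately */
    }
    double t = start;
    int next = 0;
    while (remaining > 0) {
        while (left[next] == 0)        /* skip requesters that are finished */
            next = (next + 1) % n;
        t += 1.0;                      /* grant the bus for one read */
        left[next]--;
        remaining--;
        if (left[next] == 0)
            ready[next] = t;           /* last data value has arrived */
        next = (next + 1) % n;         /* rotate to the next requester */
    }
}
```

With worker2 at index 0 and worker1 at index 1, each needing two reads starting at t = 20.0, the sketch reports worker2 ready at 23.0 and worker1 at 24.0, the same times as in Figure 2.28. A single requester such as worker0 starting at t = 5.0 is ready at 7.0.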

2.5.3 Changing Blocking Modes

As mentioned in the previous section, one of the arguments to the mesh_comm_read and mesh_comm_write calls is the blocking mode. In the previous example we used the BLOCK_RESOURCE mode, which means that no other work can go on the processor while a thread is waiting for a communication event to finish. However, some architectures allow the processor, in the presence of multiple threads, to perform a quick context switch to another thread if the current thread is blocked waiting on data. This type of behavior can be modeled using the BLOCK_THREAD blocking mode. When using this mode, the scheduler places another thread on the processor if it finds that the thread calling mesh_comm_read is blocked. We take the previous example (matrix_mul12.c) and change the blocking mode of all reads and writes to the bus to BLOCK_THREAD. Figure 2.29 shows the output of the new simulation (matrix_mul13.c).

Unsurprisingly, a system that is capable of context switching when threads are blocked performs better. In this case, the system runtime decreases from 78.0 to 65.0 simulation time units. Additionally, the bus is clearly much better utilized. At t = 5.0 the behavior of the system is much the same, since not enough threads have been started for the scheduler to context switch the worker0 thread away from resource1. However, at t = 10.0, worker0 is placed on resource2. After it blocks on a bus read, the worker0 thread is removed and the third iteration of the boss thread is run on resource2. Similar situations happen throughout the system runtime, increasing performance significantly. Of course, this example lacks some realism, since context switches happen in zero time. However, if we included an appropriate overhead for scheduling and context switching within our scheduler implementation, this model could serve as an interesting platform to study the tradeoffs of when and how to context switch blocked threads.

Figure 2.29: System simulation implementing the BLOCK_THREAD mode (matrix_mul13.c).

Another possible blocking mode is no blocking at all. In other words, the execution thread might perform a read or a write but not depend on the data from that transaction. Thus, from the thread's point of view, the read or write does not introduce any additional delay. We make one more adjustment to matrix_mul13.c by replacing all BLOCK_THREAD instances with the NO_BLOCKING blocking mode, creating matrix_mul14.c.
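The three blocking modes discussed in this section can be summarized by two questions: does the calling thread wait for the transfer, and may the scheduler hand its processor to another thread in the meantime? The standalone sketch below encodes those answers as described here. The SKETCH_ constants are local stand-ins for BLOCK_RESOURCE, BLOCK_THREAD, and NO_BLOCKING, whose actual definitions are not shown in this tutorial, and processor_hold_time is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins; the real MESH constants may differ in name and value. */
typedef enum {
    SKETCH_BLOCK_RESOURCE,
    SKETCH_BLOCK_THREAD,
    SKETCH_NO_BLOCKING
} sketch_block_mode;

typedef struct {
    bool caller_waits;        /* does the issuing thread stall until completion? */
    bool processor_released;  /* may the scheduler run another thread meanwhile? */
} block_effect;

static block_effect blocking_effect(sketch_block_mode mode) {
    block_effect e = { true, false };   /* BLOCK_RESOURCE behavior */
    switch (mode) {
    case SKETCH_BLOCK_RESOURCE:
        break;                          /* caller waits and holds the processor */
    case SKETCH_BLOCK_THREAD:
        e.processor_released = true;    /* caller waits, processor can be reused */
        break;
    case SKETCH_NO_BLOCKING:
        e.caller_waits = false;         /* caller simply keeps running */
        break;
    }
    return e;
}

/* Processor time held idle by a waiting caller while a transfer of length
   comm_time is in flight. */
static double processor_hold_time(sketch_block_mode mode, double comm_time) {
    assert(comm_time >= 0.0);
    block_effect e = blocking_effect(mode);
    return (e.caller_waits && !e.processor_released) ? comm_time : 0.0;
}
```

For worker0's two unit-length reads in matrix_mul12.c, processor_hold_time(SKETCH_BLOCK_RESOURCE, 2.0) gives the 2 time units during which resource1 could do nothing else.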

As seen in Figure 2.30, the worker threads no longer need to wait for bus accesses. In this example, the NO_BLOCKING blocking mode does not make much sense, since the purpose of the bus accesses is to gather essential data. However, imagine a system where threads are expected to periodically broadcast their status to others without having to wait for an acknowledgment. In that case, a thread can place a message on the bus and continue to execute.

Figure 2.30: System simulation implementing the NO_BLOCKING mode (matrix_mul14.c).

2.6 Conclusion

We've covered the basics of MESH model creation using the matrix multiplication example. Even though this example serves well as an introduction to MESH, it is not a good example of a typical MESH application because it is simply too small. Examples such as this one can be quickly simulated on an instruction set simulator with greater accuracy than MESH provides. True uses for MESH include architectures with tens or even hundreds of heterogeneous processors, each running a set of complicated media, security, or consumer applications. In such systems, the impact of application parallelization, various scheduling decisions, and reorganization of hardware would be difficult to determine due to the complex set of interactions exhibited in the system. MESH is a tool designed to mimic these system interactions at a low simulation cost and help the designer make sense of them.

One of the important features of MESH not mentioned in this tutorial is the ability to generate simulation models without the full source code of the application. For example, it was not necessary to actually compute the matrix multiplication in this tutorial, as long as the designer was aware of how the application is split up into threads and which threads depend on each other. Similarly, a designer who is familiar with an application but does not have its source code can quickly "sketch out" an application description using control flow (for and while loops, if statements) with consume calls inserted in place of the code. This enables the designer to consider some high-level design decisions in the absence of the full source code for the software functionality, while later refining the model and gradually adding detail.
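As a minimal sketch of this idea, the worker thread from the tutorial can be reduced to its control flow plus cost annotations, so the matrix data never has to exist. To keep the sketch runnable without the simulator, sketch_consume is a hypothetical stand-in that only tallies the annotated operations; in an actual MESH model the annotation would be a consume call such as mesh_consume_str("ADD=1:MUL=1"), as used earlier in this chapter.

```c
#include <assert.h>

/* Hypothetical stand-in for a consume annotation: it only counts the
   annotated operations instead of advancing simulated time. */
static long total_add = 0;
static long total_mul = 0;

static void sketch_consume(int adds, int muls) {
    assert(adds >= 0 && muls >= 0);
    total_add += adds;
    total_mul += muls;
}

/* Sketch of one worker thread: the loop structure of the real code is kept,
   but the arithmetic is replaced by an annotation of its cost. */
static void worker_sketch(int cols) {
    for (int i = 0; i < cols; i++) {
        /* was: result += matrixA[i][row] * matrixB[i]; */
        sketch_consume(1, 1);
    }
}
```

Running worker_sketch(4) once per row for four rows records 16 additions and 16 multiplications, without any matrix values being computed.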

As mentioned above, this tutorial presents only a brief taste of the full feature set of the MESH simulator. It is meant to familiarize the designer with the performance simulation tool. Current research is being done on the development of various model parameters (consume calls, resource models, contention resolution models) as well as novel design methodologies. For further information, please make use of the MESH API Reference and other documentation included with the simulator, as well as the project webpage at http://www.ece.cmu.edu/~mesh.

3 The MESH Viewer

The MESH Viewer is a Java application that allows the designer to view the simulation traces extracted from the MESH simulation. Previously, this function was accomplished by exporting the simulation traces in Scalable Vector Graphics (SVG) format, to be viewed in any of the available SVG viewers. Because the SVG format is not tailored to this application, this approach resulted in large simulation output files as well as slow viewing performance. We plan to gradually move away from SVG output toward a custom trace file format that is viewable in the MESH Viewer. As the MESH Viewer becomes more full featured and powerful, we plan to drop SVG support in future versions of MESH.

3.1 Using the MESH Viewer

In order to use the MESH Viewer, data traces must be extracted from the simulation. This is accomplished through two MESH API calls, mesh_trace_init() and mesh_trace_print() (details on pg. 96 for both). The example below illustrates the proper way to extract simulation traces from the MESH simulation:

mesh_init();

//define resources, schedulers, and threads

mesh_trace_init(0, 2500);
mesh_kernel();
mesh_trace_print("output.trace");

As seen above, mesh_init() must be called before any of the trace API functions are used. mesh_trace_init() takes start and stop timestamps as arguments, determining the simulation time period for which the trace will be collected. In this example, any data executed between t = 0 and t = 2500 will be collected. Because large simulations can produce a huge amount of trace data, it is important to keep the trace collection limits within reason.

After mesh_trace_init(), the simulation is executed via the mesh_kernel() call. After the simulation terminates, mesh_trace_print() outputs the collected trace to a file, in this case output.trace.

To start the MESH Viewer, run the viewer script, passing the trace file to it:

viewer output.trace

To make sure that the viewer script is set up correctly, see the MESH Viewer installation instructions (Section 1.3).

Figure 3.1 shows the main screen of the MESH Viewer. On the left, in a resizable pane, is a list of resources within the simulation. To its right is the main canvas of the viewer, showing the trace data from the simulation. Each individual annotation area (the area between consume calls) is represented by a box on the canvas and is color coded according to the thread it belongs to. Below the main canvas is a scale slider that adjusts the zoom level of the viewer. In Figure 3.1 the slider is set to 10^(-1), which means that every value on the time axis above the main canvas should be divided by 10 to get the actual simulation time. Each individual consume call box can be selected by clicking on it, which displays its detailed information in the area below the scale slider.

Figure 3.1: MESH Viewer

3.2 Viewer Features

This feature list discusses additional features of the MESH Viewer not mentioned in the previous section. Since the viewer is a work in progress, this list will change with subsequent releases of the MESH Viewer.

• File → Import Color Table. By specifying a tab-delimited text file, it is possible to define a custom color table relating individual thread names to colors. The file should have a .ctb extension. Here's an example of a color table file:

//comment
thread1	red
thread2	violet
//thread colors can be specified via R,G,B format as well
thread3	250,0,250

Available preset colors include: red, royalblue, green, tomato, wheat, yellowgreen, lightpink, azure, darkolivegreen, indigo, magenta, brown, palevioletred, olive, violet, midnightblue, slategray, plum, salmon, lawngreen

• File → Export to PNG. Exports the entire main canvas in PNG image format. Incomplete feature: resource labels and the timeline are not drawn.

• File → Export to EPS. Exports the entire main canvas in EPS vector graphics format. Incomplete feature: resource labels and the timeline are not drawn.

• View → Color by Consume Labels. When selected, the consume blocks in the main canvas are colored by consume label instead of by thread. Consume labels can be inserted through mesh_consume_str, using the name construct. For more information, see the description of mesh_consume_str on page 74.
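The MESH Viewer itself is written in Java; the C sketch below only illustrates the color table format as described above: a thread name followed by a color (the file is described as tab delimited), where the color is either a preset name or an R,G,B triple, plus // comment lines. ctb_entry and ctb_parse_line are hypothetical, and details such as rejecting unknown preset names or allowing spaces inside a triple are assumptions left out of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char thread[64];      /* thread name as it appears in the trace */
    char color_name[32];  /* preset color name, or "" when RGB was given */
    int r, g, b;          /* RGB components, or -1 when a preset name was given */
} ctb_entry;

/* Parse one line of a color table. Returns 1 when an entry was read,
   0 for blank and comment lines, -1 for malformed lines. */
static int ctb_parse_line(const char *line, ctb_entry *out) {
    char spec[32];
    assert(line != NULL && out != NULL);
    while (*line == ' ' || *line == '\t')
        line++;                                   /* skip leading whitespace */
    if (*line == '\0' || *line == '\n' || strncmp(line, "//", 2) == 0)
        return 0;                                 /* blank line or comment */
    if (sscanf(line, "%63s %31s", out->thread, spec) != 2)
        return -1;                                /* need a name and a color */
    if (sscanf(spec, "%d,%d,%d", &out->r, &out->g, &out->b) == 3) {
        if (out->r < 0 || out->r > 255 || out->g < 0 || out->g > 255 ||
            out->b < 0 || out->b > 255)
            return -1;                            /* component out of range */
        out->color_name[0] = '\0';
        return 1;
    }
    strcpy(out->color_name, spec);                /* fits: both are 32 bytes */
    out->r = out->g = out->b = -1;
    return 1;
}
```

Feeding the example file above line by line yields three entries: thread1 and thread2 with preset names, and thread3 with the triple 250,0,250.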

3.3 Known Issues

Since the viewer is a work in progress, this list will change with subsequent releases of the MESH Viewer.

• On Linux (and possibly other platforms), the scroll bars for the main canvas sometimes fail to show, especially after the scale slider has been moved. To get the scroll bars to show again, resize the MESH Viewer window by dragging its corner.

• The memory limit of the Java VM running the Viewer limits the number of consume calls that can be included and viewed within the simulation trace. In our tests, about 100 MB was necessary to view 200,000 consume blocks. Within the distribution, the JVM memory limit is set to 256 MB. This limit can be changed (see the viewer script) by adjusting the -Xmx256M option used when starting the application.
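A rough rule of thumb follows from the measurement quoted above: 100 MB for 200,000 blocks is about 500 bytes per block, so the default 256 MB limit corresponds to roughly 512,000 blocks. The sketch below performs this linear extrapolation; it rests on a single data point, and actual memory use depends on the trace contents and the JVM, so treat its output as an order-of-magnitude estimate. estimate_max_blocks and estimate_heap_mb are hypothetical helpers.

```c
#include <assert.h>

/* Linear extrapolation from the single measurement quoted above:
   about 100 MB of JVM heap for 200,000 consume blocks. */
static long estimate_max_blocks(double heap_mb) {
    const double ref_mb = 100.0;
    const double ref_blocks = 200000.0;
    assert(heap_mb > 0.0);
    return (long)(heap_mb * ref_blocks / ref_mb);
}

/* Inverse estimate: heap size (in MB) suggested for a given block count. */
static double estimate_heap_mb(long blocks) {
    assert(blocks >= 0);
    return (double)blocks * 100.0 / 200000.0;
}
```

By the same estimate, viewing about one million blocks would call for roughly 500 MB of heap, i.e. a setting in the neighborhood of -Xmx512M.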

4 API Reference

A brief overview of which functions are listed where, and in what order:

• mesh_kernel.h
– g_slist_split
– g_slist_vsplit
– mesh_cleanup
– mesh_create_feature_list
– mesh_create_interrupt
– mesh_create_resource
– mesh_create_scheduler
– mesh_create_shared_resource
– mesh_create_shared_scheduler
– mesh_create_thread
– mesh_create_thread_delayed
– mesh_current_thread
– mesh_define_thread
– mesh_execute_at_commit
– mesh_execute_at_finish
– mesh_execute_at_interrupt
– mesh_execute_at_squash
– mesh_feature_add
– mesh_get_feature_index_by_name
– mesh_get_scheduler_by_name
– mesh_get_thread_by_name
– mesh_getTime
– mesh_init
– mesh_kernel
– mesh_kernel_thread
– mesh_label_watch
– mesh_print_blocked_threads
– mesh_print_not_completed_threads
– mesh_raise_interrupt
– mesh_scheduler_is_eligible
– mesh_set_error
– mesh_set_interrupt_controller
– mesh_set_label_watch_output
– mesh_squash_thread
– mesh_start_thread
– mesh_start_thread_delayed

• mesh_syscalls.h
– mesh_consume
– mesh_consume_str
– mesh_consume_str_no_sched
– mesh_consume_str_and_exit
– mesh_exit
– mesh_free_delayed
– mesh_lightweight_autocommit
– mesh_lightweight_consume
– mesh_lightweight_consume_str
– mesh_memcpy_delayed
– mesh_resource_create_entry
– mesh_resource_get_entry
– mesh_resource_set_entry
– mesh_scheduler_create_entry
– mesh_scheduler_get_entry
– mesh_scheduler_set_entry
– mesh_thread_cond_broadcast
– mesh_thread_cond_destroy
– mesh_thread_cond_init
– mesh_thread_cond_signal
– mesh_thread_cond_wait
– mesh_thread_cond_timedwait
– mesh_thread_create_entry
– mesh_thread_get_and_clear_counter
– mesh_thread_get_entry
– mesh_thread_has_entry
– mesh_thread_inc_counter
– mesh_thread_join
– mesh_thread_mutex_destroy
– mesh_thread_mutex_init
– mesh_thread_mutex_lock
– mesh_thread_mutex_trylock
– mesh_thread_mutex_unlock
– mesh_thread_sem_destroy
– mesh_thread_sem_getvalue
– mesh_thread_sem_init
– mesh_thread_sem_wait
– mesh_thread_sem_trywait
– mesh_thread_sem_post
– mesh_thread_sem_post_delayed
– mesh_thread_set_entry
– mesh_yield

• mesh_comm.h
– mesh_comm_read
– mesh_comm_read_delayed
– mesh_comm_write
– mesh_comm_write_delayed
– mesh_create_comm_thread

• mesh_testbench.h
– mesh_create_tb_thread
– mesh_tb_wait_for

• mesh_utils.h
– mesh_depend
– mesh_get_live_threads
– mesh_fifo_init
– mesh_fifo_insert
– mesh_fifo_num_elements
– mesh_fifo_remove

• mesh_def_interrupts.h
– mesh_default_interrupt_controller

• mesh_def_resources.h
– mesh_contention_resolution_default
– mesh_resource_default

• mesh_def_schedulers.h
– mesh_sched_find_idle_resources
– mesh_sched_get_eligible_thread_by_name
– mesh_scheduler_rr
– mesh_shared_scheduler_default

• mesh_trace.h
– mesh_enable_trace_collection
– mesh_ignore_trace_collection
– mesh_trace_init
– mesh_trace_print

• mesh_energy.h
– mesh_add_energy_state
– mesh_add_energy_state_feature
– mesh_check_current_energy_state
– mesh_check_target_energy_state
– mesh_create_energy
– mesh_create_energy_resource
– mesh_find_energy_state_by_name
– mesh_find_energy_state_feature
– mesh_get_current_energy_state
– mesh_get_energy
– mesh_get_energy_state_list
– mesh_get_target_energy_state
– mesh_print_energy_statistics
– mesh_set_current_energy_state
– mesh_set_energy
– mesh_set_target_energy_state
– mesh_update_energy_state_utilization
– mesh_update_energy_state_utilization_consume

• mesh_def_energy_resources.h
– mesh_energy_resource_default
– mesh_energy_resource_default_power

• mesh_def_energy_schedulers.h
– mesh_energy_sched_find_idle_resources
– mesh_energy_scheduler_rr

4.1 mesh_kernel.h

g_slist_split
void g_slist_split(GSList **src,
                   gint (*h)(void *data, void *arg),
                   void *arg,
                   int N,
                   GSList *dest[]);

src    Address of the list to split.
h      Partitioning function. Takes a list item and a user-supplied argument; returns the index of the sublist the item belongs in (0 ... N - 1).
arg    User data argument to pass to h.
N      Number of sublists.
dest   Array in which to place the N destination lists.

description: Splits a list into N sublists using the supplied partitioning function h, storing the resulting sublists in the destination array. The source list is destroyed in the process and set to NULL.

return value: none
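To make the documented semantics concrete (each item routed to sublist h(item, arg), source list consumed and set to NULL), here is a standalone sketch over a minimal linked list instead of GLib's GSList, so it compiles without GLib. node, list_split, list_reverse, and by_modulo are hypothetical names, and the sketch preserves the relative order of items within each sublist, which the reference above does not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal singly linked list standing in for GLib's GSList. */
typedef struct node {
    void *data;
    struct node *next;
} node;

/* Reverse a list in place and return its new head. */
static node *list_reverse(node *l) {
    node *prev = NULL;
    while (l != NULL) {
        node *next = l->next;
        l->next = prev;
        prev = l;
        l = next;
    }
    return prev;
}

/* Move every node of *src into dest[h(data, arg)], then set *src to NULL.
   Nodes are relinked, not copied. */
static void list_split(node **src, int (*h)(void *data, void *arg), void *arg,
                       int n, node *dest[]) {
    assert(src != NULL && h != NULL && n > 0);
    for (int i = 0; i < n; i++)
        dest[i] = NULL;
    node *l = *src;
    while (l != NULL) {
        node *next = l->next;
        int idx = h(l->data, arg);
        assert(idx >= 0 && idx < n);      /* h must return 0 .. n-1 */
        l->next = dest[idx];              /* prepend in O(1) */
        dest[idx] = l;
        l = next;
    }
    for (int i = 0; i < n; i++)
        dest[i] = list_reverse(dest[i]);  /* undo the prepending */
    *src = NULL;                          /* source list no longer exists */
}

/* Example partitioner: item goes to sublist (value % m), with m passed
   through arg. Assumes non-negative int items. */
static int by_modulo(void *data, void *arg) {
    return *(int *)data % *(int *)arg;
}
```

Splitting the items 0, 1, 2, 3, 4 with by_modulo and N = 2 gives the sublists 0, 2, 4 and 1, 3, and leaves the source pointer NULL.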

g_slist_vsplit
void g_slist_vsplit(GSList **src,
                    gint (*h)(void *data, void *arg),
                    void *arg,
                    int N,
                    ...);

src    Address of the list to split.
h      Partitioning function. Takes a list item and a user-supplied argument; returns the index of the sublist the item belongs in (0 ... N - 1).
arg    User data argument to pass to h.
N      Number of sublists.
...    N arguments of type GSList **, one for each destination list.

description: Splits a list into N sublists using the supplied partitioning function h, storing the resulting sublists in the destination varargs. The source list is destroyed in the process and set to NULL.

return value: none


mesh_cleanup

void mesh_cleanup();

description: Frees all remaining data structures associated with MESH. Useful after executing mesh_kernel().

mesh_init(0);
...
mesh_kernel();
mesh_cleanup();

return value: none
see also: mesh_kernel(), mesh_init()

mesh_create_feature_list

mesh_feature_list* mesh_create_feature_list();

description: Creates an empty feature list to be used when creating new resources.

return value: Pointer to the new feature list
see also: mesh_create_scheduler(), mesh_create_thread()

mesh_create_interrupt

void mesh_create_interrupt(mesh_resource *cr,
                           gchar *name,
                           gboolean maskable,
                           void *(*isr)(void *),
                           void *arg);

cr: Resource to create the interrupt on.
name: Interrupt name.
maskable: Maskable status of the interrupt (not yet supported).
isr: Interrupt service routine. Supply NULL if none is required.
arg: Argument passed to the ISR.

description: Creates an interrupt on a resource. The resource must already have an interrupt controller.

return value: none
see also: mesh_set_interrupt_controller(), mesh_raise_interrupt()


mesh_create_resource

mesh_resource* mesh_create_resource(gchar* name,
                                    mesh_feature_list* cfl,
                                    mesh_scheduler* scheduler,
                                    (*timing_resolution)(mesh_resource *, GSList *),
                                    double default_power);

name: Name of the resource.
cfl: List of consume features that the resource accepts.
scheduler: Scheduler tied to the resource.
timing_resolution: Function pointer to the function that implements the resolution of consume features onto physical timing for this particular resource. It receives a pointer to the resource executing the consume call and a pointer to the list of consume call feature arrays.
default_power: Computational power associated with the default feature.

description: Creates a resource with a given name, a list of accepted features for multi-dimensional consumes, a controlling scheduler, and a function that implements the resource behavior.

mesh_scheduler *cs;
mesh_resource *cr;
cs = mesh_create_scheduler("test scheduler", mesh_scheduler_rr);
cr = mesh_create_resource("test resource", NULL, cs, mesh_resource_default, 1.0);

return value: Pointer to the new resource
see also: mesh_create_scheduler(), mesh_create_thread()

mesh_create_scheduler

mesh_scheduler* mesh_create_scheduler(gchar* name,
                                      GSList* (*exec_scheduler)(mesh_scheduler *));

name: Name of the scheduler.
exec_scheduler: Function pointer to the function that implements the scheduler functionality. Accepts a pointer to the scheduler and returns a list of thread/resource pairs to be run.

description: Creates an execution scheduler and sets the scheduler functionality.

mesh_scheduler *cs;
cs = mesh_create_scheduler("test scheduler", mesh_scheduler_rr);

return value: Pointer to the new scheduler
see also: mesh_create_shared_scheduler()
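An execution scheduler function returns thread/resource pairs to run. The sketch below shows one possible round-robin pairing policy in plain C. It is not mesh_scheduler_rr: the thread_t, resource_t and pairing_t types and the rr_schedule() function are hypothetical stand-ins, and arrays replace the GSList that a real scheduler returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real mesh_thread and mesh_resource types
   are not shown in this manual. */
typedef struct { const char *name; } thread_t;
typedef struct { const char *name; int idle; } resource_t;
typedef struct { thread_t *thread; resource_t *resource; } pairing_t;

/* Round-robin pairing: walk the idle resources in order and give each
   one the next eligible thread, continuing from the thread after the one
   served last (*cursor). out must have room for min(n_ready, n_res)
   pairs. Returns the number of pairs written. */
size_t rr_schedule(thread_t **ready, size_t n_ready,
                   resource_t *res, size_t n_res,
                   size_t *cursor, pairing_t *out)
{
    size_t n_out = 0;
    if (n_ready == 0)
        return 0;
    for (size_t r = 0; r < n_res && n_out < n_ready; r++) {
        if (!res[r].idle)
            continue;
        out[n_out].thread = ready[*cursor % n_ready];
        out[n_out].resource = &res[r];
        res[r].idle = 0;                   /* resource is now busy */
        *cursor = (*cursor + 1) % n_ready; /* next thread in rotation */
        n_out++;
    }
    return n_out;
}
```

Because the cursor persists across calls, a thread left waiting in one scheduling round is served first in the next round.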


mesh_create_shared_resource

mesh_resource* mesh_create_shared_resource(gchar* name,
                                           GSList feature_list,
                                           mesh_scheduler* scheduler,
                                           double (*timing_resolution)(mesh_resource *, double *),
                                           void (*contention)(mesh_resource *, GSList *, mesh_time),
                                           double default_power);

name: Name of the resource.
feature_list: List of consume features that the resource accepts.
scheduler: Scheduler tied to the resource.
timing_resolution: Function pointer to the function that implements the resolution of consume features onto physical timing for this particular resource. It receives a pointer to the resource executing the consume call, a pointer to the consume data structure, and the current simulation time.
contention: Function that determines how much of a penalty should be applied, given the physical usage times of the accesses and the timeslice duration.
default_power: Computational power associated with the default feature.

description: Creates a shared resource with a given name, a list of accepted features for multi-dimensional consumes, a controlling scheduler, and a function that implements the resource behavior. In addition, shared resources require a contention resolution function.

mesh_scheduler *ss;
mesh_resource *cr;
ss = mesh_create_shared_scheduler("test shared scheduler", mesh_shared_scheduler_default);
cr = mesh_create_shared_resource("test resource", NULL, ss, mesh_resource_default, mesh_contention_resolution_default, 1.0);

return value: Pointer to the new resource
see also: mesh_create_scheduler(), mesh_create_thread()
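The contention function decides how much penalty to apply when several accesses share a timeslice. As an illustration only, the sketch below assumes a simple proportional policy; it is not mesh_contention_resolution_default, whose behavior this manual does not describe. If the requested usage times exceed the timeslice, each access is delayed by its share of the overflow. The function name and the array-based interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical proportional contention policy: if the accesses in one
   timeslice together need more physical time than the slice provides,
   each access is delayed by its share of the overflow, weighted by how
   much time it requested. */
void contention_proportional(const double *usage, size_t n,
                             double timeslice, double *penalty)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += usage[i];
    double overflow = total - timeslice;
    for (size_t i = 0; i < n; i++)
        penalty[i] = (overflow > 0.0 && total > 0.0)
                         ? overflow * usage[i] / total
                         : 0.0;      /* no contention: no penalty */
}
```

For usage times 3 and 1 in a timeslice of 2, the overflow of 2 is split into penalties of 1.5 and 0.5; with a timeslice of 10 neither access is penalized.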

mesh_create_shared_scheduler

mesh_scheduler* mesh_create_shared_scheduler(gchar* name,
                                             void (*exec_shared)(GSList *));

name: Name of the scheduler.
exec_shared: Function pointer to the function that implements the shared scheduler functionality. Accepts a list of shared resource accesses during the current time period.

description: Creates a shared resource scheduler and sets the scheduler functionality.

return value: Pointer to the new scheduler
see also: mesh_create_scheduler()


mesh_create_thread

mesh_thread* mesh_create_thread(gchar* name,
                                mesh_scheduler *exec_scheduler,
                                void (*thread)(),
                                void *arg);

name: Name of the thread.
exec_scheduler: Execution scheduler that handles the consume calls of this thread.
thread: Function pointer to the function that implements the thread functionality.
arg: Argument to pass to the new thread.

description: Creates an execution thread and starts it immediately.

return value: Pointer to the new thread
see also: mesh_define_thread(), mesh_start_thread()

mesh_create_thread_delayed

mesh_thread* mesh_create_thread_delayed(gchar* name,

description: Creates an execution thread. The creation of the execution thread is delayed until the end of the next consume call.

return value: Pointer to the new thread
see also: mesh_define_thread(), mesh_start_thread_delayed()

mesh_current_thread

mesh_thread *mesh_current_thread();

description: Retrieves the mesh thread associated with the current context.

return value: Current mesh thread
see also: mesh_kernel_thread()


mesh_define_thread

mesh_thread* mesh_define_thread(gchar* name,

description: Creates an execution thread but does not make it eligible for execution. To start the thread, use mesh_start_thread() or mesh_start_thread_delayed().

return value: Pointer to the new thread
see also: mesh_start_thread(), mesh_start_thread_delayed()

mesh_execute_at_commit

void mesh_execute_at_commit(void (*function)(mesh_consume_data *));

function: Function pointer to the function to be executed after each consume block is committed.

description: Allows the user to add functionality to be executed after each consume block is committed. This is a powerful extension point: it lets the user extend the MESH simulator with additional functionality at every simulation time step. The added function has access to the mesh_consume_data structure, which contains all the information about the consume block currently being committed. mesh_execute_at_commit() can be called multiple times to append more than one function to be executed.
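The sketch below shows the general pattern behind a commit hook that can be registered more than once: functions are appended to a list, and all of them run, in registration order, whenever a consume block commits. The consume_data type, register_commit_hook() and run_commit_hooks() are hypothetical names used only for illustration; they are not part of the MESH API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record standing in for mesh_consume_data. */
typedef struct { double start, end; } consume_data;

typedef void (*commit_hook)(consume_data *);

#define MAX_HOOKS 8
static commit_hook hooks[MAX_HOOKS];
static size_t n_hooks = 0;

/* Append a hook; registering the same kind of call twice simply adds
   another entry, so every registered hook runs at each commit. */
void register_commit_hook(commit_hook f)
{
    assert(n_hooks < MAX_HOOKS);
    hooks[n_hooks++] = f;
}

/* Called by the (hypothetical) kernel after each consume block commits. */
void run_commit_hooks(consume_data *cd)
{
    for (size_t i = 0; i < n_hooks; i++)
        hooks[i](cd);
}

/* Example hook: accumulate total committed busy time. */
static double busy_time = 0.0;
void accumulate_busy(consume_data *cd) { busy_time += cd->end - cd->start; }
```

Registering accumulate_busy twice and committing one block from time 1 to time 3 adds the 2-unit duration twice, which makes the append semantics visible.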

mesh_execute_at_finish

void mesh_execute_at_finish(void (*function)(mesh_resource *, mesh_thread *));

function: Function pointer to the function to be executed when a thread finishes.

description: Allows the user to add functionality to be executed when a thread finishes its execution (there are no more executable statements in the thread). The added function has access to the thread that finished and the resource it executed on. mesh_execute_at_finish() can be called multiple times with different function pointers.

mesh_execute_at_squash

void mesh_execute_at_squash(void (*function)(mesh_resource *, mesh_thread *));

function: Function pointer to the function to be executed when a thread is squashed.

description: Allows the user to add functionality to be executed when the simulator squashes a thread (the thread is automatically restarted). The added function has access to the thread that was squashed and the resource it executed on. mesh_execute_at_squash() can be called multiple times with different function pointers.


mesh_execute_at_stall

void mesh_execute_at_stall(void (*function)(mesh_resource *, mesh_thread *));

function: Function pointer to the function to be executed when a thread stalls.

description: Allows the user to add functionality to be executed when a thread stalls in its execution, e.g. due to synchronization. The added function has access to the thread that stalled and the resource it executed on. mesh_execute_at_stall() can be called multiple times with different function pointers.

mesh_feature_add

mesh_feature_list * mesh_feature_add(mesh_feature_list *cfl,
                                     gchar *feature,
                                     double power);

cfl: Feature list to add to.
feature: Name of the feature to add.
power: Computational power to associate with this feature.

description: Adds a named feature to the given feature list. A computational power is associated with the feature when it is added. This feature list should be used to describe a physical resource.

return value: New feature list with the feature added
see also: mesh_create_feature_list(), mesh_create_resource()

mesh_get_feature_index_by_name

gint mesh_get_feature_index_by_name(gchar* name);

name: Name of the feature.

description: Searches the global feature list and returns the index into the list that corresponds to the feature name passed as an argument.

return value: Integer index into the global feature list

mesh_get_scheduler_by_name

mesh_scheduler* mesh_get_scheduler_by_name(gchar* name);

name: Name of the scheduler.

description: Finds the scheduler by its name and returns a pointer to it.

return value: Pointer to the scheduler, or NULL if not found
see also: mesh_get_thread_by_name()


mesh_get_thread_by_name

mesh_thread* mesh_get_thread_by_name(gchar* name);

name: Name of the thread.

description: Finds the thread by its name and returns a pointer to it.

return value: Pointer to the thread, or NULL if not found
see also: mesh_get_scheduler_by_name()

mesh_getTime

mesh_time mesh_getTime();

description: Returns the simulation time at the beginning of the currently executing consume block.

return value: Begin time of the currently executing consume block

mesh_init

void mesh_init(gint verbose);

verbose: Verbosity of the simulator; 0 means no extra messages, 1 displays large volumes of information.

description: Initializes data structures for the MESH simulation. This function must be called before any other MESH function.

see also: mesh_kernel(), mesh_cleanup()

mesh_kernel

void mesh_kernel();

description: Executes the MESH simulation. This function does not return until all threads of execution are done or the simulation deadlocks.

see also: mesh_init(), mesh_cleanup()

mesh_kernel_thread

mesh_thread *mesh_kernel_thread();

description: Retrieves the mesh thread associated with the MESH kernel.

return value: The kernel thread
see also: mesh_current_thread()

mesh_label_watch

void mesh_label_watch(gchar *name);

name: Name of the consume annotation label to add to the watch list.

description: Adds a consume label to a list to watch during the simulation. When a consume call with a label on the watch list is encountered, the timestamp and execution resource are printed either to a text file or to STDOUT. The text file name can be specified with mesh_set_label_watch_output().

see also: mesh_set_label_watch_output()


mesh_print_blocked_threads

void mesh_print_blocked_threads(gboolean flag);

flag: TRUE or FALSE.

precondition: Must be called before mesh_kernel().

description: Sets whether a list of blocked threads is printed at the end of the simulation. This function is useful for finding deadlocked threads. It is set TRUE by default.

see also: mesh_print_not_completed_threads()

mesh_print_not_completed_threads

void mesh_print_not_completed_threads(gboolean flag);

flag: TRUE or FALSE.

precondition: Must be called before mesh_kernel().

description: Sets whether a list of threads that have not completed execution by the end of the simulation is printed. A not-completed thread is a thread that never returns from the function pointer provided to it in mesh_create_thread(). This function is useful to identify whether the simulation has ended sooner than expected (perhaps as a result of a deadlock). It is set FALSE by default.

see also: mesh_print_blocked_threads()

mesh_raise_interrupt

void mesh_raise_interrupt(mesh_resource *cr,
                          gchar *name);

cr: Resource to raise the interrupt on.
name: Interrupt to raise.

description: Raises an interrupt on a resource. It fires as soon as the interrupt controller allows it to.

see also: mesh_create_interrupt()

mesh_scheduler_is_eligible

void mesh_scheduler_is_eligible(mesh_scheduler *cs);

cs: Scheduler to run.

description: Ensures that a scheduler is on the kernel "todo list." It will be called as long as there is at least one idle resource and one eligible thread available.


mesh_set_error

void mesh_set_error(mesh_time error);

error: Maximum distance between consume events to be combined.

description: Sets the error term used when creating timeslices in shared resource simulations. It determines how close consume call end-times can be and still be considered part of one timeslice.
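A minimal sketch of how an error term can group consume end-times into timeslices. It assumes, since this manual does not specify it, that end-times are sorted and that a new timeslice starts whenever the gap to the previous end-time exceeds the error term; group_timeslices() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort consume end-times in place, then start a new timeslice whenever
   the gap to the previous end-time exceeds err. slice_of[i] receives the
   slice index of the i-th sorted end-time. Returns the number of slices. */
size_t group_timeslices(double *end_times, size_t n, double err,
                        size_t *slice_of)
{
    if (n == 0)
        return 0;
    qsort(end_times, n, sizeof end_times[0], cmp_double);
    size_t slice = 0;
    slice_of[0] = 0;
    for (size_t i = 1; i < n; i++) {
        if (end_times[i] - end_times[i - 1] > err)
            slice++;                    /* too far apart: new timeslice */
        slice_of[i] = slice;
    }
    return slice + 1;
}
```

With an error term of 0.5, the end-times 10.0, 1.0, 1.5, 10.2 and 5.0 form three timeslices: {1.0, 1.5}, {5.0} and {10.0, 10.2}.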

mesh_set_interrupt_controller

void mesh_set_interrupt_controller(mesh_resource *cr,
                                   char *(*controller)(void *arg, GSList *pending),
                                   void *arg);

cr: Resource to set the interrupt controller on.
controller: Interrupt controller to use.
arg: User data argument to pass to the controller.

description: Sets an interrupt controller on a resource, allowing the creation and raising of interrupts. By default a resource has no interrupt controller (also achievable by passing a NULL controller argument) and will abort the simulation if interrupts are used.

see also: mesh_default_interrupt_controller()
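The meaning of the controller's return value is not documented here. The sketch below assumes it is the name of the pending interrupt that should fire next, or NULL if none should, and implements a fixed-priority policy. The priority_table type and priority_controller() are hypothetical, and a plain array stands in for the GSList of pending interrupts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical priority table passed to the controller through arg. */
typedef struct {
    const char **names;   /* interrupt names, highest priority first */
    size_t count;
} priority_table;

/* Fixed-priority policy: return the highest-priority pending interrupt,
   or NULL if nothing is pending. */
const char *priority_controller(void *arg, const char **pending,
                                size_t n_pending)
{
    const priority_table *tbl = arg;
    for (size_t p = 0; p < tbl->count; p++)
        for (size_t i = 0; i < n_pending; i++)
            if (strcmp(tbl->names[p], pending[i]) == 0)
                return pending[i];
    return NULL;
}
```

With the priority order timer, dma, uart and both uart and dma pending, this policy selects dma.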

mesh_set_label_watch_output

void mesh_set_label_watch_output(gchar *filename);

filename: Name of the file to which label watch output is redirected.

description: Sets the file to which label watch information is written. If this filename is left as NULL, or the function is not called at all, label watch information is printed to STDOUT.

see also: mesh_label_watch()

mesh_squash_thread

void mesh_squash_thread(mesh_thread *ct);

ct: Thread to squash.

description: Immediately aborts and restarts a thread. All threads this one is joined on are also squashed (i.e., comm call threads). Threads joined on this one remain joined on the restarted thread.

see also: mesh_execute_at_squash()


mesh_start_thread

void mesh_start_thread(mesh_thread *thread);

thread: Thread to make eligible for execution.

precondition: Thread must be defined using mesh_define_thread().

description: Once a thread is defined via mesh_define_thread(), this function makes the thread eligible for execution immediately.

see also: mesh_define_thread(), mesh_start_thread_delayed()

mesh_start_thread_delayed

void mesh_start_thread_delayed(mesh_thread *thread);

thread: Thread to make eligible for execution.

precondition: Thread must be defined using mesh_define_thread(). Must only be used within an execution or testbench thread (i.e., any thread where it is legal to use mesh_consume_str() or mesh_wait_for()).

description: Once a thread is defined via mesh_define_thread(), this function makes the thread eligible for execution after the consume call is completed.

see also: mesh_define_thread(), mesh_start_thread()


4.2 mesh_syscalls.h

mesh_consume

void mesh_consume(double cost,
                  int num_mem_pairs,
                  ...);

cost: Complexity supplied to the default feature of the execution scheduler.
num_mem_pairs: (not yet documented)
...: (not yet documented)

description: Shortcut method for the common case of applying complexity to the default feature of the execution scheduler.

see also: mesh_consume_str()


mesh_consume_str

void mesh_consume_str(gchar *str, ...);

str: printf-style format string.

precondition: Must be run inside a regular functionality thread. Cannot run inside a testbench thread.

description: Annotates the complexity of software regions. Insert calls to this function throughout your application code to allow the simulation framework to determine the physical timing of the code. The string passed to this function specifies which schedulers and which features the complexity data is passed to.

In its most basic form the string can be a single number, e.g. "10.1". This denotes a computational complexity of 10.1 directed to the execution scheduler and the default feature. Furthermore, alternate schedulers can be specified by listing their name and consumed complexity inside curly braces.

For instance, if some shared resource access also occurred in the example above, the string could be changed to "10.1:sched1{20}" to denote 10.1 for the execution scheduler and 20 for the shared scheduler sched1.

Each of these schedulers may also be passed one or more features. This looks like: "10.1:feat1=5:sched1{20:shared_feat1=10}".

Typically, if two resources do not have the same set of features, any complexity values for features not present on a resource are ignored. When this is not the desired behavior, brackets may be used to distinguish between the different ways the complexity may be executed on resources with different sets of features. For example, "[10.1:add=1:mul=1]:[10.1:mac=1]" provides the timing resolution function with two feature lists to select from: one that uses the add and mul features, and one that uses the mac feature.
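How a timing resolution function chooses among bracketed alternatives is up to the resource. The sketch below assumes one plausible policy: an alternative is usable only if the resource has every feature it names, its time is the sum of complexity divided by feature power, and the fastest usable alternative wins. The feature and demand types and pick_alternative() are hypothetical, and "default" is treated as an ordinary feature name for simplicity.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; double power; } feature;      /* resource side */
typedef struct { const char *name; double complexity; } demand;  /* consume side */

/* Power of a named feature on the resource, or 0 if it is absent. */
static double power_of(const feature *fs, size_t nf, const char *name)
{
    for (size_t i = 0; i < nf; i++)
        if (strcmp(fs[i].name, name) == 0)
            return fs[i].power;
    return 0.0;
}

/* Time for one alternative: sum of complexity / power, or INFINITY if
   the resource lacks any feature the alternative needs. */
static double alt_time(const feature *fs, size_t nf,
                       const demand *d, size_t nd)
{
    double t = 0.0;
    for (size_t i = 0; i < nd; i++) {
        double p = power_of(fs, nf, d[i].name);
        if (p <= 0.0)
            return INFINITY;
        t += d[i].complexity / p;
    }
    return t;
}

/* Pick the fastest usable alternative; returns its index, or -1. */
int pick_alternative(const feature *fs, size_t nf,
                     const demand *const *alts, const size_t *alt_len,
                     size_t n_alts, double *best_time)
{
    int best = -1;
    *best_time = INFINITY;
    for (size_t a = 0; a < n_alts; a++) {
        double t = alt_time(fs, nf, alts[a], alt_len[a]);
        if (t < *best_time) {
            *best_time = t;
            best = (int)a;
        }
    }
    return best;
}
```

On a resource offering only default and mac features, the [10.1:add=1:mul=1] alternative is unusable under this policy, so the [10.1:mac=1] alternative is chosen.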

Since the string is printf-style, these numbers can be inserted programmatically if necessary. Often, this function is wrapped in a macro on a per-file basis. That way, the macro needs to accept only a single number, yet can substitute it into part of a larger string describing what is being consumed.
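A sketch of the per-file macro pattern. To keep the example self-contained, consume_str_stub() is a hypothetical stand-in for mesh_consume_str() that only formats the annotation into a buffer; in a MESH model the macro would call mesh_consume_str() directly. The fixed sched1{20} share is illustrative.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for mesh_consume_str(): formats the annotation
   into last_consume so the result can be inspected. */
static char last_consume[128];

void consume_str_stub(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(last_consume, sizeof last_consume, fmt, ap);
    va_end(ap);
}

/* Per-file wrapper: callers pass a single number, and the macro supplies
   the rest of the annotation (here, a fixed access to sched1). */
#define CONSUME(c) consume_str_stub("%g:sched1{20}", (double)(c))
```

A call such as CONSUME(10.1) then expands to the annotation "10.1:sched1{20}" used in the description above.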

It is possible to attach a label to an individual consume call by using the reserved scheduler name "name" and placing a label string inside. Examples:

mesh_consume_str("%i", 10);                 // consume 10 from the execution scheduler
mesh_consume_str("sched3{feat1=10}");       // consume 10 from feat1 in sched3
mesh_consume_str("10:name{consume label}"); // simple consume of 10 with a label

see also: mesh_consume_str_no_sched()


mesh_sched_consume_str_no_sched

void mesh_sched_consume_str_no_sched(gchar *str, ...);

str: printf-style format string.

precondition: Must be run inside a regular functionality thread. Cannot run inside a testbench thread.

description: Same behavior as mesh_consume_str() except that the scheduler is not given an opportunity to run. Therefore, the thread calling mesh_consume_str_no_sched() continues to execute on the same resource. Useful when consume calls are necessary for simulation detail but the thread should not be preempted.

see also: mesh_consume_str()

mesh_sched_consume_str_and_exit

void mesh_sched_consume_str_and_exit(gchar *str, ...);

str: printf-style format string.

description: Same behavior as mesh_consume_str() except that the thread terminates immediately after the consume. Usually, a thread does not terminate until it is woken up again, to ensure that all consume calls have been run; this may or may not happen at the same simulation time as the last consume. By using mesh_consume_str_and_exit(), the designer ensures that thread termination happens immediately at the end of the last consume call. This is very useful if the thread termination is expected to trigger system-wide events.

see also: mesh_consume_str()

mesh_sched_consume_str

void mesh_sched_consume_str(mesh_resource *cr,
                            gchar *str, ...);

cr: Resource to apply the consume call to.
str: printf-style format string.

description: Annotates the complexity of software regions specifically within schedulers, so that the computational complexity of scheduling strategies can be determined and scheduling overhead assigned. Its usage is identical to mesh_consume_str(), except that the resource performing the scheduling is also specified. Should be used only inside schedulers. For an example implementation, see the code of mesh_scheduler_rr_w_overhead.

see also: mesh_consume_str(), mesh_scheduler_rr_w_overhead

mesh_exit

void mesh_exit();

description: Terminates execution of the simulation.

see also: mesh_create_thread()


mesh_free_delayed

void mesh_free_delayed(void *ptr);

ptr: Pointer to free.

description: Frees a pointer after the current consume call is committed. Useful for freeing data accessed by both application threads and schedulers, since pre-emption can cause the last consume call of a thread to commit arbitrarily long after the thread finishes executing, with scheduling occurring arbitrarily many times in between.

mesh_lightweight_autocommit

void mesh_lightweight_autocommit(int mesh_autocommit_mode,
                                 int threshold);

mesh_autocommit_mode: Autocommit mode, either NUMBER_OF_CONSUMES or COMPLEXITY_THRESHOLD.
threshold: Autocommit threshold; its meaning depends on the mode (see below).

description: Causes lightweight consumes to be committed automatically when the consume trace reaches the threshold value, preventing excessive memory use. By default the threshold is set to a value that typically performs well. Use a negative threshold to restore the default, or a zero threshold to disable autocommits completely. When mesh_autocommit_mode is NUMBER_OF_CONSUMES, the threshold controls the number of lightweight consumes included in each committed consume call. In COMPLEXITY_THRESHOLD mode, the consume is finished once a given complexity of the default feature is reached.

see also: mesh_lightweight_consume()
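The threshold rules above (a negative value restores the default, zero disables, a positive value sets the limit) can be sketched for NUMBER_OF_CONSUMES mode as a simple counter. The state structure, the function names and the DEFAULT_THRESHOLD value are hypothetical; the manual does not state the actual default.

```c
#include <assert.h>

/* Hypothetical default; the real value is not given in this manual. */
#define DEFAULT_THRESHOLD 1000

typedef struct {
    int threshold;   /* 0 disables autocommit */
    int pending;     /* lightweight consumes since the last commit */
    int commits;     /* number of autocommits performed */
} autocommit_state;

/* Negative restores the default, zero disables, positive sets it. */
void set_threshold(autocommit_state *s, int threshold)
{
    s->threshold = threshold < 0 ? DEFAULT_THRESHOLD : threshold;
}

/* NUMBER_OF_CONSUMES mode: record one lightweight consume and commit the
   accumulated trace once the threshold is reached. */
void lightweight_consume(autocommit_state *s)
{
    s->pending++;
    if (s->threshold > 0 && s->pending >= s->threshold) {
        s->commits++;        /* stands in for committing the trace */
        s->pending = 0;
    }
}
```

With a threshold of 3, seven lightweight consumes produce two autocommits and leave one consume pending; with a threshold of 0 the pending trace simply keeps growing.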

mesh_lightweight_consume

void mesh_lightweight_consume(double cost,
                              int num_mem_pairs,
                              ...);

cost: Complexity supplied to the default feature of the execution scheduler.
num_mem_pairs: (not yet documented)
...: (not yet documented)

precondition: Must be run inside a regular functionality thread. Cannot run inside a testbench thread.

description: The lightweight version of mesh_consume(). Shortcut method for the common case of requesting utilization of the default feature.

see also: mesh_lightweight_consume_str(), mesh_consume()


mesh_lightweight_consume_str

void mesh_lightweight_consume_str(gchar *str, ...);

str: printf-style format string.

precondition: Must be run inside a regular functionality thread. Cannot run inside a testbench thread.

description: Identical in functionality to mesh_consume_str() but does not return control to the MESH kernel. Lightweight consume calls should be used whenever possible because they drastically reduce simulation overhead without impacting accuracy. Explicit scheduling points can be marked using mesh_consume_str() or mesh_yield(), or inserted automatically at regular intervals by calling mesh_lightweight_autocommit() before the simulation begins.

see also: mesh_consume_str(), mesh_yield(), mesh_lightweight_autocommit()

mesh_memcpy_delayed

void* mesh_memcpy_delayed(void *dest,
                          const void *src,
                          size_t n);

dest: Destination pointer to copy the value to.
src: Source pointer to copy the value from.
n: Number of bytes to copy.

description: Copies memory from the source pointer into the destination pointer. The copy operation is delayed until immediately after the next consume call. This function is useful for implementing double-buffering behavior.

return value: Original value of dest
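A minimal sketch of a delayed copy queue and how it yields double buffering: a reader keeps seeing the old contents of dest until the pending copies are applied at commit. All names are hypothetical, and the sketch assumes src is read when the copy is performed rather than when it is requested; the manual does not say which.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical queue of copies deferred until the current consume call
   commits. */
typedef struct { void *dest; const void *src; size_t n; } pending_copy;

#define MAX_PENDING 16
static pending_copy queue[MAX_PENDING];
static size_t n_queued = 0;

/* Record the copy and return dest, like memcpy(); nothing is copied yet. */
void *memcpy_delayed(void *dest, const void *src, size_t n)
{
    assert(n_queued < MAX_PENDING);
    queue[n_queued++] = (pending_copy){dest, src, n};
    return dest;
}

/* Run by the (hypothetical) kernel right after the consume commits. */
void commit_delayed_copies(void)
{
    for (size_t i = 0; i < n_queued; i++)
        memcpy(queue[i].dest, queue[i].src, queue[i].n);
    n_queued = 0;
}
```

Until commit_delayed_copies() runs, any thread reading dest still sees the previous buffer contents, which is the double-buffering effect described above.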

mesh_resource_create_entry

void mesh_resource_create_entry(mesh_resource *cr,
                                gchar *key,
                                void *value);

cr: Resource to create the entry for.
key: Entry key.
value: Entry starting value.

precondition: The key must not already be in use.

description: Associates a key-value pair with a resource. Useful for storing resource-local data without having to change the resource's type definition.

see also: mesh_resource_get_entry(), mesh_resource_set_entry()
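The key-value entries can be pictured as a small table attached to each object. The sketch below is a hypothetical fixed-size table with create, get and set operations; it reports violations of the preconditions above by returning -1 (create with a key already in use, set on an entry never created). It is not the MESH implementation, which this manual does not describe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size key-value table. */
#define MAX_ENTRIES 8
typedef struct {
    const char *keys[MAX_ENTRIES];
    void *values[MAX_ENTRIES];
    size_t count;
} entry_table;

static int find_key(const entry_table *t, const char *key)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->keys[i], key) == 0)
            return (int)i;
    return -1;
}

/* Returns 0 on success, -1 if the key is in use or the table is full. */
int entry_create(entry_table *t, const char *key, void *value)
{
    if (find_key(t, key) >= 0 || t->count == MAX_ENTRIES)
        return -1;
    t->keys[t->count] = key;
    t->values[t->count] = value;
    t->count++;
    return 0;
}

/* Returns the stored value, or NULL if the entry does not exist. */
void *entry_get(const entry_table *t, const char *key)
{
    int i = find_key(t, key);
    return i >= 0 ? t->values[i] : NULL;
}

/* Returns 0 on success, -1 if the entry was never created. */
int entry_set(entry_table *t, const char *key, void *value)
{
    int i = find_key(t, key);
    if (i < 0)
        return -1;
    t->values[i] = value;
    return 0;
}
```

The same pattern applies to the scheduler and thread entry functions that follow.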


mesh_resource_get_entry

void *mesh_resource_get_entry(mesh_resource *cr, gchar *key);

cr: Resource to get the entry for.
key: Entry's key.

precondition: Entry must have been previously created by calling mesh_resource_create_entry().

description: Retrieves the current value of the specified resource entry.

return value: The entry's value
see also: mesh_resource_create_entry(), mesh_resource_set_entry()

mesh_resource_set_entry

void *mesh_resource_set_entry(mesh_resource *cr, gchar *key, void *value);

cr: Resource to set the entry for.
key: Entry's key.
value: Entry's new value.

precondition: Entry must have been previously created by calling mesh_resource_create_entry().

description: Changes the current value of the specified resource entry.

see also: mesh_resource_create_entry(), mesh_resource_get_entry()

mesh_scheduler_create_entry

void mesh_scheduler_create_entry(mesh_scheduler *cr,
                                 gchar *key,
                                 void *value);

cr: Scheduler to create the entry for.
key: Entry's key.
value: Entry's starting value.

precondition: The key must not already be in use.

description: Associates a key-value pair with a scheduler. Useful for storing scheduler-local data without having to change the scheduler's type definition.

see also: mesh_scheduler_get_entry(), mesh_scheduler_set_entry()


mesh_scheduler_get_entry

void *mesh_scheduler_get_entry(mesh_scheduler *cr, gchar *key);

cr: Scheduler to get the entry for.
key: Entry's key.

precondition: Entry must have been previously created by calling mesh_scheduler_create_entry().

description: Retrieves the current value of the specified scheduler entry.

return value: The entry's value
see also: mesh_scheduler_create_entry(), mesh_scheduler_set_entry()

mesh_scheduler_set_entry

void *mesh_scheduler_set_entry(mesh_scheduler *cr, gchar *key, void *value);

cr: Scheduler to set the entry for.
key: Entry's key.
value: Entry's new value.

precondition: Entry must have been previously created by calling mesh_scheduler_create_entry().

description: Changes the current value of the specified scheduler entry.

see also: mesh_scheduler_create_entry(), mesh_scheduler_get_entry()

mesh_thread_cond_broadcast

gint mesh_thread_cond_broadcast(mesh_thread_cond *cond);

cond: Condition variable to operate on.

description: Restarts all threads waiting on the condition variable cond. Nothing happens if no threads are waiting on cond.

return value: 0 for success, non-zero for error
see also: pthread_cond_broadcast()

mesh_thread_cond_broadcast_delayed

gint mesh_thread_cond_broadcast_delayed(mesh_thread_cond *cond);

cond: Condition variable to operate on.

description: Restarts all threads waiting on the condition variable cond at the end of the current consume region. Nothing happens if no threads are waiting on cond.

return value: 0 for success, non-zero for error
see also: pthread_cond_broadcast()


mesh_thread_cond_destroy

gint mesh_thread_cond_destroy(mesh_thread_cond *cond);

cond: Condition variable to operate on.

description: Destroys the condition variable cond, freeing the resources it holds. No threads may be waiting on the condition on entry.

return value: 0 for success, non-zero for error
see also: pthread_cond_destroy()

mesh_thread_cond_init

gint mesh_thread_cond_init(mesh_thread_cond *cond,
                           void *attr);

cond: Condition variable to create.
attr: Attributes (currently ignored).

description: Initializes the condition variable cond.

return value: 0 for success, non-zero for error
see also: pthread_cond_init()

mesh_thread_cond_signal

gint mesh_thread_cond_signal(mesh_thread_cond *cond);

cond: Condition variable to operate on.

description: Restarts one of the threads that are waiting on the condition variable cond.

return value: 0 for success, non-zero for error
see also: pthread_cond_signal()

mesh_thread_cond_signal_delayed

gint mesh_thread_cond_signal_delayed(mesh_thread_cond *cond);

cond: Condition variable to operate on.

description: Restarts one of the threads that are waiting on the condition variable cond at the end of the current consume region.

return value: 0 for success, non-zero for error
see also: pthread_cond_signal()


mesh_thread_cond_wait

gint mesh_thread_cond_wait(mesh_thread_cond *cond,
                           mesh_thread_mutex *mutex);

cond: Condition variable to operate on.
mutex: Mutex variable to operate on.

description: Atomically unlocks the mutex and waits for the condition variable cond to be signaled. The mutex must be locked by the calling thread on entry to mesh_thread_cond_wait(). Before returning to the calling thread, mesh_thread_cond_wait() re-acquires the mutex.

return value: 0 for success, non-zero for error
see also: pthread_cond_wait()

mesh_thread_cond_timedwait

gint mesh_thread_cond_timedwait(mesh_thread_cond *cond,
                                mesh_thread_mutex *mutex);

cond: Condition variable to operate on.
mutex: Mutex variable to operate on.

description: Not implemented.

see also: pthread_cond_timedwait()

mesh_thread_create_entry

void mesh_thread_create_entry(mesh_thread *cr,
                              gchar *key,
                              void *value);

cr: Thread to create the entry for.
key: Entry's key.
value: Entry's starting value.

precondition: The key must not already be in use.

description: Associates a key-value pair with a thread. Useful for storing thread-local data without having to change the thread's type definition.

see also: mesh_thread_get_entry(), mesh_thread_has_entry(), mesh_thread_set_entry()

mesh_thread_get_and_clear_counter

double mesh_thread_get_and_clear_counter();

description: Returns and clears the thread-specific counter. This can be used to implement coarse-grained consumes.

return value: Current contents of the thread-specific counter
see also: mesh_thread_inc_counter()
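A sketch of the coarse-grained consume pattern: fine-grained code adds small costs to a per-thread counter, and the accumulated total is read and cleared once so that it can be reported in a single consume call. The thread_ctx type and the function names are hypothetical; in MESH the counter is kept per thread by the simulator.

```c
#include <assert.h>

/* Hypothetical per-thread counter context. */
typedef struct { double counter; } thread_ctx;

/* Fine-grained annotation: just accumulate the cost. */
void inc_counter(thread_ctx *t, double val) { t->counter += val; }

/* Coarse-grained step: take the accumulated total and reset to zero. */
double get_and_clear_counter(thread_ctx *t)
{
    double v = t->counter;
    t->counter = 0.0;
    return v;
}
```

Four increments of 2.5 yield a single total of 10, which would then be reported once instead of as four separate consumes.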


mesh_thread_get_entry

void *mesh_thread_get_entry(mesh_thread *cr, gchar *key);

cr: Thread to get the entry for.
key: Entry's key.

precondition: Entry must have been previously created by calling mesh_thread_create_entry().

description: Retrieves the current value of the specified thread entry.

return value: The entry's value
see also: mesh_thread_create_entry(), mesh_thread_has_entry(), mesh_thread_set_entry()

mesh_thread_has_entry

void *mesh_thread_has_entry(mesh_thread *cr,
                            gchar *key,
                            void **value);

cr: Thread to check for the entry.
key: Entry's key.
value: Pointer to hold the entry (if it exists).

description: Retrieves the current value of the specified thread entry if it exists.

return value: TRUE if the entry exists
see also: mesh_thread_create_entry(), mesh_thread_get_entry(), mesh_thread_set_entry()

mesh_thread_inc_counter

void mesh_thread_inc_counter(double val);

val: Value to add.

description: Increments the thread-specific counter by val.

see also: mesh_thread_get_and_clear_counter()

mesh_thread_join

gint mesh_thread_join(mesh_thread *ct,
                      void **retval);

ct: Thread to wait for.
retval: Return value.

description: Waits until the thread identified by ct terminates. If the thread returns a value, it is stored at the location pointed to by retval.

return value: 0 for success, non-zero for failure
see also: pthread_join()


mesh_thread_sem_destroy

gint mesh_thread_sem_destroy(mesh_thread_sem *sem);

sem: Semaphore to destroy.

description: Destroys the semaphore specified by sem.

return value: 0 for success, non-zero for failure
see also: sem_destroy()

mesh_thread_sem_getvalue

gint mesh_thread_sem_getvalue(mesh_thread_sem *sem);

sem: Semaphore to get the value of.

description: Gets the value of the semaphore specified by sem.

return value: 0 for success, non-zero for failure
see also: sem_getvalue()

mesh_thread_sem_init

gint mesh_thread_sem_init(mesh_thread_sem *sem,
                          int pshared,
                          unsigned int value);

sem: Semaphore to create.
pshared: Whether the semaphore is shared across processes.
value: Value to initialize the semaphore to; should be nonnegative.

description: Initializes the semaphore pointed to by sem to the value in value. pshared must be 0.

return value: 0 for success, non-zero for failure
see also: sem_init()

mesh_thread_sem_wait

gint mesh_thread_sem_wait(mesh_thread_sem *sem);

sem: Semaphore to decrement.

description: Blocks until the semaphore value is greater than zero, then decrements it.

return value: 0 for success, non-zero for failure
see also: sem_wait()

mesh_thread_sem_trywait

gint mesh_thread_sem_trywait(mesh_thread_sem *sem);

sem: Semaphore to decrement.

description: Attempts to decrement the semaphore without blocking. Returns -1 if the semaphore value is not greater than zero; returns 0 if the value is greater than zero and the decrement succeeds.

return value: 0 for success, non-zero for failure
see also: sem_trywait()
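The non-blocking decrement described above, sketched on a plain integer counter. counting_sem and sem_trywait_sketch() are hypothetical names; a real semaphore would also need to interact with the scheduler, which this sketch omits.

```c
#include <assert.h>

/* Hypothetical counting semaphore: just the value. */
typedef struct { int value; } counting_sem;

/* Non-blocking decrement: succeed (0) only when the value is positive,
   otherwise fail immediately (-1) and leave the value unchanged. */
int sem_trywait_sketch(counting_sem *s)
{
    if (s->value <= 0)
        return -1;
    s->value--;
    return 0;
}
```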


mesh_thread_sem_post

gint mesh_thread_sem_post(mesh_thread_sem *sem);

sem: Semaphore to increment.

description: Increments the semaphore value. This should always succeed.

return value: 0 for success, non-zero for failure
see also: sem_post()

mesh_thread_sem_post_delayed
gint mesh_thread_sem_post_delayed(mesh_thread_sem *sem)

sem: Semaphore to increment

description: Increment the semaphore value after the next consume call. This uses an implicit increment-memcpy operation in the kernel_perform_delayed_actions() function, which is activated when the src and dest of the memcpy are equal. Since memcpy() does not support equal src and dest addresses, this special case was used to minimize changes in the code.

return value: 0 for success, non-zero for failure

see also: sem_post()
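One way to picture mesh_thread_sem_post_delayed() is that the increment is recorded immediately but becomes visible only once the next consume call has been applied. The sketch below is a hypothetical model, not the kernel's memcpy-based mechanism: simulated time is a plain counter, and delayed_sem, post_delayed() and consume() are invented names.

```c
#include <assert.h>

typedef unsigned long sim_time;

/* Hypothetical model: a semaphore plus a count of deferred posts. */
typedef struct {
    unsigned int value;          /* value other threads can observe */
    unsigned int pending_posts;  /* increments waiting for the next consume */
} delayed_sem;

/* Record the post now; it is not visible until consume() runs. */
void post_delayed(delayed_sem *s)
{
    s->pending_posts++;
}

/* Advance simulated time by `cycles`, then commit the deferred posts,
   mirroring "after the next consume call". */
sim_time consume(sim_time now, sim_time cycles, delayed_sem *s)
{
    now += cycles;
    s->value += s->pending_posts;
    s->pending_posts = 0;
    return now;
}
```

In this model a post issued at time 100 followed by a 50-cycle consume leaves the observable value at 0 until time 150, when it becomes 1.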

mesh_thread_mutex_destroy
gint mesh_thread_mutex_destroy(mesh_thread_mutex *mutex)

mutex: Mutex to destroy

description: Destroys the mutex specified by mutex. The mutex must be unlocked on entry. This frees all resources associated with the mutex.

return value: 0 for success, non-zero for failure

see also: pthread_mutex_destroy()

mesh_thread_mutex_init
gint mesh_thread_mutex_init(mesh_thread_mutex *mutex, void *attr)

mutex: Mutex to create
attr: Attributes, not used

description: Initializes the mutex pointed to by mutex.

return value: 0 for success, non-zero for failure

see also: pthread_mutex_init()


mesh_thread_mutex_lock
gint mesh_thread_mutex_lock(mesh_thread_mutex *mutex)

mutex: Mutex to lock

description: Locks the given mutex. If the mutex is currently unlocked, mesh_thread_mutex_lock() locks it and returns immediately. Otherwise it blocks until the mutex is released.

return value: 0 for success, non-zero for failure

see also: pthread_mutex_lock()

mesh_thread_mutex_trylock
gint mesh_thread_mutex_trylock(mesh_thread_mutex *mutex)

mutex: Mutex to attempt to lock

description: Behaves identically to mesh_thread_mutex_lock() except that it does not block if the mutex is already locked. Instead, it returns immediately with an error code.

return value: 0 for success, non-zero for failure

see also: pthread_mutex_trylock(), mesh_thread_mutex_lock()
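The lock and trylock calls differ only in what happens when the mutex is already held. The hypothetical single-threaded model below (mutex_model and its functions are invented for illustration, and thread IDs are plain integers) shows the trylock path. Rejecting an unlock by a non-owner is a choice of this sketch, not documented MESH behavior.

```c
#include <assert.h>

/* Hypothetical model of a mutex for illustration; -1 means unlocked. */
typedef struct {
    int owner;
} mutex_model;

void mutex_model_init(mutex_model *m)
{
    m->owner = -1;
}

/* Like the trylock call: never blocks, non-zero if already held. */
int mutex_model_trylock(mutex_model *m, int thread_id)
{
    if (m->owner != -1)
        return 1;
    m->owner = thread_id;
    return 0;
}

/* Release; a real unlock would also resume threads blocked on the mutex.
   Refusing an unlock by a non-owner is a choice of this sketch. */
int mutex_model_unlock(mutex_model *m, int thread_id)
{
    if (m->owner != thread_id)
        return 1;
    m->owner = -1;
    return 0;
}
```

mesh_thread_mutex_lock() reaches the same failing branch but, instead of returning an error code, blocks the calling logical thread until the mutex is released.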

mesh_thread_mutex_unlock
gint mesh_thread_mutex_unlock(mesh_thread_mutex *mutex)

mutex: Mutex to unlock

description: Unlocks the given mutex. Resumes any other logical threads which are blocked on this mutex.

return value: 0 for success, non-zero for failure

see also: pthread_mutex_unlock(), mesh_thread_mutex_lock()

mesh_thread_mutex_unlock_delayed
gint mesh_thread_mutex_unlock_delayed(mesh_thread_mutex *mutex)

mutex: Mutex to unlock

description: Unlocks the given mutex at the end of this consume region. Resumes any other logical threads which are blocked on this mutex.

return value: 0 for success, non-zero for failure

see also: pthread_mutex_unlock(), mesh_thread_mutex_lock()

mesh_thread_set_entry
void *mesh_thread_set_entry(mesh_thread *cr, gchar *key, void *value);

cr: Thread to set the entry for
key: Entry’s key
value: Entry’s new value

precondition: The entry must have been previously created by calling mesh_thread_create_entry().

description: Changes the current value of the specified thread entry.

see also: mesh_thread_create_entry(), mesh_thread_get_entry()

mesh_yield
void mesh_yield();

description: Immediately returns control to the scheduler, committing any outstanding lightweight consumes in the process. Useful for forcing the scheduler to run after creating new threads, etc.

see also: mesh_lightweight_consume()

4.3 mesh_comm.h

mesh_comm_read
void mesh_comm_read(gchar *comm_thread, void* arg, mesh_shared_data *msd, gint blocking_mode);

comm_thread: Name of the communication thread that will model the functionality of this read call
arg: An optional argument to the communication thread
msd: A shared data entity that allows data to be passed through mesh_comm_read or mesh_comm_write calls
blocking_mode: Defines what happens to the calling execution thread while a comm thread is executing. BLOCK_RESOURCE: the resource running the execution thread will not be able to run until the comm thread has completed. BLOCK_THREAD: the execution thread must wait for the comm thread, but the resource does not (i.e. the resource can context switch to another thread while waiting for communication). NO_BLOCKING: the execution thread is allowed to run without having to wait for the comm thread.

precondition: The comm thread must be created via mesh_create_comm_thread().

description: Simulates a read from the communication infrastructure. The behavior of this read is specified within a comm thread, which is identified by name. This API function will create an instance of a comm thread and suspend the calling thread according to the blocking mode specified. Depending on the comm thread specified, additional arguments or shared data might be passed as well.

see also: mesh_create_comm_thread(), mesh_comm_read_delayed(), mesh_comm_write(), mesh_comm_write_delayed()
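The three blocking modes differ in two yes/no properties: whether the calling execution thread may keep running, and whether its resource may run other threads while the comm thread executes. The sketch below encodes that table in C; the enum name and its numeric values are assumptions for illustration and need not match the constants defined by MESH.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum; the numeric values used by MESH may differ. */
typedef enum { BLOCK_RESOURCE, BLOCK_THREAD, NO_BLOCKING } comm_block_mode;

typedef struct {
    bool caller_may_run;   /* may the calling execution thread continue? */
    bool resource_may_run; /* may its resource run other threads meanwhile? */
} comm_wait_policy;

comm_wait_policy policy_for(comm_block_mode mode)
{
    comm_wait_policy p = { true, true };   /* NO_BLOCKING: nobody waits */
    switch (mode) {
    case BLOCK_RESOURCE:   /* resource stalls until the comm thread ends */
        p.caller_may_run = false;
        p.resource_may_run = false;
        break;
    case BLOCK_THREAD:     /* caller waits; resource may context switch */
        p.caller_may_run = false;
        break;
    case NO_BLOCKING:
        break;
    }
    return p;
}
```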

mesh_comm_read_delayed
void mesh_comm_read_delayed(gchar *comm_thread, void* arg, mesh_shared_data *msd, gint blocking_mode);

description: Performs mesh_comm_read() after the delay of the next consume call has been applied. For a description of the arguments, see the entry for mesh_comm_read().

see also: mesh_create_comm_thread(), mesh_comm_read(), mesh_comm_write(), mesh_comm_write_delayed()

mesh_comm_write
void mesh_comm_write(gchar *comm_thread, void* arg, mesh_shared_data *msd, gint blocking_mode);

comm_thread: Name of the communication thread that will model the functionality of this write call
arg: An optional argument to the communication thread
msd: A shared data entity that allows data to be passed through mesh_comm_read or mesh_comm_write calls
blocking_mode: Defines what happens to the calling execution thread while a comm thread is executing. See mesh_comm_read() for more information.

precondition: The comm thread must be created via mesh_create_comm_thread().

description: Simulates a write to the communication infrastructure. The behavior of this write is specified within a comm thread, which is identified by name. This API function will create an instance of a comm thread and suspend the calling thread according to the blocking mode specified. Depending on the comm thread specified, additional arguments or shared data might be passed as well.

see also: mesh_create_comm_thread(), mesh_comm_write_delayed(), mesh_comm_read(), mesh_comm_read_delayed()

mesh_comm_write_delayed
void mesh_comm_write_delayed(gchar *comm_thread, void* arg, mesh_shared_data *msd, gint blocking_mode);

description: Performs mesh_comm_write() after the delay of the next consume call has been applied. For a description of the arguments, see the entry for mesh_comm_write().

see also: mesh_create_comm_thread(), mesh_comm_read(), mesh_comm_write(), mesh_comm_read_delayed()

mesh_create_comm_thread
void mesh_create_comm_thread(gchar* name, mesh_scheduler *ms, void* (*thread)(void*));

name: Name of the thread
ms: Pointer to the mesh_scheduler that will serve as the default scheduler for this thread
(*thread): Function pointer to the function that implements the comm thread functionality

description: Creates a communication thread. Unlike conventional execution threads, creating a communication thread does not automatically start it or make it eligible to run. Instead, the thread is registered with the simulator and is available to be started via mesh_comm_read() and mesh_comm_write() calls.

see also: mesh_comm_read(), mesh_comm_read_delayed(), mesh_comm_write(), mesh_comm_write_delayed()

4.4 mesh_testbench.h

mesh_create_tb_thread
mesh_thread* mesh_create_tb_thread(gchar* name, void (*thread)(), void *arg, gboolean trace_visible);

name: Name of the thread
(*thread): Function pointer to the function that implements the testbench thread functionality
arg: Argument to pass to the testbench thread
trace_visible: Controls whether this thread will be visible in the trace output

description: Creates a testbench thread. Testbench threads do not use conventional consume calls; instead, they model time in pure simulation time using the mesh_tb_wait_for() function. It is not necessary to create a resource or a scheduler for a testbench thread, since it is not part of the modeled system. Several other API functions are available that can run only inside a testbench thread.

return value: Pointer to the new testbench thread

see also: mesh_tb_wait_for()

mesh_tb_wait_for
void mesh_tb_wait_for(mesh_time time);

time: Number of simulation cycles to spend waiting

precondition: Can only be used in a testbench thread.

description: Inserts a specified delay into the runtime of the testbench implementation.

4.5 mesh_utils.h

mesh_depend
void mesh_depend(GSList* in, GSList* out, void* (*func)(void *), void *arg);

in: A glib linked list of all threads that mesh_depend() blocks on until completion
out: A glib linked list of all threads that are dependent on completion of threads in the “in” set
(*func): Function pointer to a user-specified function to be executed once the “in” set threads have completed, but while the “out” threads have not been started yet
arg: Argument to pass to func

precondition: Threads must be defined via mesh_define_thread(), but not yet started via mesh_start_thread().

description: Makes it easy to specify a dependency of one set of threads on another set of threads. The “out” set of threads will not run until all threads in the “in” set have finished. Threads must be defined, but not yet started. A user-defined function can be specified; it is executed after all “in” threads have completed, but before the “out” threads have started. The user should construct the dependency graph by first specifying all dependencies using mesh_depend(), and once all dependencies are specified, start the tasks at the head of the graph via mesh_start_thread().
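The ordering guarantee of mesh_depend() can be modeled as a barrier: the optional function runs exactly once, after every “in” thread has finished and before any “out” thread starts. This is a hypothetical single-dependency sketch that uses arrays of flags instead of GSList and mesh_thread; the dependency type, dependency_update() and the example hook count_calls() are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one mesh_depend() relationship. */
typedef struct {
    const bool *in_done;     /* completion flag of each "in" thread */
    size_t n_in;
    bool *out_started;       /* start flag of each "out" thread */
    size_t n_out;
    void *(*func)(void *);   /* optional hook, may be NULL */
    void *arg;
    bool fired;              /* set once the "out" set has been released */
} dependency;

/* Example hook: counts how many times it is invoked. */
static void *count_calls(void *arg)
{
    (*(int *)arg)++;
    return NULL;
}

/* Call whenever an "in" thread finishes; returns true once "out" starts. */
bool dependency_update(dependency *d)
{
    if (d->fired)
        return true;
    for (size_t i = 0; i < d->n_in; i++)
        if (!d->in_done[i])
            return false;        /* some "in" thread is still running */
    if (d->func != NULL)
        d->func(d->arg);         /* after "in" completes, before "out" starts */
    for (size_t i = 0; i < d->n_out; i++)
        d->out_started[i] = true;
    d->fired = true;
    return true;
}
```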

mesh_get_live_threads
GSList* mesh_get_live_threads(mesh_scheduler *execution_scheduler);

execution_scheduler: A pointer to a mesh_scheduler

description: Examines the bound threads list of the provided mesh_scheduler and extracts all non-testbench threads whose mesh_thread.finished flag is set to false. Note that this flag is not set until the next time the thread is scheduled AFTER the final consume call of the user function. This occurs an arbitrary time after the thread has completed the consume call; the user thread should use mesh_consume_str_no_sched as its final consume call to ensure an accurate completion time.

return value: A glib linked list of all non-testbench threads controlled by the provided scheduler that have not finished
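The filter applied by mesh_get_live_threads() keeps a thread only if it is not a testbench thread and its finished flag is still false. The hypothetical sketch below applies the same test to a plain array; thread_info stands in for the relevant mesh_thread fields and is not the real structure. Because the real flag lags behind the final consume call, a thread can still appear live for a while after its work is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the mesh_thread fields this filter reads. */
typedef struct {
    const char *name;
    bool testbench;
    bool finished;
} thread_info;

/* Copy pointers to live, non-testbench threads into out; return the count.
   out must have room for n entries. */
size_t live_threads(thread_info *bound, size_t n, thread_info **out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (!bound[i].testbench && !bound[i].finished)
            out[count++] = &bound[i];
    return count;
}
```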

mesh_fifo_init
mesh_fifo* mesh_fifo_init(gint size);

size: Number of elements the FIFO can support

description: Initializes a model of a FIFO that is thread safe and can be used to exchange data between multiple communicating processes. Data can be moved to and from the FIFO using the mesh_fifo_insert() and mesh_fifo_remove() calls.

return value: Pointer of type mesh_fifo pointing to the newly created FIFO.

see also: mesh_fifo_insert(), mesh_fifo_remove()

mesh_fifo_insert
mesh_fifo* mesh_fifo_insert(mesh_fifo *fifo, void *data);

fifo: Existing FIFO into which a new piece of data is to be inserted.
data: Data to insert.

precondition: mesh_fifo_init() must be run to create a FIFO.

description: Attempts to insert a data value into the FIFO. If the FIFO is full, this function blocks, waiting until an element is removed from the FIFO.

return value: Pointer to the modified FIFO

see also: mesh_fifo_init()

mesh_fifo_num_elements
int mesh_fifo_num_elements(mesh_fifo *fifo);

fifo: Existing FIFO

precondition: mesh_fifo_init() must be run to create a FIFO.

description: Returns the number of elements currently in the FIFO.

return value: Integer with the number of elements in the FIFO

see also: mesh_fifo_init()

mesh_fifo_remove
mesh_fifo* mesh_fifo_remove(mesh_fifo *fifo, void **data);

fifo: Existing FIFO from which data is to be removed.
data: Extracted data.

precondition: mesh_fifo_init() must be run to create a FIFO.

description: Attempts to remove data from the FIFO. If the FIFO is empty, this function blocks, waiting until an element is added to the FIFO.

return value: Pointer to the modified FIFO

see also: mesh_fifo_init()
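A bounded FIFO of this kind is commonly built as a ring buffer. The hypothetical sketch below, with invented fifo_model names, shows the bookkeeping. Since it has no scheduler to suspend threads, it returns false in the two cases where mesh_fifo_insert() and mesh_fifo_remove() would block (full and empty, respectively).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical bounded FIFO of void* elements (ring buffer). */
typedef struct {
    void **slots;
    int size;    /* capacity, as passed to the init call */
    int head;    /* index of the oldest element */
    int count;   /* number of elements currently stored */
} fifo_model;

fifo_model *fifo_model_init(int size)
{
    fifo_model *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    f->slots = malloc((size_t)size * sizeof *f->slots);
    if (f->slots == NULL) {
        free(f);
        return NULL;
    }
    f->size = size;
    f->head = 0;
    f->count = 0;
    return f;
}

/* Returns false where mesh_fifo_insert() would block (FIFO full). */
bool fifo_model_insert(fifo_model *f, void *data)
{
    if (f->count == f->size)
        return false;
    f->slots[(f->head + f->count) % f->size] = data;
    f->count++;
    return true;
}

/* Returns false where mesh_fifo_remove() would block (FIFO empty). */
bool fifo_model_remove(fifo_model *f, void **data)
{
    if (f->count == 0)
        return false;
    *data = f->slots[f->head];
    f->head = (f->head + 1) % f->size;
    f->count--;
    return true;
}

int fifo_model_num_elements(const fifo_model *f)
{
    return f->count;
}
```

Tracking head and count, rather than separate head and tail indices, keeps the full and empty cases distinguishable without sacrificing a slot.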

4.6 mesh_def_interrupts.h

mesh_default_interrupt_controller
char *mesh_default_interrupt_controller(void *arg, GSList *pending);

arg: User-supplied argument (ignored)
pending: List of pending interrupts supplied by the scheduler

description: A simple interrupt controller implementation that raises the first available interrupt each time it is called.

see also: mesh_set_interrupt_controller()

4.7 mesh_def_resources.h

mesh_resource_default
double mesh_resource_default(mesh_resource *cr, double *feature_array)

cr: Resource structure this function is operating for
feature_array: Simple array containing one double for each feature

description: This routine provides the default resource timing resolution function. It takes the requested power for each feature from the feature array and uses division to find the required physical time duration for each feature. It then computes the total physical duration by summing the durations of all features. It can be passed to mesh_create_resource() as the timing resolution function.

return value: Physical time duration

see also: mesh_create_resource(), mesh_create_shared_resource()
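The default timing resolution described above divides per feature and then sums. The sketch below assumes the divisor is the resource's computational power for that feature, supplied as a parallel array. Both the divisor and the array layout are assumptions of this example rather than the real mesh_resource layout. Features with a zero request are skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timing resolution: per-feature time is the requested amount
   divided by the resource's power for that feature; the total is the sum. */
double timing_resolution_sum(const double *feature_array,
                             const double *power, size_t n_features)
{
    double total = 0.0;
    for (size_t i = 0; i < n_features; i++)
        if (feature_array[i] != 0.0)      /* unused feature: no time */
            total += feature_array[i] / power[i];
    return total;
}
```

For example, requests of 100 and 30 on features with power 2 and 3 give 50 + 10 = 60 time units.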

mesh_contention_resolution_default
void mesh_contention_resolution_default(mesh_resource *cr, GSList *uncontended_times, mesh_time time_step)

cr: Resource structure this function is operating for
uncontended_times: List of doubles, giving the uncontended physical time durations of all logical threads accessing this resource in this timeslice
time_step: Pre-penalty physical timeslice duration

description: This routine is the default contention resolution function. It applies a penalty to the physical resource using a made-up exponential penalty function that has no basis in reality. The final penalty is added directly into the mesh_resource structure.

see also: mesh_create_shared_resource()

4.8 mesh_def_schedulers.h

mesh_get_resource_by_name
GSList *mesh_get_resource_by_name(gchar *name, GSList *list, mesh_resource **resource)

name: String containing the resource name
list: List of mesh resources
resource: Pointer to the found resource

description: This function takes a resource name and a list of resources as inputs and returns a pointer to the found resource through resource. The found resource is removed from the list.

return value: A GSList of mesh resources with the found resource removed.

see also: mesh_scheduler_rr()

mesh_sched_find_idle_resources
GSList *mesh_sched_find_idle_resources(mesh_scheduler *cs)

cs: Scheduler to search for idle resources

description: Searches the given scheduler and returns a GSList of mesh_resource pointers for every resource that is idle. The caller is responsible for freeing this list. This function is useful when writing custom scheduler routines.

return value: A GSList of mesh_resource*

see also: mesh_scheduler_rr()

mesh_sched_get_eligible_thread_by_name
mesh_thread* mesh_sched_get_eligible_thread_by_name(gchar *name, mesh_scheduler *cs)

name: Textual name of the thread to search for
cs: Scheduler to search for the eligible thread

description: Searches the given scheduler for an eligible thread named name and returns its mesh_thread pointer. This is useful when writing custom schedulers.

return value: mesh_thread * of the named thread, or NULL if not found

see also: mesh_sched_find_idle_resources()

mesh_scheduler_rr
GSList* mesh_scheduler_rr(mesh_scheduler *cs)

cs: Current scheduler

description: This is the default scheduler function. It uses a round-robin algorithm to schedule all the eligible threads onto idle resources, with a preference for idle resources with a larger default computational power. It returns a list of thread-resource pairs which should be run by the kernel. This function can be passed to mesh_create_scheduler().

return value: GSList of mesh_resource_thread_pair to be run

see also: mesh_resource_thread_pair, mesh_create_scheduler()
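A single decision of the round-robin scheduler can be sketched as pairing eligible threads, in queue order, with idle resources sorted by descending default computational power. This hypothetical model uses plain arrays and invented types (res_model, rt_pair_model) in place of mesh_resource and mesh_resource_thread_pair. It assumes the caller keeps the eligible queue in round-robin order, and it caps the number of idle resources at 16 for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a resource and a thread-resource pair. */
typedef struct { const char *name; double power; int idle; } res_model;
typedef struct { int thread; const res_model *resource; } rt_pair_model;

/* Assign eligible threads, in queue order, to idle resources, strongest
   resource first. Returns the number of pairs written to out. */
size_t schedule_rr(const int *eligible, size_t n_threads,
                   const res_model *res, size_t n_res, rt_pair_model *out)
{
    const res_model *idle[16];   /* fixed cap: at most 16 idle resources */
    size_t n_idle = 0;
    for (size_t i = 0; i < n_res && n_idle < 16; i++) {
        if (!res[i].idle)
            continue;
        /* insertion sort by descending power */
        size_t j = n_idle++;
        while (j > 0 && idle[j - 1]->power < res[i].power) {
            idle[j] = idle[j - 1];
            j--;
        }
        idle[j] = &res[i];
    }
    size_t n = n_threads < n_idle ? n_threads : n_idle;
    for (size_t k = 0; k < n; k++) {
        out[k].thread = eligible[k];
        out[k].resource = idle[k];
    }
    return n;
}
```

Threads that do not receive a resource stay at the front of the queue for the next decision, which is what keeps the policy fair across invocations.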

mesh_scheduler_rr_w_overhead
GSList* mesh_scheduler_rr_w_overhead(mesh_scheduler *cs)

cs: Current scheduler

description: Default round-robin scheduler that also keeps track of the execution overhead of scheduling. If a scheduling resource has been set with mesh_set_sched_resource(), that resource is used to do all the scheduling. If the scheduling resource is NULL, the scheduling decision overhead is applied to the processor on which the decision is made.

return value: GSList of mesh_resource_thread_pair to be run

see also: mesh_resource_thread_pair, mesh_create_scheduler(), mesh_set_sched_resource()

mesh_set_sched_resource
mesh_scheduler* mesh_set_sched_resource(mesh_scheduler *scheduler, mesh_resource *resource);

scheduler: Already created scheduler whose execution needs to be changed
resource: Pointer to the resource which will execute this scheduler

precondition: mesh_create_scheduler() must be run to create a scheduler first.

description: Sets the default resource on which the scheduler executes, for centralized scheduling strategies. If a centralized scheduling resource is not set, a scheduling strategy may assume distributed scheduling.

return value: Pointer to the adjusted scheduler.

4.9 mesh_trace.h

mesh_enable_trace_collection
void mesh_enable_trace_collection(mesh_resource *resource);

resource: Pointer to the resource for which traces will be collected

precondition: A valid mesh_resource must be created through the use of mesh_create_resource().

description: Enables the collection of trace information for a specific resource. This function can be used to restart trace collection for a resource for which it has been disabled through mesh_ignore_trace_collection(). Since trace collection is turned on by default when mesh_create_resource() is run, this function does not need to be used in the common case.

see also: mesh_ignore_trace_collection()

mesh_ignore_trace_collection
void mesh_ignore_trace_collection(mesh_resource *resource);

resource: Pointer to the resource for which traces will be ignored

precondition: A valid mesh_resource must be created through the use of mesh_create_resource().

description: Disables the collection of trace information for a specific resource. This function is useful to ignore other resources when only a subset of resources is under examination, or to ignore “phantom” resources such as ones used to run testbenches. The mesh_create_tb_thread() function uses this construct to remove itself from trace graphs.

see also: mesh_enable_trace_collection()

mesh_trace_init
void mesh_trace_init(mesh_time start, mesh_time stop);

start: Simulation time at which the trace data collection will start
stop: Simulation time at which the trace data collection will stop

precondition: mesh_init() must be run.

description: Initializes the collection of debugging data as traces of consume call blocks. Must be called after mesh_init() and before mesh_trace_print() is used. Since trace data collection can potentially be very memory intensive, mesh_trace_init() takes start and stop values as arguments.

see also: mesh_trace_print()

mesh_trace_print
void mesh_trace_print(gchar *filename);

filename: Filename for the trace file output

precondition: mesh_trace_init() and mesh_kernel() must be run.

description: Prints out the debugging data as a text-based trace file. Data collection must be turned on with mesh_trace_init() and the simulation must be run with mesh_kernel(). The trace file can be viewed using the Java-based MESH Viewer.

see also: mesh_trace_init()

4.10 mesh_energy.h

mesh_add_energy_state
mesh_energy_state* mesh_add_energy_state(mesh_energy *ce, gchar *name, double power, double comp);

ce: Already created energy structure to add an energy state to
name: Name of the energy state to create
power: Energy consumed per unit time in this state
comp: Default fraction of computational power available in this state

precondition: A valid mesh_energy structure must be created through mesh_create_energy().

description: Adds a new energy state to an energy structure created using mesh_create_energy(). The name of the energy state must be unique within the specified energy structure. To specify comp values for individual features, use mesh_add_energy_state_feature().

return value: A pointer to the newly created energy state.

see also: mesh_create_energy(), mesh_add_energy_state_feature()

mesh_add_energy_state_feature
mesh_energy_state* mesh_add_energy_state_feature(mesh_energy_state *ces, gint feature, double comp);

ces: Already created energy state to add an energy state feature to
feature: Index identifying the feature for which an individual comp value is being added
comp: Fraction of computational power available to the specified feature in the specified state

precondition: A valid mesh_energy_state must be created through mesh_add_energy_state().

description: Adds an energy state feature, which specifies the fraction of computational power available to a specific resource feature while in the specified energy state.

return value: A pointer to the modified energy state.

see also: mesh_add_energy_state()

mesh_check_current_energy_state
gboolean mesh_check_current_energy_state(mesh_resource *cr, gchar *state);

cr: Already created energy resource with an associated energy structure
state: Name of the energy state being checked against the current energy state

precondition: A valid energy resource created with mesh_create_energy_resource(), and a valid mesh_energy structure created with mesh_create_energy() and set with mesh_set_energy()

description: Checks whether the current energy state of resource cr is state, returning TRUE if so and FALSE otherwise.

return value: TRUE if state is the current energy state of cr, FALSE otherwise

see also: mesh_create_energy_resource(), mesh_create_energy(), mesh_set_energy(), mesh_add_energy_state(), mesh_get_current_energy_state(), mesh_set_current_energy_state()

mesh_check_target_energy_state
gboolean mesh_check_target_energy_state(mesh_resource *cr, gchar *state);

cr: Already created energy resource with an associated energy structure
state: Name of the energy state being checked against the target energy state

precondition: A valid energy resource created with mesh_create_energy_resource(), and a valid mesh_energy structure created with mesh_create_energy() and set with mesh_set_energy()

description: Checks whether the target energy state of resource cr is state, returning TRUE if so and FALSE otherwise.

return value: TRUE if state is the target energy state of cr, FALSE otherwise

see also: mesh_create_energy_resource(), mesh_create_energy(), mesh_set_energy(), mesh_add_energy_state(), mesh_get_target_energy_state(), mesh_set_target_energy_state()

mesh_create_energy
mesh_energy* mesh_create_energy(mesh_feature_list *cfl, double power, double comp);

cfl: Already created and fully specified mesh_feature_list
power: Energy consumed per unit time in the default energy state
comp: Fraction of computational power available in the default energy state

precondition: All features must have been added to cfl using mesh_feature_add().

description: Creates and initializes a mesh_energy structure and creates the default energy state using the specified power consumption and fractional computational power values. By default, the energy structure is not associated with any energy resources. A mesh_energy structure may be assigned to an energy resource (created with mesh_create_energy_resource()) using mesh_set_energy().

return value: A pointer to the newly created mesh_energy structure

see also: mesh_feature_add(), mesh_create_energy_resource(), mesh_set_energy()


mesh_create_energy_resource
mesh_resource* mesh_create_energy_resource(gchar* name, mesh_feature_list* cfl, mesh_scheduler* scheduler, double (*timing_resolution)(mesh_resource *, GSList*), double default_power);

name: Name of resource
cfl: List of consume features that the resource accepts


(*timing_resolution): Function that maps consume call features onto physical timing for this particular resource. It should be passed a pointer to the resource executing the consume call and a pointer to the list of consume call feature arrays.

description: Adds the relative power complexity feature to the feature list cfl, then creates the resource using mesh_create_resource().

return value: Pointer to new energy resource

see also: mesh_create_resource()

mesh_find_energy_state_by_name
mesh_energy_state* mesh_find_energy_state_by_name(mesh_energy *ce, gchar *state);

ce: Energy structure to search
state: Name of the state to find in energy structure ce

description: Returns a pointer to the energy state named state in the energy structure ce, if it exists.

return value: Pointer to the specified energy state, or NULL if it does not exist

mesh_find_energy_state_feature
double mesh_find_energy_state_feature(mesh_energy_state *ces, gint feature);

ces: Energy state to search
feature: Index of the feature to find in energy state ces

description: Returns the fraction of available computational power for the specified resource feature in the specified energy state.

return value: Fraction of available computational power for the specified feature in the specified energy state

mesh_get_current_energy_state
mesh_energy_state* mesh_get_current_energy_state(mesh_resource *cr);

cr: Resource created using mesh_create_energy_resource()

precondition: The resource has been assigned a valid mesh_energy structure with mesh_set_energy().

description: Returns the current energy state of the specified resource. The resource must have an associated, valid mesh_energy structure, created with mesh_create_energy() and assigned with mesh_set_energy().

return value: A pointer to the current energy state of the specified resource

see also: mesh_create_energy_resource(), mesh_create_energy(), mesh_set_energy()

mesh_get_energy
mesh_energy* mesh_get_energy(mesh_resource *cr);

cr: Resource created using mesh_create_energy_resource()

precondition: The resource has been assigned a valid mesh_energy structure with mesh_set_energy().

description: Returns a pointer to the mesh_energy structure associated with the given resource. The resource must have an associated, valid mesh_energy structure, created with mesh_create_energy() and assigned with mesh_set_energy().

return value: A pointer to the mesh_energy structure of the specified resource

see also: mesh_create_energy_resource(), mesh_create_energy(), mesh_set_energy()

mesh_get_energy_state_list
mesh_energy_state_list* mesh_get_energy_state_list(mesh_resource *cr);

cr: Resource created using mesh_create_energy_resource()

precondition: The resource has been assigned a valid mesh_energy structure with mesh_set_energy().

description: Returns a pointer to the list of energy states available to the given resource. The resource must have an associated, valid mesh_energy structure, created with mesh_create_energy() and assigned with mesh_set_energy().

return value: A pointer to the list of energy states available to the specified resource

see also: mesh_create_energy_resource(), mesh_create_energy(), mesh_set_energy()

mesh_get_target_energy_state
mesh_energy_state* mesh_get_target_energy_state(mesh_resource *cr);

cr: Resource created using mesh_create_energy_resource()

precondition: The resource has been assigned a valid mesh_energy structure with mesh_set_energy().

description: Returns the target energy state of the specified resource. The resource must have an associated, valid mesh_energy structure, created with mesh_create_energy() and assigned with mesh_set_energy().

return value: A pointer to the target energy state of the specified resource

see also: mesh_create_energy_resource(), mesh_create_energy(), mesh_set_energy()


mesh_print_energy_statistics
void mesh_print_energy_statistics(mesh_resource *cr);

cr: Resource created using mesh_create_energy_resource()

precondition: The resource has been assigned a valid mesh_energy structure with mesh_set_energy(), and either mesh_update_energy_state_utilization() or mesh_update_energy_state_utilization_consume() has been called at least once.

description: Prints the energy state usage statistics (time spent and energy consumed in each state) for the given resource, and provides summary statistics (total energy consumed, total time accounted for by energy states).

see also: mesh_set_energy(), mesh_update_energy_state_utilization(), mesh_update_energy_state_utilization_consume()

mesh_set_current_energy_state
gboolean mesh_set_current_energy_state(mesh_resource *cr, gchar *state);

cr: Resource created using mesh_create_energy_resource()
state: Name of the energy state to make the current energy state for the specified resource

precondition: The resource has been assigned a valid mesh_energy structure with mesh_set_energy().

description: If possible, changes the energy state of the specified resource to the energy state named state. Returns TRUE if the operation succeeds (the specified energy state exists) and FALSE otherwise.

return value: TRUE if state is a valid energy state for cr, FALSE otherwise

see also: mesh_create_energy_resource(), mesh_create_energy(), mesh_set_energy(), mesh_add_energy_state()
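The energy-state calls above revolve around a small table of uniquely named states, each with a power and a default comp fraction. The hypothetical sketch below models that table with a fixed-size array; energy_model and its functions are invented, and naming the initial state "default" is an assumption of this example. The set call mirrors the documented contract of returning TRUE only when the named state exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_STATES 8

/* Hypothetical energy state: name, power per unit time, comp fraction. */
typedef struct { const char *name; double power; double comp; } energy_state;

typedef struct {
    energy_state states[MAX_STATES];
    size_t n_states;
    size_t current;   /* index of the current state; 0 is the default */
} energy_model;

void energy_model_init(energy_model *e, double power, double comp)
{
    e->states[0] = (energy_state){"default", power, comp};
    e->n_states = 1;
    e->current = 0;
}

/* State names must be unique; returns NULL on a duplicate or a full table. */
energy_state *energy_model_add(energy_model *e, const char *name,
                               double power, double comp)
{
    if (e->n_states == MAX_STATES)
        return NULL;
    for (size_t i = 0; i < e->n_states; i++)
        if (strcmp(e->states[i].name, name) == 0)
            return NULL;
    e->states[e->n_states] = (energy_state){name, power, comp};
    return &e->states[e->n_states++];
}

/* Returns true, and switches state, only if the named state exists. */
bool energy_model_set_current(energy_model *e, const char *name)
{
    for (size_t i = 0; i < e->n_states; i++) {
        if (strcmp(e->states[i].name, name) == 0) {
            e->current = i;
            return true;
        }
    }
    return false;
}

bool energy_model_check_current(const energy_model *e, const char *name)
{
    return strcmp(e->states[e->current].name, name) == 0;
}
```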

mesh_set_energy
mesh_resource* mesh_set_energy(mesh_resource *cr, mesh_energy *ce);

cr: Resource created using mesh_create_energy_resource()
ce: mesh_energy structure created with mesh_create_energy()

precondition: cr has been created with mesh_create_energy_resource() and ce has been created with mesh_create_energy().

description: Assigns the previously created mesh_energy structure ce (from mesh_create_energy()) to the previously created resource cr (from mesh_create_energy_resource()).

return value: A pointer to the modified resource

see also: mesh_create_energy_resource(), mesh_create_energy()

mesh_set_target_energy_state
gboolean mesh_set_target_energy_state(mesh_resource *cr, gchar *state);

cr: Resource created using mesh_create_energy_resource()
state: Name of the energy state to make the target energy state for the specified resource

precondition: The resource has been assigned a valid mesh_energy structure with mesh_set_energy().

description: If possible, changes the target energy state of the specified resource to the energy state named state. Returns TRUE if the operation succeeds (the specified energy state exists) and FALSE otherwise.

return value: TRUE if state is a valid energy state for cr, FALSE otherwise

see also: mesh_create_energy_resource(), mesh_create_energy(), mesh_set_energy(), mesh_add_energy_state()

mesh_update_energy_state_utilization
void mesh_update_energy_state_utilization(mesh_resource *cr);

cr: Resource created using mesh_create_energy_resource()

precondition: The resource has been assigned a valid mesh_energy structure with mesh_set_energy().

description: Updates the record of how much time has been spent and how much energy has been dissipated in the current energy state, by comparing the current simulation time to the time at which utilization was last updated for this resource. Used to determine the energy consumption of a resource by tracking how much time each resource spends in each energy state.

see also: mesh_update_energy_state_utilization_consume()
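The utilization update can be read as charging the time elapsed since the last update to whichever state is current, at that state's power (energy consumed per unit time). The hypothetical sketch below uses invented names (state_stats, energy_tracker, update_utilization()). It also shows why an update should run before every state change: otherwise the elapsed interval would be charged to the new state.

```c
#include <assert.h>

/* Hypothetical per-state bookkeeping. */
typedef struct {
    double power;        /* energy consumed per unit time in this state */
    double time_spent;   /* accumulated time in this state */
    double energy;       /* accumulated energy in this state */
} state_stats;

typedef struct {
    state_stats *current;  /* state the resource is in right now */
    double last_update;    /* simulation time of the previous update */
} energy_tracker;

/* Charge the interval since the last update to the current state. */
void update_utilization(energy_tracker *t, double now)
{
    double elapsed = now - t->last_update;
    t->current->time_spent += elapsed;
    t->current->energy += elapsed * t->current->power;
    t->last_update = now;
}
```

Usage: update at time 10 while active (power 2), then switch to sleep (power 0.5) and update at time 30; active accumulates 10 time units and 20 energy, sleep 20 time units and 10 energy.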

mesh_update_energy_state_utilization_consume
void mesh_update_energy_state_utilization_consume(mesh_resource *cr, double length, double rpc);

cr: Resource created using mesh_create_energy_resource()
length: Length of the consume call executed on the given resource
rpc: The relative power complexity of the consume call

precondition: The resource has been assigned a valid mesh_energy structure with mesh_set_energy().

description: Updates the record of how much time has been spent and how much energy has been dissipated in the current energy state, based on the length of the consume call executed and its relative power complexity. Used to determine the energy consumption of a resource by tracking how much time each resource spends in each energy state.

see also: mesh_update_energy_state_utilization()

4.11 mesh_def_energy_resources.h

mesh energy resource defaultdouble mesh energy resource default(mesh resource *cr, GSList *feature arrays);

cr Resource created using mesh_create_energy_resource()

feature_arrays List of feature arrays, each containing a double for each resource feature

precondition: Resource has been assigned a valid mesh_energy structure with mesh_set_energy()

description: This is the default energy resource timing resolution function. It finds the first feature array in feature_arrays that the resource can execute completely (that is, the resource has a feature for every non-zero feature in the consume call). The function then uses the fraction of available computation for each feature to determine the total physical time consumed by the features.

return value: Physical time duration

see also: mesh_set_energy()
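The timing resolution step can be sketched as follows. This is an illustration under stated assumptions, not the MESH implementation: a plain array of pointers replaces the GSList, a feature power of zero marks a missing feature, a single fraction from the current energy state applies to every feature, and returning -1.0 when no array fits is a hypothetical convention. The struct and function names are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FEATURES 4

/* Hypothetical resource model (not the MESH layout). */
typedef struct {
    double feature_power[NUM_FEATURES]; /* work units per second; 0 = absent */
    double fraction;                    /* fraction of computation available
                                           in the current energy state      */
} energy_resource;

/* Nonzero if the resource has every feature the consume call uses. */
static int can_execute(const energy_resource *r, const double *consume)
{
    for (int i = 0; i < NUM_FEATURES; i++)
        if (consume[i] != 0.0 && r->feature_power[i] <= 0.0)
            return 0;
    return 1;
}

/* Pick the first feature array the resource can execute completely and
   convert its work into physical time, scaled by the available fraction. */
double energy_resource_default(const energy_resource *r,
                               const double *const *feature_arrays, size_t n)
{
    assert(r != NULL && r->fraction > 0.0);
    for (size_t k = 0; k < n; k++) {
        const double *consume = feature_arrays[k];
        if (!can_execute(r, consume))
            continue;                   /* skip arrays needing absent features */
        double time = 0.0;
        for (int i = 0; i < NUM_FEATURES; i++)
            if (consume[i] != 0.0)
                time += consume[i] / (r->feature_power[i] * r->fraction);
        return time;
    }
    return -1.0;                        /* hypothetical "no match" value */
}
```

Ordering matters here: because the first executable array wins, listing the cheapest implementation first in feature_arrays would bias the result toward it.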

mesh_energy_resource_default_power

double mesh_energy_resource_default_power(mesh_resource *cr);

cr Resource created using mesh_create_energy_resource()

precondition: Resource has been assigned a valid mesh_energy structure with mesh_set_energy()

description: Returns the default computational power of the resource, using the fraction of available computational power determined by the current energy state.

return value: Fractional default computational power of the resource

see also: mesh_set_energy()
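For comparison, a hypothetical sketch of the default power query, assuming the default feature sits at index DEFAULT_FEATURE and that the energy state's fraction simply scales that feature's power. Neither assumption is confirmed by this manual, and the names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FEATURES 4
#define DEFAULT_FEATURE 0   /* hypothetical index of the default feature */

/* Hypothetical resource model (not the MESH layout). */
typedef struct {
    double feature_power[NUM_FEATURES];
    double fraction;        /* set by the current energy state */
} energy_resource;

/* Default feature's power, scaled by the energy state's fraction. */
double energy_resource_default_power(const energy_resource *r)
{
    assert(r != NULL);
    return r->feature_power[DEFAULT_FEATURE] * r->fraction;
}
```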


4.12 mesh_def_energy_schedulers.h

mesh_energy_sched_find_idle_resources

GSList *mesh_energy_sched_find_idle_resources(mesh_scheduler *cs);

cs Scheduler to search for idle resources

description: Searches the given scheduler and returns a GSList of mesh_resource pointers to idle resources. The caller is responsible for freeing this list. This function is useful when writing custom scheduler routines.

return value: A GSList of pointers to idle mesh_resource structures
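The ownership rule (the caller frees the returned list) is the main thing to get right when calling this function from a custom scheduler. The sketch below mirrors it without depending on MESH or GLib: a malloc'd array stands in for the GSList, and the resource and scheduler types, along with find_idle_resources(), are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for mesh_resource and mesh_scheduler. */
typedef struct {
    const char *name;
    int busy;               /* nonzero while a thread runs on the resource */
} resource;

typedef struct {
    resource **resources;
    size_t     count;
} scheduler;

/* Returns a newly allocated array of pointers to idle resources and stores
   its length in *n_idle. The caller owns the array and must free() it; the
   resources themselves still belong to the scheduler. */
resource **find_idle_resources(const scheduler *cs, size_t *n_idle)
{
    assert(cs != NULL && n_idle != NULL);
    /* Allocate at least one slot so malloc(0) cannot yield NULL. */
    resource **idle = malloc((cs->count ? cs->count : 1) * sizeof *idle);
    if (idle == NULL) {
        *n_idle = 0;
        return NULL;
    }
    size_t n = 0;
    for (size_t i = 0; i < cs->count; i++)
        if (!cs->resources[i]->busy)
            idle[n++] = cs->resources[i];
    *n_idle = n;
    return idle;
}
```

With the real function, the equivalent cleanup would presumably be g_slist_free() on the returned list, which releases the list nodes but not the mesh_resource structures they point to.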

mesh_energy_scheduler_rr

GSList *mesh_energy_scheduler_rr(mesh_scheduler *cs);

cs Scheduler with which to perform round-robin scheduling

description: This is the default energy scheduler function. It uses a round-robin algorithm to schedule all eligible threads onto idle resources, preferring idle resources with a higher fractional available computational power (using the default feature). It returns a list of thread-resource pairs that should be run by the kernel. This function can be passed to mesh_create_scheduler().

return value: A GSList of thread-resource pairs (rt_pair)
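The round-robin policy with a power preference can be sketched as below. This is one plausible reading of the description, not the MESH source: idle resources are sorted by fractional power so the strongest are filled first, and a persistent cursor rotates which eligible thread is served first on each call. The rt_pair, resource, and energy_scheduler_rr() names here are hypothetical stand-ins, and plain malloc'd arrays replace GSList.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for mesh_resource and rt_pair. */
typedef struct {
    int    id;
    double frac_power;      /* fractional default computational power */
} resource;

typedef struct {
    int       thread_id;
    resource *res;
} rt_pair;

/* qsort comparator: higher fractional power first. */
static int by_power_desc(const void *a, const void *b)
{
    const resource *ra = *(resource *const *)a;
    const resource *rb = *(resource *const *)b;
    return (ra->frac_power < rb->frac_power) - (ra->frac_power > rb->frac_power);
}

/* Pair eligible threads with idle resources, starting at *cursor so that
   successive calls rotate which threads are served first. Sorts the
   caller's idle array in place. Returns a malloc'd array of *n_pairs pairs
   that the caller must free(), or NULL if nothing was scheduled. */
rt_pair *energy_scheduler_rr(const int *threads, size_t n_threads,
                             resource **idle, size_t n_idle,
                             size_t *cursor, size_t *n_pairs)
{
    assert(cursor != NULL && n_pairs != NULL);
    size_t n = n_threads < n_idle ? n_threads : n_idle;
    *n_pairs = 0;
    if (n == 0)
        return NULL;

    qsort(idle, n_idle, sizeof *idle, by_power_desc);

    rt_pair *pairs = malloc(n * sizeof *pairs);
    if (pairs == NULL)
        return NULL;
    for (size_t k = 0; k < n; k++) {
        pairs[k].thread_id = threads[(*cursor + k) % n_threads];
        pairs[k].res = idle[k];  /* strongest remaining resource */
    }
    *cursor = (*cursor + n) % n_threads;  /* next call starts after these */
    *n_pairs = n;
    return pairs;
}
```

Keeping the cursor outside the function lets the caller persist it across scheduling rounds, which is what makes the assignment round-robin rather than always favoring the first threads in the list.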
