
STREAM COMPUTING

A SEMINAR REPORT

Submitted by

AMIT KUMAR

in partial fulfillment for the award of the degree

of

BACHELOR OF TECHNOLOGY

in

COMPUTER SCIENCE & ENGINEERING

SCHOOL OF ENGINEERING

COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY, COCHIN – 682022

AUGUST 2008


DIVISION OF COMPUTER ENGINEERING

SCHOOL OF ENGINEERING

COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY, COCHIN – 682022

Bonafide Certificate

Certified that this seminar report titled

“Stream Computing”

Presented by

AMIT KUMAR

of the VII semester, Computer Science and Engineering, in the year 2008, is submitted in partial

fulfillment of the requirements for the award of the Degree of Bachelor of Technology in

Computer Science and Engineering of Cochin University of Science and Technology.

Ms. Preetha S
SEMINAR GUIDE                                                        DATE

Dr. David Peter S
HEAD OF THE DEPARTMENT


Acknowledgement

At the outset, I thank God Almighty for making my endeavor a success. I express my gratitude to Ms. Preetha S. Her guidance as well as her patience has been instrumental in enabling me to develop my ideas and research goals. Her vision and leadership have been fundamental to the progress of the seminar. Her breadth and depth of knowledge and the variety of her interests and pursuits will always provide a standard for me to aspire to.

I am also indebted to Dr. David Peter S, Head of the Department, for providing the facilities and means that enabled me to finish the seminar.

I express my immense pleasure and thankfulness to all teachers and staff of the Department of Computer Science and Engineering, CUSAT, for their cooperation and support.

Last but not least, I am also grateful to my family and friends, who have been a constant source of support and encouragement from well before the start of my graduate career.

AMIT KUMAR


Abstract

Stream computing is a programming paradigm that models a computer program as a stream of data between several processing units, rather than as an implemented algorithm processing data. The principle originates from the needs of real-time multimedia applications. These applications can be divided into multiple data streams, e.g. audio and video streams, that must be delivered to the data consumer in a just-in-time manner.

In this paper, we explore the problems and opportunities that this new paradigm can bring to the field of classical high performance computing (HPC). Stream computing is well suited for application on non-classical types of hardware, such as asymmetric multiprocessors or graphical processing units (GPUs). We show how classic HPC can be mapped onto non-classic hardware. This mapping alters the implemented algorithms, but also includes information about the specific problem being solved, so solutions are less generic than in a normal implementation or library. As an illustration, we present a streaming framework on the Cell processor that shows the obstructions that must be overcome.


The main task is to pull in streams of data, process the data, and stream it back out as a single flow, thereby analyzing multiple data streams from many sources live. Stream computing uses software algorithms that analyze the data in real time as it streams in, to increase speed and accuracy when dealing with data handling and analysis. System S, the stream computing system of IBM introduced in June 2007, runs on 800 microprocessors, and the System S software enables software applications to split up tasks and then reassemble the data into an answer. ATI Technologies also announced a stream computing technology, derived from a class of applications that run on the GPU instead of the CPU, which enables graphics processors (GPUs) to work in conjunction with high-performance, low-latency CPUs to solve complex computational problems.


Table of Contents

List of Figures
1  Introduction
2  Stream Computing
   2.1  Characteristics of Stream Computing
3  Need for Stream Computing
4  Enabling Technologies
   4.1  Stream Processor Architecture
   4.2  Processing Stages in Stream Processing Systems
   4.3  StreamIt Language Overview
   4.4  StreamIt Language Constructs
   4.5  Filters as Computational Elements
   4.6  Applications
5  StreamIt Compiler
   5.1  Linear Filter Optimization
   5.2  Extracting Linear Representation
   5.3  Combining Linear Filters
   5.4  Linear Optimization of Stream Graph
   5.5  Backend for Parallel Platforms
6  Development Support
   6.1  StreamIt Development Tool
   6.2  Debugging Parallel StreamIt Programs
   6.3  StreamIt Graphical Editor
   6.4  StreamIt Debugging Environment
7  Conclusion
8  References


List of Figures

4.1  Stream Processor Architecture
4.3  StreamIt Language Overview
4.4  Language Constructs
4.6  Applications: Radar Front End
5.0  StreamIt Compiler
5.2  Extracting Linear Representation
5.3  Combining Linear Filters
5.4  Linear Optimization of Stream Graph
5.5  Backend for Parallel Platforms
6.2  Debugging Parallel StreamIt Programs
6.3  StreamIt Graphical Editor
6.4  StreamIt Debugging Environment


1. Introduction

What is stream computing exactly? As a starting point, here is a definition: “Stream computing is a programming paradigm that models a computer program as a stream of data between several processing units, rather than as an implemented algorithm processing data.”

Like other definitions of topics like these, an understanding of the term stream computing requires an understanding of various other closely related terms. While there is a lack of precise scientific definitions for many of these terms, general definitions can be given.

Computing can be described as any activity of using and/or developing computer hardware and software. It includes everything that sits in the bottom layer, i.e. everything from raw compute power to storage capabilities. Stream processing is a computer programming paradigm, related to SIMD, that allows some applications to more easily exploit a limited form of parallel processing. Such applications can use multiple computational units, such as the floating point units on a GPU, without explicitly managing allocation, synchronization, or communication among those units.

The stream processing paradigm simplifies parallel software and hardware by restricting the parallel computation that can be performed. Given a set of data (a stream), a series of operations (kernel functions) is applied to each element in the stream. Uniform streaming, where one kernel function is applied to all elements in the stream, is typical. Kernel functions are usually pipelined, and local on-chip memory is reused to minimize external memory bandwidth. Since the kernel and stream abstractions expose data dependencies, compiler tools can fully automate and optimize on-chip management tasks. Stream processing hardware can use scoreboarding, for example, to launch DMAs at runtime, when dependencies become known. The elimination of manual DMA management reduces software complexity, and the elimination of hardware caches reduces the amount of die area not dedicated to computational units such as ALUs.
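As a small illustration of uniform streaming, the sketch below uses the StreamIt-style syntax introduced later in this report; the Scale filter is a hypothetical example, not taken from the report. One kernel function is applied uniformly to every element of the stream:

float->float filter Scale(float factor) {
    // One firing of the kernel: consume one stream element and
    // produce one scaled element. Because the push and pop rates
    // are fixed, the compiler can schedule firings statically.
    work push 1 pop 1 {
        push(factor * pop());
    }
}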

2. Stream Computing

“Stream computing is a programming paradigm that models a computer program as a stream of data between several processing units, rather than as an implemented algorithm processing data.” StreamIt is a programming language and a compilation infrastructure, specifically engineered for modern streaming systems. It is designed to facilitate the programming of large streaming applications, as well as their efficient and effective mapping to a wide variety of target architectures, including commercial off-the-shelf uniprocessors, multicore architectures, and clusters of workstations.

I.B.M. is introducing a high-performance computer system that is intended to rapidly analyze data as it streams in from many sources, increasing the speed and accuracy of decision making in fields as diverse as security surveillance and Wall Street trading. The company plans to demonstrate the system, called System S, at a conference of Wall Street technology managers today. The announcement, analysts say, is a significant step in the commercialization of the emerging technology of stream computing. Early this month Google acquired PeakStream, a start-up in stream computing, and industry analysts say its software could help Google improve its video search functions.

Stream computing is an effort to deal with two issues: the need for faster data handling and analysis in business and science, and the growing flood of information in digital form, including Web sites, blogs, e-mail, video and news clips, telephone conversations, transaction data and electronic sensors. In stream computing, advanced software algorithms analyze the data as it streams in. Text, voice and image-recognition technology, for example, can be used to determine that some data is more relevant to a particular problem than others. The priority data is then shuttled off into a program tailored to work on complex, fast-changing problems like tracking an epidemic and predicting its spread, or culling data from electronic sensors in a computer chip plant to quickly correct flaws in manufacturing.

I.B.M. deems its System S research project ready to make its way into the marketplace. The planned announcement to the Wall Street group is the beginning of its effort to find industry partners. The initial system runs on about 800 microprocessors, though it can scale up to tens of thousands as needed, I.B.M. said. The most notable step, researchers say, lies in the System S software, which enables software applications to split up tasks like image recognition and text recognition, and then reassemble the pieces of the puzzle into an answer.

2.1 Characteristics of stream computing

Enable new applications on new architectures.
Parallel problems other than graphics that map well on GPU architecture.
Transition from fixed function to programmable pipelines.
Various proof points in research and industry under the name GPGPU.
Data dependencies and parallelism.

A great advantage of the stream programming model lies in the kernel defining independent and local data usage. Kernel operations define the basic data unit, both as input and output. This allows the hardware to better allocate resources and schedule global I/O. Although usually not exposed in the programming model, the I/O operations seem to be much more advanced on stream processors (at least on GPUs). I/O operations are also usually pipelined by themselves, while chip structure can help hide latencies. Definition of the data unit is usually explicit in the kernel, which is expected to have well-defined inputs (possibly using structures, which is encouraged) and outputs. In some environments, output values are fixed (in GPUs, for example, there is a fixed set of output attributes, unless this is relaxed). Having each computing block clearly independent and defined makes it possible to schedule bulk read or write operations, greatly increasing cache and memory bus efficiency.


3. Need for stream computing

How does stream computing differ from computation on the CPU? Stream computing takes advantage of a SIMD methodology (single instruction, multiple data), whereas a CPU is a modified SISD methodology (single instruction, single data), the modifications taking various parallelism techniques into account. The benefit of stream computing stems from the highly parallel architecture of the GPU, whereby tens to hundreds of parallel operations are performed with each clock cycle, whereas the CPU can at best perform only a small handful of parallel operations per clock cycle.

What are AMD's stream computing product features? AMD's FireStream™ 9170, its latest-generation stream computing GPU, features:

320 stream cores (compute units or ALUs)
2 GB on-board GDDR3 memory
Double precision floating point support
PCIe 2.0 x16 interface

What are AMD's stream computing product advantages? With the FireStream 9170 hardware, AMD is the only company positioned to offer a unique platform with strengths in both accelerated GPUs and stream computing, with stream computing today leading to fusion tomorrow. AMD's open systems SDK approach includes the CTM initiative, which releases low-level specifications so that developers and end users can understand the architecture and tune for maximum performance, and delivers high-level, multi-targeted compilers through Brook, third parties such as RapidMind, and partnerships with universities and industry.

Is stream computing a return to the old coprocessor days? In many ways stream computing does resemble the days when vector co-processors handled substantial mathematical tasks. The benefit, then as now, is the remarkable performance boost gained through implementing these specialized components.

4. Enabling technologies

4.1 Stream Processor architecture

Stream processors are programmable processors that are optimized for executing programs expressed using the stream programming model. A block diagram of a stream processor is shown below. The stream processor operates as a coprocessor under the control of the host processor, which is often a standard general-purpose CPU. A stream program executing on the host processor orchestrates the sequence of kernels to be executed and the necessary transfer of input and output data streams between the stream processor and off-chip memory. Kernel execution takes place directly on the stream processor from instructions stored in the microcontroller. New kernels may be loaded into the microcontroller as needed, possibly under explicit control of the host processor. The host processor thus initiates a sequence of operations to orchestrate the stream program, and dependencies may exist between these host-issued operations. The host interface of the stream processor issues the commands received from the host to the appropriate units as resources become available, subject to dependencies among the commands.

The arithmetic units of the stream processor are grouped into n identical compute clusters. Each cluster consists of several functional units and associated registers. In an example cluster organization, the local register files (LRFs) attached to each functional unit provide the input operands for that unit, and results are written to one or more of the LRFs via the intra-cluster network. Loop-carried state and other shared data may be communicated among the clusters.

Block diagram of stream processor architecture

4.2Processing stages in Stream Processing systems

"A model that uses sequences of data and computation kernels to expose and

exploit concurrency and locality for efficient. When using such a board for stream

processing, a common system model is to distribute data out from the FPGA to

Page 15: STREAM COMPUTING

Stream Computing

7Division Of Computer Science and Engineering ,SOE,CUSAT

other processors in the multicomputer, either in a round-robin or a next-avail-

able-processor fashion FPGA toolkits provide drivers and a software library for

managing these complex data movement strategies as well as interfaces for a wide

range of board-related features, including node configuration , temperature and

current sensors and control bus access. They also provide elements such as IP block

libraries, simulation environments, BSPs, algorithm libraries and middleware.

Processing stages in Stream Processing systems


4.3 StreamIt Language Overview

StreamIt is an architecture-independent language for streaming applications. It adopts the Cyclo-Static Dataflow [1] model of computation, which is a generalization of Synchronous Dataflow. StreamIt programs are represented as graphs where nodes represent computation and edges represent FIFO-ordered communication of data over tapes.

The basic programmable unit in StreamIt is a filter. Each filter contains a work function that executes atomically, popping (i.e., reading) a fixed number of items from the filter's input tape and pushing (i.e., writing) a fixed number of items to the filter's output tape. A filter may also "peek" at a given index on its input tape without consuming the item; this makes it simple to represent computation over a "sliding window". The push, pop, and peek rates are declared as part of the work function, thereby enabling the compiler to construct a static schedule of filter firings. StreamIt provides three hierarchical structures for composing filters into larger stream graphs.

4.4 StreamIt language Constructs

The programming paradigm is modular, which is important for large-scale development. Parameterized templates allow a program to change behavior with small source code modifications, demonstrating its malleability. Composition of simple structures creates large graphs and enables inductive reasoning about correctness. Applications are architecture independent. A sketch of these composition constructs follows.
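A minimal sketch of these constructs (hypothetical filter names such as Counter, Times, and Print, not taken from the report; the feedback loop construct is not shown) composes a source, a splitjoin, and a sink in a pipeline:

void->int filter Counter {
    int i;
    init { i = 0; }
    work push 1 { push(i++); }           // source: 0, 1, 2, ...
}

int->int filter Times(int k) {
    work push 1 pop 1 { push(k * pop()); }
}

int->void filter Print {
    work pop 1 { println(pop()); }       // sink: print each item
}

int->int splitjoin Branches {
    split duplicate;                     // each item goes to both branches
    add Times(2);
    add Times(3);
    join roundrobin;                     // interleave the branch outputs
}

void->void pipeline ConstructsDemo {
    add Counter();                       // pipeline stages run in sequence
    add Branches();                      // parallel branches via a splitjoin
    add Print();
}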


4.5 Filters as computational elements

Filters are the programmable units. Each filter has an initialization function and a steady-state work function, and filters communicate via FIFOs: pop(), peek(index), push(value).

float->float filter FIR (int N) {
    float[N] weights;

    init {
        // compute the filter coefficients once, at initialization
        weights = calculate_weights(N);
    }

    work push 1 pop 1 peek N {
        float result = 0;
        for (int i = 0; i < N; i++) {
            result += weights[i] * peek(i);   // sliding window over the input tape
        }
        push(result);
        pop();
    }
}


The filter can now serve as a module that is incorporated into stream graphs as necessary, for example as part of an acoustic beamformer. A filter is akin to a class in object-oriented programming, with the work function serving as the main method. A filter may also declare a constructor function to initialize the filter state before any other method is invoked. The implementation of the work function in StreamIt obviates the need for explicit buffer management. The application developer instead focuses on the hierarchical assembly of the stream graph and its communication topology.

4.6 Applications

Stream processing is essentially a compromise, driven by a data-centric model that works very well for traditional DSP or GPU-type applications (such as image, video and digital signal processing) but less so for general-purpose processing with more randomized data access (such as databases). By sacrificing some flexibility in the model, it allows easier, faster and more efficient execution. Depending on the context, processor design may be tuned for maximum efficiency or trade efficiency off for flexibility.

Stream processing is especially suitable for applications that exhibit three characteristics:

Compute intensity: the number of arithmetic operations per I/O or global memory reference. In many signal processing applications today it is well over 50:1 and increasing with algorithmic complexity.

Data parallelism: exists in a kernel if the same function is applied to all records of an input stream and a number of records can be processed simultaneously without waiting for results from previous records.

Data locality: a specific type of temporal locality common in signal and media processing applications, where data is produced once, read once or twice later in the application, and never read again. Intermediate streams passed between kernels, as well as intermediate data within kernel functions, can capture this locality directly using the stream processing programming model.


Example application: radar front end


5. StreamIt Compiler

The StreamIt compiler hides the granularity of execution and architecture details. The compiler backend supports uniprocessors, clusters of workstations, and MIT Raw. Innovative compiler technology focuses on the core set of challenges to deliver high performance on future architectures, automating domain-specific optimizations such as optimization of linear streams and translation to the frequency domain. It also helps with partitioning, routing, and related tasks.

5.1 Linear Filter optimizations

StreamIt provides three hierarchical structures for composing filters into larger stream graphs (see Figure 1). The pipeline construct composes streams in sequence, with the output of one connected to the input of the next. The splitjoin construct distributes data to a set of parallel streams, which are then joined together in a round-robin fashion. The feedback loop provides a mechanism for introducing cycles in the graph. An example of a pipeline appears in Figure 2; it contains a single FIR (Finite Impulse Response) filter, which could be implemented as the FIR filter shown in Section 4.5. A sketch of such a pipeline follows.
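A minimal sketch of such a pipeline, reusing the FIR filter from Section 4.5 (FloatSource and FloatPrinter are illustrative helpers, not code from the report):

void->float filter FloatSource {
    float x;
    init { x = 0; }
    work push 1 { push(x++); }           // produces the raw samples
}

float->void filter FloatPrinter {
    work pop 1 { println(pop()); }       // prints the filtered samples
}

void->void pipeline FIRProgram {
    add FloatSource();
    add FIR(64);                         // the FIR filter of Section 4.5
    add FloatPrinter();
}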


5.2 Extracting Linear Representation

Resembles constant propagation
Maintains linear form ⟨v, b⟩ for each variable
Peek expression: generate fresh v
Push expression: copy v into A
Pop expression: increment o
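In other words (a sketch of the analysis in the usual StreamIt linear-filter notation, not reproduced from this report), each variable is tracked as an affine combination of the items peeked from the input tape:

$$x = \vec{v} \cdot \vec{p} + b,$$

where $\vec{p}$ is the vector of peeked input items. When every pushed value has this form, the filter as a whole is summarized by a linear representation $\langle A, \vec{b}, o \rangle$ computing

$$\vec{y} = \vec{p}\,A + \vec{b},$$

where each column of $A$ is the $\vec{v}$ of one pushed output, $\vec{b}$ collects the constant terms, and $o$ is the number of items popped per firing.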


5.3 Combining Linear Filters

Pipelines and splitjoins of linear filters can be collapsed into a single linear filter; a pipeline is the simplest example. The figure below illustrates such a combination.

COMBINATION EXAMPLE
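A sketch of why such collapsing works, assuming the two filters' rates match so that one firing of the first feeds exactly one firing of the second: if the first filter computes $\vec{y} = \vec{x}A + \vec{b}$ and the second computes $\vec{z} = \vec{y}C + \vec{d}$, then the pipeline of the two computes

$$\vec{z} = (\vec{x}A + \vec{b})\,C + \vec{d} = \vec{x}\,(AC) + (\vec{b}\,C + \vec{d}),$$

so the combined filter has the single linear representation $\langle AC,\ \vec{b}C + \vec{d} \rangle$. Splitjoins of linear filters combine in a similar spirit, by stacking and interleaving the branch matrices.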


5.4 Linear optimization of Stream graph


5.5 Backend for parallel platforms

StreamIt exposes the communication pattern, allowing automatic generation and optimization of routing code; otherwise, this may require extensive (assembly) programming. For the FIR benchmark on the Raw backend, 15 statements of StreamIt code achieve the same performance as 352 statements of manually tuned C. For a frequency-hopping radio on the cluster backend, StreamIt achieves 50% higher throughput and 35% less communication when using StreamIt's messaging construct.


6. Development Support

6.1 StreamIt Development tool

The StreamIt Development Tool (SDT) features many aspects of an IDE, including a text editor and a debugger. For example, the SDT debugger supports line and method breakpoints, watchpoints, program suspension, code stepping, variable inspection, and value modification, to list a few. Moreover, the SDT offers features tailored to the StreamIt language. The SDT graphically represents StreamIt programs and preserves hierarchical information to allow an application engineer to focus on the parts of the stream program that are of interest. In addition, the SDT can track the flow of data between filters and, most importantly, it provides a deterministic mechanism to debug parallel streams. The SDT is implemented in Java as an Eclipse [3] plug-in. The Eclipse universal tools platform is an extensible development environment. We leverage the built-in user interfaces for editing and viewing files, the resource management system, the documentation infrastructure, and the runtime support for launching, running, and debugging programs.

6.2 Debugging Parallel StreamIt programs

Parallelism and communication are exposed. Tracking the flow of data in a stream graph affords a frame of reference for reasoning about "time". This is a powerful advantage when debugging parallel programs, compared with multiple threads that have independent program counters and non-deterministic execution.


6.3 StreamIt graphical editor

We are making available the first public version of the StreamIt Development Tool (SDT). It is implemented in Java as an Eclipse plug-in, and intended for developing, debugging, and visualizing programs written in StreamIt. As a graphical programming environment, it can simply and intuitively convey the hierarchical and structured nature of a StreamIt application. In addition, the debugger can interpret and visually represent the stream graph and its dynamic behavior, including the flow of information in parallel stream graphs. The SDT is composed of the following modules: an IDE-integrated debugger, a graphical text editor, a runtime stream graph view, and a corresponding graph overview.

6.4 StreamIt Debugging Environment

As seen in the figure, a StreamIt program can be visually depicted as a hierarchical directed graph of streams, with graph nodes representing streams and graph edges representing tapes or channels. The containers are rendered according to the code declarations, and the visualization tools in the SDT allow the user to selectively collapse and expand containers. This is useful in large streams where the application developers are only interested in visualizing a particular subset, for example to verify the interconnect topology of the graph. The figure shows a screenshot of the SDT for a simple StreamIt program, which consists of a filter that generates input data (IntSource), a splitjoin (Echo) that operates on the data produced by the source and whose output is in turn consumed by an Adder. Lastly, a filter (IntPrinter) reads and prints the computed values. A sketch of this program is given below.
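A rough sketch of that program in StreamIt syntax, reconstructed from the description above (the filter bodies are illustrative guesses, not code from the report):

void->int filter IntSource {
    int x;
    init { x = 0; }
    work push 1 { push(x++); }              // generates the input data
}

int->int filter Pass {
    work push 1 pop 1 { push(pop()); }      // simple pass-through branch
}

int->int splitjoin Echo {
    split duplicate;                        // copy each value to both branches
    add Pass();
    add Pass();
    join roundrobin;                        // interleave: value, echo, value, echo, ...
}

int->int filter Adder {
    work push 1 pop 2 { push(pop() + pop()); }   // sum each (value, echo) pair
}

int->void filter IntPrinter {
    work pop 1 { println(pop()); }          // reads and prints the computed values
}

void->void pipeline EchoProgram {
    add IntSource();
    add Echo();
    add Adder();
    add IntPrinter();
}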

7. Conclusion

Stream processing has been shown to outperform mainstream programmable computing solutions while consuming less power for data-parallel applications. Exploiting the data- and instruction-level parallelism inherent in these applications, stream processors sustain many operations in parallel and overlap them with memory accesses in order to improve computation throughput. Realizing the performance potential of stream processing, however, depends on the ability to manage bandwidth demands in the memory hierarchy to sustain the operands needed for highly parallel computation.

We introduced an indexed stream register file architecture that enables the data reuse patterns found in a broad range of data-parallel applications to be captured in the on-chip memories of stream processors, reducing off-chip bandwidth demands by several fold in some cases. This, in effect, enables classes of data-parallel applications that, due to bandwidth bottlenecks, could not previously be efficiently executed on stream processors, to be supported efficiently.


8. References

1. StreamIt Homepage. http://cag.csail.mit.edu/streamit

2. Welcome to the New Era of Stream Computing (presentation). http://www.csm.ornl.gov/workshops/SOS11/presentations/d_rich.pdf