STREAM COMPUTING
A SEMINAR REPORT
Submitted by
AMIT KUMAR
in partial fulfillment for the award of the degree
of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE & ENGINEERING
SCHOOL OF ENGINEERING
COCHIN UNIVERSITY OF SCIENCE AND
TECHNOLOGY,
COCHIN – 682022
AUGUST 2008
DIVISION OF COMPUTER ENGINEERING
SCHOOL OF ENGINEERING
COCHIN UNIVERSITY OF SCIENCE AND
TECHNOLOGY, COCHIN – 682022
Bonafide Certificate
Certified that this is a bonafide record of the seminar report titled
“Stream Computing”
presented by
AMIT KUMAR
of the VII semester, Computer Science and Engineering, in the year 2008, in partial
fulfillment of the requirements for the award of the Degree of Bachelor of Technology in
Computer Science and Engineering of Cochin University of Science and Technology.
Ms. Preetha S, Seminar Guide
Dr. David Peter S, Head of the Department
Date:
Acknowledgement
At the outset, I thank God Almighty for making this endeavor a
success. I express my gratitude to Ms. Preetha S. Her guidance as well as
her patience has been instrumental in enabling me to develop my ideas and
research goals. Her vision and leadership have been fundamental to the
progress of the seminar. Her breadth and depth of knowledge and the
variety of her interests and pursuits will always provide a standard for me to
aspire to.
I am also indebted to Dr. David Peter S, Head of the
Department, for providing me with the facilities and means by which
I was able to finish the seminar.
I express my immense pleasure and thankfulness to all teachers and
staff of the Department of Computer Science and Engineering, CUSAT, for
their cooperation and support.
Last but not least, I am also grateful to my family and friends, who
have been a constant source of support and encouragement from well before
the start of my graduate career.
AMIT KUMAR
Abstract
Stream computing is a programming paradigm that models a computer
program as a stream of data between several processing units,
rather than as an implemented algorithm processing data. The principle
originates from the needs of real-time multimedia applications.
These applications can be divided into multiple data streams (e.g. audio
and video streams) that must be delivered to the data consumer
in a just-in-time manner.
In this paper, we explore the problems and opportunities that this
new paradigm can bring to the field of classical high performance
computing (HPC). Stream computing is well suited to non-classical
types of hardware, such as asymmetric multiprocessors
or graphics processing units (GPUs). We show how classic HPC can
be mapped onto non-classic hardware. This mapping alters the implemented
algorithms, but also includes information about the specific problem
that is solved. Solutions are less generic than in a normal implementation
or library. As an illustration, we present a streaming framework
on the Cell processor that shows the obstacles that must be
overcome.
The main task is to pull in streams of data, process the data and
stream it back out as a single flow, thereby analyzing multiple data
streams from many sources live. Stream computing uses software
algorithms that analyze the data in real time as it streams in, to increase
speed and accuracy in data handling and analysis.
System S, IBM's stream computing system, introduced in June
2007, runs on about 800 microprocessors, and the System S software enables
software applications to split up tasks and then reassemble the data into
an answer. ATI Technologies also announced a stream computing
technology for a class of applications that run on the GPU
instead of a CPU, which enables graphics processors (GPUs) to work
in conjunction with high-performance, low-latency CPUs to solve
complex computational problems.
Table of Contents
List of figures
1 Introduction
2 Stream computing
2.1 Characteristics of stream computing
3 Need for Stream computing
4 Enabling Technologies
4.1 Stream Processor architecture
4.2 Processing stages in Stream processing systems
4.3 StreamIt Language Overview
4.4 StreamIt Language Constructs
4.5 Filters as computational elements
4.6 Applications
5 StreamIt Compiler
5.1 Linear Filter optimization
5.2 Extracting Linear Representation
5.3 Combining Linear Filters
5.4 Linear optimization of Stream graph
5.5 Backend for parallel platforms
6 Development Support
6.1 StreamIt Development tool
6.2 Debugging Parallel StreamIt programs
6.3 StreamIt graphical Editor
6.4 StreamIt Debugging Environment
7 Conclusion
8 References
List of figures
4.1 Stream processor architecture
4.3 StreamIt Language Overview
4.4 Language constructs
4.6 Applications: Radar front end
5.0 StreamIt Compiler
5.2 Extracting Linear Representation
5.3 Combining Linear Filters
5.4 Linear Optimization of Stream Graph
5.5 Backend for Parallel platforms
6.2 Debugging Parallel StreamIt programs
6.3 StreamIt graphical Editor
6.4 StreamIt Debugging environment
Stream Computing
Division of Computer Science and Engineering, SOE, CUSAT
1. Introduction
What exactly is stream computing? To begin, here is a definition:
“Stream computing is a programming paradigm that models a computer program as
a stream of data between several processing units, rather than as an implemented
algorithm processing data.”
As with other definitions of this kind, understanding the term stream
computing requires an understanding of several closely related terms.
While many of these terms lack precise scientific definitions,
general definitions can be given.
Computing can be described as any activity of using and/or developing
computer hardware and software. It includes everything that sits in the
bottom layer, i.e. everything from raw compute power to storage
capabilities. Stream processing is a computer programming paradigm,
related to SIMD, that allows some applications to more easily exploit a
limited form of parallel processing. Such applications can use multiple
computational units, such as the floating-point units on a GPU, without explicitly
managing allocation, synchronization, or communication among those
units.
The stream processing paradigm simplifies parallel software and hardware by
restricting the parallel computation that can be performed. Given a set of data (a
stream), a series of operations (kernel functions) is applied to each element in the
stream. Uniform streaming, where one kernel function is applied to all elements in
the stream, is typical. Kernel functions are usually pipelined, and local on-chip
memory is reused to minimize external memory bandwidth. Since the kernel and
stream abstractions expose data dependencies, compiler tools can fully automate
and optimize on-chip management tasks. Stream processing hardware can use
scoreboarding, for example, to launch DMAs at runtime, when dependencies become
known. The elimination of manual DMA management reduces software
complexity, and the elimination of hardware caches reduces the amount of die area
not dedicated to computational units such as ALUs.
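The kernel-and-stream idea described above can be sketched in a few lines of Python. This is only an illustrative model, not code from any stream processing toolkit: the names `apply_kernel`, `run_kernels`, `scale` and `offset` are hypothetical, and Python generators stand in for the hardware pipelining of kernels.

```python
from typing import Callable, Iterable, Iterator

Kernel = Callable[[float], float]

def apply_kernel(kernel: Kernel, stream: Iterable[float]) -> Iterator[float]:
    """Uniform streaming: lazily apply one kernel to every stream element."""
    for element in stream:
        yield kernel(element)

def run_kernels(stream: Iterable[float], *kernels: Kernel) -> Iterator[float]:
    """Chain kernels so each element flows through all of them in order."""
    for kernel in kernels:
        stream = apply_kernel(kernel, stream)
    return iter(stream)

# Two hypothetical kernels; each touches only its own input element.
def scale(x: float) -> float:
    return 2.0 * x

def offset(x: float) -> float:
    return x + 1.0

samples = [0.0, 1.0, 2.0, 3.0]
result = list(run_kernels(samples, scale, offset))
print(result)
```

Because each kernel reads only the element handed to it, no element depends on another, which is the property that lets real hardware process many elements at once.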
2. Stream Computing
“Stream computing is a programming paradigm that models a computer
program as a stream of data between several processing units, rather than as an
implemented algorithm processing data.” StreamIt is a programming language and
a compilation infrastructure specifically engineered for modern streaming systems.
It is designed to facilitate the programming of large streaming applications, as well
as their efficient and effective mapping to a wide variety of target architectures,
including commercial off-the-shelf uniprocessors, multicore architectures, and
clusters of workstations.
In June 2007, IBM introduced a high-performance computer system, called
System S, that is intended to rapidly analyze data as it streams in from many
sources, increasing the speed and accuracy of decision making in fields as
diverse as security surveillance and Wall Street trading. The company planned
to demonstrate the system at a conference of Wall Street technology
managers. The announcement, analysts said, was a significant step in
the commercialization of the emerging technology of stream computing.
In the same month, Google acquired PeakStream, a start-up in stream
computing, and industry analysts said its software could help Google
improve its video search functions.
Stream computing is an effort to deal with two issues: the need for faster
data handling and analysis in business and science, and the growing flood
of information in digital form, including Web sites, blogs, e-mail, video and
news clips, telephone conversations, transaction data and electronic
sensors. In stream computing, advanced software algorithms analyze the
data as it streams in. Text, voice and image-recognition technology, for
example, can be used to determine that some data is more relevant to a
particular problem than other data. The priority data is then shuttled off into a
program tailored to work on complex, fast-changing problems like tracking
an epidemic and predicting its spread, or culling data from electronic
sensors in a computer chip plant to quickly correct flaws in manufacturing.
IBM deemed its System S research project ready to make its way into the
marketplace. The planned announcement to the Wall Street group was the
beginning of its effort to find industry partners. The initial system runs on
about 800 microprocessors, though it can scale up to tens of thousands as
needed, IBM said. The most notable step, researchers say, lies in the
System S software, which enables software applications to split up tasks like
image recognition and text recognition, and then reassemble the pieces of
the puzzle into an answer.
2.1 Characteristics of stream computing
Enable new applications on new architectures
Parallel problems other than graphics that map well onto the GPU architecture.
Transition from fixed-function to programmable pipelines.
Various proof points in research and industry under the name GPGPU.
Data dependencies and parallelism.
A great advantage of the stream programming model lies in the kernel
defining independent and local data usage. Kernel operations define the basic data
unit, both as input and output. This allows the hardware to better allocate resources
and schedule global I/O. Although usually not exposed in the programming model,
I/O operations seem to be much more advanced on stream processors (at least
on GPUs). I/O operations are also usually pipelined themselves, while the chip
structure can help hide latencies. The definition of the data unit is usually explicit in the
kernel, which is expected to have well-defined inputs (possibly using structures,
which is encouraged) and outputs. In some environments, output values are fixed
(in GPUs, for example, there is a fixed set of output attributes, unless this is
relaxed). Having each computing block clearly independent and defined allows
bulk read or write operations to be scheduled, greatly increasing cache and memory bus
efficiency.
3. Need for stream computing
How does stream computing differ from computation on the CPU?
Stream computing takes advantage of a SIMD methodology (single instruction,
multiple data), whereas a CPU follows a modified SISD methodology (single instruction,
single data), with modifications that take various parallelism techniques into account.
The benefit of stream computing stems from the highly parallel architecture of the
GPU, whereby tens to hundreds of parallel operations are performed with each clock
cycle, whereas the CPU can at best work on only a small handful of parallel
operations per clock cycle.
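The difference between the two methodologies can be illustrated with a toy model that counts how many instruction issues are needed to add two arrays. This is only a sketch under stated assumptions: the function `vector_add` and the lane counts used below are hypothetical and do not describe any particular CPU or GPU.

```python
def vector_add(a, b, lanes):
    """Add two equal-length lists, processing `lanes` elements per issue.

    lanes=1 models SISD (one element per instruction); lanes>1 models SIMD,
    where one instruction operates on several data elements at once.
    Returns the result and the number of instruction issues used.
    """
    out = []
    issues = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
        issues += 1  # one instruction covers the whole chunk
    return out, issues

a = list(range(8))
b = [10] * 8
sisd_result, sisd_issues = vector_add(a, b, lanes=1)
simd_result, simd_issues = vector_add(a, b, lanes=4)
print(sisd_issues, simd_issues)
```

Both calls compute the same sums; only the number of issues changes, which is the source of the throughput advantage described above.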
What are AMD's stream computing product features?
AMD's FireStream™ 9170, its latest-generation stream computing GPU, features:
320 stream cores (compute units or ALUs)
2 GB on-board GDDR3 memory
Double-precision floating-point support
PCIe 2.0 x16 interface
What are AMD's stream computing product advantages?
AMD's FireStream 9170 hardware:
Only company positioned to offer a unique platform with strengths in accelerated
GPU stream computing
Stream computing today leading to Fusion tomorrow
AMD's open systems SDK approach:
CTM initiative: release low-level specifications to enable developers and
end users to understand the architecture and tune to maximize performance
Deliver high-level, multi-targeted compilers through Brook, third parties like
RapidMind, and partnerships with universities and industry.
Is stream computing a return to the old coprocessor days? In many ways
stream computing does resemble the days when vector coprocessors handled
substantial mathematical tasks. The benefit then, as now, is the remarkable
performance boost gained by implementing these specialized components.
4. Enabling technologies
4.1 Stream Processor architecture
Stream processors are programmable processors that are optimized for executing
programs expressed using the stream programming model. The figure at the end of
this section shows a block diagram of a stream processor. The stream processor
operates as a coprocessor under the control
of the host processor, which is often a standard general-purpose CPU. A stream
program executing on the host processor orchestrates the sequence of kernels to
be executed and the necessary transfer of input and output data streams between
the stream processor and off-chip memory. Kernel execution takes place directly on
the stream processor from instructions stored in the microcontroller. New kernels
may be loaded into the microcontroller as needed, possibly under explicit control
of the host processor. The host processor initiates the sequence of operations that
orchestrates the stream program, and there may be dependencies between these
host-issued operations. The host interface of the stream processor issues the
commands received from the
host to the appropriate units as resources become available, subject to dependencies
among the commands. The arithmetic units of the stream processor are grouped into n
identical compute clusters. Each cluster consists of several functional units and
associated registers. The local
register files (LRFs) attached to each functional unit provide the input operands for
that unit, and results are written to one or more of the LRFs via the intra-cluster
network. Loop-carried state and other shared data may be communicated among
the clusters.
Block diagram of stream processor architecture
4.2 Processing stages in Stream Processing systems
Stream processing has been defined as “a model that uses sequences of data and
computation kernels to expose and exploit concurrency and locality for efficiency”.
When an FPGA board is used for stream
processing, a common system model is to distribute data out from the FPGA to
other processors in the multicomputer, in either a round-robin or a
next-available-processor fashion. FPGA toolkits provide drivers and a software library for
managing these complex data movement strategies, as well as interfaces for a wide
range of board-related features, including node configuration, temperature and
current sensors, and control bus access. They also provide elements such as IP block
libraries, simulation environments, BSPs, algorithm libraries and middleware.
Processing stages in Stream Processing systems
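The two distribution strategies named above can be compared with a small simulation. This is a sketch under stated assumptions, not toolkit code: the function names, the per-block processing costs and the two-processor setup are all hypothetical, and every block is assumed to be available at time zero.

```python
import heapq

def round_robin(costs, num_workers):
    """Assign block i to processor i mod num_workers, ignoring load."""
    return [i % num_workers for i in range(len(costs))]

def next_available(costs, num_workers):
    """Assign each block to the processor that becomes free earliest."""
    free_at = [(0.0, w) for w in range(num_workers)]  # (time free, processor id)
    heapq.heapify(free_at)
    assignment = []
    for cost in costs:
        time_free, worker = heapq.heappop(free_at)
        assignment.append(worker)
        heapq.heappush(free_at, (time_free + cost, worker))
    return assignment

def makespan(costs, assignment, num_workers):
    """Total time until the busiest processor finishes its blocks."""
    busy = [0.0] * num_workers
    for cost, worker in zip(costs, assignment):
        busy[worker] += cost
    return max(busy)

costs = [5.0, 1.0, 1.0, 1.0, 5.0, 1.0]  # hypothetical processing time per block
rr = round_robin(costs, 2)
na = next_available(costs, 2)
print(rr, makespan(costs, rr, 2))
print(na, makespan(costs, na, 2))
```

Round-robin is simpler and needs no feedback from the processors, while next-available balances uneven block costs at the price of tracking which processor is free.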
4.3 StreamIt Language Overview
StreamIt is an architecture-independent language for streaming applications. It
adopts the Cyclo-Static Dataflow [1] model of computation, which is a
generalization of Synchronous Dataflow. StreamIt programs are represented as
graphs where nodes represent computation and edges represent FIFO-ordered
communication of data over tapes.
The basic programmable unit in StreamIt is a filter. Each filter contains a work
function that executes atomically, popping (i.e., reading) a fixed number of items
from the filter's input tape and pushing (i.e., writing) a fixed number of items to the
filter's output tape. A filter may also “peek” at a given index on its input tape
without consuming the item; this makes it simple to represent computation over a
“sliding window”. The push, pop, and peek rates are declared as part of the work
function, thereby enabling the compiler to construct a static schedule of filter
firings. StreamIt provides three hierarchical structures for composing filters into
larger stream graphs: the pipeline, the splitjoin and the feedback loop.
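To see why declared rates make a static schedule possible, the following Python sketch computes how many times each filter in a simple pipeline must fire so that every tape ends a steady-state round with as many items as it started with. It is an illustration only and not the StreamIt compiler's algorithm: the function name `steady_state_repetitions` and the example rates are assumptions, and peek rates are ignored because they affect only start-up, not the balance of items.

```python
from fractions import Fraction
from math import gcd

def steady_state_repetitions(rates):
    """rates: list of (pop, push) pairs, one per filter, in pipeline order.

    Returns the smallest positive integer firing counts such that, on every
    tape, items pushed by the producer equal items popped by the consumer.
    """
    reps = [Fraction(1)]
    for (_, push), (pop, _) in zip(rates, rates[1:]):
        # Balance equation for one tape: reps[producer] * push == reps[consumer] * pop
        reps.append(reps[-1] * push / pop)
    scale = 1
    for r in reps:  # least common multiple of all denominators
        scale = scale * r.denominator // gcd(scale, r.denominator)
    return [int(r * scale) for r in reps]

# Hypothetical pipeline: a source pushing 2 items, a filter popping 3 and
# pushing 1, and a sink popping 2.
schedule = steady_state_repetitions([(0, 2), (3, 1), (2, 0)])
print(schedule)
```

With these rates the source fires three times (6 items), the middle filter twice (consuming 6, producing 2) and the sink once (consuming 2), so the round can repeat indefinitely with bounded buffers.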
4.4 StreamIt language Constructs
The programming paradigm is modular, which is important for large-scale development.
Parameterized templates allow a program to change behavior with small source code
modifications, which demonstrates the malleability of the language. Composing simple
structures creates large graphs and enables inductive reasoning about correctness.
The application is architecture independent.
4.5 Filters as computational elements
Filters are the programmable units. Each filter has an initialization function and a
steady-state work function. Filters communicate via FIFOs using pop(), peek(index)
and push(value).
    float->float filter FIR(int N) {
        float[N] weights;
        // Runs once, before the first firing of the work function.
        init {
            weights = calculate_weights(N);
        }
        // Each firing reads a window of N items, consumes 1 and produces 1.
        work push 1 pop 1 peek N {
            float result = 0;
            for (int i = 0; i < N; i++) {
                result += weights[i] * peek(i);
            }
            push(result);
            pop();
        }
    }
The filter can now serve as a module that is incorporated into stream graphs as
necessary, for example as part of an acoustic beamformer. A filter is akin to a
class in object-oriented programming, with the work function serving as the main
method. A filter may also declare a constructor function to initialize the filter state
before any other method is invoked. The implementation of the work function in
StreamIt obviates the need for explicit buffer management. The application
developer instead focuses on the hierarchical assembly of the stream graph and its
communication topology.
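For readers without a StreamIt toolchain, the behavior of the FIR filter's work function can be modeled in Python. This is a sketch, not a translation produced by the StreamIt compiler: the function name `fir_filter` is hypothetical, a deque stands in for the input tape, and the weights are passed in directly instead of being computed by `calculate_weights`.

```python
from collections import deque

def fir_filter(samples, weights):
    """Model of the FIR work function: peek N, pop 1, push 1 per firing."""
    n = len(weights)
    tape = deque(samples)   # input tape (FIFO)
    out = []                # output tape
    while len(tape) >= n:   # the filter can fire only if N items can be peeked
        result = sum(weights[i] * tape[i] for i in range(n))  # peek(i)
        out.append(result)  # push(result)
        tape.popleft()      # pop()
    return out

output = fir_filter([1.0, 2.0, 3.0, 4.0, 5.0], [0.5, 0.5])
print(output)
```

Here the buffer management is written out by hand; in StreamIt the same bookkeeping is derived by the compiler from the declared push, pop and peek rates.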
4.6 Applications
Stream processing is essentially a compromise, driven by a data-centric model that
works very well for traditional DSP or GPU-type applications (such as image,
video and digital signal processing) but less so for general-purpose processing with
more randomized data access (such as databases). Sacrificing some flexibility in
the model allows easier, faster and more efficient execution.
Depending on the context, processor design may be tuned for maximum efficiency
or may trade some efficiency for flexibility.
Stream processing is especially suitable for applications that exhibit three
characteristics. Compute intensity is the number of arithmetic operations
per I/O or global memory reference. In many signal processing applications today it
is well over 50:1 and increasing with algorithmic complexity. Data parallelism
exists in a kernel if the same function is applied to all records of an input stream
and a number of records can be processed simultaneously without waiting for
results from previous records. Data locality is a specific type of temporal locality
common in signal and media processing applications, where data is produced once,
read once or twice later in the application, and never read again. Intermediate
streams passed between kernels, as well as intermediate data within kernel functions,
can capture this locality directly using the stream processing programming model.
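As a worked example of compute intensity, the sketch below estimates the ratio for the FIR filter of Section 4.5. It rests on a simplifying assumption that is not stated in the source: each firing performs one multiply and one add per tap, and only the one popped item and the one pushed item count as global memory references, because the rest of the peek window is assumed to stay in local on-chip storage. The function names are hypothetical.

```python
def compute_intensity(arithmetic_ops, memory_refs):
    """Arithmetic operations per I/O or global memory reference."""
    return arithmetic_ops / memory_refs

def fir_intensity(taps):
    ops = 2 * taps   # assumed: one multiply and one add per tap
    refs = 2         # assumed: one item popped and one item pushed per firing
    return compute_intensity(ops, refs)

for taps in (16, 64):
    print(taps, fir_intensity(taps))
```

Under these assumptions the intensity equals the number of taps, so a 64-tap filter would exceed the 50:1 figure quoted above while a 16-tap filter would not.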
Example application: radar front end
5. StreamIt Compiler
The StreamIt compiler hides the granularity of execution and architecture details. The
compiler backend supports uniprocessors, clusters of workstations and the MIT Raw
processor. The compiler technology focuses on the core set of challenges in delivering
high performance on future architectures. It automates domain-specific optimizations,
such as the optimization of linear filters and translation to the frequency domain, and
it helps with partitioning, routing and related tasks.
5.1 Linear Filter optimizations
StreamIt provides three hierarchical structures for composing filters into larger
stream graphs (see Figure 1). The pipeline construct composes streams in sequence,
with the output of one connected to the input of the next. The splitjoin construct
distributes data to a set of parallel streams, which are then joined together in a
round-robin fashion. The feedback loop provides a mechanism for introducing
cycles in the graph. An example of a pipeline appears in Figure 2. It contains a
single FIR (Finite Impulse Response) filter, which could be implemented as shown
in Section 4.5.
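The pipeline and splitjoin constructs can be modeled in Python as functions that compose other functions. This is an illustrative sketch, not StreamIt semantics in full: the helper names are hypothetical, the splitter is assumed to hand out one item per branch in round-robin order, whole lists stand in for tapes, and the feedback loop is left out for brevity.

```python
def pipeline(*stages):
    """Compose stages in sequence: the output of one feeds the next."""
    def run(items):
        for stage in stages:
            items = stage(items)
        return items
    return run

def splitjoin_roundrobin(*branches):
    """Round-robin split (one item per branch), then round-robin join."""
    def run(items):
        k = len(branches)
        outputs = [branch(items[i::k]) for i, branch in enumerate(branches)]
        joined = []
        for group in zip(*outputs):  # take one item from each branch in turn
            joined.extend(group)
        return joined
    return run

def map_filter(fn):
    """A stateless filter that pops one item and pushes one item."""
    return lambda items: [fn(x) for x in items]

graph = pipeline(
    map_filter(lambda x: x + 1),
    splitjoin_roundrobin(
        map_filter(lambda x: x * 10),
        map_filter(lambda x: -x),
    ),
)
result = graph([0, 1, 2, 3])
print(result)
```

Because `pipeline` and `splitjoin_roundrobin` return objects of the same shape as a filter, they can be nested to any depth, which mirrors the hierarchical composition described above.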
5.2 Extracting Linear Representation
Resembles constant propagation
Maintains a linear form ⟨v, b⟩ for each variable
Peek expression: generate a fresh v
Push expression: copy v into A
Pop expression: increment o
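The steps above produce a compact description of a filter. The Python sketch below shows one possible shape for such a description, assuming the usual reading in which every pushed item is a weighted sum of peeked items plus a constant. It does not perform the extraction itself, and the class `LinearFilter`, its field layout and the indexing convention are assumptions made for illustration and are not taken from the StreamIt compiler.

```python
from dataclasses import dataclass

@dataclass
class LinearFilter:
    A: list   # A[i][j]: coefficient of peek(i) in the j-th pushed item
    b: list   # b[j]: constant added to the j-th pushed item
    pop: int  # items consumed per firing (the "o" in the list above)

    @property
    def peek(self):
        return len(self.A)

    @property
    def push(self):
        return len(self.b)

    def fire(self, window):
        """One firing: each pushed item is a weighted sum plus a constant."""
        return [
            sum(self.A[i][j] * window[i] for i in range(self.peek)) + self.b[j]
            for j in range(self.push)
        ]

    def run(self, items):
        out, pos = [], 0
        while pos + self.peek <= len(items):
            out.extend(self.fire(items[pos:pos + self.peek]))
            pos += self.pop
        return out

def fir_as_linear(weights):
    """An FIR filter peeks N, pops 1, pushes 1, and has no constant term."""
    return LinearFilter(A=[[w] for w in weights], b=[0.0], pop=1)

lin = fir_as_linear([0.5, 0.5])
out = lin.run([1.0, 2.0, 3.0, 4.0])
print(out)
```

Once a filter is in this form, the compiler can reason about it algebraically instead of analyzing its source code again.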
5.3 Combining Linear Filters
Pipelines and splitjoins of linear filters can be collapsed into a single linear
filter. The figure below shows an example of combining a pipeline.
Combination example
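A simple special case shows why collapsing is possible. For two FIR filters in a pipeline, each peeking a window, popping one item and pushing one item with no constant term, the combined filter is again an FIR filter whose weights are the polynomial product of the two weight vectors. The Python sketch below checks this numerically; the function names and sample values are hypothetical, and the general combination rules for arbitrary rates are not covered.

```python
def fir(weights, items):
    """Output t is the weighted sum of items[t], items[t+1], ... (peek order)."""
    n = len(weights)
    return [
        sum(w * items[t + i] for i, w in enumerate(weights))
        for t in range(len(items) - n + 1)
    ]

def combine(w1, w2):
    """Weights of the single FIR filter equivalent to w1 followed by w2."""
    combined = [0.0] * (len(w1) + len(w2) - 1)
    for i, a in enumerate(w1):
        for j, c in enumerate(w2):
            combined[i + j] += a * c
    return combined

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w1, w2 = [1.0, 2.0], [3.0, 4.0]
two_stage = fir(w2, fir(w1, x))      # original pipeline of two filters
one_stage = fir(combine(w1, w2), x)  # collapsed single filter
print(two_stage, one_stage)
```

The collapsed filter removes the intermediate tape between the two stages, which is one reason this optimization can pay off.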
5.4 Linear optimization of Stream graph
5.5 Backend for parallel platforms
StreamIt exposes the communication pattern of a program. This allows automatic
generation and optimization of routing code, which may otherwise require extensive
(assembly) programming. Two results illustrate the benefit:
FIR on the Raw backend: 15 statements of StreamIt code achieve the same performance
as 352 statements of manually tuned C.
Frequency-hopping radio on the cluster backend: 50% higher throughput and 35% less
communication when using StreamIt's messaging construct.
6. Development Support
6.1 StreamIt Development tool
The StreamIt Development Tool (SDT) features many aspects of an
IDE, including a text editor and a debugger. For example, the SDT
debugger supports line and method breakpoints, watchpoints, program
suspension, code stepping, variable inspection and value modification,
to list a few. Moreover, the SDT offers features tailored to the StreamIt
language. The SDT graphically represents StreamIt programs, and
preserves hierarchical information to allow an application engineer to
focus on the parts of the stream program that are of interest. In
addition, the SDT can track the flow of data between filters and, most
importantly, it provides a deterministic mechanism to debug parallel
streams. The SDT is implemented in Java as an Eclipse [3] plug-in. The
Eclipse universal tools platform is an extensible development
environment. The SDT leverages the built-in user interfaces for editing and
viewing files, the resource management system, the documentation
infrastructure, and the runtime support for launching, running and
debugging programs.
6.2 Debugging Parallel StreamIt programs
In StreamIt, parallelism and communication are exposed. Tracking the flow of data in
a stream graph affords a frame of reference for reasoning about “time”. This is a
powerful advantage when debugging parallel programs, compared with debugging
multiple threads that have independent program counters and execute
non-deterministically.
6.3 StreamIt graphical editor
The first public version of the StreamIt Development Tool (SDT) has been made
available. It is implemented in Java as an Eclipse plug-in, and is intended for
developing, debugging, and visualizing programs written in StreamIt. As a
graphical programming environment, it can simply and intuitively convey the
hierarchical and structured nature of a StreamIt application. In addition, the
debugger can interpret and visually represent the stream graph and its dynamic
behavior, including the flow of information in parallel stream graphs. The SDT is
composed of the following modules: an IDE-integrated debugger, a graphical text
editor, a runtime stream graph view, and a corresponding graph overview.
6.4 StreamIt Debugging Environment
As seen in the figure, a StreamIt program can be visually depicted as a
hierarchical directed graph of streams, with graph nodes representing streams and
graph edges representing tapes or channels. The containers are rendered according
to the code declarations, and the visualization tools in the SDT allow the user to
selectively collapse and expand containers. This is useful in large streams where the
application developers are only interested in visualizing a particular subset, for
example to verify the interconnect topology of the graph. The figure shows a
screenshot of the SDT for a simple StreamIt program, which consists of a filter that
generates input data (IntSource), a splitjoin (Echo) that operates on the data
produced by the source and whose data is in turn consumed by an Adder. Lastly, a
filter (IntPrinter) reads and prints the computed values.
7. Conclusion
Stream processing has been shown to outperform mainstream programmable
computing solutions while consuming less power for data parallel applications.
Exploiting the data- and instruction-level parallelism inherent in these applications,
stream processors sustain many operations in parallel, and overlap them with
memory accesses in order to improve computation throughput. Realizing the
performance potential of stream processing, however, depends on the ability to
manage bandwidth demands in the memory hierarchy to sustain the operands
needed for highly parallel computation.
We introduced an indexed stream register file architecture that enabled data
reuse patterns found in a broad range of data parallel applications to be captured in
on-chip memories of stream processors, reducing off-chip bandwidth demands by
several fold in some cases. This, in effect, enables classes of data parallel
applications that, due to bandwidth bottlenecks, could not previously be efficiently
executed on stream processors to be supported efficiently.
8. References
1. “StreamIt Homepage”, http://cag.csail.mit.edu/streamit
2. “Welcome to the new era of stream computing” (presentation),
http://www.csm.ornl.gov/workshops/SOS11/presentations/d_rich.pdf