Comparison of Multimedia SIMD, GPUs and Vector


Comparison of Multimedia SIMD, GPUs and Vector Architectures (Data Parallelism – Hennessy Section 4.4)

By: Harsh Prasad (2008CS50210)

05-Apr-12 CSL718

Introduction

● A common way to increase parallelism among instructions is to exploit data parallelism among independent iterations of a loop
● SIMD architectures can exploit significant data-level parallelism for:
○ matrix-oriented scientific computing
○ media-oriented image and sound processing
● SIMD is more energy-efficient than MIMD
● SIMD parallelism takes three forms:
○ Vector architectures
○ SIMD extensions
○ Graphics Processing Units (GPUs)
● These architectures are designed to execute data-level parallel programs


Vector Architectures

● Read sets of data elements from memory into “vector registers”
● Operate on those registers
● Disperse the results back into memory
● Example: VMIPS
● Improvements:
○ Multiple lanes
○ Gather-scatter memory addressing


Basic Structure of a Vector Register Architecture (Vector MIPS)

[Figure: VMIPS block diagram, with four labelled parts:
(1) vector registers, each holding MVL elements of 64 bits (MVL = maximum vector length);
(2) pipelined vector functional units;
(3) vector load-store units (LSUs) backed by a multi-banked memory for bandwidth and latency hiding;
(4) vector control registers — VLR (vector length register) and VM (vector mask register).]

SIMD Extensions

● Media applications operate on data types narrower than the native word size
● Limitations, compared to vector instructions:
○ Number of data operands is encoded into the opcode
○ No sophisticated addressing modes (strided, scatter-gather)
○ No mask registers


Graphics Processing Units

● Offer higher potential performance than traditional multicore computers
● Heterogeneous execution model:
○ the CPU is the host, the GPU is the device
● NVIDIA developed a C-like programming language for the GPU
● All forms of GPU parallelism are unified as the CUDA (Compute Unified Device Architecture) thread
● The programming model is “Single Instruction, Multiple Thread” (SIMT)


Comparison: Vector Architectures and GPUs


● A GPU has many more lanes, so GPU chimes are shorter
● In a vector architecture, the compiler manages the mask register explicitly in software
● A GPU handles masks implicitly, using branch synchronization markers and an internal stack to save, complement and restore them

A Vector Processor vs. a Multithreaded SIMD Processor of a GPU


● The scalar processor supplies scalar operands for scalar-vector operations and increments addresses for unit- and non-unit-stride accesses to memory
● There is one PC per SIMD thread
● High memory bandwidth is ensured


● GPUs have hardware support for multithreading
● A VMIPS register holds the entire vector
● In a GPU, the vector is spread across the registers of the SIMD lanes


● Vector architectures hide memory latency by paying it once per vector load/store instruction; GPUs hide it using multithreading
● The conditional-branch mechanism of GPUs handles the strip-mining problem of vector architectures by iterating the loop until all the SIMD lanes reach the loop bound

Comparison: Multimedia SIMD Computers and GPUs


● Unlike multimedia SIMD instructions, which execute on the scalar processor itself, a GPU is separated from the scalar processor by an I/O bus and has its own separate main memory
● Also, multimedia SIMD instructions do not support scatter-gather memory accesses
● In short, GPUs are multithreaded SIMD processors with more lanes, more processors, and better hardware support for multithreading


Thank You
