ECEN 5653/4653 RT Digital Media Systems Hardware and...

January 24, 2012 Sam Siewert

ECEN 5653/4653

RT Digital Media Systems

Hardware and Software

Fundamentals

Lecture 2

RT Digital Media Systems

Embedded Systems

– Set-Top Boxes and IPTV

– Mobile Media: Smart Phone, Tablet, eBook Readers, Netbooks, Blu-ray & DVD Players, iPods, etc.

– Consumer/Pro-sumer/DVB/DCI Digital Camera Systems (SD, HD, HD-SDI, 2K, 4K, 6K)

Resolutions/Formats - http://en.wikipedia.org/wiki/File:Vector_Video_Standards2.svg

– Game Consoles: Xbox, PS3, Wii, Nintendo

– Mobile Systems and Cloud-based Media Driving Innovation

Scalable Systems (Head-End, Cloud, CDN)

– Post Production for Digital Cinema, TV, Web

2K, 4K, 6K Streams from Digital Cameras

Frame/Color Editing, CGI (Computer Generated Imagery), Soundtrack, Write to Distribution Media

– Digital Cable Head-Ends: Serve 10K+ Customers, Broadcast, On-Demand, Guide Data, DOCSIS Internet, VoIP

– IPTV Head-Ends: Internet, Switched-Digital Video, On-Demand

– Web/CDN Viral Video and Social Networking Video/Audio Streaming

– Digital Cinema: HD Digital Projectors, 3D Digital Projectors

– Cloud – iTunes, Hulu, Netflix, Sony Store, Xfinity, eBooks, GoogleTV

– Augmented Reality

– Closed Circuit Security Systems: Multi-Camera NTSC/HD

Old School Media

NTSC OTA (1941, 1953 color, 2009 dead)

– Analog, Interlaced, Continuous OTA Broadcast Transmission

– Tuner with Immediate CRT Display

– No Buffers, No Routing, No De-mux

– No Compression

Analog Cable

AM/FM OTA

Film Projectors


New Digital Media

Digital Cable

– QAM 256, 30+ Mbps, 10+ MPEG Programs per 6 MHz Channel

– Minimal Buffering (In Set-top Box for Digital Tuning and On-Demand)

– Dedicated Coaxial RF Carrier (Hybrid Fiber to Coaxial Networks)

– On-Demand, Trick-Play, Start-Over

– DOCSIS for Internet and Return Path (Streaming Control)

ATSC Digital OTA

– Supports HD 1080p or Multiple SD Programs per 6 MHz Channel

– Digital Modulation (8VSB) at 19+ Mbps per Channel

Digital Cinema

– 1080p, 2K, 4K Resolutions

– Automated Digital Delivery and Projection

IPTV, IP Radio and Mobile Media

– Routed, Buffered, Compressed

– Multiplexed Video/Audio Transport Streams

– File Download or Network Streaming

– Streaming over UDP or RTP/UDP with RTSP Most Often, No Re-transmission
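As a rough check on the Digital Cable figures above (QAM 256, 30+ Mbps, 10+ MPEG programs per 6 MHz channel), the sketch below divides a channel payload rate by a per-program rate. The 38.8 Mbps QAM 256 payload and the 19.39 Mbps 8VSB payload are commonly quoted values; the 3.75 Mbps rate for one SD program is an assumed typical value, not a number from the lecture, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: how many constant bit-rate programs fit in one
 * channel. Rates are in kilobits per second to keep the math in integers. */
int programs_per_channel(int channel_kbps, int program_kbps)
{
    assert(channel_kbps >= 0 && program_kbps > 0);
    return channel_kbps / program_kbps;   /* integer division rounds down */
}

/* Prints the estimate for one cable QAM 256 channel and one ATSC channel. */
void print_channel_estimates(void)
{
    const int qam256_kbps = 38800;  /* assumed payload of a 6 MHz QAM 256 channel */
    const int vsb8_kbps   = 19390;  /* assumed payload of a 6 MHz 8VSB channel */
    const int sd_kbps     = 3750;   /* assumed rate of one SD MPEG2 program */

    printf("QAM 256: %d SD programs\n",
           programs_per_channel(qam256_kbps, sd_kbps));
    printf("8VSB:    %d SD programs\n",
           programs_per_channel(vsb8_kbps, sd_kbps));
}
```

With these assumed rates the cable channel carries about ten SD programs, which is consistent with the "10+" on the slide; statistical multiplexing or lower per-program rates would raise the count.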


Differences Analog vs Digital

Encoding for Transmission

– NTSC Frequency Modulation on Channels

– Broadband QPSK, QAM, 8VSB OTA

– Baseband Packet Switched Networks (Optical, Ethernet)

Routed (Diversely?)

Buffered

Compressed

Multiplexed (Shares Transmission Carrier)

Transported by IP (Large Packets)

QoS?

Continuous Transmission with Instant Tuning vs. Digital Network Streaming vs. Download and Playback (e.g. YouTube)


NTSC (Analog TV)


AM Video to CRT

FM Audio

Chroma Added Later

Odd/Even Lines (Interlaced)

29.97 FPS (30 before color)

Vertical Blanking (CRT Retrace Time, Closed Captioning)

525 Lines, 262.5 per Field, 60 Fields per Second (59.94 with Color)
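The NTSC numbers above are related by simple arithmetic, sketched here as a worked example. It assumes the usual explanation of the 29.97 figure, namely that color NTSC slowed the 30 FPS monochrome rate by a factor of 1000/1001; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define NTSC_LINES_PER_FRAME 525

/* Color NTSC frame rate: the 30 FPS monochrome rate scaled by 1000/1001. */
double ntsc_color_frame_rate(void)
{
    return 30.0 * 1000.0 / 1001.0;
}

/* Two interlaced fields (odd lines, then even lines) make one frame. */
double ntsc_field_rate(double frame_rate)
{
    assert(frame_rate > 0.0);
    return 2.0 * frame_rate;
}

/* Horizontal line rate in Hz: lines per frame times frames per second. */
double ntsc_line_rate(double frame_rate)
{
    assert(frame_rate > 0.0);
    return NTSC_LINES_PER_FRAME * frame_rate;
}

/* Prints the derived timing values for color NTSC. */
void print_ntsc_timing(void)
{
    double fps = ntsc_color_frame_rate();
    printf("frame rate      %.3f FPS\n", fps);
    printf("field rate      %.3f fields/sec\n", ntsc_field_rate(fps));
    printf("lines per field %.1f\n", NTSC_LINES_PER_FRAME / 2.0);
    printf("line rate       %.2f Hz\n", ntsc_line_rate(fps));
}
```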


MPEG2 Fundamentals

Basic Head-End Broadband MPEG2 System

[Figure: head-end block diagram. Labels: Server; PCI; QAM-RF; DVB-ASI; DVB-ASI Analyzer; QAM-SA; STBs; IP Network; PRO-1000 Quad; SPTS Playback; MPTS Playback; QAM Driver; Control Interface; Video Services; Bit-streams Pre-mux Tools; Broadcast VoD Services; Config & Playlist.]

Linux in Digital Media

Common in Digital Cable Set-Top Boxes

Common in Android Mobile Media

Used in Digital Video VoD Head-Ends

Used in Post Production

Common for IPTV

Used in ECEN 5653


Digital Transport QoS

Latency

– To Tune in a Program, Turn-on

– To Deliver a Video Frame or Audio PCM Sample

– To Start, FF, REW, Start-Over, Pause

Bandwidth

– Resolution, Lossy/Lossless Compression, High Motion

– Pixel Encoding for Color

– Frame Rate

– Constant Bit-rate Transport?

– Variable Bit-rate Transport and Encoding?
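To make the bandwidth factors above concrete (resolution, pixel encoding, frame rate), here is a minimal sketch of the uncompressed bit-rate calculation. The function name is hypothetical, and the frame rate is taken as a whole number of frames per second for simplicity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Uncompressed video bit-rate in bits per second:
 * width x height x bits per pixel x frames per second. */
uint64_t uncompressed_bps(uint32_t width, uint32_t height,
                          uint32_t bits_per_pixel, uint32_t fps)
{
    assert(width > 0 && height > 0 && bits_per_pixel > 0 && fps > 0);
    return (uint64_t)width * height * bits_per_pixel * fps;
}

/* Prints two illustrative cases: 1080p at 24 bits per pixel and
 * 720x480 SD at 16 bits per pixel, both at 30 frames per second. */
void print_bandwidth_examples(void)
{
    printf("1920x1080, 24 bpp, 30 FPS: %llu bps\n",
           (unsigned long long)uncompressed_bps(1920, 1080, 24, 30));
    printf("720x480,   16 bpp, 30 FPS: %llu bps\n",
           (unsigned long long)uncompressed_bps(720, 480, 16, 30));
}
```

An uncompressed 1080p stream at these settings needs roughly 1.5 Gbps, which is why compression is required before such a stream can fit in a broadcast or cable channel of a few tens of Mbps.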

Jitter

– Decode and Presentation Rates

– Elasticity in Decode to Presentation Buffering Necessary
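One common way to provide the decode-to-presentation elasticity mentioned above is a bounded FIFO between the decoder and the presenter. The following is a minimal single-threaded sketch under that assumption; the type and function names and the depth of 8 frames are illustrative, and a real player would add locking and size the buffer from measured jitter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ELASTIC_CAPACITY 8   /* assumed depth, chosen only for illustration */

typedef struct {
    int    frames[ELASTIC_CAPACITY];  /* stand-in for decoded frame handles */
    size_t head;                      /* index of the next frame to present */
    size_t count;                     /* number of frames currently buffered */
} elastic_buffer_t;

void elastic_init(elastic_buffer_t *b)
{
    assert(b != NULL);
    b->head = 0;
    b->count = 0;
}

/* Decoder side: returns false on overflow (decode is too far ahead). */
bool elastic_put(elastic_buffer_t *b, int frame)
{
    assert(b != NULL);
    if (b->count == ELASTIC_CAPACITY)
        return false;
    b->frames[(b->head + b->count) % ELASTIC_CAPACITY] = frame;
    b->count++;
    return true;
}

/* Presentation side: returns false on underflow (a visible drop-out). */
bool elastic_get(elastic_buffer_t *b, int *frame)
{
    assert(b != NULL && frame != NULL);
    if (b->count == 0)
        return false;
    *frame = b->frames[b->head];
    b->head = (b->head + 1) % ELASTIC_CAPACITY;
    b->count--;
    return true;
}
```

The buffer absorbs jitter as long as the average decode rate matches the presentation rate; sustained mismatch shows up as overflow or underflow.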


January 21, 2008 Sam Siewert

Linux System Options

(Linux for Soft Real-time and Embedded Systems)


Outline

Many-Core Linux Host(s)

– Intel Nehalem, Westmere, …, Atom CE

– AMD Shanghai Quad/Quad-core

– Cavium MIPS64, Tilera, ARM Cortex

Alternative Cell Broadband Engine Architecture with Linux

– The Cell Broadband Engine chip: High-speed offload for the masses

GP-GPU Vector Processing PCI-E (NVIDIA Tesla/Fermi, AMD, Intel Knights Ferry)

Liu and Layland Paper Discussion

– Digital Video and Audio Encoding

– Digital Media Capture, Post Production, Delivery, Playback

CPU Scheduling Overview

– Scheduling Methods and Classes

– Policy, Feasibility

– Tuning Execution

NPTL – Native POSIX Threads Library

NPTL Example Code Walkthrough

http://www.ibm.com/developerworks/power/library/pa-soc12/index.html




Conceptual View of RT Resources

Three-Space View of Utilization Requirements

– CPU Margin?

– IO Latency (and Bandwidth) Margin?

– Memory Capacity (and Latency) Margin?

Upper Right Front Corner – Low-Margin

Origin – High-Margin

Mobile – Must Consider Battery Life Too (Power)

[Figure: three-axis utilization space with axes CPU-Utility, IO-Utility, and Memory-Utility]

Processing – Initial Focus

Processing and Scaling Frame Transformation, Encode, Decode is Critical

Memory for Buffering (Frame Transformations, CPU Integrated or GPU Offloaded – e.g. Linux VDPAU)

I/O for Networking (Transport)

I/O for Storage (On-Demand, Post, Non-Linear Editing)

Flynn’s Computer Architecture Taxonomy

Single Instruction, Single Data: SISD (Traditional Uni-processor)

Multiple Instruction, Single Data: MISD (Voting schemes and active-active controllers)

Single Instruction, Multiple Data: SIMD (e.g. SSE 4.2, GP-GPU, Vector Processing)

Multiple Instruction, Multiple Data: MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)

GPC has gone MIMD with SIMD Instruction Sets and SIMD Offload (GP-GPU)

NUMA vs. UMA (Trend away from UMA to NUMA or MCH vs. IOH)

SMP with One OS (Shared Memory, CPU-balanced Interrupt Handling, Process Load Balancing, Multi-User, Multi-Application, CPU Affinity Possible)

MIMD - Single Program Multi-Data vs. Multi-Program Multi-Data
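The "CPU Affinity Possible" point for SMP can be illustrated on Linux with the affinity calls in glibc. This is a minimal sketch that assumes Linux with glibc (`sched_getaffinity`, `sched_setaffinity`, and the `CPU_*` macros, which need `_GNU_SOURCE`); the function name is hypothetical. It pins the calling thread to the first CPU it is already allowed to use, so it also works inside a restricted cpuset.

```c
#define _GNU_SOURCE          /* exposes the affinity calls and CPU_* macros in glibc */
#include <assert.h>
#include <sched.h>

/* Pins the calling thread to the first CPU it is currently allowed to
 * run on. Returns that CPU index, or -1 on failure. Linux-specific. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, target;

    /* A pid of 0 means "the calling thread" for both affinity calls. */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&target);
        CPU_SET(cpu, &target);
        assert(CPU_COUNT(&target) == 1);   /* exactly one CPU requested */
        if (sched_setaffinity(0, sizeof(target), &target) != 0)
            return -1;
        return cpu;
    }
    return -1;   /* no allowed CPU found */
}
```

Pinning trades the load balancing that an SMP OS normally provides for more predictable cache and memory locality.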


CPU Scheduling Taxonomy

[Figure: taxonomy tree rooted at Execution Scheduling. Node labels: Global-MP; Local-Uniprocessor; Distributed; Asymmetric (AMP); Symmetric (SMP OS); SMT (Micro-Parallel); Preemptive; Non-Preemptive; Fixed-Priority; Hybrid; Dynamic-Priority; Cooperative; Batch; FCFS; SJN; Co-Routine; Continuation Function; Heuristic; EDF/LLF; RR Timeslice (desktop); Multi-Frequency Executives; Static; Dynamic; Rate Monotonic; Deadline Monotonic; Dataflow. Note in figure: Preemptive, Non-Preemptive Subtree Under Each Global-MP Leaf.]

A Service Release and Response

[Figure: timeline of one service release along a Time axis. Events, left to right: Event Sensed, Interrupt, Dispatch, Preemption, Dispatch, Completion (IO Queued), Actuation (IO Completion). Interval labels: Input-Latency, Dispatch-Latency, Execution, Interference, Execution, Output-Latency. Also marked: Ci WCET, Input/Output Latency, Interference Time.]

Response Time = Time_{Actuation} – Time_{Sensed}

(From Release to Response)
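Read as a sum of the intervals labeled on the timeline, the response time can be sketched as below. This assumes the intervals are contiguous and non-overlapping between the sensed event and actuation; the struct, its field names, and the microsecond unit are illustrative choices rather than anything defined in the lecture.

```c
#include <assert.h>
#include <stdio.h>

/* One service release, all intervals in microseconds. Field names follow
 * the timeline labels; the layout itself is an assumption of this sketch. */
typedef struct {
    long input_latency;     /* event sensed until the interrupt */
    long dispatch_latency;  /* interrupt until the service is dispatched */
    long execution;         /* total CPU time used, bounded by WCET Ci */
    long interference;      /* time spent preempted by other services */
    long output_latency;    /* completion (IO queued) until actuation */
} service_timing_t;

/* Response time from the sensed event to actuation. */
long response_time_us(const service_timing_t *t)
{
    assert(t != NULL);
    return t->input_latency + t->dispatch_latency + t->execution
         + t->interference + t->output_latency;
}

/* Returns 1 if the response arrives at or before the deadline, else 0. */
int meets_deadline(const service_timing_t *t, long deadline_us)
{
    return response_time_us(t) <= deadline_us;
}

/* Prints the response time of one release. */
void print_response(const service_timing_t *t)
{
    printf("response time: %ld us\n", response_time_us(t));
}
```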


Many-Core MIMD Thread Scaling

Symmetric MP and NUMA Many-Core Thread Scaling

SMP – All Memory Access is Uniform Latency, Full Load and Resource Balancing

NUMA – Non-Uniform Memory Access, Requires Affinity to Avoid High Latency Access

[Figure: Increasing Work and Threads on Xeon. X axis: Number of Threads. Y axis: Time (sec), 0 to 16. Legend: ST, MT. Data labels as plotted:]

Number of Threads: 2, 4, 8, 16, 32, 64, 128, 200

First series (sec): 0.145615, 0.287925, 0.681104, 1.148319, 2.296593, 4.593669, 9.481171, 14.355285

Second series (sec): 0.099654, 0.155106, 0.39617, 0.622645, 1.197544, 2.345302, 4.670147, 7.244723

Cell Broadband Engine Scaling – PS3

One SMT PPE, 8 SPEs (Synergistic Proc Elements)

PPE is Simultaneous Multi-threaded PowerPC

SPEs are Vector Procs – Send Code and Data Over Racetrack Network


[Figure: Speedup with Increasing Threads. X axis: Number of Threads (2, 4, 8, 16, 32, 64, 128, 256). Y axis: Speedup, 1.68 to 1.82.]




SIMD Vector Instructions

Intel MMX, SSE 1, 2, 3, 4.x Code Generation

Using SIMD Extensions to Accelerate Algorithms (Edge Enhancement) – http://software.intel.com/en-us/articles/using-intel-streaming-simd-extensions-and-intel-integrated-performance-primitives-to-accelerate-algorithms/

[Figure: PSF]





























Offload, Co-Proc, Vector Proc

1. GPU (Graphics Processing Units)

– Evolved for Consumer CGI and Games Physics Engines

3D Rendering + Texture (4D Vector Operations)

Game Engines and Simulation

HD Output: HDMI, HD-SDI, Headless GP-GPU

– Higher End Used for Digital Cinema / Post Production, Broadcast

PNY Quadro FX

NVIDIA CUDA for Post

– GP-GPU Being Used to Accelerate Encode, Transcode, Trans-rate, etc. - http://www.elementaltechnologies.com/

2. Built-In SIMD Instruction Set Extensions

– Intel SSE

http://www3.pny.com/Communities/HDBroadcastFilm.aspx

http://www.nvidia.com/object/io_1252562432787.html

GP-GPU, What Is It?

Ideal for Large Bitwise, Integer, and Floating Point Vector Math

Flynn’s Taxonomy

SIMD Architecture often leverages GP-GPU Co-Processors or Cell for MPMD

Single Instruction/Prog, Single Data: SISD (Traditional Uni-processor)

Multiple Instruction, Single Data: MISD (Voting schemes and active-active controllers)

Single Instruction/Prog, Multiple Data: SIMD (SSE 4.2, Vector Processing); SPMD (Single Program Multiple Data), GP-GPU

Multiple Instruction, Multiple Data: MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)

SSE – Streaming SIMD Extensions

128-bit registers known as XMM0 through XMM7

Large Operands and Operators (Multi-Word)

E.g. 128-bit XOR of Two Operands

Multiple Multiply and Accumulate Operations for Floating Point (DSP Kernel Operations)

– E.g. 4 Component Vector Addition

– 4 Single Precision Pixel Multiply and Accumulate in Single Instruction


vec_res.x = v1.x + v2.x;

vec_res.y = v1.y + v2.y;

vec_res.z = v1.z + v2.z;

vec_res.w = v1.w + v2.w;

16 operations to load 2 operands, add, store

movaps xmm0,address-of-v1 ;xmm0=v1.w | v1.z | v1.y | v1.x

addps xmm0,address-of-v2 ;xmm0=v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x

movaps address-of-vec_res,xmm0

3 SSE operations to load, add, store
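The same 4-component add can be written in C with the SSE intrinsics from `<xmmintrin.h>`, as sketched below. The `vec4_t` type and `vec4_add` name are hypothetical. Unlike the `movaps` listing, this sketch uses unaligned loads and stores (`_mm_loadu_ps`, `_mm_storeu_ps`) because the struct is not guaranteed to be 16-byte aligned, and it falls back to scalar adds when the compiler does not target SSE.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

typedef struct { float x, y, z, w; } vec4_t;

/* Adds two 4-component vectors. With SSE available this is one packed
 * add on four floats; otherwise it is four scalar adds. */
vec4_t vec4_add(const vec4_t *v1, const vec4_t *v2)
{
    vec4_t res;
    assert(v1 != NULL && v2 != NULL);
#if defined(__SSE__)
    __m128 a = _mm_loadu_ps((const float *)v1);   /* loads x, y, z, w */
    __m128 b = _mm_loadu_ps((const float *)v2);
    _mm_storeu_ps((float *)&res, _mm_add_ps(a, b));
#else
    res.x = v1->x + v2->x;
    res.y = v1->y + v2->y;
    res.z = v1->z + v2->z;
    res.w = v1->w + v2->w;
#endif
    return res;
}
```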

Scheduling Parallel/Cluster HW

MIMD

– OS SMP threading, provides load balancing, affinity operations, routable interrupts (e.g. MSI-X), e.g. NPTL

– RTOS AMP is most often used in Embedded Systems

MPMD – OpenCL, CUDA, DirectCompute (DirectX extension)

– Cell BBE Developer’s Kit

– Intel OpenMP, Linux Cluster, MPI

Note on OS/CPU Virtualization and Digital Media

– Hypervisors

Type 1 - Runs directly on the host's hardware to control the hardware and monitor guest operating systems; each guest operating system runs one level above the hypervisor (e.g. VMware ESXi)

Type 2 - Runs within a conventional operating system environment; with the hypervisor as a distinct second software level, guest operating systems run at the third level above the hardware (e.g. VMware for Windows)

– Enables Guest OS to Share Resources on System

– Typically DM Scales without Virtualization due to Client/Server Workload, but can Exploit for IT reasons


Elements of a Scheduling Class

Scheduling Policy

– How is Dispatch Decision Made?

– Non-Preemptive, Cooperative or Batch (Hard Coded)

– Preemptive

Fixed Priority Encoding

– Rate Monotonic (Shortest Period Gets Highest Priority)

– Deadline Monotonic (Shortest Deadline Gets Highest Priority)

Dynamic Priority - Programmed Priorities

– EDF or Deadline Driven - Earliest Deadline Gets Highest Priority, Updated Continuously

– LLF (Least Laxity First) – Least Slack (Time to Deadline Minus Remaining Execution) Gets Highest Priority, Updated Continuously

Heuristic (Fuzzy Logic Scheduler, Heuristically Guided Iterative Repair)

Scheduling Feasibility Determination

– Will Schedule Work?

– Can a Set of Services Be Scheduled Given:

CPU Resources Available

I/O Resources Available

Memory Resources Available

– RM LUB (Next Week)

– Lehoczky, Sha, Ding Theorem (Next Week)

– EDF Feasibility (Several Weeks Away)

Ability to Tune Schedule

– If Actuals Differ From Expected

WCET Expected vs. Observed

Maximum Release Frequency for a Service – Expected vs. Observed
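As a preview of the RM LUB feasibility test named above, the sketch below checks total utilization against the Liu and Layland least upper bound, U = sum(Ci/Ti) <= n(2^(1/n) - 1). The `service_t` type and all function names are hypothetical, deadlines are assumed equal to periods, and the n-th root is computed with a small Newton iteration only so the example does not depend on the math library.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double wcet;    /* Ci: worst-case execution time */
    double period;  /* Ti: release period, same time unit as wcet */
} service_t;

/* n-th root of 2 by Newton's method on f(x) = x^n - 2. */
static double nth_root_of_2(size_t n)
{
    double x = 1.0 + 0.7 / (double)n;        /* starting guess near the root */
    for (int iter = 0; iter < 50; iter++) {
        double p = 1.0;                        /* p = x^(n-1) */
        for (size_t k = 1; k < n; k++)
            p *= x;
        x -= (p * x - 2.0) / ((double)n * p);  /* Newton step */
    }
    return x;
}

/* Least upper bound on utilization for n services: n * (2^(1/n) - 1). */
double rm_lub(size_t n)
{
    assert(n > 0);
    return (double)n * (nth_root_of_2(n) - 1.0);
}

/* Total CPU utilization: sum of Ci / Ti over all services. */
double utilization(const service_t *s, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += s[i].wcet / s[i].period;
    return u;
}

/* Sufficient (not necessary) test: returns 1 if U <= RM LUB, else 0. */
int rm_lub_feasible(const service_t *s, size_t n)
{
    return utilization(s, n) <= rm_lub(n);
}
```

A service set that fails this test is not necessarily infeasible, since the bound is only sufficient; a set that passes is guaranteed schedulable under rate monotonic priorities with respect to CPU, under the stated assumptions.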


Real-Time Service Types

Types of Services

– Hard Real-Time (Flight Software, Anti-Lock Braking)

– Soft Real-Time (Multi-media, Audio, Video, Virtual Reality)

– Best Effort (E.g. Desktop Applications)

– Isochronal Hard Real-Time (Digital Feedback Control Systems)

– Isochronal Soft Real-Time (Continuous Media, Video, Audio)

Real-Time Service Types in Terms of Utility

– Utility Curve Shows Value/Harm of Response Over Time From Release

Both Before and After Deadline Relative to Release

– Full Utility - Service Performs as Required

– Zero Utility - Service is Not Provided, Drop-out Causes No Harm

– Negative Utility - Harm to System and/or User and Significant Loss of Assets
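The utility view of service types can be sketched as a function of response time relative to the deadline. Everything quantitative here is an assumption made for illustration: negative utility is represented as -100%, the soft real-time fall-off is linear and reaches 0% one full deadline interval away from the deadline, and the isochronal hard case gives full utility only exactly at the deadline. The enum and function names are hypothetical.

```c
#include <assert.h>

typedef enum { HARD_RT, SOFT_RT, ISO_HARD_RT, ISO_SOFT_RT } service_type_t;

/* Assumed fall-off: utility drops linearly from 100% to 0% as the
 * response moves one full deadline interval away from the deadline. */
static double linear_falloff(double distance, double deadline)
{
    double u = 100.0 * (1.0 - distance / deadline);
    return u > 0.0 ? u : 0.0;
}

/* Utility in percent of a response delivered t time units after
 * release, for a deadline given relative to release. */
double utility_percent(service_type_t type, double t, double deadline)
{
    assert(deadline > 0.0 && t >= 0.0);
    double late  = t > deadline ? t - deadline : 0.0;
    double early = t < deadline ? deadline - t : 0.0;

    switch (type) {
    case HARD_RT:      /* full until the deadline, harmful after it */
        return late > 0.0 ? -100.0 : 100.0;
    case SOFT_RT:      /* full until the deadline, diminishing after it */
        return late > 0.0 ? linear_falloff(late, deadline) : 100.0;
    case ISO_HARD_RT:  /* harmful both before and after the deadline */
        return (late > 0.0 || early > 0.0) ? -100.0 : 100.0;
    case ISO_SOFT_RT:  /* diminished both before and after the deadline */
        return linear_falloff(late + early, deadline);
    }
    return 0.0;
}
```

In practice an isochronal service would accept responses within some tolerance window around the deadline, which is where buffering comes in for digital media.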

Hard Real-Time Service Utility

[Figure: Utility (0% to 100%) vs. Time, with Release and Deadline marked]

After Deadline, Utility is Negative

Soft Real-Time Service Utility

[Figure: Utility (0% to 100%) vs. Time, with Release and Deadline marked and curve F(t) after the Deadline]

After Deadline, Utility Diminishes According to Some Function F(t)

Best Effort Service Utility

[Figure: Utility (0% to 100%) vs. Time, with Release marked]

Deadline Does Not Exist

Isochronal Hard Real-Time Utility

[Figure: Utility (0% to 100%) vs. Time, with Release and Deadline marked]

Before Deadline, Utility is Negative

After Deadline, Utility is Negative

Isochronal Soft Real-Time Utility (QoS Digital Media – Requires Buffering)

[Figure: Utility (0% to 100%) vs. Time, with Release and Deadline marked and curves F(t) on both sides of the Deadline]

Before Deadline, Utility is < 100%

After Deadline, Utility is < 100%

How Does NPTL Work?

No Thread Manager or M-on-N Mapping

– Previous POSIX Threading Model

– Manager Becomes Bottleneck

– Two-Level Scheduling Not Deterministic

– Many Pthreads (M) to N Kernel Threads Still an Issue

– O(n) Scheduling for each Manager

Direct Mapping of User to Kernel Thread or 1-to-1

– User Space Pthread Maps Directly onto Kernel Thread (Requires Root privilege)

– Deterministic (Non-Determinism due to Kernel Preemptability Issues)

– O(1) Scheduling

Scheduling Policies Selectable

Similar to RTOS Tasking


Linux NPTL Scheduling Policies

Fixed Priority Preemptive

– SCHED_FIFO – This is Priority Preemptive

– SCHED_RR – This is Fair, but at Kernel Level

– SCHED_OTHER – This is OS default and should not be used for real-time services

POSIX Threads have

– Policy (FIFO, RR, OTHER)

– Priority (RT min to RT max)

– Creation (Fork)

– Join (Wait for thread completion at rendezvous)

– Synchronization Methods

Semaphores

Message Queues

– Asynchronous Communication Methods

Signals

Queued Signals

POSIX RT Extensions Include

– Virtual Timer Services

– Signals Tied to Timer Services

– Priority Inversion Protection (Availability on Linux TBD)
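The "RT min to RT max" priority range for each policy can be queried at run time with the standard `sched_get_priority_min()` and `sched_get_priority_max()` calls, which need no special privilege. This is a minimal sketch; the helper name is hypothetical and the actual ranges depend on the system.

```c
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Prints the priority range reported for one scheduling policy and
 * returns the number of distinct priority levels, or -1 on error. */
int print_priority_range(const char *name, int policy)
{
    assert(name != NULL);
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;

    printf("%-11s priority %d to %d\n", name, lo, hi);
    return hi - lo + 1;
}

/* Reports the three policies listed above. */
void print_all_ranges(void)
{
    print_priority_range("SCHED_FIFO", SCHED_FIFO);
    print_priority_range("SCHED_RR", SCHED_RR);
    print_priority_range("SCHED_OTHER", SCHED_OTHER);
}
```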

July 7, 2004 Sam Siewert

NPTL Coding

Code Walk-through

Thread Scheduling Policy


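/* Build thread attributes that request SCHED_FIFO explicitly instead of
   inheriting the creating thread's policy. */
pthread_attr_init(&rt_sched_attr);
pthread_attr_setinheritsched(&rt_sched_attr, PTHREAD_EXPLICIT_SCHED);
pthread_attr_setschedpolicy(&rt_sched_attr, SCHED_FIFO);

/* Query the priority range available to SCHED_FIFO. */
rt_max_prio = sched_get_priority_max(SCHED_FIFO);
rt_min_prio = sched_get_priority_min(SCHED_FIFO);

/* Move the calling process to SCHED_FIFO, one level below the maximum
   (requires root privilege). */
rt_param.sched_priority = rt_max_prio-1;
rc=sched_setscheduler(getpid(), SCHED_FIFO, &rt_param);
if (rc) perror("sched_setscheduler");

/* Report the contention scope recorded in the attribute object. */
pthread_attr_getscope(&rt_sched_attr, &scope);

if(scope == PTHREAD_SCOPE_SYSTEM)
    printf("PTHREAD SCOPE SYSTEM\n");
else if (scope == PTHREAD_SCOPE_PROCESS)
    printf("PTHREAD SCOPE PROCESS\n");
else
    printf("PTHREAD SCOPE UNKNOWN\n");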

Thread Creation and Join


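rc = pthread_create(&main_thread, &main_sched_attr, testThread, (void *)0);

/* pthread_create() returns an error number and does not set errno, so
   report it with strerror() (needs <string.h>) rather than perror(). */
if (rc)
{
    printf("ERROR; pthread_create() rc is %d (%s)\n", rc, strerror(rc));
    exit(-1);
}

/* Wait for the thread to finish before releasing the attributes. */
pthread_join(main_thread, NULL);

if(pthread_attr_destroy(&rt_sched_attr) != 0)
    perror("attr destroy");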
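The fragments above can be assembled into a small self-contained example, sketched below. The `work_t` type, `sum_thread`, and `run_fifo_thread` names are hypothetical and are not taken from the course code. Because creating a SCHED_FIFO thread normally requires root privilege, the sketch falls back to default attributes if the first `pthread_create()` fails; that fallback is a choice made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical unit of work handed to the thread. */
typedef struct { int n; long sum; } work_t;

/* Thread entry point: sums 1..n into the caller's work_t. */
static void *sum_thread(void *arg)
{
    work_t *w = arg;
    w->sum = 0;
    for (int i = 1; i <= w->n; i++)
        w->sum += i;
    return NULL;
}

/* Creates one thread at SCHED_FIFO (max priority - 1), joins it, and
 * returns the computed sum, or -1 if no thread could be created. */
long run_fifo_thread(int n)
{
    pthread_attr_t attr;
    struct sched_param param;
    pthread_t tid;
    work_t work = { n, 0 };

    assert(n >= 0);
    memset(&param, 0, sizeof(param));
    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    param.sched_priority = sched_get_priority_max(SCHED_FIFO) - 1;
    pthread_attr_setschedparam(&attr, &param);

    int rc = pthread_create(&tid, &attr, sum_thread, &work);
    if (rc != 0) {
        /* Typically EPERM without root: retry with the default policy. */
        fprintf(stderr, "SCHED_FIFO create failed: %s\n", strerror(rc));
        rc = pthread_create(&tid, NULL, sum_thread, &work);
    }
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;

    pthread_join(tid, NULL);   /* rendezvous: wait for completion */
    return work.sum;
}
```

On most Linux systems this needs the `-pthread` compiler flag. Run as an ordinary user, the message on stderr shows that the real-time policy was refused and the thread ran under the default policy instead.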
