January 24, 2012 Sam Siewert
ECEN 5653/4653
RT Digital Media Systems
Hardware and Software
Fundamentals
Lecture 2
RT Digital Media Systems
Embedded Systems
– Set-Top Boxes and IPTV
– Mobile Media: Smart Phones, Tablets, eBook Readers, Netbooks, Blu-ray & DVD Players, iPods, etc.
– Consumer/Pro-sumer/DVB/DCI Digital Camera Systems (SD, HD, HD-SDI, 2K, 4K, 6K)
Resolutions/Formats - http://en.wikipedia.org/wiki/File:Vector_Video_Standards2.svg
– Game Consoles: Xbox, PS3, Wii, Nintendo
– Mobile Systems and Cloud-based Media Driving Innovation
Scalable Systems (Head-End, Cloud, CDN)
– Post Production for Digital Cinema, TV, Web
2K, 4K, 6K Streams from Digital Cameras
Frame/Color Editing, CGI (Computer Generated Imagery), Soundtrack, Write to Distribution Media
– Digital Cable Head-Ends: Serve 10K+ Customers, Broadcast, On-Demand, Guide Data, DOCSIS Internet, VoIP
– IPTV Head-Ends: Internet, Switched-Digital Video, On-Demand
– Web/CDN Viral Video and Social Networking Video/Audio Streaming
– Digital Cinema: HD Digital Projectors, 3D Digital Projectors
– Cloud – iTunes, Hulu, Netflix, Sony Store, Xfinity, eBooks, GoogleTV
– Augmented Reality
– Closed Circuit Security Systems: Multi-Camera NTSC/HD
Old School Media
NTSC OTA (1941, 1953 color, 2009 dead)
– Analog, Interlaced, Continuous OTA Broadcast Transmission
– Tuner with Immediate CRT Display
– No Buffers, No Routing, No De-mux
– No Compression
Analog Cable
AM/FM OTA
Film Projectors
New Digital Media
Digital Cable
– QAM 256, 30+ Mbps, 10+ MPEG Programs per 6 MHz Channel
– Minimal Buffering (In Set-top Box for Digital Tuning and On-Demand)
– Dedicated Coaxial RF Carrier (Hybrid Fiber to Coaxial Networks)
– On-Demand, Trick-Play, Start-Over
– DOCSIS for Internet and Return Path (Streaming Control)
ATSC Digital OTA
– Supports HD 1080p or Multiple SD Programs per 6 MHz Channel
– Digital Modulation (8VSB) at 19+ Mbps per Channel
Digital Cinema
– 1080p, 2K, 4K Resolutions
– Automated Digital Delivery and Projection
IPTV, IP Radio and Mobile Media
– Routed, Buffered, Compressed
– Multiplexed Video/Audio Transport Streams
– File Download or Network Streaming
– Streaming over UDP or RTP/UDP with RTSP Most Often, No Re-transmission
Differences Analog vs Digital
Encoding for Transmission
– NTSC Frequency Modulation on Channels
– Broadband QPSK, QAM, 8VSB OTA
– Baseband Packet Switched Networks (Optical, Ethernet)
Routed (Diversely?)
Buffered
Compressed
Multiplexed (Shares Transmission Carrier)
Transported by IP (Large Packets)
QoS?
Continuous Transmission with Instant Tuning vs. Digital Network Streaming vs. Download and Playback (e.g. YouTube)
NTSC (Analog TV)
AM Video to CRT
FM Audio
Chroma Added Later
Odd/Even Lines (Interlaced)
29.97 FPS (30 before color)
Vertical Blanking (CRT Retrace Time, Closed Captioning)
525 Lines, 262.5 per Field, 60 Fields per Second
MPEG2 Fundamentals
Basic Head-End Broadband MPEG2 System
[Figure: block diagram of the head-end system. Labels: PCI, QAM-RF, DVB-ASI, Server, DVB-ASI Analyzer, STBs, QAM-SA, IP Network, SPTS Playback, MPTS Playback, QAM Driver, Control Interface, Video Services, Bit-streams, Pre-mux Tools, PRO-1000 Quad, Broadcast, VoD, Services, Config & Playlist]
Linux in Digital Media
Common in Digital Cable Set-Top Boxes
Common in Android Mobile Media
Used in Digital Video VoD Head-Ends
Used in Post Production
Common for IPTV
Used in ECEN 5653
Digital Transport QoS
Latency
– To Tune in a Program, Turn-on
– To Deliver a Video Frame or Audio PCM Sample
– To Start, FF, REW, Start-Over, Pause
Bandwidth
– Resolution, Lossy/Lossless Compression, High Motion
– Pixel Encoding for Color
– Frame Rate
– Constant Bit-rate Transport?
– Variable Bit-rate Transport and Encoding?
Jitter
– Decode and Presentation Rates
– Elasticity in Decode to Presentation Buffering Necessary
January 21, 2008 Sam Siewert
Linux System Options
(Linux for Soft Real-time and Embedded Systems)
Outline
Many-Core Linux Host(s)
– Intel Nehalem, Westmere, …, Atom CE
– AMD Shanghai Quad/Quad-core
– Cavium MIPS64, Tilera, ARM Cortex
Alternative Cell Broadband Engine Architecture with Linux
– The Cell Broadband Engine chip: High-speed offload for the masses
GP-GPU Vector Processing PCI-E (NVIDIA Tesla/Fermi, AMD, Intel Knights Ferry)
Liu and Layland Paper Discussion
– Digital Video and Audio Encoding
– Digital Media Capture, Post Production, Delivery, Playback
CPU Scheduling Overview
– Scheduling Methods and Classes
– Policy, Feasibility
– Tuning Execution
NPTL – Native POSIX Threads Library
NPTL Example Code Walkthrough
Conceptual View of RT Resources
Three-Space View of Utilization Requirements
– CPU Margin?
– IO Latency (and Bandwidth) Margin?
– Memory Capacity (and Latency) Margin?
Upper Right Front Corner – Low-Margin
Origin – High-Margin
Mobile – Must Consider Battery Life Too (Power)
[Figure: three-axis utilization space with axes CPU-Utility, IO-Utility, Memory-Utility]
Processing – Initial Focus
Processing and Scaling Frame Transformation, Encode, Decode is Critical
Memory for Buffering (Frame Transformations, CPU Integrated or GPU Offloaded – e.g. Linux VDPAU)
I/O for Networking (Transport)
I/O for Storage (On-Demand, Post, Non-Linear Editing)
Flynn’s Computer Architecture Taxonomy

              | Single Instruction                             | Multiple Instruction
Single Data   | SISD (Traditional Uni-processor)               | MISD (Voting schemes and active-active controllers)
Multiple Data | SIMD (e.g. SSE 4.2, GP-GPU, Vector Processing) | MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)
GPC has gone MIMD with SIMD Instruction Sets and SIMD Offload (GP-GPU)
NUMA vs. UMA (Trend away from UMA to NUMA or MCH vs. IOH)
SMP with One OS (Shared Memory, CPU-balanced Interrupt Handling, Process Load Balancing, Multi-User, Multi-Application, CPU Affinity Possible)
MIMD - Single Program Multi-Data vs. Multi-Program Multi-Data
CPU Scheduling Taxonomy
[Figure: taxonomy tree rooted at Execution Scheduling, branching into Global-MP and Local-Uniprocessor. Node labels: Distributed, Asymmetric (AMP), Symmetric (SMP OS), SMT (Micro-Parallel); Preemptive, Non-Preemptive; Fixed-Priority, Hybrid, Dynamic-Priority, Cooperative, Batch; Static, Dynamic; Rate Monotonic, Deadline Monotonic, Heuristic, EDF/LLF, RR Timeslice (desktop), Multi-Frequency Executives, FCFS, SJN, Co-Routine, Continuation Function, Dataflow. Figure note: Preemptive, Non-Preemptive Subtree Under Each Global-MP Leaf]
A Service Release and Response
[Figure: timeline of one service release. Events in order: Event Sensed, Interrupt, Dispatch, Preemption, Dispatch, Completion (IO Queued), Actuation (IO Completion). Intervals: Input-Latency, Dispatch-Latency, Execution, Interference, Execution, Output-Latency. Annotations: Ci WCET, Input/Output Latency, Interference Time]
Response Time = Time_Actuation – Time_Sensed
(From Release to Response)
Many-Core MIMD Thread Scaling
Symmetric MP and NUMA Many-Core Thread Scaling
SMP – All Memory Access is Uniform Latency, Full Load and Resource Balancing
NUMA – Non-Uniform Memory Access, Requires Affinity to Avoid High Latency Access
[Figure: Increasing Work and Threads on Xeon; x-axis: Number of Threads; y-axis: Time (sec); legend: ST, MT]
Data labels recovered from the plot (two series, in the order they appear):

Number of Threads | Series 1 Time (sec) | Series 2 Time (sec)
2                 | 0.145615            | 0.099654
4                 | 0.287925            | 0.155106
8                 | 0.681104            | 0.39617
16                | 1.148319            | 0.622645
32                | 2.296593            | 1.197544
64                | 4.593669            | 2.345302
128               | 9.481171            | 4.670147
200               | 14.355285           | 7.244723
Cell Broadband Engine Scaling – PS3
One SMT PPE, 8 SPEs (Synergistic Proc Elements)
PPE is Simultaneous Multi-threaded PowerPC
SPEs are Vector Procs – Send Code and Data Over Racetrack Network
http://www.ibm.com/developerworks/power/library/pa-soc12/index.html
[Figure: Speedup with Increasing Threads; x-axis: Number of Threads (2 to 256); y-axis: Speedup (axis range 1.68 to 1.82)]
SIMD Vector Instructions
Intel MMX, SSE 1, 2, 3, 4.x Code Generation
Using SIMD Extensions to Accelerate Algorithms (Edge Enhancement) – http://software.intel.com/en-us/articles/using-intel-streaming-simd-extensions-and-intel-integrated-performance-primitives-to-accelerate-algorithms/
PSF
Offload, Co-Proc, Vector Proc
1. GPU (Graphics Processing Units)
– Evolved for Consumer CGI and Games Physics Engines
3D Rendering + Texture (4D Vector Operations)
Game Engines and Simulation
HD Output: HDMI, HD-SDI, Headless GP-GPU
– Higher End Used for Digital Cinema / Post Production, Broadcast
PNY Quadro FX
NVIDIA CUDA for Post
– GP-GPU Being Used to Accelerate Encode, Transcode, Trans-rate, etc. - http://www.elementaltechnologies.com/
2. Built-In SIMD Instruction Set Extensions – Intel SSE
GP-GPU, What Is It?
Ideal for Large Bitwise, Integer, and Floating Point Vector Math
Flynn’s Taxonomy
SIMD Architecture often leverages GP-GPU Co-Processors or Cell for MPMD

              | Single Instruction/Prog                                                          | Multiple Instruction
Single Data   | SISD (Traditional Uni-processor)                                                 | MISD (Voting schemes and active-active controllers)
Multiple Data | SIMD (SSE 4.2, Vector Processing); SPMD (Single Program Multiple Data), GP-GPU   | MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)
SSE – Streaming SIMD Extensions
128-bit registers known as XMM0 through XMM7
Large Operands and Operators (Multi-Word)
E.g. 128-bit XOR of Two Operands
Multiple Multiply and Accumulate Operations for Floating Point (DSP Kernel Operations)
– E.g. 4 Component Vector addition
– 4 Single Precision Pixel Multiply and Accumulate in Single Instruction
vec_res.x = v1.x + v2.x;
vec_res.y = v1.y + v2.y;
vec_res.z = v1.z + v2.z;
vec_res.w = v1.w + v2.w;
16 scalar operations to load 2 operands, add, store (4 components × 4 operations each)

movaps xmm0, address-of-v1        ; xmm0 = v1.w | v1.z | v1.y | v1.x
addps  xmm0, address-of-v2        ; xmm0 = v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
movaps address-of-vec_res, xmm0

3 SSE operations to load, add, store
Scheduling Parallel/Cluster HW
MIMD
– OS SMP threading, provides load balancing, affinity operations, routable interrupts (e.g. MSI-X), e.g. NPTL
– RTOS AMP is most often used in Embedded Systems
MPMD
– OpenCL, CUDA, DirectCompute (DirectX extension)
– Cell BBE Developer’s Kit
– Intel OpenMP, Linux Cluster, MPI
Note on OS/CPU Virtualization and Digital Media
– Hypervisors
Type 1 – Runs directly on the host's hardware to control the hardware and monitor guest operating systems; the guest operating system thus runs one level above the hypervisor (e.g. VMware ESXi)
Type 2 – Runs within a conventional operating system environment; with the hypervisor as a distinct second software level, guest operating systems run at the third level above the hardware (e.g. VMware for Windows)
– Enables Guest OS to Share Resources on System
– Typically DM Scales without Virtualization due to Client/Server Workload, but can Exploit for IT reasons
Elements of a Scheduling Class
Scheduling Policy
– How is Dispatch Decision Made?
– Non-Preemptive, Cooperative or Batch (Hard Coded)
– Preemptive
Fixed Priority Encoding
– Rate Monotonic (Shortest Period Gets Highest Priority)
– Deadline Monotonic (Shortest Deadline Gets Highest Priority)
Dynamic Priority - Programmed Priorities
– EDF or Deadline Driven - Earliest Deadline Gets Highest Priority, Updated Continuously
– LLF (Least Laxity First) – Least Laxity (Time to Deadline Minus Remaining Execution) Gets Highest Priority, Updated Continuously
Heuristic (Fuzzy Logic Scheduler, Heuristically Guided Iterative Repair)
Scheduling Feasibility Determination
– Will Schedule Work?
– Can a Set of Services Be Scheduled Given:
CPU Resources Available
I/O Resources Available
Memory Resources Available
– RM LUB (Next Week)
– Lehoczky, Sha, Ding Theorem (Next Week)
– EDF Feasibility (Several Weeks Away)
Ability to Tune Schedule
– If Actuals Differ From Expected
WCET Expected vs. Observed
Maximum Release Frequency for a Service – Expected vs. Observed
Real-Time Service Types
Types of Services
– Hard Real-Time (Flight Software, Anti-Lock Braking)
– Soft Real-Time (Multi-media, Audio, Video, Virtual Reality)
– Best Effort (E.g. Desktop Applications)
– Isochronal Hard Real-Time (Digital Feedback Control Systems)
– Isochronal Soft Real-Time (Continuous Media, Video, Audio)
Real-Time Service Types in Terms of Utility
– Utility Curve Shows Value/Harm of Response Over Time From Release
Both Before and After Deadline Relative to Release
– Full Utility – Service Performs as Required
– Zero Utility – Service is Not Provided, but Drop-out Causes No Harm
– Negative Utility – Harm to System and/or User and Significant Loss of Assets
Hard Real-Time Service Utility
[Figure: utility (0% to 100%) vs. time, with Release and Deadline marked]
After Deadline, Utility is Negative
Soft Real-Time Service Utility
[Figure: utility (0% to 100%) vs. time, with Release and Deadline marked and a curve F(t) after the deadline]
After Deadline, Utility Diminishes According to Some Function F(t)
Best Effort Service Utility
Deadline Does Not Exist
[Figure: utility (0% to 100%) vs. time, with Release marked]
Isochronal Hard Real-Time Utility
[Figure: utility (0% to 100%) vs. time, with Release and Deadline marked]
Before Deadline, Utility is Negative
After Deadline, Utility is Negative
Isochronal Soft Real-Time Utility (QoS Digital Media – Requires Buffering)
[Figure: utility (0% to 100%) vs. time, with Release and Deadline marked and a curve F(t) on each side of the deadline]
Before Deadline, Utility is < 100%
After Deadline, Utility is < 100%
How Does NPTL Work?
No Thread Manager or M-on-N Mapping
– Previous POSIX Threading Model
– Manager Becomes Bottleneck
– Two-Level Scheduling Not Deterministic
– Many Pthreads (M) to N Kernel Threads Still an Issue
– O(n) Scheduling for each Manager
Direct Mapping of User to Kernel Thread or 1-to-1
– User Space Pthread Maps Directly onto Kernel Thread (Requires Root privilege)
– Deterministic (Non-Determinism due to Kernel Preemptability Issues)
– O(1) Scheduling
Scheduling Policies Selectable
Similar to RTOS Tasking
Linux NPTL Scheduling Policies
Fixed Priority Preemptive
– SCHED_FIFO – This is Priority Preemptive
– SCHED_RR – This is Fair, but at Kernel Level
– SCHED_OTHER – This is OS default and should not be used
POSIX Threads have
– Policy (FIFO, RR, OTHER)
– Priority (RT min to RT max)
– Creation (Fork)
– Join (Wait for thread completion at rendezvous)
– Synchronization Methods
Semaphores
Message Queues
– Asynchronous Communication Methods
Signals
Queued Signals
POSIX RT Extensions Include
– Virtual Timer Services
– Signals Tied to Timer Services
– Priority Inversion Protection (Availability on Linux TBD)
July 7, 2004 Sam Siewert
NPTL Coding
Code Walk-through
Thread Scheduling Policy
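/* Excerpt: rt_sched_attr, rt_param, rt_max_prio, rt_min_prio, rc and scope are declared elsewhere */
pthread_attr_init(&rt_sched_attr);
/* Use the attributes set here instead of inheriting the creator's policy */
pthread_attr_setinheritsched(&rt_sched_attr, PTHREAD_EXPLICIT_SCHED);
pthread_attr_setschedpolicy(&rt_sched_attr, SCHED_FIFO);
rt_max_prio = sched_get_priority_max(SCHED_FIFO);
rt_min_prio = sched_get_priority_min(SCHED_FIFO);
rt_param.sched_priority = rt_max_prio - 1;
/* Apply the priority to the thread attributes as well as to the calling process */
pthread_attr_setschedparam(&rt_sched_attr, &rt_param);
rc = sched_setscheduler(getpid(), SCHED_FIFO, &rt_param);
if (rc < 0) perror("sched_setscheduler"); /* typically fails without root privilege */
pthread_attr_getscope(&rt_sched_attr, &scope);
if (scope == PTHREAD_SCOPE_SYSTEM)
    printf("PTHREAD SCOPE SYSTEM\n");
else if (scope == PTHREAD_SCOPE_PROCESS)
    printf("PTHREAD SCOPE PROCESS\n");
else
    printf("PTHREAD SCOPE UNKNOWN\n");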
Thread Creation and Join
rc = pthread_create(&main_thread, &main_sched_attr, testThread, (void *)0);
/* pthread calls return the error code directly and do not set errno (strerror needs <string.h>) */
if (rc) { printf("ERROR; pthread_create() rc is %d (%s)\n", rc, strerror(rc)); exit(-1); }
pthread_join(main_thread, NULL);
rc = pthread_attr_destroy(&rt_sched_attr);
if (rc) printf("attr destroy: %s\n", strerror(rc));