High Performance Embedded Systems MPSoCs


High Performance Embedded Systems

July 2020

Electronics Engineering Department

Electronics Master Program

MPSoCs

Outline

2

• Multiprocessors Architecture and Taxonomy

• Parallel Execution Mechanism

• Multiprocessors Design Techniques

• Memory Systems

• Processors Symmetry

• Co-processing

3

Multiprocessors Architecture and Taxonomy

Taken from: https://arstechnica.com/gadgets/2020/05/intels-comet-lake-desktop-cpus-are-here/

Intel 4004 Core i9??

4

Multiprocessors Architecture and Taxonomy

Taken from: https://arstechnica.com/gadgets/2020/05/intels-comet-lake-desktop-cpus-are-here/

Intel 4004 Core i9

5

Multiprocessors Architecture and Taxonomy

Taken from: https://arstechnica.com/gadgets/2020/05/intels-comet-lake-desktop-cpus-are-here/

Exynos 7420 finFET transistors

6

Multiprocessors Architecture and Taxonomy

Taken from: https://arstechnica.com/gadgets/2020/05/intels-comet-lake-desktop-cpus-are-here/

Exynos 7420 finFET transistors

7

Multiprocessors Architecture and Taxonomy

Taken from: https://www.researchgate.net/publication/257711815_Where_Photovoltaics_Meets_Microelectronics/figures?lo=1

8

Multiprocessors Architecture and Taxonomy

Taken from: https://www.semiconductor-digest.com/2020/03/10/transistor-count-trends-continue-to-track-with-moores-law/

9

Multiprocessors Architecture and Taxonomy

Taken from: https://www.elprocus.com/difference-between-soc-system-on-chip-single-board-computer/

SoC

10

Multiprocessors Architecture and Taxonomy

Taken from: http://soc.inha.ac.kr/index.php/Project

2-Parallel Radix-2^4 FFT/IFFT Processor Chip for MB-OFDM UWB communications

11

Multiprocessors Architecture and Taxonomy

Taken from: PrSoC: Programmable System-on-chip (SoC) for silicon prototyping IEEE 2008

12

Multiprocessors Architecture and Taxonomy

Taken from: https://www.elprocus.com/difference-between-soc-system-on-chip-single-board-computer/

SoC

MPSoC

13

Multiprocessors Architecture and Taxonomy

Taken from: https://commons.wikimedia.org/wiki/File:ARM-Cortex-A9.gif

MPSoCs?

14

Multiprocessors Architecture and Taxonomy

SoC

Taken from: W. Wolf Multiprocessor Systems-On-Chip

• An SoC is an integrated circuit that implements most or all of the functions of a complete electronic system.

• The most fundamental characteristic of

an SoC is complexity.

15

Multiprocessors Architecture and Taxonomy

SoC

Taken from: W. Wolf Multiprocessor Systems-On-Chip

Many product categories:

• Cell phones.

• Telecommunications and networking.

• Digital television.

• Video games.

• …..

16

Multiprocessors Architecture and Taxonomy

SoC Example

Taken from: W. Wolf Multiprocessor Systems-On-Chip

Processing Elements

17

Multiprocessors Architecture and Taxonomy

SoC Example

Taken from: W. Wolf Multiprocessor Systems-On-Chip

Memory

18

Multiprocessors Architecture and Taxonomy

SoC Example

Taken from: W. Wolf Multiprocessor Systems-On-Chip

Communications

19

Multiprocessors Architecture and Taxonomy

SoC Example

Taken from: W. Wolf Multiprocessor Systems-On-Chip

MPSoCs?

20

Multiprocessors Architecture and Taxonomy

MPSoCs?

Wait!

What is a Parallel Architecture?

21

Multiprocessors Architecture and Taxonomy

Parallel Architecture

“A large collection of processing elements that communicate and cooperate to

solve large problems fast”. - Almasi.

Taken from: M. Aguilar MPSoCs

22

Multiprocessors Architecture and Taxonomy

Parallel Architecture

“A large collection of processing elements that communicate and cooperate to

solve large problems fast”. - Almasi.

Taken from: M. Aguilar MPSoCs

23

Multiprocessors Architecture and Taxonomy

Parallel Architecture

“A large collection of processing elements that communicate and cooperate to

solve large problems fast”. - Almasi.

Taken from: M. Aguilar MPSoCs

SoC

HW+SW

24

Multiprocessors Architecture and Taxonomy

Parallel Architecture

“A large collection of processing elements that communicate and cooperate to

solve large problems fast”. - Almasi.

Taken from: M. Aguilar MPSoCs

SoC

HW+SW

Technology kept advancing

25

Multiprocessors Architecture and Taxonomy

Parallel Architecture

“A large collection of processing elements that communicate and cooperate to

solve large problems fast”. - Almasi.

Taken from: M. Aguilar MPSoCs

SoC

HW+SW

Technology kept advancing

26

Multiprocessors Architecture and Taxonomy

Parallel Architecture

“A large collection of processing elements that communicate and cooperate to

solve large problems fast”. - Almasi.

Taken from: M. Aguilar MPSoCs

SoC

HW+SW

Technology kept advancing → MPSoCs

27

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Serial Communication

Parallel Communication

28

Multiprocessors Architecture and Taxonomy

Here we go

What are MPSoCs?

Taken from: W. Wolf Multiprocessor Systems-On-Chip

29

Multiprocessors Architecture and Taxonomy

What are MPSoCs?

“Are the latest incarnation of very large-scale integration (VLSI)

technology”

Taken from: W. Wolf Multiprocessor Systems-On-Chip

???

30

Multiprocessors Architecture and Taxonomy

What are MPSoCs?

“Are the latest incarnation of very large-scale integration (VLSI)

technology”

Taken from: W. Wolf Multiprocessor Systems-On-Chip

???

• Silicon

• Power

• Area

• …

31

Multiprocessors Architecture and Taxonomy

What are MPSoCs?

“Are the latest incarnation of very large-scale integration (VLSI)

technology”

“A single integrated circuit can contain over

100 million transistors, and the International Technology Roadmap

for Semiconductors predicts that chips with a billion transistors are

within reach”

Taken from: W. Wolf Multiprocessor Systems-On-Chip

32

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs

“The multiprocessor System-on-Chip (MPSoC) is a system-on-a-chip

(SoC) which uses multiple processors (see multi-core), usually

targeted for embedded applications”.

SoC

HW+SW

MPSoCs Understood!!

33

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs

“The multiprocessor system-on-chip (MPSoC) uses multiple CPUs

along with other hardware subsystems to implement a system”. -

Wayne Wolf.

Multiprocessor = Multicore?

34

Multiprocessors Architecture and Taxonomy

General Structure of MPSoCs

Processing Elements (PE)

• Relation with application context and requirements.

• Homogeneous MPSoCs.

• Heterogeneous MPSoCs.

• Interconnection Element

• Buses.

• NoCs (Networks on Chip). More information here.

Taken from: M. Aguilar MPSoCs

35

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Advantages of MPSoCs

• Performance

• Powerful platform (Cores).

• Users.

• Applications.

• Tasks within the same application.

Power Consumption

• Lower power thanks to the parallel approach.

36

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

37

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs Benefits

• Wireless.

• Multimedia: video and audio.

• Health.

• Military.

• Avionics.

• Aerospace.

38

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Multiprocessor = Multicore?

39

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Multiprocessor

• Platform with several CPUs.

• Uses a parallel approach.

Multicore

• Platform with a single CPU package.

• Multiple cores inside that CPU.

40

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs Software

41

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Parallel Approaches

Parallel Approaches

42

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Parallel Approaches

Parallel Approaches: Bits, Instructions, Data, Threads, Tasks

43

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs Architecture?

44

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs

PEs

45

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs

Homogeneous Heterogeneous

PEs

46

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs Heterogeneous

• Different PEs, for example:

• GPUs (Graphics Processing Units).

• DSPs.

• HW accelerators.

• NoC infrastructure.

• Better performance and power consumption.

• Used in embedded systems:

• Portable systems.

• Power consumption.

47

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs Homogeneous

• Identical PEs make up the SoC.

• The PE is instantiated several times.

• Instances are connected by the communication infrastructure.

• Flexibility and scalability.

• Worse power consumption.

48

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs Taxonomy?

49

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Processor Organization

• Serial: SISD
  • Uniprocessor (overlapped operations, multiple ALUs)
• Parallel: SIMD, MISD, MIMD
  • SIMD: vector processor, array processor
  • MIMD, tightly coupled (shared memory): symmetric multiprocessor (SMP), nonuniform memory access (NUMA)
  • MIMD, loosely coupled (distributed memory): clusters

50

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Where are MPSoCs located?

51

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Processor Organization

• Serial: SISD
  • Uniprocessor (overlapped operations, multiple ALUs)
• Parallel: SIMD, MISD, MIMD
  • SIMD: vector processor, array processor
  • MIMD, tightly coupled (shared memory): symmetric multiprocessor (SMP), nonuniform memory access (NUMA)
  • MIMD, loosely coupled (distributed memory): clusters

MPSoCs sit in the MIMD branch.

52

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs and Parallel Computing Lectures Notes

MIMD

• This architecture executes different operations over different data streams.

• The multiprocessing approach and MPSoCs fall into this category.

53

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs Architecture

• PEs: homogeneous, heterogeneous.

• Memory access: uniform access (UMA), non-uniform access (NUMA).

• Processors symmetry: SMP (symmetric multi-processing), AMP (asymmetric multi-processing).

• Memory architecture: shared memory, distributed memory.

54

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

ARM Cortex A9

55

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Analog Devices - Blackfin

56

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

TI Davinci DM355

57

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

TI OMAP5

58

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

STMicroelectronics Nomadik

59

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Nexperia

60

Multiprocessors Architecture and Taxonomy

Taken from: http://linuxgizmos.com/new-arm-cortex-a72-nearly-twice-as-fast-as-cortex-a57/

Cortex-A72

Outline

61

• Multiprocessors Architecture and Taxonomy

• Parallel Execution Mechanism

• Multiprocessors Design Techniques

• Memory Systems

• Processors Symmetry

• Co-processing

62

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

63

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Consider the following approaches:

• Shared memory.

• Threads.

• Message Passing.

• Data Parallel.

• Hybrid.

• Others

All these can be implemented on any architecture.

64

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Consider the following approaches:

• Shared memory.

• Threads.

• Message Passing.

• Data Parallel.

• Hybrid.

• Others

All these can be implemented on any architecture.

65

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Shared Memory

• Tasks share a common address space, which they read and write

asynchronously.

• Various mechanisms such as locks/semaphores may be used to control access to the shared memory.

• Advantage

• No need to explicitly communicate data between tasks, which simplifies programming.

• Disadvantages

• Care is needed when managing memory, to avoid synchronization conflicts.

• Harder to control data locality.

66

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

In Hardware

• Shared memory systems use:

• UMA (Uniform Memory Access)

• NUMA (Non- Uniform Memory

Access)

• COMA (Cache-only memory

architecture)

In Software

• Inter-process communication (IPC).

• Virtual memory mapping.
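As a concrete illustration of these software mechanisms, here is a minimal sketch that combines POSIX shared memory (IPC) with virtual memory mapping and a process-shared semaphore; the object name "/mpsoc_demo" and the counter layout are arbitrary choices for this example, and error handling is omitted for brevity.

/* Minimal POSIX shared-memory sketch: processes mapping "/mpsoc_demo"
 * share a counter protected by a process-shared semaphore.
 * Compile with: cc shm_demo.c -lrt -pthread
 * (In a real setup, only the creating process should call sem_init.)  */
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

struct shared_block {
    sem_t lock;        /* guards the counter against concurrent writers */
    long  counter;
};

int main(void) {
    int fd = shm_open("/mpsoc_demo", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(struct shared_block));
    struct shared_block *blk = mmap(NULL, sizeof(*blk),
                                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    sem_init(&blk->lock, 1, 1);          /* 1 = shared between processes */

    sem_wait(&blk->lock);                /* critical section */
    blk->counter++;
    sem_post(&blk->lock);

    printf("counter = %ld\n", blk->counter);
    munmap(blk, sizeof(*blk));
    close(fd);
    return 0;
}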

67

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Consider the following approaches:

• Shared memory.

• Threads.

• Message Passing.

• Data Parallel.

• Hybrid.

• Others

All these can be implemented on any architecture.

68

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Threads

• A thread can be considered as a

subroutine in the main program.

• Threads communicate with each other

through the global memory.

• Commonly associated with shared

memory architectures and operating

systems.

• POSIX Threads (pthreads).

• OpenMP.
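A minimal pthreads sketch of this model (the worker function and iteration count are illustrative only): two threads communicate through a global counter, guarded by a mutex.

/* Minimal pthreads sketch: two threads share a global counter.
 * Compile with: cc threads.c -pthread                           */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);   /* global memory guarded by a mutex */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);   /* expected 200000 */
    return 0;
}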

69

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Threads

Advantages

• Responsiveness.

• Faster execution.

• Lower resource consumption.

• Better system utilization.

• Simplified sharing and communication.

• Parallelization.

Drawbacks

• Synchronization.

• A crashing thread can bring down the whole process.

70

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Consider the following approaches:

• Shared memory.

• Threads.

• Message Passing.

• Data Parallel.

• Hybrid.

• Others.

All these can be implemented on any architecture.

71

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Message Passing

• A set of tasks that use their own local memory

during computation.

• Data exchange through sending and receiving

messages.

• Data transfer usually requires cooperative

operations to be performed by each process.

• For example, a send operation must have a

matching receive operation.

• MPI (Message Passing Interface).

• A minimal example is sketched below.
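A minimal MPI sketch in C (the tag, payload, and two-rank setup are arbitrary choices for illustration): rank 0 issues a send and rank 1 posts the matching receive, each task using only its own local memory.

/* Minimal MPI sketch: rank 0 sends, rank 1 receives.
 * Compile and run with: mpicc mpi_demo.c -o mpi_demo && mpirun -np 2 ./mpi_demo */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                   /* data in local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                  /* matching receive */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}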

72

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Consider the following approaches:

• Shared memory.

• Threads.

• Message Passing.

• Data Parallel.

• Hybrid.

• Others.

All these can be implemented on any architecture.

73

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Data Parallel

• Consider the following characteristics:

• Parallel work performs operations on a data set,

organized into a common structure.

• Tasks work collectively on the same data structure, with each task working on a different partition.

• Tasks perform the same operation on their partition.

• On shared memory architectures, all tasks may have access to the data structure through global memory.

• On distributed memory architectures, the data structure is split up and resides as “chunks” in the local memory of each task.

• More information here.
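A data-parallel sketch using OpenMP in C (the array size and the scaling operation are arbitrary for illustration): every thread applies the same operation to its own partition of the array.

/* Data-parallel sketch with OpenMP: same operation, different partitions.
 * Compile with: cc saxpy.c -fopenmp                                      */
#include <omp.h>
#include <stdio.h>

#define N 1000000

static float x[N], y[N];

int main(void) {
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* The loop iterations are partitioned among the available threads. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f (threads available: %d)\n", y[0], omp_get_max_threads());
    return 0;
}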

74

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Consider the following approaches:

• Shared memory.

• Threads.

• Message Passing.

• Data Parallel.

• Hybrid.

• Others

All these can be implemented on any architecture.

75

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Hybrid

• Combines several models (for example, MPI + OpenMP).

• Single Program Multiple Data (SPMD)

• Single program is executed by all tasks simultaneously.

• Multiple Program Multiple Data (MPMD)

• Uses multiple executables; each task may execute the same or a different program than the other tasks.
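A hybrid MPI + OpenMP sketch (the work loop is only a placeholder): MPI distributes work across ranks, while OpenMP threads share memory within each rank.

/* Hybrid sketch: MPI across ranks, OpenMP threads inside each rank.
 * Compile with: mpicc hybrid.c -fopenmp                             */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    long local_sum = 0;
    /* Each rank parallelizes its own chunk of work with threads. */
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < 1000000; i++)
        local_sum += i % 7;

    long global_sum = 0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %ld\n", global_sum);

    MPI_Finalize();
    return 0;
}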

76

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Consider the following approaches:

• Shared memory.

• Threads.

• Message Passing.

• Data Parallel.

• Hybrid.

• Others. (Depends on the architecture)

All these can be implemented on any architecture.

77

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Others

• MCAPI (Multicore Association)

• Poly-Platform

• CUDA

78

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Others

• MCAPI (Multicore Association)

• Poly-Platform

• CUDA

79

Parallel Execution Mechanism

Taken from: https://en.wikipedia.org/wiki/Multicore_Association

MCAPI (Multicore Association)

• Founded in 2005

• Its first specification is referred to as MCAPI.

• Based on message passing.

• Targets systems that are heterogeneous in hardware, toolchain, and programming language.

• Active working groups:

• MCAPI.

• Virtualization.

• Open Asymmetric Multiprocessing (OpenAMP).

80

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Others

• MCAPI (Multicore Association)

• Poly-Platform

• CUDA

81

Parallel Execution Mechanism

Taken from: http://polycoresoftware.com/poly-platform

Poly-Platform

• A collection of productivity tools.

• Supports the migration process (e.g., from single-core to multicore).

• Mainly targets multicore platforms.

• Provides support for several SoCs, OSs, and transport layers.

82

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Others

• MCAPI (Multicore Association)

• Poly-Platform

• CUDA

83

Parallel Execution Mechanism

Taken from: https://en.wikipedia.org/wiki/CUDA

CUDA

• Initial release 2007.

• Parallel computing platform and

application programming interface.

• Created by NVIDIA.

• GPU-based approach.

• Supported on Windows, Linux, and macOS.

Outline

84

• Multiprocessors Architecture and Taxonomy

• Parallel Execution Mechanism

• Multiprocessors Design Techniques

• Memory Systems

• Processors Symmetry

• Co-processing

85

Multiprocessors Design Techniques

Taken from: W.Wolf High-Performance Embedded Computing

Embedded Systems Design Flows

• Co-design flows.

• Platform-based design.

• Two-stage process.

• Programming platforms.

• Standards-Based design.

MPSoCs?

86

Multiprocessors Design Techniques

Challenges

• Software development is a major challenge for MPSoC designers.

• Software that runs on the multiprocessor must be high performance, real time,

and low power.

• Each MPSoC requires its own software development environment: compiler,

debugger, simulator, and other tools.

• Better understanding of how to abstract tasks properly to capture the essential

characteristics of their low-level behavior for system-level analysis.

Taken from: W. Wolf Multiprocessor Systems-on-Chips

87

Multiprocessors Design Techniques

Taken from: W. Wolf Multiprocessor Systems on Chip

Challenges

• Networks-on-chips have emerged over the past few years as an architectural

approach to the design of single-chip multiprocessors.

• FPGAs have emerged as a viable alternative to application-specific integrated

circuits (ASICs) in many markets. FPGA fabrics are also starting to be

integrated into SoCs.

88

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

Challenges

• Sequential C code is not easy to replace.

• Algorithm specifications contain parallel specifications (models of computation: KPN, SDF, etc.).

• New programming languages are not readily adopted.

• Automatic parallelization and parallel programming.

• Platform-based design (SW synthesis) or SW and HW synthesis.

89

Multiprocessors Design Techniques

Taken from: MPSoCs https://slideplayer.com/slide/8773117/

Challenges

All MPSoC designs have the following requirements:

• Speed.

• Power.

• Area.

• Application Performance.

• Time to market.

90

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

MPSoCs Programming

• Task mapping to processors or cores.

• Inter-processor communication management.

• Data transfer engine management.

• Shared resource management.

• Memory management.

• Debugging.

91

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

MPSoCs Exploration

• Separate computation from communication.

92

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

Virtual Processing Unit VPU

• Load simulator: It is a high-level simulation of

the core behavior.

• Functional simulator: native execution of tasks; scheduling is handled by the VPU OS.

93

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

Virtual Processing Unit VPU

Allows spatial and temporal modeling of task mapping to PEs

94

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

Virtual Platform

• It is a software model that allows the exploration of hardware and software.

• It allows hardware platform exploration and optimization.

• Software development, debugging and optimization.

• Concurrent hardware and software design.

95

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

Virtual Platform

• Requirements:

• High speed in terms of simulation process.

• Compromise between simulation speed and precision.

• Flexibility.

• Usability by developers who are not hardware experts.

96

Multiprocessors Design Techniques

Design Techniques

• Core-based Strategy.

• Wrappers.

• System-level design flow.

• Platform-based design.

• Component-based design.

Taken from: W.Wolf High-Performance Embedded Computing

97

Multiprocessors Design Techniques

Design Techniques

• Core-based Strategy.

• Wrappers.

• System-level design flow.

• Platform-based design.

• Component-based design.

Taken from: W.Wolf High-Performance Embedded Computing

98

Multiprocessors Design Techniques

Core-based Strategy

• Core-based synthesis strategy for the IBM CoreConnect bus.

• Coral tool automates many of the tasks required to stitch together multiple

cores using virtual components.

• Each virtual component describes the interfaces for a class of real

components.

• Coral can synthesize some combinational logic.

• Coral also checks the connections between cores using Boolean decision

diagrams.

Taken from: W.Wolf High-Performance Embedded Computing

99

Multiprocessors Design Techniques

Core-based Strategy

CoreConnect provides three types of buses:

• A high-speed processor local bus (PLB).

• An on-chip peripheral bus (OPB).

• A device control register (DCR) bus for configuration and status information.

Taken from: W.Wolf High-Performance Embedded Computing

100

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

Core-based Strategy

101

Multiprocessors Design Techniques

Design Techniques

• Core-based Strategy.

• Wrappers.

• System-level design flow.

• Platform-based design.

• Component-based design.

Taken from: W.Wolf High-Performance Embedded Computing

102

Multiprocessors Design Techniques

Wrappers

• Treats both hardware and software as

components.

• A wrapper is a design unit that interfaces a

module to another module.

• A wrapper can be hardware or software

and may include both.

• The wrapper performs only low-level adaptations, such as protocol transformation.

Taken from: W.Wolf High-Performance Embedded Computing

103

Multiprocessors Design Techniques

Wrappers

Heterogeneous multiprocessors introduce several types of problems:

• Many chips have multiple communication networks to match the network to

the processing needs. Synchronizing communication across network

boundaries is more difficult than communicating within a network.

• Specialized hardware is often needed to accelerate interprocess

communication and free the CPU for more interesting computations.

• The communication primitives should be at a higher level of abstraction than

shared memory.

Taken from: W.Wolf High-Performance Embedded Computing

104

Multiprocessors Design Techniques

Wrappers

When a dedicated CPU is added to the system, its software must be adapted in several ways:

1. The software must be updated to support the platform’s communication

primitives.

2. Optimized implementations of the host processor’s communication

functions must be provided for interprocessor communication.

3. Synchronization functions must be provided.

Taken from: W.Wolf High-Performance Embedded Computing

105

Multiprocessors Design Techniques

Design Techniques

• Core-based Strategy.

• Wrappers.

• System-level design flow.

• Platform-based design.

• Component-based design.

Taken from: W.Wolf High-Performance Embedded Computing

106

Multiprocessors Design Techniques

System-Level Design

• An abstract platform is created from a combination of system requirements,

models of the software, and models of the hardware components.

• Abstract platform is analyzed to determine the application’s performance

and power/energy consumption.

• Based on the results of this analysis, software is allocated and scheduled

onto the platform.

• The result is a golden abstract architecture that can be used to build the implementation.

Taken from: W.Wolf High-Performance Embedded Computing

107

Multiprocessors Design Techniques

System-Level Design

Taken from: W.Wolf High-Performance Embedded Computing

108

Multiprocessors Design Techniques

System-Level Design

Major elements of an abstract architecture:

1. Software tasks are described by their data and

scheduling dependencies; they

interface to an API.

2. Hardware components consist of a core and an

interface.

3. The hardware/software integration is modeled by

the communication network that connects the CPUs

that run the software and the hardware IP

cores.

Taken from: W.Wolf High-Performance Embedded Computing

109

Multiprocessors Design Techniques

Design Techniques

• Core-based Strategy.

• Wrappers.

• System-level design flow.

• Platform-based design.

• Component-based design.

Taken from: W.Wolf High-Performance Embedded Computing

110

Multiprocessors Design Techniques

Platform-based Design

• Design space: platform selection

• Platform programming

• Multi-CPUs

• Concurrency

• Real-Time

• The platform developer must provide tools (compilers, editors, debuggers, simulators, etc.).

Taken from: Introduction to Embedded Systems

111

Multiprocessors Design Techniques

Platform-based Design

• Start with functional specifications

• Task graphs.

• Nodes: Task to complete

• Edges: Communication and

dependence between tasks

• Execution time on the nodes.

• Data communicated on the edges.

Taken from: MPSoCs https://slideplayer.com/slide/8773117/

112

Multiprocessors Design Techniques

Platform-based Design

• Map tasks onto pre-designed HW.

• Use an extended task graph for SW and communication.

Taken from: MPSoCs https://slideplayer.com/slide/8773117/

113

Multiprocessors Design Techniques

Platform-based Design

• Map tasks onto pre-designed HW.

• Use an extended task graph for SW and communication.

Taken from: MPSoCs https://slideplayer.com/slide/8773117/

114

Multiprocessors Design Techniques

Design Techniques

• Core-based Strategy.

• Wrappers.

• System-level design flow.

• Platform-based design.

• Component-based design.

Taken from: W.Wolf High-Performance Embedded Computing

115

Multiprocessors Design Techniques

Component Based Design

• Conceptual MPSoC platform.

• SW, processors, IP, communication fabric.

• Parallel development.

• Use of APIs.

• Quicker time to market.

Taken from: MPSoCs https://slideplayer.com/slide/8773117/

116

Multiprocessors Design Techniques

Component Based Design

Taken from: MPSoCs https://slideplayer.com/slide/8773117/

117

Multiprocessors Design Techniques

MPSoC Application Programming Studio (MAPS)

• Developed at RWTH Aachen University in Germany.

• It is a platform that offers tools and technologies for MPSoC programming.

• Main features are:

• Sequential C code partitioning.

• Parallel programming model.

• Mapping and scheduling.

• Different types of applications.

• Functional Verification (Virtual Platform).

• Multiple applications environment.

• Easy-to-use IDE.

Taken from: M. Aguilar SoC Lectures Notes

118

Multiprocessors Design Techniques

MAPS Flow

Taken from: M. Aguilar SoC Lectures Notes

119

Multiprocessors Design Techniques

MAPS Flow

Taken from: M. Aguilar SoC Lectures Notes

120

Multiprocessors Design Techniques

MAPS Programming Model: C for Process Networks (CPN)

• Embedded systems have traditionally been programmed in C.

• CPN is a language developed as an extension of ANSI C in order to describe process networks (KPN and SDF).

• A compiler called cpn-cc performs a source-to-source transformation to convert CPN code into standard C code with the APIs of the target architecture.

Taken from: M. Aguilar SoC Lectures Notes

121

Multiprocessors Design Techniques

MAPS Programming Model: C for Process Networks (CPN)

Taken from: M. Aguilar SoC Lectures Notes

122

Multiprocessors Design Techniques

MAPS Virtual Platform (MVP)

• MAPS Virtual Platform (MVP)

• High level: abstract PEs based on SystemC.

• Low level: (Instruction Set Simulators) ISS-based virtual platform.

• “mPhone”: a virtual smartphone.

Taken from: M. Aguilar SoC Lectures Notes

123

Multiprocessors Design Techniques

Virtual Processing Element

• It is a parameterizable processing element.

• Clock frequency.

• Type (RISC, VLIW, DSP, etc).

• Scheduling algorithm (Round robin, EDF, based on priorities, etc).

Taken from: M. Aguilar SoC Lectures Notes

Outline

124

• Multiprocessors Architecture and Taxonomy

• Parallel Execution Mechanism

• Multiprocessors Design Techniques

• Memory Systems

• Processors Symmetry

• Co-processing

125

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory Systems

126

Memory Systems

Memory Systems

• The memory system is a traditional bottleneck in computing.

• Not only are memories slower than processors, but processor clock rates

are increasing much faster than memory cycle times.

Taken from: W. Wolf High-Performance Embedded Computing and

https://www.taringa.net/+serviciotecnico/consulta-cuello-de-botella-cpu-debil-en-gpu-potente_15casq

127

Memory Systems

Memory Systems

Taken from: Multi-core architectures

128

Memory Systems

Memory Systems

Taken from: MPSoCs Hardware platforms Lectures Notes

129

Memory Systems

Memory Systems

• Start with a look at parallel memory systems in scientific multiprocessors.

• Consider models for memory and motivations for heterogeneous memory

systems.

• Look at what sorts of consistency mechanisms are needed in embedded

multiprocessors.

Taken from: W. Wolf High-Performance Embedded Computing

130

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory Systems

Homogeneous Heterogenous

131

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory Systems

Homogeneous Heterogenous

132

Memory Systems

Memory Systems

To understand memory systems, consider the following case study:

• Scientific processors traditionally use parallel, homogeneous memory

systems to increase system performance.

• Multiple memory banks allow several memory accesses to occur

simultaneously.

Taken from: W. Wolf High-Performance Embedded Computing

133

Memory Systems

Memory Systems

• Each bank is separately addressable.

Taken from: W. Wolf High-Performance Embedded Computing

134

Memory Systems

Memory Systems

• If the memory system has n banks,

then n accesses can be performed in

parallel.

• This is known as the peak access

rate.

Taken from: W. Wolf High-Performance Embedded Computing

135

Memory Systems

Memory Systems

• Cannot keep the memory busy all of

the time.

• A simple statistical model lets us

estimate performance of a random-

access program.

Taken from: W. Wolf High-Performance Embedded Computing

136

Memory Systems

Memory Systems

• Assume that the program accesses a

certain number of sequential

locations, then moves to some other

location.

• Where:

• λ describes the probability of a nonsequential memory access (a branch in the code or a jump to a nonconsecutive data location).

• k describes the number of consecutive sequential accesses.

Taken from: W. Wolf High-Performance Embedded Computing

137

Memory Systems

Memory Systems

• Where:

• p(k) = λ(1 − λ)^(k−1)

• And the mean length of a sequential access sequence, for an m-bank memory, is:

• L_b = (1 − (1 − λ)^m) / λ
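A quick worked example (numbers chosen purely for illustration, reading m as the number of banks): with λ = 0.1 and m = 8, L_b = (1 − 0.9^8) / 0.1 ≈ 5.7, so an average sequential run keeps only about 5.7 of the 8 banks busy before a nonsequential access breaks the streak.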

Taken from: W. Wolf High-Performance Embedded Computing

138

Memory Systems

Memory Systems

• Use program statistics to estimate the average probability of nonsequential accesses, and design the memory system accordingly.

• Use software techniques to

maximize the length of access

sequences wherever possible.

Taken from: W. Wolf High-Performance Embedded Computing

139

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory Systems

Homogeneous Heterogenous

140

Memory Systems

Memory Systems

• Embedded systems can make use of multiple-bank memory systems, but they

also make use of more heterogeneous memory architectures.

• They do so to improve the real-time performance and lower the power

consumption of the memory system.

Taken from: W. Wolf High-Performance Embedded Computing

141

Memory Systems

Memory Systems

Why do heterogeneous memory systems

improve real-time performance?

Taken from: W. Wolf High-Performance Embedded Computing

142

Memory Systems

Memory Systems

• The energy required to perform a memory access depends in part on the size of

the memory block being accessed.

• A heterogeneous memory may be able to use smaller memory blocks, reducing

the access time.

• Energy per access also depends on the number of ports on the memory block.

• By reducing the number of units that can access a given part of memory, the

heterogeneous memory system can reduce the energy required to access that

part of the memory space.

Taken from: W. Wolf High-Performance Embedded Computing

143

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory Systems

Homogeneous Heterogenous

Consistent Memory Systems

144

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Consistent Memory Systems: shared variables, cache consistency, snooping caches.

145

Memory Systems

Memory Systems

• Shared variables

• We have to worry about whether two processors see the same state of a shared variable.

• If the reads and writes of two processors are interleaved, one processor may overwrite a value just written by another, causing a processor to assume an erroneous value of the variable.

• Critical sections, guarded by semaphores, ensure that critical operations occur in the right order.

• Atomic test-and-set operations (often called spin locks) can be used to guard small pieces of memory.

Taken from: W. Wolf High-Performance Embedded Computing
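A minimal spin-lock sketch using C11 atomics (the protected counter is just an example): atomic_flag provides the atomic test-and-set behavior described above.

/* Spin lock built on an atomic test-and-set (C11 atomics). */
#include <stdatomic.h>
#include <stdio.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;
static int shared_counter = 0;          /* the small piece of guarded memory */

static void lock_acquire(void) {
    /* test-and-set: spin until the previous value was "clear" */
    while (atomic_flag_test_and_set(&lock))
        ;                                /* busy-wait */
}

static void lock_release(void) {
    atomic_flag_clear(&lock);
}

int main(void) {
    lock_acquire();
    shared_counter++;                    /* critical section */
    lock_release();
    printf("counter = %d\n", shared_counter);
    return 0;
}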

146

Memory Systems

Memory Systems

• Cache consistency

• If two processors access the same

memory location, then each may have

a copy of the location in its own cache.

• If one processing element writes that

location, then the other will not

immediately see the change and will

make an incorrect computation.

Taken from: W. Wolf High-Performance Embedded Computing

147

Memory Systems

Memory Systems

• Snooping Cache

• This type of cache contains extra

logic that watches the

multiprocessor interconnect for

memory transactions.

• When it sees a write to a location

that it currently contains, it

invalidates that location.

Taken from: W. Wolf High-Performance Embedded Computing

148

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory Systems Architecture: shared memory, distributed memory, hybrid memory.

149

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory Systems Architecture: shared memory, distributed memory, hybrid memory.

150

Memory Systems

Memory Systems

• Shared Memory

• Shared memory parallel computers vary

widely, but generally have in common the

ability for all processors to access all

memory as global address space.

• Multiple processors can operate

independently but share the same memory

resources.

Taken from: W. Wolf High-Performance Embedded Computing,

https://en.wikipedia.org/wiki/Shared_memory#/media/File:Shared_memory.svg,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

151

Memory Systems

Memory Systems

• Shared Memory

• Changes in a memory location effected by

one processor are visible to all other

processors.

• Historically, shared memory machines

have been classified as UMA and NUMA,

based upon memory access times.

Taken from: W. Wolf High-Performance Embedded Computing,

https://en.wikipedia.org/wiki/Shared_memory#/media/File:Shared_memory.svg,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

152

Memory Systems

Memory Systems

• Shared Memory (Uniform Memory

Access UMA)

• Most commonly represented today by

Symmetric Multiprocessor (SMP)

machines.

• Identical processors.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

153

Memory Systems

Memory Systems

• Shared Memory (Uniform Memory

Access UMA)

• Equal access and access times to

memory.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

154

Memory Systems

Memory Systems

• Shared Memory (Uniform Memory Access

UMA)

• Sometimes called CC-UMA - Cache

Coherent UMA. Cache coherent means if one

processor updates a location in shared

memory, all the other processors know about

the update. Cache coherency is accomplished

at the hardware level.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

155

Memory Systems

Memory Systems

• Shared Memory (Non-Uniform Memory

Access NUMA)

• Often made by physically linking two or

more SMPs.

• One SMP can directly access memory of

another SMP.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

156

Memory Systems

Memory Systems

• Shared Memory (Non-Uniform Memory

Access NUMA)

• Not all processors have equal access time to

all memories.

• Memory access across link is slower

• If cache coherency is maintained, then may

also be called CC-NUMA - Cache Coherent

NUMA.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

157

Memory Systems

Memory Systems

• Shared Memory

• Advantages

• Global address space provides a user-friendly programming perspective to memory.

• Data sharing between tasks is both fast

and uniform due to the proximity of

memory to CPUs.

Taken from: W. Wolf High-Performance Embedded Computing,,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

158

Memory Systems

Memory Systems

• Shared Memory

• Disadvantages

• The primary disadvantage is the lack of scalability between memory and CPUs. Adding more CPUs can geometrically increase traffic on the shared memory-CPU path and, for cache-coherent systems, geometrically increase the traffic associated with cache/memory management.

Taken from: W. Wolf High-Performance Embedded Computing,,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

159

Memory Systems

Memory Systems

• Shared Memory

• Disadvantages

• Programmer responsibility for

synchronization constructs that ensure

"correct" access of global memory.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

160

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory Systems Architecture: shared memory, distributed memory, hybrid memory.

161

Memory Systems

Memory Systems

• Distributed Memory

• Like shared memory systems, distributed

memory systems vary widely but share a

common characteristic.

• Distributed memory systems require a communication network to connect inter-processor memory.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

162

Memory Systems

Memory Systems

• Distributed Memory

• Processors have their own local memory.

Memory addresses in one processor do not

map to another processor, so there is no

concept of global address space across all

processors.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

163

Memory Systems

Memory Systems

• Distributed Memory

• Because each processor has its own local

memory, it operates independently.

Changes it makes to its local memory have

no effect on the memory of other

processors. Hence, the concept of cache

coherency does not apply.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

164

Memory Systems

Memory Systems

• Distributed Memory

• When a processor needs access to data in

another processor, it is usually the task of

the programmer to explicitly define how

and when data is communicated.

Synchronization between tasks is likewise

the programmer's responsibility.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

165

Memory Systems

Memory Systems

• Distributed Memory

• The network "fabric" used for data transfer

varies widely, though it can be as simple as

Ethernet.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

166

Memory Systems

Memory Systems

• Distributed Memory

• Advantages

• Memory is scalable with the number

of processors. Increase the number of

processors and the size of memory

increases proportionately.

Taken from: W. Wolf High-Performance Embedded Computing,

https://en.wikipedia.org/wiki/Shared_memory#/media/File:Shared_memory.svg,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

167

Memory Systems

Memory Systems

• Distributed Memory

• Advantages

• Each processor can rapidly access its

own memory without interference and

without the overhead incurred with

trying to maintain global cache

coherency.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

168

Memory Systems

Memory Systems

• Distributed Memory

• Advantages

• Cost effectiveness: can use

commodity, off-the-shelf processors

and networking.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

169

Memory Systems

Memory Systems

• Distributed Memory

• Disadvantages

• The programmer is responsible for

many of the details associated with data

communication between processors.

• It may be difficult to map existing data

structures, based on global memory, to

this memory organization.

Taken from: W. Wolf High-Performance Embedded Computing,

https://en.wikipedia.org/wiki/Shared_memory#/media/File:Shared_memory.svg,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

170

Memory Systems

Memory Systems

• Distributed Memory

• Disadvantages

• Non-uniform memory access times -

data residing on a remote node takes

longer to access than node local data.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

171

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory Systems Architecture: shared memory, distributed memory, hybrid memory.

172

Memory Systems

Memory Systems

• Hybrid Memory

• The largest and fastest computers in the

world today employ both shared and

distributed memory architectures.

• The shared memory component can be a

shared memory machine and/or graphics

processing units (GPU).

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

173

Memory Systems

Memory Systems

• Hybrid Memory

• The distributed memory component is

the networking of multiple shared

memory/GPU machines, which know

only about their own memory - not the

memory on another machine. Therefore,

network communications are required to

move data from one machine to another.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

174

Memory Systems

Memory Systems

• Hybrid Memory

• Current trends seem to indicate that this

type of memory architecture will

continue to prevail and increase at the

high end of computing for the

foreseeable future.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

175

Memory Systems

Memory Systems

• Hybrid Memory

• Advantages and Disadvantages

• Whatever is common to both shared and

distributed memory architectures.

• Increased scalability is an important

advantage.

• Increased programmer complexity is an

important disadvantage.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

176

Memory Systems

Design Memory Systems?

Taken from: W. Wolf High-Performance Embedded Computing,

177

Memory Systems

Design Memory Systems

A simple model of memory components for parallel memory design would include

three major parameters of a memory component of a given size.

• Area: The physical size of the logical component. This is most important in chip design, but it also

relates to cost in board design.

• Performance: The access time of the component. There may be more than one parameter, with

variations for read and write times, page mode accesses, and so on.

• Energy: The energy required per access. If performance is characterized by multiple modes, energy

consumption will exhibit similar modes.

Taken from: W. Wolf High-Performance Embedded Computing,

178

Memory Systems

Design Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing,

179

Memory Systems

Memory Systems

Taken from: https://www.xataka.com/ordenadores/el-cuello-de-botella-de-la-ley-de-moore-no-esta-en-los-procesadores-sino-en-las-memorias

180

Memory Systems

Memory Systems

Taken from: https://www.xataka.com/ordenadores/el-cuello-de-botella-de-la-ley-de-moore-no-esta-en-los-procesadores-sino-en-las-memorias

Outline

181

• Multiprocessors Architecture and Taxonomy

• Parallel Execution Mechanism

• Multiprocessors Design Techniques

• Memory Systems

• Processors Symmetry

• Co-processing

182

Processors Symmetry

Taken from: W. Wolf High-Performance Embedded Computing

Multi-processing: symmetric (SMP), asymmetric (AMP).

183

Processors Symmetry

Taken from: W. Wolf High-Performance Embedded Computing

Multi-processing: symmetric (SMP), asymmetric (AMP).

184

Processors Symmetry

Taken from: M. Aguilar SoCs

Symmetric Multi-processing (SMP)

• A system with multiple processors or cores that communicate through a single shared memory and are controlled by a single operating system.

185

Processors Symmetry

Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/

Symmetric Multi-processing (SMP)

• Identical: All the processors are treated equally i.e. all are identical.

• Communication: Shared memory is the mode of communication among

processors.

• Complexity: Complex in design, as all units share the same memory and data bus.

• Expensive: They are costlier in nature.

• Unlike asymmetric multiprocessing, where OS tasks are handled only by the master processor, here the operating system tasks are handled by each processor individually.
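As a small illustration of the "any task on any processor" property, the Linux-specific sketch below (a GNU extension; the core index 1 is an arbitrary choice) pins the calling thread to one core of an SMP system, overriding the scheduler's default freedom to place it anywhere.

/* Pin the calling thread to CPU core 1 on a Linux SMP system.
 * Compile with: cc affinity.c -pthread                          */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set);                       /* core index chosen for the example */

    /* By default the SMP scheduler may run this thread on any core;
     * setting an affinity mask restricts it to the cores in "set".   */
    if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        fprintf(stderr, "failed to set affinity\n");
    else
        printf("thread pinned to core 1\n");
    return 0;
}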

186

Processors Symmetry

Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/

Symmetric Multi-processing (SMP)

• Applications

• This concept finds application in parallel processing, where time-sharing systems (TSS) assign tasks to different processors running in parallel, and also in TSS that use multithreading, i.e., multiple threads running simultaneously.

187

Processors Symmetry

Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/

Symmetric Multi-processing (SMP)

• Advantages

• Throughput: Since tasks can be run by any of the processors (unlike in asymmetric systems), the throughput (processes executed per unit time) increases.

• Reliability: A failing processor does not bring down the whole system, since all processors are equally capable, although throughput does drop a little.

188

Processors Symmetry

Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/

Symmetric Multi-processing (SMP)

• Disadvantages

• Complex design: Since all the processors are treated equally by the OS, designing and managing such an OS becomes difficult.

• Costlier: As all the processors share the common main memory, a larger memory is required, which makes the system more expensive.

189

Processors Symmetry

Taken from: https://www.enea.com/globalassets/downloads/operating-systems/enea-oseck/enea-smp-platform-for-xilinx-zynq-datasheet.pdf

Symmetric Multi-processing (SMP)

190

Processors Symmetry

Taken from: https://www.enea.com/globalassets/downloads/operating-systems/enea-oseck/enea-smp-platform-for-xilinx-zynq-datasheet.pdf

Symmetric Multi-processing (SMP)

More information here

191

Processors Symmetry

Taken from: W. Wolf High-Performance Embedded Computing

Multi-processing: symmetric (SMP), asymmetric (AMP).

192

Processors Symmetry

Taken from: M. Aguilar SoC Lectures Notes

Asymmetric Multi-processing (AMP)

• A system with multiple processors or cores that communicate through a single shared memory, where each processor or core is controlled by an independent operating system (the same or a different one).

193

Processors Symmetry

Asymmetric Multi-processing (AMP)

• Characteristics

• Processors are not treated equally.

• Tasks of the operating system are done by master processor.

• No direct communication between processors, as they are controlled by the master processor.

• Processors follow a master-slave relationship.

• Systems are cheaper.

• Systems are easier to design.

Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/

194

Processors Symmetry

Taken from: https://www.openampproject.org/old_website/docs/mca/BKK19%20OpenAMP%20Introduction.pdf

Asymmetric Multi-processing (AMP)

195

Processors Symmetry

Taken from: https://www.openampproject.org/old_website/docs/mca/BKK19%20OpenAMP%20Introduction.pdf

Asymmetric Multi-processing (AMP)

196

Processors Symmetry

Taken from: https://www.openampproject.org/old_website/docs/mca/BKK19%20OpenAMP%20Introduction.pdf

Asymmetric Multi-processing (AMP)

197

Processors Symmetry

Asymmetric Multi-processing (AMP)

Taken from: https://github.com/OpenAMP/open-amp

198

Processors Symmetry

Asymmetric Multi-processing (AMP)

Taken from: https://github.com/OpenAMP/open-amp

199

Processors Symmetry

Taken from: https://www.openampproject.org/old_website/docs/mca/BKK19%20OpenAMP%20Introduction.pdf

Asymmetric Multi-processing (AMP)

Outline

200

• Multiprocessors Architecture and Taxonomy

• Parallel Execution Mechanism

• Multiprocessors Design Techniques

• Memory Systems

• Processors Symmetry

• Co-processing

201

Co-processing

Taken from: https://www.researchgate.net/publication/250840737_Automatic_Generation_of_Application-

Specific_Architectures_for_Heterogeneous_MPSoC_through_Combination_of_Processors/figures

202

Co-processing

Taken from: https://www.researchgate.net/publication/250840737_Automatic_Generation_of_Application-

Specific_Architectures_for_Heterogeneous_MPSoC_through_Combination_of_Processors/figures

203

Co-processing

Taken from: https://www.researchgate.net/publication/250840737_Automatic_Generation_of_Application-

Specific_Architectures_for_Heterogeneous_MPSoC_through_Combination_of_Processors/figures

204

Co-processing

Taken from: http://www.cecs.uci.edu/~papers/esweek06/codes/p288.pdf

205

Co-processing

Taken from: https://www.researchgate.net/publication/221656884_A_Generic_Wrapper_Architecture_for_Multi-

Processor_SoC_Cosimulation_and_Design/figures?lo=1

206

Co-processing

Taken from: https://link.springer.com/chapter/10.1007/978-3-319-01113-4_1

207

Co-processing

What is a coprocessor?

208

Co-processing

A coprocessor is:

• A computer processor used to supplement the functions of the primary processor.

• Several operations can be performed by the coprocessor, such as:

• Floating Point (FPU).

• Graphics Processing.

• Signal Processing.

• Cryptography.

• Etc.

Taken from: https://youtu.be/xrMUv9ZVKY0

209

Co-processing

A coprocessor is:

• By offloading processor-intensive tasks from the main processor, a coprocessor can accelerate system performance.

• Coprocessors allow a line of computers to be customized, so that customers who

do not need extra performance need not pay for it.

Taken from: https://youtu.be/xrMUv9ZVKY0

210

Co-processing

Functions

• A coprocessor may not be a general-purpose processor.

• Coprocessors cannot fetch instructions from memory, execute program flow control instructions, do input/output operations, manage memory, and so on.

• The coprocessor requires the host (main) processor to fetch the coprocessor

instructions and handle all other operations aside from the coprocessor functions.

• In some architectures the coprocessor is a more general-purpose computer but

carries out only a limited range of functions under the close control of a

supervisory processor.

Taken from: https://youtu.be/xrMUv9ZVKY0

211

Co-processing

Taken from: https://www.doulos.com/knowhow/arm/using_your_c_compiler_to_exploit_neon/Resources/using_your_c_compiler_to_exploit_neon.pdf

Coprocessor

212

Co-processing

NEON Arm

• With the ARMv7-A architecture, ARM introduced a powerful SIMD implementation called NEON™.

• NEON is a coprocessor which comes with its own instruction set for vector

operations.

• Most vector operations carry out the same operation on all elements of their

operand vector(s) in parallel.

• Using your C compiler to exploit NEON™ Advanced SIMD.

Taken from: https://youtu.be/xrMUv9ZVKY0

213

Co-processing

NEON Arm

• The goal of NEON is to provide a powerful, yet comparatively easy to program

SIMD instruction set that covers integer data types of up to 64-bit width as well

as single precision floating point (32 bit).

• It shares its sixteen 128-bit registers with the vector floating-point unit.

• Executed on the same processor core, NEON performance is influenced by

context switching overhead, non-deterministic memory access latency

(cache/MMU access) and interrupt handling.

Taken from: https://youtu.be/xrMUv9ZVKY0
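A small NEON intrinsics sketch in C (the array contents are arbitrary): the same add is applied to four float lanes at once, which is the kind of vector operation described above.

/* NEON intrinsics sketch: add two 4-lane float vectors in one operation.
 * Compile for ARMv7-A with: cc neon_add.c -mfpu=neon                     */
#include <arm_neon.h>
#include <stdio.h>

int main(void) {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4];

    float32x4_t va = vld1q_f32(a);       /* load 4 floats into a 128-bit register */
    float32x4_t vb = vld1q_f32(b);
    float32x4_t vc = vaddq_f32(va, vb);  /* one instruction, four additions        */
    vst1q_f32(c, vc);                    /* store the result back to memory        */

    printf("%f %f %f %f\n", c[0], c[1], c[2], c[3]);
    return 0;
}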

214

Co-processing

NEON Arm

Taken from: https://youtu.be/xrMUv9ZVKY0

215

Co-processing

NEON Arm

Taken from: https://youtu.be/xrMUv9ZVKY0

216

Co-processing

NEON Arm

Taken from: https://youtu.be/xrMUv9ZVKY0

217

Co-processing

NEON Arm

Taken from: https://youtu.be/xrMUv9ZVKY0

218

Co-processing

NEON Arm

Taken from: https://youtu.be/xrMUv9ZVKY0

219

Co-processing

NEON Arm

Taken from: https://youtu.be/xrMUv9ZVKY0

220

Co-processing

DSPs

Taken from: Introducción a los Sistemas Empotrados (Introduction to Embedded Systems) Lecture Notes

221

Co-processing

DSPs

Taken from: M. Aguilar SoC Lectures Notes

222

Co-processing

DSPs

Taken from: M. Aguilar SoC Lectures Notes

223

Co-processing

GPU

Taken from: https://www.anandtech.com/show/14101/nvidia-announces-jetson-nano

224

Co-processing

GPU

Taken from: https://www.anandtech.com/show/14101/nvidia-announces-jetson-nano

225

Co-processing

Flight controller UAV

Taken from: https://cdn.sparkfun.com/assets/d/d/9/9/3/Pixhawk4-DataSheet.pdf

226

Co-processing

Flight controller UAV

Taken from: https://cdn.sparkfun.com/assets/d/d/9/9/3/Pixhawk4-DataSheet.pdf

227

References

[1] Lecture Notes, Tecnológico de Costa Rica, SoC Course.

[2] W. Wolf. High-Performance Embedded Computing: Architectures, Applications

and Methodologies. Elsevier, United States of America, 2007.

[3] E. A. Lee and S. A. Seshia. Introduction to Embedded Systems, 2017.

Lectures notes and materials are available in TEC-Digital and web portal

www.ie.tec.ac.cr/sarriola/HPEC

www.ie.tec.ac.cr/joaraya

228