F28HS Hardware-Software Interface: Systems...
F28HS Hardware-Software Interface: Systems Programming
Hans-Wolfgang Loidl
School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh
Semester 2 — 2019/20
No proprietary software has been used in producing these slides.
Hans-Wolfgang Loidl (Heriot-Watt Univ) F28HS Hardware-Software Interface 2019/20
Outline
1 Lecture 1: Introduction to Systems Programming
2 Lecture 2: Systems Programming with the Raspberry Pi
3 Lecture 3: Memory Hierarchy
  - Memory Hierarchy
  - Principles of Caches
4 Lecture 5: Exceptional Control Flow and Signals
5 Lecture 6: Computer Architecture
  - Processor Architectures Overview
  - Pipelining
6 Lecture 10: Revision
Lecture 1: Introduction to Systems Programming
Introduction to Systems Programming
- This course focuses on how hardware and systems software work together to perform a task.
- We take a programmer-oriented view and focus on software and hardware issues that are relevant for developing fast, secure, and portable code.
- Performance is a recurring theme in this course.
- You need to grasp a lot of low-level technical issues in this course.
- In doing so, you become a “power programmer”.
Why is this important?
You need to understand issues at the hardware/software interface, in order to
- understand and improve performance and resource consumption of your programs, e.g. by developing cache-friendly code;
- avoid programming pitfalls, e.g. numerical overflows;
- avoid security holes, e.g. buffer overflows;
- understand details of the compilation and linking process.
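As a small illustration of the numerical-overflow pitfall named above, the following sketch tests whether an `int` addition would overflow before performing it. The function name `checked_add` is an assumption made for this example and is not part of the course material.

```c
#include <limits.h>
#include <stdbool.h>

/* Adds a and b and stores the sum in *result only if it fits in an int.
   Signed overflow is undefined behaviour in C, so the range check has to
   happen before the addition, not after it. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;   /* the sum would not be representable */
    }
    *result = a + b;
    return true;
}
```

A caller would test the return value and only use `*result` when it is `true`.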
Questions to be addressed
For each of these issues we will address several common questions on the hardware/software interface:
- Optimizing program performance:
  - Is a switch statement always more efficient than a sequence of if-else statements?
  - How much overhead is incurred by a function call?
  - Is a while loop more efficient than a for loop?
  - Are pointer references more efficient than array indexes?
  - Why does our loop run so much faster if we sum into a local variable instead of an argument that is passed by reference?
  - How can a function run faster when we simply rearrange the parentheses in an arithmetic expression?
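The question about summing into a local variable can be made concrete with two variants of the same loop. This is a minimal sketch with hypothetical function names; whether a speed difference is measurable depends on the compiler, the optimisation level, and the hardware.

```c
#include <stddef.h>

/* Variant 1: accumulates directly into *dest. Because dest might point
   into the array a, the compiler generally has to read and write memory
   on every iteration. */
void sum_into_reference(const long *a, size_t n, long *dest)
{
    *dest = 0;
    for (size_t i = 0; i < n; i++) {
        *dest += a[i];
    }
}

/* Variant 2: accumulates in a local variable, which the compiler can
   keep in a register, and writes the result to memory once at the end. */
void sum_into_local(const long *a, size_t n, long *dest)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += a[i];
    }
    *dest = acc;
}
```

Both functions compute the same sum when `dest` does not point into `a`; they differ only in how often memory is touched inside the loop.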
Questions to be addressed
- Understanding link-time errors:
  - What does it mean when the linker reports that it cannot resolve a reference?
  - What is the difference between a static variable and a global variable?
  - What happens if you define two global variables in different C files with the same name?
  - What is the difference between a static library and a dynamic library?
  - Why does it matter what order we list libraries on the command line?
  - Why do some linker-related errors not appear until run time?
- Avoiding security holes:
  - How can an attacker exploit a buffer overflow vulnerability?
Compilation of hello world
[Figure: the compilation chain. hello.c (source program, text) -> pre-processor (cpp) -> hello.i (modified source program, text) -> compiler (cc1) -> hello.s (assembly program, text) -> assembler (as) -> hello.o (relocatable object programs, binary) -> linker (ld), which also takes printf.o -> hello (executable object program, binary).]
- We have seen individual phases in the compilation chain so far (e.g. assembly).
- Calling gcc at the top level picks the starting point of the chain, depending on the file extension, and generates binary code.
- You can view the intermediate files of the compilation using the gcc flag -save-temps.
- This is useful for checking, e.g., which assembler code is generated by the compiler.
- We will be using -D flags to control the behaviour of the pre-processor on the front end.
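To show how a -D flag steers the pre-processor, here is a minimal sketch. The macro name `VERBOSE`, the file name `trace.c` and both function names are assumptions chosen for this example. Compiling with `gcc -DVERBOSE -save-temps -c trace.c` should leave the pre-processed file `trace.i` behind, where you can check which branch of the `#ifdef` survived.

```c
#include <stdio.h>

/* Reports whether the file was compiled with -DVERBOSE.
   VERBOSE is a hypothetical macro name used only in this sketch. */
int trace_enabled(void)
{
#ifdef VERBOSE
    return 1;   /* this branch is kept when compiled with -DVERBOSE */
#else
    return 0;   /* this branch is kept when the macro is not defined */
#endif
}

/* Prints msg to stderr only in builds where tracing is enabled. */
void trace(const char *msg)
{
    if (trace_enabled()) {
        fprintf(stderr, "trace: %s\n", msg);
    }
}
```

The pre-processor removes the unused branch before the compiler proper ever sees it, so the choice costs nothing at run time.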
The Shell
Your window to the system is the shell, which is an interpreter for commands issued to the system:

    host> echo "Hello world"
    Hello world
    host> ls
    ...

The Linux Introduction in F27PX-Praxis gave you an overview of what you can do in a shell. In this course, we make heavy use of the shell. Check the later sections in the on-line Linux Introduction, which explain some of the more advanced concepts.
Hardware organisation of a typical system
[Figure: hardware organisation of a typical system. The CPU (PC, register file, ALU, bus interface) is connected by the system bus to the I/O bridge, which is connected by the memory bus to main memory and by the I/O bus to the USB controller (mouse, keyboard), the graphics adapter (display), the disk controller (disk, with the hello executable stored on disk), and expansion slots for other devices such as network adapters.]
From Bryant and O’Hallaron, Ch 1
Components
The picture on the previous slide mentions several important concepts:
- Processor: the Central Processing Unit (CPU) is the engine that executes instructions; modern CPUs are complicated in order to provide additional performance (multi-core, pipelining, caches etc);
- Main Memory: temporary storage for both program and data; arranged as a sequence of dynamic random access memory (DRAM) chips;
- Buses transmit information, as byte streams, between components of the hardware; the Universal Serial Bus (USB) is the most common connection for external devices;
- I/O devices are in charge of input/output and represent the interface of the hardware to the external world.
The Hello World Program
    #include <stdio.h>

    int main(void)
    {
        printf("hello, world\n");
        return 0;
    }

What happens when we compile and execute this hello world program?
Compiling Hello World
When we compile the program by calling

    gcc -o hello hello.c

the compilation chain is executed. Note:
- The source code of Hello World is represented in ASCII characters and stored in a file.
- The contents of the file are just a sequence of bytes.
- The context determines whether these bytes are interpreted as text or as graphics etc.

The next slides show what happens when we execute the resulting binary:

    ./hello
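The note that a file's contents are just a sequence of bytes can be illustrated with a short sketch that renders the same bytes as hexadecimal numbers instead of as text. The helper name `bytes_as_hex` is an assumption made for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the n bytes of buf into out as two-digit hex values separated
   by single spaces. out must provide at least 3 * n bytes (and at least
   1 byte when n is 0). Returns out. */
char *bytes_as_hex(const unsigned char *buf, size_t n, char *out)
{
    char *p = out;
    for (size_t i = 0; i < n; i++) {
        p += sprintf(p, "%02x", buf[i]);   /* two hex digits per byte */
        if (i + 1 < n) {
            *p++ = ' ';                    /* separator between bytes */
        }
    }
    *p = '\0';
    return out;
}
```

On a system that uses ASCII, the five bytes of the text "hello" come out as `68 65 6c 6c 6f`: the same bytes, read in a different context.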
1. Reading the hello program from the keyboard
[Figure: the hardware organisation diagram, with the string "hello" typed by the user at the keyboard and stored in main memory.]
The shell reads ./hello from the keyboard and stores it in memory; it then initiates loading the executable file from disk into memory.
From Bryant and O’Hallaron, Ch 1
2. Reading the executable from disk to main memory
[Figure: the hardware organisation diagram, with the hello executable stored on disk and the hello code and the string "hello, world\n" copied into main memory.]
Using direct memory access (DMA), the data travels from disk directly to main memory.
From Bryant and O’Hallaron, Ch 1
3. Writing the output string from memory to display
[Figure: the hardware organisation diagram, with the hello code and the string "hello, world\n" in main memory and the string "hello, world\n" shown on the display.]
Once the code and data in the hello object file are loaded into memory, the processor begins executing the machine-language instructions in the hello program’s main routine.
From Bryant and O’Hallaron, Ch 1
Caches
- Copying data from memory to the CPU is slow compared to performing an arithmetic or logic operation.
- This difference is called the processor-memory gap, and it is increasing with newer generations of processors.
- Copying data from disk is even slower.
- On the other hand, these slower devices provide more capacity.
- To speed up the computation, smaller, faster storage devices called cache memories are used.
- These cache memories (or just caches) serve as temporary staging areas for information that the processor is likely to need in the near future.
Cache memories
[Figure: the CPU chip contains the register file, the ALU, the cache memories and the bus interface; the system bus, the I/O bridge and the memory bus connect the chip to main memory.]
- An L1 cache on the processor chip holds tens of thousands of bytes and can be accessed nearly as fast as the register file.
- A larger L2 cache with hundreds of thousands to millions of bytes is connected to the processor by a special bus.
- It might take 5 times longer for the processor to access the L2 cache than the L1 cache, but this is still 5 to 10 times faster than accessing the main memory.
- The L1 and L2 caches are implemented with a hardware technology known as static random access memory (SRAM).
- Newer systems even have three levels of cache: L1, L2, and L3.
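Caches reward code that touches memory in the order in which it is laid out. The following sketch sums the same matrix in two traversal orders; the function names and the flat row-by-row layout are assumptions made for this example, and actual timings depend on the matrix size and the cache sizes of the machine.

```c
#include <stddef.h>

/* Row-major traversal: visits the elements in the order in which they
   are stored, so consecutive accesses tend to fall in the same cache line. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long sum = 0;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            sum += m[i * cols + j];
        }
    }
    return sum;
}

/* Column-major traversal: consecutive accesses are cols elements apart,
   so for large matrices each access may touch a different cache line. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    long sum = 0;
    for (size_t j = 0; j < cols; j++) {
        for (size_t i = 0; i < rows; i++) {
            sum += m[i * cols + j];
        }
    }
    return sum;
}
```

Both functions return the same result; only the order of the memory accesses differs, which is what makes the first variant the cache-friendly one.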
Caches and Memory Hierarchy
Levels of the memory hierarchy, from smaller, faster, and costlier (per byte) storage devices at the top to larger, slower, and cheaper (per byte) storage devices at the bottom:
L0: Regs. CPU registers hold words retrieved from cache memory.
L1: L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache.
L2: L2 cache (SRAM). L2 cache holds cache lines retrieved from L3 cache.
L3: L3 cache (SRAM). L3 cache holds cache lines retrieved from memory.
L4: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L5: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L6: Remote secondary storage (distributed file systems, Web servers).
The Role of the Operating System
[Figure: layered view of a computer system. Application programs and the operating system form the software layers; processor, main memory, and I/O devices form the hardware layer. From Bryant and O’Hallaron, Ch 1.]
- We can think of the operating system as a layer of software interposed between the application program and the hardware.
- All attempts by an application program to manipulate the hardware must go through the operating system.
- This enhances the security of the system, but also generates some overhead.
- In this course we are mainly interested in the interface between the Software and Hardware layers in the picture above.
Goals of the Operating System
The operating system has two primary purposes:
- to protect the hardware from misuse by runaway applications, and
- to provide applications with simple and uniform mechanisms for manipulating complicated and often wildly different low-level hardware devices.

The operating system achieves both goals via three fundamental abstractions: processes, virtual memory, and files.
[Figure: the abstractions processes, virtual memory, and files, drawn over the hardware components processor, main memory, and I/O devices.]
Basic Concepts
In this overview we will cover the following basic concepts:
- Processes
- Threads
- Virtual memory
- Files
Processes
- A process is the operating system’s abstraction for a running program.
- It provides the illusion of having exclusive access to the entire machine.
- Multiple processes can run concurrently.
- The OS mediates the access to the hardware, and prevents processes from overwriting each other’s memory.
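The claim that processes cannot overwrite each other's memory can be observed with the POSIX calls `fork` and `waitpid`. This is a minimal sketch for a Linux or other POSIX system; the variable `counter` and the function name are assumptions made for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 1;   /* each process gets its own copy after fork */

/* Forks a child that overwrites its own copy of counter and exits.
   Returns the value the parent sees afterwards, or -1 on error. */
int value_after_child_write(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;             /* fork failed */
    }
    if (pid == 0) {
        counter = 42;          /* changes the child's copy only */
        _exit(0);              /* leave the child without running parent code */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return counter;            /* the parent's copy is untouched */
}
```

The parent still sees the value 1, because the child's write went to the child's own address space.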
Concurrency vs Parallelism vs Threads
- Concurrent execution means that the instructions of one process are interleaved with the instructions of another process.
- The operating system performs this interleaving with a mechanism known as context switching.
- The context of a process consists of: the program counter (PC), the register file, and the contents of main memory.
- The processes appear to run simultaneously, but in reality at each point in time the CPU is executing the instructions of just one process.
- On multi-core systems, where a CPU contains several independent processors, two processes can be executed in parallel, running on separate cores.
- In this case, both processes are genuinely running simultaneously.
- The main goal of parallelism is to make programs run faster.
- A process can itself consist of multiple threads.
Example of Context Switching
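This example shows the context switching that is happening between the shell process and the hello process, when running our hello world example.
[Figure: timeline of Process A and Process B. Process A runs user code until it issues a read; kernel code performs a context switch to Process B, which runs user code until a disk interrupt arrives; kernel code performs a second context switch, and Process A continues with user code after the return from read.]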
Different Forms of Concurrency
Concurrency can be exploited at different levels:
- Thread-level concurrency: A program explicitly creates several threads with independent control flows. Each thread typically represents a large piece of computation. Shared memory or message passing can be used to exchange data.
- Instruction-Level Parallelism: The components of the CPU can be arranged in a way so that the CPU executes several instructions at the same time. For example, while one instruction is performing an ALU operation, the data for the next instruction can be loaded from memory (“pipelining”).
- Single-Instruction, Multiple-Data (SIMD) Parallelism: Modern processor architectures provide vector operations that execute an operation, such as addition, over a sequence of values (“vectors”), rather than just two values. Graphics cards make heavy use of this form of parallelism to speed up graphics operations.
Categorizing different processor configurations
[Figure: categorisation of processor configurations. All processors divide into uniprocessors and multiprocessors; multiprocessors include multi-core and hyper-threaded processors.]
- Uniprocessors, with only one CPU, need to context-switch in order to run several processes seemingly at the same time.
- Multiprocessors replicate certain components of the hardware to genuinely run processes at the same time:
  - Multicores replicate the entire CPU, as several “cores”, each of which can run a process.
  - Hyperthreaded machines replicate hardware to store the context of several processes to speed up context-switching.
Virtual Memory
Virtual memory is an abstraction that provides each process with the illusion that it has exclusive use of the main memory. Each process has the same uniform view of memory, which is known as its virtual address space.
[Figure: the virtual address space of a process, from the top address down to address 0: kernel virtual memory (memory invisible to user code); user stack (created at runtime); memory mapped region for shared libraries (containing the printf function); run-time heap (created by malloc); read/write data and read-only code and data (both loaded from the hello executable file), starting at 0x08048000 (32) or 0x00400000 (64).]
NB: The top region of the address space is reserved for the OS. User processes are not allowed to write into this area!
Virtual Memory
The lower region holds the data for the user. The user space is separated into several areas, with different roles:
- The code and data area contains the program code and initialised data, starting at a fixed address. The program code is read only, the data is read/write.
- The heap contains data that is dynamically allocated during the execution of the program. In high-level languages, such as Java, any new will allocate in the heap. In low-level languages, such as C, you can use the library function malloc to dynamically allocate data in the heap.
- The shared data section holds dynamically allocated data, managed by shared libraries.
- The stack is a dynamic area at the top of the user memory, growing downwards. It is used to hold the local data of functions whenever a function is called during program execution.
- The topmost section of the virtual memory is allocated to kernel virtual memory, and is only accessible to the OS kernel.
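To see these areas on a real system, the following sketch prints one sample address from the code, data, heap and stack areas. The names `initialised_global` and `print_regions` are assumptions made for this example; the concrete addresses differ between machines and, with address space randomisation, between runs, so no particular output is claimed here.

```c
#include <stdio.h>
#include <stdlib.h>

int initialised_global = 7;   /* read/write data area of the executable */

/* Prints one sample address from each user-space area to out.
   Returns 0 on success, or -1 if the heap allocation fails. */
int print_regions(FILE *out)
{
    int local = 0;                            /* lives on the stack */
    int *dynamic = malloc(sizeof *dynamic);   /* lives on the heap */
    if (dynamic == NULL) {
        return -1;
    }

    /* Converting a function pointer to void * is not strict ISO C,
       but POSIX systems such as Linux support it. */
    fprintf(out, "code  : %p\n", (void *)print_regions);
    fprintf(out, "data  : %p\n", (void *)&initialised_global);
    fprintf(out, "heap  : %p\n", (void *)dynamic);
    fprintf(out, "stack : %p\n", (void *)&local);

    free(dynamic);
    return 0;
}
```

Calling `print_regions(stdout)` from `main` on a typical Linux system usually shows the stack address far above the heap address, matching the layout described above.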
Virtual Memory
- Virtual memory gives the illusion of a contiguous address space, exceeding main memory, with exclusive access.
- It abstracts over the limitations of physical main memory and allows several parallel threads to access the same address space.
- We will discuss this aspect in more detail in the Lecture on “Memory Hierarchy”.
Files
- A file is a sequence of bytes.
- A file can be used to model any I/O device: disk, keyboard, mouse, network connections etc.
- Files can also be used to store data about the hardware (the /proc/ file system), or to control the system, e.g. by writing to files.
- Thus, the concept of a file is a very powerful abstraction that can be used for many different purposes.
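Because devices are modelled as files, the same system calls work for a regular file and for a device. The sketch below uses the POSIX calls `open`, `read` and `close`; the helper name `read_bytes` is an assumption made for this example, and `/dev/zero` is a standard Linux device file that delivers zero bytes.

```c
#include <fcntl.h>
#include <unistd.h>

/* Reads up to n bytes from the file or device at path into buf.
   Returns the number of bytes read, or -1 if opening or reading fails. */
ssize_t read_bytes(const char *path, unsigned char *buf, size_t n)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    ssize_t got = read(fd, buf, n);   /* same call for files and devices */
    close(fd);
    return got;
}
```

For instance, `read_bytes("/dev/zero", buf, 8)` fills `buf` with eight zero bytes, while the same call with the path of a regular file returns that file's first bytes.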
External Devices
- An important task of the OS/code is to interact with external devices.
- We will see this in detail on the RPi2.
- From the OS point of view, external devices and network connections are files that can be written to and read from.
- When writing to such a special file, the OS sends the data to the corresponding network device.
- When reading from such a special file, the OS reads data from the corresponding network device.
- This file abstraction simplifies network communication, but is also a source of additional communication overhead.
- Therefore, high performance libraries tend to avoid this “software stack” of implementing file read/write in the OS, and instead read from and write to the device directly (in the same way that we will be using these devices).
A network is another I/O device
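[Figure: the hardware organisation diagram (CPU chip with PC, register file, ALU, and bus interface; system bus; I/O bridge; memory bus; main memory; I/O bus with USB controller, graphics adapter, disk controller, and expansion slots), with a network adapter attached to the I/O bus and connected to the network.]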
The network can be viewed as just another I/O device.
The Role of Abstraction
- In order to tackle system complexity, abstraction is a key concept.
- For example, an application program interface (API) abstracts from the internals of an implementation, and only describes its core functionality.
- Java class declarations and C prototypes are programming language features to facilitate abstraction.
- The instruction set architecture abstracts over details of the hardware, so that the same instructions can be used for different realisations of a processor.
- On the level of the operating system, key abstractions are
  - processes (as abstractions of a running program),
  - files (as abstractions of I/O), and
  - virtual memory (as an abstraction of main memory).
- A newer form of abstraction is a virtual machine, which abstracts over an entire computer.
Some abstractions provided by a computer system
[Figure: layered abstractions over the processor, main memory and I/O devices. Labels: instruction set architecture, files, virtual memory, processes, operating system, virtual machine.]
A major theme in computer systems is to provide abstract representations at different levels to hide the complexity of the actual implementations.
![Page 38: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/38.jpg)
Reading List: Systems Programming
- David A. Patterson, John L. Hennessy. “Computer Organization and Design: The Hardware/Software Interface”, ARM edition, Morgan Kaufmann, Apr 2016. ISBN-13: 978-0128017333.
- Randal E. Bryant, David R. O’Hallaron. “Computer Systems: A Programmer’s Perspective”, 3rd edition, Pearson, 7 Oct 2015. ISBN-13: 978-1292101767.
- Bruce Smith. “Raspberry Pi Assembly Language: Raspbian”, 2nd edition, CreateSpace Independent Publishing Platform, 19 Aug 2013. ISBN-13: 978-1492135289.
![Page 39: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/39.jpg)
Other Online Resources
- Gordon Henderson. “WiringPi library: GPIO Interface library for the Raspberry Pi”, http://wiringpi.com/
- Valvers. “Bare Metal Programming in C”, http://www.valvers.com/open-software/raspberry-pi/step01-bare-metal-programming-in-cpt1/
- Alex Chadwick, Univ of Cambridge. “Baking Pi”, https://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os
![Page 40: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/40.jpg)
Lecture 2: Systems Programming with the Raspberry Pi
![Page 41: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/41.jpg)
SoC: System-on-Chip
- A System-on-Chip (SoC) integrates all components of a computer or other electronic system into a single chip.
- One of the main advantages of SoCs is their low power consumption.
- Therefore they are often used in embedded devices.
- All versions of the Raspberry Pi are built around an SoC.

Note: In this course we are using the Raspberry Pi 2 Model B. The low-level code will only work with this version.
The Raspberry Pi Foundation: https://www.raspberrypi.org/ (UK registered charity 1129409)
![Page 42: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/42.jpg)
Raspberry Pi 1 vs 2
The Raspberry Pi version 2 was released on 2nd February 2015. Its components are:
- the BCM2836 SoC (System-on-Chip) by Broadcom
- an ARM Cortex-A7 CPU with 4 cores (clock frequency: 900 MHz)
- 1 GB of DRAM
- a VideoCore IV GPU
- 4 USB ports (sharing the one internal port together with the Ethernet connection)
- power supply through a micro-USB port

NB: RPi2 is significantly more powerful than RPi1, which used a single-core ARM1176JZF-S at 700 MHz clock frequency (in the BCM2835 SoC). However, its network bandwidth is unchanged.
NB: The A-series of the ARM architectures is for “application” usage and therefore more powerful than the M-series, which is mainly for small, embedded systems.
It is possible to safely over-clock the processor up to 950 MHz.
Material from Raspberry Pi Geek 03/2015
![Page 43: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/43.jpg)
Raspberry Pi 2
Source: https://en.wikipedia.org/wiki/Raspberry_Pi
![Page 44: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/44.jpg)
Software configuration
- RPi2 supports several major Linux distributions, including Raspbian (Debian-based), Arch Linux, Ubuntu, etc.
- The main system image provided for RPi2 can boot into several of these systems and provides kernels for both ARMv6 (RPi1) and ARMv7 (RPi2).
- The basic software configuration is almost the same as on a standard Linux desktop.
- To tune the software/hardware configuration, call

> sudo raspi-config
![Page 45: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/45.jpg)
Updating your software under Raspbian
We are using Raspbian 7, which is based on Debian “Wheezy” with a Linux kernel 3.18.
There is a more recent version (2017-01-11) out: Raspbian 8, based on Debian “Jessie” with a Linux kernel 4.4. Highlights:
- Uses systemd for starting the system (changes to run-scripts, enabling services).
- Supports OpenGL and 3D graphics acceleration in an experimental driver (enable using raspi-config).

To update the software under Raspbian, do the following:
> sudo apt-get update
> sudo apt-get upgrade
> sudo rpi-update

To find the package foo in the on-line repository, do the following:
> sudo apt-cache search foo

To install the package foo in the on-line repository, do the following:
> sudo apt-get install foo
![Page 46: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/46.jpg)
Virtualisation
- In this powerful, multi-core configuration, an RPi2 can be used as a server, running several VMs.
- To this end RPi2 under Raspbian runs a hypervisor process, mediating hardware access between the VMs.
- Virtualisation is hardware-supported for the ARMv6 and ARMv7 instruction sets.
- The ARMv7 instruction set includes a richer set of SIMD (single-instruction, multiple-data) instructions (the NEON extensions), to exploit parallelism and speed up e.g. multi-media applications.
- The NEON instructions can perform operations on up to 16 8-bit values at the same time, through the processor’s support for 64-bit and 128-bit registers.
- Performance improvements in the range of 8–16× have been reported for multi-media applications.
- The usual power consumption of the RPi2 is between 3.5 and 4 Watt (compared to ca. 3 Watt for the RPi1).
- To compare the (peak) performance of RPi2 with RPi1: the Dhrystone benchmark delivers 875 DMIPS on an RPi 1 and 6840 DMIPS on an RPi 2, i.e. ca. 7.8× more.
![Page 47: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/47.jpg)
Raspberry Pi 4
Specification:
- ARMv8, BCM2711, ARM Cortex-A72 CPU, 64-bit quad-core @ 1.5 GHz
- 1 GB, 2 GB or 4 GB RAM (LPDDR4)
- On-board dual-band 802.11b/g/n/ac wireless LAN
- On-board Bluetooth 5.0, low-energy (BLE)
- Gigabit Ethernet
- 2 × USB 3.0 ports, 2 × USB 2.0 ports
- Extended 40-pin GPIO header
- 2 × micro-HDMI ports (supporting up to 4Kp60)

See RPi 4 spec on NewIT pages
![Page 48: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/48.jpg)
CPU Performance Comparison: Hardware
![Page 49: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/49.jpg)
CPU Performance Comparison: Measurements
![Page 50: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/50.jpg)
Note:
- RPi2 is ca. 7.82× faster than RPi1
- Banana Pi M2 is 1.11× faster than RPi2
- Cubox i4Pro is 1.46× faster
- ODroid C1 is 1.38× faster
- Intel i7 PC is 15.5× faster than RPi2
![Page 51: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/51.jpg)
Network performance comparison: RPi 1 vs RPi 2
- To compare network performance, encrypted data transfer through scp is used.
- This profits from the quad-core architecture, because one core can be dedicated to encryption, another core to the actual data transfer.
- An increase in network performance by a factor of 2.5× is reported.
- The highest observed bandwidth on the RPi 2 (with overclocking to 1.05 GHz) is 70 Mbit/s.
- The theoretical peak performance of the LAN port is ca. 90 Mbit/s.
- The SunSpider benchmark for rendering web pages reports up to 5× performance improvement.
![Page 52: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/52.jpg)
Network performance Measurements
![Page 53: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/53.jpg)
High-performance Alternatives
- There are several single-board computers that provide a high-performance alternative to the RPi.
- These are of interest if you have applications with high computational demands and you want to run them on a low-cost and low-power device.
- It’s possible to build, for example, a cluster of such devices as a parallel programming platform: see The Glasgow University Raspberry Pi Cloud.
- Here we give an overview of the main performance characteristics of three RPi2 alternatives:
  - the CuBox i4Pro by SolidRun
  - the Banana Pi M3 by Sinovoip
  - the Lemaker HiKey by Lemaker
![Page 54: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/54.jpg)
Core Specs of the CuBox i4-Pro
- Freescale i.MX6 (SoC) quad-core, containing an ARM Cortex A9 (ARMv7 instruction set) with 4 cores
- GC2000 GPU (supports OpenGL etc)
- 4 GB RAM and a micro-SD card slot
- 10/100/1000 Mb/s Ethernet (max 470 Mb/s)
- WLAN (802.11b/g/n)
- Bluetooth 4.0
- 1 USB port and eSATA (3 Gb/s) interface
- Price: $124

Software:
- Debian Linux, Kodi Linux, XBMC Linux
![Page 56: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/56.jpg)
Core Specs of the Banana Pi M3
- Allwinner A83T (SoC) chip, containing an ARM Cortex-A7 (ARMv7 instruction set) with 8 cores
- PowerVR SGX544MP1 GPU (supports OpenGL etc)
- 2 GB LPDDR3 RAM plus 8 GB eMMC memory and a micro-SD card slot
- Gigabit Ethernet
- WLAN (802.11b/g/n)
- Bluetooth 4.0
- 2 USB ports and SATA interface
- 40 GPIO pins (not compatible with RPi2)
- Price: 90 €

Software:
- BPI-Berryboot (allegedly with GPU support), or Ubuntu Mate

Experiences:
- SATA shares the USB bus connection and is therefore slow
- Problems accessing the on-board microphone
![Page 59: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/59.jpg)
Core Specs of the Lemaker Hikey
- Kirin 620 (SoC) chip with ARM Cortex A53 and 8 cores
- ARM Mali450-MP4 GPU (supports OpenGL etc)
- 1 or 2 GB LPDDR3 RAM plus 8 GB eMMC memory and a micro-SD card slot
- WLAN (802.11b/g/n)
- Bluetooth 4.1
- 2 USB ports
- 40 GPIO pins (not compatible with RPi2)
- Audio and Video via HDMI connectors
- Board-layout matches the 96-board industrial standard for embedded devices
- Price: 120 €

Software:
- Android variant (part of 96-board initiative)
- Linaro (specialised Linux version for embedded devices)
![Page 60: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/60.jpg)
Banana Pi M3 and Lemaker Hikey: Specs
Material from Raspberry Pi Geek 04/2016
![Page 61: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/61.jpg)
Raspberry Pi 3 and Lemaker Hikey: Performance
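Performance as runtime (of the sysbench benchmark, by number of threads), maximum power, and network bandwidth (using the iperf benchmark):

| Device | 1 thread | 4 threads | 8 threads | Max power | Network bandwidth (Ethernet / WLAN) |
|---|---|---|---|---|---|
| Raspberry Pi 2 | 297 s | 75 s | - | | |
| Raspberry Pi 3 | 182 s | 45 s | - | | 45 Mb/s |
| Cubox i4Pro | 296 s | 75 s | - | | |
| Banana Pi M3 | 159 s | 40 s | 21 s | 1.1 A | 633 Mb/s / 2.4 Mb/s |
| Lemaker Hikey | 12 s | 3 s | 2 s | 1.7 A | - / 37.3 Mb/s |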
Summary: In terms of performance, the Lemaker Hikey is the bestchoice.
Material from Raspberry Pi Geek 04/2016
![Page 62: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/62.jpg)
Raspberry Pi 3 and Lemaker Hikey: Performance comparison
To run the (CPU) performance benchmark on the RPi2 do:

> sudo apt-get update
> sudo apt-get install sysbench
> sysbench --num-threads=1 --cpu-max-prime=10000 --test=cpu run
Material from Raspberry Pi Geek 04/2016
![Page 63: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/63.jpg)
Core Specs of Odroid-XU4
- Exynos 5422 (SoC) Octa big.LITTLE ARM with an ARM Cortex-A15 quad-core and an ARM Cortex-A7 quad-core
- Mali-T628 MP6 GPU
- 2 GB LPDDR3 RAM plus eMMC memory and a micro-SD card slot
- Gigabit Ethernet
- 1 USB 2.0A and 1 USB 3.0 port
- Video via HDMI connectors
- 40 GPIO pins (not compatible with RPi2)
- Price: 95 €

Notes:
- The CPU is the same as in high-end smartphones such as the Samsung Galaxy S5.
- The big.LITTLE architecture dynamically switches from (faster) Cortex-A15 to (slower) Cortex-A7 to save power.
- Software: Ubuntu 14.04 or Ubuntu 16.04; Android 4.4.4; OpenMediaVault 2.2.13, Kali Linux, Debian.
![Page 64: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/64.jpg)
RPi3 vs Odroid-XU4: Specs
Material from Raspberry Pi Geek 02/2017
![Page 65: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/65.jpg)
Odroid-XU4
![Page 66: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/66.jpg)
Network performance: RPi3 vs Odroid-XU4
Note: Raw network performance is ca. 5× faster on the ODroid-XU4!
![Page 67: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/67.jpg)
Raspberry Pi 3 and ODroid C2: CPU Performance Comparison
Material from Raspberry Pi Geek 04/2016
![Page 68: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/68.jpg)
Raspberry Pi 3 and ODroid C2: I/O Performance Comparison
Material from Raspberry Pi Geek 04/2016
![Page 69: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/69.jpg)
Raspberry Pi 3 and ODroid C2: Network Performance Comparison
Material from Raspberry Pi Geek 04/2016
![Page 70: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/70.jpg)
RPi3 vs Odroid-XU4: Experience
- In terms of network performance, the ODroid-XU4 is much faster.
- It is a good basis for a NAS (Network Attached Storage).
- In terms of CPU performance, the Odroid is slightly faster: Cortex-A15 (2.0 GHz) vs Cortex-A53 (1.2 GHz).
- However, in practice, the GUI is much slower.
- Based on the gtkperf GUI benchmark, the ODroid is ca. 3× slower.
- The reason for this difference is more optimisation in the device drivers for RPi’s VideoCore IV GPU (compared to ODroid’s Mali GPU).
- Note: To assess performance and usability, one has to consider the entire software stack, not just the raw performance of the hardware!
![Page 71: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/71.jpg)
Performance comparison: Odroid vs RPi4
From RPi4 vs Odroid on MightyGadget pages
![Page 72: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/72.jpg)
Orange Pi
- Allwinner H3 SoC: Quad-core Cortex-A7, H.265/HEVC 4K
- Mali 400MP2 GPU @ 600 MHz
- 1 GB DDR3 memory (shared with GPU)
- 8 GB eMMC Flash
- 10/100 Ethernet RJ45
- 40-pin header, compatible with Raspberry Pi B+
- Runs: Android, Lubuntu, Debian, Raspbian

Beware of stability and performance of the software!
![Page 73: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/73.jpg)
Latest devices
- Raspberry Pi 4: available since Fall 2019
- Odroid XU4 or N2: for high performance
- Odroid C2 or HC2: for high bandwidth
- Orange Pi H3: cheap but software issues
![Page 74: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/74.jpg)
Summary
- The Raspberry Pi is one of the most widely-used single-board computers.
- The RPi comes in several versions (1, 2, 3); we are using the Raspberry Pi 2 model B.
- There is a rich software eco-system for the RPis and excellent, detailed documentation.
- A good high-CPU-performance alternative is: Lemaker HiKey
- A good high-network-performance alternative is: Odroid-XU4
- Check out the Raspberry Pi projects available online.
![Page 75: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/75.jpg)
Lecture 3: Memory Hierarchy
![Page 76: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/76.jpg)
Memory Hierarchy: Introduction
Some fundamental and enduring properties of hardware and software:
- Fast storage technologies cost more per byte, have less capacity, and require more power (heat!).
- The gap between CPU and main memory speed is widening.
- Well-written programs tend to exhibit good locality.

These fundamental properties complement each other beautifully. They suggest an approach for organizing memory and storage systems known as a memory hierarchy.
Lecture based on Bryant & O’Hallaron, 3rd edition, Chapter 6
![Page 77: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/77.jpg)
Memory Hierarchy
- Our view of the main memory so far has been a flat one, i.e. access time to all memory locations is constant.
- In modern architectures this is not the case.
- In practice, a memory system is a hierarchy of storage devices with different capacities, costs, and access times.
- CPU registers hold the most frequently used data.
- Small, fast cache memories near the CPU act as staging areas for a subset of the data and instructions stored in the relatively slow main memory.
- The main memory stages data stored on large, slow disks, which in turn often serve as staging areas for data stored on the disks or tapes of other machines connected by networks.
![Page 78: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/78.jpg)
Caches and Memory Hierarchy
| Level | Storage | Role |
|---|---|---|
| L0 | Registers | CPU registers hold words retrieved from cache memory. |
| L1 | L1 cache (SRAM) | L1 cache holds cache lines retrieved from the L2 cache. |
| L2 | L2 cache (SRAM) | L2 cache holds cache lines retrieved from the L3 cache. |
| L3 | L3 cache (SRAM) | L3 cache holds cache lines retrieved from memory. |
| L4 | Main memory (DRAM) | Main memory holds disk blocks retrieved from local disks. |
| L5 | Local secondary storage (local disks) | Local disks hold files retrieved from disks on remote network servers. |
| L6 | Remote secondary storage (distributed file systems, Web servers) | |

Towards L0: smaller, faster, and costlier (per byte) storage devices.
Towards L6: larger, slower, and cheaper (per byte) storage devices.
![Page 79: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/79.jpg)
Discussion
As we move from the top of the hierarchy to the bottom, the devices become slower, larger, and less costly per byte.
The main idea of a memory hierarchy is that storage at one level serves as a cache for storage at the next lower level.
Using the different levels of the memory hierarchy efficiently is crucial to achieving high performance.
Access to levels in the hierarchy can be explicit (for example when using OpenCL to program a graphics card), or implicit (in most other cases).
The importance of the memory hierarchy
For the programmer this is important because data access times are very different:
- Register: 0 cycles
- Cache: 1–30 cycles
- Main memory: 50–200 cycles
We want to store data that is frequently accessed high in the memory hierarchy.
Locality
Principle of Locality: Programs tend to use data and instructions with addresses near or equal to those they have used recently.
Temporal locality: Recently referenced items are likely to be referenced again in the near future.
Spatial locality: Items with nearby addresses tend to be referenced close together in time.
Locality Example: sum-over-array
    ulong count; ulong sum;
    for (count = 0, sum = 0; count<n; count++)
      sum += arr[count];
    res1->count = count;
    res1->sum = sum;
    res1->avg = sum/count;
    }

Data references
- Reference array elements in succession (stride-1 reference pattern): spatial locality
- Reference variable sum each iteration: temporal locality
Instruction references
- Reference instructions in sequence: spatial locality
- Cycle through loop repeatedly: temporal locality
Importance of Locality
Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer!
Which of the following two versions of sum-over-matrix has better locality (and performance)?

Traversal by rows:

    int i, j; ulong sum = 0;
    for (i = 0; i<n; i++)
      for (j = 0; j<n; j++)
        sum += arr[i][j];

Traversal by columns:

    int i, j; ulong sum = 0;
    for (j = 0; j<n; j++)
      for (i = 0; i<n; i++)
        sum += arr[i][j];
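The two traversals can be written out as a small, self-contained sketch. The matrix dimension `N`, the element type, and the function names `sum_by_rows` and `sum_by_cols` are assumptions made for illustration; the slide leaves them open. C stores two-dimensional arrays in row-major order, so the row traversal touches consecutive addresses (stride 1), while the column traversal jumps `N` elements between accesses.

```c
#include <stddef.h>

#define N 512  /* assumed matrix dimension, not given on the slide */

/* Traversal by rows: the inner loop walks along one row, so successive
   accesses hit adjacent addresses (stride 1, good spatial locality). */
unsigned long sum_by_rows(unsigned int arr[N][N])
{
  unsigned long sum = 0;
  for (size_t i = 0; i < N; i++)
    for (size_t j = 0; j < N; j++)
      sum += arr[i][j];
  return sum;
}

/* Traversal by columns: the inner loop walks down one column, so successive
   accesses are N elements apart (stride N, poor spatial locality). */
unsigned long sum_by_cols(unsigned int arr[N][N])
{
  unsigned long sum = 0;
  for (size_t j = 0; j < N; j++)
    for (size_t i = 0; i < N; i++)
      sum += arr[i][j];
  return sum;
}
```

Both functions compute the same sum. Timing them (for example with `clock()` from `<time.h>`) on a matrix that is much larger than the cache would typically show the row traversal to be faster, although the exact ratio depends on the machine.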
Caches
Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
Fundamental idea of a memory hierarchy:
- For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k + 1.
Why do memory hierarchies work?
- Because of locality, programs tend to access the data at level k more often than they access the data at level k + 1.
- Thus, the storage at level k + 1 can be slower, and thus larger and cheaper per bit.
Big Idea: The memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
General Cache Concepts
From Bryant and O’Hallaron, Ch 6
General Cache Concepts: Hit
From Bryant and O’Hallaron, Ch 6
General Cache Concepts: Miss
From Bryant and O’Hallaron, Ch 6
Types of Cache Misses
Cold (compulsory) miss:
- Cold misses occur because the cache is empty.
Conflict miss:
- Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
  - E.g. Block i at level k+1 must be placed in block (i mod 4) at level k.
- Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
  - E.g. Referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.
Capacity miss:
- Occurs when the set of active cache blocks (working set) is larger than the cache.
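The conflict-miss example can be checked with a tiny simulation. This is only a sketch of the placement rule "block i goes to position (i mod 4)"; the function name `count_misses` and the array-based bookkeeping are assumptions for illustration and do not model any particular hardware.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BLOCKS 4  /* level-k cache with 4 block positions, as in the (i mod 4) example */

/* Count the misses for a sequence of block numbers in a direct-mapped cache
   where block i may only be placed in position (i mod NUM_BLOCKS). */
size_t count_misses(const unsigned *refs, size_t n)
{
  unsigned stored[NUM_BLOCKS] = {0};  /* block number held in each position */
  int valid[NUM_BLOCKS] = {0};        /* all positions start empty (cold cache) */
  size_t misses = 0;

  assert(refs != NULL || n == 0);
  for (size_t r = 0; r < n; r++) {
    unsigned pos = refs[r] % NUM_BLOCKS;
    if (!valid[pos] || stored[pos] != refs[r]) {
      misses++;               /* cold miss or conflict miss */
      stored[pos] = refs[r];  /* replace whatever was in this position */
      valid[pos] = 1;
    }
  }
  return misses;
}
```

With the reference sequence 0, 8, 0, 8, 0, 8 both blocks map to position 0 and evict each other, so all six references miss even though three of the four positions stay unused. The sequence 0, 1, 0, 1, 0, 1 uses two different positions and incurs only the two cold misses.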
Examples of Caching in the Memory Hierarchy
From Bryant and O’Hallaron, Ch 6
Summary
The speed gap between CPU, memory and mass storage continues to widen.
Well-written programs exhibit a property called locality.
Memory hierarchies based on caching close the gap by exploiting locality.
Principles of Caches
Cache memories are small, fast SRAM-based memories managed automatically in hardware.
- Hold frequently accessed blocks of main memory
CPU looks first for data in caches (e.g., L1, L2, and L3), then in main memory.
Typical system structure:
Figure: the CPU chip contains the register file, ALU, cache memories, and bus interface; the system bus connects the bus interface to the I/O bridge, and the memory bus connects the I/O bridge to main memory.
ARM Cortex A7 Cache Hierarchy
A cache is a small, fast block of memory that sits between the core and main memory. It holds copies of items in main memory. Accesses to the cache memory happen significantly faster than those to main memory. Because the cache holds only a subset of the contents of main memory, it must store both the address of the item in main memory and the associated data. Whenever the core wants to read or write a particular address, it will first look for it in the cache. If it finds the address in the cache, it will use the data in the cache, rather than having to perform an access to main memory. This significantly increases the potential performance of the system, by reducing the effect of slow external memory access times. It also reduces the power consumption of the system. NB: In many ARM-based systems, access to external memory will take 10s or 100s of cycles.
ARMv7-A Memory Hierarchy
See ARM Architecture Reference, Ch A3, Fig A3.6, p.157
Caching policies: direct mapping
The caching policy determines how to map addresses (and their contents) in main memory to locations in the cache.
Since the cache is much smaller, several main memory addresses will be mapped to the same cache location.
The role of the caching policy is to avoid such clashes as much as possible, so that the cache can be used for most memory read/write operations.
The simplest caching policy is a direct mapped cache:
- each location in main memory always maps to a single location in the cache
- this policy is simple to implement, and therefore requires little hardware
- a weakness of the policy is that if two frequently used memory addresses map to the same cache address, this results in a lot of cache misses (“cache thrashing”)
Direct mapped cache
See ARM Programmer’s Guide, Ch 8, Fig 8.4, p 113
Caching policies: set-associative
To eliminate the weakness of the direct-mapped caches, a more flexible set-associative cache can be used.
With this policy, one memory location can map to one of several ways in the cache.
Conceptually, each way represents a slice of the cache.
Therefore, a main memory address can be mapped to any of these slices in the cache.
Inside one such slice, however, the location is fixed.
If the system uses n such slices (“ways”) it is called an n-way set-associative cache.
This avoids cache thrashing in cases where no more than n frequently used variables (memory locations) occur.
NB: The ARM Cortex A7 uses a 4-way set associative data cache, with cache size of 32kB, and a cache line size of 8 words
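As a worked example, the figures in the note above can be turned into an address split. The sketch below takes those figures at face value (32 kB, 4 ways, 8-word lines) and additionally assumes 4-byte words and 32-bit addresses; the function names are hypothetical. With these numbers a line is 32 bytes, and each way holds 32 kB / 4 = 8 kB, i.e. 256 lines, so there are 256 sets.

```c
#include <assert.h>
#include <stdint.h>

/* Figures from the note above; 4-byte words are an assumption (32-bit ARM). */
#define CACHE_BYTES (32u * 1024u)
#define WAYS        4u
#define LINE_BYTES  (8u * 4u)                           /* 8 words of 4 bytes */
#define NUM_SETS    (CACHE_BYTES / (WAYS * LINE_BYTES))

static_assert(NUM_SETS == 256u, "32 kB / (4 ways * 32 B) should give 256 sets");

/* Byte offset within the cache line (the lowest 5 address bits). */
uint32_t line_offset(uint32_t addr) { return addr % LINE_BYTES; }

/* Set index (the next 8 bits): fixes the position inside each way;
   the line may be placed in any of the WAYS ways at that position. */
uint32_t set_index(uint32_t addr) { return (addr / LINE_BYTES) % NUM_SETS; }

/* Tag (the remaining upper bits): stored alongside the data so the cache
   can tell which memory line currently occupies a position. */
uint32_t tag(uint32_t addr) { return addr / (LINE_BYTES * NUM_SETS); }
```

Two addresses that are 8 kB apart, such as 0x0000 and 0x2000, get the same set index but different tags. In a direct-mapped cache of this geometry they would evict each other; with four ways, up to four such lines can be held at the same time.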
Set-associative cache
See ARM Programmer’s Guide, Ch 8, Fig 8.5, p 115
ARM cache features
ARM Cortex A7 Structure
Example: Cache friendly code
See the background reading material on the web page: Web aside on blocking in matrix multiplication
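The web aside itself is not reproduced in these slides. As a rough illustration of what blocking (also called tiling) means, the following sketch contrasts a plain triple loop with a blocked one. It is a generic version of the technique, not the code from the aside; the dimension `DIM`, the tile size `BLOCK`, and the function names are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define DIM   64  /* assumed matrix dimension */
#define BLOCK 16  /* assumed tile size */

static_assert(DIM % BLOCK == 0, "DIM must be a multiple of BLOCK");

/* Plain triple loop: C = A * B. The inner loop walks down a column of B,
   which is a stride-DIM access pattern. */
void matmul_naive(double A[DIM][DIM], double B[DIM][DIM], double C[DIM][DIM])
{
  for (size_t i = 0; i < DIM; i++)
    for (size_t j = 0; j < DIM; j++) {
      double sum = 0.0;
      for (size_t k = 0; k < DIM; k++)
        sum += A[i][k] * B[k][j];
      C[i][j] = sum;
    }
}

/* Blocked version: the three outer loops select BLOCK x BLOCK tiles, the
   three inner loops multiply one pair of tiles. The tiles are small enough
   to be reused from the cache before they are evicted. */
void matmul_blocked(double A[DIM][DIM], double B[DIM][DIM], double C[DIM][DIM])
{
  for (size_t i = 0; i < DIM; i++)
    for (size_t j = 0; j < DIM; j++)
      C[i][j] = 0.0;

  for (size_t ii = 0; ii < DIM; ii += BLOCK)
    for (size_t kk = 0; kk < DIM; kk += BLOCK)
      for (size_t jj = 0; jj < DIM; jj += BLOCK)
        for (size_t i = ii; i < ii + BLOCK; i++)
          for (size_t k = kk; k < kk + BLOCK; k++) {
            double a = A[i][k];
            for (size_t j = jj; j < jj + BLOCK; j++)
              C[i][j] += a * B[k][j];  /* stride-1 access to B and C */
          }
}
```

Both functions compute the same product; only the order of the memory accesses differs. A suitable tile size depends on the cache size of the target machine, so `BLOCK` would need tuning in practice.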
Summary: Memory Hierarchy
In modern architectures the main memory is arranged in a hierarchy of levels (“memory hierarchy”).
Levels higher in the hierarchy (close to the processor) have fast access time but small capacity.
Levels lower in the hierarchy (further from the processor) have slow access time but large capacity.
Modern systems provide hardware (caches) and software (paging; configurable caching policies) support for managing the different levels in the hierarchy.
The simplest caching policy uses direct mapping.
Modern ARM architectures use a more sophisticated set associative cache, that reduces “cache thrashing”.
For a programmer it’s important to be aware of the impact of spatial and temporal locality on the performance of the program.
Making good use of the cache can reduce runtime by a factor of ca. 3 as in our example of blocked matrix multiplication.
Lecture 5: Exceptional Control Flow and Signals
What are interrupts and why do we need them?
In order to deal with internal or external events, abrupt changes in control flow are needed.
Such abrupt changes are also called exceptional control flow (ECF).
Informally, these are known as hardware and software interrupts.
The system needs to take special action in these cases (call interrupt handlers, use non-local jumps).
Lecture based on Bryant and O’Hallaron, Ch 8
ECF on different levels
ECF occurs at different levels:
- hardware level: e.g. arithmetic overflow events detected by the hardware trigger abrupt control transfers to exception handlers
- operating system: e.g. the kernel transfers control from one user process to another via context switches.
- application level: a process can send a signal to another process that abruptly transfers control to a signal handler in the recipient.
In this class we will cover an overview of ECF with examples from the operating system level.
Handling ECF on different levels
ECF is dealt with in different ways:
- hardware level: call an interrupt routine, typ. in Assembler
- operating system: call a signal handler, typ. in C
- application level: call an exception handler, e.g. in a Java catch block
Exceptions
Definition: An exception is an abrupt change in the control flow in response to some change in the processor’s state.
A change in the processor’s state (event) triggers an abrupt control transfer (an exception) from the application program to an exception handler. After it finishes processing, the handler either returns control to the interrupted program or aborts.
Exceptions (cont’d)
When the processor detects that the event has occurred, it makes an indirect procedure call (the exception), through a jump table called an exception table, to an operating system subroutine (the exception handler) that is specifically designed to process this particular kind of event.
When the exception handler finishes processing, one of three things happens, depending on the type of event that caused the exception:
- The handler returns control to the current instruction, i.e. the instruction that was executing when the event occurred.
- The handler returns control to the instruction that would have executed next had the exception not occurred.
- The handler aborts the interrupted program.
Interrupt handling
The interrupt handler returns control to the next instruction in the application program’s control flow.
Exception Handling
Exception Handling requires close cooperation between software and hardware.
- Each type of possible exception in a system is assigned a unique nonnegative integer exception number.
- Some of these numbers are assigned by the designers of the processor. Other numbers are assigned by the designers of the operating system kernel.
- At system boot time (when the computer is reset or powered on), the operating system allocates and initializes a jump table called an exception table, so that entry k contains the address of the handler for exception k.
- At run time (when the system is executing some program), the processor detects that an event has occurred and determines the corresponding exception number k. The processor then triggers the exception by making an indirect procedure call, through entry k of the exception table, to the corresponding handler.
Exception table
Figure: the exception table with entries 0, 1, 2, ..., n-1; entry k points to the code for exception handler k.
The exception table is a jump table where entry k contains the address of the handler code for exception k.
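The idea of a jump table can be modelled in ordinary C as an array of function pointers. This is only a user-space analogy under stated assumptions: the table size, the handler names, and `trigger_exception` are hypothetical, and a real exception table is set up by the operating system and consulted by the processor, not called from C code like this.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_EXCEPTIONS 4  /* hypothetical table size */

typedef void (*handler_t)(void);

static int last_handled = -1;  /* records which handler ran, for illustration only */

static void handler0(void) { last_handled = 0; }
static void handler1(void) { last_handled = 1; }
static void handler2(void) { last_handled = 2; }
static void handler3(void) { last_handled = 3; }

/* The "exception table": entry k holds the address of the handler for exception k. */
static const handler_t exception_table[NUM_EXCEPTIONS] = {
  handler0, handler1, handler2, handler3
};

/* Model of the dispatch step: an indirect call through entry k of the table. */
void trigger_exception(size_t k)
{
  assert(k < NUM_EXCEPTIONS);
  exception_table[k]();
}

/* Report which handler ran most recently (-1 if none has run yet). */
int get_last_handled(void) { return last_handled; }
```

Calling `trigger_exception(2)` looks up entry 2 and runs `handler2`; the caller never names the handler directly, which is what makes the table a level of indirection that the operating system can fill in at boot time.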
Differences between exception handlers and procedure calls
Calling an exception handler is similar to calling a procedure/method, but there are some important differences:
- Depending on the class of exception, the return address is either the current instruction or the next instruction.
- The processor also pushes some additional processor state onto the stack that will be necessary to restart the interrupted program when the handler returns.
- If control is being transferred from a user program to the kernel, all of these items are pushed onto the kernel’s stack rather than onto the user’s stack.
- Exception handlers run in kernel mode, which means they have complete access to all system resources.
Classes of exceptions
Exceptions can be divided into four classes: interrupts, traps, faults, and aborts:

| Class | Cause | (A)Sync | Return behavior |
|---|---|---|---|
| Interrupt | Signal from I/O device | Async | Always returns to next instr |
| Trap | Intentional exception | Sync | Always returns to next instr |
| Fault | Potentially recoverable error | Sync | Might return to current instr |
| Abort | Nonrecoverable error | Sync | Never returns |

It is useful to distinguish 2 reasons for an exceptional control flow:
- an exception is any unexpected change in control flow; e.g. arithmetic overflow, using an undefined instruction, hardware timer
- an interrupt is an unexpected change in control flow triggered by an external event; e.g. I/O device request, hardware malfunction
Traps and System Calls
Traps are intentional exceptions that occur as a result of executing an instruction.
Traps are often used as an interface between application program and OS kernel.
Examples: reading a file (read), creating a new process (fork), loading a new program (execve), or terminating the current process (exit).
Processors provide a special “syscall n” instruction.
On the ARM processor this is the SWI instruction (named SVC in newer ARM documentation).
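From C, a system call can be issued without a dedicated wrapper by going through the generic `syscall()` function. The sketch below assumes Linux with glibc, where `syscall()` is declared in `<unistd.h>` and the `SYS_...` constants come from `<sys/syscall.h>`; the names `raw_getpid` and `raw_write` are made up for this example. The trap instruction itself is executed inside the C library, not in this code.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for the process ID by system call number
   instead of calling the getpid() wrapper. */
pid_t raw_getpid(void)
{
  return (pid_t)syscall(SYS_getpid);
}

/* Write len bytes from buf to file descriptor fd via the write system call.
   Returns the number of bytes written, or -1 on error (errno is set). */
long raw_write(int fd, const void *buf, size_t len)
{
  return syscall(SYS_write, fd, buf, len);
}
```

For example, `raw_write(1, "hello\n", 6)` would print to standard output, and `raw_getpid()` should return the same value as `getpid()`. In normal programs the dedicated wrappers are preferable; the generic form is shown here only to make the "system call number plus arguments" interface visible.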
Trap Handling
The trap handler returns control to the next instruction in the application program’s control flow.
Faults
Faults result from error conditions that a handler might be able to correct.
Note that after fault handling, the processor typically reexecutes the same instruction.
Example: page fault exception.
- Assume an instruction references a virtual address whose corresponding physical page is not in memory.
- In this case a page fault is triggered.
- The fault handler loads the required page into main memory.
- After that the same instruction needs to be executed again.
Fault handling
Depending on whether the fault can be repaired or not, the fault handler either reexecutes the faulting instruction or aborts.
Aborts
Aborts result from unrecoverable fatal errors, typically hardware errors such as parity errors that occur when DRAM or SRAM bits are corrupted. Abort handlers never return control to the application program.
Common system calls
| Number | Name | Description |
|---|---|---|
| 1 | exit | Terminate process |
| 2 | fork | Create new process |
| 3 | read | Read file |
| 4 | write | Write file |
| 5 | open | Open file |
| 6 | close | Close file |
| 7 | waitpid | Wait for child to terminate |
| 11 | execve | Load and run program |
| 19 | lseek | Go to file offset |
| 20 | getpid | Get process ID |
0For a more complete list see Smith, Appendix B “Raspbian System Calls”Hans-Wolfgang Loidl (Heriot-Watt Univ) F28HS Hardware-Software Interface Lec 5: Exceptional Control Flow 115 / 193
![Page 122: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/122.jpg)
Common system calls
| Number | Name | Description |
|---|---|---|
| 27 | alarm | Set signal delivery alarm clock |
| 29 | pause | Suspend process until signal arrives |
| 37 | kill | Send signal to another process |
| 48 | signal | Install signal handler |
| 63 | dup2 | Copy file descriptor |
| 64 | getppid | Get parent’s process ID |
| 65 | getpgrp | Get process group |
| 67 | sigaction | Install portable signal handler |
| 90 | mmap | Map memory page to file |
| 106 | stat | Get information about file |

For the truly complete list see /usr/include/sys/syscall.h
![Page 123: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/123.jpg)
Signal handlers in C
UNIX signals are a higher-level software form of exceptional control flow that allows processes and the kernel to interrupt other processes.
- Signals provide a mechanism for exposing the occurrence of such exceptions to user processes.
- For example, if a process attempts to divide by zero, then the kernel sends it a SIGFPE signal (number 8).
- Other signals correspond to higher-level software events in the kernel or in other user processes.

From Bryant and O’Hallaron, Sec 8.5
![Page 124: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/124.jpg)
Signal handlers in C (cont’d)
- For example, if you type a ctrl-c (i.e. press the ctrl key and the c key at the same time) while a process is running in the foreground, then the kernel sends a SIGINT (number 2) to the foreground process.
- A process can forcibly terminate another process by sending it a SIGKILL signal (number 9).
- When a child process terminates or stops, the kernel sends a SIGCHLD signal (number 17) to the parent.
![Page 125: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/125.jpg)
Signal handling
Receipt of a signal triggers a control transfer to a signal handler. After it finishes processing, the handler returns control to the interrupted program.
![Page 126: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/126.jpg)
Example: handling ctrl-c
    // header files
    #include <stdio.h>
    #include <stdlib.h>
    #include <signal.h>

    void ctrlc_handler(int sig) {
      fprintf(stderr, "Received signum %d; thank you for pressing CTRL-C\n", sig);
      exit(1);
    }

    int main() {
      signal(SIGINT, ctrlc_handler); // install the signal handler
      while (1) { } ;                // infinite loop
    }
![Page 127: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/127.jpg)
Example: handling ctrl-c in more detail
See signal2.c
![Page 128: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/128.jpg)
Example: sending SIGALRM by the kernel

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* signal handler, i.e. the fct called when a signal is received */
    void handler(int sig)
    {
      static int beeps = 0;

      printf("BEEP %d\n", beeps+1);
      if (++beeps < 5)
        alarm(1); /* Next SIGALRM will be delivered in 1 second */
      else {
        printf("BOOM!\n");
        exit(1);
      }
    }

    int main() {
      signal(SIGALRM, handler); /* install SIGALRM handler; see: man 2 signal */
      alarm(1); /* Next SIGALRM will be delivered in 1s; see: man 2 alarm */

      while (1) {
        /* nothing */ ; /* Signal handler returns control here each time */
      }
      exit(0);
    }
![Page 129: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/129.jpg)
Timers
- We now want to use timers, i.e. set up an interrupt at regular intervals.
- The BCM2835 chip has an on-board timer for time-sensitive operations.
- We will explore three ways of achieving this:
  - using C library calls (on top of Raspbian)
  - using assembler-level system calls (to the kernel running inside Raspbian)
  - by directly probing the on-chip timer available on the RPi2
- In this section we will cover how to use the on-chip timer to implement a simple timeout function in C
![Page 130: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/130.jpg)
Overview
Features of the different approaches:
- C library calls (on top of Raspbian)
  - are portable across hardware and OS
  - require a (system) library for handling the timer
- assembler-level system calls (to the kernel running inside Raspbian)
  - depend on the OS, but are portable across hardware
  - require support for software interrupts in the OS kernel
- directly probing the on-chip timer available on the RPi2
  - depend on both hardware and OS
  - the instructions for probing a hardware timer are specific to the hardware
![Page 131: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/131.jpg)
Example: C library functions for controlling timers
getitimer, setitimer - get or set value of an interval timer

    #include <sys/time.h>

    int getitimer(int which, struct itimerval *curr_value);
    int setitimer(int which, const struct itimerval *new_value,
                  struct itimerval *old_value);

setitimer sets up an interval timer that issues a signal in an interval specified by the new_value argument, with this structure:

    struct itimerval {
      struct timeval it_interval; /* next value */
      struct timeval it_value;    /* current value */
    };
    struct timeval {
      time_t      tv_sec;  /* seconds */
      suseconds_t tv_usec; /* microseconds */
    };
![Page 132: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/132.jpg)
C library functions for controlling timers
There are three kinds of timers, specified by the which argument:
- ITIMER_REAL decrements in real time, and delivers SIGALRM upon expiration.
- ITIMER_VIRTUAL decrements only when the process is executing, and delivers SIGVTALRM upon expiration.
- ITIMER_PROF decrements both when the process executes and when the system is executing on behalf of the process. Coupled with ITIMER_VIRTUAL, this timer is usually used to profile the time spent by the application in user and kernel space. SIGPROF is delivered upon expiration.

See: man getitimer
![Page 133: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/133.jpg)
Programming a C-level signal handler
Signals (or software interrupts) can be programmed on C level by associating a C function with a signal sent by the kernel.

sigaction - examine and change a signal action

    #include <signal.h>

    int sigaction(int signum, const struct sigaction *act,
                  struct sigaction *oldact);

The sigaction structure is defined as something like:

    struct sigaction {
      void     (*sa_handler)(int);
      void     (*sa_sigaction)(int, siginfo_t *, void *);
      sigset_t   sa_mask;
      int        sa_flags;
      void     (*sa_restorer)(void);
    };

NB: the sa_handler or sa_sigaction fields define the action to be performed when the signal with the id signum is sent.

See man sigaction
![Page 134: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/134.jpg)
Programming Timers using C library calls
We need the following headers:

    #include <signal.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/time.h>

    // in micro-sec
    #define DELAY 250000

Sample source in itimer11.c
![Page 135: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/135.jpg)
Programming Timers using C library calls
    int main ()
    {
      struct sigaction sa;
      struct itimerval timer;

      fprintf(stderr, "configuring a timer with a delay of %d micro-seconds ...\n", DELAY);

      /* Install timer_handler as the signal handler for SIGALRM. */
      memset (&sa, 0, sizeof (sa));
      sa.sa_handler = &timer_handler;
      sigaction (SIGALRM, &sa, NULL);

Calling sigaction like this causes the function timer_handler to be called whenever signal SIGALRM arrives.
![Page 136: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/136.jpg)
Programming Timers using C library calls
Now, we need to set up a timer to send SIGALRM every DELAY micro-seconds:

      /* Configure the timer to expire after 250 msec... */
      timer.it_value.tv_sec = 0;
      timer.it_value.tv_usec = DELAY;
      /* ... and every 250 msec after that. */
      timer.it_interval.tv_sec = 0;
      timer.it_interval.tv_usec = DELAY;
      /* Start a real timer. It counts down in real (wall-clock) time. */
      setitimer (ITIMER_REAL, &timer, NULL);

      /* A busy loop, doing nothing but accepting signals */
      while (1) {} ;
    }

Sample source in itimer11.c
![Page 137: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/137.jpg)
Further Reading & Hacking
- Randal E. Bryant, David R. O’Hallaron. “Computer Systems: A Programmer’s Perspective”, 3rd edition, Pearson, 7 Oct 2015. ISBN-13: 978-1292101767. Chapter 8: Exceptional Control Flow
- David A. Patterson, John L. Hennessy. “Computer Organization and Design: The Hardware/Software Interface”, ARM edition, Morgan Kaufmann, Apr 2016. ISBN-13: 978-0128017333. Section 4.9: Exceptions
- Stewart Weiss. “UNIX Lecture Notes”, Chapter 5: Interactive Programs and Signals. Department of Computer Science, Hunter College, 2011
![Page 138: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/138.jpg)
Summary
- Interrupts trigger an exceptional control flow, to deal with special situations.
- Interrupts can occur at several levels:
  - hardware level, e.g. to report hardware faults
  - OS level, e.g. to switch control between processes
  - application level, e.g. to send signals within or between processes
- The concept is the same on all levels: execute a short sequence of code to deal with the special situation.
- Depending on the source of the interrupt, execution will continue with the same instruction, continue with the next instruction, or be aborted.
- The mechanisms for implementing this behaviour differ: in software on application level, and in hardware, with jumps to entries in the interrupt vector table, on hardware level.
![Page 139: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/139.jpg)
Lecture 6: Computer Architecture
![Page 140: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/140.jpg)
Classes of Computer Architectures
- There is a wide range of computer architectures from small-scale (embedded) to large-scale (super-computers)
- In this course we focus on embedded systems
- A key requirement for these devices is low power consumption
- This is also increasingly important for main-stream hardware and even for super-computing
- Embedded devices are found in cars, planes, house-hold devices, network-devices, cell-phones etc.
- This is the most rapidly growing market for computer hardware
![Page 141: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/141.jpg)
Number of processors produced
From Patterson & Hennessy, Chapter 1
![Page 142: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/142.jpg)
Limitations to further improvements
From Patterson & Hennessy, Chapter 1
![Page 143: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/143.jpg)
Processor Architectures: Introduction
- In this part we take a brief look at the design of processor hardware.
- This view will give you a better understanding of how computers work.
- In particular you will gain a better understanding of issues relevant to resource consumption.
- So far we have used a very simple model of a CPU: each instruction is fetched and executed to completion before the next one begins.
- Modern processor architectures use pipelining to execute multiple instructions simultaneously (“super-scalar architectures”).
- Special measures need to be taken to ensure that the processor computes the same results as it would with sequential execution.
![Page 144: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/144.jpg)
A simple picture of the CPU
- The ALU executes arithmetic/logic operations with arguments in registers
- Load and store instructions move data between memory and registers
![Page 146: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/146.jpg)
Why should you learn about architecture design?
- It is intellectually interesting and important.
- Understanding how the processor works aids in understanding how the overall computer system works.
- Although few people design processors, many design hardware systems that contain processors.
- You just might work on a processor design.
![Page 147: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/147.jpg)
Stages of executing an assembler instruction
Processing an assembler instruction involves a number of operations:
1. Fetch: The fetch stage reads the bytes of an instruction from memory, using the program counter (PC) as the memory address.
2. Decode: The decode stage reads up to two operands from the register file.
3. Execute: In the execute stage, the arithmetic/logic unit (ALU) either performs the operation specified by the instruction, computes the effective address of a memory reference, or increments or decrements the stack pointer.
4. Memory: The memory stage may write data to memory, or it may read data from memory.
5. Write back: The write-back stage writes up to two results to the register file.
6. PC update: The PC is set to the address of the next instruction.

NB: The processing depends on the instruction, and certain stages may not be used.
![Page 148: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/148.jpg)
Unpipelined computation hardware
[Figure: (a) Hardware: unpipelined, with combinational logic (300 ps) followed by a register (20 ps), driven by the clock; (b) pipeline diagram showing instructions I1, I2, I3 over time]

On each 320 ps cycle, the system spends 300 ps evaluating a combinational logic function and 20 ps storing the results in an output register.

Unpipelined design: Delay: 320 ps, Throughput: 3.12 GIPS

From Bryant, Chapter 4
![Page 149: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/149.jpg)
Instruction-level parallelism
- Key observation: We can do the different stages of the execution in parallel (“instruction-level parallelism”)
- An architecture that allows this kind of parallelism is called a “pipelined” architecture
- This is a big performance boost: ideally each instruction takes just 1 cycle (as opposed to 5 cycles for the 5 stages of the execution)
- However, the ideal case is often not reached, and modern architectures play clever tricks to get closer to the ideal case: branch prediction, out-of-order execution etc.
![Page 150: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/150.jpg)
Three-stage pipelined computation hardware
The computation is split into stages A, B, and C. On each 120-ps cycle, each instruction progresses through one stage.

Three-stage pipeline: Delay: 360 ps, Throughput: 8.33 GIPS

From Bryant, Chapter 4
![Page 151: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/151.jpg)
Three-stage pipeline timing
The rising edge of the clock signal controls the movement of instructions from one pipeline stage to the next.
![Page 152: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/152.jpg)
Example: One clock cycle of pipeline operation.
- We now take a closer look at how values are propagated through the pipeline.
- Instruction I1 has completed stage B
- Instruction I2 has completed stage A

From Bryant, Chapter 4, Fig 4.35
![Page 153: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/153.jpg)
Example: One clock cycle of pipeline operation.
Just before clock rise: values have been computed (stage A of instruction I2, stage B of instruction I1), but the pipeline registers have not been updated yet.

From Bryant, Chapter 4
![Page 154: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/154.jpg)
Example: One clock cycle of pipeline operation.
On clock rise, inputs are loaded into the pipeline registers.

From Bryant, Chapter 4
![Page 155: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/155.jpg)
Example: One clock cycle of pipeline operation.
Signals then propagate through the combinational logic (possibly at different rates).

From Bryant, Chapter 4
![Page 156: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/156.jpg)
Example: One clock cycle of pipeline operation.
Before time 360, the result values reach the inputs of the pipeline registers, to be propagated at the next rising clock.

From Bryant, Chapter 4
![Page 157: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/157.jpg)
Multiple-clock-cycle pipeline diagram
From Patterson & Hennessy, Chapter 4
![Page 158: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/158.jpg)
Abstract view of a sequential processor
The information processed during execution of an instruction follows a clockwise flow starting with an instruction fetch using the program counter (PC), shown in the lower left-hand corner of the figure.
![Page 159: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/159.jpg)
Discussion of pipelined execution
The main pipeline stages are:
- Fetch: Using the program counter register as an address, the instruction memory reads the bytes of an instruction. The PC incrementer computes valP, the incremented program counter.
- Decode: The register file has two read ports, A and B, via which register values valA and valB are read simultaneously.
- Execute: This uses the arithmetic/logic unit (ALU) for different purposes according to the instruction type: integer operations, memory access, or branch instructions.
- Memory: The Data Memory unit reads or writes a word of memory (memory instruction). The instruction and data memories access the same memory locations, but for different purposes.
- Write back: The register file has two write ports. Port E is used to write values computed by the ALU, while port M is used to write values read from the data memory.
![Page 160: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/160.jpg)
Abstract view of a pipelined processor
Hardware structure of a pipelined implementation. By inserting pipeline registers between the stages, we create a five-stage pipeline.

From Bryant, Chapter 4
![Page 161: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/161.jpg)
Pipeline registers
The pipeline registers are labeled as follows:
- F holds a predicted value of the program counter.
- D sits between the fetch and decode stages. It holds information about the most recently fetched instruction for processing by the decode stage.
- E sits between the decode and execute stages. It holds information about the most recently decoded instruction and the values read from the register file for processing by the execute stage.
- M sits between the execute and memory stages. It holds the results of the most recently executed instruction for processing by the memory stage. It also holds information about branch conditions and branch targets for processing conditional jumps.
- W sits between the memory stage and the feedback paths that supply the computed results to the register file for writing and the return address to the PC selection logic when completing a return instruction.
![Page 162: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/162.jpg)
Example of instruction flow through pipeline
From Bryant, Chapter 4
![Page 163: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/163.jpg)
The ARM picture
The pipeline in the BCM2835 SoC for the RPi has 8 pipeline stages:
1 Fe1: The first fetch stage, where the address is sent to memory and an instruction is returned.
2 Fe2: Second fetch stage, where the processor tries to predict the destination of a branch.
3 De: Decoding the instruction.
4 Iss: Register read and instruction issue.
5 Only for ALU operations:
    1 Sh: Perform shift operations as required.
    2 ALU: Perform arithmetic/logic operations.
    3 Sat: Saturate integer results.
6 WBi: Write back of data from any of the above sub-pipelines.
See slides RPiArch and the table in Smith's book
![Page 164: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/164.jpg)
The ARM picture
The stages Fe1, Fe2, De and Iss and the final WBi stage are the same as on the previous slide. Only for Multiply operations, stage 5 consists of:
    1 MAC1: First stage of the multiply-accumulate pipeline.
    2 MAC2: Second stage of the multiply-accumulate pipeline.
    3 MAC3: Third stage of the multiply-accumulate pipeline.
See slides RPiArch and the table in Smith's book
![Page 165: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/165.jpg)
The ARM picture
The stages Fe1, Fe2, De and Iss and the final WBi stage are again the same. Only for Load/Store operations, stage 5 consists of:
    1 ADD: Address generation stage.
    2 DC1: First stage of data cache access.
    3 DC2: Second stage of data cache access.
See slides RPiArch and the table in Smith's book
![Page 166: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/166.jpg)
Pipelining and branches
- How can a pipelined architecture deal with conditional branches?
- In this case the processor doesn't know the successor instruction until further down the pipeline.
- To deal with this, modern architectures perform some form of branch prediction in hardware.
- There are two forms of branch prediction:
  - static branch prediction always makes the same guess (e.g. guess always taken)
  - dynamic branch prediction uses the history of the execution to make better guesses
- Performance is significantly higher when branch predictions are correct.
- If they are wrong, the processor needs to stall or inject bubbles into the pipeline.
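The difference between the two forms of prediction can be sketched in C. This is a minimal illustration, not a model of the predictor in the BCM2835: the 2-bit saturating counter is one common textbook scheme for dynamic prediction, its starting state is an assumption, and the names `predictor2bit`, `mispredictions_static` and `mispredictions_dynamic` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Static "always taken" policy: the guess ignores all history. */
static bool predict_static(void) { return true; }

/* Dynamic policy (assumed scheme): one 2-bit saturating counter, 0..3.
 * Values 2 and 3 predict "taken", values 0 and 1 predict "not taken". */
typedef struct { unsigned counter; } predictor2bit;

static bool predict_dynamic(const predictor2bit *p) { return p->counter >= 2; }

/* Move the counter one step towards the observed outcome. */
static void train(predictor2bit *p, bool taken) {
    if (taken) { if (p->counter < 3) p->counter++; }
    else       { if (p->counter > 0) p->counter--; }
}

/* Count wrong guesses of the static policy over recorded branch outcomes. */
size_t mispredictions_static(const bool *outcomes, size_t n) {
    assert(outcomes != NULL || n == 0);
    size_t wrong = 0;
    for (size_t i = 0; i < n; i++)
        if (predict_static() != outcomes[i]) wrong++;
    return wrong;
}

/* Count wrong guesses of the 2-bit counter over the same outcomes. */
size_t mispredictions_dynamic(const bool *outcomes, size_t n) {
    assert(outcomes != NULL || n == 0);
    predictor2bit p = { 3 };   /* start in "strongly taken" (assumption) */
    size_t wrong = 0;
    for (size_t i = 0; i < n; i++) {
        if (predict_dynamic(&p) != outcomes[i]) wrong++;
        train(&p, outcomes[i]);
    }
    return wrong;
}
```

For a branch that is never taken, the static always-taken policy guesses wrong on every execution, while the counter guesses wrong only twice before it settles on "not taken".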
![Page 167: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/167.jpg)
Example: bad branch prediction
        .global _start
        .text
_start: EORS R1, R1, R1   @ R1 XOR R1 is always 0, so the Z flag is set
        BNE  target       @ not taken, because Z is set
        MOV  R0, #11      @ fall through: exit code 11
        MOV  R7, #1       @ system call number 1 (exit)
        SWI  0
target: MOV  R0, #1       @ branch target: exit code 1
        MOV  R7, #1
        SWI  0
Branch prediction: we assume the processor uses an always-taken policy, i.e. it always assumes that a branch is taken.
NB: the conditional branch (BNE) will never be taken, because the exclusive-or of a register with itself always gives 0. This is a deliberately bad example for the branch predictor.
![Page 168: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/168.jpg)
Processing mispredicted branch instructions.
- Predicting "branch taken", instruction 0x014 is fetched in cycle 3, and instruction 0x018 is fetched in cycle 4.
- In cycle 4 the branch logic detects that the branch is not taken.
- It therefore abandons the execution of 0x014 and 0x018 by injecting bubbles into the pipeline.
- The result will be as expected, but performance is sub-optimal!
Adapted from Bryant, Figure 4.62
![Page 169: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/169.jpg)
Example of bad branch prediction
Code example: sumav3 asm
![Page 170: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/170.jpg)
Hazards of Pipelining
- Pipelining complicates the processing of instructions because of:
  - Control hazards, where branches are mis-predicted (as we have seen)
  - Data hazards, where data dependencies exist between subsequent instructions
- Several ways exist to solve these problems:
  - To deal with control hazards, branch prediction is used and, if necessary, partially executed instructions are abandoned.
  - To deal with data hazards, bubbles can be injected to delay the execution of instructions, or data held in pipeline registers (but not yet written back) can be forwarded to other stages in the pipeline.
- Much of the complexity in modern processors is due to deep pipelining, (possibly dynamic) branch prediction, and forwarding of data.
For details on pipelining and data hazards, see Bryant & O'Hallaron, Computer Systems: A Programmer's Perspective, Chapter 4 (especially Sec 4.4 and 4.5).
![Page 171: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/171.jpg)
Data Hazards
- The branch-prediction example above was a case of a control hazard.
- Now we look into a simple example of a data hazard.
- Consider the following simple ARM assembler program:
        ADD R3, R1, R2   @ R3 = R1 + R2
        SUB R0, R3, R4   @ R0 = R3 - R4
- Note, the result from the first instruction, in R3, will only become available in the write-back (5th) stage.
- But the data in R3 is already needed in the decode (2nd) stage of the second instruction.
- Without intervention, this would stall the pipeline, similar to the branch misprediction case.
- The solution to this is to introduce forwarding (or by-passing) to the hardware of the processor.
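The dependency in the ADD/SUB pair is a read-after-write dependency: the second instruction reads a register that the first one writes. A minimal C sketch of that check is shown here, assuming a simplified instruction form with one destination and two source registers; the type `instr3` and the function `raw_hazard` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified three-register instruction: dest = src1 op src2.
 * Register numbers 0..15 stand for R0..R15. */
typedef struct { int dest, src1, src2; } instr3;

/* True if `second` reads the register that `first` writes (read-after-write). */
bool raw_hazard(instr3 first, instr3 second) {
    assert(first.dest >= 0 && first.dest <= 15);
    return second.src1 == first.dest || second.src2 == first.dest;
}
```

With `ADD R3, R1, R2` encoded as `{3, 1, 2}` and `SUB R0, R3, R4` as `{0, 3, 4}`, the check reports a hazard for the pair in program order, but not if the two instructions are swapped.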
![Page 172: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/172.jpg)
A Graphical Representation of Forwarding
From Patterson & Hennessy, Chapter 4
![Page 173: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/173.jpg)
Example: Reordering Code to Avoid Pipeline Stalls
We have previously examined how C expressions are compiled to Assembler code. For example, consider this C program fragment:
        int a, b, c, d, e, f;
        a = b + e;
        c = b + f;
- Knowing about control and data hazards motivates reordering of code, which should be done by the compiler to avoid pipeline stalls.
- Such reordering is commonly done in the backend of compilers.
- Therefore, the sequence of Assembler instructions might be different from the one you expect.
![Page 174: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/174.jpg)
Data layout and code for a C expression
From Patterson & Hennessy, Chapter 4
![Page 175: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/175.jpg)
Example: Reordering Code to Avoid Pipeline Stalls
Example: Translate the following C expression into Assembler:
        int a, b, c, d, e, f;
        a = b + e;
        c = b + f;
Example: We assume the variables are stored in memory in declaration order, 4 bytes each, starting from the location held in register R0 (a at offset 0, b at 4, c at 8, d at 12, e at 16, f at 20). Here is the naive Assembler code:
        LDR R1, [R0, #4]    @ load b
        LDR R2, [R0, #16]   @ load e
        ADD R3, R1, R2      @ b + e
        STR R3, [R0, #0]    @ store a
        LDR R4, [R0, #20]   @ load f
        ADD R5, R1, R4      @ b + f
        STR R5, [R0, #8]    @ store c
Can you spot the data hazard in this example?
![Page 177: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/177.jpg)
A Graphical Representation of a Load-Store Hazard
From Patterson & Hennessy, Chapter 4
![Page 178: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/178.jpg)
Example: Reordering Code to Avoid Pipeline Stalls
Example: Translate the following C expression into Assembler:
        int a, b, c, d, e, f;
        a = b + e;
        c = b + f;
Example: The reordered Assembler code, eliminating the data hazard:
        LDR R1, [R0, #4]    @ load b
        LDR R2, [R0, #16]   @ load e
        LDR R4, [R0, #20]   @ load f (moved up)
        ADD R3, R1, R2      @ b + e
        STR R3, [R0, #0]    @ store a
        ADD R5, R1, R4      @ b + f
        STR R5, [R0, #8]    @ store c
Moving the third LDR instruction upward makes its result available soon enough to avoid a pipeline stall.
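The effect of the reordering can be checked with a small C sketch that counts load-use stalls. It assumes the classic five-stage pipeline with forwarding, where an instruction that reads a register loaded by the immediately preceding LDR costs one bubble; the real load-use latency of the RPi's processor may differ. The `instr` encoding and the names `load_use_stalls`, `naive` and `reordered` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { OP_LDR, OP_STR, OP_ADD } opcode;

/* dest: register written (-1 for STR); src1/src2: registers read (-1 if unused). */
typedef struct { opcode op; int dest, src1, src2; } instr;

static bool reads(instr i, int reg) {
    return reg >= 0 && (i.src1 == reg || i.src2 == reg);
}

/* One bubble whenever an instruction reads the register loaded by the LDR
 * immediately before it (assumed one-cycle load-use penalty). */
size_t load_use_stalls(const instr *code, size_t n) {
    assert(code != NULL || n == 0);
    size_t stalls = 0;
    for (size_t i = 1; i < n; i++)
        if (code[i - 1].op == OP_LDR && reads(code[i], code[i - 1].dest))
            stalls++;
    return stalls;
}

/* The naive sequence from the slide; R0 holds the base address. */
static const instr naive[] = {
    { OP_LDR, 1, 0, -1 },   /* load b  */
    { OP_LDR, 2, 0, -1 },   /* load e  */
    { OP_ADD, 3, 1, 2 },    /* b + e   */
    { OP_STR, -1, 3, 0 },   /* store a */
    { OP_LDR, 4, 0, -1 },   /* load f  */
    { OP_ADD, 5, 1, 4 },    /* b + f   */
    { OP_STR, -1, 5, 0 },   /* store c */
};

/* The reordered sequence: the load of f is moved up. */
static const instr reordered[] = {
    { OP_LDR, 1, 0, -1 },   /* load b  */
    { OP_LDR, 2, 0, -1 },   /* load e  */
    { OP_LDR, 4, 0, -1 },   /* load f  */
    { OP_ADD, 3, 1, 2 },    /* b + e   */
    { OP_STR, -1, 3, 0 },   /* store a */
    { OP_ADD, 5, 1, 4 },    /* b + f   */
    { OP_STR, -1, 5, 0 },   /* store c */
};
```

Under this model the naive sequence incurs two bubbles (after the loads of e and f), and the reordered sequence none.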
![Page 179: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/179.jpg)
Summary: Processor Architecture and Pipelining
- Modern ("super-scalar") processors can execute several instructions at the same time, by organising the execution of an instruction into several stages and using a pipeline structure.
- This exploits instruction-level parallelism and boosts performance.
- However, there is a risk of control and data hazards, leading to reduced performance, e.g. due to poor branch prediction.
- Knowing these risks, you can develop faster code!
- These code transformations are often done internally by the compiler.
![Page 180: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/180.jpg)
Lecture 10: Revision
![Page 181: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/181.jpg)
A simple picture of the CPU
- The ALU executes arithmetic/logic operations with arguments in registers.
- Load and store instructions move data between memory and registers.
![Page 183: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/183.jpg)
Three-stage pipelined computation hardware
The computation is split into stages A, B, and C. The stages for different instructions can be executed in an overlapping way.
From Bryant, Chapter 4
![Page 184: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/184.jpg)
Stages of executing an assembler instruction
Processing an assembler instruction involves a number of operations:
1 Fetch: The fetch stage reads the bytes of an instruction from memory, using the program counter (PC) as the memory address.
2 Decode: The decode stage reads up to two operands from the register file.
3 Execute: In the execute stage, the arithmetic/logic unit (ALU) either performs the operation specified by the instruction, computes the effective address of a memory reference, or increments or decrements the stack pointer.
4 Memory: The memory stage may write data to memory, or it may read data from memory.
5 Write back: The write-back stage writes up to two results to the register file.
6 PC update: The PC is set to the address of the next instruction.
NB: The processing depends on the instruction, and certain stages may not be used.
![Page 185: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/185.jpg)
Pipelining and branches
- How can a pipelined architecture deal with conditional branches?
- In this case the processor doesn't know the successor instruction until further down the pipeline.
- To deal with this, modern architectures perform some form of branch prediction in hardware.
- There are two forms of branch prediction:
  - static branch prediction always makes the same guess (e.g. guess always taken)
  - dynamic branch prediction uses the history of the execution to make better guesses
- Performance is significantly higher when branch predictions are correct.
- If they are wrong, the processor needs to stall or inject bubbles into the pipeline.
![Page 186: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/186.jpg)
Example: bad branch prediction
        .global _start
        .text
_start: MOVS R1, #0      @ load 0; MOVS sets the Z flag
        LSR  R1, #1      @ R1 stays 0; without the S suffix the flags are unchanged
        BNE  target      @ not taken, because Z is set
        MOV  R0, #0      @ fall through
        MOV  R7, #1
        SWI  0
target: MOV  R0, #1      @ return: branch taken?
        MOV  R7, #1
        SWI  0
Branch prediction: we assume the processor uses an always-taken policy, i.e. it always assumes that a branch is taken.
NB: the conditional branch (BNE) will NOT be taken: MOVS sets the zero flag because the loaded value is 0, and in ARM mode LSR without the S suffix leaves the flags unchanged. This is a deliberately bad example for the branch predictor.
![Page 187: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/187.jpg)
Example: good branch prediction
        .global _start
        .text
_start: MOVS R1, #1      @ load 1; MOVS clears the Z flag
        LSR  R1, #1      @ R1 becomes 0; without the S suffix the flags are unchanged
        BNE  target      @ branch taken, because Z is clear
        MOV  R0, #0      @ fall through
        MOV  R7, #1
        SWI  0
target: MOV  R0, #1      @ return: branch taken?
        MOV  R7, #1
        SWI  0
Branch prediction: we assume the processor uses an always-taken policy, i.e. it always assumes that a branch is taken.
NB: now the conditional branch (BNE) WILL be taken: MOVS clears the zero flag because the loaded value is 1, and in ARM mode LSR without the S suffix leaves the flags unchanged. This is better for the branch predictor and gives better performance.
![Page 188: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/188.jpg)
Performance: good vs bad branch prediction
We now measure the performance of running these two versions inside two nested loops (0x10000 iterations each).
Good case: branch taken:
        > hawopi[167](4.2)> as -o bonzo15.o bonzo15.s
        > hawopi[168](4.2)> ld -o bonzo15 bonzo15.o
        > hawopi[169](4.2)> time ./bonzo15
        real 0m30.091s
        user 0m29.980s
        sys 0m0.000s
        > hawopi[170](4.2)> echo $?
        1
![Page 189: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/189.jpg)
Performance: good vs bad branch prediction
We now measure the performance of running these two versions inside two nested loops (0x10000 iterations each).
Bad case: branch NOT taken:
        > hawopi[171](4.2)> as -o bonzo15.o bonzo15.s
        > hawopi[172](4.2)> ld -o bonzo15 bonzo15.o
        > hawopi[173](4.2)> time ./bonzo15
        real 0m36.188s
        user 0m34.900s
        sys 0m0.090s
        > hawopi[174](4.2)> echo $?
        0
NB: a difference in runtime of ca. 16.8% relative to the bad case (36.188s vs 30.091s real time), i.e. the bad case is about 20% slower than the good case.
![Page 190: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/190.jpg)
Processing mispredicted branch instructions.
- Predicting "branch taken", instruction 0x014 is fetched in cycle 3, and instruction 0x018 is fetched in cycle 4.
- In cycle 4 the branch logic detects that the branch is not taken.
- It therefore abandons the execution of 0x014 and 0x018 by injecting bubbles into the pipeline.
- The result will be as expected, but performance is sub-optimal!
Adapted from Bryant, Figure 4.62
![Page 191: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/191.jpg)
The Current Program Status Register (CPSR)
The Current Program Status Register (CPSR) contains flags (V, Z, N, C) that are set by certain assembler instructions.
For example, the CMP R0, R1 instruction compares the values of registers R0 and R1 and sets the zero flag (Z) if R0 and R1 are equal.
See ARM's Programmer Guide, p. 3-8
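How CMP derives all four flags can be sketched in C. CMP computes the difference of its operands, discards the result and keeps only the flags. The struct `flags` and the function `cmp_flags` are hypothetical names; the sketch only models 32-bit operands and is not taken from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } flags;

/* Flags as set by CMP a, b, which computes a - b and discards the result. */
flags cmp_flags(uint32_t a, uint32_t b) {
    uint32_t r = a - b;                      /* wraps modulo 2^32 */
    flags f;
    f.n = (r >> 31) & 1u;                    /* N: bit 31 of the result */
    f.z = (r == 0);                          /* Z: operands are equal */
    f.c = (a >= b);                          /* C: set when no borrow occurs */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1u;  /* V: signed overflow */
    return f;
}
```

A following conditional instruction such as BNE then only inspects the flags, here Z, and never the operands themselves.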
![Page 192: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/192.jpg)
Caches and Memory Hierarchy
The levels of the memory hierarchy, from smaller, faster, and costlier (per byte) storage devices at the top to larger, slower, and cheaper (per byte) storage devices at the bottom:
L0: Regs. CPU registers hold words retrieved from cache memory.
L1: L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache.
L2: L2 cache (SRAM). L2 cache holds cache lines retrieved from L3 cache.
L3: L3 cache (SRAM). L3 cache holds cache lines retrieved from memory.
L4: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L5: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L6: Remote secondary storage (distributed file systems, Web servers).
![Page 193: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/193.jpg)
Discussion
As we move from the top of the hierarchy to the bottom, the devices become slower, larger, and less costly per byte.
The main idea of a memory hierarchy is that storage at one level serves as a cache for storage at the next lower level.
Using the different levels of the memory hierarchy efficiently is crucial to achieving high performance.
Access to levels in the hierarchy can be explicit (for example when using OpenCL to program a graphics card), or implicit (in most other cases).
![Page 194: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/194.jpg)
Importance of Locality
Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer!
Which of the following two versions of sum-over-matrix has better locality (and performance)?
Traversal by rows:
        int i, j; ulong sum = 0;
        for (i = 0; i<n; i++)
          for (j = 0; j<n; j++)
            sum += arr[i][j];
Traversal by columns:
        int i, j; ulong sum = 0;
        for (j = 0; j<n; j++)
          for (i = 0; i<n; i++)
            sum += arr[i][j];
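The two traversals can be written as self-contained C functions for experimentation. This is a sketch under assumptions: the dimension `N`, the element type `int` and the function names `sum_by_rows` and `sum_by_columns` are chosen here for illustration and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define N 256   /* matrix dimension; an arbitrary choice for this sketch */

/* Traversal by rows: consecutive accesses are adjacent in memory (stride 1). */
unsigned long sum_by_rows(int arr[N][N]) {
    assert(arr != NULL);
    unsigned long sum = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            sum += (unsigned long)arr[i][j];
    return sum;
}

/* Traversal by columns: consecutive accesses are N elements apart (stride N). */
unsigned long sum_by_columns(int arr[N][N]) {
    assert(arr != NULL);
    unsigned long sum = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            sum += (unsigned long)arr[i][j];
    return sum;
}
```

Both functions compute the same sum. C stores `arr` row by row, so the row traversal touches adjacent addresses and typically makes better use of each cache line; timing both on a large matrix (for example with `time`) is a simple way to observe the difference.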
![Page 195: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/195.jpg)
The high-level picture
- From the main chip of the RPi2 we want to control an (external) device, here an LED.
- We use one of the GPIO pins to connect the device.
- Logically we want to send 1 bit to this device to turn it on/off.
![Page 196: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/196.jpg)
The low-level picture
Programmatically we achieve that by
- memory-mapping the address space of the GPIOs into user-space
- now, we can directly access the device via memory read/writes
- we need to pick up the meaning of the peripheral registers from the BCM2835 peripherals sheet
![Page 197: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/197.jpg)
BCM2835 GPIO Peripherals
The meaning of the registers is (see p90ff of BCM2835 ARM peripherals):
- GPFSEL: function select registers (3 bits per pin); set it to 0 for input, 1 for output; 6 more alternate functions available
- GPSET: set the corresponding pin
- GPCLR: clear the corresponding pin
- GPLEV: return the value of the corresponding pin
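How GPSET, GPCLR and GPLEV are used can be sketched in C. The sketch assumes `gpio` points to the start of the GPIO register block viewed as an array of 32-bit words (on real hardware this pointer would come from memory-mapping the GPIO address space), and it uses the word indices 7, 10 and 13 that follow from the register offsets in the BCM2835 peripherals manual. The function names `gpio_write` and `gpio_read` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Word indices from the GPIO base (byte offsets 0x1C, 0x28, 0x34). */
enum { GPSET0 = 7, GPCLR0 = 10, GPLEV0 = 13 };

/* Drive `pin` high or low by writing a single 1 bit to the matching
 * GPSET or GPCLR register; bits written as 0 have no effect. */
void gpio_write(volatile uint32_t *gpio, unsigned pin, int value) {
    assert(pin < 54);                          /* the BCM2835 has GPIO 0..53 */
    unsigned bank = pin / 32;                  /* 0: pins 0-31, 1: pins 32-53 */
    uint32_t bit = UINT32_C(1) << (pin % 32);
    gpio[(value ? GPSET0 : GPCLR0) + bank] = bit;
}

/* Read the current level of `pin` from the matching GPLEV register. */
int gpio_read(const volatile uint32_t *gpio, unsigned pin) {
    assert(pin < 54);
    return (int)((gpio[GPLEV0 + pin / 32] >> (pin % 32)) & 1u);
}
```

Because setting and clearing use separate registers, no read-modify-write is needed to change one pin. The functions can also be exercised against an ordinary array standing in for the register block, although such an array does not reflect writes to GPSET in GPLEV as the hardware does.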
![Page 198: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/198.jpg)
GPIO Register Assignment
The GPIO has 48 32-bit registers (RPi2; 41 for RPi1).
See BCM Peripherals Manual, Chapter 6, Table 6.1
![Page 199: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/199.jpg)
GPIO Register Assignment
GPIO registers (Base address: 0x3F200000)
Each register is 32 bits wide (bits 31 down to 0). The first 13 registers, by index from the base address, are:
0: GPFSEL0
1: GPFSEL1
2: GPFSEL2
3: GPFSEL3
4: GPFSEL4
5: GPFSEL5
6: reserved
7: GPSET0
8: GPSET1
9: reserved
10: GPCLR0
11: GPCLR1
12: reserved
See BCM Peripherals, Chapter 6, Table 6.1
![Page 200: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/200.jpg)
Locating the GPFSEL register for pin 47 (ACT)
This table explains the meaning of the bits in register GPFSEL4.
See BCM Peripherals Manual, Chapter 6, Table 6.1
![Page 201: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/201.jpg)
Accessing a GPIO Pin
Now we want to control the on-chip LED, called ACT, that normally indicates activity.
The pin number of this device on the RPi2 is 47.
We need to calculate the register and bits corresponding to this pin.
The GPFSEL register for pin 47 is register 4; per the documentation, this register covers pins 40–49 (Tab 6-6, p. 94).
In each register, 3 bits select the function of one pin: bits 0–2 for pin 40, bits 3–5 for pin 41, etc.
Thus, bits 21–23 cover pin 47 (7 × 3 = 21).
The function that we need to select is OUTPUT, which is encoded as the value 1.
We need to write the value 0x01 into bits 21–23 of register 4.
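The calculation for pin 47 generalises to any pin. The sketch below is illustrative only: the function names `fsel_reg_index` and `fsel_bit_shift` are hypothetical, and the upper bound of 53 assumes the pin range documented in the BCM manual.

```c
#include <assert.h>
#include <stdint.h>

#define PINS_PER_FSEL 10u  /* each GPFSEL register covers 10 pins */
#define BITS_PER_PIN   3u  /* 3 function-select bits per pin */

/* Index of the GPFSEL register that holds the mode bits of `pin`. */
unsigned fsel_reg_index(unsigned pin)
{
    assert(pin <= 53u);  /* assumption: GPIO pins are numbered 0..53 */
    return pin / PINS_PER_FSEL;
}

/* Position of the lowest of the 3 mode bits of `pin` in that register. */
unsigned fsel_bit_shift(unsigned pin)
{
    assert(pin <= 53u);
    return (pin % PINS_PER_FSEL) * BITS_PER_PIN;
}
```

For pin 47 this gives register 4 and shift 21, matching the values derived above.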
![Page 205: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/205.jpg)
Accessing GPIO Pin 47
We want to construct C code to write the value 0x01 into bits 21–23 of register 4.
What's the address of register 4 relative to the base address in gpio?
Answer: gpio+4
How do we read the current value from this register?
Answer: *(gpio+4)
How do we blank out bits 21–23 from this register?
Answer: *(gpio + 4) & ~(7 << 21)
Step by step (bit 31 on the left, bit 0 on the right):
C code: 7
0000 0000 0000 0000 0000 0000 0000 0111
C code: 7 << 21
0000 0000 1110 0000 0000 0000 0000 0000
C code: ~(7 << 21)
1111 1111 0001 1111 1111 1111 1111 1111
C code: (*(gpio + 4) & ~(7 << 21))
Result: bits 23..21 are 0 0 0; all other bits keep their current values.
How do we get the value 0x01 into bits 21–23 of a 32-bit word?
Answer: (1 << 21)
C code: (*(gpio + 4) & ~(7 << 21)) | (1 << 21)
Result: bits 23..21 are 0 0 1; all other bits keep their current values.
How do we put only these bits into the contents of register 4?
Answer: *(gpio + 4) = (*(gpio + 4) & ~(7 << 21)) | (1 << 21)
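The read, mask, insert and write-back steps can be collected into one function. This is a sketch under assumptions rather than the course's own code: the name `set_pin_mode` is hypothetical, and `gpio` is assumed to point to 32-bit words, which is what the answer gpio+4 implies. A plain array can stand in for the mapped registers when trying it out away from the hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Set the 3-bit function-select field of `pin` to `mode` (0..7),
   leaving the fields of all other pins in the same register untouched. */
void set_pin_mode(volatile uint32_t *gpio, unsigned pin, uint32_t mode)
{
    assert(pin <= 53u && mode <= 7u);

    unsigned reg   = pin / 10u;         /* GPFSEL register index: 4 for pin 47 */
    unsigned shift = (pin % 10u) * 3u;  /* lowest bit of the field: 21 for pin 47 */

    uint32_t value = gpio[reg];         /* read the current contents */
    value &= ~(UINT32_C(7) << shift);   /* blank out the 3-bit field */
    value |= mode << shift;             /* insert the new mode */
    gpio[reg] = value;                  /* write the result back */
}
```

Calling `set_pin_mode(gpio, 47, 1)` performs the same update as the final statement on the slide. The unsigned constant `UINT32_C(7)` avoids relying on the sign of a shifted `int`.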
![Page 222: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/222.jpg)
GPIO programming
The previous slides discussed how to control an LED with a GPIO pin.
Similar code is used to treat a button as an input device and to read a bit from the corresponding GPIO pin.
For the exam you need to understand the main steps that are needed.
You must be able to perform the above steps to explain, e.g., how to set the mode of a pin.
The LCD device is controlled in a similar way, but always sending 8 bits as the byte to be displayed.
You should expect specific code questions about GPIO programming, either in C or Assembler.
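Reading a button follows the same pattern of picking a register and a bit. The sketch below is illustrative only and rests on assumptions not stated on these slides: the pin-level registers GPLEV0 and GPLEV1 are taken to sit at word indices 13 and 14, as listed in the BCM2835 manual, and the pin is assumed to have been set to input mode (function select value 0) beforehand. The name `read_pin` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: word index of GPLEV0 per the BCM2835 manual (byte offset 0x34).
   This register is not shown in the table on the slides. */
#define GPLEV0_INDEX 13u

/* Return 1 if `pin` currently reads high, 0 if it reads low. */
int read_pin(const volatile uint32_t *gpio, unsigned pin)
{
    assert(pin <= 53u);

    unsigned reg = GPLEV0_INDEX + pin / 32u;  /* GPLEV0 for pins 0..31, GPLEV1 above */
    unsigned bit = pin % 32u;                 /* one level bit per pin */

    return (int)((gpio[reg] >> bit) & 1u);
}
```

Unlike the function-select registers, the level registers use one bit per pin, so the divisor is 32 rather than 10 and no mask wider than 1 bit is needed.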
![Page 223: F28HS Hardware-Software Interface: Systems Programminghwloidl/Courses/F28HS/slides_SysPrg_2020.pdf · Outline 1 Lecture 1: Introduction to Systems Programming 2 Lecture 2: Systems](https://reader033.fdocuments.us/reader033/viewer/2022050514/5f9e45b39ab02036534a360f/html5/thumbnails/223.jpg)
Summary
Check the detailed tutorial slides about controlling external devices.
Look up the sample sources (both C and Asm) for the tutorials.
You need to have a solid understanding of this code and be able to answer questions about it!
Focus on the main concepts that we covered in the lectures:
  Computer architecture, in particular pipelining
  Memory hierarchy, in particular caching
You need to be able to explain how these concepts impact the performance of some sample programs.
Be prepared for small-scale coding questions.