

David DiPaola
Independent Study Report

During this quarter, David Larsen and I worked to port Xinu, an embedded educational operating system, to various ARM platforms. Along the way, I learned not only about the many things that go into even the simplest of operating systems, but also how they are implemented. While previous coursework in Operating Systems 1 gave me the theoretical background, this course gave me a sense of what it is like to put these concepts into practice.

Our first task was to assess the current state of a partially completed ARM port. The code was designed to support an ARM7 IPRE Fluke robot controller board, whose CPU has a similar architecture to the ARM11 in the Raspberry Pi and the ARM9 that we were simulating in QEMU. From initial tests, we found that certain essential low-level functions did not perform properly: the context switching routine (which the process scheduler uses to change the currently running process) and the interrupt handler (which devices use to notify the CPU of events). Since they require very carefully constructed assembly language routines, these components are among the most challenging to implement in this operating system. In addition, we did not have a means to boot our code on the Raspberry Pi hardware, so the other major task was to figure out how the Raspberry Pi loads and executes code. Finally, as we would come to learn in greater detail over the course of the project, the Raspberry Pi's hardware peripherals diverged from both the QEMU emulated system and the original Fluke hardware.

Illustration 1: A Raspberry Pi Model B.


The first order of business was getting the cross-compiler and emulator working. Previous work by Professors Jeremy and Travis Brown had established a suitable development environment on a private server that could be logged into remotely. Since setting up this environment was very technical and we would also need it working on the lab machines, David Larsen set to work on a script that would build the cross-compiler. While he worked on that script, I used the server to learn how to operate QEMU and GDB. I had previous experience with GDB from the Computer Science 4 course, but I had not used it to the depth that this project required. I had learned from Professor Travis that the calibration for the system's timer was off, so I worked on fixing that as a simple entry into the workings of Xinu. My first, naive approach was to compare how long the timer took to fire against a stopwatch measuring the same period. While this did allow me to recalibrate the timer, the result was only valid on machines that ran the emulator at the same speed as mine. Later on, as I started writing device drivers, I realized that there was a fair bit more going on than I had initially considered.

Having now looked at the code briefly, I set about finding the source of the context switching issues we had been having. I started by tracing how Xinu handles these operations and drawing a graph that would be easier to digest mentally. After this, I met with David Larsen on the side to discuss our understanding of the problem and how we were going to approach it. After clarifying a few things with him, we both started working on a fix, since there would be little else to do if this functionality didn't work. He completed his fix before I finished mine, and we then moved on to separate tasks to finish the project.

Illustration 2: The interaction between IRQs and context switching.


Since I had some experience working on embedded hardware platforms in the past, it was decided that my next goal would be to get our code to boot on the Raspberry Pi. My first job was to determine just how the Raspberry Pi boots up. Information regarding these lower-level processes tends to be scattered all about, and this case was no exception. After searching for a few hours, I was able to piece together that the boot process was actually fairly simple:

1. The GPU starts up and mounts the FAT32 partition on the SD card
2. The GPU reads and runs bootcode.bin and start.elf from the SD card
3. The GPU reads the configuration file config.txt from the SD card, if it exists
4. The GPU reads kernel.img from the SD card and writes it to 0x00008000 in RAM
5. The GPU starts the CPU executing the code

Initially, we had mistakenly believed that the boot code came from the first 512 bytes of the SD card, similar to how a PC reads from the boot sector. It was also odd to learn that the ARM CPU is actually more like a co-processor for the GPU, and that while the ARM CPU is somewhat older, the GPU is of a cutting-edge design. After figuring out these basic parameters, I needed a way for Xinu to tell us that it had booted without access to almost any of the ARM's input or output devices. The solution came in the form of a single LED on the board, which can be toggled through a GPIO port. After finding some example code online, I was able to craft a very rudimentary routine in ARM assembler to turn on this LED and placed it into the boot sequence. After a bit of work to get it to compile, we had our first signs of booting:

Illustration 3: A successful boot!
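The LED routine described above was written in ARM assembler and is not reproduced in this report. As a rough illustration of the same idea, here is a minimal C sketch. It assumes the BCM2835 GPIO register layout (function select registers followed by set and clear registers), and it assumes that the status LED sits on GPIO 16 and is wired active-low; the function names and the LED pin are illustrative assumptions, not taken from the project's code.

```c
#include <assert.h>
#include <stdint.h>

/* Word offsets into the GPIO register block (byte offset / 4). */
enum {
    GPFSEL0 = 0,   /* function select: 10 pins per register, 3 bits per pin */
    GPSET0  = 7,   /* byte offset 0x1C: writing 1 drives a pin high */
    GPCLR0  = 10   /* byte offset 0x28: writing 1 drives a pin low */
};

/* Assumption: the status LED is on GPIO 16 and lights when the pin is low. */
#define LED_PIN 16u

/* Configure one pin (0..31) as an output by setting its 3-bit field to 001. */
void gpio_make_output(volatile uint32_t *gpio, unsigned pin)
{
    assert(pin < 32);
    unsigned reg   = GPFSEL0 + pin / 10;
    unsigned shift = (pin % 10) * 3;
    uint32_t value = gpio[reg];

    value &= ~(UINT32_C(7) << shift);   /* clear the pin's function field */
    value |=  (UINT32_C(1) << shift);   /* 001 = output */
    gpio[reg] = value;
}

/* Turn the LED on: active-low, so the pin is cleared. */
void led_on(volatile uint32_t *gpio)
{
    gpio_make_output(gpio, LED_PIN);
    gpio[GPCLR0] = UINT32_C(1) << LED_PIN;
}

/* Turn the LED off by driving the pin high. */
void led_off(volatile uint32_t *gpio)
{
    gpio[GPSET0] = UINT32_C(1) << LED_PIN;
}
```

On real hardware the `gpio` pointer would be the GPIO block's base address (0x20200000 in the ARM physical address map of the BCM2835, to the best of my knowledge). Passing the base as a parameter also lets the routine be exercised against an ordinary array on a development machine.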


Having successfully booted code, the next step was to determine what input and output facilities existed so that we could interact with the device. We also needed a timer so that Xinu could switch tasks. Initially, I had assumed that all ARM devices would share a common set of peripherals produced by ARM, so that we could simply change each device's bus location in a few source files and have a working system. Unfortunately, that did not turn out to be the case. The System on a Chip (SoC) in the Raspberry Pi had hardware that was almost completely different from that of the QEMU system or the Fluke board. I learned that this is fairly typical of ARM systems. Unlike the PC-based platforms that we commonly use in the Computer Science department, ARM computers rarely share the same devices and memory layout. This stems from the nature of how ARM does business. Rather than producing chips for buyers directly, ARM sells designs for CPUs and for peripheral devices such as serial controllers, timers, and interrupt controllers. The companies that license these designs and produce the actual silicon can choose how they want their devices to be built, which is ideal for embedded devices. The incompatibilities that stem from this configurability mean that supporting multiple ARM chips with the same code base can be tricky at times.

The next task was to get a serial port operating so that we could get more detailed debugging information than a blinking LED could provide. After finding more example code, I was able to initialize one of the on-board serial ports and get basic boot information out using a simple polling routine that constantly watched the status of the serial port device. With this completed, I set out to get interrupts and the timer functioning. The first roadblock was the Xinu code itself. There was a large amount of configuration information, as well as device drivers, that had been built with the Fluke board or QEMU in mind. Since code involving interrupts can be very unstable before it is completed, I opted to create a separate code base where the only code that would run would be the drivers and tests I had written.

Illustration 4: A small piece of UART initialization code.
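The polling approach to serial output mentioned above can be sketched in C as follows. This is only an illustration: it assumes a PL011-style register layout (a data register at byte offset 0x00 and a flag register at 0x18 whose bit 5 means "transmit FIFO full"), and the report does not say which of the board's two UARTs the first polling routine actually used. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets into an assumed PL011-style UART register block. */
enum {
    UART_DR = 0,   /* byte offset 0x00: data register */
    UART_FR = 6    /* byte offset 0x18: flag register */
};

#define UART_FR_TXFF (UINT32_C(1) << 5)   /* transmit FIFO is full */

/* Busy-wait until the transmitter has room, then write one character. */
void uart_putc(volatile uint32_t *uart, char c)
{
    while (uart[UART_FR] & UART_FR_TXFF) {
        /* spin: polling means the CPU does nothing else while waiting */
    }
    uart[UART_DR] = (uint32_t)(unsigned char)c;
}

/* Send a NUL-terminated string; returns the number of characters sent. */
size_t uart_puts(volatile uint32_t *uart, const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; s[n] != '\0'; n++) {
        uart_putc(uart, s[n]);
    }
    return n;
}
```

The busy-wait loop is what makes this simple and also what makes it wasteful: the CPU cannot run another process while it waits for the device, which is the motivation for the interrupt-driven approach.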

Writing the serial port driver in the test environment was a simple matter of organizing the code for it, but getting the timer to work was a bit more difficult. After reading the documentation about the SP804 timer in the SoC, I was able to piece together some simple driver code that worked well. After completing the driver code, I stumbled upon a small note in the SoC's data sheet stating that the SP804 timer would not keep accurate time because it slows down when the system goes into power-save mode. The data sheet's recommendation was to use the less advanced, but simpler, System Timer for applications that need accurate timing. This timer also turned out to have some strange undocumented quirks, such as having half of its timers used by the GPU. Using one of the leftover timer slots, work proceeded smoothly.

The next problem was getting interrupts to work. In a past Systems Programming 1 course, I had used a Vectored Interrupt Controller (VIC), which takes care of some of the details of interrupt processing for us. The designers of the chip in the Raspberry Pi seem to have left that device out and instead used a much more involved method: manually checking each bit of a status register to see whether a peripheral had triggered the interrupt. Since Xinu expects a VIC to be present, I wrote a simple VIC emulator that determines the interrupt source and calls the appropriate interrupt handler.
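To illustrate how a periodic tick can be built on the System Timer described above, here is a minimal C sketch. It assumes the BCM2835 System Timer layout as I understand it: a control/status register, a free-running counter (1 MHz, so one count per microsecond), and four compare registers. The report only says that "one of the leftover timer slots" was used, so the choice of channel 1 here is an assumption, as are the function names.

```c
#include <assert.h>
#include <stdint.h>

/* Word offsets into the System Timer register block (byte offset / 4). */
enum {
    ST_CS  = 0,   /* 0x00: control/status, one match flag per channel */
    ST_CLO = 1,   /* 0x04: low 32 bits of the free-running counter */
    ST_C0  = 3    /* 0x0C: first of four compare registers */
};

/* Assumption: channel 1 is not claimed by the GPU. */
#define TICK_CHANNEL 1u

/* Arm a channel to match interval_us counts from now. Unsigned 32-bit
 * arithmetic wraps around, which mirrors how the counter itself wraps. */
void timer_schedule(volatile uint32_t *st, unsigned channel, uint32_t interval_us)
{
    assert(channel < 4);
    st[ST_C0 + channel] = st[ST_CLO] + interval_us;
}

/* Acknowledge a match by writing 1 to the channel's bit in the status register. */
void timer_ack(volatile uint32_t *st, unsigned channel)
{
    assert(channel < 4);
    st[ST_CS] = UINT32_C(1) << channel;
}
```

A tick handler built on this would call `timer_ack(st, TICK_CHANNEL)` and then `timer_schedule(st, TICK_CHANNEL, interval)` on every interrupt, because the compare register fires once per match and does not reload itself.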

Illustration 5: Interrupt organization on the Raspberry Pi's SoC.
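The idea behind the VIC emulator described above can be sketched in C as a table of handlers indexed by interrupt number, plus a dispatcher that walks the bits of a pending-interrupt word. This is not the project's code: it assumes a single 32-bit pending word (the real SoC spreads its sources over more than one register), and the names, IRQ numbers, and demo handlers are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IRQ_COUNT 32u

typedef void (*irq_handler_t)(void);

/* One handler slot per interrupt source; NULL means "no handler installed". */
static irq_handler_t irq_table[IRQ_COUNT];

/* Install a handler for one interrupt source. */
void irq_register(unsigned irq, irq_handler_t handler)
{
    assert(irq < IRQ_COUNT);
    irq_table[irq] = handler;
}

/* Check each pending bit and call the matching handler, lowest bit first.
 * Returns how many handlers were actually called. */
unsigned irq_dispatch(uint32_t pending)
{
    unsigned handled = 0;
    for (unsigned irq = 0; irq < IRQ_COUNT && pending != 0; irq++) {
        uint32_t bit = UINT32_C(1) << irq;
        if (pending & bit) {
            pending &= ~bit;
            if (irq_table[irq] != NULL) {
                irq_table[irq]();
                handled++;
            }
        }
    }
    return handled;
}

/* Hypothetical handlers that only count how often they were called. */
unsigned demo_timer_ticks;
unsigned demo_uart_events;

void demo_timer_handler(void) { demo_timer_ticks++; }
void demo_uart_handler(void)  { demo_uart_events++; }
```

In a real interrupt path, the top-level IRQ entry code would read the hardware's pending register once and pass that snapshot to `irq_dispatch`. Working from a snapshot is one simple way to keep the dispatch loop consistent if new interrupts become pending while earlier ones are being handled.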


Because the VIC emulator code was sensitive to race conditions, much care had to be taken to implement it properly. After dealing with that, I was able to get the timer to blink the LED without the main code having to take any direct action. The next challenge was getting the serial port to work via interrupts instead of the simplistic polling method used earlier. I soon discovered that this device also had latent problems, mostly errors in its data sheet. These errors were in the descriptions of some critical status registers, which made acknowledging interrupts an ambiguous process. After fighting with the device for some time, it became apparent that it would be arduous to determine exactly how to deal with its interrupts. After reading up on the other serial port present, I realized that it was actually much simpler, easier to configure, closer to an industry standard, and higher in performance. Interrupt functionality for this device was fairly simple to implement, and in short order I had a demo that blinked the LED with the timer and sent out characters on the UART via interrupts.

The final stage of the port is integrating all of the driver code from the standalone test framework into Xinu. Moving the serial port code over was a simple task because Xinu handles serial ports in a modular way: each device has its own directory and configuration file. Timers and interrupt handling, however, do not work in such a flexible manner. Currently, work is under way to remove all of the old interrupt and timer code, replace it with the new routines from the test framework, and test the complete system.

Illustration 6: The text output of a simple demo to test interrupt stability by sending large amounts of text over the UART and blinking the LED when the timer goes off.

In summary, porting an operating system is a demanding task. It requires large blocks of time to engage in properly, intimate knowledge of hardware details, and a broad knowledge of how the various parts of the operating system interact with each other. Additionally, the work was made more difficult by the general lack of correct documentation. For most devices, I had to reference source code in other projects to figure out what was really going on because of the errors in the manufacturer's data sheet. Due to my experiences in this project, I now value documentation to a much greater degree.