David DiPaola
Independent Study Report
During this quarter, David Larsen and I worked to port the Xinu embedded educational
operating system to various ARM platforms. Along the way, I learned not only about the many things
that go into even the simplest of operating systems, but also how they are implemented. While previous
coursework in Operating Systems 1 gave me the theoretical background, this course gave me a sense of
what it is like to put these concepts into practice.
Our first task was to assess the current state of a partially completed ARM port. The code was
designed to support an ARM7 IPRE Fluke robot controller board, which has a CPU with a similar
architecture to the ARM11 in the Raspberry Pi and the ARM9 that we were simulating in QEMU. From
initial tests, we found that certain essential low-level functions did not perform properly: the
context switching routine (which is used by the process scheduler to change the currently running
process) and the interrupt handler (used by devices to notify the CPU of events). Since they require
very carefully constructed assembly language routines, these components are among the most challenging
to implement in this operating system. In addition, we did not have a means to boot our code
on the Raspberry Pi hardware. Thus, the other major task was to figure out how the Raspberry Pi
loaded and executed code. Finally, as we would come to learn in greater detail over the course of the
project, the Raspberry Pi's hardware peripherals diverged from both the QEMU emulated system and
the original Fluke hardware.
Illustration 1: A Raspberry Pi Model B.
The first order of business was getting the cross-compiler and emulator working. Previous work
by Professors Jeremy and Travis Brown had established a suitable development environment on a
private server that could be logged into remotely. Since this process was very technical and we
also needed to get this environment working on lab machines, David Larsen set to work on
developing a script that would build the cross-compiler. While he worked on that script, I used the
server and set to work on learning how to operate QEMU and GDB. While I did have previous
experience with GDB from the Computer Science 4 course, I had not used it to the depth that I did for
this project. I had learned from Professor Travis that the calibration for the system's timer was off, so I
worked on fixing that as a simple entry into the workings of Xinu. My first naive approach was to
simply compare how long it took for the timer to fire and how long a stopwatch took to measure the
same period of time. While this did allow me to recalibrate the timer, it only did so for machines that
ran the emulator at the same speed as the one I was using. Later on, as I started writing device
drivers, I realized that there was a fair bit more that was going on than I had initially considered.
Having now looked at the code briefly, I set about finding the source of the context switching
issues we'd been having. I started by tracing how Xinu handles these operations and drawing a graph
that would be easier to digest mentally. After this, I met with David Larsen on the side to discuss our
understanding of the problem and how we were going to approach it. After clarifying a few things with
him, we both started working on a fix, since there would be little else to do if this functionality
didn't work. He was able to complete his work before I finished, and we then moved on to different
tasks to finish the project.
Illustration 2: The interaction between IRQs and context switching.
Since I had some experience working on embedded hardware platforms in the past, it was
deemed that my next goal was to get the code we had to boot on the Raspberry Pi. My first job was to
determine just how the Raspberry Pi boots up. Information regarding these lower-level processes tends
to be scattered all about, and this case was no exception. After searching for a few hours, I was able to
piece together that the boot process was actually fairly simple:
1. The GPU starts up and mounts the FAT32 partition on the SD card
2. The GPU reads and runs bootcode.bin and start.elf from the SD card
3. The GPU reads the configuration file config.txt from the SD card, if it exists
4. The GPU reads kernel.img from the SD card and writes it to 0x00008000 in RAM
5. The GPU starts the CPU executing the code
Initially, we had mistakenly believed that the boot code came from the first 512 bytes of the SD card,
similar to how a PC reads from the boot sector. It was also odd to learn that the ARM CPU is actually
more like a co-processor for the GPU and that while the ARM CPU is somewhat older, the GPU
actually is of a cutting-edge design. After figuring these basic parameters out, I had to figure out a way
to let Xinu tell us that it had booted without having access to almost any of the ARM's input or output
devices. The solution came in the form of a single LED on the board, which can be toggled through a
GPIO port. After finding some example code online, I was able to craft a very rudimentary routine in
ARM assembler to turn on this LED and placed it into the boot sequence. After a bit of work to get it to
compile, we had our first signs of booting:
Illustration 3: A successful boot!
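The original routine was written in ARM assembler and is not reproduced in this report. As a rough sketch of the same idea in C, the code below assumes that the Model B's status LED is wired to GPIO 16 and is active-low, and that the GPIO block lives at physical address 0x20200000. The function names are hypothetical, and the register base is passed in as a pointer so the logic can also be exercised against an ordinary array.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for the SoC on a Raspberry Pi Model B. */
#define GPIO_BASE 0x20200000u /* physical base of the GPIO block */
#define LED_PIN   16u         /* status LED, assumed active-low */

/* Word offsets of the GPIO registers used here. */
#define GPFSEL0 0u            /* function select, 10 pins per register */
#define GPSET0  7u            /* writing a 1 drives the pin high */
#define GPCLR0  10u           /* writing a 1 drives the pin low */

/* Configure one pin as an output without disturbing its neighbours. */
void gpio_make_output(volatile uint32_t *gpio, unsigned pin)
{
    assert(pin < 54u);
    volatile uint32_t *fsel = &gpio[GPFSEL0 + pin / 10u];
    uint32_t shift = (pin % 10u) * 3u;
    uint32_t value = *fsel;

    value &= ~(7u << shift);  /* clear the pin's 3-bit function field */
    value |= 1u << shift;     /* 001 selects "output" */
    *fsel = value;
}

/* Drive a pin high (level != 0) or low (level == 0). */
void gpio_write(volatile uint32_t *gpio, unsigned pin, int level)
{
    assert(pin < 54u);
    uint32_t bit = 1u << (pin % 32u);
    gpio[(level ? GPSET0 : GPCLR0) + pin / 32u] = bit;
}

/* The LED is assumed active-low, so "on" means driving the pin low. */
void led_set(volatile uint32_t *gpio, int on)
{
    gpio_write(gpio, LED_PIN, on ? 0 : 1);
}
```

On real hardware the calls would be `gpio_make_output((volatile uint32_t *)GPIO_BASE, LED_PIN)` followed by `led_set((volatile uint32_t *)GPIO_BASE, 1)` early in the boot sequence.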
Having successfully booted code, the next steps were to determine what input and output
facilities existed so we could interact with the device. We also needed a timer so that Xinu could switch
tasks. Initially, I had assumed that all ARM devices would share a common set of peripherals produced
by ARM. Therefore, we could simply change the device's bus location in a few source files and have a
working system. Unfortunately, that did not turn out to be the case. The System on a Chip (SoC) that
was in the Raspberry Pi had hardware that was almost completely different from the QEMU system or
the Fluke board. I learned that this is fairly typical of ARM systems. Unlike the PC-based platforms
that we commonly use in the Computer Science department, ARM computers rarely share the same
devices and memory layout. This stems from the nature of how ARM does business. Rather than
producing chips for buyers directly, ARM's business model is based on selling designs for CPUs and
peripheral devices like serial controllers, timers, interrupt controllers, and the like. Because of this, the
companies that license these designs and produce the actual silicon can choose how they want their
devices to be built, which is ideal for embedded devices. The incompatibilities that stem from this
configurability mean that supporting multiple ARM chips with the same code base can be tricky at
times.
The next task was to get a serial port operating so that we could get a greater depth of
debugging information than a blinking LED could provide. After finding more example code, I was able
to initialize one of the on-board serial ports and get basic booting information using a simple polling
routine that watched the status of the serial port device constantly. With this completed, I set out to
get interrupts and the timer functioning. The first roadblock was the Xinu code itself. There was a
large amount of configuration information as well as device drivers that were built more with the Fluke
board or QEMU in mind. Since code involving interrupts can be very unstable before it is completed, I
opted to create a separate code base where the only code that would run would be the drivers and tests
I had written.
Illustration 4: A small piece of UART initialization code.
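A polled transmit routine of the kind described above (distinct from the initialization code in Illustration 4) might look roughly like the following. The register layout, the `UART_TX_READY` bit, and the function names are all hypothetical stand-ins, since the report does not say which of the two UARTs was used at this stage; the real offsets and flag polarity have to come from the SoC's data sheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block; real offsets and bits differ per UART. */
struct uart_regs {
    volatile uint32_t data;   /* write: next byte to transmit */
    volatile uint32_t status; /* bit 0 set: transmitter can accept a byte */
};

#define UART_TX_READY 0x1u

/* Busy-wait until the transmitter is free, then hand it one byte. */
void uart_putc_polled(struct uart_regs *uart, char c)
{
    assert(uart != NULL);
    while ((uart->status & UART_TX_READY) == 0) {
        /* spin: the CPU does nothing else while waiting */
    }
    uart->data = (uint8_t)c;
}

/* Send a string, expanding "\n" to "\r\n" for serial terminals. */
void uart_puts_polled(struct uart_regs *uart, const char *s)
{
    assert(s != NULL);
    while (*s != '\0') {
        if (*s == '\n') {
            uart_putc_polled(uart, '\r');
        }
        uart_putc_polled(uart, *s++);
    }
}
```

The busy-wait loop is what makes polling simple but wasteful: every character costs CPU time that an interrupt-driven driver would give back to other processes.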
Writing the serial port driver in the test environment was a simple matter of organizing the code
for it, but getting the timer to work was a bit more difficult. After reading the documentation about the
SP804 timer in the SoC, I was able to piece together some simple driver code that worked well. After
completing the driver code, I stumbled upon a small note in the SoC's data sheet that stated that the
SP804 timer would not work accurately because it would slow down when the system went into power-
save mode. The data sheet's recommendation was to use the less-advanced, but simpler, System Timer
for applications that needed accurate timing. This timer also turned out to have some strange
undocumented quirks, such as half of its timers being used by the GPU. Once I switched to one of the
leftover timer slots, work proceeded smoothly. The next problem I faced was getting interrupts to work. In a past
Systems Programming 1 course, I had used a Vectored Interrupt Controller (VIC) which takes care of
some of the details of interrupt processing for us. In the Raspberry Pi's chip, it seems that the
designers left out that device and instead rely on a much more involved method: software must manually
check each bit of a status register to see which peripheral triggered the interrupt. Since Xinu expects a VIC
to be present, I wrote a simple VIC emulator that determines the interrupt source and calls the
appropriate interrupt handler.
Illustration 5: Interrupt organization on the Raspberry Pi's SoC.
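The idea of emulating a VIC in software can be sketched as a table of handlers indexed by bit position in a pending-status word. This is a simplified illustration, not the code written for the port: it covers a single 32-bit status word, and `irq_register`, `irq_dispatch`, and the demo handler are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SOURCES 32u

typedef void (*irq_handler_t)(void);

/* One slot per bit of the pending-status word; NULL means "no handler". */
static irq_handler_t handlers[NUM_SOURCES];

/* Counter bumped by the demo handler below. */
unsigned demo_ticks;

void demo_tick_handler(void)
{
    demo_ticks++;
}

/* Attach a handler to one interrupt source (bit number). */
void irq_register(unsigned source, irq_handler_t handler)
{
    assert(source < NUM_SOURCES);
    handlers[source] = handler;
}

/*
 * Walk a snapshot of the pending bits and call each registered handler.
 * The caller is expected to read the hardware status register once and
 * pass that value in, so the set of sources cannot change mid-loop.
 * Returns the number of handlers that were called.
 */
unsigned irq_dispatch(uint32_t pending)
{
    unsigned called = 0;

    for (unsigned source = 0; source < NUM_SOURCES; source++) {
        if ((pending & (1u << source)) != 0 && handlers[source] != NULL) {
            handlers[source]();
            called++;
        }
    }
    return called;
}
```

Working from a single snapshot of the status register is one small way to narrow the window for races; anything that fires during dispatch simply shows up in the next snapshot.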
Because the VIC emulator code was sensitive to race conditions, much care had to be taken to implement it properly.
After dealing with that, I was able to get the timer to trigger the LED to blink without the main code
having to directly take any action. The next challenge was getting the serial port to work via interrupts
instead of the simplistic polling method used earlier. It was soon discovered that this device also had
latent bugs, mostly in the data sheet's documentation. These errors were in some critical status registers
which made acknowledging interrupts an ambiguous process. After fighting with the device for some
time, it became apparent that it would be arduous to determine exactly how to deal with interrupts.
After reading up on the other serial port present, I realized that it was actually much simpler, easier to
configure, closer to an industry standard, and had higher performance. Interrupt functionality for this
device was fairly simple to implement, and in short order I had a demo that blinked the LED with the
timer and sent out characters on the UART via interrupts.
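The timer half of such a demo can be sketched as follows, assuming the System Timer is a free-running 1 MHz counter with four compare registers at physical address 0x20003000, and assuming compare slot 1 is one of the slots the GPU leaves alone. The struct, constants, and function names are illustrative rather than taken from the port.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed System Timer register layout (physical base 0x20003000). */
struct systimer_regs {
    volatile uint32_t cs;   /* match flags; writing 1 to a bit clears it */
    volatile uint32_t clo;  /* low 32 bits of the free-running counter */
    volatile uint32_t chi;  /* high 32 bits of the counter */
    volatile uint32_t c[4]; /* compare registers, one per slot */
};

#define SYSTIMER_HZ 1000000u /* assumed counter rate: 1 tick per microsecond */
#define TIMER_SLOT  1u       /* assumed free; some slots belong to the GPU */

/* Number of counter ticks between interrupts for a given interrupt rate. */
uint32_t systimer_interval(uint32_t rate_hz)
{
    assert(rate_hz > 0 && rate_hz <= SYSTIMER_HZ);
    return SYSTIMER_HZ / rate_hz;
}

/* Schedule the first match relative to the current counter value. */
void systimer_arm(struct systimer_regs *timer, uint32_t interval)
{
    /* Unsigned addition wraps around, matching the 32-bit counter. */
    timer->c[TIMER_SLOT] = timer->clo + interval;
}

/* Interrupt handler body: acknowledge the match and schedule the next one. */
void systimer_irq(struct systimer_regs *timer, uint32_t interval)
{
    timer->cs = 1u << TIMER_SLOT;     /* acknowledge this slot only */
    timer->c[TIMER_SLOT] += interval; /* next deadline, relative to the last */
}
```

Scheduling each deadline from the previous compare value, rather than from the current counter, keeps interrupt latency from accumulating as drift.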
Illustration 6: The text output of a simple demo to test interrupt stability by sending large amounts of text over the UART and blinking the LED when the timer goes off.
The final stage of the port is integrating all of the driver code from the standalone test
framework into Xinu. Moving the serial port code over was a simple task because Xinu handles serial
ports in a modular way: each device has its own directory and configuration file. Timers and interrupt
handling, however, do not work in such a flexible manner. Currently, work is under way to remove all of
the old interrupt and timer code, replace it with the new routines from the test framework, and test
the complete system.
In summary, porting an operating system is a demanding task. It requires large blocks of time
to engage in properly, intimate knowledge of hardware details, and also a broad knowledge of how
various parts of the operating system interact with each other. Additionally, work was made more
difficult by the general lack of correct documentation. For most devices, I had to reference source
code in other projects in order to figure out what was really going on because of the errors in the
manufacturer's data sheet. Due to my experiences in this project, I now value documentation to a much
greater degree.