Memory Management for Self-Stabilizing Operating Systems
Shlomi Dolev and Reuven Yagel
Computer Science Department
Ben-Gurion University of the Negev, Beer-Sheva, Israel
SSS’05
SOS - Motivation
• Growing interest in self-* / autonomic computing systems
• Self-stabilizing algorithms/programs assume hardware and operating system are also stabilizing
• Pentium HALTING problem: “… if the ESP or SP register is 1 when the PUSH instruction is executed, the processor shuts down…”
Proposed solution
• Build the operating system according to the well-defined and well-understood paradigm of self-stabilization (traditionally used in distributed systems)
• Thereby achieving: trustworthiness, dependability, self-healing, automatic recovery, adaptive systems, …
OS Reliability
• Past examples:
– Dijkstra, “THE” Multiprogramming System ’68 (Layered Approach)
– Denning, Fault tolerant operating systems ’76 (Protection)
– KeyKOS ’85, EROS ’92 (Capabilities, Checkpoints)
– Micro-kernel ~’90, Exo-kernel ’94 (Minimal TCB)
• Current:
– JHU: The Coyotos Secure Operating System
– IBM: K42, Autonomic Computing
– SUN: Solaris 10, Predictive Self-Healing
– MSR: Singularity, managed code OS
A problem has been detected and Windows has been shut down to prevent damage to your computer.
PFN_LIST_CORRUPT
If this is the first time you've seen this error screen, restart your computer. If this screen appears again, follow these steps:
Check to make sure any new hardware or software is properly installed. If this is a new installation, ask your hardware or software manufacturer for any Windows updates you might need.
If problems continue, disable or remove any newly installed hardware or software. Disable BIOS memory options such as caching or shadowing. If you need to use Safe Mode to remove or disable components, restart your computer, press F8 to select Advanced Startup Options, and then select Safe Mode.
Technical information:
*** STOP: 0x0000004e (0x00000099, 0x00000000, 0x00000000, 0x00000000)
Beginning dump of physical memory
Physical memory dump complete.
Contact your system administrator or technical support group for further assistance.
Goal: Autonomic Computer
• Following any sequence of transient faults (e.g. soft-errors), the (operating) system converges
• Using self stabilization:
– A system can be started in an arbitrary state and converge to a desired behavior
– Using fair composition to run hardware+OS
• BGU: Self-stabilizing systems, tools & paradigms
– Microprocessor [DH’04]
– Operating System [DY’04]
– Compiler [DH’05]
– Framework: autonomic recoverer [BDK’03]
– Middleware: File System [DK’02], Group Comm. [DS’01]
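To make "started in an arbitrary state and converge to a desired behavior" concrete, here is a minimal sketch of a classic self-stabilizing algorithm from the distributed-systems literature: Dijkstra's K-state token ring. It is not taken from the slides. The ring size `N`, the counter range `K`, and the "first privileged machine moves" scheduler are illustrative assumptions.

```c
#include <stddef.h>

#define N 5   /* number of machines in the ring (illustrative) */
#define K 6   /* number of counter values; K >= N is sufficient */

/* Machine i is privileged (holds a token) when its guard is true. */
int privileged(const unsigned x[N], size_t i)
{
    if (i == 0)
        return x[0] == x[N - 1];
    return x[i] != x[i - 1];
}

/* Count privileges in the current state; a legal state has exactly one. */
int count_privileges(const unsigned x[N])
{
    int count = 0;
    for (size_t i = 0; i < N; i++)
        count += privileged(x, i);
    return count;
}

/* One step of a central scheduler: the first privileged machine moves. */
void step(unsigned x[N])
{
    for (size_t i = 0; i < N; i++) {
        if (!privileged(x, i))
            continue;
        if (i == 0)
            x[0] = (x[0] + 1) % K;   /* machine 0 advances its counter */
        else
            x[i] = x[i - 1];         /* others copy their left neighbour */
        return;
    }
}

/* Run `steps` moves starting from whatever state x currently holds. */
void run(unsigned x[N], int steps)
{
    for (size_t i = 0; i < N; i++)
        x[i] %= K;                   /* confine corrupted counters to 0..K-1 */
    while (steps-- > 0)
        step(x);
}
```

Whatever values the counters hold initially, the ring reaches a state with exactly one privilege and keeps that property afterwards. That is the convergence and closure behaviour the slide asks of the OS.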
SOS - Directions
• Black-box
– Take existing (Desktop/Real-time) OS
– Add stabilization layer
– Detailed formal specification needed
• Carefully tailoring a tiny kernel
– Processor scheduling [SAACS04]
– Memory management [SSS05]
– Device drivers
Method
• Additional requirements for each OS function
• Evolve self-stabilizing solutions that follow computer-architecture/OS progress
• Detailed proof for self-stabilization of algorithms AND implementation
• Processor (e.g. Pentium) instruction manual defines a transition function
– Don’t rely on existing compilers
Assumptions
• Whole soft-state can be corrupted (e.g. Program Counter)
• Stabilization of other layers
Solution Foundations
• Program loading & process scheduling
• Code portions in ROM
• Truly non-maskable interrupt and watchdog architecture
• Periodic reset reinstall & execute (weak)
• Continuous monitoring and consistency enforcement
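The "periodic reset, reinstall & execute" idea can be sketched in C as follows. This is an illustration under stated assumptions, not the authors' code: `rom_image`, `ram_image`, `IMAGE_SIZE`, `PERIOD`, and the software tick counter are all hypothetical. In the slides the trigger is a hardware watchdog raising a non-maskable interrupt.

```c
#include <stdint.h>
#include <string.h>

#define IMAGE_SIZE 16   /* hypothetical size of the OS image in bytes */
#define PERIOD     8    /* hypothetical number of ticks between reinstalls */

/* Hypothetical read-only copy of the OS code, standing in for ROM. */
const uint8_t rom_image[IMAGE_SIZE] = {
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
};

/* RAM copy of the code; transient faults may corrupt it at any time. */
uint8_t ram_image[IMAGE_SIZE];

/* Reinstall unconditionally: the handler trusts nothing that lives in RAM. */
void reinstall_from_rom(void)
{
    memcpy(ram_image, rom_image, IMAGE_SIZE);
}

/* Returns 1 when the RAM copy is identical to the ROM copy. */
int ram_matches_rom(void)
{
    return memcmp(ram_image, rom_image, IMAGE_SIZE) == 0;
}

/* One watchdog tick. The counter is soft state and may itself be corrupted,
   so it is reduced modulo PERIOD before use: from any starting value a
   reinstall happens within PERIOD ticks. Returns 1 if a reinstall ran. */
int watchdog_tick(unsigned *counter)
{
    *counter = (*counter % PERIOD + 1) % PERIOD;
    if (*counter == 0) {
        reinstall_from_rom();
        return 1;
    }
    return 0;
}
```

The design point is that recovery does not depend on detecting the fault. The code is overwritten from ROM on a bounded schedule, whether or not anything looks wrong.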
Memory Management: Requirements
• Consistency of memory hierarchy
• Self-stabilization preservation
[Figure: applications App1, App2, …, AppN running on top of the OS, which runs on the hardware (HW)]
Solution 1: Full Swapping
• Allocate whole available memory to the running application
• Consistency: kept all the time
• Stabil. Preserving: no mutual sharing
[Figure: RAM holds the OS and the running application (App1); the other applications (App2, App3, …) are kept on disk]
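A minimal sketch of full swapping, assuming a round-robin order and a byte-array model of RAM and disk. `ram`, `disk`, `NUM_APPS`, `APP_MEM`, and `swap_next` are hypothetical names chosen for illustration. Because only one application is resident at a time, no application can touch another's memory.

```c
#include <stdint.h>
#include <string.h>

#define NUM_APPS 3   /* hypothetical number of applications */
#define APP_MEM  8   /* hypothetical size of application RAM in bytes */

uint8_t ram[APP_MEM];              /* all application RAM, one owner at a time */
uint8_t disk[NUM_APPS][APP_MEM];   /* swapped-out images, one per application */

/* Swap out the running application and swap in the next one.
   `current` is soft state and may hold garbage, so it is first forced
   into the range 0..NUM_APPS-1. Returns the index of the new application. */
unsigned swap_next(unsigned current)
{
    unsigned next;

    current %= NUM_APPS;
    memcpy(disk[current], ram, APP_MEM);   /* save the whole image */
    next = (current + 1) % NUM_APPS;
    memcpy(ram, disk[next], APP_MEM);      /* load the whole image */
    return next;
}
```

Even if the index of the running application is corrupted, the swap still selects a valid application, and every application is reached again within `NUM_APPS` switches.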
Solution 2: Fixed Partitioning
• Fixed slots in main memory for several programs
[Figure: code segments c1–c5 and data segments d1–d5 kept on secondary storage (CD-ROM, disk); RAM holds the OS and fixed slots loaded with c2/d2, c3/d3 and c5/d5]
Solution 2: Fixed Partitioning
• Consistency: through continuous checks and consistency establishment of OS data structures
• Stabil. Preserving: via segmentation + code refreshing

Process Table
#     F     R
P1    2     4
P2    -1    3
...

Frame Table
#     P
F1
F2    1
…
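The "continuous checks and consistency establishment" between the two tables might look like the sketch below. It is an assumption-laden illustration, not the authors' routine. It models only the F column of the process table and the P column of the frame table, uses 0-based indices, and the names `proc_frame`, `frame_owner`, `NO_FRAME`, and `NO_PROC` are hypothetical.

```c
#define NUM_PROCS  4
#define NUM_FRAMES 4
#define NO_FRAME   (-1)   /* process is not resident (the -1 in column F) */
#define NO_PROC    (-1)   /* frame is empty */

int proc_frame[NUM_PROCS];     /* process table, column F */
int frame_owner[NUM_FRAMES];   /* frame table, column P */

/* Repair the tables so that they describe a one-to-one assignment. */
void enforce_consistency(void)
{
    /* 1. Out-of-range entries cannot be valid; reset them to "empty". */
    for (int p = 0; p < NUM_PROCS; p++)
        if (proc_frame[p] < 0 || proc_frame[p] >= NUM_FRAMES)
            proc_frame[p] = NO_FRAME;
    for (int f = 0; f < NUM_FRAMES; f++)
        if (frame_owner[f] < 0 || frame_owner[f] >= NUM_PROCS)
            frame_owner[f] = NO_PROC;

    /* 2. A frame keeps its owner only if the owner points back at it. */
    for (int f = 0; f < NUM_FRAMES; f++) {
        int p = frame_owner[f];
        if (p != NO_PROC && proc_frame[p] != f)
            frame_owner[f] = NO_PROC;
    }

    /* 3. A process keeps its frame only if the frame points back at it. */
    for (int p = 0; p < NUM_PROCS; p++) {
        int f = proc_frame[p];
        if (f != NO_FRAME && frame_owner[f] != p)
            proc_frame[p] = NO_FRAME;
    }
}

/* Returns 1 when every link in either table is matched by the other. */
int tables_consistent(void)
{
    for (int p = 0; p < NUM_PROCS; p++) {
        int f = proc_frame[p];
        if (f == NO_FRAME)
            continue;
        if (f < 0 || f >= NUM_FRAMES || frame_owner[f] != p)
            return 0;
    }
    for (int f = 0; f < NUM_FRAMES; f++) {
        int p = frame_owner[f];
        if (p == NO_PROC)
            continue;
        if (p < 0 || p >= NUM_PROCS || proc_frame[p] != f)
            return 0;
    }
    return 1;
}
```

Running the check on every scheduling round means a corrupted entry survives for at most one round. Processes that lose their frame are simply treated as not loaded and can be loaded again.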
Solution 3: Dynamic allocations
• We want to allow applications to dynamically allocate memory
• How can we prevent a (faulty) process from allocating all of the available memory?
• What happens if a process “forgets” about its ownership?
• Leasing
Solution 3: Dynamic allocations
[Figure: Process Table (columns #, F, R) with rows P1 and P2, Frame Table (columns #, P) with rows F1 and F2, an additional column L, and a Request Queue holding P2]
• Consistency: dynamic memory is temporarily leased & garbage collected, verification of PCB & queue
• Stabil. Preserving: access through special segment selector
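Leasing can be sketched as a per-frame countdown that the OS ages on every tick. The sketch below is a hedged illustration: `frame_lease`, `MAX_LEASE`, `lease_frame`, and `lease_tick` are hypothetical names, and the clamping policy is an assumption rather than something stated on the slides.

```c
#define NUM_FRAMES 4
#define NO_PROC    (-1)   /* frame is free */
#define MAX_LEASE  3      /* hypothetical upper bound on a lease, in ticks */

int frame_owner[NUM_FRAMES];   /* which process holds each frame */
int frame_lease[NUM_FRAMES];   /* remaining ticks of each lease */

/* Lease a free frame to process p for `ticks` ticks (clamped to
   1..MAX_LEASE). Returns the frame index, or -1 if no frame is free. */
int lease_frame(int p, int ticks)
{
    if (ticks < 1)
        ticks = 1;
    if (ticks > MAX_LEASE)
        ticks = MAX_LEASE;
    for (int f = 0; f < NUM_FRAMES; f++) {
        if (frame_owner[f] == NO_PROC) {
            frame_owner[f] = p;
            frame_lease[f] = ticks;
            return f;
        }
    }
    return -1;
}

/* Called on every tick: age all leases and collect expired frames. */
void lease_tick(void)
{
    for (int f = 0; f < NUM_FRAMES; f++) {
        if (frame_owner[f] == NO_PROC) {
            frame_lease[f] = 0;
            continue;
        }
        /* A corrupted lease is clamped, so no frame stays taken for
           more than MAX_LEASE ticks without being leased again. */
        if (frame_lease[f] > MAX_LEASE)
            frame_lease[f] = MAX_LEASE;
        if (frame_lease[f] <= 1) {
            frame_owner[f] = NO_PROC;   /* lease expired: garbage collect */
            frame_lease[f] = 0;
        } else {
            frame_lease[f]--;
        }
    }
}
```

Under these assumptions, a process that faultily grabs every frame, or that forgets it owns one, loses the memory after at most `MAX_LEASE` ticks. A correct process keeps its memory by requesting it again.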
Implementation
• Pentium in real-mode, single address space
– Simple
– Common for sensors/microcontrollers
– Protected mode & VM mechanisms can be handled accordingly
• Code size: ~1-2K
– TinyOS ~1K
– VxWorks ~102K
– Linux kernel ~4M
• Fault injection with the Bochs simulator
Implementation
MM_FindFrame:                   ; (PT, FT, i); al contains current frame suggestion
    ; nf <- (frame[PT[i]] + 1) modulo M
    and byte [bx+FRAME_COL], FRAME_MASK   ; force the stored frame number into range
    inc al
    and al, FRAME_MASK                    ; modulo M via mask (M is a power of two)
    ; Check all slots for an empty one.
    ; while nf != frame[PT[i]] and FT[nf] != nil
while1:
    cmp al, [bx+FRAME_COL]
    jz endwhile1                          ; wrapped around: no other slot is free
    lea si, [frames]
    add si, ax                            ; index the frame table (relies on ah = 0)
    mov dl, [si]
    cmp dl, NULL_PROCESS
    jz endwhile1                          ; found an empty slot
    ; do nf <- (nf + 1) modulo M
    inc al
    and al, FRAME_MASK
    jmp while1
endwhile1:
    ; return found frame number in register 'al'
    ret
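For readers who prefer a higher-level view, the pseudocode in the listing's comments can be rendered in C roughly as follows. This is a sketch for illustration only, since the slides advise against relying on compilers for the real kernel. The table size `M = 4` and the `NULL_PROCESS` value are assumptions, and `find_frame` is a hypothetical name.

```c
#define M            4          /* number of frames; must be a power of two */
#define FRAME_MASK   (M - 1)    /* masking with M - 1 implements "modulo M" */
#define NULL_PROCESS 0xFF       /* hypothetical marker for an empty frame */

/* Scan cyclically for a free frame, starting after `current`.
   Returns the index of the first free frame found, or `current`
   (masked into range) if no other slot is free. */
unsigned find_frame(const unsigned char frames[M], unsigned current)
{
    unsigned nf;

    current &= FRAME_MASK;              /* a corrupted index is forced into range */
    nf = (current + 1) & FRAME_MASK;    /* nf <- (frame[PT[i]] + 1) modulo M */
    while (nf != current && frames[nf] != NULL_PROCESS)
        nf = (nf + 1) & FRAME_MASK;     /* nf <- (nf + 1) modulo M */
    return nf;
}
```

The loop runs at most `M - 1` times regardless of the table contents, so the search terminates even when every entry has been corrupted.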
Future Work
• I/O device drivers
– Major cause of operating systems failures
– Co-operation of more than one microprocessor
– Detailed driver / General monitoring layer
• Gather the different parts
• Micro-kernel / VMM
Conclusion
• The work shows theoretical and practical ways to achieve the goal of a self-stabilizing OS
• The (system) research community & industry can benefit from the foundation of self-stabilization
• http://www.cs.bgu.ac.il/~yagel/sos
Space Vehicle Failure
• …The Spirit rover has a radiation-hardened R6000 CPU from Lockheed-Martin Federal Systems…The operating system is Wind River Systems' Vx-Works..
• …attempted to allocate more files than the RAM-based directory structure could accommodate. That caused an exception, which caused the task that had attempted the allocation to be suspended…
• …Spirit fell silent, alone on the emptiness of Mars…
http://www.eetimes.com/story/OEG20040220S0046
…the rover was in fact listening and rebooting, the team commanded Spirit to reboot without mounting the flash file system…But just in case, the team is working on an exception-handler routine that will more gracefully recover from an allocation failure