1 Kernfach System Software WS04/05 P. Reali M. Corti.

Post on 30-Mar-2015


Kernfach System Software

WS04/05

P. Reali

M. Corti

System-Software WS 04/05

2

© P. Reali / M. Corti

Introduction: Admin

Lecture
– Mo 13-14, IFW A 36
– We 10-12, IFW A 36

Exercises (always on Thursday)
– 14-15, IFW A34, C. Tuduce (E)
– 14-15, IFW C42, V. Naoumov (E)
– 15-16, IFW A32.1, I. Chihaia (E)
– 15-16, RZ F21, C. Tuduce (E)
– 16-17, IFW A34, T. Frey (E)
– 16-17, IFW A32.1, K. Skoupý (E)

Introduction: Additional Info

Internet
– Homepage: http://www.cs.inf.ethz.ch/ssw/
– Inforum vis site

Textbooks & Co.
– Lecture slides
– A. Tanenbaum, Modern Operating Systems
– Silberschatz / Galvin, Operating System Concepts
– Selected articles and book chapters

Introduction: Exercises

Exercises are optional (feel free to shoot yourself in the foot)

– Weekly paper exercises
  test the knowledge acquired in the lecture
  identify troubles early
  exercise questions are similar to the exam ones
– Monthly programming assignment
  feel the gap between theory and practice

Introduction: Exam

Sometime in March 2005
Written, 3 hours
Allowed help
– 2 A4 page summary
– calculator

Official Q&A session 2 weeks before the exam

Introduction: Lecture Goals

Operating System Concepts
– bottom-up approach
– no operating system course
– learn most important concepts
– feel the complexity of operating systems
  there's no silver bullet!

Basic knowledge for other lectures / term assignments
– Compilerbau (compiler construction)
– Component Software
– ...
– OS-related assignments

Introduction: What is an operating system?

An operating system has two goals:

Provide an abstraction of the hardware
– ABI (application binary interface)
– API (application programming interface)
– hide details

Manage resources
– time and space multiplexing
– resource protection

Introduction: Operating system target machines

Targets
– mainframes
– servers
– multiprocessors
– desktops
– real-time systems
– embedded systems

Different goals and requirements!
– memory efficiency
– reaction time
– abstraction level
– resources
– security
– ...

Introduction: Memory vs. Speed Tradeoff

Example: retrieve a list of names

                  memory    time
1. Array          Nn        N
2. List           N(n+4)    N/2
3. Bin. Tree      N(n+8)    log(N)
4. Hash Table     3Nn       1

N = # names
n = name length

Introduction: Operating System as resource manager

... in the beginning was the hardware!

Most relevant resources:
– CPU
– Memory
– Storage
– Network

Introduction: Lecture Topics

[Figure: lecture topics arranged by resource (CPU, Memory, Disk, Network) and by abstraction level. Topics shown: Scheduling, Coroutine, Thread, Process, Concurrency Support, Memory Management, Virtual Memory, Demand Paging, Garbage Collection, File System, Distributed File-System, Object-Oriented Runtime Support, Distributed Object-System, Runtime support, Virtual Machine.]

Introduction: A word of warning...

Most of the topics may seem simple...
... and in fact they are!

Problems are mostly due to:
– complexity when integrating the system
– low-level ("bit fiddling") details
– bootstrapping (X needs Y, Y needs X)

Introduction: Bootstrapping (Aos)

[Figure: Aos bootstrapping components ordered by level. Components shown: Processor, Memory, Interrupts, Traps, Timers, SMP, Locks, Modules, Storage, Active.]

Introduction: Lecture Topics

[Figure: course timeline from Oct '04 to Feb '05. Topics shown: Overview, Runtime Support, Virtual Addressing, Memory Management, Distributed Obj. System, Concurrency, Disc / Filesystem, Case Study: JVM.]

Run-time Support: Overview

Support for programming abstractions
– Procedures
  calling conventions
  parameters
– Object-Oriented Model
  objects
  methods (dynamic dispatching)
– Exception Handling
– ... more ...

Run-time Support: Application Binary Interface (ABI)

Objects a, b, c, … with methods P, Q, R, … and internal procedures p, q, r, …

Call sequence:
Call a.P, Call b.Q, Call b.q, Call b.q, Return b.q, Return b.q, Call c.R, Return c.R, Return b.Q, Return a.P

[Figure: stack of Procedure Activation Frames (PAF) with the Stack Pointer (SP) at four points of the call sequence: (1) a.P, b.Q, b.q; (2) a.P, b.Q, b.q, b.q; (3) a.P, b.Q; (4) a.P, b.Q, c.R.]

Run-time Support: Procedure Activation Frame

Call
– Caller: Save Registers, Push Parameters, Save PC, Branch
– Callee: Save FP, FP := SP, Allocate Locals

Return
– Callee: Remove Locals, Restore FP, Restore PC
– Caller: Remove Parameters, Restore Registers

[Figure: stack layout of one activation frame, from the caller frame towards the Stack Pointer (SP): params, PC, FP' (the dynamic link, addressed by the Frame Pointer FP), locals.]

Run-time Support: Procedure Activation Frame, Optimizations

Many optimizations are possible
– use registers instead of stack
– register windows
– procedure inlining
– use SP instead of FP addressing

Run-time Support: Procedure Activation Frame (Oberon / x86)

Caller:

    push params
    call P                  ; push pc; pc := P
    ...

Callee:

    push fp
    mov fp, sp
    sub sp, size(locals)
    ...
    mov sp, fp
    pop fp
    ret size(params)        ; pop pc; add sp, size(params)

Run-time Support: Calling Convention

Convention between caller and callee
– how are parameters passed
  data layout
  left-to-right, right-to-left
  registers
  register window
– stack layout
  dynamic link
  static link
– register saving
  reserved registers

Run-time Support: Calling Convention (Oberon)

Parameter passing
– on stack (exception: Oberon/PPC uses registers)
– left-to-right
– self (methods only) as last parameter
– structs and arrays passed as reference, value-parameters copied by the callee

Stack
– dynamic link
– static link as last parameter (for local procedures)

Registers
– saved by caller

Run-time Support: Calling Convention (C)

Parameter passing
– on stack
– right-to-left
– arrays passed as reference (arrays are pointers!)

Stack
– dynamic link

Registers
– some saved by caller

Run-time Support: Calling Convention (Java)

Parameter passing
– left-to-right
– self as first parameter
– parameters pushed as operands
– parameters accessed as locals
– access through symbolic, type-safe operations

Run-time Support: Object Oriented Support, Definitions

    Obj x = new ObjA();

• static type of x is Obj
• dynamic type of x is ObjA

x is compiled as being compatible with Obj, but executes as ObjA.

Polymorphism
– static and dynamic type can be different
– the system must keep track of the dynamic type with a hidden "type descriptor"

[Figure: class hierarchy with Obj0 at the root, Obj below it, and ObjA and ObjB below Obj.]

Run-Time Support: Polymorphism

    VAR
      t: Triangle;
      s: Square;
      o: Figure;
    BEGIN
      t.Draw();
      s.Draw();
      o.Draw();
    END;

Type is statically known!

    WHILE p # NIL DO
      p.Draw(); p := p.next
    END;

Type is discovered at runtime!

Run-time Support: Object Oriented Support, Definitions

    Obj x = new ObjA();

    if (x IS ObjA) { ... }    // type test

    ObjA y = (ObjA)x;         // type cast

    x = y;                    // type coercion (automatic conversion)

[Figure: class hierarchy with Obj0, Obj, ObjA and ObjB.]

Run-time Support: Object Oriented Support (High-level Java)

Type Test Implementation

    .... a IS T ....

    if (a != null) {
        Class c = a.getClass();
        while ((c != null) && (c != T)) {
            c = c.getSuperclass();
        }
        return c == T;
    } else {
        return false;
    }

Run-Time Support: Type Descriptors

    struct TypeDescriptor {
        int level;
        type[] extensions;
        method[] methods;
    }

    class Object {
        TypeDescriptor type;
    }

– many type-descriptor layouts are possible
– the layout depends on the optimizations chosen

Run-Time Support: Type Tests and Casts

Class hierarchy with "extension level": Obj0 (level 0), Obj (level 1), ObjA and ObjB (level 2)

Extension tables of the type descriptors (TD):

                0       1       2       3
    TD(Obj0)    Obj0    NIL     NIL     NIL
    TD(Obj)     Obj0    Obj     NIL     NIL
    TD(ObjA)    Obj0    Obj     ObjA    NIL

Type test (obj IS T):

    obj.type.extension[ T.level ] = T

    mov EAX, obj
    mov EAX, -4[EAX]
    cmp T, -4 * T.level - 8[EAX]
    bne ....
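
The constant-time type test can be sketched in Python. This is a minimal sketch under stated assumptions, not the Oberon implementation: the names `TypeDescriptor`, `Instance`, `is_type` and the fixed table depth `MAX_LEVELS` are hypothetical and chosen only for illustration.

```python
MAX_LEVELS = 4  # assumed fixed depth of the extension table


class TypeDescriptor:
    """Hypothetical type descriptor: extension level plus extension table."""

    def __init__(self, name, base=None):
        self.name = name
        self.level = 0 if base is None else base.level + 1
        assert self.level < MAX_LEVELS, "hierarchy deeper than the table"
        # Inherit all ancestors from the base type, then add this type itself.
        self.extensions = [None] * MAX_LEVELS
        if base is not None:
            for lvl in range(base.level + 1):
                self.extensions[lvl] = base.extensions[lvl]
        self.extensions[self.level] = self


class Instance:
    """An object carries a hidden reference to its type descriptor."""

    def __init__(self, type_descriptor):
        self.type = type_descriptor


def is_type(obj, t):
    """obj IS t: one table lookup and one comparison, no loop."""
    return obj is not None and obj.type.extensions[t.level] is t


OBJ0 = TypeDescriptor("Obj0")
OBJ = TypeDescriptor("Obj", OBJ0)
OBJ_A = TypeDescriptor("ObjA", OBJ)
OBJ_B = TypeDescriptor("ObjB", OBJ)
```

Compared with walking the superclass chain, the cost no longer depends on the depth of the hierarchy; the price is one table per type.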

Run-time Support: Object Oriented Support (High-level Java)

Method Call Implementation

    .... a.M(.....) ....

    Class[] parTypes = new Class[params.length];
    for (int i = 0; i < params.length; i++) {
        parTypes[i] = params[i].getClass();
    }
    Class c = a.getClass();
    Method m = c.getDeclaredMethod("M", parTypes);
    res = m.invoke(a, params);

Use the method implementation for the actual class (dynamic type)

Run-Time Support: Handlers / Function Pointers

    TYPE
      SomeType = POINTER TO SomeTypeDesc;
      Handler = PROCEDURE (self: SomeType; param: Par);
      SomeTypeDesc = RECORD
        handler: Handler;
        next: SomeType;
      END

[Figure: linked list starting at root; each record has a handler field pointing to PROC Q or PROC R and a next field.]

Advantages:
• instance bound
• can be changed at run-time

Disadvantages:
• memory usage
• bad integration (explicit self)
• non constant

Run-Time Support: Method tables (vtables)

    TYPE
      A = OBJECT
        PROCEDURE M0;
        PROCEDURE M1;
      END A;

      B = OBJECT (A)
        PROCEDURE M0;    (* B.M0 overrides A.M0 *)
        PROCEDURE M2;    (* B.M2 is new *)
      END B;

A.MethodTable
    0: A.M0
    1: A.M1

B.MethodTable
    0: B.M0    (replaces A.M0)
    1: A.M1
    2: B.M2

Idea: have a per-type table of function pointers.
• New methods add a new entry in the method table
• Overrides replace an entry in the method table
• Each method has a unique entry number
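
The three rules for method tables can be sketched in Python. This is an illustrative sketch, not the Oberon run-time: `TypeDesc`, `Obj`, `call` and the `slot` dictionary are hypothetical names, and the name-to-slot lookup stands in for the entry number that a compiler would resolve statically.

```python
class TypeDesc:
    """Hypothetical per-type descriptor holding a method table (vtable)."""

    def __init__(self, name, methods, base=None):
        self.name = name
        # Start from a copy of the base table so entry numbers stay stable.
        self.slot = dict(base.slot) if base else {}
        self.vtable = list(base.vtable) if base else []
        for method_name, implementation in methods.items():
            if method_name in self.slot:
                # Override: replace the existing entry.
                self.vtable[self.slot[method_name]] = implementation
            else:
                # New method: append a new entry with a fresh number.
                self.slot[method_name] = len(self.vtable)
                self.vtable.append(implementation)


class Obj:
    def __init__(self, type_desc):
        self.type = type_desc


def call(obj, method_name):
    """Virtual dispatch: index the table of the dynamic type."""
    return obj.type.vtable[obj.type.slot[method_name]](obj)


A = TypeDesc("A", {"M0": lambda self: "A.M0", "M1": lambda self: "A.M1"})
B = TypeDesc("B", {"M0": lambda self: "B.M0", "M2": lambda self: "B.M2"}, base=A)
```

Calling `call(Obj(B), "M1")` reaches the inherited `A.M1` through entry 1 of B's table, while entry 0 leads to the override.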

Run-Time Support: Method tables

    TYPE
      A = OBJECT
        PROCEDURE M0;
        PROCEDURE M1;
      END A;

      B = OBJECT (A)
        PROCEDURE M0;
        PROCEDURE M2;
      END B;

A.MethodTable
    0: A.M0
    1: A.M1

B.MethodTable
    0: B.M0
    1: A.M1
    2: B.M2

Virtual Dispatch

    o.M0;

    call o.Type.Methods[0]

    mov eax, VALUE(o)
    mov eax, type[eax]
    mov eax, off + 4*mno[eax]
    call eax

[Figure: object o with its Fields and a Type pointer to the type descriptor.]

Run-Time Support: Oberon Type Descriptors

A type descriptor contains
• method table
• superclass table
• pointers in object for GC

The type descriptor is also an object!

[Figure: an object (type desc pointer, obj fields) and its type descriptor (td size, type desc, type name, mth table, ext table, obj size, ptr offsets). Annotations: mth table for method invocation, ext table for type checks, obj size for object allocation, ptr offsets for garbage collection.]

Run-Time Support: Interfaces, itables

    interface A {
        void m();
    }

    interface B {
        void p();
    }

    Object x;
    A y = (A)x;     // does x implement A?
    y.m();          // multiple itables: how is the right itable discovered?

x has a method table (itable) for each implemented interface

Run-Time Support: Interface support

How to retrieve the right method table (if any)?
– Global table indexed by [class, interface]
– Local (per type) table / list indexed by [interface]

Many optimizations are available
– use the usual trick: enumerate interfaces

Run-Time Support: Interface support (I)

[Figure: type descriptor with a method table (vtable) and an "interfaces" pointer to a linked list of method tables (itables), here for Intf0 and Intf7.]

    Intf0 y = (Intf0)x;
    y.M();

    interface i = x.type.interfaces;
    while ((i != null) && (i != Intf0)) {
        i = i.next;
    }
    if (i != null) i.method[mth_nr]();

The call is expensive because it requires traversing a list: O(N) complexity

Run-Time Support: Interface support (II)

[Figure: type descriptor with a vtable and an "interfaces" array indexed by interface number 0..7; only entries 2 and 7 point to itables (itable2, itable7). Sparse array!]

    Intf0 y = (Intf0)x;
    y.M();

    interface i = x.type.interfaces[Intf0];
    if (i != null) i.method[mth_nr]();

Lookup is fast (O(1)), but wastes memory
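
The trade-off between the list and the array variant can be sketched in Python. The sketch assumes that interfaces are enumerated 0..7 and that a type implements interfaces 2 and 7; `find_itable_in_list` and `build_interface_array` are hypothetical helper names, and an itable is modelled as a plain dictionary.

```python
def find_itable_in_list(interface_list, interface_id):
    """List variant: linear search through the implemented interfaces, O(N)."""
    for iid, itable in interface_list:
        if iid == interface_id:
            return itable
    return None


def build_interface_array(interface_list, interface_count):
    """Array variant: one slot per interface in the system, O(1) lookup."""
    array = [None] * interface_count
    for iid, itable in interface_list:
        array[iid] = itable
    return array


# A type implementing interfaces 2 and 7 out of 8 enumerated interfaces.
implemented = [
    (2, {"M": lambda: "itable2.M"}),
    (7, {"M": lambda: "itable7.M"}),
]
interfaces = build_interface_array(implemented, 8)
```

Six of the eight array slots stay empty for this type, which is the memory cost that the overlapped interface tables try to recover.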

Run-Time Support: Interface Implementation (III)

[Figure: Type Descriptor t and Type Descriptor u, each with a vtable and an "interfaces" pointer into an interface table index with slots 0..7. Type t uses slots 2 and 7 (itable t,2 and itable t,7); type u uses slots 0 and 2 (itable u,0 and itable u,2). The two index ranges overlap.]

Run-Time Support: Interface Implementation (III)

[Figure: two type descriptors, each with a vtable, an "interfaces" pointer and two itables, sharing one overlapped interface table index.]

Run-Time Support: Interface Implementation (III)

[Figure: type descriptor with a vtable and an "interfaces" pointer into overlapped interface tables that hold the itables of several types.]

    Intf0 y = (Intf0)x;
    y.M();

    itable i = x.type.interfaces[Intf0];
    if ((i != null) && (i in x.type))
        i.method[mth_nr]();

Run-Time Support: Exceptions

    void catchOne() {
        try {
            tryItOut();
        } catch (TestExc e) {
            handleExc(e);
        }
    }

Compiled bytecode:

    void catchOne()
     0  aload_0
     1  invokevirtual tryItOut();
     4  return
     5  astore_1
     6  aload_0
     7  aload_1
     8  invokevirtual handleExc
    11  return

ExceptionTable

    From    To    Target    Type
    0       4     5         TestExc

Run-Time Support: Exception Handling / Zero Overhead

    try {                   // pcstart
        .....
    }                       // pcend
    catch (Exp1 e) {        // pchandler1
        .....
    } catch (Exp2 e) {      // pchandler2
        .....
    }

Global Exception Table

    start      end      exception    handler
    pcstart    pcend    Exp1         pchandler1
    pcstart    pcend    Exp2         pchandler2

    void ExceptionHandler(state) {
        pc = state.pc; exc = state.exception;
        i = 0;
        while (!Match(table[i], pc, exc)) {
            i++;
            if (i == TableLength) {
                PopActivationFrame(state);
                pc = state.pc;
                i = 0;
            }
        }
        state.pc = table[i].pchandler;
        ResumeExecution(state)
    }
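
The table-driven lookup can be sketched in Python. This is a simplified model under stated assumptions: the stack is reduced to a list of program counters (innermost frame last), the range `[start, end)` is assumed to be half-open, and `Entry`, `find_handler` and the sample addresses are hypothetical. Unlike the handler on the slide, the sketch returns `None` for an uncaught exception instead of relying on a default handler.

```python
from collections import namedtuple

# One row of the global exception table.
Entry = namedtuple("Entry", ["start", "end", "exception", "handler"])


class Exp1(Exception):
    pass


class Exp2(Exception):
    pass


def find_handler(table, frame_pcs, raised):
    """Return (frame index, handler pc) for the innermost matching handler.

    frame_pcs holds one program counter per activation frame, innermost last.
    The whole table is searched for each frame before that frame is popped.
    """
    for depth in range(len(frame_pcs) - 1, -1, -1):
        pc = frame_pcs[depth]
        for entry in table:
            in_range = entry.start <= pc < entry.end
            if in_range and issubclass(raised, entry.exception):
                return depth, entry.handler
    return None  # uncaught: a real system would call its default handler


TABLE = [
    Entry(10, 20, Exp1, 100),
    Entry(10, 20, Exp2, 200),
    Entry(50, 60, Exp1, 300),
]
```

Code that never raises does not touch the table at all, which is where the name "zero overhead" comes from; the search cost is paid only when an exception occurs.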

Run-Time Support: Exception Handling / Zero Overhead

– exception table filled by the loader / linker
– traverse whole table for each stack frame
– system has default handler for uncaught exceptions

– no exceptions => no overhead
– exception case is expensive

system optimized for normal case

Run-Time Support: Exception Handling / Fast Handling

Source:

    try {
        .....
    } catch (Exp1 e) {      // pchandler1
        .....
    } catch (Exp2 e) {      // pchandler2
        .....
    }

Instrumented code:

    try {
        save (FP, SP, Exp1, pchandler1)
        save (FP, SP, Exp2, pchandler2)
        .....
        remove catch descr.
        jump end
    } catch (Exp1 e) {
        .....
        remove catch descr.
        jump end
    } catch (Exp2 e) {
        .....
        remove catch descr.
        jump end
    }
    end:

– push catch descriptors on the stack
– add code instrumentation
– use an exception stack to keep track of the handlers

Run-Time Support: Exception Handling / Fast Handling

    void ExceptionHandler(ThreadState state) {
        int FP, SP, handler;
        Exception e;

        do {
            // pop next exception descriptor from exception stack
            retrieve(FP, SP, e, handler);
        } while (!Match(state.exp, e));

        state.fp = FP;          // set frame to the one
        state.sp = SP;          // containing the handler
        state.pc = handler;     // resume with the handler
        ResumeExecution(state)
    }

The handler can resume in a different activation frame
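
The exception stack can be sketched in Python. This is a minimal model, assuming descriptors are plain tuples and frame and stack pointers are integers; `ExceptionStack`, `save`, `remove` and `dispatch` are hypothetical names that mirror the `save`, `remove catch descr.` and `retrieve` steps.

```python
class Exp1(Exception):
    pass


class Exp2(Exception):
    pass


class ExceptionStack:
    """Hypothetical per-thread stack of catch descriptors."""

    def __init__(self):
        self._descriptors = []

    def __len__(self):
        return len(self._descriptors)

    def save(self, fp, sp, exception, handler):
        """Executed when entering a try block (code instrumentation)."""
        self._descriptors.append((fp, sp, exception, handler))

    def remove(self):
        """Executed when a try block or handler is left normally."""
        self._descriptors.pop()

    def dispatch(self, raised):
        """Pop descriptors until one matches; return (fp, sp, handler)."""
        while self._descriptors:
            fp, sp, exception, handler = self._descriptors.pop()
            if issubclass(raised, exception):
                return fp, sp, handler
        return None  # no handler registered for this exception
```

Dispatch touches only the descriptors that are currently active, so it is fast, but every `try` pays for `save` and `remove` even when nothing is raised.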

Run-Time Support: Exception Handling / Fast Handling

– code instrumentation
  insert exception descriptor at try
  remove descriptor before catch
– fast exception handling
– overhead even when no exceptions

system optimized for exception case

Virtual Addressing: Overview

Virtual Addressing: abstraction of the MMU (Memory Management Unit)

Work with virtual addresses, where address_real = f(address_virtual)

Provides decoupling from real memory
– virtual memory
– demand paging
– separated address spaces

Virtual Addressing: Pages

Memory as array of pages

[Figure: two virtual address spaces (1 and 2), each an array of pages, mapped onto real memory, which is a pool of page frames. Some pages and address ranges are unmapped (invalid). Programs use and run in these virtual address spaces.]

Virtual Addressing: Page mapping

[Figure: the MMU splits a virtual address into page-no and off. The Page Table Ptr Register locates the page table, page-no selects the entry holding the frame, and frame plus off form the real address inside a page frame of real memory. The TLB (Translation Lookaside Buffer) is an associative cache of (PT, VA, RA) entries that maps a virtual address to a real address directly.]

Virtual Addressing: Definitions

page           smallest unit in the virtual address space
page frame     unit in the physical memory
page table     table mapping pages into page frames
page fault     access to a non-mapped page
working set    pages a process is currently using

Virtual Addressing: Alternate Page Mapping

Multilevel page tables
– Multipart Virtual Address
– Page table as (B*-)Tree

[Figure: virtual address split into pno1, pno2 and off; pno1 indexes the 1. Level Table, pno2 a 2. Level Table; unused entries are unassigned.]

Inverted Page-Table
– 64 bit Address Space

[Figure: hash table with entries 0..N. The pair (pr, vp) of process and virtual page is hashed to find the page frame pf; on a collision the next probe is tried.]

Virtual Addressing: What for?

Decoupling from real memory
– virtual memory (cheat: use more virtual memory than the available real memory)
– dynamically allocated contiguous memory blocks (for multiple stacks in multitasking systems)
– some optimizations
  null reference checks
  garbage collection (using dirty flag)

Virtual Addressing is not for free!
– address mapping may require additional memory accesses
– page table takes space

Virtual Addressing: Virtual Memory

Use secondary storage (disc) to keep currently unused pages (swapping)

Page table usually keeps some per-page flags
    invalid       page not mapped
    referenced    page has been referenced
    dirty         page has been modified

Accessing an invalid page causes a page-fault interrupt
– select page frame to be swapped out (victim or candidate)
– swap-in requested page frame

Virtual Addressing: Virtual Memory / Demand Paging

[Figure: real memory, disc and page table. "Page-out": the victim page is written to disc and set to invalid in the page table. "Page-in": the requested page is loaded from disc into real memory.]

Virtual Addressing: Demand Paging Sequence

MMU (TLB, then page table):

    IF VA IN TLB THEN
      RETURN RA
    ELSE
      Access Page Table;
      IF Page invalid THEN Page-Fault
      ELSE RETURN RA
      END
    END

OS Page-Fault Handler:

    IF Free Page Frame exists THEN
      Assign frame to VA
    ELSE
      Search victim page;
      IF victim page modified THEN
        page-out to secondary storage
      END;
      Invalidate victim page;
      Assign frame to VA
    END;
    Page-in from secondary storage;
    Reset invalid flag

Expected time to translate VA into RA:

    E[t] = P_TLB * t_TLB + P_PT * t_PT + P_disc * t_disc

Virtual Addressing: Example

Page size       4 KB
Address size    32 bits

addressable memory: 2^32 = 4 GB
page offset: 12 bits (4 KB = 2^12)
page number: 20 bits (32 - 12)
page table size: 2^20 * 32 bits = 4 MB

Real memory 128 MB: page table overhead ca. 3%
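
The arithmetic of this example can be checked with a few lines of Python. The constant names and the helper `split` are hypothetical; the sketch assumes one 32-bit (4-byte) page table entry per page, as in the example.

```python
PAGE_SIZE = 4 * 1024            # 4 KB pages
ADDRESS_BITS = 32               # 32-bit virtual addresses
ENTRY_BYTES = 4                 # one 32-bit page table entry per page
REAL_MEMORY = 128 * 1024 * 1024

OFFSET_BITS = PAGE_SIZE.bit_length() - 1          # log2 of the page size
PAGE_NUMBER_BITS = ADDRESS_BITS - OFFSET_BITS
PAGE_TABLE_BYTES = (1 << PAGE_NUMBER_BITS) * ENTRY_BYTES
OVERHEAD = PAGE_TABLE_BYTES / REAL_MEMORY         # fraction of real memory


def split(virtual_address):
    """Split a virtual address into (page number, offset within the page)."""
    return virtual_address >> OFFSET_BITS, virtual_address & (PAGE_SIZE - 1)
```

For instance, `split(0x12345678)` separates the upper 20 bits (the page number) from the lower 12 bits (the offset).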

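Virtual Addressing: Example

    mov EAX, @Addr

[Figure: decision tree for one memory access. With probability P_TLB the translation is found in the TLB; with probability 1-P_TLB the page table is consulted. Then, with probability 1-P_pagefault the page is in memory (1 memory read), and with probability P_pagefault it is on disc (1 disc read, 1 disc write).]

    E[t] = P_TLB * t_TLB + (1 - P_TLB) * (t_PT + P_PF * t_disc + (1 - P_PF) * t_mem)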
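
The expected-time formula translates directly into a small Python function. The function name and all numeric values used with it are hypothetical and serve only to show how the terms combine.

```python
def expected_access_time(p_tlb, t_tlb, t_pt, p_pf, t_disc, t_mem):
    """E[t] for one access: TLB hit, or page table walk followed by
    either a page fault (disc) or a regular memory access."""
    miss_cost = t_pt + p_pf * t_disc + (1 - p_pf) * t_mem
    return p_tlb * t_tlb + (1 - p_tlb) * miss_cost
```

With a 90 % TLB hit rate, no page faults and assumed costs of 1 (TLB) and 100 (page table, memory) time units, the expected cost is 0.9 * 1 + 0.1 * 200 = 20.9 units; even a tiny page-fault probability dominates the result once `t_disc` is several orders of magnitude larger than `t_mem`.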

Virtual Addressing: Demand Paging: Page Replacement

Optimal Strategy (Longest Unused)
– Take the page that will remain unused for the longest time
– Requires oracle

NRU: "Not Recently Used"
– Reset the referenced flag at each tick
– Create page categories (good candidate to bad candidate)
– choose best candidate

    Pref    ref    mod
    3       0      0
    2       0      1
    1       1      0
    0       1      1

Virtual Addressing: Demand Paging: Page Replacement (2)

LRU: "Least Recently Used"
– Assumption: not used in past ==> not used in the future
– Hardware implementation
  64-bit time-stamp for each page
– Software implementation
  "Aging" algorithm
  Choose page with lowest value

[Figure: aging counter of a page at ticks t(i) and t(i+1). At each tick the counter is shifted right by one bit and the Reference Flag (set if the page was accessed) is inserted as the new leftmost bit.]
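
The aging algorithm can be sketched in Python. This is a minimal sketch assuming 8-bit counters kept in a dictionary; `tick`, `victim` and the page names are hypothetical.

```python
COUNTER_BITS = 8  # assumed counter width


def tick(counters, referenced):
    """One clock tick: shift every counter right and insert the
    reference flag of the page as the new leftmost bit."""
    for page in counters:
        counters[page] >>= 1
        if page in referenced:
            counters[page] |= 1 << (COUNTER_BITS - 1)


def victim(counters):
    """The page with the lowest counter value is the replacement candidate."""
    return min(counters, key=counters.get)


counters = {"A": 0, "B": 0, "C": 0}
tick(counters, {"A", "B"})   # A and B accessed during the first tick
tick(counters, {"A"})        # only A accessed
tick(counters, {"C"})        # only C accessed
```

Recent references weigh more than old ones because they occupy higher bits, so the counters approximate an LRU ordering without a time stamp per access.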

Virtual Addressing: Demand Paging: Page Replacement (3)

"Least Recently Created" LRC (FIFO)
– Page lifespan as metric (old pages are swapped out)
– Chain sorted by creation time
– Bad handling for often-used pages

Fix: "second chance" when accessed (ref flag set) during the last tick

    cur := earliest;
    WHILE cur.ref DO
      cur.ref := FALSE; cur := cur.next
    END

[Figure: chain of pages linked by next, starting at earliest, each with a Ref-Flag.]

Virtual Addressing: Demand Paging: Page Replacement (4)

Strategies:
– optimal
– LRU / NRU / LRC

Exceptions:
– "page pinning": page cannot be swapped out
  kernel code

Virtual Addressing: Example

Accessed pages: 1, 2, 1, 3, 4, 1, 2, 3, 4
Available page frames: 3
Working set: {1, 2, 3, 4}

Frame contents after each access (* marks a page fault):

    Access    1     2      1      3        4        1        2        3        4
    Ideal     1*    1,2*   1,2    1,2,3*   1,2,4*   1,2,4    1,2,4    2,3,4*   2,3,4
    FIFO      1*    1,2*   1,2    1,2,3*   2,3,4*   3,4,1*   4,1,2*   1,2,3*   2,3,4*
    LRU       1*    1,2*   1,2    1,2,3*   1,3,4*   1,3,4    1,4,2*   1,2,3*   2,3,4*

Page faults: Ideal 5, FIFO 8, LRU 7
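
The three strategies can be compared with a small simulation in Python. This is an illustrative sketch: `count_faults` and the three victim-selection functions are hypothetical names, and the optimal strategy is given the full access sequence as its "oracle".

```python
def count_faults(accesses, frame_count, choose_victim):
    """Simulate demand paging and return the number of page faults."""
    resident = []                      # resident pages, oldest load first
    faults = 0
    for i, page in enumerate(accesses):
        if page in resident:
            continue                   # hit: nothing to load
        faults += 1
        if len(resident) == frame_count:
            resident.remove(choose_victim(resident, accesses, i))
        resident.append(page)
    return faults


def fifo_victim(resident, accesses, i):
    """LRC / FIFO: evict the page that was loaded first."""
    return resident[0]


def lru_victim(resident, accesses, i):
    """LRU: evict the page whose last access lies furthest in the past."""
    def last_use(page):
        return max(j for j in range(i) if accesses[j] == page)
    return min(resident, key=last_use)


def optimal_victim(resident, accesses, i):
    """Optimal: evict the page that stays unused for the longest time."""
    def next_use(page):
        for j in range(i + 1, len(accesses)):
            if accesses[j] == page:
                return j
        return float("inf")            # never used again
    return max(resident, key=next_use)


ACCESSES = [1, 2, 1, 3, 4, 1, 2, 3, 4]
```

Calling `count_faults(ACCESSES, 3, lru_victim)` and its siblings replays the access sequence of this example with three page frames.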

Demand Paging: Belady's Anomaly

LRC strategy, page access sequence: 0 1 2 3 0 1 4 0 1 2 3 4

3 page frames: 9 page faults (pages listed youngest first)

    Access   0  1  2  3  0  1  4  0  1  2  3  4
    Pages    0  1  2  3  0  1  4  4  4  2  3  3
                0  1  2  3  0  1  1  1  4  2  2
                   0  1  2  3  0  0  0  1  4  4
    Victim            0  1  2  3        0  1
    Fault    x  x  x  x  x  x  x        x  x

4 page frames: 10 page faults (pages listed youngest first)

    Access   0  1  2  3  0  1  4  0  1  2  3  4
    Pages    0  1  2  3  3  3  4  0  1  2  3  4
                0  1  2  2  2  3  4  0  1  2  3
                   0  1  1  1  2  3  4  0  1  2
                      0  0  0  1  2  3  4  0  1
    Victim                     0  1  2  3  4  0
    Fault    x  x  x  x        x  x  x  x  x  x

Belady's Anomaly: more page frames cause more page faults
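
The anomaly can be reproduced with a short FIFO simulation in Python. `fifo_faults` and `SEQUENCE` are hypothetical names; the access sequence is the one used in this example.

```python
from collections import deque


def fifo_faults(accesses, frame_count):
    """Count page faults under the LRC (FIFO) strategy."""
    frames = deque()                   # oldest page at the left
    faults = 0
    for page in accesses:
        if page in frames:
            continue                   # hit: FIFO order is not changed
        faults += 1
        if len(frames) == frame_count:
            frames.popleft()           # evict the oldest page
        frames.append(page)
    return faults


SEQUENCE = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
```

Comparing `fifo_faults(SEQUENCE, 3)` with `fifo_faults(SEQUENCE, 4)` shows the effect: the larger memory produces more faults for this particular sequence.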

Demand Paging: How many page frames per process?

Even Distribution
– Every process has the same amount of memory

Thrashing
– every memory access causes a page-fault
– not enough page-frames for the current working-set
– the system is swapping instead of running

[Figure: CPU load (up to 100 %) plotted against the process count (1, 2, ..., n, n+1).]

Demand Paging: How many page frames per process? (2)

Depending on the process needs (1): use Working-Set

Page frames are assigned according to the process' working-set size. Swap out a process when not enough memory is available.

Page access sequence:
1 3 2 2 3 3 1 2 2 3 3 3 4 2 2 1 1 1 2 1 3 3 3 1 3 1 2 3 4 1

[Figure: a sliding window over the page access sequence determines the working set; two window positions are shown, with working sets { 1, 2, 3, 4 } and { 2, 3, 4 }.]
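
The sliding-window definition of the working set can be sketched in Python. The window length of 6 accesses is an assumption (the slide does not state it), and `working_set` is a hypothetical helper.

```python
def working_set(accesses, end, window):
    """Set of pages referenced by the last `window` accesses up to index `end`."""
    start = max(0, end - window + 1)
    return set(accesses[start:end + 1])


ACCESSES = [1, 3, 2, 2, 3, 3, 1, 2, 2, 3, 3, 3, 4, 2, 2,
            1, 1, 1, 2, 1, 3, 3, 3, 1, 3, 1, 2, 3, 4, 1]
```

Moving the window by a single access can already change the working-set size, which is why the number of page frames assigned to a process has to follow it over time.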

Demand Paging: How many page frames per process? (3)

Depending on the process needs (2): use Page-Fault Rate

[Figure: page-fault rate over time with a HIGH and a LOW threshold; at the HIGH threshold one process is swapped out, at the LOW threshold a process is swapped in.]

Virtual Addressing: Aos/Bluebottle, Memory Layout Example

[Figure: 4 GB address space with Kernel and Heap below 2 GB and Stacks between 2 GB and 4 GB.]

    PROCEDURE PageFault;
    BEGIN
      IF adr > 2GB THEN
        add page to stack
      ELSE
        Exception(NilTrap)
      END
    END PageFault;

• 128 KB per stack
• max. 32768 active objects
• first stack page allocated on process creation

Virtual Addressing: Example: UNIX, Fork

A UNIX program consists of: code, text, data

[Figure: Process A calls Fork(), which creates Process B. The page tables of both processes map the same pages read-only. When a process writes to a shared data page, it receives its own read-write copy (data'): "copy on write".]

Virtual Addressing: OS Control

Oberon
– no virtual memory

Windows
– Virtual Memory configuration
– Task Manager

Linux
– Swap partition / Swap files
– ps / top

Virtual Addressing: Segmentation

e.g. Intel x86

Problem
– 640 KB max memory
– 16-bit addresses (i.e. 64 KB)

Solution
– work in a segment
– code / data segments
– check segment boundaries

    Addr_real = Seg_base + Offset

[Figure: real memory containing a code segment and a data segment, each delimited by its segment base and segment limit.]
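
Segment-relative addressing with a boundary check can be sketched in Python. The sketch treats the segment limit as the segment length in bytes, which is an assumption; `translate` and `SegmentFault` are hypothetical names.

```python
class SegmentFault(Exception):
    """Raised when an offset lies outside the segment boundaries."""


def translate(segment_base, segment_limit, offset):
    """Addr_real = Seg_base + Offset, after checking the segment limit."""
    if not 0 <= offset < segment_limit:
        raise SegmentFault(f"offset {offset:#x} outside segment")
    return segment_base + offset
```

A program only ever supplies the offset; the base comes from the segment currently in use, so the same code runs unchanged wherever the segment is placed in real memory.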

Virtual Addressing: Summary

virtual addresses, address_real = f(address_virtual)

Decoupling from real memory
– virtual memory
– demand paging
– separate address spaces

Keywords
– page
– page frame
– page table
– page fault
– page flags
  dirty, used, unmapped
– page replacement strategy
  LRC, LRU, ideal, ...
– swapping
– thrashing, Belady's anomaly

Memory Management: Overview

Abstractions for applications
– heap
– memory blocks ( << memory pages)

Operations:
– Allocate
– Deallocate

Topics:
– memory organization
– free lists
– allocation strategies
– deallocation
  explicit
– garbage collection
  type-aware
  conservative
  copying / moving
  incremental
  generational

Memory Management: Objects on the heap

Object instances: a, b, c, d, …

Sequence (dynamic allocation, explicit disposal):

    NEW(a)
    NEW(b)
    NEW(c)
    DISPOSE(b)
    NEW(d)
    NEW(e)

[Figure: "Heap" contents during the sequence. After a, c and d are placed, NEW(e) fails (!) in two ways. Case 1: e does not fit into the heap. Case 2: e does not fit into any single free block; not enough space.]

Memory Management: Problem overview

Problems
– Heap size limitation (e, case 1)
– External fragmentation (e, case 2)
– Dangling pointers (a points to b)

Solutions
– System-managed list of free blocks ("free list")
– Vector of blocks with fixed size (bitmap, with 0=free, 1=used)
– Automated detection and reclamation of unused blocks ("garbage collection")

Memory Management: Theory: 50% rule

Assumption: stable state with M free blocks and N allocated blocks
50%-Rule: M = 1/2 N

Allocated blocks by number of free neighbours:
    A: two free neighbours
    B: one free neighbour
    C: no free neighbour

    N = A + B + C
    M = 1/2 (2A + B + e),   e = 0, 1, or 2

Block disposal:      ΔM = (C - A) / N
Block allocation:    ΔM = -(1 - p),   p = splitting likelihood

In the stable state both changes cancel out:

    (C - A) / N = 1 - p
    C - A - N + pN = 0

    2M = 2A + B + e
    2M = 2A + N - A - C + e
    2M = N + A - C + e
    2M = pN + e

Hence M ≈ 1/2 pN, and M ≈ 1/2 N when nearly every allocation splits a block (p ≈ 1).

Memory Management: Theory: Memory Fragmentation

[Figure: plot with a marked critical point; the image itself is not recoverable.]

{ 50%-Rule }, with α = F/B and β = b*B/H:

    (b/2)*F = H - b*B
    (α/2)*b*B = H - b*B
    H/(b*B) = 1 + α/2
    α = 2/β - 2

Memory Management: Free-list management with a Bitmap

Idea
– partition heap in blocks of size s
– use bitmap to track allocated blocks
  bitmap[i] = true means block i is allocated

Problems
– internal fragmentation
  round up block size to next multiple of s
– map size
  size is (heap_size / s) bits

[Figure: allocated blocks with the loss due to internal fragmentation.]
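
A bitmap-managed heap can be sketched in Python. This is a minimal sketch assuming first-fit search over the bitmap and requests of at least one byte; `BitmapHeap` and its methods are hypothetical names, and addresses are plain byte offsets into the heap.

```python
class BitmapHeap:
    """Hypothetical heap of fixed-size blocks tracked by a bitmap."""

    def __init__(self, heap_size, block_size):
        self.block_size = block_size
        self.bitmap = [False] * (heap_size // block_size)  # False = free

    def _blocks(self, size):
        # Round up to the next multiple of the block size
        # (the rounding is the source of internal fragmentation).
        return -(-size // self.block_size)

    def allocate(self, size):
        """Return the address of the first free run that fits, or None."""
        need = self._blocks(size)
        run = 0
        for i, used in enumerate(self.bitmap):
            run = 0 if used else run + 1
            if run == need:
                start = i - need + 1
                self.bitmap[start:i + 1] = [True] * need
                return start * self.block_size
        return None

    def free(self, address, size):
        """Mark the blocks of an allocation as free again."""
        start = address // self.block_size
        need = self._blocks(size)
        self.bitmap[start:start + need] = [False] * need
```

With 32-byte blocks a request for 50 bytes occupies two blocks, so 14 bytes are lost to internal fragmentation.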

Memory Management: Free-list management with a list

List organization
– sorted / non-sorted
  merging of empty blocks is simpler with a sorted list
– one list / many lists (per size)
  search is simpler, merging is more difficult
– management data stored in the free block
  size, next pointer

Operations
– Allocation
– Disposal with merge
  find free blocks next to current block, merge into bigger free block

Memory Management: Memory allocation strategies

block splitting
– if a free block is bigger than the requested block, then it is split

first-fit
– use first free block which is big enough

best-fit
– take smallest fitting block
– causes a lot of fragmentation

worst-fit
– take biggest available block

quick-fit
– best-fit but multiple free-lists (one per block size)
– fast allocation!

[Figure: heap as a sequence of used and free blocks, with internal fragmentation marked.]
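
The selection rules of the strategies can be sketched in Python. The sketch only picks a block and leaves splitting aside; free blocks are modelled as hypothetical `(address, size)` tuples and the function names are illustrative.

```python
def first_fit(free_blocks, size):
    """First free block that is big enough."""
    for block in free_blocks:
        if block[1] >= size:
            return block
    return None


def best_fit(free_blocks, size):
    """Smallest free block that is big enough."""
    fitting = (b for b in free_blocks if b[1] >= size)
    return min(fitting, key=lambda b: b[1], default=None)


def worst_fit(free_blocks, size):
    """Biggest free block, provided it is big enough."""
    fitting = (b for b in free_blocks if b[1] >= size)
    return max(fitting, key=lambda b: b[1], default=None)


FREE = [(0, 100), (200, 30), (400, 60)]   # (address, size) pairs
```

For a request of 50 bytes, first-fit and worst-fit both take the 100-byte block, while best-fit takes the 60-byte block and leaves a 10-byte remainder that is hard to reuse.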

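Memory Management: Buddy System (for fast block merging)

– Blocks have size 2^k
– A block of size 2^i has address j*2^i (the last i bits are 0)
– Blocks with addresses x = j*2^i and (j XOR 1)*2^i are buddies (can be merged into a block of size 2^(i+1))

    buddy = x XOR 2^i

Addresses of two buddies:

    b1 = xxxx 0 0000
    b2 = xxxx 1 0000

[Figure: Split: a block of size 2^(k+1) is split into two buddies of size 2^k, each of which can be split into blocks of size 2^(k-1); Merge goes the opposite way. Example: a block of 64 is split into 32 + 32, then into 32 + 16 + 16, then into blocks of 32, 16, 8 and 8.]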
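
The buddy address computation can be sketched in Python. `buddy_of` and `merge` are hypothetical helper names; addresses are offsets from the start of the heap.

```python
def buddy_of(address, size):
    """Address of the buddy of the block at `address` with the given size."""
    assert size > 0 and (size & (size - 1)) == 0, "size must be a power of two"
    assert address % size == 0, "a block is aligned to its own size"
    return address ^ size              # flip the bit that distinguishes buddies


def merge(address, size):
    """Merge a block with its buddy: (address, size) of the resulting block."""
    return min(address, buddy_of(address, size)), 2 * size
```

Because the buddy address is a single XOR, the allocator does not need to search a free list to find the merge partner of a released block.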

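Memory Management: Buddy System (for fast block merging)

Problem: only buddies can be merged

[Figure: heaps split into blocks of 32, 16, 16 and of 32, 16, 8, 8; neighbouring blocks of equal size that are buddies can be merged, neighbouring blocks that are no buddies cannot.]

Cascading merge

[Figure: two 8 blocks merge into 16, which merges with its 16 buddy into 32, which merges with its 32 buddy (32, 16, 8, 8 to 32, 16, 16 to 32, 32).]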

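Memory Management: Buddy System (for fast block merging)

Allocation
– allocate(8)

[Figure: allocate(8) starting from blocks 32, 32. The quick-fit lookup finds no block of size 8, so one 32 block is split into 16 + 16 and one 16 block is split into 8 + 8; one 8 block is allocated.]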

Memory Management: Example: Oberon / Aos

– Block size = k*32
– Free-lists for k = 1..9, one list for blocks > 9*32
– Allocate: quick-fit, splitting may be required
– Free-list management and block merging are done by the Garbage Collector

[Figure: free-lists for the sizes 32, 64, 96, ..., k * 32 in the initial state and after ALLOCATE(50), with the allocated block marked.]

Memory Management: Garbage Collection

Two steps:
1. Free block detection
   – type-aware: collector is aware of the types traversed, i.e. knows which values are pointers
   – conservative: collector doesn't know which values are pointers
2. Block disposal
   return unused blocks to the free-lists

GC characteristics
– incremental: gc is performed in small steps to minimize program interruption
– moving / copying / compacting: blocks are moved around
– generational: blocks are grouped in generations; different treatment or collection priority

Barriers
– read: intercept and check every pointer read operation
– write: intercept and check every pointer write operation

Memory Management: Garbage Collection: Reference Counting

– Every object has a reference counter rc
– rc = 0 means the object is "garbage"

Problems
– Overhead
– no support for circular structures

Useful for...
– Module hierarchies
– DAG structures (e.g. LISP)

Write barrier for q := p (p, q pointers to objects):

    INC p.rc
    DEC q.rc
    IF q.rc = 0 THEN
      Collect q^
    END;
    q := p

[Figure: pointers p and q with the rc fields of the objects they reference; a module graph with M, A, B, C and D; objects annotated with rc >= 1.]
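
The write barrier can be sketched in Python with explicit counters. This is an illustrative model, not how Python itself manages memory: `Node`, `assign`, `collect` and the `collected` log are hypothetical names, and an object's pointers live in a `fields` dictionary.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.rc = 0          # reference counter
        self.fields = {}     # pointer fields of the object


collected = []               # names of reclaimed objects, in order


def assign(holder, field, target):
    """holder.field := target, guarded by the reference-counting barrier."""
    old = holder.fields.get(field)
    if target is not None:
        target.rc += 1                   # INC p.rc
    if old is not None:
        old.rc -= 1                      # DEC q.rc
        if old.rc == 0:
            collect(old)                 # Collect q^
    holder.fields[field] = target


def collect(obj):
    """Reclaim obj and release everything it points to."""
    collected.append(obj.name)
    for field in list(obj.fields):
        assign(obj, field, None)
```

Dropping the last pointer to the head of a list reclaims the whole list, but two objects that point at each other keep their counters at 1 and are never reclaimed, which is the circular-structure problem.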

Memory Management: Garbage Collection: Mark & Sweep

Mark-Phase (Garbage Detection)
– Compute the root-set consisting of
  global pointers (statics) in each module
  local pointers on the stack in each PAF
  temporary pointers in the CPU's registers
– Traverse the graph of the live objects starting from the root-set with depth-first strategy; mark all reached objects.

Sweep-Phase (Garbage Collection)
– Linear heap traversal. Non-marked blocks are inserted into free-lists.
– Optimization: lazy sweeping (sweep during allocation, allocation gets slower)
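
The two phases can be sketched in Python. This is a minimal sketch: `Block`, `mark` and `sweep` are hypothetical names, the heap is a plain list of blocks, and the depth-first traversal uses an explicit stack.

```python
class Block:
    def __init__(self, name):
        self.name = name
        self.refs = []         # pointers to other blocks
        self.marked = False


def mark(root_set):
    """Mark phase: depth-first traversal of everything reachable."""
    stack = list(root_set)
    while stack:
        block = stack.pop()
        if not block.marked:
            block.marked = True
            stack.extend(block.refs)


def sweep(heap):
    """Sweep phase: linear heap traversal; unmarked blocks become free."""
    free_list = []
    for block in heap:
        if block.marked:
            block.marked = False       # reset for the next collection
        else:
            free_list.append(block)
    return free_list
```

Unlike reference counting, an unreachable cycle is reclaimed here, because reachability is decided from the root-set only.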

Memory Management: Garbage Collection: root-set

– Run-time support from object-system. Hidden data structures with (compiler generated) information about pointers (metadata).
– Conservative approach. Guess which values could be pointers and treat them as such.

[Figure: a Module Descriptor lists the offsets (off) of the global pointers in the Module Data; the Type Tag of an Object Instance leads to its Type Descriptor, which lists the offsets (off1, off2) of the instance pointers.]

Memory Management: Garbage Collection: Mark with Pointer Rotation/1

Problem: Garbage collection is called when free memory is low, but mark may require a lot of memory

Solution: Pointer rotation algorithm (Deutsch, Schorr, Waite)

+ memory efficient
+ iterative

– structures are temporarily inconsistent
– non-concurrent
– non-incremental

Memory Management: Garbage Collection: Mark with Pointer Rotation/2

Simple case: list traversal

[Figure: pointers q and p advance along a list; at each step p.link is rotated to point back to q.]
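
For the simple list case, pointer rotation can be sketched in Python. The sketch assumes an acyclic, singly linked list and a `link` field as in the figure; `Cell` and `mark_list` are hypothetical names. No stack or recursion is used: the way back is stored in the links themselves.

```python
class Cell:
    def __init__(self, name, link=None):
        self.name = name
        self.link = link
        self.marked = False


def mark_list(head):
    """Mark every cell of a list using pointer rotation only."""
    q, p = None, head
    # Forward pass: mark p and rotate (p.link, q, p) <- (q, p, p.link),
    # so each visited cell temporarily points back to its predecessor.
    while p is not None:
        p.marked = True
        p.link, q, p = q, p, p.link
    # Backward pass: rotate again to restore the original links.
    while q is not None:
        q.link, p, q = p, q, q.link
```

Between the two passes the list is reversed, which is why the structure is temporarily inconsistent and the algorithm cannot run concurrently with the program.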

Memory Management: Garbage Collection: Mark with Pointer Rotation/3

Generic case: structure traversal

[Figure: pointers q and p during the traversal of a generic structure.]

Memory Management: Garbage Collection: Memory Compaction

– nextavail pointer: partitions the heap between allocated and free space
– Allocate: increment nextavail
– Garbage Collector performs memory compaction
– Example: MS .NET

[Figure: heap with the nextavail pointer after ALLOC and after GC.]

Memory Management: Garbage Collection: Stop & Copy

Partition heap in from and to regions

Collection:
– traverse objects in from, copy to to
– leave forwarding pointer behind
– requires read barrier
– swap from and to

Characteristics
– copying
– incremental
– (generational)

Instrument code with read barrier: every "access p" becomes

IF p is moved THEN
  replace p with forwarding pointer
END;
access p
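A minimal stop-the-world sketch of the copy step, assuming objects carry an explicit `forward` field. The scan loop follows the style usually attributed to Cheney; the slides do not name a specific variant, and `Obj`, `copy_object`, `stop_and_copy` and `read_barrier` are hypothetical names.

```python
class Obj:
    def __init__(self, name, refs=None):
        self.name = name
        self.refs = refs if refs is not None else []
        self.forward = None     # forwarding pointer, set once the object is moved

def copy_object(obj, to_space):
    """Copy obj into to-space (once) and leave a forwarding pointer behind."""
    if obj.forward is None:
        clone = Obj(obj.name, list(obj.refs))
        obj.forward = clone
        to_space.append(clone)
    return obj.forward

def stop_and_copy(roots):
    """Return (new_roots, to_space); whatever was not copied is garbage."""
    to_space = []
    new_roots = [copy_object(r, to_space) for r in roots]
    scan = 0
    while scan < len(to_space):             # scan pointer chases the copies
        obj = to_space[scan]
        obj.refs = [copy_object(r, to_space) for r in obj.refs]
        scan += 1
    return new_roots, to_space

def read_barrier(p):
    """Access through the barrier: follow the forwarding pointer if p was moved."""
    return p.forward if p.forward is not None else p

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs.append(b)
b.refs.append(a)                            # cycle a <-> b, c is garbage
new_roots, to_space = stop_and_copy([a])
copied = [o.name for o in to_space]
```

After the collection the roles of from and to are swapped; in an incremental variant the mutator would call something like `read_barrier` on every pointer access while copying is still in progress.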

Memory Management: Garbage Collection: Stop & Copy

from to

1

from to

2

from to

3

to from

4

Memory Management: Garbage Collection: Concurrent GC

„Stop-and-Go“ Approach

„Incremental“ Approach

Mutator Mutator MutatorGC GC

Mutator

GC

Mutator Mutator Mutator

User Process

Real-TimeConstraint

Memory Management: Garbage Collection: Tricolor marking

„Wave-front“ Model

Color: State
– black: already traversed, behind the wave
– grey: being traversed, on the wave
– white: not reached yet, in front of the wave

Memory Management: Garbage Collection: Tricolor marking / Isolation

Mutator can change pointers at any time

Critical case: black → white (a black object B receives a pointer to a white object W)

Remedy: Write-Barrier
– color B gray, or
– color W gray

W

unreachable

B

WriteBarrier
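A small sketch of incremental tricolor marking with a write barrier of the "color W gray" kind. `Collector`, `shade`, `step` and `write` are hypothetical names; a real collector would hook the barrier into every pointer store generated by the compiler.

```python
WHITE, GREY, BLACK = "white", "grey", "black"

class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.color = WHITE

class Collector:
    def __init__(self, roots):
        self.grey = []                 # the wave front
        for r in roots:
            self.shade(r)

    def shade(self, obj):
        if obj.color == WHITE:
            obj.color = GREY
            self.grey.append(obj)

    def step(self):
        """One marking increment: scan one grey object. False when done."""
        if not self.grey:
            return False
        obj = self.grey.pop()
        for child in obj.refs:
            self.shade(child)
        obj.color = BLACK
        return True

    def write(self, src, dst):
        """Mutator pointer store with write barrier."""
        src.refs.append(dst)
        if src.color == BLACK:
            self.shade(dst)            # never leave a black -> white pointer

a, b = Obj("a"), Obj("b")
a.refs.append(b)
gc = Collector(roots=[a])
gc.step()                              # a is scanned and becomes black
w = Obj("w")                           # new object, white
gc.write(a, w)                         # critical case, caught by the barrier
while gc.step():
    pass
colors = [o.color for o in (a, b, w)]
```

Without the check in `write`, `w` would stay white although it is reachable from `a`, and the sweep would free a live object.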

Memory Management: Garbage Collection: Baker's Treadmill

To-SpaceFrom-Space

Free-Space

Heap: double-linked chain of

objects

curscan

Memory Management: Garbage Collection: Baker's Treadmill

To-Space From-Space

Free-Space

curscan

conservativeallocation

progressiveallocation

Memory Management: Garbage Collection: Baker's Treadmill

collect

To-SpaceFrom-Space

Free-Space

curscan

reference

curscancurscan

Memory Management: Garbage Collection: Baker's Treadmill

State transitions after GC is complete
– From-Space + Free-Space → Free-Space
– To-Space → From-Space

Fragmentation
– External: not removed
– Internal: depends on supported block sizes

Allocation
– conservative: black
– progressive: white

Root Set

x

y

NEW(y)

NEW(x)

curscan

Memory Management: Generational Garbage Collection

Generations Expected object life

young short life (temp data)old long life

Generations G0, G1, G2

A

B

C

D

E

A

D

F

G

A

G

H

I

J

G2

G1

G0 special handling for pointers

across different generations

required

GenGC

frequency

G0 high

G1 medium

G2 low

collect where garbage is most likely to be found

Memory Management: Garbage Collection: Finalization

Finalization (after-use cleanup)

User-defined routine, called when the object is collected
– Establish consistency: save buffers, flush caches
– Release resources: close connections, release file descriptors

Dangers:
– Resurrection of objects: objects added to live structures
– Finalization sequence is undefined

Memory Management: Garbage Collection: .NET Finalization Example

Rules:
– objects with finalizer belong to older generation
– finalizer only called once (ReRegisterForFinalize)
– FinalizationQueue: live objects with finalizer
– FreachableQueue: collected objects to be finalized
– Finalization executed by different process for security reasons

ABCDE E

BA

garbageFinalizationQueue

ABCDE

EB

A FinalizationQueue

FreachableQueue

GC

thread

Memory Management: Garbage Collection: Weak Pointers

„Weak“ Pointers

Objects referenced only through a weak pointer can be collected by the GC in case of need

Used for caches and buffers

Implementation
1. Weak pointers are not registered to the GC
2. Use a weak reference table (indirect access)

garbagegarbage in use

weak pointer

weak reference

weak reference table
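As an illustration of the cache use case, the sketch below uses Python's standard `weakref.WeakValueDictionary` as the weak reference table. The `Buffer` class and the `open_buffer` helper are hypothetical; the timing of the cleanup relies on CPython's reference counting, with `gc.collect()` added as a safeguard.

```python
import gc
import weakref

class Buffer:
    def __init__(self, name):
        self.name = name

# Weak reference table: entries do not keep their buffers alive.
cache = weakref.WeakValueDictionary()

def open_buffer(name):
    """Return the cached buffer for name, or create and register a new one."""
    buf = cache.get(name)
    if buf is None:
        buf = Buffer(name)
        cache[name] = buf
    return buf

b1 = open_buffer("log.txt")
b2 = open_buffer("log.txt")
same = b1 is b2                    # the cache avoids buffer duplication
del b1, b2                         # the user drops the last strong references
gc.collect()
still_cached = "log.txt" in cache
```

As long as a user holds the buffer, every lookup returns the same instance; once the last strong reference is gone, the entry disappears from the table by itself.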

Memory Management: Garbage Collection: Weak Pointers Example

Oberon: internal file list
– system must keep track of open files to avoid buffer duplication
– file descriptor must be collected once user has no more reference to it
– use weak pointer in the system (otherwise would keep file alive!)

Memory Management: Object Pools

Application keeps a pool of preallocated object instances; handles allocation and disposal
– Simulation of discrete events
– Buffers in a file system
– Provide dynamic allocation in real-time systems

PROCEDURE NewT (VAR p: ObjectT);
BEGIN
  IF freeT = NIL THEN NEW(p)
  ELSE p := freeT; freeT := freeT.next
  END
END NewT;

PROCEDURE DisposeT (p: ObjectT);
BEGIN
  p.next := freeT; freeT := p
END DisposeT;
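The same free-list idea as a runnable Python sketch. `Pool` and its `allocations` counter are hypothetical additions, used only to show that a disposed instance is reused instead of a new one being allocated.

```python
class ObjectT:
    def __init__(self):
        self.next = None          # link used only while the object is in the pool

class Pool:
    def __init__(self):
        self.free = None          # head of the free list (freeT in the slide)
        self.allocations = 0      # number of real allocations, for illustration

    def new(self):
        """NewT: take an instance from the free list, allocate only if empty."""
        if self.free is None:
            self.allocations += 1
            return ObjectT()
        p = self.free
        self.free = p.next
        p.next = None
        return p

    def dispose(self, p):
        """DisposeT: push the instance back onto the free list."""
        p.next = self.free
        self.free = p

pool = Pool()
x = pool.new()
pool.dispose(x)
y = pool.new()                    # reuses x, no second allocation
```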


Garbage Collection, Recap

GC kinds: compacting, copying, incremental, generational

Helpers: write barrier, read barrier, forwarding pointer, pointer rotation

Algorithms:
– Ref-Count
– Mark & Sweep
– Stop & Copy
– Mark & Copy (.NET)
– Baker’s Treadmill
– Dijkstra / Lamport
– Steele

Distributed Object Systems: Overview

Goals
– object-based approach
– hide communication details

Advantages
– more space
– more CPU
– redundancy
– locality

Problems
– Coherency: ensure that same object definition is used
– Interoperability: serialization, type consistency, type mapping
– Object life-time: distributed garbage collection

Distributed Object Systems: Architecture

Proxy Stub Impl.

NamingService

IDL

ObjectBroker

Client Server

ObjectBroker

CallContext

Message

IDL-Compiler IDL-Compiler

Impl.Skeleton

Application

Remote Procedure Invocation: Overview

Problem
– send structured information from A to B
– A and B may have different memory layouts
– “endianness”: how is 0x1234 (2 bytes) represented in memory?

Big-Endian: MSB before LSB (address 0 holds 0x12, address 1 holds 0x34)
• IBM, Motorola, Sparc
• network byte-ordering

Little-Endian: LSB before MSB, little end first (address 0 holds 0x34, address 1 holds 0x12)
• VAX, Intel
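The two layouts of 0x1234 can be checked with Python's standard `struct` module, where `>` selects big-endian, `<` little-endian and `!` network byte order. The variable names are just for illustration.

```python
import struct

value = 0x1234

big = struct.pack(">H", value)        # MSB first
little = struct.pack("<H", value)     # LSB first
network = struct.pack("!H", value)    # network byte order

# Reading big-endian bytes with the wrong byte order swaps the two bytes.
misread = struct.unpack("<H", big)[0]
```

Here `big` and `network` are the bytes 0x12 0x34, `little` is 0x34 0x12, and `misread` is 0x3412, which is the kind of error a receiver with a different memory layout would make without an agreed wire format.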


Definitions

Serialization
– conversion of an object‘s instance into a byte stream

Deserialization
– conversion of a stream of bytes into an object‘s instance

Marshaling
– gathering and conversion (may require serialization) to an appropriate format of all relevant data, e.g. in a remote method call; includes details like name representation.

Remote Procedure Invocation: Protocol Overview

Protocols
– RPC + XDR (Sun): RFC 1014, June 1987; RFC 1057, June 1988
– IIOP / CORBA (OMG): V2.0, February 1997; V3.0, August 2002
– SOAP / XML (W3C): V1.1, May 2000
– ...

XDR Type System (big-endian representation)
– [unsigned] Integer (32-bit)
– [unsigned] Hyper-Integer (64-bit)
– Enumeration (unsigned int)
– Boolean (Enum)
– Float / Double (IEEE 32/64-bit)
– Opaque
– String
– Array (fix + variable size)
– Structure
– Union
– Void
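A minimal sketch of how a few XDR types are laid out on the wire, using only Python's `struct` module. It assumes the usual XDR rules: every item is big-endian and occupies a multiple of 4 bytes, and a string is a 4-byte length followed by the bytes, zero-padded. The helper names (`xdr_int`, `xdr_bool`, `xdr_string`, `xdr_read_string`) are hypothetical.

```python
import struct

def xdr_int(n):
    """32-bit signed integer, big-endian."""
    return struct.pack(">i", n)

def xdr_bool(flag):
    """Booleans are encoded like an enumeration: 0 or 1 as a 32-bit integer."""
    return xdr_int(1 if flag else 0)

def xdr_string(text):
    """4-byte length, then the bytes, zero-padded to a multiple of 4."""
    data = text.encode("ascii")
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad

def xdr_read_string(buf, offset=0):
    """Decode a string; returns (text, offset of the next item)."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    text = buf[start:start + length].decode("ascii")
    padded = (length + 3) // 4 * 4
    return text, start + padded

message = xdr_string("hello") + xdr_int(-1) + xdr_bool(True)
text, nxt = xdr_read_string(message)
(number,) = struct.unpack_from(">i", message, nxt)
```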

Remote Procedure Invocation: RPC Protocol

Remote Procedure Call
– Marshalling of procedure parameters
– Message Format
– Authentication
– Naming

Client: PROCEDURE P(a, b, c)
• pack parameters
• send message to server
• await response
• unpack response

Server: P(a, b, c)
• unpack parameters
• find procedure
• invoke
• pack response
• send response

Distributed Object Systems: Details

References vs. Values
– client receives reference to remote object
– data values are copied to client for efficiency reasons
– decide whether an object is sent as reference or as value
  value: serializable (Java, .NET), valuetype (CORBA)
  reference: MarshalByRefObject (.NET), java/RMI/Remote (Java), default (CORBA)

object creation
– server creates objects
– client creates objects
– server can return references

object instances
– one object for all requests
– one object for each request
– one object per proxy

conversation state
– stateless
– stateful

Distributed Object Systems: Distr. Object Systems vs. Service Architecture

Dist. Object System– object oriented model– object references– stateful / stateless– tight coupling

Service Architecture– OO-model / RPC– service references– stateless– loose coupling

internal communication between application’s

tiers

external communication

between applications

Distributed Object Systems: Distr. Object Systems vs. Service Architecture

heterogeneoushomogeneous

tightloose

CORBA

Remoting

RMI

Web Services

• components / objects(distributed object system)

• stateful and statelessconversation

• transactions

• servicesremote procedure calls

• stateless conversation(session?)

• messageenvironment

coupling

Distributed Object Systems: Type Mapping

Type System 1InteroperabilityType System Type System 2

Possible Types Possible Types Possible Types

MappableTypes

MappableTypes

InteropSubset

Distributed Object Systems: Type Mapping, Example

JavaType System

CORBAType System

CLSType System

wchar

doubledouble double

char

char

char

enumenum

union union union

custom implementation custom implementation

Distributed Object Systems: Examples

Standards– OMG CORBA

IIOP

– Web Services SOAP

Frameworks– Java RMI (Sun)– DCOM (Microsoft)– .NET Remoting (Microsoft)

IIOP.NET

Distributed Object Systems: CORBA

Common Object Request Broker Architecture

TCP/IP Socket

ORBORB

InterfaceRepository

ImplementationRepositoryCORBA

Runtime

Object AdaptorCORBARuntime

Client StubObject Skeleton

ObjectClient Application

Remote Architecture

Client Server

GIOP/IIOP

„Object-Bus“

                                                                                                     

Distributed Object Systems: CORBA

– CORBA is a standard from OMG
  Object Management Group
  Common Object Request Broker Architecture

– CORBA is useful for...
  building distributed object systems
  heterogeneous environments
  tight integration

– CORBA defines...
  an object-oriented type system
  an interface definition language (IDL)
  an object request broker (ORB)
  an inter-ORB protocol (IIOP) to serialize data and marshal method invocations
  language mappings from Java, C++, Ada, COBOL, Smalltalk, Lisp, Python
  ... and many additional standards and interfaces for distributed security, transactions, ...

Distributed Object Systems: CORBA

Basic Types
– integers: 16-, 32-, 64-bit integers (signed and unsigned)
– IEEE floating point: 32-, 64-bit and extended-precision numbers
– fixed point
– char, string: 8-bit and wide
– boolean
– opaque (8-bit), any
– enumerations

Compound Types
– struct
– union
– sequence (variable-length array)
– array (fixed-length)
– interface: concrete (pass-by-reference), abstract (pure definition)
– value type: pass-by-value, abstract (no state)

Operations
– in / out / inout parameters
– raises

Attributes

Distributed Object Systems: CORBA / General Inter-ORB Protocol (GIOP)

CDR (Common Data Representation)
– Variable byte ordering
– Aligned primitive types
– All CORBA types supported

IIOP (Internet IOP)
– GIOP over TCP/IP
– Defines Interoperable Object Reference (IOR): host, port, key

Message Format
– Defined in IDL
– Messages: Request, Reply, CancelRequest, LocateRequest, LocateReply, CloseConnection, MessageError, Fragment
– Byte ordering flag
– Connection Management: request multiplexing, asymmetrical / bidirectional connections

Distributed Object Systems: CORBA / GIOP Message in IDL

module GIOP {
  struct Version {
    octet major;
    octet minor;
  };

  enum MsgType_1_0 {
    Request, Reply, CancelRequest,
    LocateRequest, LocateReply,
    CloseConnection, MessageError
  };

  struct MessageHeader {
    char Magic[4];
    Version GIOP_Version;
    boolean byte_order;
    octet message_type;          // one of MsgType_1_0
    unsigned long message_size;  // size of the message body in bytes
  };
}; // end of module GIOP
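A sketch of how such a header could be packed and parsed. It assumes the GIOP 1.0 layout of 12 bytes: the magic "GIOP", two version octets, a byte-order flag (0 for big-endian, 1 for little-endian), a one-octet message type and a 4-byte message size encoded in the sender's byte order. `pack_header` and `parse_header` are hypothetical helpers, not part of any ORB.

```python
import struct

MSG_TYPES = ["Request", "Reply", "CancelRequest", "LocateRequest",
             "LocateReply", "CloseConnection", "MessageError"]

def pack_header(msg_type, size, little_endian=False):
    """Build a 12-byte GIOP 1.0 style header (illustrative only)."""
    order = "<" if little_endian else ">"
    fixed = bytes([1, 0, int(little_endian), MSG_TYPES.index(msg_type)])
    return b"GIOP" + fixed + struct.pack(order + "I", size)

def parse_header(data):
    """Decode the header; the size field uses the sender's byte order."""
    magic, major, minor, byte_order, msg_type = struct.unpack(">4s4B", data[:8])
    if magic != b"GIOP":
        raise ValueError("not a GIOP message")
    order = "<" if byte_order else ">"
    (size,) = struct.unpack(order + "I", data[8:12])
    return {"version": (major, minor),
            "little_endian": bool(byte_order),
            "type": MSG_TYPES[msg_type],
            "size": size}

header = pack_header("Reply", 300, little_endian=True)
info = parse_header(header)
```

The byte-order flag is what makes the byte ordering "variable": the receiver converts only if its own layout differs from the sender's.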

Distributed Object Systems: CORBA Services

CORBA Services
– System-level services defined in IDL
– Provide functionality required by most applications

Naming Service
– Allows local or remote objects to be located by name
– Given a name, returns an object reference
– Hierarchical directory-like naming tree
– Allows getting initial reference of object

Event Service
– Allows objects to dynamically register interest in an event
– Object will be notified when event occurs
– Push and pull models

... and more
– Trader, LifeCycle, Persistence, Transaction, Security

Distributed Object Systems: Web Services

Service-oriented architecture Rely on existing protocols

– SOAP messaging protocol

– WSDL service description protocol

– UDDI service location protocol

SOAP

HTTP

TCP/IP

Distributed Object Systems: SOAP

Simple Object Access Protocol
– communication protocol
– XML-based
– describes object values
– XML Schemas as interface description language
  basic types: string, boolean, decimal, float, double, duration, dateTime, time, date, hexBinary, base64Binary, anyURI, QName, NOTATION
  structured types: list, union

SOAP Message
– SOAP Envelope
– SOAP Header
– SOAP Body

Method Call
– packed as structure
– messages are self-contained
– no external object references

Distributed Object Systems: SOAP Message

SOAP Message– SOAP Envelope

SOAP Header SOAP Body

Example

float Multiply(float a, float b);

Distributed Object Systems: SOAP Example (Request)

POST /quickstart/aspplus/samples/services/MathService/CS/MathService.asmx HTTP/1.1
Host: samples.gotdotnet.com
Content-Type: text/xml; charset=utf-8
Content-Length: length
SOAPAction: "http://tempuri.org/Multiply"

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Multiply xmlns="http://tempuri.org/">
      <a>float</a>
      <b>float</b>
    </Multiply>
  </soap:Body>
</soap:Envelope>
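A small sketch that builds and reads back such a request body with Python's standard `xml.etree.ElementTree`. The namespaces are the ones from the example; `build_request` and `parse_request` are hypothetical helpers, and the HTTP part of the message is left out.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
APP_NS = "http://tempuri.org/"

def build_request(a, b):
    """Pack the call Multiply(a, b) as a structure inside a SOAP body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    call = ET.SubElement(body, "{%s}Multiply" % APP_NS)
    ET.SubElement(call, "{%s}a" % APP_NS).text = repr(a)
    ET.SubElement(call, "{%s}b" % APP_NS).text = repr(b)
    return ET.tostring(envelope, encoding="unicode")

def parse_request(xml_text):
    """Server side: unpack the parameters of the Multiply call."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find("{%s}Body" % SOAP_NS)
    call = body.find("{%s}Multiply" % APP_NS)
    a = float(call.find("{%s}a" % APP_NS).text)
    b = float(call.find("{%s}b" % APP_NS).text)
    return a, b

request = build_request(3.0, 4.5)
a, b = parse_request(request)
result = a * b
```

The message carries everything the server needs: the method name as element name and the parameters as child elements, with no reference to objects outside the message.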

Distributed Object Systems: SOAP Example (Answer)

HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Content-Length: length

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <MultiplyResponse xmlns="http://tempuri.org/">
      <MultiplyResult>float</MultiplyResult>
    </MultiplyResponse>
  </soap:Body>
</soap:Envelope>

Distributed Object Systems: SOAP Example (Service Description-1)

<?xml version="1.0" encoding="utf-8"?>
<definitions ....>
  <types>
    <s:schema elementFormDefault="qualified" targetNamespace="http://tempuri.org/">
      <s:element name="Multiply">
        <s:complexType><s:sequence>
          <s:element minOccurs="1" maxOccurs="1" name="a" type="s:float" />
          <s:element minOccurs="1" maxOccurs="1" name="b" type="s:float" />
        </s:sequence></s:complexType>
      </s:element>
    </s:schema>
  </types>
  <message name="MultiplySoapIn">
    <part name="parameters" element="s0:Multiply" />
  </message>

Distributed Object Systems: SOAP Example (Service Description-2)

  <binding name="MathServiceSoap" type="s0:MathServiceSoap">
    <soap:binding transport="http://schemas.xmlsoap.org/soap/http" style="document" />
    <operation name="Multiply">
      <soap:operation soapAction="http://tempuri.org/Multiply" style="document" />
      <input><soap:body use="literal" /></input>
      <output><soap:body use="literal" /></output>
    </operation>
  </binding>
  <service name="MathService">
    <port name="MathServiceSoap" binding="s0:MathServiceSoap">
      <soap:address location="http://samples.gotdotnet.com/quickstart/aspplus/samples/services/MathService/CS/MathService.asmx" />
    </port>
  </service>
</definitions>

Distributed Object Systems: Web Services

Comments
– XML (easily readable)
– system independent
– standard
– stateless (encouraged design pattern)

– bloated
– big messages (but easily compressed)
– requires expensive parsing

Constraints
– Services: no object references, server-activated servant
– Goes over HTTP: requires web server

Distributed Object Systems: Web Service Future

Use SOAP-Header to store additional information about message or context

Many standards to come...– WS-Security– WS-Policy– WS-SecurityPolicy– WS-Trust– WS-SecureConversation– WS-Addressing

Distributed Object Systems: Java RMI

Java Remote Method Invocation

TCP/IP Socket

TransportLayer

RemoteReferences

Object Stub

ObjectClient Application

Network

Remote Architecture

Client Server

LookupRegister

TransportLayer

RemoteReferences

Object Stub

LookupRegister

Distributed Object Systems: Java RMI Details

Framework– supports various implementations

e.g. RMI/IIOP– mapping limited to the Java type system, workarounds

needed

– uses reflection to inspect objects

Distributed Object Systems: Low-Level Details: Java RMI/IIOP

Common Type-System– restricted CORBA

Marshalling– name mapping– remote objects

only references

Interface Description Language (IDL)

– java to IDL mapping

Message representation Underlying protocol

– IIOP (CORBA)

Distributed Object Systems: Microsoft DCOM

Distributed Component Object Model

RPC Channel

SCMSCM

SCMs and RegistrationCOMRuntime

COMRuntime

Object Proxy Object Stub

ObjectClient Application

Network

Remote Architecture

Client ServerRegistry Registry

OXID Resolver

Ping Server

Distributed Object Systems: Microsoft .NET Remoting

InstanceInstance

Network

ChannelChannelChannelChannel

TransparentProxy

TransparentProxy

ObjRefObjRef

ClientClient

new Instance() or Activator.GetObject(...)

Application Domain Boundary

ObjRef: IChannelInfo ChannelInfo; IEnvoyInfo EnvoyInfo; IRemotingTypeInfo TypeInfo; string URI;


channel

channel

Distributed Object Systems: Microsoft .NET Remoting

ClientClient InstanceInstance

Instance s = new Instance();s.DoSomething();

Network

ProxyProxy DispatcherDispatcher

FormatterFormatter FormatterFormatter serialize object

TransportSink

TransportSink

TransportSink

TransportSink

handle communication

Stream Chan.Sink(s)

Stream Chan.Sink(s)

Stream Chan.Sink(s)

Stream Chan.Sink(s)

custom operations

MessageChan.Sink(s)

MessageChan.Sink(s)

MessageChan.Sink(s)

MessageChan.Sink(s)

custom operations

Distributed Object Systems: Microsoft .NET Remoting

Activation
– client: one instance per activation
– server / Singleton: one instance of object
– server / SingleCall: one instance per call

Leases (Object Lifetimes)
– renew lease on call
– set maximal object lifetime

Serialization
– SOAP (Warning: non-standard types, only for .NET use)
– binary
– user defined

Transport
– TCP
– HTTP
– user defined


AppDomain 2AppDomain 1

Distributed Object Systems: Microsoft .NET Remoting (Object Marshalling)

MarshalByRefObjects
– remoted by reference
– client receives an ObjRef object, which is a “pointer” to the original object

[Serializable]
– all fields of instance are cloned to the client
– [NonSerialized] fields are ignored

ISerializable
– object has method to define own serialization

Obj Proxy

AppDomain 2AppDomain 1

Obj Obj‘

SerializedObjRef

Serializedfld1... fldn

Distributed Object Systems: Microsoft .NET Remoting, Activation

Server-Side Activation (Well-Known Objects)
– Singleton Objects: only one instance is allocated to process all requests
– SingleCall Objects: one instance per call is allocated

Client-Side Activation
– Client Activated Objects: the client allocates and controls the object on the server

“stateless”

“stateful”

Distributed Object Systems: Microsoft .NET Remoting, Limitations

–Server-Activated Objects object configuration limited to the default constructor

–Client-Activated Objects class must be instantiated, no access over interface class hierarchy limitations use Factory Pattern

– to get interface reference– to allow parametrization of the constructor

–Furthermore... interface information is lost when passing an object reference to another

machine no control over the channel

– which channel is used– which peer is allowed to connect

Distributed Object Systems: Case Study: IIOP.NET

Opensource project based on ETH-Diploma thesis– http://iiop-net.sourceforge.net/

IIOP.NET (marketing)– „Provide seamless interoperability between .NET and CORBA-

based peers (including J2EE)“

IIOP.NET (technical) .NET remoting channel implementing the CORBA IIOP protocol Compiler to make .NET stubs from IDL definitions IDL definition generator from .NET metadata

Distributed Object Systems: Case Study: IIOP.NET

IIOP rather than SOAP
– transparent reuse of existing servers
– tight coupling
– object-level granularity
– efficiency

Runtime: standard .NET remoting channel for IIOP
– transport sink
– formatter
– type-mapper

Build tools
– IDL → CLS compiler
– CLS → IDL generator

.NETserver

.NETclient

J2EEserver

Javaclient

CORBAobjects

IIOPbinary IIOP

Java Type System IDL Type System CLS Type System

Possible Types Possible Types Possible Types

IDL MappableTypes

IDLMappableTypesInterop

Subset

Distributed Object Systems: Case Study: IIOP.NET, Interoperability

CommunicationProtocols

Data Model

Message Format

Contextual DataInterception Layer

Conversation

Services

Application

TCP/UDP, Byte stream, point-to-point communication

Type system, mapping and conversion issues

RPC, IIOP, HTTP, SOAP, proprietary binary format,messages, unknown data (exceptions), encryption

SessionID, TransactionID, cultureID, logical threadID …

Activation model (EJB, MBR), global naming,distributed garbage collection, conversational state,…

Distributed Transaction Coordinator, Active Directory, …

This is what we want

Distributed Object Systems: Case Study: IIOP.NET, Granularity

Service Component Object

Message-basedInterface,Stateless

Strongly-typedInterface,

Stateless or Stateful

ImplementationDependency,

Stateful

Object Object

Component

Object Object

Component

Object Object

Component

Service Service

System

Granularity

Coupling,Interaction

Distributed Object Systems: Case Study: IIOP.NET

[Figure: IIOP.NET release timeline, versions 1.0 to 1.6, with the 1st and 2nd article marked]

Distributed Object Systems: Case Study: IIOP.NET, Performance

Test Case:
– WebSphere 5.0.1 as server
– Clients: IBM SOAP-RPC Web Services, IBM Java RMI/IIOP, IIOP.NET
– Response time receiving 100 beans from server: WS 4.0 seconds, IIOP.NET 0.5 seconds
– when sending many more beans, WS are then 200% slower than IIOP.NET

Source: posted on IIOP.NET forum

Processes and Threads: Introduction

CPU as resource, provide abstraction to it

Allow multiprogramming– pseudo-parallelism

(single-processors)– real parallelism

(multi-processors)Required abstractions

– multiple activities -- execution of instructions

– protection of resources– synchronization of activities

Topics– coroutines– processes – threads– scheduling

fairness starvation

– synchronization deadlocks

Processes and Threads: Multithreading

Call a.run Call b.Q Call b.q Call b.q Return b.q Return b.q Call c.R Return c.R Return b.QReturn a.run

a.run

b.Q

Thread 1 b.q

b.q

Call c.run Call d.Q Call d.q Call d.q Return d.q Return d.q Call e.R Return e.R Return d.QReturn c.run

Thread 2

c.run

d.Q

e.R

Stack 1

Stack 2

12

time12

time12

time

Processes and Threads: Coroutines (1)

Coroutines
– each activity has its own stack, address-space is shared
– explicit context switch (stack only) under programmer‘s control
– uses a Transfer call to switch to another coroutine
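The explicit, programmer-controlled switch can be imitated with Python generators. This is only an analogy under stated assumptions: each generator keeps its own execution state, a `yield` of the target's name stands in for the Transfer call, and the small `run` loop (a hypothetical helper) performs the switch.

```python
def coroutine_a(log):
    for i in range(3):
        log.append(("A", i))
        yield "B"                 # corresponds to Transfer(B)

def coroutine_b(log):
    for i in range(3):
        log.append(("B", i))
        yield "A"                 # corresponds to Transfer(A)

def run(start, coroutines):
    """Resume the named coroutine until one of them terminates."""
    cur = start
    while True:
        try:
            cur = next(coroutines[cur])   # continue where cur last stopped
        except StopIteration:
            return

log = []
coroutines = {"A": coroutine_a(log), "B": coroutine_b(log)}
run("A", coroutines)
```

Unlike threads, nothing switches behind the programmer's back: control moves only at the explicit transfer points, so the interleaving in `log` is fully deterministic.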

Processes and Threads: Coroutines (2)

Subroutines

Coroutines

Call

Return

Start

Start

Transfer

Transfer

Call

Return

Processes and Threads: Coroutines (3)

TYPE
  Coroutine = POINTER TO RECORD
    FP: LONGINT;
    stack: POINTER TO ARRAY OF SYSTEM.BYTE;
  END;

VAR cur: Coroutine; (* Current Coroutine *)

PROCEDURE Transfer*(to: Coroutine);
BEGIN (* procedure entry: PUSH EBP; SUB ESP, 4 *)
  SYSTEM.GETREG(SYSTEM.EBP, cur.FP); (* save FP *)
  cur := to;
  SYSTEM.PUTREG(SYSTEM.EBP, cur.FP); (* restore FP *)
END Transfer; (* procedure exit: MOV ESP, EBP; POP EBP; RET 4 *)

Processes and Threads: Coroutines (4)

to’SPFP

PC’FP’

locals

stackQstackP

to’

SP

FP PC’FP’

locals

stackQstackP

Qpcx

localsFP”

FP := Q.FP

to’

SP

FP PC’FP’

locals

stackQstackP

Qpcx

localsFP”

Transfer(Q)

FP

stackQstackP

Qpcx

localsFP”

SP

returnjump at PC’

Processes and Threads: Coroutines (5)

Current stack: current execution state

All other stacks: top PAF (proc activation frame) contains last Transfer call

Start: create stack with fake Transfer-like PAF

PROCEDURE Start(C: Coroutine; size: LONGINT);
  VAR tos: LONGINT;
BEGIN
  NEW(C.stack, size);
  tos := SYSTEM.ADR(C.stack[0]) + LEN(C.stack);
  SYSTEM.PUT(tos-4, 0); (* par = null *)
  SYSTEM.PUT(tos-8, 0); (* PC’ = null, not allowed to return *)
  SYSTEM.PUT(tos-12, 0); (* FP’ *)
  C.FP := tos-12; (* frame pointer of the new coroutine, used by the first Transfer to C *)
END Start;

Processes and Threads: Problems caused by multitasking

Concurrent access to resources
– protection: limit access to a resource
– synchronization: synchronize task with resource state or other task

Concurrent access to CPU
– task priorities
– scheduling

One problem’s solution is another problem’s cause....
– deadlocks
– fairness
– deadlines / periodicity constraints

Processes and Threads: Protection: Mutual Exclusion

Mutual Exclusion: only one activity is allowed to access one resource at a time
– disable interrupts (single CPU only, avoid switches)
– locks
  flag: lock taken / lock free
  spin lock (uses busy waiting)
  exclusive lock
  read-write lock (multiple readers, one writer)

Processes and Threads: Protection: Monitor

Shared resources as Monitor
– resources are passive objects
– execution of critical sections inside monitor is mutually exclusive
– Global Monitor Lock
– Shared Monitor Lock for read-access (optional)

monitor as a special module [original version (Hoare, Brinch Hansen)]

object instance as monitor
– method and code block granularity
– Java, C#, Active Oberon, ...

Resource

task P task Q

acquire

releaseacquire

release

Processes and Threads: Protection

Simplistic implementation with coroutines

Non-reentrant lock (no recursion allowed)

One waiting queue per resource is required

PROCEDURE Acquire(r: Resource);
BEGIN
  IF r.taken THEN
    InsertList(r.waiting, cur);
    SwitchToNextRoutine()
  ELSE
    r.taken := TRUE
  END
END Acquire;

PROCEDURE Release(r: Resource);
BEGIN
  next := GetFromList(r.waiting);
  IF next # NIL THEN
    InsertList(ready, next);
    Transfer(GetNextTask())
  ELSE
    r.taken := FALSE
  END
END Release;

Processes and Threads: Protection

Shared resource as Process
– synchronization during communication
– Communicating Sequential Processes (CSP), C.A.R. Hoare (1978)

Model of communication
– „Rendez-vous“ between two processes
– P!x (send x to process P)
– Q?y (ask y from process Q)

Used in Ada, Occam

task P task Q

Q!z

P?x

task P task QQ!z

P?x

Processes and Threads: Protection

Some variations on the theme....
– Reentrant Locks
– Readers / Writers: one writer or multiple readers allowed
– Binary Semaphores: one activity can get the resource
– Generic Semaphores: N activities are allowed to get the resource

Processes and Threads: Synchronization

Wait on a condition / state
– Signals with Send/Wait methods
  Require cooperation from all processes
  Example: Producer/Consumer with conditions nonempty/nonfull
  Semantic of Send: Send-and-Pass vs. Send-and-Continue
– Generic system-handled conditions (Active Oberon): AWAIT(x > y);

Wait on partner process
– CSP

Processes and Threads: Synchronization: Implementation Example

Process list
– double-chained list of all coroutines
– cur points to current (running) coroutine
– each signal has a LIFO list

C2

C1C4

C5C3s

link

ready

Signal cur

Processes and Threads: Synchronization: Implementation Example

Schedule
  prev := cur;
  WHILE ~cur.ready & cur.next # prev DO
    cur := cur.next
  END;
  IF cur.ready THEN Transfer(cur) ELSE (*deadlock*) END

Terminate
  cur.next.prev := cur.prev;
  cur.prev.next := cur.next;
  Schedule

Processes and Threads: Synchronization: Implementation Example

Send(s)
  IF s # NIL THEN (*send-and-pass*)
    cur := s; s.ready := TRUE; s := s.link
  END;
  Schedule (*to next ready from cur*)

Wait(s)
  cur.link := s; s := cur; cur.ready := FALSE;
  Schedule (*to next ready from cur*)

Init(s)
  s := NIL

Processes and Threads: Active Oberon: Bounded Buffer

Buffer* = OBJECT
  VAR
    data: ARRAY BufLen OF INTEGER;
    in, out: LONGINT;

  (* Put - insert element into the buffer *)
  PROCEDURE Put* (i: INTEGER);
  BEGIN {EXCLUSIVE}
    (* AWAIT ~full *)
    AWAIT ((in + 1) MOD BufLen # out);
    data[in] := i; in := (in + 1) MOD BufLen
  END Put;

  (* Get - get element from the buffer *)
  PROCEDURE Get* (VAR i: INTEGER);
  BEGIN {EXCLUSIVE}
    (* AWAIT ~empty *)
    AWAIT (in # out);
    i := data[out]; out := (out + 1) MOD BufLen
  END Get;

  PROCEDURE & Init;
  BEGIN
    in := 0; out := 0;
  END Init;

END Buffer;
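For comparison, a sketch of the same bounded buffer in Python, assuming `threading.Condition` as a stand-in for the monitor: the `with` block plays the role of `{EXCLUSIVE}` and `wait_for` the role of `AWAIT`. Unlike in Active Oberon, waiters must be woken explicitly with `notify_all`. The class and variable names are hypothetical.

```python
import threading

class BoundedBuffer:
    def __init__(self, buf_len):
        self.data = [None] * buf_len
        self.inp = 0                       # "in" is a Python keyword
        self.out = 0
        self.cond = threading.Condition()  # monitor lock plus condition

    def put(self, item):
        with self.cond:                    # BEGIN {EXCLUSIVE}
            n = len(self.data)
            self.cond.wait_for(lambda: (self.inp + 1) % n != self.out)  # AWAIT ~full
            self.data[self.inp] = item
            self.inp = (self.inp + 1) % n
            self.cond.notify_all()         # conditions may have changed

    def get(self):
        with self.cond:
            self.cond.wait_for(lambda: self.inp != self.out)            # AWAIT ~empty
            item = self.data[self.out]
            self.out = (self.out + 1) % len(self.data)
            self.cond.notify_all()
            return item

buf = BoundedBuffer(4)
received = []

def consumer():
    for _ in range(10):
        received.append(buf.get())

t = threading.Thread(target=consumer)
t.start()
for i in range(10):
    buf.put(i)
t.join()
```

As in the Oberon version, one slot always stays empty so that `in = out` can only mean "empty"; a buffer of length 4 therefore holds at most 3 items.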

Processes and Threads: CSP: Bounded Buffer (I)

[bounded_buffer || producer || consumer]

producer ::

*[<produce item>

bounded_buffer ! item;

]

consumer ::

*[bounded_buffer ? item;

<consume item>

]

Geoff CoulsonLancaster University

Processes and Threads: CSP: Bounded Buffer (II)

bounded_buffer ::
  buffer: (0..9) item;
  in, out: integer;
  in := 0; out := 0;
  *[
    in < out+10; producer ? buffer(in mod 10) -> in := in + 1;
    ||
    out < in; consumer ! buffer(out mod 10) -> out := out + 1;
  ]


Processes and Threads
Process State

Process states
1. Running: actually using the CPU
2. Ready: waiting for a CPU
3. Blocked: unable to run, waiting for external event

Process state transitions
1. Running → Blocked: wait for external event
2. Running → Ready: system scheduler
3. Ready → Running: system scheduler
4. Blocked → Ready: external event happens

[Figure: state diagram with states Running, Blocked and Ready and transitions 1–4]


Processes and Threads
Process State (Active Oberon)

Active Oberon provides
– monitor-like object protection
– conditions

Conditions are checked by the system.

No explicit help or knowledge from the user is required (no x.Signal)

[Figure: state diagram with states Running, Ready, Awaiting Object and Awaiting Condition]

Activities
Program (static concept) ≠ Process (dynamic)
Processes, jobs, tasks, threads (differences later)
– program code
– context:
    program counter (PC) and registers
    stack pointer
    state: [new], running, waiting, ready, [terminated]
– stack
– data section (heap)

Processes vs. Threads

Process or job (heavyweight)
– code
– address space
– processor state
– private data (stack+registers)
– can have multiple threads

Thread (lightweight)
– shared code
– shared address space
– processor state
– private data (stack+registers)

[Figure: processes and threads on top of the kernel and the CPU]

Processes vs. Threads: Example

[Figure: two processes, PROC 1 and PROC 2, each with its own instructions, heap and stack (HEAP 1 / STACK 1, HEAP 2 / STACK 2), next to a single process PROC whose two threads share the instructions and the HEAP but use separate stacks STACK 1 and STACK 2]

Multitasking

Programmed events that can cause a task switch (synchronous)
– protection (locks)
    acquire
    release
– synchronization
    wait on a condition
    send a signal (send-and-pass)

System events that can cause a task switch
– voluntary switch (“yield”, task termination) (synchronous)
– process with higher priority becomes available (asynchronous, task preemption)
– consumption of the allowed time quantum (asynchronous, task preemption)

Preemption

Assign each process a time-quantum (normally on the order of tens of ms)

Asynchronous task switches can happen at any time!
– task can be in the middle of a computation
– save whole CPU state (registers, flags, ...)

Perform switch
– on resource conflict
– on synchronization request
– on timer-interrupt (time-quantum is over)

Context switch
Scheduler invocation:
– preemption: interrupt
– cooperation: explicit call

Operations:
– store the process state (PC, regs, …)
– choose the next process (strategy)
– [accounting]
– restore the state of the next process (regs, SP, PC, …)
– jump to the restored PC

A context switch is usually expensive: 1–1000 µs, depending on the system and the number of processes
– hardware optimizations (e.g., multiple sets of registers – SPARC, DECSYSTEM-20)

Scheduling algorithms

Three categories of environments:
batch systems (e.g., VPP, DOS)
– usually non-preemptive (i.e., task is not stopped by scheduler, only synchronous switches)
interactive systems (UNIX, Windows, Mac OS)
– cooperative or preemptive
– no task allowed to have the CPU forever
real-time systems (PathWorks, RT Linux)
– timing constraints (deadlines, periodicity)

Scheduling Performance
CPU utilization
Throughput
– number of jobs per time unit
– minimize context switch penalty
Turnaround time
– = exit time - arrival time
– execution, wait, I/O
Response time
– = start time - request time
Waiting time (I/O, waiting, …)
Fairness

Scheduling algorithm goals

All systems
– Fairness: give every task a chance
– Policy enforcement
– Balance: keep all subsystems busy

Interactive systems
– Response time: respond quickly
– Proportionality: meet user’s expectations

Batch systems
– Throughput: maximize number of jobs
– Turnaround time: minimize time in system
– CPU utilization: keep CPU busy

Real-time systems
– Meet deadlines: avoid losing data
– Predictability: avoid degradation
– Hard- vs. soft-real-time systems

Batch Scheduling Algorithms

Choose task to run (task is usually not preempted)
First Come First Serve (FCFS)
– fair, may cause long waiting times
Shortest Job First (SJF)
– requires knowledge about job length
Longest Response Ratio
– response ratio = (time in the system / CPU time)
– depends on the waiting time
Highest Priority First
– with or without preemption
Mixed
– the priority is adjusted dynamically (time in queue, length, priority, …)

ETH-VPP is a batch system! Which algorithm does it use?

Preemptive Scheduling Algorithms

Time sharing
– Each task has a predefined time quantum
– Round-Robin: schedule next task on the ready list
– Quantum choice:
    small: may cause frequent switches
    big: may cause slow response
– Implicit assumption: all tasks have the same importance

[Figure: tasks P1–P4 linked in a ring by next pointers]
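Round-robin time sharing can be sketched as a queue that is rotated after every quantum. The function name `round_robin` and the two-task workload in the usage note are hypothetical; the sketch assumes that all tasks arrive at time 0 and ignores the cost of a context switch.

```python
from collections import deque

def round_robin(bursts, quantum):
    """Return {task: completion time} for tasks that all arrive at t = 0.

    bursts maps a task name to its total CPU demand; the ready list is
    served in insertion order, one quantum at a time.
    """
    ready = deque(bursts.items())
    clock = 0
    done = {}
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum, remaining)      # a task may finish before its quantum ends
        clock += run
        if remaining > run:
            ready.append((name, remaining - run))   # back to the end of the ready list
        else:
            done[name] = clock
    return done
```

For example, `round_robin({"A": 3, "B": 1}, 2)` runs A for two units, B for one unit and then A for its last unit, so B completes at time 3 and A at time 4.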

Preemptive Scheduling Algorithms

Priority scheduling
– process with highest priority is scheduled first

Variants
– multilevel queue scheduling
    one list per priority, use round-robin on list
– dynamic priorities
    proportional to time in system
    inversely proportional to part of quantum used
– make time quantum proportional to priority

Real-Time Scheduling Algorithms

Task needs to meet the deadline!
Task cost is known (or should be)

Two kinds of tasks:
– aperiodic
– periodic

Reservation
– scheduler decides if system has enough resources for the task

Algorithms:
– Rate Monotonic Scheduling
    assign static priorities (priority proportional to frequency)
– Earliest Deadline First
    task with closest deadline is chosen

Scheduling Algorithm Example
Situation:
– Tasks P1, P2, P3, P4
    Arrive at time t = 0
    Priority: P1 highest, P4 lowest
    Time to process: 10, 2, 5, 3

Scheduling Algorithm Example
Highest Priority First

[Gantt chart: P1 runs 0–10, P2 10–12, P3 12–17, P4 17–20]

Scheduling Algorithm Example
Shortest Job First

[Gantt chart: P2 runs 0–2, P4 2–5, P3 5–10, P1 10–20]

Scheduling Algorithm Example
Timesharing with quantum = 2

[Gantt chart: P1–P4 alternate in quanta of 2; time axis marks 0, 2, 4, 6, 8, 10, 12, 13, 14, 16, 18, 20]

Scheduling Algorithm Example
Timesharing with quantum → 0

[Gantt chart: time axis marks 0, 8, 11, 15, 20; all four tasks run at 1/4 speed until 8, the remaining three at 1/3 until 11, the remaining two at 1/2 until 15, and P1 alone until 20]

Scheduling Algorithm Example: Results

Situation:
– Tasks P1, P2, P3, P4
    Arrive at time t = 0
    Priority: P1 highest, P4 lowest
    Time to process: 10, 2, 5, 3

– Results
                                    turnaround   response time
    Highest Priority First:           14.75         9.75
    Shortest Job First:                9.25         4.25
    Timesharing with Quantum = 2:     12.75         3.0
    Timesharing with Quantum → 0:     13.5          0
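The two non-preemptive rows of the results can be re-derived with a few lines of Python. This is a sketch under the stated situation (all tasks arrive at t = 0, so turnaround equals the exit time and response equals the start time); the helper name `averages` is hypothetical, and the reported numbers are taken to be averages over the four tasks.

```python
def averages(order, burst):
    """Run tasks back to back in the given order; all arrive at t = 0.

    Returns (average turnaround time, average response time).
    """
    clock = 0
    turnaround = []
    response = []
    for task in order:
        response.append(clock)       # start time - request time, request at 0
        clock += burst[task]
        turnaround.append(clock)     # exit time - arrival time, arrival at 0
    n = len(order)
    return sum(turnaround) / n, sum(response) / n

burst = {"P1": 10, "P2": 2, "P3": 5, "P4": 3}

# Highest Priority First: P1 has the highest priority, P4 the lowest.
hpf = averages(["P1", "P2", "P3", "P4"], burst)

# Shortest Job First: order the tasks by their processing time.
sjf = averages(sorted(burst, key=burst.get), burst)
```

Here `hpf` evaluates to `(14.75, 9.75)` and `sjf` to `(9.25, 4.25)`.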

Scheduling Examples
UNIX
– preemption
– 32 priority levels (round robin)
– each second the priorities are recomputed (CPU usage, nice level, last run)
BSD
– similar
– every 4th tick priorities are recomputed (usage estimation)
Windows NT
– “real time” priorities: fixed, may run forever
– variable: dynamic priorities, preemption
– idle: last choice (swap manager)

Scheduling Examples: Quantum & Priorities

Win2K:
– quantum = 20ms (professional) 120ms (user), configurable
– depending on type (I/O bound)
BSD:
– quantum = 100ms
– priority = f(load, nice, timelast)
Linux:
– quantum = quantum / 2 + priority
– f(quantum, nice)

Scheduling Problems
Starvation
– A task is never scheduled (although ready)
– remedy: “fairness”
Deadlock
– No task is ready (nor will any ever become ready)
– detection+recovery or avoidance

Deadlock Conditions

Coffman conditions for a deadlock (1971):
– Mutual exclusion
– Hold and wait
– No resource preemption
– Circular wait (cycle)

[Figure: threads T1 and T2 and resources R1 and R2 in a cycle; A holds R and wants S, B holds S and wants R. Legend: T = thread, R = resource]

Deadlock Remedies

Coarser lock granularity:
– use a single lock for all resources (e.g., Linux 2.0-2.4 “Big Kernel Lock”)
Locking order:
– resources are ordered
– resource locking according to the resource order (ticketing)
Two-phase-locking:
– try to acquire all the resources
– if successful, lock them; otherwise free them and try again
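The locking-order remedy can be illustrated with a small Python sketch. The helper names `acquire_in_order`, `release_all` and `demo` are hypothetical, and ordering the locks by `id()` is just one possible global order chosen for the illustration.

```python
import threading

def acquire_in_order(*locks):
    """Acquire all locks in one global order (here: by id()).

    Because every thread uses the same order, no two threads can hold
    the same pair of locks in opposite orders, so no cycle can form.
    """
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered

def release_all(ordered):
    """Release the locks in reverse acquisition order."""
    for lock in reversed(ordered):
        lock.release()

def demo():
    """Two threads request the same two locks in opposite orders."""
    r, s = threading.Lock(), threading.Lock()
    counter = [0]

    def worker(first, second):
        for _ in range(1000):
            held = acquire_in_order(first, second)
            counter[0] += 1            # protected: both locks are held
            release_all(held)

    a = threading.Thread(target=worker, args=(r, s))
    b = threading.Thread(target=worker, args=(s, r))   # opposite request order
    a.start(); b.start()
    a.join(); b.join()
    return counter[0]
```

Without the sorting step, the two workers in `demo` could each take their first lock and then wait forever for the other one.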

Deadlock Detection, Prevention & Recovery

Deadlock detection: the system keeps a graph of locks and tries to detect cycles.
– time consuming
– the graph has to be kept consistent with the actual state

Deadlock prevention (avoidance): remove one of the four Coffman conditions (e.g., cycles)

Recovery:
– kill processes and reclaim the resources
– rollback: requires saving the states of the processes regularly
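Cycle detection on such a graph can be sketched as a depth-first search. The representation below (a dictionary mapping each task to the tasks it waits for) and the function name `find_cycle` are assumptions made for this illustration, not the data structure of any particular system.

```python
def find_cycle(waits_for):
    """Depth-first search for a cycle in a wait-for graph.

    waits_for maps a task to the list of tasks it is waiting for.
    Returns the tasks on one cycle, or None if there is no deadlock.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {task: WHITE for task in waits_for}
    path = []

    def visit(task):
        colour[task] = GREY
        path.append(task)
        for other in waits_for.get(task, []):
            state = colour.get(other, WHITE)
            if state == GREY:             # back edge: other is on the current path
                return path[path.index(other):]
            if state == WHITE:
                found = visit(other)
                if found:
                    return found
        path.pop()
        colour[task] = BLACK
        return None

    for task in list(waits_for):
        if colour[task] == WHITE:
            found = visit(task)
            if found:
                return found
    return None
```

For instance, if A waits for B, B waits for C and C waits for A, `find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]})` returns `["A", "B", "C"]`; removing any one of the three edges makes it return `None`.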

Simple Deadlock Scenario
Example
– Resources R, S, T
– Tasks A, B, C require { R, S }, { S, T }, { T, R } respectively

Case 1: Sequential execution, no deadlock

[Figure: timeline; A: +R +S -R -S, then B: +S +T -S -T, then C: +T +R -T -R]

Simple Deadlock Scenario
Case 2: Interleaving, deadlock

[Figure: interleaved timeline of the requests +R, +S and +T issued by A, B and C, and the resulting resource graph in which A, B, C and R, S, T form a cycle]

Complex Deadlock Scenario
Case with 6 resources and 7 tasks

[Figure: graphical representation with tasks A–G and resources R, S, T, U, V, W]

Is this a case of deadlock?

Deadlock Avoidance Strategy in Bluebottle

Each kernel module has a lock to protect its data.

When multiple locks are needed, acquire them according to the module hierarchy.

[Figure: module hierarchy, one module lock per module: Locks, Blocks, Modules, Configuration, Memory, Interrupts, Threads, Traps, Timers, Processors]

Priority Inversion
A high-priority task can be blocked by a lower-priority one.
Example:

[Figure: timeline of a Low, a Medium and a High priority task, each marked as running, ready or waiting]

Priority Inversion
Big problem for RTOS
Solutions
– priority inheritance
    low-priority task holding resource inherits priority of high-priority task wanting the resource
– priority ceilings
    each resource has a priority corresponding to the highest priority of the users +1
    the priority of the resource is transferred to the locking process
    can be used instead of semaphores

Example: Mars Pathfinder (1996–1998)
VxWorks real-time system: preemptive, priorities
Communication bus: shared resource (mutexes)
Low priority task (short): meteorological data gathering
Medium priority task (long): communication
High priority: bus manager

Detection: watchdog on bus activity → system reset
Fix: activate priority inheritance via an uploaded on-the-fly patch (no memory protection).

Locking on Multiprocessor Machines
Real parallelism!
Cannot “disable interrupts” like on single processor machines (could stop every task, but not efficient)
Software solutions
– Peterson, Dekker, ...
Hardware support
– bus locking
– atomic instructions (Test And Set, Compare And Swap)

Locking on multiprocessor machines

Test And Set
TAS s:
  IF s = 0 THEN
    s := 1
  ELSE
    CC := TRUE
  END

Compare and Swap (Intel)
CAS R1, R2, A:
  R1: expected value
  R2: new value
  A: address

  IF R1 = M[A] THEN
    M[A] := R2; CC := TRUE
  ELSE
    R1 := M[A]; CC := FALSE
  END

These instructions are atomic even on multiprocessors!
They usually do so by locking the data bus

Example: Semaphores on SMP
Counter s: available resources
Binary Semaphores with TAS

Spinning (busy wait)
  Try  TAS s
       JMP Try
       CS

Blocking
       TAS s
       JMP Queuing
       CS

Example: Semaphores on SMP
Counter s: available resources
Generic Semaphores with CAS

P(s): Enter CS
        Load R1 ← s
  TryP  MOVE R1 → R2
        DEC R2
        CAS R1, R2, s
        BNE TryP
        CMP R2, 0
        BN Queuing
        [CS]

V(s): Exit CS
        [CS]
        Load R1 ← s
  TryV  MOVE R1 → R2
        INC R2
        CAS R1, R2, s
        BNE TryV
        CMP R2, 0
        BNP Dequeuing

P(S): { S := S - 1 }
  IF S < 0 THEN jump queuing END

V(S): { S := S + 1 }
  IF S <= 0 THEN jump dequeuing END

Spin-Locks: the Bluebottle/i386 way

PROCEDURE AcquireSpinTimeout(VAR locked: BOOLEAN);
CODE {SYSTEM.i386}
  MOV EBX, locked[EBP]   ; EBX := ADR(locked)
  MOV AL, 1              ; AL := 1
  CLI                    ; switch interrupts off before acquiring lock
test:
  XCHG [EBX], AL         ; set and read the lock atomically. LOCK prefix implicit.
  CMP AL, 1              ; was locked?
  JE test                ; retry
  ..
END AcquireSpinTimeout;

(simplified version)

Active Objects in Active Oberon

Z = OBJECT
  VAR myT: T; i: INTEGER;            (* State *)

  PROCEDURE & NEW (t: T);            (* Initializer *)
  BEGIN myT := t
  END NEW;

  PROCEDURE P (u: U; VAR v: V);      (* Method *)
  BEGIN { EXCLUSIVE }                (* Mutual Exclusion *)
    i := 1
  END P;

BEGIN { ACTIVE }                     (* Object Activity *)
  BEGIN { EXCLUSIVE }
    AWAIT (i > 0);                   (* Condition *)
  END
END Z;

Active Oberon Runtime Structures

[Figure: process states Running, Ready, Awaiting Object and Awaiting Assertion; a Ready Queue serving the CPUs; a Lock Queue and a Wait Queue, each terminated by NIL]

Active Oberon Implementation

NEW      Create object; Create process; Set to ready
END      Run next ready
Preempt  Set to ready; Run next ready

[Figure: state diagram with states Running, Ready, Awaiting Object and Awaiting Assertion and transitions numbered 0–7; transitions 0, 1, 6 and 7 are marked on this slide]

Active Oberon Implementation

Enter Monitor
  IF monitor lock set THEN
    Put me in monitor obj wait list; Run next ready
  ELSE
    set monitor lock
  END

Exit Monitor
  Find first asserted x in wait list;
  IF x found THEN set x to ready
  ELSE
    Find first x in obj wait list;
    IF x found THEN set x to ready
    ELSE clear monitor lock
    END
  END
  Run next ready

[Figure: state diagram with states Running, Ready, Awaiting Object and Awaiting Assertion and transitions numbered 0–7; transitions 1, 2, 4 and 5 are marked on this slide]

Active Oberon Implementation

AWAIT
  Put me in monitor assn wait list; Call Exit monitor

[Figure: state diagram with states Running, Ready, Awaiting Object and Awaiting Assertion and transitions numbered 0–7; transition 3 is marked on this slide]

Case Study: Windows CE 3.0
Real-time constraints
– Reaction time on events
– Execution time
Threads with priorities and time quanta
– Priorities: 0 (high), …, 255 (low)
– Time quanta in ms
    Default 100 ms
    0: no quantum
Single processor

[Figure: scheduling timeline with labels p, q < p and end of quantum]

Case Study: Windows CE 3.0
Interrupt Handling
– ISR (Interrupt Service Routine)
    1st level handling
    Kernel mode, uses kernel stack
    Installed at boot-time
    Creates event on-demand
    Preempted by ISR with higher priority
– IST (Interrupt Service Thread)
    2nd level handling
    User mode
    Awaits events

[Figure: an IRQ reaches the ISR in NK.EXE (kernel mode), which raises an event for the IST (user mode)]

Case Study: Windows CE 3.0
Synchronization on common resources:
– Critical sections: enter, leave operations
– Semaphores and mutexes (binary semaphores)
Synchronization is performed with system/library calls (they are not part of a language).
Priority inversion avoidance
– priority inheritance (thread inherits priority of task wanting the resource)

[Figure: critical sections (CS) drawn as nested bracket pairs]

Case Study: Java
Activities are mapped to threads (no processes)
Synchronization in the language
– locks
– signals
Threads provided by the library
Scheduling depends on the JVM

Case Study: Java

public class MyThread extends Thread {

  public void run() {
    System.out.println("Running");
  }

  public static void main(String[] arguments) {
    MyThread t = new MyThread();
    t.start();
  }

}

Case Study: Java

public class MyThread implements Runnable {

  public void run() {
    System.out.println("Running");
  }

  public static void main(String[] arguments) {
    Thread t = new Thread(new MyThread());
    t.start();
  }

}

Case Study: Java
Protection with monitor-like objects
– with method granularity
    public synchronized void someMethod()
– with statement granularity
    synchronized(anObject) { ... }
Synchronization with signals
– wait() (with optional time-out)
– notify() / notifyAll() (“send and continue” pattern)

Case Study: Java

private Object o;

public synchronized void consume() {
  while (o == null) {
    try { wait(); } catch (InterruptedException e) {}
  }
  use(o);
  o = null;
  notifyAll();
}

public synchronized void produce(Object p) {
  while (o != null) {
    try { wait(); } catch (InterruptedException e) {}
  }
  o = p;
  notifyAll();
}

Case Study: POSIX Threads
Standard interface for threads in C
Mostly UNIX, possible on Windows
Provided by a library (libpthread) and not part of the language.
IEEE POSIX 1003.1c standard (1995)
Various implementations (both user and kernel level)

Case Study: POSIX Threads

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *run(void *arg)
{
  pthread_mutex_lock(&m);
  // critical section
  pthread_mutex_unlock(&m);
  pthread_exit(NULL);
}

int main(int argc, char *argv[])
{
  pthread_t t;
  pthread_create(&t, NULL, run, NULL);
  pthread_exit(NULL);
}

File Systems

File Systems - Overview
Hardware
File abstraction
File organization
File systems
– Oberon
– Unix
– FAT
Distributed file systems
– NFS
– AFS
Special topics
– Error recovery
– ISAM
– B* Trees

Hardware: the ATA Bus
ATA / IDE (1986)
– Advanced Technology Attachment
– Integrated Drive Electronics
ATA-2 / EIDE
ATA-4 / ATAPI
– ATA Packet Interface (SCSI command set)
ATA-5
– UDMA 66
ATA-6
– UDMA 100
– SATA
ATA-7
– UDMA 133

bus with 2 devices
– master / slave
low-level interface
– head / cylinder / sector
– support for LBA (logical block addressing)
PIO mode
– read byte by byte through hardware port
DMA mode
– use DMA transfer

Hardware: the SCSI Bus
SCSI: Small Computer Systems Interface
SCSI-2
– Fast SCSI
– Wide SCSI
SCSI-3

Bus with 8 devices
– wide: 16 / 32 devices
– bus arbitration
– disconnected mode
Device kinds
– direct access
– CD-ROM
– ...
Block-oriented access
– read-block, write-block
Transfer mode selection
– asynchronous (hand-shake)
– synchronous (period / offset)

Hardware: Hard Disk
Organization
– cylinder (c)
– head (h)
– sector (s)
Addressing
– sector (c, h, s)
– block (LBA)

[Figure: disk with rotation axis, surface (head), track (cylinder) and sector]

Hardware: Example

Current disk example:
– ATA-100
– 250GB
– 512 bytes per sector (488·10^6 sectors)
– 8MB cache
– 8.9ms average seek time
– 7200 rpm
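The figures of this example can be checked with a little arithmetic. The sketch below assumes decimal gigabytes (10^9 bytes) and estimates the average rotational latency as half a revolution; the variable names are made up for this illustration.

```python
CAPACITY_BYTES = 250 * 10**9      # 250GB, counted in decimal gigabytes (assumption)
SECTOR_BYTES = 512
RPM = 7200
AVG_SEEK_MS = 8.9

sectors = CAPACITY_BYTES // SECTOR_BYTES        # about 488 million sectors

rotation_ms = 60_000 / RPM                      # time for one full revolution
avg_rotational_latency_ms = rotation_ms / 2     # on average half a revolution

# Average time until the requested sector is under the head (transfer time not included).
avg_access_ms = AVG_SEEK_MS + avg_rotational_latency_ms
```

One revolution at 7200 rpm takes about 8.3 ms, so the rotational latency adds roughly 4.2 ms to the 8.9 ms seek time.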

Hardware: Hard Disk Improvements
Interleaving
– optimize sequential sector access
Read-ahead
Caching
Sector defect management

[Figure: cylinder with sectors numbered 1–7]

Hardware: Disk Scheduling
Disk controllers have a queue of pending requests:
– type: read or write
– block number: translated into the (h,c,s)-tuple
– memory address (where to copy from and to)
– amount to be transferred (byte or block count)

Hardware: Disk Scheduling

First-come, first-served (FCFS)
Shortest-seek-time-first (SSTF)
SCAN (elevator) & C-SCAN
LOOK & C-LOOK

Performance: minimize head movements, maximize throughput
Scheduling is now in the hardware

Hardware: Disk Scheduling
Example (head position, track number):

queue = 31, 72, 4, 18, 147, 193, 199, 153, 114, 72
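The effect of the scheduling policy on this queue can be estimated by adding up the head movement. The transcript does not preserve the initial head position, so `START = 100` below is purely an assumption, as are the function names; `look` sweeps towards higher track numbers first.

```python
def fcfs(start, queue):
    """Total head movement when requests are served in arrival order."""
    total, pos = 0, start
    for track in queue:
        total += abs(track - pos)
        pos = track
    return total

def sstf(start, queue):
    """Total head movement when the closest pending track is always served next."""
    pending = list(queue)
    total, pos = 0, start
    while pending:
        nxt = min(pending, key=lambda t: abs(t - pos))
        total += abs(nxt - pos)
        pos = nxt
        pending.remove(nxt)
    return total

def look(start, queue):
    """Total head movement for LOOK: sweep upwards, then reverse and sweep downwards."""
    up = sorted(t for t in queue if t >= start)
    down = sorted((t for t in queue if t < start), reverse=True)
    return fcfs(start, up + down)

queue = [31, 72, 4, 18, 147, 193, 199, 153, 114, 72]
START = 100   # assumed initial head position, not given in the transcript
```

Under this assumption FCFS moves the head over 500 tracks, while SSTF and LOOK both need 294.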


Hardware: Disk Scheduling

Abstractions

Disk: array of sectors

Block: array of sectors
– some systems call them “clusters”
– user configured
– reduces address space
– increases access speed
– causes internal fragmentation

File: stream of bytes
– sequential access
– random access
– stored on disk
    mapping byte to block
    block allocation management

Abstraction Layers

Abstraction    Operations                                            Implementations
Disk           ReadSector, WriteSector                               ATA driver, SCSI driver
Volume         ReadBlock, WriteBlock, AllocateBlock, FreeBlock
File System    OpenFile, WriteFile, ReadFile, SeekFile, CloseFile    FAT, Oberon, ISO 9660, ext3, NTFS

File Organization
How can we map groups of blocks into files?
How do we manage free space?
How can I jump to a certain location?

Operation: read n bytes at position p.
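Whatever the allocation scheme, this operation first has to be split into per-block pieces. A minimal sketch, assuming a block size of 1024 bytes and a hypothetical helper name `blocks_for_read`; mapping the logical block numbers to disk blocks is then the job of the file organization.

```python
BLOCK_SIZE = 1024   # assumed block size in bytes

def blocks_for_read(p, n, block_size=BLOCK_SIZE):
    """Split "read n bytes at position p" into per-block pieces.

    Returns a list of (logical block number, offset in block, byte count).
    """
    pieces = []
    while n > 0:
        block, offset = divmod(p, block_size)
        count = min(n, block_size - offset)   # do not read past the end of the block
        pieces.append((block, offset, count))
        p += count
        n -= count
    return pieces
```

Reading 100 bytes at position 1000, for example, touches the last 24 bytes of logical block 0 and the first 76 bytes of logical block 1.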

File Organization: Contiguous Allocation

File is a group of contiguous blocks
Simple management
Fast transfers
IBM MVS (mainframe)

[Figure: file described by start and length]

File Organization: Contiguous Allocation

external fragmentation
allocation
– how much space does a file need?
– first fit, best fit, …?
file growth (error? move? extensions?)
preallocation: internal fragmentation

[Figure: file described by start and length]

File Organization: Linked Allocation
File is a linked list of blocks
– no external fragmentation
– no growth problems
Problems
– sequential files only (positioning requires traversal)
– space for pointers (1TB, 5B addr., 1% with 512B blocks)
– reliability (lost pointers)

[Figure: start pointer to the first block of the chain]

File Organization: Linked Allocation
Clusters: series of contiguous blocks
– faster (fewer jumps)
– less space wasted for pointers
– internal fragmentation

[Figure: start pointer to the first cluster of the chain]

File Organization: Linked Allocation
Pointer tables
– the list of pointers is stored in a separate table
– can be cached
– usually is stored twice (reliability)
– FAT (MS-DOS, OS/2, Windows, solid-state memory)

[Figure: start pointer into the pointer table]

File Organization: Indexed Allocation
Index with block addresses
Fast access for random-access files
No external fragmentation
Problems
– high management overhead
– limited file size (depending on the index structure)
– pointer overhead

[Figure: file with an index block pointing to its data blocks]

File Organization: Indexed Allocation
Variation:
– linked list of indexes
Advantage:
– no file size limitation
Disadvantage:
– Index lookup requires sequential traversal of index list

[Figure: file with a linked list of index blocks]

File Organization: Indexed Allocation
multi-level indexes (index of indexes)
UNIX
Advantage:
– fast index lookup
Disadvantage:
– limited file size

[Figure: file with a multi-level index]

File Organization: Indexed Allocation

Example:
– blocks 2KB
– address 4B (512 entries per index block)

First level index block: 512 entries · 2KB = 1MB
Second level index block: 512 · 512 entries · 2KB = 0.5GB
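The capacities in this example follow from the number of addresses that fit into one index block. A short sketch with the hypothetical helper `index_capacity`, using the block and address sizes of the example:

```python
def index_capacity(block_size, addr_size, levels):
    """Bytes addressable through one index block at each indirection level.

    Level 1 is a block of data-block addresses, level 2 a block of
    addresses of level-1 index blocks, and so on.
    """
    entries = block_size // addr_size          # addresses per index block
    return [entries ** level * block_size for level in range(1, levels + 1)]

KB, MB, GB = 2**10, 2**20, 2**30
first, second = index_capacity(2 * KB, 4, 2)   # 2KB blocks, 4B addresses
```

With 2KB blocks and 4B addresses an index block holds 512 entries, so `first` is 1MB and `second` is 0.5GB.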

Free Space Management
Bitmap (e.g., HFS)
– bit vector to mark free blocks
– simple
– needs caching
Linked lists
– list of free blocks (similar to linked allocation)
Grouping
– free blocks contain n addresses of free blocks (similar to multilevel indexing)
Counting
– list of 2-tuples of series of free blocks (start, length)

Case Study: Oberon File System
Disk module: controller driver
– block management
FileDir module:
– maps files to locations
– implemented with B-trees
– garbage collection (files)
    the directory is the root set
    anonymous (nonregistered) files are collected
Files module:
– allows user operations (read, create, write, …)
– access is performed through riders

[Figure: module hierarchy Files, FileDir, Disk]

Case Study: Oberon File System

Characteristics
Block size = 1KB
File organization (designed to optimize small files)
– multilevel index:
    64 direct
    12 1st level indirect
– 672 data bytes in file header
Block allocation
– allocation table created at boot-time (partition GC)
– no collection at run-time (partition fills up!)

Case Study: Oberon File System
Block = 1KB

[Figure: file header with 672 data bytes; table entries 0–63 point to 64 data blocks (d); entries 64–75 point to 12 index blocks (i1) with 256 data blocks each]

Case Study: Oberon File System

Free block management: bitmap
Garbage collection at startup

bit               0        8        16       24
                  11111111 11111111 11111111 11111111
startup / GC      11010010 01111011 11011101 00011100
allocate 16, 17   11010010 01111011 00011101 00011100
allocate 19       11010010 01111011 00001101 00011100
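The bitmap steps of the figure can be replayed with a few lines of Python. The sketch reads the figure as "1 = free, 0 = used" (an interpretation, since allocating a block clears its bit) and keeps the bitmap as a string for readability; the function names are hypothetical.

```python
def allocate(bitmap, block):
    """Mark one block as used. Following the figure, '1' means free and '0' used."""
    if bitmap[block] != "1":
        raise ValueError(f"block {block} is not free")
    return bitmap[:block] + "0" + bitmap[block + 1:]

def first_free(bitmap):
    """Index of the first free block, or -1 if no block is free."""
    return bitmap.find("1")

after_gc = "11010010011110111101110100011100"   # state after startup / GC
step1 = allocate(allocate(after_gc, 16), 17)     # allocate 16, 17
step2 = allocate(step1, 19)                      # allocate 19
```

`step1` and `step2` reproduce the last two bit rows of the figure. A real implementation would pack the bits into machine words rather than use a string.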

Case Study: Oberon File System

Internals
“Rider”: current read or write position
Buffer (cache) for consistency (each file sees the write operations on it)

[Figure: file handle f with its buffers; riders R attached to the file, each with a “hint” to a buffer]

Case Study: Oberon RAM Disk

File = POINTER TO Header;
Index = POINTER TO Sector;

Rider = RECORD
  eof: BOOLEAN;
  file: File;
  pos: LONGINT;
  adr: LONGINT;
END;

Header = RECORD
  mark: LONGINT;
  name: FileDir.Name;
  len, time, date: LONGINT;
  ext: ARRAY 12 OF Index;
  sec: ARRAY 64 OF SectorTable;
END;

[Figure: the header holds the ext table and the primary sector table, which points to sectors 0 - 63; index sector 0 points to sectors 64 - 319, index sector 1 to sectors 320 - 575]

Case Study: Oberon RAM Disk

PROCEDURE Read(VAR r: Rider; VAR x: SYSTEM.BYTE);
  VAR m: INTEGER;
BEGIN
  IF r.pos < r.file.len THEN
    SYSTEM.GET(r.adr, x); INC(r.adr); INC(r.pos);
    IF r.adr MOD SS = 0 THEN (* end of sector *)
      m := SHORT(r.pos DIV SS);
      IF m < STS THEN
        r.adr := r.file.sec[m]
      ELSE
        r.adr := r.file.ext[(m-STS) DIV XS].x[(m-STS) MOD XS]
      END
    END
  ELSE
    x := 0X; r.eof := TRUE
  END
END Read;

SS = Sector Size
STS = Sector Table Size
XS = Index Size
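The address computation inside Read can be isolated into a small function. The sketch assumes SS = 1024, STS = 64 and XS = 256, values inferred from the block size and the figures of this case study rather than stated as constants in the code; the name `locate` is hypothetical.

```python
SS, STS, XS = 1024, 64, 256   # sector size, sector table size, index size (assumed values)
EXT_ENTRIES = 12              # length of the ext table in the file header

def locate(pos):
    """Map a byte position to the table slot that holds its sector address.

    Returns ("sec", m) for slot m of the header's sector table, or
    ("ext", n, k) for entry k of index sector n.
    """
    m = pos // SS
    if m < STS:
        return ("sec", m)
    n, k = divmod(m - STS, XS)
    return ("ext", n, k)

# Largest position reachable: 64 direct sectors plus 12 index sectors of 256 entries.
MAX_FILE_SIZE = (STS + EXT_ENTRIES * XS) * SS
```

For example, `locate(64 * 1024)` yields `("ext", 0, 0)`, the first entry of index sector 0, which matches the figure where index sector 0 starts at sector 64.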

Case Study: Oberon RAM Disk

PROCEDURE Write(VAR r: Rider; x: SYSTEM.BYTE);
  VAR k, m, n: INTEGER; ix: LONGINT;
BEGIN
  IF r.pos < r.file.len THEN (* overwrite *)
    m := SHORT(r.pos DIV SS); INC(r.pos);
    IF m < STS THEN
      r.adr := r.file.sec[m]
    ELSE
      r.adr := r.file.ext[(m-STS) DIV XS].x[(m-STS) MOD XS]
    END
  ELSE
    ....
  END;
  SYSTEM.PUT(r.adr, x); INC(r.adr);
END Write;

Case Study: Oberon RAM Disk

IF r.pos < r.file.len THEN
  ....
ELSE (* expand *)
  IF r.adr MOD SS = 0 THEN
    m := SHORT(r.pos DIV SS);
    IF m < STS THEN
      Kernel.AllocSector(0, r.adr); r.file.sec[m] := r.adr
    ELSE
      n := (m-STS) DIV XS; k := (m-STS) MOD XS;
      IF k = 0 THEN
        Kernel.AllocSector(0, ix); r.file.ext[n] := SYSTEM.VAL(Index, ix)
      END;
      Kernel.AllocSector(0, r.adr); r.file.ext[n].x[k] := r.adr
    END
  END;
  INC(r.pos); r.file.len := r.pos
END;
SYSTEM.PUT(r.adr, x); INC(r.adr);

Case Study: UNIX, inodes

File system: files and directories (files with a special content)
A file is represented by an inode

Inode:
– file owner
– file type: regular / directory / special
– access permissions
– access time
– reference count (links)
– table of contents
– file size

Inode table of contents
– 10 (12) direct blocks
– 1 indirect block
– 1 double indirect block
– 1 triple indirect block

Case Study: UNIX, inodes

[Figure: inode with info (type, access, refc) and a table of contents; entries 0–9 point directly to data blocks (d), entry 10 to a single indirect block (i1), entry 11 to a double indirect block (i2), entry 12 to a triple indirect block (i3)]

Case Study: UNIX, directories
Directories are normal files with a special content.
The data part contains a list with
– inode
– name
Every directory has two special entries
– . the directory itself
– .. the parent directory

Case Study: UNIX, inodes

inode 2: type: dir, blocks: 132, owner: root, ref count: 1
  disk block 132 (directory /), entries as inode # and name:
    2 .
    2 ..
    4 bin
    3 root

inode 3: type: dir, blocks: 406, owner: root, ref count: 1
  disk block 406 (directory /root/):
    3 .
    2 ..
    5 .tcshrc
    6 mbox

inode 6: type: file, blocks: 42, 103, owner: root, ref count: 1
  disk block 42: data
  disk block 103: data

Case Study: UNIX, soft and hard links
Hard links:
– two directory entries with the same inode number
– each file has a reference counter
    42 file
    42 hardlink
Soft links
– the directory entry points to a special file with the path of the linked file
    42 file
    43 softlink
  (inode 43 points to a special file with the path of file)

Case Study: UNIX, hard links

inode 2: type: dir, blocks: 132, owner: root, ref count: 1
  disk block 132 (directory /):
    2 .
    2 ..
    4 bin
    3 root

inode 3: type: dir, blocks: 406, owner: root, ref count: 1
  disk block 406 (directory /root/):
    3 .
    2 ..
    5 mails
    5 mbox

inode 5: type: file, blocks: 42, 103, owner: root, ref count: 2
  disk block 42: data
  disk block 103: data

Case Study: UNIX, soft links

inode 2: type: dir, blocks: 132, owner: root, ref count: 1
  block 132 (directory /):
    2 .
    2 ..
    4 bin
    3 root

inode 3: type: dir, blocks: 406, owner: root, ref count: 1
  block 406 (directory /root/):
    3 .
    2 ..
    5 mbox
    6 mails

inode 5: type: file, blocks: 42, owner: root, ref count: 1
  block 42: data

inode 6: type: file, blocks: 43, owner: root, ref count: 1
  block 43: /root/mbox

Case Study: UNIX, Volume Layout

A volume (partition) contains
boot block
– bootstrap code
super block
– size
– max file
– free space
– …
inodes
data blocks

[Layout: boot block | super block | inode list | data blocks]

Case Study: UNIX, Functions

Core functions
bread    read block
bwrite   write block
iget     get inode from disk
iput     put inode to disk
bmap     map (inode, offset) to disk block
namei    convert path name to inode

Case Study: UNIX, namei

namei (path)
  if (absolute path)
    inode = root;
  else
    inode = current directory inode;

  while (more path to process) {
    read directory (inode);
    if match(directory, name component) {
      inode = directory[name component];
      iget(inode);
    } else {
      return no inode;
    }
  }
  return inode;
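The lookup loop can be tried out on a small in-memory model. In the sketch below the directory contents for inodes 2 and 3 follow the earlier inode example, the contents of inode 4 (`bin`) are invented, and a dictionary stands in for reading directory blocks from disk.

```python
ROOT = 2   # inode number of the root directory in the example

# Directory contents keyed by inode number: {name: inode}. Regular files have no entry.
directories = {
    2: {".": 2, "..": 2, "bin": 4, "root": 3},
    3: {".": 3, "..": 2, ".tcshrc": 5, "mbox": 6},
    4: {".": 4, "..": 2},          # contents of /bin are an assumption
}

def namei(path, cwd=ROOT):
    """Resolve a path name to an inode number, or None if a component is missing."""
    inode = ROOT if path.startswith("/") else cwd
    for name in path.split("/"):
        if not name:                       # empty component from a leading or double "/"
            continue
        entries = directories.get(inode)   # "read directory (inode)"
        if entries is None or name not in entries:
            return None                    # "return no inode"
        inode = entries[name]
    return inode
```

`namei("/root/mbox")` walks from inode 2 over inode 3 to inode 6, and `namei("mbox", cwd=3)` reaches the same inode through a relative path.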

Case Study: FAT
FATnn: nn corresponds to the size of a FAT entry in bits
FAT12, FAT16, FAT32
used by MS-DOS and Windows for disks and floppies
Volume Layout

[Layout: boot block | FAT1 | FAT2 | root directory | data]

Case Study: FAT, Example

FAT (one entry per block, up to the disk size):
  block   entry
   0
   1
   2      EOF
   3      EOF
   4      12
   5      FREE
   6      9
   7      BAD
   8      3
   9      11
  10      EOF
  11      10
  12      EOF
  13      FREE

File 1: 6 → 9 → 11 → 10
File 2: 4 → 12
File 3: 8 → 3
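Following a file through the table takes only a short loop. The sketch below stores the example FAT in a dictionary and uses strings for the special markers; a real FAT uses reserved numeric values instead, and the function name `chain` is hypothetical.

```python
EOF, FREE, BAD = "EOF", "FREE", "BAD"   # placeholder markers for this sketch

fat = {2: EOF, 3: EOF, 4: 12, 5: FREE, 6: 9, 7: BAD, 8: 3,
       9: 11, 10: EOF, 11: 10, 12: EOF, 13: FREE}

def chain(fat, first_block):
    """Blocks of a file, found by following FAT entries from its first block."""
    blocks = []
    block = first_block
    while True:
        blocks.append(block)
        entry = fat[block]
        if entry == EOF:
            return blocks
        if entry in (FREE, BAD) or entry in blocks:   # broken link or loop
            raise ValueError(f"corrupt chain at block {block}")
        block = entry
```

The first block of each file comes from its directory entry; `chain(fat, 6)` then returns `[6, 9, 11, 10]`.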

Case Study: FAT, Directory
Information about files is kept in the directory

Directory entry (field, size in bytes):
  File name                   8
  Extension                   3
  Attributes (A D V S H R)    1
  Reserved                   10
  Time                        2
  Date                        2
  First block                 2
  File size                   4

Case Study: FAT, Max. Partition Size

Block size   FAT-12   FAT-16    FAT-32
0.5 KB       2 MB
1 KB         4 MB
2 KB         8 MB     128 MB
4 KB         16 MB    256 MB    1 TB
8 KB                  512 MB    2 TB
16 KB                 1024 MB   2 TB
32 KB                 2048 MB   2 TB
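The table entries can be reproduced as "number of FAT entries times block size". Two assumptions go into the sketch below: FAT-32 uses only 28 of its 32 bits for block numbers, and the 2 TB ceiling comes from a 32-bit count of 512-byte sectors. The function name `max_partition` is hypothetical.

```python
SECTOR = 512
MAX_SECTORS = 2**32          # assumed limit: partition size kept as a 32-bit sector count

def max_partition(entry_bits, block_size):
    """Upper bound in bytes: one block per FAT entry, capped by the sector-count limit."""
    return min(2**entry_bits * block_size, MAX_SECTORS * SECTOR)

KB, MB, TB = 2**10, 2**20, 2**40

fat12 = max_partition(12, KB // 2)    # 0.5 KB blocks
fat16 = max_partition(16, 2 * KB)
fat32 = max_partition(28, 4 * KB)     # 28 usable bits per FAT-32 entry
```

These three values come out as 2 MB, 128 MB and 1 TB, matching the table; for FAT-32 with 8 KB blocks or more the sector-count cap of 2 TB applies.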

File System Mounting
More than one volume mounted in the same directory tree.

[Figure: directory tree rooted at / with the entries usr, mnt (floppy, dos, cd), home (corti), bin and afs (ethz.ch)]

Virtual File System
Support for several file systems
– disk based
– network
– special
VFS: unifies the system calls
Mirrors the traditional UNIX file system model

[Figure: without VFS, applications talk to ext3, FAT, NFS, AFS, proc and pts directly; with VFS, applications talk to the VFS layer, which sits on top of ext3, FAT, NFS, AFS, proc and pts]


File System Mounting
Each file system type has a method table
System calls are indirect function calls through the method table
Common interface (open, write, readdir, lock, …)
Each file is associated with a method table
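In an object-oriented language the method table corresponds to an interface with one implementation per file system type. The sketch below shows the indirection only; the `FileSystem` interface, the mount table and the two lambda "file systems" are assumptions made for illustration, not a kernel API.

```java
import java.util.Map;
import java.util.TreeMap;

public class VfsSketch {
    /** The "method table": every file system type provides its own implementation. */
    interface FileSystem {
        String read(String relativePath);
    }

    // Mount table: mount point -> file system responsible for everything below it.
    static final Map<String, FileSystem> mounts = new TreeMap<>();

    static void mount(String mountPoint, FileSystem fs) { mounts.put(mountPoint, fs); }

    /** Generic "system call": picks the longest matching mount point, then dispatches. */
    static String read(String path) {
        String best = null;
        for (String mp : mounts.keySet()) {
            String prefix = mp.endsWith("/") ? mp : mp + "/";
            boolean matches = path.equals(mp) || path.startsWith(prefix);
            if (matches && (best == null || mp.length() > best.length())) best = mp;
        }
        if (best == null) throw new IllegalArgumentException("nothing mounted for " + path);
        String rel = path.substring(best.length());
        if (rel.startsWith("/")) rel = rel.substring(1);
        return mounts.get(best).read(rel);   // indirect call through the method table
    }

    /** Two made-up file systems that only report which implementation was called. */
    static void mountDemo() {
        mounts.clear();
        mount("/", p -> "ext3:" + p);
        mount("/mnt/floppy", p -> "fat:" + p);
    }

    public static void main(String[] args) {
        mountDemo();
        System.out.println(read("/etc/passwd"));       // ext3:etc/passwd
        System.out.println(read("/mnt/floppy/a.txt")); // fat:a.txt
    }
}
```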


File System Mounting: Special Files
Devices
– disks
– memory
– USB devices
– serial ports
– …
Kernel communication (e.g., proc)
Uniform interface (open, close, read, write)
Uniform protection (user, groups)


File Systems: Protection
Restrict: access (who), operations (what), management
– FAT: flags in the directory
    e.g., read only
    execution based on name
– UNIX: restrictions in inodes
    based on users and groups
    operations: read, write, execute
    directories: manage files
    not so flexible
– VMS: access lists
    list of users and rights per file

Distributed File Systems


Distributed File Systems (DFS)
Clients, servers and storage are dispersed among machines in a distributed system.

[Figure: several clients and several servers]


Overview
Naming (dynamic):
– location transparency: file name does not reveal the file location
– location independence: file name does not change when storage is moved
Caching (efficiency)
– write-through
– delayed-write
– write-on-close
Consistency
– client-initiated: poll server for changes
– server-initiated: notify clients


Naming
Simple approaches:
– file is identified by a (host, path) pair
– Ibis (host:path)
– SMB (\\host\path)
Transparent
– remote directories are mounted in the local file system
– not uniform (the mount point is not defined)
– NFS (/mnt/home, /home/)
– SMB (\\host\path mounted on Z:)
Global name structure
– uniform and transparent naming
– AFS (/afs/cell/path)


Caching
Reduces network and disk load
Consistency problems
Granularity:
– How much? Big/small chunks of data? Entire files?
– Big: +hit ratio, +hit penalty, +consistency problems
Location:
– memory: +diskless stations, +speed
– disk: +cheaper, +persistent
– hybrid
Space consumption on the clients


Caching
Policies:
write-through: +reliability, -performance (cache is effective only for read operations)
delayed-write: +write speed, +unnecessary writes eliminated, -reliability
– write when the cache is full (+performance, -long time in the cache)
– regular intervals
write-on-close
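The trade-off between write-through and delayed-write can be made concrete with a toy client cache. This is a sketch under simple assumptions: `Server` is an in-memory stand-in for the remote file server, blocks are named by strings, and `flush()` represents whatever trigger the policy uses (cache full, timer, or close).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class WritePolicySketch {
    /** Stand-in for the remote server; counts how many writes actually reach it. */
    static final class Server {
        final Map<String, String> blocks = new HashMap<>();
        int writes = 0;
        void write(String block, String data) { blocks.put(block, data); writes++; }
    }

    static final class ClientCache {
        final Server server;
        final boolean writeThrough;
        final Map<String, String> cache = new HashMap<>();
        final Map<String, String> dirty = new LinkedHashMap<>(); // not yet on the server

        ClientCache(Server server, boolean writeThrough) {
            this.server = server;
            this.writeThrough = writeThrough;
        }

        void write(String block, String data) {
            cache.put(block, data);
            if (writeThrough) server.write(block, data); // reliable, but one message per write
            else dirty.put(block, data);                 // later writes replace earlier ones
        }

        /** Delayed-write flush: on cache full, at regular intervals, or on close. */
        void flush() {
            dirty.forEach(server::write);
            dirty.clear();
        }
    }

    public static void main(String[] args) {
        Server s1 = new Server();
        ClientCache through = new ClientCache(s1, true);
        Server s2 = new Server();
        ClientCache delayed = new ClientCache(s2, false);
        for (String v : new String[] {"v1", "v2", "v3"}) {
            through.write("b", v);
            delayed.write("b", v);
        }
        delayed.flush();
        System.out.println(s1.writes + " vs " + s2.writes); // 3 vs 1
    }
}
```

Until `flush()` runs, the delayed-write server has seen nothing, which is exactly the reliability cost listed above.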


Consistency
Is my cached copy up-to-date?
Client-initiated approach:
– the client performs validity checks
– when? open/fixed intervals/every access
Server-initiated approach:
– the server keeps track of cached files (parts)
– notifies the clients when conflicts are detected
– should the server allow conflicts?


Stateless and Stateful Servers
Stateful: the server keeps track of each accessed file
session IDs (e.g., identifying an inode on the server)
fast
– simple requests
– caches
– fewer disk accesses
– read ahead
volatile
– server crash: rebuild structures (recovery protocol)
– client crash: orphan detection and elimination


Stateless and Stateful Servers
Stateless: each request is self-contained
request: file and position
complex requests
need for uniform low-level naming scheme (to avoid name translations)
need idempotent operations (same results if repeated)
– absolute byte counts
No locking possible


File Replication
A file can be present on failure-independent machines
Naming scheme manages the mapping
– same high-level name
– different low-level names
Transparency
Consistency


Distributed File-Systems (mainstream)
NFS: Network File System (Sun)
AFS: Andrew File System (CMU)
SMB: Server Message Block (Microsoft)
NCFS: Network Computer FS (Oberon)


Network File System (NFS)
UNIX-based (Sun)
mount file system from another machine into local directory
stateless (no open/close)
uses UDP to communicate
based on RPC and XDR (External Data Representation)
– every operation is a remote procedure call
known problems:
– no caching
– no disconnected mode
– efficiency
security: IP based


NFS: Example
Server, file /etc/exports:
    /home/ client(rw)
Client:
    mount -t nfs server:/home /home

[Figure: the server's /home directory (corti, reali) becomes visible under /home on the client after the mount]


NFS
No special servers (each machine can act as a server and as a client)
Cascading mounts are allowed
– mount -t nfs server1:/home /home
– mount -t nfs server2:/projects/corti /home/corti/projects
Limited scalability (limited number of exports)


NFS: Stateless Protocol
Each request contains a unique file identifier and an absolute offset
No concurrency control (locking has to be performed by the applications)
Committed information is assumed to be on disk (the server cannot cache writes)


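Network File System (NFS)

[Figure: layered architecture. Client: system call layer, virtual file system layer, local filesystem and NFS client, RPC / XDR. Server: RPC / XDR, NFS server, virtual file system layer, local filesystem. Client and server communicate over the network (UDP).]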


Remote Procedure Invocation: Overview
Problem
– send structured information from A to B
– A and B may have different memory layouts
– byte order problems
– How is 0x1234 (2 bytes) represented in memory?

Address         0    1
Big-endian      12   34    MSB before LSB (IBM, Motorola, SPARC); network byte ordering
Little-endian   34   12    LSB before MSB, little end first (VAX, Intel)
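The two layouts of 0x1234 can be reproduced with `java.nio.ByteBuffer`, which lets the caller choose the byte order explicitly. The class and helper names below are made up for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    /** Returns the two bytes of value as they would be laid out in memory. */
    static byte[] encode(short value, ByteOrder order) {
        return ByteBuffer.allocate(2).order(order).putShort(value).array();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x ", b));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        short v = 0x1234;
        System.out.println("big-endian:    " + hex(encode(v, ByteOrder.BIG_ENDIAN)));    // 12 34
        System.out.println("little-endian: " + hex(encode(v, ByteOrder.LITTLE_ENDIAN))); // 34 12
    }
}
```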


Marshalling / Serialization
Marshalling: packing one or more data items into a buffer using a standard representation
Presentation layer (OSI)
RPC + XDR (Sun)
– RFC 1014, June 1987
– RFC 1057, June 1988
IIOP / CORBA (OMG)
– V2.0, February 1997
– V3.0, August 2002
SOAP / XML (W3C)
– V1.1, May 2000

XDR Type System
– [unsigned] integer (32-bit)
– [unsigned] hyper-integer (64-bit)
– enumeration (unsigned int)
– boolean (enum)
– float / double (IEEE 32/64-bit)
– opaque
– string
– array (fixed + variable size)
– structure
– union
– void
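As an illustration of a standard representation, here is a sketch of how two XDR types are laid out: a 32-bit integer is four big-endian bytes, and a string is a 4-byte length followed by the bytes, zero-padded to a multiple of four. Only these two types are covered, and the class and method names are assumptions of this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class XdrSketch {
    /** XDR integer: 4 bytes, big-endian (ByteBuffer's default byte order). */
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** XDR string: length as a 4-byte integer, then the bytes, zero-padded to 4-byte units. */
    static byte[] encodeString(String s) {
        byte[] data = s.getBytes(StandardCharsets.US_ASCII);
        int padded = (data.length + 3) / 4 * 4;
        // allocate() zero-fills the buffer, so the padding bytes are already 0
        return ByteBuffer.allocate(4 + padded).putInt(data.length).put(data).array();
    }

    public static void main(String[] args) {
        System.out.println(encodeInt(1).length);          // 4
        System.out.println(encodeString("hello").length); // 12 = 4 (length) + 5 (data) + 3 (padding)
    }
}
```

Because both sides agree on this layout, neither needs to know the byte order or alignment rules of the other machine.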


RPC Protocol
Remote procedure call
Marshalling of procedure parameters
Message format
Authentication
Naming

Client: procedure P(a, b, c)
– pack parameters
– send message to server
– await response
– unpack response
Server:
– unpack parameters
– find procedure
– invoke P(a, b, c)
– pack response
– send response


NFS

[Figure: client and server exchange lookup, read and write requests over the RPC protocol]


NFS Efficiency
Stateless protocols are inherently slow
– e.g., directory lookup
Caching:
– file blocks (data)
– file attributes (inodes)
– read-ahead
– delayed write
– tradeoff between speed and consistency
It is possible that two machines see different data


NFS: Security
Exports based on IP addresses
– low security
– low granularity
Data is not encrypted
Permissions based on user and group ID
– uniform naming needed (e.g., NIS)


Andrew File System (AFS)
1983 CMU (later IBM, now open source)
Scalable (>5000 workstations):
– network divided in clusters (cells)
Client/user mobility (files are accessible from everywhere)
Security: encrypted communication (Kerberos)
Protection: access control lists
Heterogeneity: clear interface to the server


Andrew File System (AFS)
server provides a cell
world-wide addressing scheme (name cell)
client caches a whole file
server-synchronization on file open and close

AFS is efficient
low network overhead
stateful: consistency is implemented with callbacks
callback = client is in synch with server
on store, server changes the callbacks


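AFS: Logical View

[Figure: directory tree rooted at /. The private space holds local directories such as bin and usr; the shared space hangs below /afs and consists of volumes (directories and files) attached at mount points.]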


AFS: Physical View

[Figure: cells (ethz.ch, epfl.ch, cmu.edu), each containing clients and servers, connected by a network]


AFS

[Figure: client and server communicate over the RPC protocol on open and close only; read and write operate on the client cache]


AFS: Consistency
Interaction only when opening and closing files.
Writes are not visible on other machines before a close.
Clients assume that cached files are up-to-date.
Servers keep track of caching by the clients (callbacks)
– clients are notified in case of changes


AFS: Kerberos
Kerberos (Cerberus: three-headed dog guarding Hades)
– authentication
– accounting
– audit
Needham-Schroeder shared key protocol
Distributed
AFS: communication is encrypted


AFS: Protection

Access lists:

%> fs listacl thesis
Access list for thesis is
Normal rights:
  system:anyuser l
  trg rlidwk
  corti rlidwka

It’s possible to allow (or deny) access to users or customized groups
Restriction on: read, write, lookup, insert, administer, lock and delete.
Supports UNIX control bits.


The Eight Fallacies of Distributed Computing (Peter Deutsch)

Network Fallacies
– The network is reliable
– Latency is zero
– Bandwidth is infinite
– The network is secure
– The network topology doesn’t change
– There is one administrator
– Transport cost is zero
– The network is homogeneous


General Principles (Satyanarayanan)

From DFSs we learned the following lessons:
– we should try to move computations to the clients
– use caching whenever possible
– special files (e.g., temporary) can be specially treated
– make scalable systems
– trust the fewest possible entities
– batch work if possible

Kernel Structure


Introduction
Kernel performs “dangerous” operations
– page table mapping
– scheduling
Kernel must be protected against malicious user code
– access to other processes’ data
– increasing own processes’ priority
Kernel must have more rights than user code
Solution:
– distinguish between kernel mode and user mode
– access kernel through system calls
– the system calls define the interface to the kernel


Kernel Protection

[Figure: applications reach the kernel components (drivers, memory manager, file systems) only through system calls]


Kernel Protection
Means:
hardware support
– privileged instructions
– supervisor mode
separate address spaces
– user process has no access to kernel structures
access memory / functions through symbolic names
– user has no access to hardware


Kernel Protection
Privileged instructions in user mode generate a trap
Mode switch:
– interrupts
– gated calls (user generated sw interrupt calls)
Parameters:
– stack
– registers
Examples:
– Intel x86: 4 protection levels (code/segment attribute), interrupt
– PowerPC: 2 levels (CPU attribute), special instruction


Linux System Calls (Intel)
System calls are wrapped in libraries (e.g., libc)
The library function
– writes the parameters in registers (up to 5 parameters)
– writes the parameters on the stack (more than 5 parameters)
– writes the system call number in EAX
– executes int 0x80
The kernel
– jumps to the corresponding function in sys_call_table


Linux System Calls
Examples:
– pid_t fork(void): creates a child process
– ssize_t write(int fd, const void *buf, size_t count): writes count bytes from buf to fd
– int kill(pid_t pid, int sig): sends a signal to a process
– int gettimeofday(struct timeval *tv, struct timezone *tz): gets the current time
– int open(const char *pathname, int flags): opens a file
– int ioctl(int d, int request, ...): manipulates special devices


Windows System Calls
Layered system: system call must be performed by a wrapper (NTDLL.DLL).
The system call position in the KiSystemServiceTable is not known (depends on the build)

[Figure: the application calls WriteFile() in KERNEL32.DLL, which calls NtWriteFile() in NTDLL.DLL, which executes int 0x2e and enters the kernel through the KiSystemServiceTable]


Kernel Design: API vs. System Calls

Linux
– system-calls are clearly specified (POSIX standard)
– system-calls do not change
– about 100 calls
Windows
– system-calls are hidden
– only Win32 API is published
– Win32 is standard
– “thousands” of API calls, still growing
– some API calls are handled in user space
– More than one API: POSIX, OS/2


Protection and SMP
What happens when two processes (on two CPUs) enter kernel mode?
– Big kernel lock: not allowed (OpenBSD, NetBSD)
– Fine grained locks in the kernel (FreeBSD 5, Linux 2.6)

[Figure: CPU 1 and CPU 2 each run a process that executes int 0x80]


Kernel Structure
monolithic kernel
– big mess, no structure, one big block, fast
– MS-DOS (no protection), original UNIX
– micro-kernel (AIX, OS X)
layered system
– layer n uses functions from layer n-1
– OS/2 (some degree of layering)
virtual machine
– define artificial environment for programs
client-server
– tiny communication microkernel to access various services


Monolithic Kernels

[Figure: the same components (user-level applications, file system, swapping, virtual memory, scheduler, signal handling, terminal controllers, device drivers, memory controllers) drawn twice, once for a monolithic kernel and once for a micro-kernel]


Layered Systems
THE operating system
A layer uses only functions from below
What goes where?
Less efficient

Layers (top to bottom):
– user programs
– buffering I/O
– console drivers
– memory management
– CPU scheduling
– hardware


Virtual Machines
VM operating system (IBM)
slow and difficult to implement
complete protection
no sharing of resources
useful for development and research
compatibility

[Figure: processes run on top of a virtual machine, which runs on the hardware]


Design: Kernel or User Space?

Big monolithic kernel:
– fast (fewer switches)
– less protection
Examples:
– HTTP server in the Linux kernel
– graphic routines in Windows

Modular and micro-kernels:
– structured
– more separation
– move code to user space
– less efficient
– more secure
Example: user level drivers


Virtual Machines
Machine specification in software
– instruction set
– memory layout
– virtual devices
– ....
JVM (Java Virtual Machine)
.NET / Mono
VMWare
– specified machine is a whole PC
– allows multiple PC environments on same machine
IBM VM/370

Case Study: JVM


Virtual Machines

What is a machine?
– does something (...useful)
– programmable
– concrete (hardware)

What is a virtual machine?
– a machine that is not concrete
– a software emulation of a physical computing environment

Reality is somewhat fuzzy!
– Is a Pentium-II a machine?
– Hardware and software are logically equivalent (A. Tanenbaum)

[Figure: instructions pass through a decoder that turns them into operations (Op1, Op2, Op3) for a RISC core]


Virtual Machine, Intermediate Language
Pascal P-Code (1975)
– stack-based processor
– strongly typed machine language
– compiler: one front end, many back ends
– UCSD Apple][ implementation, PDP 11, Z80
Modula M-Code (1980)
– high code density
– Lilith as microprogrammed virtual processor
JVM – Java Virtual Machine (1995)
– Write Once – Run Everywhere
– interpreters, JIT compilers, Hot Spot Compiler
Microsoft .NET (2000)
– language interoperability


JVM Case Study
compiler (Java to bytecode)
interpreter, ahead-of-time compiler, JIT
dynamic loading and linking
exception handling
memory management, garbage collection
OO model with single inheritance and interfaces
system classes to provide OS-like implementation
– compiler
– class loader
– runtime
– system


JVM: Type System
Primitive types
– byte
– short
– int
– long
– float
– double
– char
– reference
– boolean mapped to int
Object types
– classes
– interfaces
– arrays
Single class inheritance
Multiple interface implementation
Arrays
– anonymous types
– subclasses of java.lang.Object


JVM: Java Byte-Code

Memory access
– tload / tstore
– ttload / ttstore
– tconst
– getfield / putfield
– getstatic / putstatic
Operations
– tadd / tsub / tmul / tdiv
– tshifts
Conversions
– f2i / i2f / i2l / ....
– dup / dup2 / dup_x1 / ...
Control
– ifeq / ifne / iflt / ....
– if_icmpeq / if_acmpeq
– invokestatic
– invokevirtual
– invokeinterface
– athrow
– treturn
Allocation
– new / newarray
Casting
– checkcast / instanceof


JVM: Java Byte-Code Example

bipush
Operation: Push byte
Format: bipush byte
Forms: bipush = 16 (0x10)
Operand Stack: ... => ..., value
Description: The immediate byte is sign-extended to an int value. That value is pushed onto the operand stack.


JVM: Machine Organization

Virtual Processor
– stack machine
– no registers
– typed instructions
– no memory addresses, only symbolic names

Runtime Data Areas
– pc register
– stack: locals, parameters, return values
– heap
– method area: code
– runtime constant pool
– native method stack


JVM: Execution Example

Program:
    iload 5
    iload 6
    iadd
    istore 4

Operand stack over time (locals v4, v5, v6):
    after iload 5:   v5
    after iload 6:   v5, v6
    after iadd:      v5+v6
    after istore 4:  (empty), local 4 now holds v5+v6
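The same four instructions can be run on a toy stack machine. This is only a sketch: it takes instructions as text, supports just `iload`, `istore` and `iadd`, and has nothing to do with how a real JVM decodes bytecode.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackMachineSketch {
    /** Executes a tiny subset of bytecode-like instructions against the given locals. */
    static void run(String[] program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();   // the operand stack
        for (String instr : program) {
            String[] parts = instr.trim().split("\\s+");
            switch (parts[0]) {
                case "iload":                         // push a local variable
                    stack.push(locals[Integer.parseInt(parts[1])]);
                    break;
                case "istore":                        // pop into a local variable
                    locals[Integer.parseInt(parts[1])] = stack.pop();
                    break;
                case "iadd": {                        // pop two values, push their sum
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a + b);
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown instruction: " + instr);
            }
        }
    }

    public static void main(String[] args) {
        int[] locals = new int[8];
        locals[5] = 3;
        locals[6] = 4;
        run(new String[] {"iload 5", "iload 6", "iadd", "istore 4"}, locals);
        System.out.println(locals[4]); // 7
    }
}
```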


JVM: Reflection

java.lang.Class
– getFields
– getMethods
– getConstructors
java.lang.reflect.Field
– set / get
– setInt / getInt
– setFloat / getFloat
– .....
java.lang.reflect.Method
– getModifiers
– invoke
java.lang.reflect.Constructor

Load and manipulate unknown classes at runtime.


JVM: Reflection – Example

import java.lang.reflect.*;

public class ReflectionExample {

    public static void main(String args[]) {
        try {
            // load the class named on the command line and list its methods
            Class<?> c = Class.forName(args[0]);
            Method m[] = c.getDeclaredMethods();
            for (int i = 0; i < m.length; i++) {
                System.out.println(m[i].toString());
            }
        } catch (Throwable e) {
            System.err.println(e);
        }
    }
}


JVM: Java Weaknesses

Transitive closure of java.lang.Object contains (classes per version):
– 1.1: 47
– 1.2: 178
– 1.3: 180
– 1.4: 248
– 5 (1.5): 280
– classpath 0.03: 299

class Object {
    public String toString();
    ....
}
class String {
    public String toUpperCase(Locale loc);
    ....
}
public final class Locale implements Serializable, Cloneable {
    ....
}


JVM: Java Weaknesses

Class static initialization is triggered when
– T is a class and an instance of T is created
    T tmp = new T();
– T is a class and a static method of T is invoked
    T.staticMethod();
– a nonconstant static field of T is used or assigned (a static field is constant only if it is final and initialized with a compile-time constant)
    T.someField = 42;

Problem
– circular dependencies in static initialization code

    class A: static { x = B.f(); }
    class B: static { y = A.f(); }
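The effect of such a cycle can be observed directly. In this sketch (class and field names are made up) initializing A triggers B, and B then reads A.x while A is still being initialized, so it sees the default value 0 instead of the final one.

```java
public class StaticInitCycle {
    static class A {
        static int x = B.y + 1;   // first use of B: B is initialized now
    }

    static class B {
        static int y = A.x + 1;   // A is already being initialized: A.x is still 0 here
    }

    /** Touches A first, then B, and returns {A.x, B.y}. */
    static int[] observe() {
        return new int[] {A.x, B.y};
    }

    public static void main(String[] args) {
        int[] v = observe();
        System.out.println("A.x = " + v[0] + ", B.y = " + v[1]); // A.x = 2, B.y = 1
    }
}
```

If B were touched first, the values would be swapped (A.x = 1, B.y = 2): the result depends on which class happens to be used first.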


JVM: Java Weaknesses

interface Example {
    final static String labels[] = {"A", "B", "C"};
}

hidden static initializer:
    labels = new String[3];
    labels[0] = "A"; labels[1] = "B"; labels[2] = "C";

Warning: in Java final means write-once!
interfaces may contain code


JVM: Memory Model
The JVM specs define a memory model:
– defines the relationship between variables and the underlying memory
– meant to guarantee the same behavior on every JVM
The compiler is allowed to reorder operations unless synchronized or volatile is specified.


JVM: Reordering
Reads and writes to ordinary variables can be reordered.

public class Reordering {
    int x = 0, y = 0;

    public void writer() {
        x = 1;
        y = 2;
    }

    public void reader() {
        int r1 = y;
        int r2 = x;
    }
}


JVM: Memory Model
synchronized: in addition to specifying a monitor, it defines a memory barrier:
– acquiring the lock implies an invalidation of the caches
– releasing the lock implies a write back of the caches
synchronized blocks on the same object are ordered.
order among accesses to volatile variables is guaranteed (but not among volatile and other variables).


JVM: Double Checked Lock

Singleton

public class SomeClass {

    private static Resource resource = null;

    // the synchronized modifier must precede the return type
    public synchronized Resource getResource() {
        if (resource == null) {
            resource = new Resource();
        }
        return resource;
    }
}


JVM: Double Checked Lock

Double checked locking

public class SomeClass {

    private static Resource resource = null;

    public Resource getResource() {
        if (resource == null) {
            synchronized (this) {
                if (resource == null) {
                    resource = new Resource();
                }
            }
        }
        return resource;
    }
}


JVM: Double Checked Lock

Thread 1 and Thread 2 both execute:

public class SomeClass {

    private Resource resource = null;

    public Resource getResource() {
        if (resource == null) {
            synchronized (this) {
                if (resource == null) {
                    resource = new Resource();
                }
            }
        }
        return resource;
    }
}

Problem: while Thread 1 is inside the synchronized block, Thread 2 can already see a non-null resource. The object is instantiated but not yet initialized!


JVM: Immutable Objects are not Immutable
Immutable objects:
– all types are primitives or references to immutable objects
– all fields are final
Example (simplified): java.lang.String
– contains
    an array of characters
    the length
    an offset
– example: s = "abcd", length = 2, offset = 2, string = "cd"

String s1 = "/usr/tmp";
String s2 = s1.substring(4); // should contain "/tmp"

Sequence: s2 is instantiated, the fields are initialized (to 0), the array is copied, the fields are written by the constructor.
What happens if instructions are reordered?


JVM: Reordering Volatile and Nonvolatile Stores

volatile reads and writes are totally ordered among threads
but not among normal variables
example:

volatile boolean initialized = false;
SomeObject o = null;

Thread 1:
    o = new SomeObject();
    initialized = true;

Thread 2:
    while (!initialized) {
        sleep();
    }
    o.field = 42;   // ? the write to o may not be visible yet


JVM: JSR 133
Java Community Process
Java memory model revision
Final means final
Volatile fields cannot be reordered
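Under the revised memory model (Java 5 and later) double-checked locking becomes correct once the shared reference is declared volatile, because the volatile write can no longer be reordered with the constructor's writes. The following is a sketch of that variant; `LazyHolder` and its nested `Resource` class are made-up names.

```java
public class LazyHolder {
    /** Placeholder for an expensive object. */
    static final class Resource {
        final int value = 42;
    }

    // volatile: a thread that sees a non-null reference also sees the initialized object
    private static volatile Resource resource;

    static Resource getResource() {
        Resource local = resource;            // one volatile read on the fast path
        if (local == null) {
            synchronized (LazyHolder.class) {
                local = resource;             // second check, now under the lock
                if (local == null) {
                    local = new Resource();
                    resource = local;         // publish only after construction
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(getResource() == getResource()); // true
        System.out.println(getResource().value);            // 42
    }
}
```

Without the volatile modifier this code has the same problem as the original version, on any JVM version.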


Java JVM: Execution
Interpreted (e.g., Sun JVM)
– bytecode instructions are interpreted sequentially
– the VM emulates the Java Virtual Machine
– slower
– quick startup
Just-in-time compilers (e.g., Sun JVM, IBM JikesVM)
– bytecode is compiled to native code at load time (or later)
– code can be optimized (at compile time or later)
– quicker
– slow startup
Ahead-of-time compilers (e.g., GCJ)
– bytecode is compiled to native code offline
– quick startup
– quick execution
– static compilation


JVM: Loader – The Classfile Format

ClassFile {
    version
    constant pool
    flags
    super class
    interfaces
    fields
    methods
    attributes
}

Constants:
– Values: String / Integer / Float / ...
– References: Field / Method / Class / ...

Attributes:
– ConstantValue
– Code
– Exceptions


JVM: Class File Format

class HelloWorld {

    public static void printHello() {
        System.out.println("hello, world");
    }

    public static void main (String[] args) {
        HelloWorld myHello = new HelloWorld();
        myHello.printHello();
    }

}


JVM: Class File (Constant Pool)
1. String hello, world
2. Class HelloWorld
3. Class java/io/PrintStream
4. Class java/lang/Object
5. Class java/lang/System
6. Methodref HelloWorld.<init>()
7. Methodref java/lang/Object.<init>()
8. Fieldref java/io/PrintStream java/lang/System.out
9. Methodref HelloWorld.printHello()
10. Methodref java/io/PrintStream.println(java/lang/String )
11. NameAndType <init> ()V
12. NameAndType out Ljava/io/PrintStream;
13. NameAndType printHello ()V
14. NameAndType println (Ljava/lang/String;)V
15. Unicode ()V
16. Unicode (Ljava/lang/String;)V
17. Unicode ([Ljava/lang/String;)V
18. Unicode <init>
19. Unicode Code
20. Unicode ConstantValue
21. Unicode Exceptions
22. Unicode HelloWorld
23. Unicode HelloWorld.java
24. Unicode LineNumberTable
25. Unicode Ljava/io/PrintStream;
26. Unicode LocalVariables
27. Unicode SourceFile
28. Unicode hello, world
29. Unicode java/io/PrintStream
30. Unicode java/lang/Object
31. Unicode java/lang/System
32. Unicode main
33. Unicode out
34. Unicode printHello


JVM: Class File (Code)
Methods

0 <init>()
    0 ALOAD0
    1 INVOKESPECIAL [7] java/lang/Object.<init>()
    4 RETURN

1 PUBLIC STATIC main(java/lang/String [])
    0 NEW [2] HelloWorld
    3 DUP
    4 INVOKESPECIAL [6] HelloWorld.<init>()
    7 ASTORE1
    8 INVOKESTATIC [9] HelloWorld.printHello()
    11 RETURN

2 PUBLIC STATIC printHello()
    0 GETSTATIC [8] java/io/PrintStream java/lang/System.out
    3 LDC1 hello, world
    5 INVOKEVIRTUAL [10] java/io/PrintStream.println(java/lang/String )
    8 RETURN


JVM: Compilation – Pattern Expansion
Each byte code is translated according to fixed patterns
+ easy
- limited knowledge

Example (pseudocode)
switch (o) {
    case ICONST<n>: generate("push n"); PC++; break;
    case ILOAD<n>: generate("push off_n[FP]"); PC++; break;
    case IADD: generate("pop -> R1");
               generate("pop -> R2");
               generate("add R1, R2 -> R1");
               generate("push R1"); PC++; break;
    …
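The pseudocode translates almost directly into a small runnable translator. In this sketch the bytecode is given as text (`iconst_2`, `iload_4`, `iadd`) and the emitted "native" instructions are the same pseudo-assembly strings as in the patterns above; nothing here generates real machine code.

```java
import java.util.ArrayList;
import java.util.List;

public class PatternExpansionSketch {
    /** Expands each bytecode into its fixed pattern, without looking at its neighbours. */
    static List<String> expand(String[] bytecode) {
        List<String> out = new ArrayList<>();
        for (String instr : bytecode) {
            String[] p = instr.split("_");           // "iload_4" -> {"iload", "4"}
            switch (p[0]) {
                case "iconst":
                    out.add("push " + p[1]);
                    break;
                case "iload":
                    out.add("push off_" + p[1] + "[FP]");
                    break;
                case "iadd":
                    out.add("pop -> R1");
                    out.add("pop -> R2");
                    out.add("add R1, R2 -> R1");
                    out.add("push R1");
                    break;
                default:
                    throw new IllegalArgumentException("no pattern for " + instr);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // two loads and an add expand to 1 + 1 + 4 = 6 instructions
        for (String line : expand(new String[] {"iload_4", "iload_5", "iadd"})) {
            System.out.println(line);
        }
    }
}
```

Every value travels through the stack in memory, which is the "limited knowledge" drawback: the translator never notices that an operand could have stayed in a register.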


JVM: Optimizing Pattern Expansion

Main Idea:
– use internal virtual stack
– stack values are consts / fields / locals / array fields / registers / ...
– flush stack as late as possible

Example: iload 4, iload 5, iadd, istore 6

Bytecode   Virtual stack     Emitted code
iload4     local4            (none)
iload5     local4, local5    (none)
iadd       EAX               MOV EAX, off4[FP]; ADD EAX, off5[FP]
istore6    (empty)           MOV off6[FP], EAX


JVM: Compiler Comparison

Bytecode: iload_4, iload_5, iadd, istore_6

pattern expansion (5 instructions, 9 memory accesses):
    push off4[FP]
    push off5[FP]
    pop EAX
    add 0[SP], EAX
    pop off6[FP]

optimized (3 instructions, 3 memory accesses):
    mov EAX, off4[FP]
    add EAX, off5[FP]
    mov off6[FP], EAX


Linking (General)
A compiled program contains references to external code (libraries)
After loading the code the system needs to link the code to the library
– identify the calls to external code
– locate the callees (and load them if necessary)
– patch the loaded code
Two options:
– the code contains a list of sites for each callee
– the calls to external code are jumps to a procedure linkage table which is then patched (double indirection)


Linking (General)

Before linking        After linking
0   instr             0   instr
1   instr             1   instr
2   jump -            2   jump 101
3   instr             3   instr
4   instr             4   instr
5   jump -            5   jump 100
6   instr             6   instr
7   jump 2            7   jump 101
9   instr             9   instr
10  instr             10  instr
                      100 jump
                      101 jump

Call site lists: proc 0: 5; proc 1: 7 (the entry at 7 links to the next site, 2)


Linking (General)

Before linking        After linking
0   instr             0   instr
1   instr             1   instr
2   jump &p1          2   jump 101
3   instr             3   instr
4   instr             4   instr
5   jump &p0          5   jump 100
6   instr             6   instr
7   jump &p1          7   jump 101
9   instr             9   instr
10  instr             10  instr
                      100 jump &p0
                      101 jump &p1

Call site lists: proc 0: 5; proc 1: 7


JVM: Linking
Bytecode interpreter
– references to other objects are made through the JVM (e.g., invokevirtual, getfield, …)
Native code (ahead of time compiler)
– static linking
– classic native linking
JIT compiler
– only some classes are compiled
– calls could reference classes that are not yet loaded or compiled (delayed compilation)
– code instrumentation


JVM: Methods and Fields Resolution
methods and fields are accessed through special VM functions (e.g., invokevirtual, getfield, …)
the parameters of the special call define the target
the parameters are indexes in the constant pool
the VM checks if the call is legal and if the target is present


JVM: JIT – Linking and Instrumentation
Use code instrumentation to detect first access of static fields and methods

class A {
    ....
    ... B.x
}

class B {
    int x;
}

The access B.x is compiled to
    CheckClass(B);
    B.x
where CheckClass(B) performs
    IF ~B.initialized THEN
        Initialize(B)
    END;


Compilation and Linking Overview

[Figure (C): C headers and C source go through the compiler to object files; the linker combines them into one object file; the loader turns it into loaded code]


Compilation and Linking Overview

[Figure (Oberon): Oberon source goes through the compiler to object and symbol files; the loader/linker turns them into loaded modules]


Compilation and Linking Overview

[Figure (Java): Java source goes through the compiler to class files; the class loader, loader/linker and JIT compiler turn them into loaded classes, which are accessible through the reflection API]


Jaos
Jaos (Java on Active Object System) is a Java virtual machine for the Bluebottle system
goals:
– implement a JVM for the Bluebottle system
– show that the Bluebottle kernel is generic enough to support more than one system
– interoperability between the Active Oberon and Java languages
– interoperability between the Oberon System and the Java APIs


Jaos (Interoperability Framework)

[Figure: an Oberon tool chain (Oberon source, compiler, object and symbol files, Oberon loader/linker, loaded modules, Oberon metadata loader, Oberon browser) and a Java tool chain (class file, Java metadata loader, loader, linker, JIT compiler, loaded class, Java reflection API) connected through shared metadata]


JVM: Verification
Compiler generates “good” code....
.... that could be changed before reaching the JVM
need for verification

Verification makes the VM simpler (fewer run-time checks):
– no operand stack overflow
– load / stores are valid
– VM types are correct
– no pointer forging
– no violation of access restrictions
– access objects as they are (type)
– local variable initialized before load
– …


JVM: Verification

Pass 1 (Loading):
– class file version check
– class file format check
– class file complete

Pass 2 (Linking):
– final classes are not subclassed
– every class has a superclass (but Object)
– constant pool references
– constant pool names


JVM: Verification

Pass 3 (Linking): byte-code verification

For each operation in code (independent of the path):
– operand stack size is the same
– accessed variable types are correct
– method parameters are appropriate
– field assignment with correct types
– opcode arguments are appropriate

Pass 4 (RunTime): delayed for performance reasons

First time a type is referenced:
– load types when referenced
– check access visibility
– class initialization

First member access:
– member exists
– member type same as declared
– current method has right to access member


JVM: Byte-Code Verification

Verification:
– branch destinations must exist
– opcodes must be legal
– access only existing locals
– code does not end in the middle of an instruction
– types in byte-code must be respected
– execution cannot fall off the end of the code
– exception handler begin and end are sound
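Some of the structural checks listed above can be sketched in a few lines of C. This is only an illustration on a hypothetical four-instruction toy byte-code (the opcodes OP_PUSH, OP_ADD, OP_GOTO, OP_RET and the function verify_code are invented here and are not real JVM opcodes or APIs). It checks that opcodes are legal, that the code does not end in the middle of an instruction, that branch destinations are instruction starts, and that execution cannot fall off the end. Type checking and exception handlers are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set; these are NOT real JVM opcodes. */
enum { OP_PUSH = 1, OP_ADD = 2, OP_GOTO = 3, OP_RET = 4 };

#define MAX_CODE 256

/* Length in bytes of an instruction, or 0 if the opcode is illegal. */
static size_t insn_length(uint8_t opcode) {
    switch (opcode) {
    case OP_PUSH: return 2;   /* opcode + 1-byte constant */
    case OP_GOTO: return 2;   /* opcode + 1-byte absolute target */
    case OP_ADD:
    case OP_RET:  return 1;
    default:      return 0;
    }
}

/* Returns true if the code passes the structural checks. */
bool verify_code(const uint8_t *code, size_t len) {
    assert(code != NULL);
    bool is_start[MAX_CODE] = { false };
    if (len == 0 || len > MAX_CODE) return false;

    /* First scan: legal opcodes, no truncated instruction. */
    size_t pc = 0, last = 0;
    while (pc < len) {
        size_t n = insn_length(code[pc]);
        if (n == 0) return false;        /* illegal opcode */
        if (pc + n > len) return false;  /* code ends mid-instruction */
        is_start[pc] = true;
        last = pc;
        pc += n;
    }

    /* The last instruction must transfer control, otherwise
       execution could fall off the end of the code. */
    if (code[last] != OP_GOTO && code[last] != OP_RET) return false;

    /* Second scan: every branch must land on an instruction start. */
    for (pc = 0; pc < len; pc += insn_length(code[pc])) {
        if (code[pc] == OP_GOTO) {
            uint8_t target = code[pc + 1];
            if (target >= len || !is_start[target]) return false;
        }
    }
    return true;
}
```

A caller would run verify_code once at link time and refuse to execute code that fails; the interpreter can then skip these checks at run time.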

Addendum: Security


Security

internal protection
– memory protection
– file system accesses

external protection
– accessibility

problems:
– program threats


Security: Program Threats

Trojan horses: a code segment that misuses its environment
– mail attachments
– web downloads (e.g., SEXY.EXE, which formats your hard disk)
– programs with the same name as common utilities
– misleading names (e.g., README.TXT.EXE)

Trap door (in programs or compilers): an intentional hole in the software


Security: System Threats

worms: a standalone program that spawns other processes (copies of itself) to reduce system performance
– example: Morris worm (1988)
  exploited holes in rsh, finger and sendmail to gain access to other machines
  once on the other machine it was able to replicate itself
– used by spammers to spread and distribute spamming applications

viruses: similar to worms but embedded in other programs
– they usually infect other programs and the boot sector


Security: System Threats

Denial of service
– perform many requests to exhaust all the available resources
– often distributed (using worms)

Example: SYN flooding attacks
– the attacker tries to connect
– the victim answers with a synchronize and acknowledge packet
– and waits for an acknowledgment that never arrives

Countermeasures
– active filtering
– request dropping
– cookie based protocols (requests must be authenticated)
– stateless protocols
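The cookie-based and stateless countermeasures can be combined: instead of storing state for every half-open connection, the server derives a cookie from the client's address and a secret, sends it back, and only allocates resources once the client echoes a valid cookie. The following is a minimal sketch of that idea, not the actual TCP SYN cookie format. The names make_cookie and check_cookie are invented for this example, and the FNV-1a hash used here is a toy stand-in; a real implementation would need a cryptographic keyed hash and a time component.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over a byte buffer. Toy hash for illustration only:
   it is NOT cryptographically secure. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len) {
    assert(data != NULL);
    const uint8_t *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Derive a cookie from the server secret and the client's address.
   The server stores nothing per request. */
uint32_t make_cookie(uint32_t secret, uint32_t client_ip, uint16_t client_port) {
    uint32_t h = 2166136261u;
    h = fnv1a(h, &secret, sizeof secret);
    h = fnv1a(h, &client_ip, sizeof client_ip);
    h = fnv1a(h, &client_port, sizeof client_port);
    return h;
}

/* Recompute the cookie when the client answers; state is allocated
   only if the echoed value matches. */
bool check_cookie(uint32_t secret, uint32_t client_ip, uint16_t client_port,
                  uint32_t echoed) {
    return make_cookie(secret, client_ip, client_port) == echoed;
}
```

An attacker who floods with spoofed source addresses never sees the cookies, so it cannot complete the exchange, and the server has spent no memory on the requests.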


Security: System Threats

badly implemented and designed software:
– lpr (setuid) with an option to delete the printed file
– mkdir (first create the inode then change the owner)
  it was possible to change the inode before the chown …
– buffer overflows
– password in memory or swap files
– insecure protocols (FTP, SMTP)
– missing sanity checks (syscalls, command in input, …)
– short keys and passwords
– proprietary protocols


Bad design: A very recent example

Texas Instruments produces RFID tags offering cryptographic functionalities.
used for cars and electronic payments
40-bit keys
proprietary protocol
Attack from Johns Hopkins University and RSA Labs
– less than 2 hours for 5 keys
– less than $3500


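Security: Buffer Overflows

Overwrite a function’s return address

    void foo(int p1, int p2) {
        char array[10];
        strcpy(array, someinput);  /* no length check: long input runs past array */
    }

[Diagram: stack frame of foo, from low to high addresses: array, FP (saved frame pointer), RET (return address), p1 & p2. Input longer than array overwrites FP and RET.]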

Avoid strcpy and check the length, e.g., strncpy
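A length-checked copy might look like the sketch below. The helper copy_bounded is a name invented for this example, and foo is changed to take its input as a parameter and to reject oversized input. One caveat when following the strncpy advice: strncpy does not write a terminating '\0' when the source is as long as or longer than the limit, so the terminator has to be set explicitly. The helper here uses strlen and memcpy to make the check visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst, which has room for dst_size bytes including the
   terminator. Returns true if src fit completely; on truncation dst
   holds a terminated prefix and false is returned. */
bool copy_bounded(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t n = strlen(src);
    if (n >= dst_size) {
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return false;
    }
    memcpy(dst, src, n + 1);  /* includes the terminator */
    return true;
}

/* Variant of the slide's foo that cannot overflow its local array. */
bool foo(const char *someinput) {
    char array[10];
    if (!copy_bounded(array, sizeof array, someinput))
        return false;         /* reject input that does not fit */
    /* ... work with array ... */
    return true;
}
```

Rejecting oversized input, rather than silently truncating it, is a design choice for this sketch; which of the two is right depends on the program.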


Security: Monitoring

check for suspicious patterns
– login times

audit logs

periodic scans for security holes (bad passwords, set-uid programs, changes to system programs)
– system integrity checks (checksums for executable files) [tripwire]

network services
– monitor network activity


Example: Firewalling

Many applications use network sockets to communicate (even on a single machine)
Many applications are not protected

Solution: filter all the incoming connections by default and allow only the trusted ones
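The default-deny policy can be sketched as a lookup in a list of allow rules: a connection is accepted only if a rule matches, and everything else is dropped. The structure allow_rule and the function accept_connection are invented for this illustration; a real packet filter matches on many more fields (protocol, interface, connection state).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One allow rule: connections from src_ip to the local dst_port. */
struct allow_rule {
    uint32_t src_ip;    /* IPv4 address as a 32-bit number */
    uint16_t dst_port;
};

/* Accept a connection only if some rule explicitly allows it. */
bool accept_connection(const struct allow_rule *rules, size_t rule_count,
                       uint32_t src_ip, uint16_t dst_port) {
    assert(rules != NULL || rule_count == 0);
    for (size_t i = 0; i < rule_count; i++) {
        if (rules[i].src_ip == src_ip && rules[i].dst_port == dst_port)
            return true;   /* trusted: explicitly allowed */
    }
    return false;          /* default: deny */
}
```

With an empty rule list every connection is refused, which is the safe starting point; trusted peers are then added one by one.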


Security: (some) Design Principles

Open systems (programs and protocols)
Default is deny access
Check for current authority (timeouts, …)
Give the least privilege possible
Simple protection mechanisms
Do not ask too much of the users (or they will avoid protecting themselves)


Security and Systems: Some Examples

Enhancements to memory management: Intel XD bit, AMD NX bit
– mark pages according to the content (data or code)
– an exception is generated if the PC is moved to a data address
– prevents some buffer overflow attacks
– dynamically generated code has to be generated through special system calls
– Windows XP SP2, Linux, BSD …


Security and Systems: Some Examples

SELinux
– National Security Agency (USA)
– patches to the Linux kernel to enforce mandatory access control
– open source
– independent from the traditional UNIX roles (users and groups)
– configurable policies restricting what a program is able to do


Security and Systems: Some Examples

OpenBSD
– audit process (proactive bug search)
– random gaps in the stack
– ProPolice: gcc puts a random integer on the stack in a call prologue and checks it when returning
– W^X: pages are writable xor executable
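The ProPolice check is inserted by the compiler, but its effect can be imitated by hand. The sketch below simulates a stack frame as a struct in which a canary value sits right behind the buffer, performs a deliberately unchecked copy, and then tests the canary as a function epilogue would. The struct frame and the function copy_and_check are invented for this illustration and say nothing about the code gcc actually generates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated stack frame: the canary sits between the buffer and
   whatever would follow it (saved frame pointer, return address). */
struct frame {
    char     buf[8];
    uint32_t canary;
};

/* Copies len bytes of input into the frame without checking against
   the size of buf, standing in for an unchecked strcpy. len is capped
   at the size of the whole struct so that the simulation itself stays
   inside the object. Returns true if the canary is still intact. */
bool copy_and_check(const char *input, size_t len, uint32_t canary_value) {
    assert(input != NULL);
    struct frame f;
    f.canary = canary_value;          /* "prologue": place the canary */
    if (len > sizeof f) len = sizeof f;
    memcpy(&f, input, len);           /* may run past buf into canary */
    return f.canary == canary_value;  /* "epilogue": verify the canary */
}
```

A real implementation aborts the program when the check fails. The canary has to be random and secret: an attacker who knows the value can write it back while overflowing, and the check passes.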


Security and Systems: Some Examples

OpenBSD
– randomized shared library order and addresses
– mmap() and malloc() return randomized addresses
– guard pages between objects
– privilege separation and revocation


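Privilege Separation

unprivileged child process to contain and restrict the effects of programming errors
e.g., openssh

[Diagram: OpenSSH privilege separation over time. The privileged process listens on *:22. For each network connection it forks an unprivileged child that does the network processing (key exchange, authentication), while the privileged monitor answers its requests (request auth / auth result). After authentication the state is exported and a user child is forked for user request processing (user network data); the monitor hands it a PTY on request (request PTY / pass PTY).]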
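The basic pattern, a privileged monitor that forks an unprivileged child for the risky work and trusts only a small, fixed-format answer, can be sketched with POSIX fork and pipe. This is a simplified illustration, not OpenSSH's actual protocol: handle_request and parse_untrusted are invented names, the uid/gid 65534 is an assumption (on many systems it belongs to "nobody"), and real code would also clear supplementary groups and confine the child's file system view.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define UNPRIV_ID 65534  /* assumption: uid/gid of an unprivileged account */

/* Stand-in for error-prone processing of untrusted network input. */
static bool parse_untrusted(const char *request) {
    return strncmp(request, "AUTH ", 5) == 0;
}

/* Monitor side. Returns 1 if the child reports a well-formed request,
   0 if it reports a malformed one, -1 if the child failed. */
int handle_request(const char *request) {
    assert(request != NULL);
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }

    if (pid == 0) {                  /* child: drop privileges first */
        close(fds[0]);
        if (geteuid() == 0 &&
            (setgid(UNPRIV_ID) != 0 || setuid(UNPRIV_ID) != 0))
            _exit(2);                /* never continue as root */
        char verdict = parse_untrusted(request) ? 1 : 0;
        if (write(fds[1], &verdict, 1) != 1) _exit(2);
        _exit(0);
    }

    close(fds[1]);                   /* monitor: read one byte only */
    char verdict = 0;
    ssize_t n = read(fds[0], &verdict, 1);
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (n != 1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;                   /* crashed child counts as failure */
    return verdict ? 1 : 0;
}
```

If a bug in parse_untrusted is exploited, the attacker controls only a process without privileges, and the monitor sees at most one byte from it.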