
The Windows Operating System

Goals

• Hardware-portable
  – Used to support MIPS, PowerPC and Alpha
  – Currently supports x86, ia64, and amd64
  – Multiple vendors build hardware

• Software-portable
  – POSIX, OS/2, and Win32 subsystems
    • OS/2 is dead
    • POSIX is still supported, as a separate product
    • Lots of Win32 software out there in the world

Goals

• High performance
  – Anticipated PC speeds approaching minicomputers and mainframes
  – Async IO model is standard
  – Support for large physical memories
  – SMP was an early design goal
  – Designed to support multi-threaded processes
  – Kernel has to be reentrant

Process Model

• Threads and processes are distinct

• Process:
  – Address space
  – Handle table (Handles => file descriptors)
  – Process default security token

• Thread:
  – Execution Context
  – Optional thread-specific security token

Tokens

• “Who you are”: list of identities
  – Each identity is a SID

• Also contains Privileges
  – Shutdown, Load drivers, Backup, Debug…

• Can be passed through LPC ports and named pipe requests
  – Server side can use this to selectively impersonate the client.

Object Manager

• Uniform interface to kernel mode objects.

• Handles are 32-bit opaque integers

• Per-process handle table maps handles to objects and permissions on the objects

• Implements refcount GC
  – Pointer count: total number of references
  – Handle count: number of open handles
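
The two counts can be modelled in a few lines of portable C. This is only a sketch: `object_header` and the `ob_*` helpers are hypothetical names for illustration, not the kernel's real structures, and the model assumes that every open handle also holds one pointer reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a refcounted kernel object (not the real header). */
typedef struct {
    long pointer_count;  /* every reference, including those held by handles */
    long handle_count;   /* open handles only */
    bool deleted;        /* set when the last reference goes away */
} object_header;

static void ob_init(object_header *o) {
    o->pointer_count = 1;  /* the creator's reference */
    o->handle_count = 0;
    o->deleted = false;
}

static void ob_reference(object_header *o) {
    assert(!o->deleted);
    o->pointer_count++;
}

static void ob_dereference(object_header *o) {
    assert(o->pointer_count > 0);
    if (--o->pointer_count == 0)
        o->deleted = true;  /* a real object manager would free it here */
}

/* Opening a handle bumps both counts. */
static void ob_open_handle(object_header *o) {
    ob_reference(o);
    o->handle_count++;
}

/* Closing a handle drops both counts. */
static void ob_close_handle(object_header *o) {
    assert(o->handle_count > 0);
    o->handle_count--;
    ob_dereference(o);
}
```

In this model an object with no open handles can still be alive, as long as some kernel component holds a pointer reference.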

Object Manager

• Implements an object namespace
  – Win32 objects are under \BaseNamedObjects
  – Devices under \Device
    • This includes filesystems
  – Drive letters are symbolic links
    • \??\C: => the appropriate filesystem device

• Some things have other names
  – Processes and threads are opened by specifying a CID: (Process.Thread)

Standard operations on handles

• CloseHandle()

• DuplicateHandle()
  – Takes source and destination process
  – Very useful for servers

• WaitForSingleObject(), WaitForMultipleObjects()
  – Wait for something to happen
  – Can wait on up to 64 handles at once

Security Descriptors

• Each object has a Security Descriptor
  – Owner: special SID, CREATOR_OWNER
  – Group: special SID, CREATOR_GROUP
  – DACL
    • Discretionary Access Control List
    • List of SIDs and granted or denied access rights
  – SACL
    • System Access Control List
    • List of SIDs and access rights to be audited

Access Rights

typedef struct _ACCESS_MASK {
    USHORT SpecificRights;
    UCHAR  StandardRights;
    UCHAR  AccessSystemAcl : 1;
    UCHAR  Reserved : 3;
    UCHAR  GenericAll : 1;
    UCHAR  GenericExecute : 1;
    UCHAR  GenericWrite : 1;
    UCHAR  GenericRead : 1;
} ACCESS_MASK;
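
The same layout can be expressed as plain 32-bit masks, which is how access rights are usually tested in practice. The sketch below is portable C for illustration; the `AM_*` names and helper functions are hypothetical, and the bit positions assume the fields above are packed from the least significant bit upward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the struct layout above. */
#define AM_SPECIFIC_RIGHTS   0x0000FFFFu  /* bits 0-15: object-type specific */
#define AM_STANDARD_RIGHTS   0x00FF0000u  /* bits 16-23 */
#define AM_ACCESS_SYSTEM_ACL 0x01000000u  /* bit 24 */
#define AM_RESERVED          0x0E000000u  /* bits 25-27 */
#define AM_GENERIC_ALL       0x10000000u  /* bit 28 */
#define AM_GENERIC_EXECUTE   0x20000000u  /* bit 29 */
#define AM_GENERIC_WRITE     0x40000000u  /* bit 30 */
#define AM_GENERIC_READ      0x80000000u  /* bit 31 */

/* Sanity check: the fields cover all 32 bits exactly once. */
static void am_check_layout(void) {
    assert((AM_SPECIFIC_RIGHTS | AM_STANDARD_RIGHTS | AM_ACCESS_SYSTEM_ACL |
            AM_RESERVED | AM_GENERIC_ALL | AM_GENERIC_EXECUTE |
            AM_GENERIC_WRITE | AM_GENERIC_READ) == 0xFFFFFFFFu);
}

static uint16_t am_specific(uint32_t mask) {
    return (uint16_t)(mask & AM_SPECIFIC_RIGHTS);
}

static uint8_t am_standard(uint32_t mask) {
    return (uint8_t)((mask & AM_STANDARD_RIGHTS) >> 16);
}

/* True when every bit in `wanted` is present in `granted`. */
static bool am_has_all(uint32_t granted, uint32_t wanted) {
    return (granted & wanted) == wanted;
}
```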

Security Use

• Objects are referred to via handles

• Security checks occur when an object is opened
  – Open requests contain a mask of requested access rights
  – If granted to the token by the DACL, the handle contains those access rights

• Access rights are checked on use
  – Just a bit test: very fast

Object Open

evt = OpenEvent(EVENT_MODIFY_STATE, FALSE, "SomeName");

– Finds the event object by name
– Walks the DACL, looking for token SIDs
– Keeps looking until all permissions are granted
– If access is granted, inserts a handle to the object into the process’s handle table, with EVENT_MODIFY_STATE access
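
The DACL walk can be sketched as a loop that clears requested bits as matching entries grant them. This is a simplified model in portable C: SIDs are reduced to integers, the `ace` type and `access_check` function are hypothetical, and the deny handling (a matching deny entry for a still-needed right fails the request) is an assumption about entry ordering that the slide does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ACE_ALLOW, ACE_DENY } ace_type;

/* Hypothetical access control entry: a SID reduced to an integer id. */
typedef struct {
    ace_type type;
    int      sid;
    uint32_t mask;
} ace;

static bool token_has_sid(const int *sids, size_t n, int sid) {
    for (size_t i = 0; i < n; i++)
        if (sids[i] == sid)
            return true;
    return false;
}

/* Walk the DACL in order until every desired bit has been granted. */
static bool access_check(const int *token_sids, size_t n_sids,
                         const ace *dacl, size_t n_aces, uint32_t desired) {
    assert(dacl != NULL || n_aces == 0);
    uint32_t remaining = desired;
    for (size_t i = 0; i < n_aces && remaining != 0; i++) {
        if (!token_has_sid(token_sids, n_sids, dacl[i].sid))
            continue;                       /* entry is for someone else */
        if (dacl[i].type == ACE_DENY) {
            if (dacl[i].mask & remaining)
                return false;               /* a still-needed right is denied */
        } else {
            remaining &= ~dacl[i].mask;     /* these rights are now granted */
        }
    }
    return remaining == 0;                  /* anything left was never granted */
}
```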

Object Use

SetEvent(evt);

– SetEvent() requires EVENT_MODIFY_STATE access, and an event object.
– The kernel looks up the handle in the process’s handle table.
– Checks to make sure that it maps to an event object, and that the granted access bits contain the EVENT_MODIFY_STATE bit.
– If all is good, the event is set.

Object Use

WaitForSingleObject(evt, INFINITE);

– WaitForSingleObject() requires a synchronization object (like an event) and SYNCHRONIZE access.
– evt maps to an event object
– SYNCHRONIZE access was not requested when the handle was inserted.
– Even if the DACL permits it, the wait fails.
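
The check-on-use behaviour from the last two slides can be modelled as a table lookup followed by a type test and a bit test. The sketch below is portable C with hypothetical types and function names; only the two access-right values are taken from the Windows headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVENT_MODIFY_STATE 0x0002u      /* same values as the Windows headers */
#define SYNCHRONIZE        0x00100000u

typedef enum { OBJ_ANY, OBJ_EVENT, OBJ_MUTEX } obj_type;

typedef struct { obj_type type; bool signaled; } object;

/* One handle table slot: the object plus the access granted at open time. */
typedef struct { object *obj; uint32_t granted; } handle_entry;

#define TABLE_SIZE 16
typedef struct { handle_entry e[TABLE_SIZE]; } handle_table;

/* Returns the new handle (a table index), or -1 if the table is full. */
static int insert_handle(handle_table *t, object *o, uint32_t granted) {
    assert(o != NULL);
    for (int h = 0; h < TABLE_SIZE; h++) {
        if (t->e[h].obj == NULL) {
            t->e[h].obj = o;
            t->e[h].granted = granted;
            return h;
        }
    }
    return -1;
}

/* Look up a handle, then check the object type and the granted bits. */
static object *reference_by_handle(handle_table *t, int h,
                                   obj_type type, uint32_t required) {
    if (h < 0 || h >= TABLE_SIZE || t->e[h].obj == NULL)
        return NULL;
    if (type != OBJ_ANY && t->e[h].obj->type != type)
        return NULL;
    if ((t->e[h].granted & required) != required)
        return NULL;                    /* the DACL is not consulted again */
    return t->e[h].obj;
}

static bool set_event(handle_table *t, int h) {
    object *o = reference_by_handle(t, h, OBJ_EVENT, EVENT_MODIFY_STATE);
    if (o == NULL)
        return false;
    o->signaled = true;
    return true;
}

/* A wait is only permitted if the handle carries SYNCHRONIZE. */
static bool can_wait(handle_table *t, int h) {
    return reference_by_handle(t, h, OBJ_ANY, SYNCHRONIZE) != NULL;
}
```

A handle opened with only EVENT_MODIFY_STATE passes `set_event` and fails `can_wait`, which mirrors the failed wait described above.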

Types of Objects

• Events
  – State is set or clear.
  – Can clear when a wait completes (auto-reset)

• Mutexes
  – Can be acquired by a single thread at a time.
  – Automatically released when owner exits.

• Semaphores
  – Maintain a count
  – Waits decrement the count
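
The wait semantics of events and semaphores can be illustrated with a non-blocking model. This is a sketch in portable C, not the Win32 API: the types and the `*_try_wait` helpers are hypothetical, and a real wait would block instead of returning false.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { bool signaled; bool auto_reset; } event_obj;
typedef struct { long count; long max; } semaphore_obj;

/* A satisfied wait on an auto-reset event clears it; manual-reset stays set. */
static bool event_try_wait(event_obj *e) {
    if (!e->signaled)
        return false;
    if (e->auto_reset)
        e->signaled = false;
    return true;
}

/* A satisfied wait on a semaphore decrements its count. */
static bool semaphore_try_wait(semaphore_obj *s) {
    assert(s->count >= 0 && s->count <= s->max);
    if (s->count == 0)
        return false;
    s->count--;
    return true;
}

/* Releasing adds to the count, but never past the maximum. */
static bool semaphore_release(semaphore_obj *s, long n) {
    if (n <= 0 || n > s->max - s->count)
        return false;
    s->count += n;
    return true;
}
```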

More objects

• Threads, Processes, Timers—like events

• Registry Keys
  – Manipulate data in the registry, the centralized store of system configuration info.

• LPC Ports
  – Fast local RPC
  – Security tokens can transfer over LPC calls

• Files

Files & IO

• File objects maintain a current offset, and a pointer to the underlying stream.

• Default internal model is asynchronous
  – Synchronous IO just waits for the IO to complete
  – Async IO can set an event, or run a callback in the thread which queued the IO, or post a message to an IO completion port.

• Each request is an IRP

IRPs

• Maintain state of IO requests, independent of the thread working on the IO

• IRPs are handed off through the device stack to their destinations
  – Threads process IRPs
  – Initiating thread processes the IRP until a device returns STATUS_PENDING
  – Subsequent processing can be done in kernel worker threads

Interrupts

IRQL (Interrupt Request Level):

0 => PASSIVE_LEVEL
  • Processor is running threads
  • All usermode code is at IRQL 0

1 => APC_LEVEL; threads, APCs disabled

2 => DISPATCH_LEVEL
  • Running as the processor: can’t stop!
  • Can’t take a page fault
  • Only locks available are KSPIN_LOCKs

Interrupts

3-26 => Device Interrupt Service Routines
  • Device interrupts are mapped to an IRQL and an interrupt service routine; ISR is called at that IRQL

27 => PROFILE_LEVEL: profiling

28 => CLOCK2_LEVEL: clock interrupt

29 => IPI_LEVEL: interprocessor interrupt
  • Requests another processor to do something

30 => POWER_LEVEL: power failure

31 => HIGH_LEVEL: interrupts disabled
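
The level table above translates directly into an enum plus a few predicates. The following is a portable C sketch for illustration: the `irql` type and helper functions are hypothetical, the numeric values are the ones listed on these slides, and the preemption rule (only a strictly higher level interrupts the current one) is stated as an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as listed on the slides; device ISRs occupy 3 through 26. */
typedef enum {
    PASSIVE_LEVEL  = 0,
    APC_LEVEL      = 1,
    DISPATCH_LEVEL = 2,
    PROFILE_LEVEL  = 27,
    CLOCK2_LEVEL   = 28,
    IPI_LEVEL      = 29,
    POWER_LEVEL    = 30,
    HIGH_LEVEL     = 31
} irql;

/* Page faults can only be serviced below DISPATCH_LEVEL. */
static bool may_page_fault(irql level) {
    return level < DISPATCH_LEVEL;
}

static bool is_device_irql(irql level) {
    return level >= 3 && level <= 26;
}

/* Assumption: an interrupt is taken only if its level is above the current one. */
static bool interrupt_preempts(irql current, irql incoming) {
    return incoming > current;
}

/* Raise the current level and return the old one so it can be restored. */
static irql raise_irql(irql *current, irql new_level) {
    assert(new_level >= *current);  /* raising never lowers the level */
    irql old = *current;
    *current = new_level;
    return old;
}
```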

Interrupts

• Hardware signals an interrupt

• Interrupt’s ISR runs at device IRQL
  – Has to be fast; get off the processor and allow other ISRs to run
  – Typically queues a DPC, acknowledges the interrupt, and returns

• DPC: Deferred Procedure Call
  – Further processing at DISPATCH_LEVEL
  – Queues work to kernel worker threads

IO Completion

• Driver calls IO Manager to complete the IRP

• IO Manager queues a kernel mode APC to the initiating thread

• APC: Asynchronous Procedure Call
  – Kernel mode APC preempts thread execution
  – Writes data back to user mode in the context of the thread which initiated the IO
  – Signals completion of the IO

IO Cache

• Classic: block cache
  – Page mappings translate directly to blocks on the underlying partition.

• Windows: stream cache
  – Page mappings are offsets within a stream.
  – IO Cache Manager uses the same mappings.
  – All cache management (trimming) is centralized in the memory manager
  – All modifications show up in mapped views.

Virtual Memory

• Sections: another object type
  – Can be created to map a file
  – Can also be created off the pagefile
  – Optionally named, for shared memory

• Reservation
  – Range of VA which will not be handed out for some other purpose

• Committed
  – VA which actually maps to something
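
The difference between reserving and committing can be shown with a toy address space that tracks one state per page. This is a portable C sketch with hypothetical names (`address_space`, `va_reserve`, `va_commit`); it assumes that a range must be reserved before it can be committed and does not model the real memory manager.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 64

typedef enum { PAGE_FREE, PAGE_RESERVED, PAGE_COMMITTED } page_state;

/* Hypothetical per-process address space: one state per page. */
typedef struct { page_state page[NPAGES]; } address_space;

static bool range_ok(size_t first, size_t count) {
    return count > 0 && first < NPAGES && count <= NPAGES - first;
}

/* Reserve: claim a free range so nothing else is placed there. */
static bool va_reserve(address_space *as, size_t first, size_t count) {
    assert(as != NULL);
    if (!range_ok(first, count))
        return false;
    for (size_t i = first; i < first + count; i++)
        if (as->page[i] != PAGE_FREE)
            return false;
    for (size_t i = first; i < first + count; i++)
        as->page[i] = PAGE_RESERVED;
    return true;
}

/* Commit: back an already reserved range with storage. */
static bool va_commit(address_space *as, size_t first, size_t count) {
    assert(as != NULL);
    if (!range_ok(first, count))
        return false;
    for (size_t i = first; i < first + count; i++)
        if (as->page[i] == PAGE_FREE)
            return false;
    for (size_t i = first; i < first + count; i++)
        as->page[i] = PAGE_COMMITTED;
    return true;
}
```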

Aside: CreateProcess

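• Just a user mode Win32 API (pseudocode, arguments simplified)

{
    NtCreateFile(&file, szImage);    /* open the image file */
    NtCreateSection(&sec, file);     /* section object backed by that file */
    NtCreateProcess(&proc, sec);     /* process created from the section */
    NtCreateThread(&thrd, proc);     /* initial thread in the new process */
}

WaitForSingleObject(proc);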

Virtual Memory

• Memory Manager maintains processor-specific page table entry mappings.
  – Some parts of the address space are shared between processes; for instance, the kernel’s address space and the per-session space.

• On a page fault, mm reads in the data

• Pages can be mapped without the appropriate access… what to do?

Signals

• With threads, signals don’t work very well.

• Some software designs expect to touch inaccessible memory.
  – Large structured files
  – Concurrent garbage collection
  – SLists

• Single global handler has to somehow know about all possible situations.

Structured Exception Handling

• Exceptions unwind the stack
  – Almost like C++!
  – C++ matches against a type hierarchy
  – SEH calls exception filter code; filters are Turing-complete.

• Two ways to deal with exceptions:
  – try/finally
  – try/except

try/finally

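/* MSVC spells these keywords __try and __finally. */
res = AllocateSomeResource();
try {
    SomeOperation(res);
} finally {
    /* Runs on every exit path; free only when unwinding from an exception. */
    if (AbnormalTermination()) {
        FreeSomeResource(res);
    }
}
return res;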

try/except

try {
    SomeOperationWhichMayAV();
} except (Filter(GetExceptionCode(), GetExceptionInformation())) {
    DoSomethingElse();
}

try/except

• GetExceptionCode()
  – A code indicating the cause of the exception

• GetExceptionInformation()
  – Additional code-specific info
  – The full processor context

• Filter decides what to do
  – EXCEPTION_EXECUTE_HANDLER
  – EXCEPTION_CONTINUE_SEARCH
  – EXCEPTION_CONTINUE_EXECUTION
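
A filter is ordinary code that inspects the exception and returns one of the three dispositions. The sketch below is portable C so that it compiles without a Windows toolchain: the three disposition values and the access-violation code match the Windows headers, while the `region` type, the `filter` signature, and passing the faulting address as a plain argument are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as the Windows headers. */
#define EXCEPTION_EXECUTE_HANDLER      1
#define EXCEPTION_CONTINUE_SEARCH      0
#define EXCEPTION_CONTINUE_EXECUTION (-1)
#define STATUS_ACCESS_VIOLATION 0xC0000005u

/* Hypothetical: the buffer this code expects it might fault on. */
typedef struct { uintptr_t base; uintptr_t size; } region;

/* Handle only access violations that land inside the expected region. */
static int filter(uint32_t code, uintptr_t fault_addr, const region *r) {
    assert(r != NULL);
    if (code != STATUS_ACCESS_VIOLATION)
        return EXCEPTION_CONTINUE_SEARCH;   /* not ours; try outer frames */
    /* Unsigned subtraction wraps for addresses below base, so one
       comparison covers both ends of the range. */
    if (fault_addr - r->base < r->size)
        return EXCEPTION_EXECUTE_HANDLER;   /* run our except block */
    return EXCEPTION_CONTINUE_SEARCH;
}
```

Returning EXCEPTION_CONTINUE_SEARCH for everything the filter does not recognise keeps unrelated bugs visible to outer frames and the top-level filter.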

Structured Exception Handling

• On x86, TEB points to stack of EXCEPTION_REGISTRATION_RECORD
  – auto structs, pointing to handler code
  – pushed by function prolog
  – popped by function epilog

• On exception, RtlDispatchException() walks the list.
  – Runs the filters to figure out what to do
  – Calls handler functions
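
The list walk can be modelled as a loop over a singly linked list, innermost frame first. This portable C sketch is a simplification: `reg_record`, `dispatch_exception`, and the two sample filters are hypothetical, and each record stores a filter directly, whereas the real records point at handler code that runs the filters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXCEPTION_EXECUTE_HANDLER      1
#define EXCEPTION_CONTINUE_SEARCH      0
#define EXCEPTION_CONTINUE_EXECUTION (-1)

typedef int (*filter_fn)(uint32_t code);

/* Hypothetical registration record: the head is the innermost frame. */
typedef struct reg_record {
    struct reg_record *next;   /* next outer frame, NULL at the top */
    filter_fn filter;
} reg_record;

typedef enum { DISPATCH_HANDLED, DISPATCH_RESUMED, DISPATCH_UNHANDLED } dispatch_result;

/* Ask each frame's filter in turn what to do with the exception. */
static dispatch_result dispatch_exception(const reg_record *head, uint32_t code,
                                          const reg_record **handler_out) {
    assert(handler_out != NULL);
    *handler_out = NULL;
    for (const reg_record *r = head; r != NULL; r = r->next) {
        int disposition = r->filter(code);
        if (disposition == EXCEPTION_EXECUTE_HANDLER) {
            *handler_out = r;          /* unwind to this frame's handler */
            return DISPATCH_HANDLED;
        }
        if (disposition == EXCEPTION_CONTINUE_EXECUTION)
            return DISPATCH_RESUMED;   /* retry the faulting instruction */
        /* EXCEPTION_CONTINUE_SEARCH: keep walking outward */
    }
    return DISPATCH_UNHANDLED;         /* nobody claimed it */
}

/* Sample filters for the usage example. */
static int filter_access_violation(uint32_t code) {
    return code == 0xC0000005u ? EXCEPTION_EXECUTE_HANDLER
                               : EXCEPTION_CONTINUE_SEARCH;
}

static int filter_pass(uint32_t code) {
    (void)code;
    return EXCEPTION_CONTINUE_SEARCH;
}
```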

Structured Exception Handling

• On x86, there’s some overhead with pushing and popping the registration record

• On ia64, there is no overhead
  – Stack traces are reliable
  – It’s always possible to look up the handler

• Exception handling is very slow
  – Especially on ia64

• Used only for truly exceptional conditions

Structured Exception Handling

• Used in kernel mode too!
  – Most user mode access will just work
  – Still need to validate address ranges & data
  – Works great for SMP when another thread might be in the middle of modifying the address space
  – Expected read exceptions are returned as status codes from system calls
  – Expected writes are returned as SUCCESS
  – Unexpected => buggy kernel => blue screen

Top-level Exception Filter

• Top frame on each thread defines a catchall exception filter

• Top-level exception filter:
  – Notifies the debugger (if being debugged)
  – Launches a just-in-time debugger (if set up)
  – Loads faultrep.dll to report the failure

Faultrep.dll

• faultrep.dll offers to report the failure back to Microsoft

• We analyze the failures
  – A significant number are recognized instantly; we can tell the user what happened and how to fix it.
  – The others go through the standard triage process; developers analyze the dumps and figure out what happened.

OCA

• 67 million machines running XP

• Tens of thousands of drivers

• Over 100 drivers on any given machine

• One bug in one driver => Crash

• A significant number of crashes come from third-party drivers (some of which ship on the CD)

• Lots of different problems, though

Driver Verifier

• Controlled by verifier.exe

• Places allocations in special pool
  – Detects allocation overruns & use after free

• Validates some behaviors
  – IRQL: touching paged memory?
  – DMA buffers

• Can inject failures—useful for testing behavior under sub-optimal conditions

Stress

• Every night, a couple hundred machines run stress on the latest build

• Stress exercises filesystems, memory, GUI, scheduler, &c, trying to uncover low-memory handling problems and race conditions

• Every morning, the stress test team triages failed machines

• Developers debug the failures

Questions?