3.2. Windows Trap Dispatching, Interrupts, Synchronization
-
Upload
aileen-benson -
Category
Documents
-
view
248 -
download
0
description
Transcript of 3.2. Windows Trap Dispatching, Interrupts, Synchronization
Windows Operating System Internals - by David A. Solomon and Mark E. Russinovich with Andreas PolzeWindows Operating System Internals - by David A. Solomon and Mark E. Russinovich with Andreas Polze
Unit OS3: ConcurrencyUnit OS3: Concurrency
3.2. Windows Trap Dispatching, Interrupts, 3.2. Windows Trap Dispatching, Interrupts, SynchronizationSynchronization
2
Copyright NoticeCopyright Notice© 2000-2005 David A. Solomon and Mark Russinovich© 2000-2005 David A. Solomon and Mark Russinovich
These materials are part of the These materials are part of the Windows Operating Windows Operating System Internals Curriculum Development Kit,System Internals Curriculum Development Kit, developed by David A. Solomon and Mark E. developed by David A. Solomon and Mark E. Russinovich with Andreas PolzeRussinovich with Andreas Polze
Microsoft has licensed these materials from David Microsoft has licensed these materials from David Solomon Expert Seminars, Inc. for distribution to Solomon Expert Seminars, Inc. for distribution to academic organizations solely for use in academic academic organizations solely for use in academic environments (and not for commercial use)environments (and not for commercial use)
3
Roadmap for Section 3.2.Roadmap for Section 3.2.
Trap and Interrupt dispatchingTrap and Interrupt dispatching
IRQL levels & Interrupt PrecedenceIRQL levels & Interrupt Precedence
Spinlocks and Kernel SynchronizationSpinlocks and Kernel Synchronization
Executive SynchronizationExecutive Synchronization
4
Processes and ThreadsProcesses and ThreadsWhat is a process?What is a process?
Represents an instance of a running programRepresents an instance of a running programyou create a process to run a programyou create a process to run a program
starting an application creates a processstarting an application creates a process
Process defined by:Process defined by:
Address spaceAddress space
Resources (e.g. open handles)Resources (e.g. open handles)
Security profile (token)Security profile (token)
What is a thread?What is a thread?An execution context within a processAn execution context within a process
Unit of scheduling (threads run, processes don’t run)Unit of scheduling (threads run, processes don’t run)
All threads in a process share the same per-process address spaceAll threads in a process share the same per-process address spaceServices provided so that threads can synchronize access to shared resources Services provided so that threads can synchronize access to shared resources (critical sections, mutexes, events, semaphores)(critical sections, mutexes, events, semaphores)
All threads in the system are scheduled as peers to all others, without regard All threads in the system are scheduled as peers to all others, without regard to their “parent” processto their “parent” process
System callsSystem callsPrimary argument to CreateProcess is image file name (or command line)Primary argument to CreateProcess is image file name (or command line)
Primary argument to CreateThread is a function entry point addressPrimary argument to CreateThread is a function entry point address
Per-processaddress space
Systemwide Address Systemwide Address SpaceSpace
Thread
Thread
Thread
5
Kernel Mode Versus User ModeKernel Mode Versus User ModeA processor stateA processor state
Controls access to memoryControls access to memory
Each memory page is tagged Each memory page is tagged to show the required mode for to show the required mode for reading and for writingreading and for writing
Protects the system from Protects the system from the usersthe users
Protects the user (process) from Protects the user (process) from themselvesthemselves
System is not protected System is not protected from systemfrom system
Code regions are tagged Code regions are tagged “no write in any mode”“no write in any mode”
Controls ability to execute Controls ability to execute privileged instructionsprivileged instructions
A Windows abstractionA Windows abstractionIntel: Ring 0, Ring 3 Intel: Ring 0, Ring 3
Control flow (i.e.; a thread) ca Control flow (i.e.; a thread) ca change from user to kernel mode change from user to kernel mode and backand back
Does not affect schedulingDoes not affect scheduling
Thread context includes info Thread context includes info about execution mode (along about execution mode (along with registers, etc)with registers, etc)
PerfMon counters:PerfMon counters:““Privileged Time” and Privileged Time” and “User Time”“User Time”
4 levels of granularity: thread, 4 levels of granularity: thread, process, process, processor, systemprocessor, system
6
Getting Into Kernel ModeGetting Into Kernel ModeCode is run in kernel mode for one of three reasons:Code is run in kernel mode for one of three reasons:
1. Requests from user mode1. Requests from user modeVia the system service dispatch mechanismVia the system service dispatch mechanism
Kernel-mode code runs in the context of the requesting threadKernel-mode code runs in the context of the requesting thread
2. Interrupts from external devices2. Interrupts from external devicesWindows interrupt dispatcher invokes the interrupt service routineWindows interrupt dispatcher invokes the interrupt service routine
ISR runs in the context of the interrupted thread ISR runs in the context of the interrupted thread (so-called “arbitrary thread context”)(so-called “arbitrary thread context”)
ISR often requests the execution of a “DPC routine,” ISR often requests the execution of a “DPC routine,” which also runs in kernel modewhich also runs in kernel mode
Time not charged to interrupted threadTime not charged to interrupted thread
3. Dedicated kernel-mode system threads3. Dedicated kernel-mode system threadsSome threads in the system stay in kernel mode at all times Some threads in the system stay in kernel mode at all times (mostly in the “System” process)(mostly in the “System” process)
Scheduled, preempted, etc., like any other threadsScheduled, preempted, etc., like any other threads
7
Trap dispatchingTrap dispatching
Trap: processor‘s mechanism to capture executing threadTrap: processor‘s mechanism to capture executing threadSwitch from user to kernel modeSwitch from user to kernel mode
Interrupts – asynchronousInterrupts – asynchronous
Exceptions - synchronousExceptions - synchronous
Interruptdispatcher
Systemservice
dispatcher
Interruptserviceroutines
Interruptserviceroutines
Interruptserviceroutines
System services
System services
System services
Exceptiondispatcher
Exceptionhandlers
Exceptionhandlers
Exceptionhandlers
Virtual memorymanager‘s pager
Interrupt
System service call
HW exceptionsSW exceptions
Virtual addressexceptions
8
Interrupt DispatchingInterrupt Dispatching
Interrupt dispatch routine
Disable interrupts
Record machine state (trap frame) to allow resume
Mask equal- and lower-IRQL interrupts
Find and call appropriate ISR
Dismiss interrupt
Restore machine state (including mode and enabled interrupts)
Tell the device to stop interruptingInterrogate device state, start next operation on device, etc. Request a DPCReturn to caller
Interrupt service routine
interrupt !
user or kernel mode
codekernel mode
Note, no thread or process context switch!
9
Interrupt Precedence via IRQLs (x86)Interrupt Precedence via IRQLs (x86)IRQL = Interrupt Request LevelIRQL = Interrupt Request Level
the “precedence” of the interrupt with the “precedence” of the interrupt with respect to other interruptsrespect to other interrupts
Different interrupt sources have Different interrupt sources have different IRQLsdifferent IRQLs
not the same as IRQnot the same as IRQ
Passive/LowAPC
Dispatch/DPCDevice 1
.
.
.Profile & Synch (Srv 2003)
ClockInterprocessor Interrupt
Power failHigh
normal thread execution
Hardware interrupts
Deferrable software interrupts
012
302928
31
IRQL is also a state of the processorIRQL is also a state of the processor
Servicing an interrupt raises processor Servicing an interrupt raises processor IRQL to that interrupt’s IRQLIRQL to that interrupt’s IRQL
this masks subsequent interrupts at equal this masks subsequent interrupts at equal and lower IRQLsand lower IRQLs
User mode is limited to IRQL 0User mode is limited to IRQL 0
No waits or page faults at IRQL >= No waits or page faults at IRQL >= DISPATCH_LEVELDISPATCH_LEVEL
10
Interrupt processingInterrupt processing
Interrupt dispatch table (IDT)Interrupt dispatch table (IDT)Links to interrupt service routinesLinks to interrupt service routines
x86:x86:Interrupt controller interrupts processor (single line)Interrupt controller interrupts processor (single line)
Processor queries for interrupt vector; uses vector as index to IDT Processor queries for interrupt vector; uses vector as index to IDT
After ISR execution, IRQL is lowered to initial levelAfter ISR execution, IRQL is lowered to initial level
11
Interrupt objectInterrupt object
Allows device drivers to register ISRs for their devicesAllows device drivers to register ISRs for their devicesContains dispatch code (initial handler)Contains dispatch code (initial handler)
Dispatch code calls ISR with interrupt object as parameterDispatch code calls ISR with interrupt object as parameter(HW cannot pass parameters to ISR)(HW cannot pass parameters to ISR)
Connecting/disconnecting interrupt objects:Connecting/disconnecting interrupt objects:Dynamic association between ISR and IDT entryDynamic association between ISR and IDT entry
Loadable device drivers (kernel modules)Loadable device drivers (kernel modules)
Turn on/off ISRTurn on/off ISR
Interrupt objects can synchronize access to ISR dataInterrupt objects can synchronize access to ISR dataMultiple instances of ISR may be active simultaneously (MP machine)Multiple instances of ISR may be active simultaneously (MP machine)
Multiple ISR may be connected with IRQLMultiple ISR may be connected with IRQL
12
Predefined IRQLsPredefined IRQLs
HighHigh used when halting the system (via used when halting the system (via KeBugCheck()KeBugCheck()))
Power failPower fail originated in the NT design document, but has never been usedoriginated in the NT design document, but has never been used
Inter-processor interruptInter-processor interruptused to request action from other processor (dispatching a thread, used to request action from other processor (dispatching a thread, updating a processors TLB, system shutdown, system crash)updating a processors TLB, system shutdown, system crash)
ClockClockUsed to update system‘s clock, allocation of CPU time to threadsUsed to update system‘s clock, allocation of CPU time to threads
ProfileProfileUsed for kernel profiling (see Kernel profiler – Kernprof.exe, Res Kit)Used for kernel profiling (see Kernel profiler – Kernprof.exe, Res Kit)
13
Predefined IRQLs (contd.)Predefined IRQLs (contd.)
DeviceDeviceUsed to prioritize device interruptsUsed to prioritize device interrupts
DPC/dispatch DPC/dispatch andand APC APCSoftware interrupts that kernel and device drivers Software interrupts that kernel and device drivers generategenerate
PassivePassiveNo interrupt level at all, normal thread executionNo interrupt level at all, normal thread execution
14
IRQLs on 64-bit SystemsIRQLs on 64-bit Systems
Passive/LowAPC
Dispatch/DPCDevice 1
.
.Device n
Synch (Srv 2003)Clock
Interprocessor Interrupt/PowerHigh/Profile
012
1413
15
34
Passive/LowAPC
Dispatch/DPC & Synch (UP only)Correctable Machine Check
Device 1.
Device nSynch (MP only)
ClockInterprocessor Interrupt
High/Profile/Power
x64 IA64
12
15
Interrupt Prioritization & DeliveryInterrupt Prioritization & Delivery
IRQLs are determined as follows:IRQLs are determined as follows:x86 UP systems: IRQL = 27 - IRQx86 UP systems: IRQL = 27 - IRQ
x86 MP systems: bucketized (random)x86 MP systems: bucketized (random)
x64 & IA64 systems: IRQL = IDT vector number / 16x64 & IA64 systems: IRQL = IDT vector number / 16
On MP systems, which processor is chosen to deliver an interrupt?On MP systems, which processor is chosen to deliver an interrupt?By default, any processor can receive an interrupt from any deviceBy default, any processor can receive an interrupt from any device
Can be configured with IntFilter utility in Resource KitCan be configured with IntFilter utility in Resource Kit
On x86 and x64 systems, the IOAPIC (I/O advanced programmable On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at interrupt controller) is programmed to interrupt the processor running at the lowest IRQLthe lowest IRQL
On IA64 systems, the SAPIC (streamlined advanced programmable On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt controller) is configured to interrupt one processor for each interrupt sourceinterrupt source
Processors are assigned round robin for each interrupt vectorProcessors are assigned round robin for each interrupt vector
16
Software interruptsSoftware interrupts
Initiating thread dispatchingInitiating thread dispatchingDPC allow for scheduling actions when kernel is DPC allow for scheduling actions when kernel is deep within many layers of codedeep within many layers of code
Delayed scheduling decision, one DPC queue per Delayed scheduling decision, one DPC queue per processorprocessor
Handling timer expirationHandling timer expiration
Asynchronous execution of a procedure in Asynchronous execution of a procedure in context of a particular threadcontext of a particular thread
Support for asynchronous I/O operationsSupport for asynchronous I/O operations
17
Flow of InterruptsFlow of Interrupts
Peripheral Device Controller
CPU Interrupt Controller
CPU Interrupt Service Table
023
n
ISR Address
Spin Lock
Dispatch Code
Interrupt Object
Read from device
Acknowledge-Interrupt
Request DPC
Driver ISR
Raise IRQL
Lower IRQL
KiInterruptDispatch
Grab Spinlock
Drop Spinlock
18
031
Synchronization on SMP SystemsSynchronization on SMP Systems
Synchronization on MP systems use spinlocks to coordinate among the Synchronization on MP systems use spinlocks to coordinate among the processorsprocessors
Spinlock acquisition and release routines implement a one-owner-at-a-time Spinlock acquisition and release routines implement a one-owner-at-a-time algorithmalgorithm
A spinlock is either free, or is considered to be owned by a CPUA spinlock is either free, or is considered to be owned by a CPU
Analogous to using Windows API mutexes from user modeAnalogous to using Windows API mutexes from user mode
A spinlock is just a data cell in memoryA spinlock is just a data cell in memoryAccessed with a test-and-modify operation that is atomic across all processorsAccessed with a test-and-modify operation that is atomic across all processors
KSPIN_LOCK is an opaque data type, typedef’d as a ULONGKSPIN_LOCK is an opaque data type, typedef’d as a ULONG To implement synchronization, a single bit is sufficientTo implement synchronization, a single bit is sufficient
19
Kernel SynchronizationKernel Synchronization
Processor BProcessor A
do acquire_spinlock(DPC)until (SUCCESS)
begin remove DPC from queueend
release_spinlock(DPC)
do acquire_spinlock(DPC)until (SUCCESS)
begin remove DPC from queueend
release_spinlock(DPC)
.
.
.
.
.
.
Critical section
spinlock
DPC DPC
A spinlock is a locking primitive associatedwith a global data structure, such as the DPC queue
20
Try to acquire spinlock:Test, set, was set, loopTest, set, was set, loopTest, set, was set, loopTest, set, was set, loopTest, set, WAS CLEAR(got the spinlock!)Begin updating data
Try to acquire spinlock:Test, set, WAS CLEAR(got the spinlock!)Begin updating data that’s protected by the spinlock
(done with update)Release the spinlock:Clear the spinlock bit
Spinlocks in ActionSpinlocks in Action
CPU 1 CPU 2
21
Queued SpinlocksQueued Spinlocks
ProblemProblem: Checking status of spinlock via test-and-set : Checking status of spinlock via test-and-set operation creates bus contentionoperation creates bus contention
Queued spinlocks maintain queue of waiting processorsQueued spinlocks maintain queue of waiting processors
First processor acquires lock; other processors wait on First processor acquires lock; other processors wait on processor-local flagprocessor-local flag
Thus, busy-wait loop requires no access to the memory busThus, busy-wait loop requires no access to the memory bus
When releasing lock, the first processor’s flag is modifiedWhen releasing lock, the first processor’s flag is modifiedExactly one processor is being signaledExactly one processor is being signaled
Pre-determined wait orderPre-determined wait order
22
SMP Scalability ImprovementsSMP Scalability Improvements
Windows 2000: queued spinlocks Windows 2000: queued spinlocks !qlocks in Kernel Debugger!qlocks in Kernel Debugger
XP/2003:XP/2003:Minimized lock contention for hot locks (PFN or Page Frame Database) lock Minimized lock contention for hot locks (PFN or Page Frame Database) lock
Some locks completely eliminatedSome locks completely eliminatedCharging nonpaged/paged pool quotas, allocating and mapping system page table entries, Charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through charging commitment of pages, allocating/mapping physical memory through AWE functions AWE functions
New, more efficient locking mechanism (pushlocks)New, more efficient locking mechanism (pushlocks)Doesn’t use spinlocks when no contentionDoesn’t use spinlocks when no contention
Used for object manager and address windowing extensions (AWE) related locksUsed for object manager and address windowing extensions (AWE) related locks
Server 2003:Server 2003:More spinlocks eliminated (context swap, system space, commit)More spinlocks eliminated (context swap, system space, commit)
Further reduction of use of spinlocks & length they are heldFurther reduction of use of spinlocks & length they are held
Scheduling database now per-CPUScheduling database now per-CPUAllows thread state transitions in parallelAllows thread state transitions in parallel
23
WaitingWaitingFlexible wait callsFlexible wait calls
Wait for one or multiple objects in one callWait for one or multiple objects in one call
Wait for multiple can wait for “any” one or “all” at onceWait for multiple can wait for “any” one or “all” at once““All”: all objects must be in the signalled state concurrently to resolve the waitAll”: all objects must be in the signalled state concurrently to resolve the wait
All wait calls include optional timeout argumentAll wait calls include optional timeout argument
Waiting threads consume no CPU timeWaiting threads consume no CPU time
Waitable objects include:Waitable objects include:Events (may be auto-reset or manual reset; may be set or “pulsed”)Events (may be auto-reset or manual reset; may be set or “pulsed”)
Mutexes (“mutual exclusion”, one-at-a-time)Mutexes (“mutual exclusion”, one-at-a-time)
Semaphores (n-at-a-time)Semaphores (n-at-a-time)
TimersTimers
Processes and Threads (signalled upon exit or terminate)Processes and Threads (signalled upon exit or terminate)
Directories (change notification)Directories (change notification)
No guaranteed ordering of wait resolutionNo guaranteed ordering of wait resolutionIf multiple threads are waiting for an object, and only one thread is released (e.g. it’s a If multiple threads are waiting for an object, and only one thread is released (e.g. it’s a mutex or auto-reset event), which thread gets released is unpredictablemutex or auto-reset event), which thread gets released is unpredictable
Typical order of wait resolution is FIFO; however APC delivery may change this orderTypical order of wait resolution is FIFO; however APC delivery may change this order
24
Executive SynchronizationExecutive Synchronization
Waiting on Dispatcher Objects – outside the kernelWaiting on Dispatcher Objects – outside the kernel
Thread waitson an object
handle
Create and initialize thread object
Initialized
Ready
Transition
Waiting
Running
Terminated
Standby
Wait is complete;Set object to
signaled state
Interaction with thread scheduling
25
Interactions between Interactions between SSynchronization and ynchronization and TThread hread DDispatchingispatching
1.1. User mode thread waits on an event object‘s handleUser mode thread waits on an event object‘s handle
2.2. Kernel changes thread‘s scheduling state from ready to waiting and Kernel changes thread‘s scheduling state from ready to waiting and adds thread to wait-listadds thread to wait-list
3.3. Another thread sets the eventAnother thread sets the event
4.4. Kernel wakes up waiting threads; variable priority threads get priority Kernel wakes up waiting threads; variable priority threads get priority boostboost
5.5. Dispatcher re-schedules new thread – it may preempt running thread Dispatcher re-schedules new thread – it may preempt running thread it it has lower priority and issues software interrupt to initiate context it it has lower priority and issues software interrupt to initiate context switchswitch
6.6. If no processor can be preempted, the dispatcher places the ready If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled laterthread in the dispatcher ready queue to be scheduled later
26
What signals an object?What signals an object?
Dispatcher object
System events and resultingstate change
Effect of signaled stateon waiting threads
nonsignaled signaled
Owning thread releases mutex
Resumed thread acquires mutex
Kernel resumes one waiting thread
Mutex (kernel mode)
nonsignaled signaled
Owning thread or otherthread releases mutex
Resumed thread acquires mutex
Kernel resumes one waiting thread
Mutex (exported to user mode)
nonsignaled signaled
One thread releases thesemaphore, freeing a resource
A thread acquires the semaphore.More resources are not available
Kernel resumes one or more waiting threads
Semaphore
27
What signals an object? (contd.)What signals an object? (contd.)
Dispatcher object System events and resultingstate change
Effect of signaled stateon waiting threads
nonsignaled signaled
A thread sets the event
Kernel resumes one or more threads
Kernel resumes one or more waiting threads
Event
nonsignaled signaled
Dedicated thread sets oneevent in the event pair
Kernel resumes theother dedicated thread
Kernel resumes waitingdedicated thread
Event pair
nonsignaled signaled
Timer expires
A thread (re) initializes the timer
Kernel resumes all waiting threads
Timer
28
A thread reinitializesthe thread object
What signals an object? (contd.)What signals an object? (contd.)
Dispatcher object System events and resultingstate change
Effect of signaled stateon waiting threads
nonsignaled signaled
IO operation completes
Thread initiates waiton an IO port
Kernel resumes waitingdedicated thread
File
nonsignaled signaled
Process terminates
A process reinitializesthe process object
Kernel resumes all waiting threads
Process
nonsignaled signaled
Thread terminatesKernel resumes all
waiting threadsThread
29
Further ReadingFurther Reading
Mark E. Russinovich and David A. Solomon, Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004. Microsoft Press, 2004.
Chapter 3 - System MechanismsChapter 3 - System MechanismsTrap Dispatching (from pp. 85)Trap Dispatching (from pp. 85)
Synchronization (from pp. 149)Synchronization (from pp. 149)
Kernel Event Tracing (from pp. 175)Kernel Event Tracing (from pp. 175)