Windows CE Real-Time Performance Architecture John Hatch Program Manager for CE Kernel Microsoft...
Windows CE Real-Time Performance Architecture
  John Hatch
  Program Manager for CE Kernel
  Microsoft Corporation
Agenda
  Real-Time Overview
  Interrupt Model
  Features
  Taking Control
  Measurement Tools
Real-Time Overview
  Real time
    Applications where specific timings are requested
  Hard real time
    Applications where the system fails if timings are not met
  Soft real time
    Applications where the system tolerates large latencies
  Actual timing requirements are system-specific
Real Time Defined By OMAC
  [Figure: chart of cycle time (500 µs, 1 ms, 5 ms, 10 ms, 20 ms, 100 ms) against cycle variation or jitter (0, 100 µs, 1,000 µs, 5,000 µs, 10,000 µs), divided into Hard Real-Time and Soft Real-Time regions. Labeled items: Windows CE 2.x, Windows NT, Windows CE .NET, and "90% Apps".]
  OMAC represents the industrial automation community
Real World Example
  Consumers wanted to know if CE is HARD real-time
  Wanted to know if CE was capable of running radio and UI
  Concerned that CE was not HARD real-time enough to meet the requirements
  Requirements
    Run cellular radio DSP
    Meet “tight” timing requirements
    ARM9 250 MHz
    Full Windows CE UI
    And play video
Real World Timing Requirements
  So what were the actual requirements?
    Interrupt every 4.6 ms
    Allowable jitter < 0.5 ms
  [Figure: timeline labeled "Actual Application Requirements", showing an interrupt every 4.6 ms with 0.5 ms jitter.]
Windows CE Test Results
  Response time test using the following configuration
    Samsung SMDK2410 development board
    200 MHz ARM with 16x16 cache
    Windows CE 5.0 with full UI
    Running a WMV video
  Windows CE Real-Time Test Results (time in microseconds, µs)
             ISR starts   IST starts
  Minimum    1.2 µs       31.7 µs
  Average    3.3 µs       67.2 µs
  Maximum    13.3 µs      103.0 µs
What We Learned
  In terms of the 0.5 ms jitter alone
    CE’s longest ISR response time was 13.3 µs
      About 2.7% of max allowed
    CE’s longest IST response time was 103 µs
      20.6% of max allowed
  Conclusion
    CE’s response time was well within the requirements
    Project went ahead and is progressing well
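
The percentages on this slide can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: the function name `percent_of_budget` is made up for this example, and the inputs are the maximum ISR and IST latencies and the 0.5 ms jitter allowance quoted in the slides.

```python
def percent_of_budget(latency_us: float, jitter_budget_us: float) -> float:
    """Return a measured latency as a percentage of the allowed jitter."""
    if jitter_budget_us <= 0:
        raise ValueError("jitter budget must be positive")
    return 100.0 * latency_us / jitter_budget_us


JITTER_BUDGET_US = 500.0  # 0.5 ms allowable jitter, from the requirements slide
MAX_ISR_US = 13.3         # longest measured ISR response time
MAX_IST_US = 103.0        # longest measured IST response time

isr_share = percent_of_budget(MAX_ISR_US, JITTER_BUDGET_US)
ist_share = percent_of_budget(MAX_IST_US, JITTER_BUDGET_US)
print(f"ISR uses {isr_share:.2f}% of the jitter budget")
print(f"IST uses {ist_share:.2f}% of the jitter budget")
```

The ISR share works out to 2.66% and the IST share to 20.60%, so both worst cases leave most of the allowance unused.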
Agenda
  Real-Time Overview
  Interrupt Model
  Features
  Taking Control
  Measurement Tools
Definitions
  Interrupt
    Hardware signal indicating an event has happened and needs to be serviced
  Latency
    The time from when the interrupt occurred to when the event is serviced
  Jitter
    Range of allowable variation in service time
Threads, Processes, And Drivers
  Thread
    A unit of execution
    A piece of code that can be scheduled to run by the kernel
    May be launched by a process or a driver
  Process
    A collection of threads with a common execution environment
    A process has at least one thread
    Launched from an executable file
    Can create threads to handle interrupts
  Driver
    A DLL (dynamic-link library) loaded into the device manager process
    Supports the Device I/O Control Interface
    Can create threads to handle interrupts
ISRs And ISTs
  Interrupt Service Routine (ISR)
    A piece of code loaded into the kernel
    Assigned to a particular IRQ
    Called immediately to handle the hardware interrupt
    Should be written to run quickly with few outside dependencies
    Can be chained together if multiple devices might use the same IRQ
    Notifies the kernel which IST should run
  Interrupt Service Thread (IST)
    A thread registered to handle an interrupt
    Can be created by either a process or a driver
    Scheduled like any other thread on the system
    Should be written to do the bulk of the interrupt handling work
ISRs And ISTs Work Together
  ISRs and ISTs usually work as pairs
    ISR handles the critical work
    IST handles the bulk of the work
  They synchronize by using an Event Object
    The IST creates an Event Object
    Uses the API WaitForSingleObject to sit and wait on that object to be signaled
    The ISR tells the kernel which object to signal
    Which unblocks the IST and makes it runnable
    If the IST is the highest priority runnable thread, it will get scheduled to run immediately
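
The ISR/IST handshake can be imitated on a desktop with ordinary threads. The sketch below is a rough analogy, not Windows CE code: `threading.Event` stands in for the event object, `wait()` plays the role of WaitForSingleObject, and the names `isr`, `ist`, and `interrupt_done` are invented for the illustration. On CE the kernel signals the event on the ISR's behalf; here the stand-in ISR sets it directly.

```python
import threading

interrupt_event = threading.Event()  # stands in for the event object the IST creates
interrupt_done = threading.Event()   # stands in for the IST finishing its servicing
stop_requested = threading.Event()   # lets the demo shut the IST down cleanly
latest_irq = []                      # data the stand-in ISR leaves for the IST
serviced = []                        # log of interrupts the IST has handled


def isr(irq: int) -> None:
    """Critical work only: note which IRQ fired, then signal the event."""
    latest_irq.append(irq)
    interrupt_event.set()


def ist() -> None:
    """Bulk of the work: block until signaled, service, acknowledge."""
    while True:
        interrupt_event.wait()       # plays the role of WaitForSingleObject
        interrupt_event.clear()
        if stop_requested.is_set():
            break
        serviced.append(latest_irq.pop())
        interrupt_done.set()


worker = threading.Thread(target=ist)
worker.start()
for irq in (5, 7, 5):                # hypothetical IRQ numbers
    isr(irq)
    interrupt_done.wait()            # wait until the IST has serviced this one
    interrupt_done.clear()
stop_requested.set()
interrupt_event.set()                # wake the IST so it can exit
worker.join()
print(serviced)
```

The split mirrors the slide: the ISR does almost nothing beyond signaling, and all servicing happens in the thread that was blocked on the event.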
Priority Levels
  Windows CE 5.0 has 256 levels of priority
  Level 0 is the highest and 255 is the lowest
    The old CE model of 8 levels now maps to the lowest 8 of the new model
  The default level for a thread is 252
  Levels 0 through 248 can be reserved by OEM
  Levels             Description
  0 through 96       Real-time above drivers
  97 through 152     Default used by CE device drivers
  153 through 247    Real-time below drivers
  248 through 255    Non-real-time priorities
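
The band table lends itself to a small lookup helper. This is a sketch with made-up names (`describe_priority`, `legacy_to_new`); the band boundaries are copied from the table above, and the order-preserving mapping of the old 8 levels onto 248 through 255 is an assumption drawn from the statement that they map to the lowest 8 levels.

```python
PRIORITY_BANDS = [
    (0, 96, "Real-time above drivers"),
    (97, 152, "Default used by CE device drivers"),
    (153, 247, "Real-time below drivers"),
    (248, 255, "Non-real-time priorities"),
]


def describe_priority(level: int) -> str:
    """Return the band description for a priority level (0 is highest)."""
    if not 0 <= level <= 255:
        raise ValueError("priority must be in the range 0..255")
    for low, high, description in PRIORITY_BANDS:
        if low <= level <= high:
            return description
    raise AssertionError("bands are expected to cover 0..255")


def legacy_to_new(old_level: int) -> int:
    """Map an old 8-level priority (0..7) onto the lowest 8 new levels.

    Assumes the mapping preserves order, so old level 0 becomes 248.
    """
    if not 0 <= old_level <= 7:
        raise ValueError("legacy priority must be in the range 0..7")
    return 248 + old_level


print(describe_priority(100))
print(legacy_to_new(3), describe_priority(legacy_to_new(3)))
```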
Scheduler
  Is responsible for determining which thread will run
    Has a queue of threads for each priority level
    Will always schedule the first thread at the highest priority level
  A thread gets to run for a set length of time, called a quantum
    Typically 100 milliseconds
    A quantum of 0 means the quantum never runs out
      The thread can run until blocked or interrupted
  A thread runs until:
    Its quantum runs out
    It is interrupted by a higher priority thread
    It is blocked by resource contention
      Such as access to a critical section or a mutex
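
A toy model can make the "one queue per priority level" rule concrete. The class below is a sketch, not the CE scheduler: the thread names and priority values are hypothetical, and moving a thread to the back of its queue when its quantum expires is an assumption of this model.

```python
from collections import deque

NUM_LEVELS = 256  # priority 0 is the highest, 255 the lowest


class Scheduler:
    """Toy scheduler: one FIFO queue of runnable threads per priority level."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_LEVELS)]

    def make_runnable(self, name, priority):
        self.queues[priority].append(name)

    def block(self, name, priority):
        """Remove a thread that is waiting on a resource or an event."""
        self.queues[priority].remove(name)

    def quantum_expired(self, priority):
        """Move the thread at the head of the queue to the back (assumed policy)."""
        self.queues[priority].rotate(-1)

    def pick_next(self):
        """Return the first thread at the highest non-empty priority level."""
        for queue in self.queues:      # index 0 (highest priority) first
            if queue:
                return queue[0]
        return None


sched = Scheduler()
sched.make_runnable("worker_a", 200)
sched.make_runnable("worker_b", 200)
sched.make_runnable("ui", 251)
first = sched.pick_next()       # head of the level-200 queue
sched.quantum_expired(200)
second = sched.pick_next()      # the other level-200 thread gets its turn
sched.make_runnable("ist", 100)
third = sched.pick_next()       # a higher priority thread preempts
sched.block("ist", 100)
fourth = sched.pick_next()      # back to level 200 once the IST blocks
print(first, second, third, fourth)
```

In this model the "ui" thread never runs while any level-200 thread is runnable, which is the starvation risk that comes with strict priority scheduling.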
Fitting It All Together
  1. Interrupt occurs
  2. Interrupt handler calls registered ISR
  3. ISR runs, tells kernel which event to signal
  4. Kernel signals event, IST becomes runnable
  5. Scheduler runs the IST
  6. IST runs and resets the interrupt
Interrupt Architecture
  [Figure: interrupt timeline across the HW, OAL, Kernel, and Thread layers. Labels: ISR, ISH, KCall + Scheduler (SetEvent), IST, ISR Latency, IST Latency. Interrupt-enable states shown: All, All Higher-Priority Int. Enabled, All Except ID.]
Latency Behavior
  [Figure: two timelines, "Maximum ISR Latency Path" and "Maximum IST Latency Path", across the OAL, Kernel, and thread layers. Each shows a normal thread, the interrupt, ISH, ISR, Scheduler, and IST, with the ISR Latency and IST Latency intervals marked. Legend: Interrupts Disabled, Preemption Disabled. The ISR path is annotated "Int Off"; the IST path is annotated "KCall".]
Where Latency Occurs
  For an ISR
    Time required for the kernel to vector to the ISR handler (normal)
      Saving registers, etc.
    The amount of time that interrupts are turned off (variation)
  For an IST
    Time to schedule a thread (normal)
    Time spent in a KCall (variation)
      KCall = Kernel code executing with pre-emption disabled
Worst Case IST Latency
  General case
    The kernel is in the thread scheduler KCall and takes an IRQ that will trigger a different IST
    Software-assisted TLB/cache miss on the IST thread
Improvements To Latency
  Non-preemptable code reduced
    Large KCalls split apart and state saved to resume correctly
    Reduces the latency for an IST
  Kernel data structures moved to statically mapped virtual addresses
    This avoids any TLB misses associated with accessing their data
  Special-cased ISTs
    An event registered for an IST can only be used in WaitForSingleObject
  New priority inversion model reduces the upper bounds
    Was a large KCall
Agenda
  Real-Time Overview
  Interrupt Model
  Features
  Taking Control
  Measurement Tools
Nested Interrupts
  Higher priority ISRs can preempt lower priority ISRs
  Based on support by the CPU, additional hardware, and/or OEM code
  ARM
    Uses a vectored interrupt table
    Single CPU interrupt level with an interrupt register
      No built-in concept of priority IRQ
      Except FIQ
    Interrupts are not turned on before entering ISR
      OEM can re-enable CPU interrupt
    OEMs can prioritize the interrupts with bit masks to turn on and off the different interrupts
Shared Interrupts
  The hardware design might attach several devices to the same interrupt line
  Multiple ISRs can be chained together to handle shared interrupts
  Each ISR in turn determines if it can handle the interrupt
    If it can, it does its work and either completes the interrupt or returns the SYSINTR indicating which IST is to run
    If not, it returns SYSINTR_CHAIN, indicating the kernel should try the next ISR in the chain
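
The chaining rule can be sketched as a loop over handlers. Everything below is a stand-in: the string sentinel for SYSINTR_CHAIN is a placeholder rather than the real constant's value, the device names and "SYSINTR_NIC" / "SYSINTR_UART" identifiers are hypothetical, and returning SYSINTR_CHAIN when no handler claims the interrupt is a choice made for this model.

```python
SYSINTR_CHAIN = "SYSINTR_CHAIN"  # placeholder sentinel, not the real numeric value


def make_isr(device_name, pending_devices, sysintr):
    """Build a stand-in ISR that claims the interrupt only if its device raised it."""
    def isr():
        if device_name not in pending_devices:
            return SYSINTR_CHAIN          # not ours: let the next ISR try
        pending_devices.discard(device_name)
        return sysintr                    # ours: tell the kernel which IST to run
    return isr


def dispatch(chain):
    """Walk the chain until one ISR claims the shared interrupt."""
    for isr in chain:
        result = isr()
        if result != SYSINTR_CHAIN:
            return result
    return SYSINTR_CHAIN                  # nobody claimed it (model-specific choice)


pending = {"uart"}                        # only the hypothetical UART is asserting the line
chain = [
    make_isr("nic", pending, "SYSINTR_NIC"),
    make_isr("uart", pending, "SYSINTR_UART"),
]
first_result = dispatch(chain)
second_result = dispatch(chain)           # nothing pending any more
print(first_result, second_result)
```

Because each ISR on the chain runs until one claims the interrupt, the devices placed later on a shared line see extra ISR latency, which is worth keeping in mind when ordering the chain.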
Priority Inheritance
  Higher priority threads can get stuck waiting for a lower priority thread to release a resource
    Such as a critical section, semaphore, or mutex
    Causes priority inversion
  Kernel detects priority inversion and handles it with priority inheritance, or boosting
    The lower priority thread inherits the higher priority thread’s priority
    Its quantum is set to 0, which lets it run to completion
  Supports only one level of inheritance
    Kernel will only boost one thread
    If the boosted thread is in turn blocked by a third thread, the third thread is not boosted
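
The one-level rule is easy to get wrong when reasoning about lock chains, so a small model may help. This sketch follows only what the slide states; the class, function names, and priority values are hypothetical, and the real kernel's bookkeeping is certainly more involved.

```python
class Thread:
    def __init__(self, name, priority, quantum_ms=100):
        self.name = name
        self.base_priority = priority
        self.priority = priority          # lower number means higher priority
        self.base_quantum_ms = quantum_ms
        self.quantum_ms = quantum_ms
        self.blocked_on = None            # owner of the resource this thread waits for


def block_on(waiter, owner):
    """waiter blocks on a resource held by owner; boost owner on inversion."""
    waiter.blocked_on = owner
    if waiter.priority < owner.priority:
        owner.priority = waiter.priority  # owner inherits the waiter's priority
        owner.quantum_ms = 0              # run-to-completion while boosted
    # One level only: owner.blocked_on is deliberately not followed.


def release(owner, waiter):
    """owner releases the resource; it drops back to its base settings."""
    waiter.blocked_on = None
    owner.priority = owner.base_priority
    owner.quantum_ms = owner.base_quantum_ms


high = Thread("high", 10)
holder = Thread("holder", 200)
third = Thread("third", 150)

block_on(holder, third)   # holder waits on third: no inversion, nothing changes
block_on(high, holder)    # high waits on holder: holder is boosted to 10
boosted = (holder.priority, holder.quantum_ms, third.priority)
release(holder, high)
restored = (holder.priority, holder.quantum_ms)
print(boosted, restored)
```

In this scenario "third" stays at priority 150 even though it indirectly blocks the priority-10 thread, which is why the later advice to avoid priority inversion conditions matters for nested locks.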
Thread Quantum
  Per-thread quantum
  Default set by the OEM in the OAL
    dwDefaultThreadQuantum
  APIs to set quantum
    CeSetThreadQuantum, CeGetThreadQuantum
  Quantum of 0 sets thread to run-to-completion
    At any priority
    Preempted only by higher priority threads
System Tick
  1 ms timer tick in normal mode
  Tick interrupt causes a reschedule
    Will run next highest priority runnable thread
  Sleep(N) will generally wake up in N to N + 1 ms
  In Idle mode, system tick is reset to next scheduled event
  On system tick, check for reschedule or nop
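
The "N to N + 1 ms" figure for Sleep(N) follows from wake-ups happening only on tick boundaries. The sketch below is a simplified model, assuming the thread wakes on the first 1 ms tick at or after its requested wake time; the function names are invented, and times are kept in whole microseconds to avoid rounding surprises.

```python
TICK_US = 1000  # 1 ms system tick, as stated on the slide


def wake_time_us(call_time_us: int, sleep_ms: int) -> int:
    """Return the first tick boundary at or after call time + sleep duration."""
    earliest = call_time_us + sleep_ms * TICK_US
    ticks = -(-earliest // TICK_US)       # integer ceiling division
    return ticks * TICK_US


def actual_delay_us(call_time_us: int, sleep_ms: int) -> int:
    """Return how long the thread really sleeps under this model."""
    return wake_time_us(call_time_us, sleep_ms) - call_time_us


on_tick = actual_delay_us(0, 5)       # called exactly on a tick boundary
early = actual_delay_us(250, 5)       # called 250 µs into a tick
late = actual_delay_us(999, 5)        # called just before the next tick
print(on_tick, early, late)
```

Under this model the delay is always at least N ms and strictly less than N + 1 ms, with the exact value depending on where in the tick period the call lands.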
Full Kernel Mode
  All threads are running in kernel mode
    Security checks are disabled
    No need to call SetKMode
  Entire system is open to all processes
    All statically mapped virtual addresses
    Virtual protection is still in place
  Optimizations for high-traffic functions
    For example, a network router box
Agenda
  Real-Time Overview
  Interrupt Model
  Features
  Taking Control
  Measurement Tools
Taking Control
  Real-time developers want to retain control at all times
    Control of the schedule
  Control is managed by understanding:
    The hardware
    The OS
  Writing code to make optimal use of both features is key to real-time performance
Understanding The Hardware
  Accessing hardware can delay ISRs and ISTs
  Same CPUs on different boards can produce a wide range of results
  Devices and associated drivers can produce a wide range of delays
Understand The Hardware
  Understand device access
    I/O-based access may incur a penalty
  Certain devices can lock out a bus for many microseconds
    For example, on x86 avoid access to the CMOS RTC
    Use a software RTC
Understanding The OS
  Priority-based preemptive thread scheduler
  Virtual memory system
    Provides protection
    There is some overhead
  Synchronization objects
    Critical Sections, Mutexes, Semaphores, MSQueues
    Can cause your thread to block
  System call interactions
  Demand paging of non-XIP code
  Stack memory reclaiming
    Can delay thread execution
  Going Idle can delay threads
Gaining Control
  Separate User Interface operations from real-time threads
    Keeping UI calls out of the real-time threads prevents them from being blocked by the UI
  User Interface involves many interactions across the OS
    It can block threads
  Performance of UI threads is affected by all UI applications
  Use shared buffers or MSQueues to communicate between UI and RT threads
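
One way to picture the UI/RT separation is a bounded queue that the real-time side writes to without ever waiting for space. The sketch below uses Python's standard `queue.Queue` purely as a desktop stand-in for a shared buffer or message queue; the function names and the drop-on-full policy are assumptions of this example, not something the slides prescribe.

```python
import queue

STATUS_QUEUE_DEPTH = 4                    # hypothetical size, fixed up front
status_queue = queue.Queue(maxsize=STATUS_QUEUE_DEPTH)
dropped = 0                               # samples discarded because the UI fell behind


def rt_publish(sample) -> None:
    """Called from the real-time side: never waits for the UI to make room."""
    global dropped
    try:
        status_queue.put_nowait(sample)
    except queue.Full:
        dropped += 1                      # lose the sample rather than stall the RT thread


def ui_drain() -> list:
    """Called from the UI side: collect whatever has been published so far."""
    items = []
    while True:
        try:
            items.append(status_queue.get_nowait())
        except queue.Empty:
            return items


for sample in range(6):                   # the RT side outpaces the UI
    rt_publish(sample)
received = ui_drain()
print(received, dropped)
```

Dropping data when the queue is full trades completeness of the display for predictability of the real-time thread; a design that must not lose samples would need a larger preallocated buffer instead.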
Gaining Control
  Memory and objects
    Preallocate all memory
    Preallocate all threads, sync objects
  Thread scheduling
    Set the appropriate priority
    Set the appropriate quantum
      Use a quantum of 0 to ‘run-to-completion’
  Use DisableThreadLibraryCalls
    Prevent thread notifications to DLLs
Gaining Control
  Avoid making system calls on your real-time thread
    Don’t use SetTimer as a real-time timer
  Avoid priority inversion conditions
    Use Event tracking/Kernel Tracker
  Use dwNKMaxPrioNoScav to prevent stack space recovery from real-time threads
  Trusted Security model and real-time performance do not mix
    Security checks slow down untrusted applications
    Launch RT threads from a Trusted process or driver
Gaining Control
  Disable Idle processing
    When OS calls OEMIdle, return immediately instead of sleeping the device
  Disable demand paging
    LoadDriver
      Locks in a single DLL
    Set configuration in CONFIG.BIB ROMFLAGS
      Set to 0x0001
      Locks in all modules
    File system block driver can disallow demand paging
      Don’t set the flag DISK_INFO_FLAG_PAGEABLE
Agenda
  Real-Time Overview
  Interrupt Model
  Features
  Taking Control
  Measurement Tools
ILTiming
  Software-based real-time measurement tool
  Measures both ISR and IST latencies
    ISR latency
      From IRQ to ISR
    IST latency
      From the end of the ISR to the start of the IST
  Enabled for all sample platforms
  Varying system loads
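
The two latency definitions above translate directly into a summary over timestamped samples. The sketch below is not ILTiming and does not reproduce its output format; the function name, the tuple layout, and the sample timestamps are all made up to show how minimum, average, and maximum figures could be derived.

```python
from statistics import mean


def latency_summary(samples):
    """Summarize ISR and IST latencies.

    Each sample is (irq_us, isr_start_us, isr_end_us, ist_start_us).
    ISR latency runs from the IRQ to the start of the ISR; IST latency
    runs from the end of the ISR to the start of the IST.
    """
    isr = [isr_start - irq for irq, isr_start, _, _ in samples]
    ist = [ist_start - isr_end for _, _, isr_end, ist_start in samples]

    def stats(values):
        return {"min": min(values), "avg": mean(values), "max": max(values)}

    return {"isr": stats(isr), "ist": stats(ist)}


# Hypothetical timestamps in microseconds, not measured data.
samples = [
    (0, 2, 5, 40),
    (1000, 1003, 1007, 1070),
    (2000, 2004, 2009, 2109),
]
summary = latency_summary(samples)
print(summary["isr"])
print(summary["ist"])
```

For real-time work the maximum is the figure that matters, so a measurement run should be long enough, and loaded enough, to have a chance of catching the worst case.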
OSBench
  Scheduler performance-timing tests
  Enables you to determine how long it takes to perform basic kernel tasks such as:
    Acquire or release a critical section
    Wait or signal an event
    Create a semaphore or mutex
    Yield a thread
    Call system APIs
Kernel Tracker
  Shows interaction between processes, threads, and interrupts
  Track interrupts
  TLB misses
  Priority inversion
  Thread state such as running, blocked, sleeping, and migrating
Summary
  Windows CE is real-time
  Windows CE provides all the functionality needed to qualify as a real-time operating system
  Windows CE provides tools to optimize your real-time platform
Real World
  ZMP Nuvo Robot
Real World
  KUKA Roboter
    Launching CeWin to help customers build blended real-time solutions based on Windows XP using Windows CE as the real-time scheduler
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions,
it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
John.Hatch @ Microsoft.com