Operating Systems WT 2019/20 - uni-potsdam.de · Operating Systems 16 Managing Physical Memory...
Transcript of Operating Systems WT 2019/20 - uni-potsdam.de · Operating Systems 16 Managing Physical Memory...
Operating Systems 2
● Mapping via page table entries● Indirect relationship between virtual pages and physical memory
Address Translation
Virtualpages
Physical memory
Page tableentries
10 10 122231 21 11 012
Page directoryindex
Page tableindex Byte index
x86:
user
system
user
system
Operating Systems 3
32-bit x86 Virtual Address Space● 2 GB per-process
● Address space of one process is not directly reachable from other processes
● 2 GB system-wide● The operating system is loaded here,
and appears in every process’s address space
● The operating system is not a process (though there are processes that do things for the OS, more or less in “background”)
● 2 GB or addressable Memory per process is a serious limitation (think compilers, browsers)
Code: EXE/DLLsData: EXE/DLL
static storage, per-thread user mode stacks, process
heaps, etc.
Code: EXE/DLLsData: EXE/DLL
static storage, per-thread user mode stacks, process
heaps, etc.
00000000
7FFFFFFF
Code: NTOSKRNL, HAL, driversData: kernel stacks,
File system cacheNon-paged pool,Paged pool
Code: NTOSKRNL, HAL, driversData: kernel stacks,
File system cacheNon-paged pool,Paged poolFFFFFFFF
80000000
Process page tables,hyperspace
C0000000
Unique per process,
accessible in user or kernel
mode
System wide,accessible
only in kernel mode
Per process, accessible only in kernel mode
Operating Systems 4
Virtual Address Space Allocation● Virtual address space is sparse
● Address spaces contain reserved, committed, and unused regions● Not every virtual address is mapped onto physical memory
● Unit of protection and usage is one page● On x86, default page size is 4 KB (x86 supports 4KB or 4MB)● On x64, default page size is 4 KB (large pages are 4 MB)
● Reserved regions are referenced in page tables● Committed regions are backed by physical memory (RAM or Storage)
Operating Systems 5
3GB Process Space Option● Provides 3 GB per-process address space
● Commonly used by database servers (for file mapping)
● .EXE must have “large address space aware” flag in image header, or they’re limited to 2 GB (specify at link time or with imagecfg.exe from ResKit)
● Chief “loser” in system space is file system cache
● System still can only utilize at most 4 GB of physical RAM
Unique per process(= per appl.),user mode
.EXE codeGlobals
Per-thread user mode stacks
.DLL codeProcess heaps
Exec, kernel, HAL,drivers, etc.
00000000
BFFFFFFF
FFFFFFFF
C0000000
Unique per process,
accessible in user or kernel
mode
System wide,accessible
only in kernel mode
Per process, accessible
only in kernel mode
Process page tables,hyperspace
Operating Systems 6
Physical Memory Usage in PAE Mode● Virtual address space is still 4 GB, so how can you “use” > 4 GB of memory?
1. Although each process can only address 2 GB, many may be in memory at the same time (e.g. 5 * 2 GB processes = 10 GB)
2. Files in system cache remain in physical memory– memory manager keeps unmapped data in physical memory
●
System Working Set
Assigned to Virtual CacheStandby List
960 MB
Other
~60 GB
64 GB Physical Memory
Operating Systems 7
Address Windowing Extensions● AWE functions allow Windows processes
to allocate large amounts of physical memory and then map “windows” into that memory
● Applications: database servers can cache large databases
● Up to programmer to control● Like memory banks in
embedded controllers● No longer necessary on 64Bit systems
AWE memory
Physical memory
Process virtual memory
AWE memory
AWE memory
Operating Systems 8
64-bit Virtual Address Space (ia64)● 64 bits = 2^64 = 17 billion GB
(16 exabytes) total● (Diagram not to scale)
● 7152 GB default per-process● Pages are 8 Kbytes ● All pointers are now 64 bits wide
User mode space per process
00000000 00000000
E0000000 00000000
FFFFFF00 00000000FFFFFFFF FFFFFFFF
System space page tables
6FC 00000000
Kernel modeper process
1FFFFF00 00000000Process
page tables20000000 00000000
Session space
Session spacepage tables
System space
3FFFFF00 00000000
E0000600 00000000
Operating Systems 9
Process vs. System-Space Addresses
● “Upper half” of page directory for every process contains same entries (with a few exceptions), which point to system-wide page tables
● exceptions are for example the page tables that map the process page tables
Page Directories(one per process)
Sets of per-processpage tables
System-wide page tables
Operating Systems 10
Exercise – Size of Page Tables● Given a Computer System with 40 Bit Words and 40 Bit
Addresses, and a single level Page Table Address Translation, and a page size of 8 KiB with 5 status bits per page table entry:
● What is the size of the virtual address space?● How many entries does a single page table have?● What is the size of a page table entry in Bits?● How much memory does a single page table encompass?
– individual entries take a multiple of machine word size
Operating Systems 11
Page directories & Page tables● Each process has a single page directory
(phys. addr. in KPROCESS block, at 0xC0300000, in cr3 (x86))● cr3 is re-loaded on inter-process context switches● Page directory is composed of page directory entries (PDEs) which
describe state/location of page tables for this process– Page tables are created on demand
● x86: 1024 page tables describe 4GB ● Each process has a private set of page tables● System has one set of page tables
● System PTEs are a finite resource: computed at boot time
Operating Systems 12
Page Table Entries● Page tables are array of Page Table Entries (PTEs)● Valid PTEs have two fields:
● Page Frame Number (PFN)● Flags describing state and protection of the page Res (writable on MP Systems)
ResResGlobalRes (large page if PDE)DirtyAccessedCache disabledWrite throughOwnerWrite (writable on MP Systems)
valid
Reserved bitsare used only when PTE is not valid
31 12 0Page frame number VU P Cw Gi L D A Cd Wt O W
Operating Systems 13
PTE Status and Protection Bits (x86)Name of Bit Meaning on x86Accessed Page has been readCache disabled Disables caching for that pageDirty Page has been written toGlobal Translation applies to all processes
(a translation buffer flush won‘t affect this PTE)Large page Indicates that PDE maps a 4MB page (used to map kernel)Owner Indicates whether user-mode code can access the page of whether the
page is limited to kernel mode accessValid Indicates whether translation maps to page in phys. Mem.Write through Disables caching of writes; immediate flush to diskWrite Uniproc: Indicates whether page is read/write or read-only;
Multiproc: ind. whether page is writeable/write bit in res. bit
Operating Systems 14
Invalid PTEs and their structure● Page file: desired page resides in paging file
● in-page operation is initiated● Demand Zero: pager looks at zero page list;
if list is empty, pager takes list from standby list and zeros it;● PTE format as shown below, but page file number
and offset are zero
Page file offset ProtectionPageFile No 0
TransitionPrototypeValid31 12 11 10 9 5 4 1 0
Operating Systems 15
Invalid PTEs and their structure (contd.)● Transition: the desired page is in memory on either the standby, modified, or modified-no-
write list● Page is removed from the list and added to working set
● Unknown: the PTE is zero, or the page table does not yet exist– examine virtual address space descriptors (VADs) to see whether this virtual address
has been reserved● Build page tables to represent newly committed space
Page Frame Number Protection1
TransitionPrototypeProtectionCache disableWrite throughOwnerWriteValid
31 12 11 10 9 5 4 1 0
1 0
23
Operating Systems 16
Managing Physical Memory● System keeps unassigned physical pages on one of several lists
● Free page list● Modified page list● Standby page list● Zero page list● Bad page list - pages that failed memory test at system startup
● Lists are implemented by entries in the “PFN database”● Maintained as FIFO lists or queues
Operating Systems 17
Paging Dynamics
StandbyPageList
ZeroPageList
FreePageList
ProcessWorking
Sets
page read from disk or kernel allocations
demand zero page faults
working set replacement
ModifiedPageList
modifiedpagewriter
zeropage
thread
“soft”pagefaults
BadPageList
Private pages at process exit
Operating Systems 18
Paging Dynamics● New pages are allocated to working sets from the top of the free or zero
page list● Pages released from the working set due to working set replacement go to
the bottom of:● The modified page list (if they were modified while in the working set)● The standby page list (if not modified)
– Decision made based on “D” (dirty = modified) bit in page table entry
● Association between the process and the physical page is still maintained while the page is on either of these lists
Operating Systems 19
Standby and Modified Page Lists● Modified pages go to modified (dirty) list
● Avoids writing pages back to disk too soon● Unmodified pages go to standby (clean) list● They form a system-wide cache of “pages likely to be needed
again”● Pages can be faulted back into a process from the standby
and modified page list● These are counted as page faults, but not page reads
Operating Systems 20
Modified Page Writer● When modified list reaches certain size, modified page writer system
thread is awoken to write pages out● See MmModifiedPageMaximum● Also triggered when memory is overcomitted (too few free pages)● Does not flush entire modified page list
● Two system threads● One for mapped files, one for the paging file
● Pages move from the modified list to the standby list● E.g. they can still be soft faulted into a working set
Operating Systems 21
Free and Zero Page Lists● Free Page List
● Used for page reads● Private modified pages go here on process exit● Pages contain junk in them (e.g. not zeroed)● On most busy systems, this is empty
● Zero Page List● Used to satisfy demand zero page faults
– References to private pages that have not been created yet● When free page list has 8 or more pages, a priority zero thread is
awoken to zero them● On most busy systems, this is empty too
Operating Systems 22
Paging Dynamics
StandbyPageList
ZeroPageList
FreePageList
ProcessWorking
Sets
page read from disk or kernel allocations
demand zero page faults
working set replacement
ModifiedPageList
modifiedpagewriter
zeropage
thread
“soft”pagefaults
BadPageList
Private pages at process exit
Operating Systems 23
Page Frame Number-Database● One entry (24 bytes) for each physical page
● Describes state of each page in physical memory● Entries for active/valid and transition pages contain:
● Original PTE value (to restore when paged out)● Original PTE virtual address and container PFN● Working set index hint (for the first process...)
● Entries for other pages are linked in:● Free, standby, modified, zeroed, bad lists (parity error will kill kernel)
● Share count (active/valid pages):● Number of PTEs which refer to that page; 1->0: candidate for free list
● Reference count:● Locking for I/O: INC when share count 1->0; DEC when unlocked● Share count = 0 & reference count = 1 is possible● Reference count 1->0: page is inserted in free, standby or modified lists
Operating Systems 24
PFN – states of pages in physical memoryStatus DescriptionActive/valid Page is part of working set (sys/proc), valid PTE points to itTransition Page not owned by a working set, not on any paging list
I/O is in progress on this pageStandby Page belonged to a working set but was removed; not modifiedModified Removed from working set, modified, not yet written to diskModified no write
Modified page, will not be touched by modified page write, used by NTFS for pages containing log entries (explicit flushing)
Free Page is free but has dirty data in it – cannot be given to user process – C2 security requirement
Zeroed Page is free and has been initialized by zero page threadBad Page has generated parity or other hardware errors
Operating Systems 25
Page tables and page frame databasevalid
Invalid:disk address
Invalid:transition
valid
Invalid:disk address
Validvalid
Invalid:transition
Invalid:disk address
Prototype PTE
Process 1 page table
Process 2 page table
Process 3 page table
Active
Standby
Active
Active
Modified
Zeroed
Free
Standby
Modified
Bad
Modifiedno write
Operating Systems 26
Page Fault Handling● Reference to invalid page is called a page fault● Kernel trap handler dispatches:
● Memory manager fault handler (MmAccessFault) called● Runs in context of thread that incurred the fault● Attempts to resolve the fault or
raises exception● Page faults can be caused by variety of conditions● Four basic kinds of invalid Page Table Entries (PTEs)
Operating Systems 27
In-Paging I/O due to Access Faults● Accessing a page that is not resident in memory but on disk in page file/mapped file
● Allocate memory and read page from disk into working set● Occurs when read operation must be issued to a file to satisfy page fault
● Page tables are pageable -> additional page faults possible● In-page I/O is synchronous
● Thread waits until I/O completes● Not interruptible by asynchronous procedure calls
● During in-page I/O: faulting thread does not own critical memory management synchronization objects
Other threads in process may issue VM functions, but:● Another thread could have faulted same page: collided page fault● Page could have been deleted (remapped) from virtual address space● Protection on page may have changed● Fault could have been for prototype PTE and page that maps prototype PTE could have been
out of working set
Operating Systems 28
Reasons for access faults● Accessing page that is on standby or modified list
● Transition the page to process or system working set● Accessing page that has no committed storage
● Access violation● Accessing kernel page from user-mode
● Access violation● Writing to a read-only page
● Access violation
Operating Systems 29
Reasons for access faults (contd.)● Writing to a guard page
● Guard page violation (if a reference to a user-mode stack, perform automatic stack expansion)
● Writing to a copy-on-write page● Make process-private copy of page and replace original in process or system
working set● Referencing a page in system space that is valid but not in the process page directory
(if paged pool expanded after process directory was created)● Copy page directory entry from master system page directory structure and
dismiss exception● On a multiprocessor system: writing to valid page that has not yet been written to
● Set dirty bit in PTE
Operating Systems 30
Reserving & Committing Memory● Optional 2-phase approach to memory allocation:
● Reserve address space (in multiples of page size)● Commit storage in that address space● Can be combined in one call (VirtualAlloc, VirtualAllocEx)
● Reserved memory: ● Range of virtual addresses reserved for future use (contiguous buffer)● Accessing reserved memory results in access violation● Fast, inexpensive
● Committed memory:● Has backing store (pagefile.sys, memory-mapped file)● Either private or mapped into a view of a section● Decommit via VirtualFree, VirtualFreeEx
A thread‘s user-mode stack is constructed usingthis 2-phase approach: initial reserved size is 1MB,only 2 pages are committed: stack & guard page
Operating Systems 31
Soft vs. Hard Page Faults● Types of “soft” page faults:
● Pages can be faulted back into a process from the standby and modified page lists● A shared page that’s valid for one process can be faulted into other processes
● Some hard page faults unavoidable● Process startup (loading of EXE and DLLs)● Normal file I/O done via paging
– Cached files are faulted into system working set● To determine paging vs. normal file I/Os:
● Monitor Memory->Page Reads/sec– Not Memory->Page Faults/sec, as that includes soft page faults
● Subtract System->File Read Operations/sec from Page Reads/sec● Or, use Filemon to determine what file(s) are having paging I/O (asterisk next to I/O
function)● Should not stay high for sustained period
Operating Systems 32
Working Set● Working set: All the physical pages “owned” by a process
● Essentially, all the pages the process can reference without incurring a page fault● Working set limit: The maximum pages the process can own
● When limit is reached, a page must be released for every page that’s brought in (“working set replacement”)
● Default upper limit on size for each process● System-wide maximum calculated & stored in MmMaximumWorkingSetSize
– approximately RAM minus 512 pages (2 MB on x86) minus min size of system working set (1.5 MB on x86)
– Interesting to view (gives you an idea how much memory you’ve “lost” to the OS)● True upper limit: 2 GB minus 64 MB for 32-bit Windows
Operating Systems 33
Working Set List● A process always starts with an empty working set
● It then incurs page faults when referencing a page that isn’t in its working set
● Many page faults may be resolved from memory (to be described later)
PerfMonProcess “WorkingSet”
newer pages older pages
Operating Systems 34
Birth of a Working Set● Pages are brought into memory as a result of page faults
● If the page is not in memory, the appropriate block in the associated file is read in● Physical page is allocated● Block is read into the physical page● Page table entry is filled in● Exception is dismissed● Processor re-executes the instruction that caused the page fault
(and this time, it succeeds)
● The page has now been “faulted into” the process “working set”
Operating Systems 35
Process Creation● New processes need three pages:
● The page directory– Points to itself– Map the page table of the hyperspace– Map system paged and nonpaged areas– Map system cache page table pages
● The page table page for working set● The page for the working set list