Operating Systems WT 2019/20 - uni-potsdam.de · Operating Systems 16 Managing Physical Memory...

36
Operating Systems WT 2019/20 Memory Management

Transcript of Operating Systems WT 2019/20 - uni-potsdam.de · Operating Systems 16 Managing Physical Memory...

Operating Systems WT 2019/20Memory Management

Operating Systems 2

● Mapping via page table entries● Indirect relationship between virtual pages and physical memory

Address Translation

Virtualpages

Physical memory

Page tableentries

10 10 122231 21 11 012

Page directoryindex

Page tableindex Byte index

x86:

user

system

user

system

Operating Systems 3

32-bit x86 Virtual Address Space● 2 GB per-process

● Address space of one process is not directly reachable from other processes

● 2 GB system-wide● The operating system is loaded here,

and appears in every process’s address space

● The operating system is not a process (though there are processes that do things for the OS, more or less in “background”)

● 2 GB or addressable Memory per process is a serious limitation (think compilers, browsers)

Code: EXE/DLLsData: EXE/DLL

static storage, per-thread user mode stacks, process

heaps, etc.

Code: EXE/DLLsData: EXE/DLL

static storage, per-thread user mode stacks, process

heaps, etc.

00000000

7FFFFFFF

Code: NTOSKRNL, HAL, driversData: kernel stacks,

File system cacheNon-paged pool,Paged pool

Code: NTOSKRNL, HAL, driversData: kernel stacks,

File system cacheNon-paged pool,Paged poolFFFFFFFF

80000000

Process page tables,hyperspace

C0000000

Unique per process,

accessible in user or kernel

mode

System wide,accessible

only in kernel mode

Per process, accessible only in kernel mode

Operating Systems 4

Virtual Address Space Allocation● Virtual address space is sparse

● Address spaces contain reserved, committed, and unused regions● Not every virtual address is mapped onto physical memory

● Unit of protection and usage is one page● On x86, default page size is 4 KB (x86 supports 4KB or 4MB)● On x64, default page size is 4 KB (large pages are 4 MB)

● Reserved regions are referenced in page tables● Committed regions are backed by physical memory (RAM or Storage)

Operating Systems 5

3GB Process Space Option● Provides 3 GB per-process address space

● Commonly used by database servers (for file mapping)

● .EXE must have “large address space aware” flag in image header, or they’re limited to 2 GB (specify at link time or with imagecfg.exe from ResKit)

● Chief “loser” in system space is file system cache

● System still can only utilize at most 4 GB of physical RAM

Unique per process(= per appl.),user mode

.EXE codeGlobals

Per-thread user mode stacks

.DLL codeProcess heaps

Exec, kernel, HAL,drivers, etc.

00000000

BFFFFFFF

FFFFFFFF

C0000000

Unique per process,

accessible in user or kernel

mode

System wide,accessible

only in kernel mode

Per process, accessible

only in kernel mode

Process page tables,hyperspace

Operating Systems 6

Physical Memory Usage in PAE Mode● Virtual address space is still 4 GB, so how can you “use” > 4 GB of memory?

1. Although each process can only address 2 GB, many may be in memory at the same time (e.g. 5 * 2 GB processes = 10 GB)

2. Files in system cache remain in physical memory– memory manager keeps unmapped data in physical memory

System Working Set

Assigned to Virtual CacheStandby List

960 MB

Other

~60 GB

64 GB Physical Memory

Operating Systems 7

Address Windowing Extensions● AWE functions allow Windows processes

to allocate large amounts of physical memory and then map “windows” into that memory

● Applications: database servers can cache large databases

● Up to programmer to control● Like memory banks in

embedded controllers● No longer necessary on 64Bit systems

AWE memory

Physical memory

Process virtual memory

AWE memory

AWE memory

Operating Systems 8

64-bit Virtual Address Space (ia64)● 64 bits = 2^64 = 17 billion GB

(16 exabytes) total● (Diagram not to scale)

● 7152 GB default per-process● Pages are 8 Kbytes ● All pointers are now 64 bits wide

User mode space per process

00000000 00000000

E0000000 00000000

FFFFFF00 00000000FFFFFFFF FFFFFFFF

System space page tables

6FC 00000000

Kernel modeper process

1FFFFF00 00000000Process

page tables20000000 00000000

Session space

Session spacepage tables

System space

3FFFFF00 00000000

E0000600 00000000

Operating Systems 9

Process vs. System-Space Addresses

● “Upper half” of page directory for every process contains same entries (with a few exceptions), which point to system-wide page tables

● exceptions are for example the page tables that map the process page tables

Page Directories(one per process)

Sets of per-processpage tables

System-wide page tables

Operating Systems 10

Exercise – Size of Page Tables● Given a Computer System with 40 Bit Words and 40 Bit

Addresses, and a single level Page Table Address Translation, and a page size of 8 KiB with 5 status bits per page table entry:

● What is the size of the virtual address space?● How many entries does a single page table have?● What is the size of a page table entry in Bits?● How much memory does a single page table encompass?

– individual entries take a multiple of machine word size

Operating Systems 11

Page directories & Page tables● Each process has a single page directory

(phys. addr. in KPROCESS block, at 0xC0300000, in cr3 (x86))● cr3 is re-loaded on inter-process context switches● Page directory is composed of page directory entries (PDEs) which

describe state/location of page tables for this process– Page tables are created on demand

● x86: 1024 page tables describe 4GB ● Each process has a private set of page tables● System has one set of page tables

● System PTEs are a finite resource: computed at boot time

Operating Systems 12

Page Table Entries● Page tables are array of Page Table Entries (PTEs)● Valid PTEs have two fields:

● Page Frame Number (PFN)● Flags describing state and protection of the page Res (writable on MP Systems)

ResResGlobalRes (large page if PDE)DirtyAccessedCache disabledWrite throughOwnerWrite (writable on MP Systems)

valid

Reserved bitsare used only when PTE is not valid

31 12 0Page frame number VU P Cw Gi L D A Cd Wt O W

Operating Systems 13

PTE Status and Protection Bits (x86)Name of Bit Meaning on x86Accessed Page has been readCache disabled Disables caching for that pageDirty Page has been written toGlobal Translation applies to all processes

(a translation buffer flush won‘t affect this PTE)Large page Indicates that PDE maps a 4MB page (used to map kernel)Owner Indicates whether user-mode code can access the page of whether the

page is limited to kernel mode accessValid Indicates whether translation maps to page in phys. Mem.Write through Disables caching of writes; immediate flush to diskWrite Uniproc: Indicates whether page is read/write or read-only;

Multiproc: ind. whether page is writeable/write bit in res. bit

Operating Systems 14

Invalid PTEs and their structure● Page file: desired page resides in paging file

● in-page operation is initiated● Demand Zero: pager looks at zero page list;

if list is empty, pager takes list from standby list and zeros it;● PTE format as shown below, but page file number

and offset are zero

Page file offset ProtectionPageFile No 0

TransitionPrototypeValid31 12 11 10 9 5 4 1 0

Operating Systems 15

Invalid PTEs and their structure (contd.)● Transition: the desired page is in memory on either the standby, modified, or modified-no-

write list● Page is removed from the list and added to working set

● Unknown: the PTE is zero, or the page table does not yet exist– examine virtual address space descriptors (VADs) to see whether this virtual address

has been reserved● Build page tables to represent newly committed space

Page Frame Number Protection1

TransitionPrototypeProtectionCache disableWrite throughOwnerWriteValid

31 12 11 10 9 5 4 1 0

1 0

23

Operating Systems 16

Managing Physical Memory● System keeps unassigned physical pages on one of several lists

● Free page list● Modified page list● Standby page list● Zero page list● Bad page list - pages that failed memory test at system startup

● Lists are implemented by entries in the “PFN database”● Maintained as FIFO lists or queues

Operating Systems 17

Paging Dynamics

StandbyPageList

ZeroPageList

FreePageList

ProcessWorking

Sets

page read from disk or kernel allocations

demand zero page faults

working set replacement

ModifiedPageList

modifiedpagewriter

zeropage

thread

“soft”pagefaults

BadPageList

Private pages at process exit

Operating Systems 18

Paging Dynamics● New pages are allocated to working sets from the top of the free or zero

page list● Pages released from the working set due to working set replacement go to

the bottom of:● The modified page list (if they were modified while in the working set)● The standby page list (if not modified)

– Decision made based on “D” (dirty = modified) bit in page table entry

● Association between the process and the physical page is still maintained while the page is on either of these lists

Operating Systems 19

Standby and Modified Page Lists● Modified pages go to modified (dirty) list

● Avoids writing pages back to disk too soon● Unmodified pages go to standby (clean) list● They form a system-wide cache of “pages likely to be needed

again”● Pages can be faulted back into a process from the standby

and modified page list● These are counted as page faults, but not page reads

Operating Systems 20

Modified Page Writer● When modified list reaches certain size, modified page writer system

thread is awoken to write pages out● See MmModifiedPageMaximum● Also triggered when memory is overcomitted (too few free pages)● Does not flush entire modified page list

● Two system threads● One for mapped files, one for the paging file

● Pages move from the modified list to the standby list● E.g. they can still be soft faulted into a working set

Operating Systems 21

Free and Zero Page Lists● Free Page List

● Used for page reads● Private modified pages go here on process exit● Pages contain junk in them (e.g. not zeroed)● On most busy systems, this is empty

● Zero Page List● Used to satisfy demand zero page faults

– References to private pages that have not been created yet● When free page list has 8 or more pages, a priority zero thread is

awoken to zero them● On most busy systems, this is empty too

Operating Systems 22

Paging Dynamics

StandbyPageList

ZeroPageList

FreePageList

ProcessWorking

Sets

page read from disk or kernel allocations

demand zero page faults

working set replacement

ModifiedPageList

modifiedpagewriter

zeropage

thread

“soft”pagefaults

BadPageList

Private pages at process exit

Operating Systems 23

Page Frame Number-Database● One entry (24 bytes) for each physical page

● Describes state of each page in physical memory● Entries for active/valid and transition pages contain:

● Original PTE value (to restore when paged out)● Original PTE virtual address and container PFN● Working set index hint (for the first process...)

● Entries for other pages are linked in:● Free, standby, modified, zeroed, bad lists (parity error will kill kernel)

● Share count (active/valid pages):● Number of PTEs which refer to that page; 1->0: candidate for free list

● Reference count:● Locking for I/O: INC when share count 1->0; DEC when unlocked● Share count = 0 & reference count = 1 is possible● Reference count 1->0: page is inserted in free, standby or modified lists

Operating Systems 24

PFN – states of pages in physical memoryStatus DescriptionActive/valid Page is part of working set (sys/proc), valid PTE points to itTransition Page not owned by a working set, not on any paging list

I/O is in progress on this pageStandby Page belonged to a working set but was removed; not modifiedModified Removed from working set, modified, not yet written to diskModified no write

Modified page, will not be touched by modified page write, used by NTFS for pages containing log entries (explicit flushing)

Free Page is free but has dirty data in it – cannot be given to user process – C2 security requirement

Zeroed Page is free and has been initialized by zero page threadBad Page has generated parity or other hardware errors

Operating Systems 25

Page tables and page frame databasevalid

Invalid:disk address

Invalid:transition

valid

Invalid:disk address

Validvalid

Invalid:transition

Invalid:disk address

Prototype PTE

Process 1 page table

Process 2 page table

Process 3 page table

Active

Standby

Active

Active

Modified

Zeroed

Free

Standby

Modified

Bad

Modifiedno write

Operating Systems 26

Page Fault Handling● Reference to invalid page is called a page fault● Kernel trap handler dispatches:

● Memory manager fault handler (MmAccessFault) called● Runs in context of thread that incurred the fault● Attempts to resolve the fault or

raises exception● Page faults can be caused by variety of conditions● Four basic kinds of invalid Page Table Entries (PTEs)

Operating Systems 27

In-Paging I/O due to Access Faults● Accessing a page that is not resident in memory but on disk in page file/mapped file

● Allocate memory and read page from disk into working set● Occurs when read operation must be issued to a file to satisfy page fault

● Page tables are pageable -> additional page faults possible● In-page I/O is synchronous

● Thread waits until I/O completes● Not interruptible by asynchronous procedure calls

● During in-page I/O: faulting thread does not own critical memory management synchronization objects

Other threads in process may issue VM functions, but:● Another thread could have faulted same page: collided page fault● Page could have been deleted (remapped) from virtual address space● Protection on page may have changed● Fault could have been for prototype PTE and page that maps prototype PTE could have been

out of working set

Operating Systems 28

Reasons for access faults● Accessing page that is on standby or modified list

● Transition the page to process or system working set● Accessing page that has no committed storage

● Access violation● Accessing kernel page from user-mode

● Access violation● Writing to a read-only page

● Access violation

Operating Systems 29

Reasons for access faults (contd.)● Writing to a guard page

● Guard page violation (if a reference to a user-mode stack, perform automatic stack expansion)

● Writing to a copy-on-write page● Make process-private copy of page and replace original in process or system

working set● Referencing a page in system space that is valid but not in the process page directory

(if paged pool expanded after process directory was created)● Copy page directory entry from master system page directory structure and

dismiss exception● On a multiprocessor system: writing to valid page that has not yet been written to

● Set dirty bit in PTE

Operating Systems 30

Reserving & Committing Memory● Optional 2-phase approach to memory allocation:

● Reserve address space (in multiples of page size)● Commit storage in that address space● Can be combined in one call (VirtualAlloc, VirtualAllocEx)

● Reserved memory: ● Range of virtual addresses reserved for future use (contiguous buffer)● Accessing reserved memory results in access violation● Fast, inexpensive

● Committed memory:● Has backing store (pagefile.sys, memory-mapped file)● Either private or mapped into a view of a section● Decommit via VirtualFree, VirtualFreeEx

A thread‘s user-mode stack is constructed usingthis 2-phase approach: initial reserved size is 1MB,only 2 pages are committed: stack & guard page

Operating Systems 31

Soft vs. Hard Page Faults● Types of “soft” page faults:

● Pages can be faulted back into a process from the standby and modified page lists● A shared page that’s valid for one process can be faulted into other processes

● Some hard page faults unavoidable● Process startup (loading of EXE and DLLs)● Normal file I/O done via paging

– Cached files are faulted into system working set● To determine paging vs. normal file I/Os:

● Monitor Memory->Page Reads/sec– Not Memory->Page Faults/sec, as that includes soft page faults

● Subtract System->File Read Operations/sec from Page Reads/sec● Or, use Filemon to determine what file(s) are having paging I/O (asterisk next to I/O

function)● Should not stay high for sustained period

Operating Systems 32

Working Set● Working set: All the physical pages “owned” by a process

● Essentially, all the pages the process can reference without incurring a page fault● Working set limit: The maximum pages the process can own

● When limit is reached, a page must be released for every page that’s brought in (“working set replacement”)

● Default upper limit on size for each process● System-wide maximum calculated & stored in MmMaximumWorkingSetSize

– approximately RAM minus 512 pages (2 MB on x86) minus min size of system working set (1.5 MB on x86)

– Interesting to view (gives you an idea how much memory you’ve “lost” to the OS)● True upper limit: 2 GB minus 64 MB for 32-bit Windows

Operating Systems 33

Working Set List● A process always starts with an empty working set

● It then incurs page faults when referencing a page that isn’t in its working set

● Many page faults may be resolved from memory (to be described later)

PerfMonProcess “WorkingSet”

newer pages older pages

Operating Systems 34

Birth of a Working Set● Pages are brought into memory as a result of page faults

● If the page is not in memory, the appropriate block in the associated file is read in● Physical page is allocated● Block is read into the physical page● Page table entry is filled in● Exception is dismissed● Processor re-executes the instruction that caused the page fault

(and this time, it succeeds)

● The page has now been “faulted into” the process “working set”

Operating Systems 35

Process Creation● New processes need three pages:

● The page directory– Points to itself– Map the page table of the hyperspace– Map system paged and nonpaged areas– Map system cache page table pages

● The page table page for working set● The page for the working set list

Operating Systems 36

Paging Dynamics

StandbyPageList

ZeroPageList

FreePageList

ProcessWorking

Sets

page read from disk or kernel allocations

demand zero page faults

working set replacement

ModifiedPageList

modifiedpagewriter

zeropage

thread

“soft”pagefaults

BadPageList

Private pages at process exit