Computer Architecture 2008 – Virtual Memory – Dr. Lihu Rappoport
Computer Architecture 2008 – VM1
Computer Architecture
Virtual Memory
Dr. Lihu Rappoport
Computer Architecture 2008 – VM2
Virtual Memory
Provides the illusion of a large memory:
– Different machines have different amounts of physical memory
– Allows programs to run regardless of the actual physical memory size
– The amount of memory consumed by each process is dynamic
– Allows adding memory as needed
– Many processes can run on a single machine
– Provides each process with its own memory space
– Prevents a process from accessing the memory of other processes running on the same machine
– Allows the sum of the memory spaces of all processes to be larger than physical memory
Basic terminology:
– Virtual Address Space: the address space used by the programmer
– Physical Address Space: the actual physical memory address space
Computer Architecture 2008 – VM3
Virtual Memory: Basic Idea
Divide memory (virtual and physical) into fixed size blocks
– Pages in Virtual space, Frames in Physical space
– Page size = Frame size
– Page size is a power of 2: page size = 2^k
All pages in the virtual address space are contiguous
Pages can be mapped into physical Frames in any order
Some of the pages are in main memory (DRAM), some of the pages are on disk
All programs are written using Virtual Memory Address Space
The hardware does on-the-fly translation between virtual and physical address spaces
– Use a Page Table to translate between Virtual and Physical addresses
Computer Architecture 2008 – VM4
Main memory can act as a cache for the secondary storage (disk)
Advantages:
– Illusion of having more physical memory
– Program relocation
– Protection
[Figure: virtual addresses are mapped by address translation either to physical addresses in main memory or to disk addresses]
Computer Architecture 2008 – VM5
Virtual to Physical Address Translation
[Figure: Virtual address = Virtual Page Number (bits 31:12) | page offset (bits 11:0). The Virtual Page Number indexes the page table, located via the page table base register; each entry holds a Valid bit (V), a Dirty bit (D), Access Control bits (AC), and a frame number. Physical address = Physical Frame Number (bits 29:12) | page offset (bits 11:0)]
Page size: 2^12 bytes = 4K bytes
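The bit split on this slide can be sketched in code. A minimal Python sketch, assuming 32-bit virtual addresses and the 4 KB pages shown here:

```python
# Split a 32-bit virtual address into VPN and page offset, assuming the
# 4 KB (2^12-byte) pages used on this slide.
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT          # 4096 bytes

def split_virtual_address(va: int):
    """Return (virtual page number, page offset)."""
    vpn = va >> PAGE_SHIFT           # bits [31:12]
    offset = va & (PAGE_SIZE - 1)    # bits [11:0]
    return vpn, offset
```

For example, `split_virtual_address(0x12345ABC)` yields VPN `0x12345` and offset `0xABC`.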
Computer Architecture 2008 – VM6
Page Tables
[Figure: the page table maps each virtual page number to either a physical page in memory (Valid = 1) or a disk address (Valid = 0, page resides on disk)]
Computer Architecture 2008 – VM7
Address Mapping Algorithm
If V = 1 then
– the page is in main memory, at the frame address stored in the table ⇒ fetch the data
else (page fault)
– the page must be fetched from disk
– causes a trap, usually accompanied by a context switch: the current process is suspended while the page is fetched from disk
Access Control (R = read-only, R/W = read/write, X = execute-only):
If the kind of access is not compatible with the specified access rights ⇒ protection violation fault
– causes a trap to a hardware or software fault handler
A missing item is fetched from secondary memory only on the occurrence of a fault ⇒ demand load policy
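The mapping algorithm above can be sketched as follows. This is illustrative, not the slide's exact mechanism: the exception names, the `PTE` dataclass, and the string-based access check are assumptions made for the sketch.

```python
from dataclasses import dataclass

class PageFault(Exception):
    """Trap to the OS: the page must be fetched from disk (demand load)."""

class ProtectionViolationFault(Exception):
    """Trap to the hardware or software fault handler."""

@dataclass
class PTE:
    valid: bool      # V bit
    frame: int       # physical frame number
    access: str      # access control: "R", "R/W", or "X"

PAGE_SHIFT = 12      # 4 KB pages, as on the earlier slide

def translate(page_table, va, access_kind):
    pte = page_table[va >> PAGE_SHIFT]
    if not pte.valid:                     # V = 0 -> page fault
        raise PageFault(hex(va))
    if access_kind not in pte.access:     # simplistic containment check
        raise ProtectionViolationFault(access_kind)
    # V = 1: the page is in main memory at the frame stored in the table
    return (pte.frame << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1))
```

On fault, a real handler would pick a victim, load the page, update the table, and retry the access, as the page-fault slide below describes.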
Computer Architecture 2008 – VM8
Page Replacement Algorithm
Not Recently Used (NRU)
– Associated with each page is a reference flag: ref flag = 1 if the page has been referenced in the recent past
– If replacement is needed, choose any page frame whose reference bit is 0: this is a page that has not been referenced in the recent past
Clock implementation of NRU:
  While (PT[LRP].NRU == 1) { PT[LRP].NRU = 0; LRP++ (mod table size) }
Possible optimization: search for a page that is both not recently referenced AND not dirty
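A minimal sketch of the clock sweep above, with the reference bits held in a plain list (the function name and return convention are illustrative):

```python
def clock_select(ref_bits, hand):
    """Sweep from `hand`, clearing reference bits, until a frame whose
    ref bit is 0 is found. Mutates ref_bits in place.
    Returns (victim frame, next hand position)."""
    n = len(ref_bits)
    while ref_bits[hand]:
        ref_bits[hand] = 0           # clear the ref bit: "second chance"
        hand = (hand + 1) % n        # LRP++ (mod table size)
    return hand, (hand + 1) % n
```

Note the sweep always terminates: even if every bit starts at 1, the first full revolution clears them all.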
Computer Architecture 2008 – VM9
Page Faults
Page fault: the data is not in memory ⇒ retrieve it from disk
– The CPU must detect the situation
– The CPU cannot remedy the situation (it has no knowledge of the disk)
– The CPU must trap to the operating system so that it can remedy the situation:
   – Pick a page to discard (possibly writing it to disk)
   – Load the new page in from disk
   – Update the page table
   – Resume the program so the hardware will retry and succeed!
A page fault incurs a huge miss penalty ⇒
– Pages should be fairly large (e.g., 4KB)
– Faults can be handled in software instead of hardware
– A page fault causes a context switch
– Write-through is too expensive, so write-back is used
Computer Architecture 2008 – VM10
Optimal Page Size
Minimize wasted storage:
– A small page minimizes internal fragmentation
– But a small page increases the size of the page table
Minimize transfer time:
– Large pages (multiple disk sectors) amortize the access cost
– But sometimes transfer unnecessary info
– Sometimes prefetch useful data
– Sometimes discard useless data early
The general trend is toward larger pages because of:
– Big, cheap RAM
– An increasing memory/disk performance gap
– Larger address spaces
Computer Architecture 2008 – VM11
Translation Lookaside Buffer (TLB)
The page table resides in memory ⇒ each translation requires a memory access
TLB:
– Caches recently used PTEs
– Speeds up translation
– Typically 128 to 256 entries
– Usually 4- to 8-way set associative
– TLB access time is comparable to L1 cache access time
[Flowchart: Virtual Address → TLB Access → TLB hit? If yes, use the physical address; if no, access the page table]
Computer Architecture 2008 – VM12
Making Address Translation Fast
The TLB is a cache for recent address translations.
[Figure: each TLB entry holds a valid bit, a tag (virtual page number), and a physical page; on a TLB miss, the full page table maps the virtual page number to either a physical page in memory (Valid = 1) or a disk address (Valid = 0)]
Computer Architecture 2008 – VM13
TLB Access
[Figure: 4-way set-associative TLB. The virtual page number is split into a tag and a set index; the set index (Set#) selects a TLB set, the tags of all four ways are compared in parallel (Hit/Miss), and the way MUX selects the matching PTE]
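The set-associative lookup in the figure above can be sketched as follows, assuming an illustrative 16-set × 4-way geometry (the real sizes vary by design):

```python
NUM_SETS = 16    # assumed geometry: 16 sets x 4 ways = 64 entries

def tlb_lookup(tlb, vpn):
    """tlb: list of sets, each a list of (valid, tag, pte) ways.
    The VPN splits into a set index (low bits) and a tag (high bits);
    all ways of the selected set are compared (here, a simple loop
    standing in for the hardware's parallel comparators + way MUX)."""
    set_idx = vpn % NUM_SETS
    tag = vpn // NUM_SETS
    for valid, way_tag, pte in tlb[set_idx]:
        if valid and way_tag == tag:
            return pte               # TLB hit
    return None                      # TLB miss -> walk the page table
```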
Computer Architecture 2008 – VM14
Virtual Memory And Cache
[Flowchart: Virtual Address → TLB Access → TLB hit? If no, access the page table. With the physical address, access the cache → cache hit? If yes, return the data; if no, access memory]
TLB access is serial with cache access.
Computer Architecture 2008 – VM15
More On Page Swap-Out
DMA copies the page to the disk controller:
– Reads each byte:
   – Executes a snoop-invalidate for each byte in the cache (both L1 and L2)
   – If the byte resides in the cache:
      – If it is modified, its line is read from the cache into memory
      – The line is invalidated
– Writes the byte to the disk controller
– This means that when a page is swapped out of memory:
   – All data in the caches belonging to that page is invalidated
   – The page on the disk is up to date
The TLB is snooped:
– If the TLB hits for the swapped-out page, the TLB entry is invalidated
In the page table:
– The valid bit in the PTE of the swapped-out page is set to 0
– All the remaining bits in the PTE may be used by the operating system for keeping the location of the page on the disk
Computer Architecture 2008 – VM16
Overlapped TLB & Cache Access
When the set index (#Set) is not contained within the page offset:
– The #Set is not known until the physical page number is known
– The cache can be accessed only after address translation is done
[Figure: virtual memory view of the physical address: Page Number (bits 29:12) | page offset (bits 11:0). Cache view of the same address: tag (bits 29:14) | set (bits 13:6) | disp (bits 5:0) — the set field extends above bit 11, into the translated bits]
Computer Architecture 2008 – VM17
Overlapped TLB & Cache Access (cont)
In this example the set index (#Set) is contained within the page offset:
– The #Set is known immediately
– The cache can be accessed in parallel with address translation
– Once translation is done, the upper bits are matched against the tags
– Limitation: cache size ≤ (page size × associativity)
[Figure: virtual memory view of the physical address: Page Number (bits 29:12) | page offset (bits 11:0). Cache view: tag (bits 29:12) | set (bits 11:6) | disp (bits 5:0) — the set field fits entirely within the page offset]
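The parallel-access limitation above reduces to simple arithmetic: the set index fits inside the untranslated page-offset bits exactly when one cache way is no larger than a page. A one-line check (the function name is illustrative):

```python
def can_overlap(cache_size, associativity, page_size):
    """True iff TLB and cache access can be overlapped: the set index
    fits in the page offset iff (cache size / associativity) <= page size,
    i.e. cache size <= page size * associativity."""
    return cache_size // associativity <= page_size
```

For instance, with 4 KB pages, an 8 KB 2-way cache can overlap (4 KB per way), while a 32 KB 2-way cache cannot (16 KB per way needs translated bits for its set index).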
Computer Architecture 2008 – VM18
Overlapped TLB & Cache Access (cont)
Assume 4K bytes per page ⇒ bits [11:0] are not translated.
Assume the cache is 32K bytes, 2-way set associative, 64 bytes/line:
– (2^15 / 2 ways) / (2^6 bytes/line) = 2^(15-1-6) = 2^8 = 256 sets
Physical_addr[13:12] may differ from virtual_addr[13:12]:
– The tag comprises bits [31:12] of the physical address
– The tag may therefore mismatch in bits [13:12] of the physical address
– On a cache miss, allocate the missing line according to its virtual set address and physical tag
[Figure: Page Number (bits 29:12) | page offset (bits 11:0); cache view: tag (bits 31:14) | set (bits 13:6) | disp (bits 5:0)]
Computer Architecture 2008 – VM19
Context Switch
Each process has its own address space:
– Each process has its own page table
– The OS allocates physical memory frames to each process and updates that process's page table
– A process cannot access physical memory allocated to another process, unless the OS deliberately allocates the same physical frame to two processes (for memory sharing)
On a context switch:
– Save the current architectural state to memory:
   – Architectural registers
   – The register that holds the page table base address
– Flush the TLB
– Load the new architectural state from memory:
   – Architectural registers
   – The register that holds the page table base address
Computer Architecture 2008 – VM20
VM in VAX: Address Format
Page size: 2^9 bytes = 512 bytes
[Figure: Virtual address = space bits (31:30) | Virtual Page Number (bits 29:9) | page offset (bits 8:0).
Space bits: 00 – P0 process space (code and data); 01 – P1 process space (stack); 10 – S0 system space; 11 – S1.
Physical address = Physical Frame Number (bits 29:9) | page offset (bits 8:0)]
Computer Architecture 2008 – VM21
VM in VAX: Virtual Address Spaces
[Figure: each process (Process0…Process3) has its own P0 space (process code & global variables, growing upward from address 0) and P1 space (process stack & local variables, growing downward toward 7FFFFFFF); the S0 system space (from 80000000) grows upward, is generally static, and is shared by all processes]
Computer Architecture 2008 – VM22
Page Table Entry (PTE)
[Figure: PTE = V (bit 31) | PROT | M | Z | OWN | S S | Physical Frame Number (bits 20:0)]
Valid bit = 1 if the page is mapped to main memory; otherwise page fault:
• The page is in the disk swap area
• The address indicates the page's location on the disk
PROT: 4 protection bits
M: modified bit
Z: indicates whether the page was cleaned (zeroed)
OWN: 3 ownership bits
Computer Architecture 2008 – VM23
System Space Address Translation
[Figure: for an S0 virtual address (space bits 10 | VPN in bits 29:9 | page offset in bits 8:0):
PTE physical address = SBR (system page table base physical address) + 4 × VPN.
Get the PTE; the PFN from the PTE, concatenated with the page offset, forms the physical address]
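The S0 arithmetic above is just a base-plus-index computation. A sketch, assuming the 512-byte VAX pages and 4-byte PTEs from these slides:

```python
PAGE_SHIFT = 9                   # VAX page size: 2^9 = 512 bytes
PTE_SIZE = 4                     # assumed 4-byte (longword) PTEs

def s0_pte_address(sbr, va):
    """Physical address of the PTE for an S0 virtual address:
    SBR + 4 * VPN, where VPN is bits [29:9] (space bits dropped)."""
    vpn = (va >> PAGE_SHIFT) & ((1 << 21) - 1)
    return sbr + PTE_SIZE * vpn
```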
Computer Architecture 2008 – VM24
System Space Address Translation
[Figure: the same S0 translation as a diagram — VPN × 4 indexes the system page table at SBR to obtain the PFN, which replaces the VPN to form the physical address]
Computer Architecture 2008 – VM25
P0 Space Address Translation
[Figure: for a P0 virtual address (space bits 00 | VPN in bits 29:9 | page offset in bits 8:0):
PTE S0-space virtual address = P0BR (P0 page table base virtual address) + 4 × VPN.
Get the PTE using the system space translation algorithm; the PFN from the PTE, concatenated with the page offset, forms the physical address]
Computer Architecture 2008 – VM26
P0 Space Address Translation (cont)
[Figure: the P0 page table itself lives in S0 virtual space, so P0BR + 4 × VPN is an S0 virtual address (space bits 10 | VPN′ in bits 29:9 | offset′ in bits 8:0). The system translation SBR + 4 × VPN′ yields the physical address of the P0 PTE (PFN′ | offset′); that PTE's PFN, concatenated with the original offset, forms the final physical address]
Computer Architecture 2008 – VM27
P0 Space Address Translation Using the TLB
[Flowchart: for a P0 virtual address (00 | VPN | offset), first access the process TLB. On a process-TLB hit, get the PTE of the requested page from the process TLB and calculate the physical address. On a miss, calculate the PTE's virtual address in S0 (P0BR + 4 × VPN) and access the system TLB. On a system-TLB hit, get the PTE from the system TLB and read the requested page's PTE from the process page table; on a system-TLB miss, first access the system page table at SBR + 4 × VPN(PTE). Either way, the PFN is obtained, the physical address is calculated, and memory is accessed]
Computer Architecture 2008 – VM28
Paging in x86
x86 uses 2-level hierarchical mapping:
– Page directory and page tables
– All pages and page tables are 4K
A linear address is divided into:
– Dir: 10 bits
– Table: 10 bits
– Offset: 12 bits
– The Dir/Table fields serve as indexes into a page table
– The Offset serves as a pointer into a data page
A page-directory entry points to a page table; a page-table entry points to a page frame. Performance issues ⇒ TLB.
[Figure: CR3 points to the 4K page directory; the DIR field (bits 31:22) selects a directory entry, which points to a 4K page table; the TABLE field (bits 21:12) selects a page-table entry, which points to a 4K page frame; the OFFSET field (bits 11:0) selects the operand within the frame]
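The two-level walk above can be modeled in a few lines. This is a sketch of the index arithmetic only; the dictionaries stand in for the 4K page-directory and page-table frames, and entry flags are ignored:

```python
def x86_translate(page_directory, linear):
    """Model of the x86 2-level walk: 10-bit DIR, 10-bit TABLE,
    12-bit OFFSET. page_directory: dict dir_idx -> page table,
    where each page table is a dict table_idx -> frame number."""
    dir_idx   = (linear >> 22) & 0x3FF      # bits [31:22]
    table_idx = (linear >> 12) & 0x3FF      # bits [21:12]
    offset    = linear & 0xFFF              # bits [11:0]
    page_table = page_directory[dir_idx]    # PDE -> page table
    frame = page_table[table_idx]           # PTE -> page frame
    return (frame << 12) | offset
```

Each walk costs two extra memory references, which is exactly why the TLB caches the final VPN-to-frame mapping.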
Computer Architecture 2008 – VM29
x86 Page Translation Mechanism
CR3 points to the current page directory (and may be changed per process).
Usually, a page directory entry (which covers 4MB) points to a page table that covers data of the same type/usage.
Different physical pages can be allocated for the same linear address (e.g., 2 copies of the same code).
Sharing can alias pages from different processes to the same physical page (e.g., the OS).
[Figure: CR3 → page directory → page tables for code, data, stack, and OS → 4K pages in physical memory]
Computer Architecture 2008 – VM30
x86 Page Entry Format
20-bit pointer to a 4K-aligned address
12 bits of flags:
– Virtual memory: Present; Accessed, Dirty
– Protection: Writable (R#/W), User (U/S#) — 2 levels/types only
– Caching: Page Write-Through (PWT), Page Cache Disable (PCD)
– 3 bits for OS usage
[Figure 11-14: format of page directory and page table entries for 4K pages.
Page directory entry: page frame address (bits 31:12) | AVAIL (bits 11:9) | 0 0 | PS (bit 7, 0 = 4 KByte) | 0 | A (bit 5) | PCD (bit 4) | PWT (bit 3) | U (bit 2) | W (bit 1) | P (bit 0).
Page table entry: page frame address (bits 31:12) | AVAIL (bits 11:9) | 0 0 | D (bit 6) | A (bit 5) | PCD (bit 4) | PWT (bit 3) | U (bit 2) | W (bit 1) | P (bit 0).
Bits marked 0 are reserved by Intel for future use (should be zero)]
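A small decoder for the page-table-entry flag bits listed above (bit positions follow the 4K-page PTE layout on this slide; the dictionary keys are illustrative names):

```python
def decode_pte(pte):
    """Decode the low flag bits of a 4K-page x86 page-table entry."""
    return {
        "present":       bool(pte & (1 << 0)),   # P
        "writable":      bool(pte & (1 << 1)),   # R#/W
        "user":          bool(pte & (1 << 2)),   # U/S#
        "write_through": bool(pte & (1 << 3)),   # PWT
        "cache_disable": bool(pte & (1 << 4)),   # PCD
        "accessed":      bool(pte & (1 << 5)),   # A
        "dirty":         bool(pte & (1 << 6)),   # D
        "frame":         pte >> 12,              # page frame address [31:12]
    }
```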
-
-
Computer Architecture 2008 – VM31
x86 Paging – Virtual memory
A page can be:
– Not yet loaded
– Loaded
– On disk
A loaded page can be:
– Dirty
– Clean
When a page is not loaded (P bit clear) ⇒ a page fault occurs:
– It may require evicting a loaded page to make room for the new one
   – The OS prioritizes eviction by LRU and the dirty/clean/avail bits
   – A dirty page must be written to disk; a clean page need not be
– The new page is either loaded from disk or "initialized"
– The CPU sets the page's "accessed" flag when it is accessed, and its "dirty" flag when it is written
Computer Architecture 2008 – VM32
Virtually-Addressed Cache
The cache uses virtual addresses (the tags are virtual).
Address translation is required only on a cache miss:
– The TLB is not in the path to a cache hit
Aliasing: 2 different virtual addresses mapped to the same physical address:
– Two different cache entries can hold data for the same physical address
– All cache entries with the same physical address must be updated
[Figure: the CPU issues a virtual address to the cache; on a hit, data returns directly; on a miss, translation produces a physical address for main memory]
Computer Architecture 2008 – VM33
Virtually-Addressed Cache (cont.)
The cache must be flushed at a task switch:
– Solution: include the process ID (PID) in the tag
How to share memory among processes:
– Permit multiple virtual pages to refer to the same physical frame
– Problem: incoherence if they map to different cache entries
– Solution: require sufficiently many common virtual LSBs; with a direct-mapped cache, this guarantees that they all map to the same cache entry
Computer Architecture 2008 – VM34
Backup
Computer Architecture 2008 – VM35
Inverted Page Tables
[Figure: the virtual page number is hashed to index the inverted page table; each entry holds a (virtual page, physical frame) pair, and the stored virtual page is compared (=) against the lookup key]
IBM System 38 (AS/400) implements 64-bit addresses:
– 48 bits are translated
– The start of an object contains a 12-bit tag
⇒ TLBs or virtually addressed caches are critical
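The hash-and-compare lookup in the figure above can be sketched as follows. This is a generic sketch, not the System 38 mechanism: the slot/chain representation and function names are assumptions.

```python
def ipt_insert(table, num_slots, vpn, frame):
    """Map a virtual page to a physical frame in the inverted table.
    table: dict slot -> list of (vpn, frame) collision-chain entries."""
    table.setdefault(hash(vpn) % num_slots, []).append((vpn, frame))

def ipt_lookup(table, num_slots, vpn):
    """Hash the VPN to a slot, then compare stored VPNs along the chain
    (the '=' comparator in the figure)."""
    for entry_vpn, frame in table.get(hash(vpn) % num_slots, []):
        if entry_vpn == vpn:
            return frame
    return None                      # not mapped -> page fault
```

The table size scales with physical frames rather than virtual pages, which is what makes inverted tables attractive for huge (e.g., 64-bit) address spaces.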
Computer Architecture 2008 – VM36
Hardware / Software Boundary
Which aspects of the virtual → physical translation are determined in hardware?
– TLB format
– Type of page table
– Page table entry format
– Disk placement
– Paging policy
Computer Architecture 2008 – VM37
Why Virtual Memory?
Generality:
– the ability to run programs larger than the size of physical memory
Storage management:
– allocation/deallocation of variable-sized blocks is costly and leads to (external) fragmentation
Protection:
– regions of the address space can be read-only, execute-only, etc.
Flexibility:
– portions of a program can be placed anywhere, without relocation
Storage efficiency:
– retain only the most important portions of the program in memory
Concurrent I/O:
– execute other processes while loading/dumping a page
Expandability:
– can leave room in the virtual address space for objects to grow
Performance
Computer Architecture 2008 – VM38
Address Translation with a TLB
[Figure: the virtual address splits into a virtual page number (bits n–1:p) and page offset (bits p–1:0). The TLB (valid | tag | physical page number) maps the VPN to a physical page number, forming the physical address; that address then splits into cache tag, index, and byte offset for the cache lookup (valid | tag | data). Equality comparators produce the TLB hit and cache hit signals]