Virtual Memory & Address Translation

Vivek Pai, Princeton University

Oct 10, 2002

Memory Gedanken

• For a given system, estimate the following
  – # of processes
  – Amount of memory
• Are all portions of a program likely to be used?
• What fraction might be unused?
• What does this imply about memory bookkeeping?

Some Numbers

• Bubba: desktop machine
  – 63 processes
  – 128MB memory + 12MB swap used
  – About 2MB per process
• Oakley: CS shared machine
  – 188 processes
  – 2GB memory + 624MB swap
  – About 13MB per process
• Arizona: undergrad shared machine
  – 818 processes
  – 8GB memory, but only 2GB used
  – About 2MB per process

Quiz 1 Answers

• Microkernel – small privileged kernel with most kernel services running as user-space processes

• Speed? Lots of context switches, memory copying for communications

• Memory layout:

Quiz 1 Breakdown

• 4.0 – 5
• 3.5 – 7
• 3.0 – 8
• 2.5 – 11
• 2.0 – 1
• 1.5 – 5
• 0.5 – 3

General Memory Problem

• We have a limited (expensive) physical resource: main memory

• We want to use it as efficiently as possible
• We have an abundant, slower resource: disk

Lots of Variants

• Many programs, total size less than memory
  – Technically possible to pack them together
  – Will programs know about each other’s existence?
• One program, using lots of memory
  – Can you only keep part of the program in memory?
• Lots of programs, total size exceeds memory
  – What programs are in memory, and how to decide?

History Versus Present

• History
  – Each variant had its own solution
  – Solutions have different hardware requirements
  – Some solutions software/programmer visible
• Present – general-purpose microprocessors
  – One mechanism used for all of these cases
• Present – less capable microprocessors
  – May still use “historical” approaches

Many Programs, Small Total Size

• Observation: we can pack them into memory
• Requirements by segments
  – Text: maybe contiguous
  – Data: keep contiguous, “relocate” at start
  – Stack: assume contiguous, fixed size
    • Just set pointer at start, reserve space
  – Heap: no need to make it contiguous

[Figure: memory packed with the segments Text 1, Data 1, Stack 1, Data 2, Text 2, Stack 2]

Many Programs, Small Total Size

• Software approach
  – Just find appropriate space for data & code segments
  – Adjust any pointers to globals/functions in the code
  – Heap, stack “automatically” adjustable
• Hardware approach
  – Pointer to data segment
  – All accesses to globals indirected
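The software approach amounts to load-time relocation. A minimal sketch, assuming a hypothetical image format in which the program is linked as if loaded at address 0 and carries a list of the words that hold absolute addresses (`struct image`, `relocate`, and the field names are illustrative, not from the slides):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical program image: linked as if loaded at address 0, with a
 * list of word indices that contain absolute addresses. */
struct image {
    uint32_t *words;        /* code + data, as 32-bit words */
    size_t nwords;
    const size_t *relocs;   /* indices of words that are pointers */
    size_t nrelocs;
};

/* Patch every recorded pointer so it is valid once the image sits at
 * load_base. Words not listed in relocs are left untouched. */
void relocate(struct image *img, uint32_t load_base)
{
    for (size_t i = 0; i < img->nrelocs; i++) {
        size_t idx = img->relocs[i];
        assert(idx < img->nwords);   /* relocation must lie inside the image */
        img->words[idx] += load_base;
    }
}
```

The hardware approach avoids this patching pass: every global access goes through a data-segment pointer instead.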

One Program, Lots of Memory

• Observations: locality
  – Instructions in a function generally related
  – Stack accesses generally in current stack frame
  – Not all data used all the time
• Goal: keep recently-used portions in memory
  – Explicit: programmer/compiler reserves, controls part of memory space – “overlays”
  – Note: limited resource may be address space

[Figure: address space holding Data, Stack, Text, an Active Heap, and Other Heap(s)]

Many Programs, Lots of Memory

• Software approach
  – Keep only subset of programs in memory
  – When loading a program, evict any programs that use the same memory regions
  – “Swap” programs in/out as needed
• Hardware approach
  – Don’t permanently associate any address of any program to any part of physical memory
    • Note: doesn’t address problem of too few address bits

Why Virtual Memory?

• Use secondary storage ($)
  – Extend DRAM ($$$) with reasonable performance
• Protection
  – Programs do not step over each other
  – Communications require explicit IPC operations
• Convenience
  – Flat address space
  – Programs have the same view of the world

How To Translate

• Must have some “mapping” mechanism
• Mapping must have some granularity
  – Granularity determines flexibility
  – Finer granularity requires more mapping info
• Extremes:
  – Any byte to any byte: mapping equals program size
  – Map whole segments: larger segments problematic

Translation Options

• Granularity
  – Small # of big fixed/flexible regions – segments
  – Large # of fixed regions – pages
• Visibility
  – Translation mechanism integral to instruction set – segments
  – Mechanism partly visible, external to processor – obsolete
  – Mechanism part of processor, visible to OS – pages

Translation Overview

• Actual translation is in hardware (MMU)
• Controlled in software
• CPU view
  – what program sees, virtual memory
• Memory view
  – physical memory

[Figure: CPU → virtual address → Translation (MMU) → physical address → Physical memory and I/O device]

Goals of Translation

• Implicit translation for each memory reference
• A hit should be very fast
• Trigger an exception on a miss
• Protected from user’s faults

[Figure: memory hierarchy of Registers, Cache(s), DRAM, and Disk, with the gaps between levels labeled 10x, 100x, and 10Mx, and a “paging” label]

Base and Bound

• Built in Cray-1
• A program can only access physical memory in [base, base+bound]
• On a context switch: save/restore base, bound registers
• Pros: Simple
• Cons: fragmentation, hard to share, and difficult to use disks

[Figure: the virtual address is compared with bound (error if greater) and added to base to produce the physical address]
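Base-and-bound translation is small enough to sketch in full. This is an illustration only, assuming 32-bit addresses and a bound that is the largest legal virtual address, matching the inclusive range [base, base+bound] on the slide; `struct base_bound` and `bb_translate` are made-up names:

```c
#include <stdbool.h>
#include <stdint.h>

/* The two registers saved and restored on a context switch. */
struct base_bound {
    uint32_t base;    /* start of the program's physical region */
    uint32_t bound;   /* largest legal virtual address (assumption) */
};

/* Returns true and stores the physical address on success;
 * returns false for the "error" output of the comparator. */
bool bb_translate(const struct base_bound *r, uint32_t vaddr, uint32_t *paddr)
{
    if (vaddr > r->bound)       /* the ">" check */
        return false;
    *paddr = r->base + vaddr;   /* the "+" adder */
    return true;
}
```

One compare and one add per reference is why the scheme is simple; it also shows why sharing is hard, since each program has exactly one contiguous region.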

Segmentation

• Have a table of (seg, size)
• Protection: each entry has
  – (nil, read, write, exec)
• On a context switch: save/restore the table or a pointer to the table in kernel memory
• Pros: Efficient, easy to share
• Cons: Complex management and fragmentation within a segment

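[Figure: the virtual address is split into segment and offset; the segment selects a (seg, size) table entry; the offset is compared with size (error if greater) and added to seg to produce the physical address]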
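A segment-table lookup might be sketched as follows. The layout is an assumption for illustration: each entry holds a base, a size in bytes, and protection bits, and the names (`seg_entry`, `seg_translate`, `SEG_R`, and so on) are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Protection bits; SEG_NONE corresponds to "nil". */
enum seg_prot { SEG_NONE = 0, SEG_R = 1, SEG_W = 2, SEG_X = 4 };

struct seg_entry {
    uint32_t base;   /* physical start of the segment */
    uint32_t size;   /* segment length in bytes (assumption) */
    unsigned prot;   /* OR of SEG_R / SEG_W / SEG_X */
};

/* Translate (seg, offset) for the requested access type.
 * Returns false on a bad segment, an out-of-range offset,
 * or a protection violation. */
bool seg_translate(const struct seg_entry *table, size_t nsegs,
                   uint32_t seg, uint32_t offset, unsigned access,
                   uint32_t *paddr)
{
    if (seg >= nsegs)
        return false;
    const struct seg_entry *e = &table[seg];
    if (offset >= e->size)                /* size check */
        return false;
    if ((e->prot & access) != access)     /* protection check */
        return false;
    *paddr = e->base + offset;
    return true;
}
```

Sharing is easy here because two processes can hold entries with the same base and different protection bits.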

Paging

• Use a page table to translate
• Various bits in each entry
• Context switch: similar to the segmentation scheme
• What should be the page size?
• Pros: simple allocation, easy to share
• Cons: big table & cannot deal with holes easily

[Figure: the virtual address is split into VPage # and offset; VPage # is compared with the page table size (error if greater) and indexes the page table to fetch a PPage #; PPage # and offset form the physical address]
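A single-level page table walk can be sketched like this, assuming 4KB pages and a flat array of entries with only a valid bit (the "various bits" are otherwise left out; `struct pte`, `page_translate`, `PG_SHIFT`, and `PG_SIZE` are illustrative names):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PG_SHIFT 12u                /* assumes 4KB pages */
#define PG_SIZE  (1u << PG_SHIFT)

struct pte {
    uint32_t ppage;   /* physical page number */
    bool valid;       /* one of the "various bits" */
};

/* Split vaddr into (VPage #, offset), index the table, and rebuild
 * the physical address. Returns false for an out-of-range or
 * invalid page. */
bool page_translate(const struct pte *table, size_t table_size,
                    uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpage = vaddr >> PG_SHIFT;
    uint32_t offset = vaddr & (PG_SIZE - 1);
    if (vpage >= table_size)        /* page table size check */
        return false;
    if (!table[vpage].valid)
        return false;
    *paddr = (table[vpage].ppage << PG_SHIFT) | offset;
    return true;
}
```

The table needs one entry per virtual page up to the highest one in use, so a hole in the address space still costs entries; that is the "cannot deal with holes easily" drawback.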

How Many PTEs Do We Need?

• Assume 4KB page
  – Equals “low order” 12 bits
• Worst case for 32-bit address machine
  – # of processes × 2^20
• What about 64-bit address machine?
  – # of processes × 2^52
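The counts above follow from subtracting the offset bits from the address width. A small sketch of that arithmetic (`ptes_per_process` is a hypothetical helper, not something from the slides):

```c
#include <assert.h>
#include <stdint.h>

/* Number of page table entries needed to map one full virtual
 * address space: 2^(addr_bits - page_bits). */
uint64_t ptes_per_process(unsigned addr_bits, unsigned page_bits)
{
    assert(page_bits <= addr_bits && addr_bits - page_bits < 64);
    return UINT64_C(1) << (addr_bits - page_bits);
}
```

For example, `ptes_per_process(32, 12)` is 2^20 and `ptes_per_process(64, 12)` is 2^52. With a 4-byte PTE (an assumption; the slides do not give an entry size), the 32-bit case alone comes to 4MB of page table per process.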

Segmentation with Paging

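[Figure: the virtual address is split into Vseg #, VPage #, and offset; Vseg # selects a (page table, seg size) entry; VPage # is compared with seg size (error if greater) and indexes that segment's page table to fetch a PPage #; PPage # and offset form the physical address]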

Stretch Time: Getting A Line

• First try:
    fflush(stdout);
    scanf("%s", line);
• 2nd try:
    scanf("%c", &firstChar);
    if (firstChar != '\n') scanf("%s", rest);
• Final: back to first principles
    while (…) getc( )
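One way the "first principles" version could look is a plain `getc` loop. This is a sketch rather than the solution from the lecture; `read_line` and its truncation policy are assumptions:

```c
#include <stddef.h>
#include <stdio.h>

/* Read one line from fp into buf, storing at most size-1 characters
 * and dropping the newline. Characters beyond the buffer are read and
 * discarded. Returns the number of characters stored, or -1 if end of
 * file is hit before anything is read. */
int read_line(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    if (size == 0)
        return -1;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (n + 1 < size)
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    if (c == EOF && n == 0)
        return -1;
    return (int)n;
}
```

Unlike `scanf("%s", ...)`, this keeps embedded spaces, handles an empty line, and cannot overrun the buffer.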

Multiple-Level Page Tables

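[Figure: the virtual address is split into dir, table, and offset; dir indexes the directory to find a page table, and table indexes that page table to find the pte]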

What does this buy us? Sparse address spaces and easier paging
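A two-level walk might be sketched as below, assuming a 32-bit address split 10/10/12 into dir, table, and offset (the split and all names are assumptions for illustration). A NULL directory slot stands for an unmapped 4MB region, which is what makes sparse address spaces cheap:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DIR_BITS 10u   /* assumed 10/10/12 split of a 32-bit address */
#define TBL_BITS 10u
#define OFF_BITS 12u

struct pte { uint32_t ppage; bool valid; };
struct page_table { struct pte e[1u << TBL_BITS]; };
/* A NULL slot means the whole 4MB region has no page table at all. */
struct page_dir { struct page_table *tables[1u << DIR_BITS]; };

/* Walk directory, then page table. Returns false if either level
 * has no valid mapping for vaddr. */
bool walk(const struct page_dir *dir, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t d = vaddr >> (TBL_BITS + OFF_BITS);
    uint32_t t = (vaddr >> OFF_BITS) & ((1u << TBL_BITS) - 1);
    uint32_t off = vaddr & ((1u << OFF_BITS) - 1);

    const struct page_table *pt = dir->tables[d];
    if (pt == NULL)
        return false;
    if (!pt->e[t].valid)
        return false;
    *paddr = (pt->e[t].ppage << OFF_BITS) | off;
    return true;
}
```

Each second-level table is itself a unit that can be allocated on demand, which is one reading of "easier paging".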

Inverted Page Tables

• Main idea
  – One PTE for each physical page frame
  – Hash (Vpage, pid) to Ppage#
• Pros
  – Small page table for large address space
• Cons
  – Lookup is difficult
  – Overhead of managing hash chains, etc

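[Figure: the virtual address (pid, vpage, offset) is looked up in an inverted page table with entries 0 to n-1, each holding (pid, vpage); the index k of the matching entry and the offset form the physical address]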
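An inverted page table with hash chains could be sketched as follows. Everything here is a toy under stated assumptions: 8 frames, 4KB pages, a made-up hash function, and a mapping routine that expects the frame to be free. The names (`struct ipt`, `ipt_map`, `ipt_translate`) are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define NFRAMES  8u      /* toy machine: 8 physical page frames */
#define PG_SHIFT 12u     /* assumes 4KB pages */
#define NO_FRAME (-1)

struct ipt_entry { uint32_t pid; uint32_t vpage; bool used; int next; };

struct ipt {
    struct ipt_entry frame[NFRAMES];  /* index k is the physical page number */
    int bucket[NFRAMES];              /* hash bucket -> first frame in chain */
};

/* Illustrative hash only; real systems choose this carefully. */
static unsigned ipt_hash(uint32_t pid, uint32_t vpage)
{
    return (pid * 31u + vpage) % NFRAMES;
}

void ipt_init(struct ipt *t)
{
    for (unsigned i = 0; i < NFRAMES; i++) {
        t->bucket[i] = NO_FRAME;
        t->frame[i] = (struct ipt_entry){0, 0, false, NO_FRAME};
    }
}

/* Record that frame k now holds (pid, vpage). Assumes k is free. */
void ipt_map(struct ipt *t, int k, uint32_t pid, uint32_t vpage)
{
    unsigned h = ipt_hash(pid, vpage);
    t->frame[k] = (struct ipt_entry){pid, vpage, true, t->bucket[h]};
    t->bucket[h] = k;
}

/* Hash (pid, vpage), then follow the chain until an entry matches. */
bool ipt_translate(const struct ipt *t, uint32_t pid, uint32_t vaddr,
                   uint32_t *paddr)
{
    uint32_t vpage = vaddr >> PG_SHIFT;
    for (int k = t->bucket[ipt_hash(pid, vpage)]; k != NO_FRAME;
         k = t->frame[k].next) {
        const struct ipt_entry *e = &t->frame[k];
        if (e->used && e->pid == pid && e->vpage == vpage) {
            *paddr = ((uint32_t)k << PG_SHIFT)
                   | (vaddr & ((1u << PG_SHIFT) - 1));
            return true;
        }
    }
    return false;
}
```

The table size tracks physical memory, not the virtual address space, at the cost of a chain walk on every lookup.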

Virtual-To-Physical Lookups

• Programs only know virtual addresses
• Each virtual address must be translated
  – May involve walking hierarchical page table
  – Page table stored in memory
  – So, each program memory access requires several actual memory accesses
• Solution: cache “active” part of page table

Translation Look-aside Buffer (TLB)

[Figure: VPage # from the virtual address is compared against the VPage# tags in the TLB; on a hit the matching PPage# and the offset form the physical address; on a miss the real page table is consulted]

Bits in a TLB Entry

• Common (necessary) bits
  – Virtual page number: match with the virtual address
  – Physical page number: translated address
  – Valid
  – Access bits: kernel and user (nil, read, write)
• Optional (useful) bits
  – Process tag
  – Reference
  – Modify
  – Cacheable
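The fields above map naturally onto a struct, and a lookup is a compare against every entry. A sketch under assumptions: 4 entries, 4KB pages, a single set of access bits instead of separate kernel and user ones, and no modeling of empty slots. All names are illustrative:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 4u
#define PG_SHIFT    12u   /* assumes 4KB pages */

enum { ACC_R = 1, ACC_W = 2 };   /* 0 corresponds to "nil" */

struct tlb_entry {
    uint32_t vpage;     /* matched against the virtual address */
    uint32_t ppage;     /* translated page number */
    bool valid;
    unsigned access;    /* simplified: one set of bits, not kernel + user */
    uint16_t pid_tag;   /* optional: process tag */
    bool referenced;    /* optional */
    bool modified;      /* optional */
    bool cacheable;     /* optional */
};

enum tlb_result { TLB_HIT, TLB_MISS, TLB_FAULT };

/* Fully associative lookup: every entry is a candidate. A matching
 * entry with valid clear, or without the needed access bits, yields
 * a fault rather than a miss. */
enum tlb_result tlb_lookup(struct tlb_entry tlb[TLB_ENTRIES], uint16_t pid,
                           uint32_t vaddr, unsigned access, uint32_t *paddr)
{
    uint32_t vpage = vaddr >> PG_SHIFT;
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        struct tlb_entry *e = &tlb[i];
        if (e->vpage != vpage || e->pid_tag != pid)
            continue;
        if (!e->valid || (e->access & access) != access)
            return TLB_FAULT;
        e->referenced = true;
        if (access & ACC_W)
            e->modified = true;
        *paddr = (e->ppage << PG_SHIFT) | (vaddr & ((1u << PG_SHIFT) - 1));
        return TLB_HIT;
    }
    return TLB_MISS;
}
```

Hardware does the tag compare on all entries in parallel; the loop is only the software picture of the same check.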

Hardware-Controlled TLB

• On a TLB miss
  – Hardware loads the PTE into the TLB
    • Need to write back if there is no free entry
  – Generate a fault if the page containing the PTE is invalid
  – VM software performs fault handling
  – Restart the CPU
• On a TLB hit, hardware checks the valid bit
  – If valid, pointer to page frame in memory
  – If invalid, the hardware generates a page fault
    • Perform page fault handling
    • Restart the faulting instruction

Software-Controlled TLB

• On a miss in TLB
  – Write back if there is no free entry
  – Check if the page containing the PTE is in memory
  – If no, perform page fault handling
  – Load the PTE into the TLB
  – Restart the faulting instruction
• On a hit in TLB, the hardware checks valid bit
  – If valid, pointer to page frame in memory
  – If invalid, the hardware generates a page fault
    • Perform page fault handling
    • Restart the faulting instruction
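The miss path of a software-controlled TLB can be sketched as a handler that follows those steps in order. This is a toy model, not any real OS's handler: the page table is a flat array, the victim is chosen round-robin, and `handle_page_fault` is a stand-in that simply invents a frame number:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 4u
#define NPAGES      16u

struct pte { uint32_t ppage; bool valid; bool modified; };
struct tlb_entry { uint32_t vpage; uint32_t ppage; bool in_use; bool modified; };

struct mmu {
    struct pte page_table[NPAGES];
    struct tlb_entry tlb[TLB_ENTRIES];
    unsigned next_victim;   /* round-robin stand-in for a real policy */
    unsigned page_faults;
};

/* Hypothetical fault handler: pretends to bring the page in and
 * assigns it frame vpage + 100. */
static void handle_page_fault(struct mmu *m, uint32_t vpage)
{
    m->page_table[vpage].ppage = vpage + 100;
    m->page_table[vpage].valid = true;
    m->page_faults++;
}

/* Returns the TLB slot that now maps vpage; the caller would then
 * restart the faulting instruction. */
size_t tlb_miss(struct mmu *m, uint32_t vpage)
{
    assert(vpage < NPAGES);

    size_t slot = TLB_ENTRIES;
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (!m->tlb[i].in_use) { slot = i; break; }
    }
    if (slot == TLB_ENTRIES) {
        /* No free entry: evict one and write its modify bit back. */
        slot = m->next_victim;
        m->next_victim = (m->next_victim + 1) % TLB_ENTRIES;
        struct tlb_entry *v = &m->tlb[slot];
        if (v->modified)
            m->page_table[v->vpage].modified = true;
    }
    if (!m->page_table[vpage].valid)
        handle_page_fault(m, vpage);
    m->tlb[slot] = (struct tlb_entry){vpage, m->page_table[vpage].ppage,
                                      true, false};
    return slot;
}
```

Because the handler is ordinary software, the page table format is the OS's choice, which is the flexibility argument for this design.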

Hardware vs. Software Controlled

• Hardware approach
  – Efficient
  – Inflexible
  – Need more space for page table
• Software approach
  – Flexible
  – Software can do mappings by hashing
    • PP# → (Pid, VP#)
    • (Pid, VP#) → PP#
  – Can deal with large virtual address space

Cache vs. TLBs

• Similarities
  – Both cache a portion of memory
  – Both write back on a miss
• Combine L1 cache with TLB
  – Virtually addressed cache
  – Why wouldn’t everyone use virtually addressed caches?
• Differences
  – Associativity
    • TLB is usually fully associative
    • Cache can be direct-mapped
  – Consistency
    • TLB does not deal with consistency with memory
    • TLB can be controlled by software

Caches vs. TLBs

Similarities
• Both cache a portion of memory
• Both read from memory on misses

Differences
• Associativity
  – TLBs generally fully associative
  – Caches can be direct-mapped
• Consistency
  – No TLB/memory consistency
  – Some TLBs software-controlled

Combining L1 caches with TLBs
• Virtually addressed caches
• Not always used – what are their drawbacks?

Issues

• Which TLB entry should be replaced?
  – Random
  – Pseudo LRU
• What happens on a context switch?
  – Process tag: change TLB registers and process register
  – No process tag: Invalidate the entire TLB contents
• What happens when changing a page table entry?
  – Change the entry in memory
  – Invalidate the TLB entry
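The context-switch and PTE-update cases can be sketched together. Assumptions for illustration: a 4-entry TLB, a `tagged` flag saying whether entries carry a process tag, a page table that is a flat array of physical page numbers, and `valid` meaning only that the slot holds a usable cached translation:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 4u

struct tlb_entry { uint16_t pid; uint32_t vpage; uint32_t ppage; bool valid; };

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
    uint16_t current_pid;   /* the "process register" */
    bool tagged;            /* do entries carry a process tag? */
};

/* With process tags, only the process register changes; without
 * them, every cached translation must be invalidated. */
void tlb_context_switch(struct tlb *t, uint16_t new_pid)
{
    if (!t->tagged) {
        for (size_t i = 0; i < TLB_ENTRIES; i++)
            t->e[i].valid = false;
    }
    t->current_pid = new_pid;
}

/* Change a page table entry: update memory first, then drop any
 * stale copy from the TLB. */
void update_pte(struct tlb *t, uint32_t *page_table, uint16_t pid,
                uint32_t vpage, uint32_t new_ppage)
{
    page_table[vpage] = new_ppage;
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        struct tlb_entry *e = &t->e[i];
        if (e->valid && e->pid == pid && e->vpage == vpage)
            e->valid = false;
    }
}
```

On a multiprocessor the invalidation loop would have to run against every processor's TLB, not just the local one.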

Consistency Issues

• “Snoopy” cache protocols can maintain consistency with DRAM, even when DMA happens

• No hardware maintains consistency between DRAM and TLBs: you need to flush the related TLB entries whenever changing a page table entry in memory

• On multiprocessors, when you modify a page table entry, you need to do “TLB shoot-down” to flush all related TLB entries on all processors

Issues to Ponder

• Everyone’s moving to hardware TLB management – why?

• Segmentation was/is a way of maintaining backward compatibility – how?

• For the hardware-inclined – what kind of hardware support is needed for everything we discussed today?
