Resource Management Policy and Mechanism Jeff Chase Duke University.


Page 1: Resource Management Policy and Mechanism Jeff Chase Duke University.

Resource Management
Policy and Mechanism

Jeff Chase
Duke University

Page 2: Resource Management Policy and Mechanism Jeff Chase Duke University.

The kernel

[Diagram: kernel structure]
Entry points: syscall trap/return, fault/return, interrupt/return
System call layer: files, processes, IPC, thread syscalls
Fault entry: VM page faults, signals, etc.
Interrupts: I/O completions, timer ticks
Thread/CPU/core management: sleep and ready queues (sleep queue, ready queue)
Memory management: block/page cache

Page 3: Resource Management Policy and Mechanism Jeff Chase Duke University.

The kernel

[Diagram: kernel structure, as on the previous slide, now with two "policy" labels overlaid]
Entry points: syscall trap/return, fault/return, interrupt/return
System call layer: files, processes, IPC, thread syscalls
Fault entry: VM page faults, signals, etc.
Interrupts: I/O completions, timer ticks
Thread/CPU/core management: sleep and ready queues (sleep queue, ready queue)
Memory management: block/page cache

Page 4: Resource Management Policy and Mechanism Jeff Chase Duke University.

Separation of policy and mechanism

• Every OS platform has mechanisms that enable it to mediate access to machine resources.

– Gain control of core by timer interrupts

– Fault on access to non-resident virtual memory

– I/O through system call traps

– Internal code and data structures to track resource usage and allocate resources

• The mechanisms enable resource management policy.

• But the mechanisms do not and must/should not determine the policy.

• We might want to change the policy!
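
One way to picture the separation, as a minimal sketch: the mechanism below (a fixed-capacity table that detects misses and carries out evictions) never decides which page to evict. It asks a pluggable policy object. The names `EvictionPolicy`, `FrameTable`, and `PolicyDemo` are hypothetical and chosen only for illustration; they are not taken from any real kernel.

```java
import java.util.ArrayList;
import java.util.List;

/** Policy: decides WHICH resident page to evict (hypothetical interface). */
interface EvictionPolicy {
    int chooseVictim(List<Integer> residentPages);
}

/** Mechanism: tracks resident pages, detects misses, and performs evictions. */
class FrameTable {
    private final int capacity;
    private final EvictionPolicy policy;
    private final List<Integer> resident = new ArrayList<>(); // oldest arrival first
    private int faults = 0;

    FrameTable(int capacity, EvictionPolicy policy) {
        this.capacity = capacity;
        this.policy = policy;
    }

    /** Reference a page; on a miss with no free frame, ask the policy for a victim. */
    void access(int page) {
        if (resident.contains(page)) {
            return; // hit: nothing to do
        }
        faults++;
        if (resident.size() == capacity) {
            int victim = policy.chooseVictim(resident);
            resident.remove(Integer.valueOf(victim)); // remove by value, not by index
        }
        resident.add(page);
    }

    int faults() { return faults; }

    List<Integer> resident() { return resident; }
}

class PolicyDemo {
    public static void main(String[] args) {
        EvictionPolicy evictOldest = pages -> pages.get(0);
        EvictionPolicy evictNewest = pages -> pages.get(pages.size() - 1);
        for (EvictionPolicy policy : List.of(evictOldest, evictNewest)) {
            FrameTable table = new FrameTable(2, policy); // same mechanism, different policy
            for (int page : new int[] {1, 2, 3, 1}) {
                table.access(page);
            }
            System.out.println(table.faults() + " faults, resident = " + table.resident());
        }
    }
}
```

Changing the policy here means passing a different object; the `FrameTable` code is untouched, which is the point of the separation.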

Page 5: Resource Management Policy and Mechanism Jeff Chase Duke University.

Goals of policy

• Share resources fairly.

• Use machine resources efficiently.

• Be responsive to user interaction.

But what do these things mean?
How do we know if a policy is good or not?
What are the metrics?
What do we assume about the workload?

Page 6: Resource Management Policy and Mechanism Jeff Chase Duke University.

Memory Allocation

How should an OS allocate its memory resources among contending demands?

– Virtual address spaces: fork, exec, sbrk, page fault.

– The kernel controls how many machine memory frames back the pages of each virtual address space.

– The kernel can take memory away from a VAS at any time.

– The kernel always gets control if a VAS (or rather a thread running within a VAS) asks for more.

– The kernel controls how much machine memory to use as a cache for data blocks whose home is on slow storage.

– Policy choices: which pages or blocks to keep in memory? And which ones to evict from memory to make room for others?

Page 7: Resource Management Policy and Mechanism Jeff Chase Duke University.

What is a Virtual Address Space?

• Protection domain

– A “sandbox” for threads that limits what memory they can access for read/write/execute.

– Each thread is in exactly one sandbox, but many threads may play in the same sandbox.

• Uniform name space

– Threads access their code and data items without caring where they are in physical memory, or even if they are resident in memory at all.

• A set of VP translations

– A level of indirection from virtual pages to physical frames.

– The OS kernel controls the translations in effect at any time.

Page 8: Resource Management Policy and Mechanism Jeff Chase Duke University.

Introduction to Virtual Addressing

[Diagram: a virtual address space (text, data, BSS, user stack, args/env, kernel data) in virtual memory (big?), mapped by virtual-to-physical translations onto physical memory (small?)]

Code addresses memory through virtual addresses.

The kernel and the machine collude to translate virtual addresses to physical addresses.

The kernel controls the virtual-physical translations in effect (space).

The machine does not allow a user process to access memory unless the kernel “says it’s OK”.

The specific mechanisms for implementing virtual address translation are machine-dependent.

Page 9: Resource Management Policy and Mechanism Jeff Chase Duke University.

Virtual Memory as a Cache

[Diagram: an executable file (header; program sections text, data, idata, wdata; symbol table, etc.) and backing storage supply the process segments (text, data, BSS, user stack, args/env, kernel data) in virtual memory (big). Virtual-to-physical translations map these onto page frames in physical memory (small). Pages move between the two by page fetch and pageout/eviction.]

Page 10: Resource Management Policy and Mechanism Jeff Chase Duke University.

Virtual Address Translation

Example: typical 32-bit architecture with 4KB pages.

[Diagram: a virtual address is a VPN followed by a 12-bit offset. Address translation replaces the VPN with a PFN; the physical address is the PFN followed by the unchanged offset.]

Virtual address translation maps a virtual page number (VPN) to a physical page frame number (PFN): the rest is easy.

Deliver exception to OS if translation is not valid and accessible in requested mode.
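
The split into VPN and offset can be sketched in a few lines. This is a minimal illustration under the slide's assumptions (32-bit addresses, 4KB pages, so 12 offset bits); the `pageTable` map and the class name `AddressTranslation` are hypothetical stand-ins for the machine-dependent structures, and a missing entry stands in for "translation not valid".

```java
import java.util.HashMap;
import java.util.Map;

class AddressTranslation {
    static final int OFFSET_BITS = 12;                     // 4KB pages: 2^12 bytes
    static final int OFFSET_MASK = (1 << OFFSET_BITS) - 1; // low 12 bits

    /** Hypothetical page table: VPN -> PFN. A missing entry models an invalid translation. */
    static final Map<Integer, Integer> pageTable = new HashMap<>();

    static int translate(int virtualAddress) {
        int vpn = virtualAddress >>> OFFSET_BITS;   // high 20 bits select the page
        int offset = virtualAddress & OFFSET_MASK;  // offset passes through unchanged
        Integer pfn = pageTable.get(vpn);
        if (pfn == null) {
            // Stands in for "deliver exception to OS".
            throw new IllegalStateException("fault at VPN 0x" + Integer.toHexString(vpn));
        }
        return (pfn << OFFSET_BITS) | offset;
    }

    public static void main(String[] args) {
        pageTable.put(0x00005, 0x000A3);
        System.out.println(Integer.toHexString(translate(0x00005ABC))); // prints a3abc
    }
}
```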

Page 11: Resource Management Policy and Mechanism Jeff Chase Duke University.

Cartoon View

[Diagram: page number i of a user virtual address indexes the process page table (map) to find PFN i; PFN i plus the offset selects a location within the physical memory page frames (PFN 0, PFN 1, ..., PFN i).]

In this example, each VPN j maps to PFN j, but in practice any physical frame may be used for any virtual page.

Each process/VAS has its own page table. Virtual addresses are translated relative to the current page table.

The maps are themselves stored in memory; a protected register holds a pointer to the current map.

Page 12: Resource Management Policy and Mechanism Jeff Chase Duke University.

Under the Hood

[Flowchart: handling a memory access]

MMU (start here): probe TLB; probe page table; load TLB; access valid?; access physical memory; raise exception.

OS: page fault?; signal process; allocate frame; page on disk?; fetch from disk; zero-fill; load TLB.

Page 13: Resource Management Policy and Mechanism Jeff Chase Duke University.

Page/block maps

[Diagram: a map whose entries point to scattered storage locations]

Idea: use a level of indirection through a map to assemble a storage object from “scraps” of storage in different locations.

The “scraps” can be fixed-size slots: that makes allocation easy because they are interchangeable.

Example: page tables that implement a VAS.

Page 14: Resource Management Policy and Mechanism Jeff Chase Duke University.

Names and layers

[Diagram: each layer names the same data in its own terms]

User view: notes in notebook file
Application: notefile (fd, byte range*)
File System: device, block #
Disk Subsystem: surface, cylinder, sector
Names passed between layers: bytes, fd, block #

Add more layers as needed.

Page 15: Resource Management Policy and Mechanism Jeff Chase Duke University.

Representing a File On Disk

[Diagram: an “inode” holding file attributes and a block map that points to the file's data blocks]

logical block 0: “once upon a time\nin a l”
logical block 1: “and far far away,\nlived t”
logical block 2: “he wise and sage wizard.”

Block map: index by logical block number. Physical block pointers in the block map are sector IDs or physical block numbers.

File attributes: may include owner, access control list, time of create/modify/access, etc.
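
As a minimal sketch of how a block map is used: a byte offset in the file is divided by the block size to get a logical block number, and the map turns that into a physical block number. The 512-byte block size, the class name `BlockMap`, and the physical block numbers are assumptions made for illustration.

```java
class BlockMap {
    static final int BLOCK_SIZE = 512; // assumed block size in bytes

    /** Indexed by logical block number; each entry is a physical block number. */
    private final int[] physicalBlock;

    BlockMap(int[] physicalBlock) {
        this.physicalBlock = physicalBlock;
    }

    /** Returns {physical block number, offset within that block} for a file byte offset. */
    int[] locate(long fileOffset) {
        int logical = (int) (fileOffset / BLOCK_SIZE);
        int within = (int) (fileOffset % BLOCK_SIZE);
        return new int[] {physicalBlock[logical], within};
    }

    public static void main(String[] args) {
        BlockMap inode = new BlockMap(new int[] {32, 48, 18}); // hypothetical block numbers
        int[] where = inode.locate(1000); // byte 1000 lies in logical block 1
        System.out.println("block " + where[0] + ", offset " + where[1]); // block 48, offset 488
    }
}
```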

Page 16: Resource Management Policy and Mechanism Jeff Chase Duke University.

A filesystem on disk

[Diagram: layout of a toy filesystem on disk]

inode 0: bitmap file. The allocation bitmap file blocks hold bits such as 111000100010110110111101 100110100011000100010101 001011100001100101000100.

inode 1: root directory. The directory blocks hold entries: 0, rain: 32, hail: 48, 0, wind: 18, snow: 62.

inode 0 and inode 1 are at fixed locations on disk.

regular file (inode): its file blocks hold “once upon a time\n in a l” and “and far far away, lived th”.

This is a toy example (Nachos).

Page 17: Resource Management Policy and Mechanism Jeff Chase Duke University.

The Buffer Cache

[Diagram: memory holds both the process (Proc) and the file cache]

Page 18: Resource Management Policy and Mechanism Jeff Chase Duke University.

File Buffer Cache

• Avoid the disk for as many file operations as possible.

• Cache acts as a filter for the requests seen by the disk; reads are served best.

• Delayed writeback will avoid going to disk at all for temp files.

[Diagram: copyin/copyout moves data between the process (Proc) and the file cache]

Page 19: Resource Management Policy and Mechanism Jeff Chase Duke University.

Page/block cache internals

HASH(blockID)

Each frame/buffer of memory is described by a meta-object (header).

Resident pages or blocks are accessible through a global hash table.

An ordered list of eviction candidates winds through the hash chains.

Some frames/buffers are free (no valid data). These are on a free list.
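
The three structures above (headers, a hash table of resident blocks, and an ordered list of eviction candidates plus a free list) can be sketched with standard Java collections. This is a simplified illustration, not a real buffer cache: Java's access-ordered `LinkedHashMap` is used because it is itself a hash table whose entries are threaded on a list, least recently used first. The names `BlockCache` and `Header` and the 512-byte buffer size are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class BlockCache {
    /** Meta-object (header) describing one buffer of memory. */
    static class Header {
        int blockID = -1;                  // -1: no valid data
        final byte[] data = new byte[512]; // assumed buffer size
    }

    // Hash table of resident blocks; accessOrder=true keeps entries ordered LRU-first.
    private final LinkedHashMap<Integer, Header> resident =
            new LinkedHashMap<>(16, 0.75f, true);
    private final Deque<Header> freeList = new ArrayDeque<>();

    BlockCache(int nBuffers) {
        for (int i = 0; i < nBuffers; i++) {
            freeList.add(new Header());
        }
    }

    Header getBlock(int blockID) {
        Header h = resident.get(blockID); // a hit also moves the entry to the MRU end
        if (h != null) {
            return h;
        }
        h = freeList.poll();              // prefer a free buffer
        if (h == null) {                  // none free: evict the least recently used block
            Iterator<Map.Entry<Integer, Header>> it = resident.entrySet().iterator();
            h = it.next().getValue();
            it.remove();
        }
        h.blockID = blockID;              // a real cache would fetch the block from disk here
        resident.put(blockID, h);
        return h;
    }

    boolean isResident(int blockID) {
        return resident.containsKey(blockID); // does not affect the LRU order
    }
}
```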

Page 20: Resource Management Policy and Mechanism Jeff Chase Duke University.

VM page cache internals

HASH(segment, page offset)

1. Pages in active use are mapped through the page table of one or more processes.

2. On a fault, the global object/offset hash table in kernel finds pages brought into memory by other processes.

3. Several page queues wind through the set of active frames, keeping track of usage.

4. Pages selected for eviction are removed from all page tables first.

Page 21: Resource Management Policy and Mechanism Jeff Chase Duke University.

Replacement

Think of physical memory as a cache.

What happens on a cache miss?
– Page fault
– Must decide what to evict

Goal: reduce number of misses

Page 22: Resource Management Policy and Mechanism Jeff Chase Duke University.

Review of replacement algorithms

1. Random
– Easy implementation, not great results

2. FIFO (first in, first out)
– Replace page that came in longest ago
– Popular pages often come in early
– Problem: doesn’t consider last time used

3. OPT (optimal)
– Replace the page that won’t be needed for longest time
– Problem: requires knowledge of the future
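
To make the FIFO weakness concrete, here is a minimal sketch that counts faults for a reference string under FIFO. The class name and the reference string `abcadae` are made up for illustration: page `a` is popular and arrives first, so FIFO evicts it even though it was just used.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

class FifoReplacement {
    /** Counts page faults for a reference string (one char per page) with the given frame count. */
    static int countFaults(String refs, int frames) {
        Deque<Character> arrivalOrder = new ArrayDeque<>(); // oldest arrival at the head
        Set<Character> resident = new HashSet<>();
        int faults = 0;
        for (char page : refs.toCharArray()) {
            if (resident.contains(page)) {
                continue; // hit: FIFO ignores when a page was last used
            }
            faults++;
            if (resident.size() == frames) {
                resident.remove(arrivalOrder.removeFirst()); // evict the oldest arrival
            }
            arrivalOrder.addLast(page);
            resident.add(page);
        }
        return faults;
    }

    public static void main(String[] args) {
        System.out.println(countFaults("abcadae", 3)); // prints 6
    }
}
```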

Page 23: Resource Management Policy and Mechanism Jeff Chase Duke University.

Review of replacement algorithms

LRU (least-recently used)
– Use past references to predict future
– Exploit “temporal locality”
– Problem: expensive to implement exactly

Why?
– Either have to keep sorted list
– Or maintain time stamps + scan on eviction
– Update info on every access (ugh)

Page 24: Resource Management Policy and Mechanism Jeff Chase Duke University.

LRU

LRU is just an approximation of OPT

Could try approximating LRU instead
– Don’t have to replace oldest page
– Just replace an old page
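
One common way to "just replace an old page" is the clock (second-chance) algorithm. The slide does not name it, so treat this as one possible approximation rather than the method the slide has in mind. Each frame has a reference bit that is set on a hit; the sweeping hand clears set bits and evicts the first frame whose bit is already clear. The class name `ClockReplacement` is hypothetical.

```java
class ClockReplacement {
    private final char[] frame;         // page held by each frame ('\0' means empty)
    private final boolean[] referenced; // per-frame reference bit
    private int hand = 0;               // next frame the sweep will examine
    int faults = 0;

    ClockReplacement(int frames) {
        frame = new char[frames];
        referenced = new boolean[frames];
    }

    void access(char page) {
        for (int i = 0; i < frame.length; i++) {
            if (frame[i] == page) {
                referenced[i] = true; // hit: only a bit is set, no list to reorder
                return;
            }
        }
        faults++;
        // Sweep past recently referenced frames, clearing their bits (a "second chance").
        while (frame[hand] != '\0' && referenced[hand]) {
            referenced[hand] = false;
            hand = (hand + 1) % frame.length;
        }
        frame[hand] = page; // victim: an empty frame or one not referenced since the last sweep
        referenced[hand] = true;
        hand = (hand + 1) % frame.length;
    }

    public static void main(String[] args) {
        ClockReplacement clock = new ClockReplacement(3);
        for (char page : "abcdbeb".toCharArray()) {
            clock.access(page);
        }
        System.out.println(clock.faults); // prints 5
    }
}
```

On this string with three frames, FIFO would take 6 faults because it evicts `b` just before `b` is referenced again; the reference bit lets the clock keep it.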

Page 25: Resource Management Policy and Mechanism Jeff Chase Duke University.

– 25 – 15-213, F’02

Locality

Principle of Locality: Programs tend to reuse data and instructions near those they have used recently, or that were recently referenced themselves.
– Temporal locality: Recently referenced items are likely to be referenced in the near future.
– Spatial locality: Items with nearby addresses tend to be referenced close together in time.

Locality Example:

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;

• Data
– Reference array elements in succession (stride-1 reference pattern): spatial locality
– Reference sum each iteration: temporal locality

• Instructions
– Reference instructions in sequence: spatial locality
– Cycle through loop repeatedly: temporal locality

Page 26: Resource Management Policy and Mechanism Jeff Chase Duke University.


Memory Hierarchies

Some fundamental and enduring properties of hardware and software:
– Fast storage technologies cost more per byte and have less capacity.
– The gap between CPU and main memory speed is widening.
– Well-written programs tend to exhibit good locality.

These fundamental properties complement each other beautifully.

They suggest an approach for organizing memory and storage systems known as a memory hierarchy.

Page 27: Resource Management Policy and Mechanism Jeff Chase Duke University.


An Example Memory Hierarchy

L0: registers. CPU registers hold words retrieved from L1 cache.
L1: on-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache memory.
L2: off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory.
L3: main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: remote secondary storage (distributed file systems, Web servers).

Toward L0: smaller, faster, and costlier (per byte) storage devices.
Toward L5: larger, slower, and cheaper (per byte) storage devices.

Page 28: Resource Management Policy and Mechanism Jeff Chase Duke University.


Caches

Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.

Fundamental idea of a memory hierarchy:
– For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.

Why do memory hierarchies work?
– Programs tend to access the data at level k more often than they access the data at level k+1.
– Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
– Net effect: A large pool of memory that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.

Page 29: Resource Management Policy and Mechanism Jeff Chase Duke University.


Caching in a Memory Hierarchy

Level k: Smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1 (shown holding blocks 8, 9, 14, 3).

Level k+1: Larger, slower, cheaper storage device at level k+1 is partitioned into blocks (numbered 0 to 15).

Data is copied between levels in block-sized transfer units (blocks 4 and 10 are shown being copied).

Page 30: Resource Management Policy and Mechanism Jeff Chase Duke University.


General Caching Concepts

Program needs object d, which is stored in some block b.

Cache hit
– Program finds b in the cache at level k. E.g., block 14.

Cache miss
– b is not at level k, so level k cache must fetch it from level k+1. E.g., block 12.
– If level k cache is full, then some current block must be replaced (evicted). Which one is the “victim”?
– Placement policy: where can the new block go? E.g., b mod 4
– Replacement policy: which block should be evicted? E.g., LRU

[Diagram: level k holds blocks 4*, 9, 14, 3; level k+1 holds blocks 0 to 15. Request 14 hits at level k. Request 12 misses: block 12 is fetched from level k+1 and replaces block 4*.]
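
The "b mod 4" placement policy can be sketched as a direct-mapped cache: each block has exactly one slot it may occupy, so the placement rule also determines the victim. The class name `DirectMappedCache` is hypothetical; the block numbers follow the slide's example.

```java
import java.util.Arrays;

class DirectMappedCache {
    private final int[] slot; // block number held by each slot, -1 means empty

    DirectMappedCache(int nSlots) {
        slot = new int[nSlots];
        Arrays.fill(slot, -1);
    }

    /** Returns true on a hit. On a miss, installs the block, evicting the slot's occupant. */
    boolean request(int block) {
        int i = block % slot.length; // placement policy: b mod (number of slots)
        if (slot[i] == block) {
            return true;
        }
        slot[i] = block; // stands in for fetching the block from level k+1
        return false;
    }

    public static void main(String[] args) {
        DirectMappedCache levelK = new DirectMappedCache(4);
        for (int b : new int[] {4, 9, 14, 3}) {
            levelK.request(b); // warm the cache with the slide's blocks
        }
        System.out.println(levelK.request(14)); // true: hit
        System.out.println(levelK.request(12)); // false: miss, 12 replaces 4 in slot 0
        System.out.println(levelK.request(4));  // false: 4 was evicted
    }
}
```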

Page 31: Resource Management Policy and Mechanism Jeff Chase Duke University.


A System with Virtual Memory

Examples: workstations, servers, modern PCs, etc.

Address Translation: Hardware converts virtual addresses to physical addresses via OS-managed lookup table (page table)

[Diagram: the CPU issues virtual addresses; the page table (entries 0, 1, ..., P-1) maps them to physical addresses in memory (locations 0, 1, ..., N-1) or to disk.]

Page 32: Resource Management Policy and Mechanism Jeff Chase Duke University.


Page Faults (like “Cache Misses”)

What if an object is on disk rather than in memory?
– Page table entry indicates virtual address not in memory
– OS exception handler invoked to move data from disk into memory
  – current process suspends, others can resume
  – OS has full control over placement, etc.

[Diagram: CPU, page table, memory, and disk, with virtual and physical addresses, shown before the fault and after the fault.]

Page 33: Resource Management Policy and Mechanism Jeff Chase Duke University.

Dynamic address translation

[Diagram: the user process issues a virtual address to the translator (MMU), which issues a physical address to physical memory]

Will this allow us to provide protection?
Sure, as long as the translation is correct

Page 34: Resource Management Policy and Mechanism Jeff Chase Duke University.

The Page Caching Problem

Each thread/process/job utters a stream of page references.
– reference string: e.g., abcabcdabce..

The OS tries to minimize the number of faults incurred.
– The set of pages (the working set) actively used by each job changes relatively slowly.
– Try to arrange for the resident set of pages for each active job to closely approximate its working set.

Replacement policy is the key.
– On each page fault, select a victim page to evict from memory; read the new page into the victim’s frame.
– Simple: replace the page whose next reference is furthest in the future (OPT).
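
As a minimal sketch of OPT, the code below counts faults by looking ahead in the reference string, which is only possible offline because a real OS cannot see the future. It uses the visible prefix of the slide's reference string; the class name `OptReplacement` and the choice of three frames are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class OptReplacement {
    /** Counts page faults under OPT for a reference string (one char per page). */
    static int countFaults(String refs, int frames) {
        Set<Character> resident = new HashSet<>();
        int faults = 0;
        for (int i = 0; i < refs.length(); i++) {
            char page = refs.charAt(i);
            if (resident.contains(page)) {
                continue; // hit
            }
            faults++;
            if (resident.size() == frames) {
                char victim = 0;
                int furthest = -1;
                for (char candidate : resident) {
                    int next = refs.indexOf(candidate, i + 1); // next future reference
                    if (next < 0) {
                        next = Integer.MAX_VALUE;              // never referenced again
                    }
                    if (next > furthest) {
                        furthest = next;
                        victim = candidate;
                    }
                }
                resident.remove(victim); // evict the page needed furthest in the future
            }
            resident.add(page);
        }
        return faults;
    }

    public static void main(String[] args) {
        System.out.println(countFaults("abcabcdabce", 3)); // prints 6
    }
}
```

For comparison, evicting in FIFO order on the same string with three frames incurs 8 faults.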

Page 35: Resource Management Policy and Mechanism Jeff Chase Duke University.

Managing the VM Page Cache

• Managing a VM page cache is similar to a file block cache, but with some new twists.

• Pages are typically referenced by page table (pmap) entries.

– Must invalidate mappings before reusing the frame.

• Reads and writes are implicit; the TLB hides them from the OS.

– How can we tell if a page is dirty?

– How can we tell if a page is referenced?

• Cache manager must run policies periodically, sampling page state.

– Continuously push dirty pages to disk to “launder” them.

– Continuously check references to judge how “hot” each page is.

– Balance accuracy with sampling overhead.

Page 36: Resource Management Policy and Mechanism Jeff Chase Duke University.

public interface IVirtualDisk {

    /* Read a block specified by the dBID into buffer */
    public void readBlock(int dBID, byte buffer[]) throws…;

    /* Write to block specified by the dBID from buffer */
    public void writeBlock(int dBID, byte buffer[]) throws…;

    /*
     * Start an asynchronous request to the device/disk.
     * -- operation is either READ or WRITE
     * -- callbackIdentifer is an identifier the caller may use to match the
     *    responses from the device (through a callback) with the requests. The
     *    device does not interpret the callbackIdentifer, it just passes
     *    it along with the callback.
     * -- blockID uniquely identifies the block to access
     * -- buffer[] is a byte array used for read/write operations
     */
    public void startRequest(DiskOperationType operation, int callbackIdentifer,
            int blockID, byte buffer[]) throws…;
}

Page 37: Resource Management Policy and Mechanism Jeff Chase Duke University.

public interface IDFS {

    /* creates a new DFile and returns the DFileID */
    public DFileID createDFile();

    /* destroys the file specified by the DFileID */
    public void destroyDFile(DFileID dFID);

    /*
     * reads the file specified by DFileID starting from the offset startOffset
     * to the count specified into the buffer
     */
    public int read(DFileID dFID, byte[] buffer, int startOffset, int count);

    /*
     * writes to the file specified by DFileID from the buffer starting at
     * offset startOffset up to the count specified
     */
    public int write(DFileID dFID, byte[] buffer, int startOffset, int count);

    /* List all the existing DFileIDs in the associated volume _volName */
    public List<DFileID> listAllDFiles();

}

Page 38: Resource Management Policy and Mechanism Jeff Chase Duke University.

public abstract class DBufferCache implements VirtualDiskCallback {

    /*
     * Buffer allocation: Get locked buffer that can be used for block specified
     * by blockID
     */
    public abstract DBuffer getBlock(int dBID);

    /* Release the locked buffer so that others waiting on it can use it */
    public abstract void releaseBuffer(byte[] buffer);

    /*
     * sync() writes back all dirty blocks to DStore and forces DStore
     * to write back all contents to the disk device. The sync( ) method should
     * maintain clean block copies in DBufferCache.
     */
    public abstract void sync();

    /* Similar to sync() but invalidates all cached blocks unlike sync(). */
    public abstract void flush();

}

Page 39: Resource Management Policy and Mechanism Jeff Chase Duke University.

public abstract class DBuffer {

    /* If the block is not in cache, start a fetch from disk asynchronously */
    public abstract void startFetch();

    /* Push a buffer block to device/disk asynchronously */
    public abstract void startPush();

    /* Check whether the buffer is in use */
    public abstract boolean checkValid();

    /* Wait until the buffer is free */
    public abstract boolean waitValid();

    /* Check whether the buffer is dirty, i.e., written to memory but not written to the disk device yet */
    public abstract boolean checkClean();

    /* Wait until the buffer is clean */
    public abstract boolean waitClean();

}

Page 40: Resource Management Policy and Mechanism Jeff Chase Duke University.

public abstract class DBuffer {

    /*
     * reads into the buffer[ ] array the cache block specified by blockID from
     * the DBufferCache if it is in cache, otherwise reads the corresponding
     * disk block from the disk device. Upon an error, it should return -1,
     * otherwise return number of bytes read.
     */
    public abstract int read(int blockID, byte[] buffer, int startOffset,
            int count);

    /*
     * writes the buffer[ ] array contents to the cache block specified by
     * blockID from the DBufferCache if it is in cache, otherwise finds a free
     * cache block and writes the buffer [ ] contents on it. Upon an error, it
     * should return -1, otherwise return number of bytes written.
     */
    public abstract int write(int blockID, byte[] buffer, int startOffset,
            int count);
}

Page 41: Resource Management Policy and Mechanism Jeff Chase Duke University.

How it should be

Page 42: Resource Management Policy and Mechanism Jeff Chase Duke University.

[Diagram: the layers DFS, DBufferCache, DBuffer, and VirtualDisk, with the calls between them]

DFS: create, destroy, read, write a dfile; list() dfiles; sync() cache

DBufferCache: sync(); DBuffer = getBlock(blockID); releaseBlock(buf);

DBuffer: copy bytes to/from buffer; startFetch(), startPush(); waitValid(), waitClean()

VirtualDisk: startRequest(r/w), answered by ioComplete()

Page 43: Resource Management Policy and Mechanism Jeff Chase Duke University.

DFS

/* creates a new dfile and returns the DFileID */
public DFileID createDFile();

/* destroys the dfile named by the DFileID */
public void destroyDFile(DFileID dFID);

/*
 * reads contents of the dfile named by DFileID into the buffer
 * starting from buffer offset startOffset; at most count bytes are transferred
 */
public int read(DFileID dFID, byte[] buffer, int startOffset, int count);

/*
 * writes to the file specified by DFileID from the buffer
 * starting from buffer offset startOffset; at most count bytes are transferred
 */
public int write(DFileID dFID, byte[] buffer, int startOffset, int count);

/* List DFileIDs for all existing dfiles in the volume */
public List<DFileID> listAllDFiles();

Page 44: Resource Management Policy and Mechanism Jeff Chase Duke University.

DBufferCache

/*
 * Get buffer for block specified by blockID.
 * The buffer is “busy” until the caller releases it.
 */
public DBuffer getBlock(int blockID);

/* Release the buffer so that others waiting on it can use it */
public void releaseBlock(DBuffer buf);

/* Write back all dirty blocks to the volume, and wait for completion. */
public void sync();

Page 45: Resource Management Policy and Mechanism Jeff Chase Duke University.

DBuffer

/* Start an asynchronous fetch of associated block from the volume */
public abstract void startFetch();

/* Start an asynchronous write of buffer contents to block on volume */
public abstract void startPush();

/* Check whether the buffer has valid data */
public abstract boolean checkValid();

/* Wait until the buffer has valid data */
public abstract boolean waitValid();

/* Check whether the buffer is dirty, i.e., has modified data to be written back */
public abstract boolean checkClean();

/* Wait until the buffer is clean, i.e., until a push operation completes */
public abstract boolean waitClean();

/* Check if buffer is evictable: not evictable if I/O in progress, or buffer is held. */
public abstract boolean isBusy();
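
The valid/clean state behind these methods can be sketched with a Java monitor. This is a simplified illustration under stated assumptions, not the course's reference implementation: `SketchDBuffer`, `markDirty()`, and `ioComplete()` are hypothetical names, the disk is not modeled, "held" buffers are ignored, and a single completion callback serves both fetch and push.

```java
class SketchDBuffer {
    private boolean valid = false;     // buffer holds a current copy of its block
    private boolean dirty = false;     // buffer has modified data not yet written back
    private boolean ioPending = false; // a fetch or push is in flight

    /** Begin a fetch. A real version would issue the request to the disk here. */
    synchronized void startFetch() { ioPending = true; }

    /** Begin a push of the buffer contents. */
    synchronized void startPush() { ioPending = true; }

    /** What a write into the buffer would do to the state. */
    synchronized void markDirty() { valid = true; dirty = true; }

    /** Called when the outstanding fetch or push finishes: contents now match the disk. */
    synchronized void ioComplete() {
        valid = true;
        dirty = false;
        ioPending = false;
        notifyAll(); // wake threads blocked in waitValid() or waitClean()
    }

    synchronized boolean checkValid() { return valid; }

    synchronized boolean checkClean() { return !dirty; }

    synchronized boolean isBusy() { return ioPending; }

    /** Blocks until the buffer has valid data; returns false if interrupted. */
    synchronized boolean waitValid() {
        try {
            while (!valid) {
                wait();
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Blocks until the buffer is clean; returns false if interrupted. */
    synchronized boolean waitClean() {
        try {
            while (dirty) {
                wait();
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class SketchDBufferDemo {
    public static void main(String[] args) {
        SketchDBuffer buf = new SketchDBuffer();
        buf.startFetch();
        new Thread(buf::ioComplete).start(); // stands in for the disk's completion callback
        System.out.println("valid after wait: " + buf.waitValid());
    }
}
```

The waits loop on the condition rather than testing it once, so a wakeup meant for a different waiter does not let a thread proceed early.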

Page 46: Resource Management Policy and Mechanism Jeff Chase Duke University.

DBuffer

/*
 * reads into the buffer[ ] array from the contents of the DBuffer.
 * Check first that the DBuffer has a valid copy of the data!
 * startOffset and count are for the buffer array, not the DBuffer.
 */
public int read(byte[] buffer, int startOffset, int count);

/*
 * writes into the DBuffer from the contents of buffer[ ] array.
 * startOffset and count are for the buffer array, not the DBuffer.
 * Mark buffer dirty!
 */
public int write(byte[] buffer, int startOffset, int count);

}

Page 47: Resource Management Policy and Mechanism Jeff Chase Duke University.

VirtualDisk

/*
 * Start an asynchronous request to the device/disk.
 * Nature of the request is encoded in the state of the DBuffer
 * -- operation is either READ or WRITE
 * -- blockID uniquely identifies the block to access
 * -- buffer[] is a byte array used for read/write operations
 */
public void startRequest(DBuffer buf) throws…;