CS 6560 Operating System Design, Lecture 8: Memory Management

Transcript of CS 6560 Operating System Design

Page 1: CS 6560 Operating System Design

CS 6560 Operating System Design

Lecture 8: Memory Management

Page 2: CS 6560 Operating System Design

References

• Our textbook: Robert Love, Linux Kernel Development, 2nd edition, Novell Press, 2005.

• Daniel Bovet and Marco Cesati, Understanding the Linux Kernel, 3rd edition, O'Reilly, 2005. (Covers 2.6.)

• The kernel code and its own documentation.

• Also see Ulrich Drepper's articles on LWN.net:

– Part I: http://lwn.net/Articles/250967/

– Part II: http://lwn.net/Articles/252125/

– Part III: http://lwn.net/Articles/253361/

Page 3: CS 6560 Operating System Design

Plan

• Today: preliminaries

• Next time: Chapter 11, Memory Management

Page 4: CS 6560 Operating System Design

Models of Memory

• Physical: a RAM array accessed through a memory controller

• Real: a single linear array of bytes for the entire machine

• Virtual: a linear array of bytes for each process

Page 5: CS 6560 Operating System Design

Physical Model

(Diagram: a memory array connected through a memory controller to the bus.)

Page 6: CS 6560 Operating System Design

Physical Model of Memory

• Physical models (see Drepper's articles for details)

– Bits in SRAM and DRAM:

• SRAM = static RAM - faster, but several transistors per bit

• DRAM - dynamic RAM - slower, each bit requires only one transistor, but needs refreshing

• Many types of DRAM with differing speeds of memory array and bus

Page 7: CS 6560 Operating System Design

Physical Memory Issues

– Accessing main memory may be slow:

• Large address spaces require multiplexing, which slows access.

• DRAM refresh requires time.

• DRAM speed is limited by power consumption considerations.

• The speed of light limits access time: c = 299,792,458 m/s ≈ 0.3 m/ns, so light travels only about 10 cm during one cycle of a 3 GHz clock (1/3 ns).

– Other considerations:

• A serial bus can run much faster than a parallel bus.

• Parallel busses cost more and use more space.

• The memory itself may run slower than the bus (DDR2).

Page 8: CS 6560 Operating System Design

CPU/Memory

• Various arrangements:

– CPUs, memory, and peripherals share the same bus (the classical IBM PC).

– CPUs and memory share the same bus (FSB), but peripherals share another bus (the PCI bus).

– Each CPU has its own memory (AMD Opteron and Intel CSI). The CPUs then share a bus or form a network that connects to the bus.

Page 9: CS 6560 Operating System Design

DMA

• DMA = Direct Memory Access

• Here, the CPU sets up a data transfer between memory and a device, but does not participate in the actual transfer. A DMA controller does the transfer on behalf of the processor.

Page 10: CS 6560 Operating System Design

Processor Hierarchy

• Processors, cores, and threads:

– Cores run programs simultaneously, each with its own instruction processing units, but share chip infrastructure.

– Threads (hyperthreading) run programs in an interleaved fashion with separate register sets.

• Modern CPUs are designed for multiplicity at all levels (say 2 threads/core, 4 cores/processor, and 2 processors/motherboard, for 2 × 4 × 2 = 16 logical CPUs).

Page 11: CS 6560 Operating System Design

Challenges

• CPUs are getting significantly faster than main memory.

• CPUs are evolving to multiple parallel processing units.

• Main memory is being distributed among processing units (NUMA)

Page 12: CS 6560 Operating System Design

Memory Caching

• Caching schemes:

– Cache memory sits between the CPU and main memory.

– There are different levels and arrangements of cache.

– Main memory tends to consist of DRAM, while caches tend to consist of SRAM.

Page 13: CS 6560 Operating System Design

Various levels of memory

• Registers: typical size 512 B = 2^9 bytes; typical access time 1 cycle

• L1: typical size 64 KB = 2^16 bytes; typical access time 3 cycles

• L2: typical size 4 MB = 2^22 bytes; typical access time 14 cycles

• Main memory: typical size 2 GB = 2^31 bytes; typical access time 240 cycles

Note: sizes are for Intel IA-32 Core technology (see sandpile.org). The sketch below illustrates the resulting latency gap.
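To make the gap concrete, here is a minimal user-space sketch (not from the slides; the 64 MB array size and the strides are illustrative assumptions consistent with the 64-byte cache lines discussed on the next page). Walking the array one byte at a time reuses each cache line 64 times, while a 64-byte stride touches a new line on nearly every access:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)            /* 64 MB: far larger than L2 */

static double walk(volatile char *a, size_t stride)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N; i += stride)
        a[i]++;                         /* one read-modify-write per step */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (double)(N / stride);   /* nanoseconds per access */
}

int main(void)
{
    char *a = malloc(N);
    if (!a)
        return 1;
    printf("stride 1:  %.2f ns/access\n", walk(a, 1));
    printf("stride 64: %.2f ns/access\n", walk(a, 64));
    free(a);
    return 0;
}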

Page 14: CS 6560 Operating System Design

How Do Caches Work?

• The cache is divided into what are called cache lines. Each cache line holds several bytes of data (64 bytes per line for Intel Core) plus addressing information (say 32 bits). (The quoted cache size counts only the data.)

• The cache lines are divided into sets (say 8192 sets with 8 lines per set; 2^13 * 2^3 * 2^6 = 2^22 = 4 MB).

• The memory address is split into fields (say T, S, and O), as sketched below:

– The rightmost field O addresses bytes within a cache line (say 6 bits for 64-byte lines).

– The middle field S is used to determine the set (say 13 bits).

– The leftmost field T (say 13 bits) is the tag used for matching the address.
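A short sketch of that split, assuming the example geometry above (64-byte lines, so O is 6 bits, and 8192 sets, so S is 13 bits); the address value is arbitrary:

#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 6    /* 2^6  = 64-byte cache lines */
#define SET_BITS    13   /* 2^13 = 8192 sets */

int main(void)
{
    uint32_t addr   = 0xdeadbeef;                         /* arbitrary example */
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);   /* field O */
    uint32_t set    = (addr >> OFFSET_BITS) & ((1u << SET_BITS) - 1); /* field S */
    uint32_t tag    = addr >> (OFFSET_BITS + SET_BITS);   /* field T */
    printf("tag=0x%x set=%u offset=%u\n", tag, set, offset);
    return 0;
}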

Page 15: CS 6560 Operating System Design

Associativity

• The address matching is done with a comparator.

• The number of items that can be compared is called the associativity.

• Modern processors typically use 2-way, 8-way, or 16-way associativity.

Page 16: CS 6560 Operating System Design

Cache Hits and Misses

• Caching works in levels. With three levels (see the sketch below):

– If the data cannot be matched in the L1 cache, the L2 cache is checked.

– If the data cannot be found in L2, L3 is checked.

– If it is not in L3, it is fetched from main memory.
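A purely conceptual sketch of the cascade (real lookups happen in hardware, not software; cached_at_level() is a made-up toy predicate, and the L3 latency is an assumed placeholder, since Page 13 gives no L3 figure):

#include <stdio.h>

/* Toy predicate standing in for the hardware's tag match at one level. */
static int cached_at_level(int level, unsigned long addr)
{
    return addr < (1ul << (16 + 3 * level));  /* arbitrary rule for the demo */
}

static int access_cost_cycles(unsigned long addr)
{
    if (cached_at_level(1, addr)) return 3;   /* L1 hit (Page 13) */
    if (cached_at_level(2, addr)) return 14;  /* L2 hit (Page 13) */
    if (cached_at_level(3, addr)) return 40;  /* L3 hit (assumed figure) */
    return 240;                               /* main memory (Page 13) */
}

int main(void)
{
    printf("%d cycles\n", access_cost_cycles(0x1000UL));      /* cheap */
    printf("%d cycles\n", access_cost_cycles(0x40000000UL));  /* expensive */
    return 0;
}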

Page 17: CS 6560 Operating System Design

Cache Sharing and Snooping

• Often cores will have their own L1 caches, one for data and one for instructions, and a shared L2 cache that contains both data and instructions.

• When two or more processing units share memory or the next level cache, they may snoop on each other’s individual cache and exchange data if needed.

• Popular cache snooping protocols include MESI and MSI. In these, each processing unit can observe the others' address bus, and a finite state machine determines the fate of a cache line according to whether it was read or written locally or remotely.

Page 18: CS 6560 Operating System Design

MESI

• MESI has four states for each cache line:

– M = Modified: modified by the local processing unit, and the only copy

– E = Exclusive: the only copy, but not modified

– S = Shared: not modified, and might be shared

– I = Invalid: unused or invalid

Page 19: CS 6560 Operating System Design

Transition Table

• Each local or remote read or write changes the state according to something like (current state across the top):

                 M     E     S     I
  Local read     M     E     S     S
  Local write    M     M     M+    M
  Remote read    S*    S     S     I
  Remote write   I*+   I     I     I

* Also transmits the data to the other processor and to memory.
+ Also notifies other processors with an RFO (request for ownership).
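The table can be encoded directly as a lookup; a sketch, with the data-writeback and RFO side effects reduced to comments:

#include <stdio.h>

enum mesi  { M, E, S, I };
enum event { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE };

/* next_state[event][current state], transcribed from the table above */
static const enum mesi next_state[4][4] = {
    /*                M  E  S  I */
    [LOCAL_READ]   = { M, E, S, S },
    [LOCAL_WRITE]  = { M, M, M, M },   /* S -> M also sends an RFO */
    [REMOTE_READ]  = { S, S, S, I },   /* M -> S also writes the data back */
    [REMOTE_WRITE] = { I, I, I, I },   /* M -> I also writes back, plus RFO */
};

int main(void)
{
    enum mesi line = E;
    line = next_state[LOCAL_WRITE][line];  /* E -> M: first local write */
    line = next_state[REMOTE_READ][line];  /* M -> S: another CPU reads */
    printf("final state: %d (S is %d)\n", line, S);
    return 0;
}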

Page 20: CS 6560 Operating System Design

80x86 Processor Memory

• Memory addressing modes:

– Logical address: a (segment, offset) pair

– Linear address: a virtual memory address

– Physical address: what the hardware uses; a 32-bit or 36-bit number

Page 21: CS 6560 Operating System Design

Memory addressing modes

• Real mode: the logical-to-physical mapping uses the formula 16*seg + offset (sketched below).

• Protected mode: uses the MMU to map from logical to physical by way of linear addresses.
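A one-function sketch of the real-mode formula (the example operands are the classic 8086 reset vector, FFFF:0000):

#include <stdint.h>
#include <stdio.h>

/* physical = 16*segment + offset; 16*seg is just a 4-bit left shift */
static uint32_t real_mode_phys(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

int main(void)
{
    printf("0x%05x\n", real_mode_phys(0xFFFF, 0x0000));  /* -> 0xFFFF0 */
    return 0;
}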

Page 22: CS 6560 Operating System Design

MMU

• The MMU (in protected mode) maps in two stages:

– Segmentation Unit: uses segment descriptors to map logical addresses to linear addresses

– Paging Unit: uses the page table to map linear addresses to physical addresses

Page 23: CS 6560 Operating System Design

Segment Descriptor Tables

• Two types:

– GDT = Global Descriptor Table

– LDT = Local Descriptor Table

• Linux mainly uses the GDT (at least in the kernel memory). The LDT may be used to help emulate other OSs in user mode.

• Types of descriptors (each 8 bytes):

– Code

– Data

– Task State

– Local

Page 24: CS 6560 Operating System Design

Fields of Segment Descriptors

• Fields (distributed within the descriptor):

– Base (32 bits) = linear address of the first byte of the segment

– G (1 bit) = granularity of the segment size (bytes or 4 KB units)

– Limit (20 bits) = size of the segment (interpreted according to G; see the sketch below)

– S (1 bit) = system flag (0 = system segment, 1 = code or data segment)

– Type (4 bits) = segment type and access rights

– DPL (2 bits) = descriptor privilege level

– P (1 bit) = segment present (Linux always sets this to 1)
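A sketch of the two cases of G applied to the Limit field (with G = 0 the 20-bit limit counts bytes, giving segments up to 1 MB; with G = 1 it counts 4 KB units, giving segments up to 4 GB):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t segment_size_bytes(uint32_t limit20, int g)
{
    if (g)
        return ((uint64_t)limit20 + 1) << 12;  /* 4 KB granularity */
    return (uint64_t)limit20 + 1;              /* byte granularity */
}

int main(void)
{
    /* The maximal limit with G = 1 spans the full 4 GB address space. */
    printf("%" PRIu64 "\n", segment_size_bytes(0xFFFFF, 1));  /* 4294967296 */
    return 0;
}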

Page 25: CS 6560 Operating System Design

Segment Selectors

• Segment info is stored in a 16-bit segment selector, which contains:

– Index: an index into the GDT or LDT

– TI: which table to use (0 = GDT, 1 = LDT)

– RPL: the privilege level

• There are six segment registers in the CPU: cs, ss, ds, es, fs, gs.

• The first three are special purpose: code, stack, and data.

• cs also contains the current privilege level (0 = kernel, 3 = user). A selector-decoding sketch follows.
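A sketch decoding the selector layout just described (bits 15..3 = index, bit 2 = TI, bits 1..0 = privilege level); 0x73 is used as the example because, in 2.6 on i386, it is the user code selector (GDT entry 14, RPL 3):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t sel   = 0x0073;          /* example: i386 Linux user code selector */
    unsigned index = sel >> 3;        /* which descriptor in the table */
    unsigned ti    = (sel >> 2) & 1;  /* 0 = GDT, 1 = LDT */
    unsigned rpl   = sel & 3;         /* 0 = kernel, 3 = user */
    printf("index=%u table=%s rpl=%u\n", index, ti ? "LDT" : "GDT", rpl);
    return 0;
}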

Page 26: CS 6560 Operating System Design

Segmentation in Linux

• Used in a very limited way, and only because the processor requires it.

• All user processes use the same code and data segment descriptors.

• The kernel has its own separate code and data segment descriptors.

Page 27: CS 6560 Operating System Design

Linux GDT

• Each CPU has its own GDT. These are stored in the cpu_gdt_descr array.

• Each GDT has 18 segment descriptors:

– Four for user/kernel code/data

– A Task State Segment (TSS) used to switch between user and kernel mode and when user processes access I/O ports

– A segment containing the LDT

– Three Thread-Local Storage (TLS) segments

– Three Power Management segments

– Five Plug and Play segments

– One special TSS for exception handling

Page 28: CS 6560 Operating System Design

Virtual Memory

• Virtual memory in Linux divides memory into pages, which are either 4 KB or 4 MB. (The i386 uses segmentation as well.) (On some 64-bit architectures, pages may be 8 KB.)

• The Memory Management Unit (MMU) maps virtual addresses (linear) to physical addresses.

• The kernel fills in the page mapping table (called the page directory), but the MMU hardware does the work.

• The page table contains other information about each page, including permissions.

• In a 32-bit virtual address space, the kernel sees both the process's user address space and the kernel address space: the kernel occupies the upper 1 GB and user space the lower 3 GB.
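A simplified model of that split (an assumption-flagged sketch, not the kernel's actual code): with the standard 32-bit configuration the kernel's direct mapping of low memory starts at PAGE_OFFSET = 0xC0000000, so converting between kernel virtual and physical addresses in that region is a constant offset, which is what the kernel's __pa()/__va() helpers do for directly mapped RAM:

#include <stdio.h>

#define PAGE_OFFSET 0xC0000000UL  /* the 3 GB mark: start of kernel space */

/* Simplified __pa()/__va() for the directly mapped low-memory region only */
#define __pa(v) ((unsigned long)(v) - PAGE_OFFSET)
#define __va(p) ((void *)((unsigned long)(p) + PAGE_OFFSET))

int main(void)
{
    unsigned long kvirt = 0xC0100000UL;     /* a kernel virtual address */
    printf("phys = 0x%lx\n", __pa(kvirt));  /* -> 0x100000, i.e., 1 MB */
    printf("virt = %p\n", __va(0x100000UL));
    return 0;
}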

Page 29: CS 6560 Operating System Design

Page Faults

• If a page is not in memory, a page fault interrupt is generated.

• Code for handling the fault (for i386) is in ./arch/i386/mm

• Page faults should not happen in the kernel. How this is arranged is explained in Chapter 11 (coming up soon).

Page 30: CS 6560 Operating System Design

Two types of 32-bit paging

• Regular: 4 KB pages

• Extended (Pentium): 4 MB pages

Page 31: CS 6560 Operating System Design

Linear Address Fields

• For 32-bit regular paging (4 KB pages), the linear address consists of three bit fields:

– Directory (bits 22-31) (10 bits)

– Table (bits 12-21) (10 bits)

– Offset (bits 0-11) (12 bits)

• For 32-bit extended paging (4 MB pages), it consists of two:

– Directory (bits 22-31) (10 bits)

– Offset (bits 0-21) (22 bits)

• For 64-bit, there are more fields (covered later). A sketch splitting an address both ways follows.
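A sketch splitting a 32-bit linear address according to the two layouts above (the example address is arbitrary):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t lin = 0xC0101ABC;              /* arbitrary example address */

    /* Regular 4 KB paging: 10 + 10 + 12 bits */
    uint32_t dir   = lin >> 22;             /* bits 31..22 */
    uint32_t table = (lin >> 12) & 0x3FF;   /* bits 21..12 */
    uint32_t off4k = lin & 0xFFF;           /* bits 11..0  */

    /* Extended 4 MB paging: 10 + 22 bits */
    uint32_t off4m = lin & 0x3FFFFF;        /* bits 21..0  */

    printf("4 KB: dir=%u table=%u offset=0x%x\n", dir, table, off4k);
    printf("4 MB: dir=%u offset=0x%x\n", dir, off4m);
    return 0;
}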