Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.
![Page 1: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/1.jpg)
Operating System Design - Linux
Instructor: Ching-Chi Hsu
TA:Yung-Yu Chuang
![Page 2: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/2.jpg)
Introduction to Linux (Nov. 1991, Linus Torvalds)
• Multi-tasking
• Demand loading & Copy On Write
• Paging (not swapping)
• Shared Libraries
• POSIX 1003.1
• Protected Mode
• Support different file systems and executable formats
![Page 3: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/3.jpg)
Multitasking
[Figure: timeline of processes that alternately require service and leave the CPU idle; with time-sharing, a timer interrupt switches processes when the time slice expires.]
![Page 4: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/4.jpg)
• Based on i386 and Linux 2.0.33
• Topics
– initialization
– memory management (free space management, virtual memory management)
– process management (context switching, scheduling)
– system call
![Page 5: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/5.jpg)
Resources for Tracing Linux
• http://odie.csie.ntu.edu.tw/~osd
• TLK, KHG, Linux Kernel Internals
• Source code browser
• Intel Programmer’s manual
![Page 6: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/6.jpg)
Source Tree for Linux
/usr/src/linux
• top level: modules, fs, net, kernel, init, include, ipc, lib, drivers, arch
• include: linux, asm-i386, asm-????
• drivers: char, block, scsi, net
• arch: i386, ????
• arch/i386: kernel, boot, mm
• fs: nfs, ext2, proc, ….
![Page 7: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/7.jpg)
How to compile Linux Kernel
1. make config (or make menuconfig)
2. make depend
3. make boot (generate a compressed bootable Linux kernel, arch/i386/boot/zImage)
   make zdisk (generate the kernel and write it to disk: dd if=zImage of=/dev/fd0)
   make zlilo (generate the kernel and copy it to /vmlinuz)
lilo: Linux Loader
![Page 8: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/8.jpg)
i386
• Segmented Addressing (segment:offset)
• Paging(Virtual Memory)
• Call Gate (Protection)
• TSS (Context Switching)
![Page 9: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/9.jpg)
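[Figure: segment translation. The SELECTOR holds an INDEX and a TI bit; TI chooses the GDT (located by GDTR) or the LDT (located by LDTR), and INDEX picks a descriptor (desc) in that table. The descriptor's base + OFFSET gives the Linear Address.]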
![Page 10: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/10.jpg)
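[Figure: a segment spans BASE to BASE+LIMIT; each descriptor is 8 bytes (next descriptor at BASE+8 of the table).]
Descriptor format:
bits 0-31: BASE 15:0 | LIMIT 15:0
bits 32-63: BASE 31:24 | G | D | 0 | AVL | LIMIT 19:16 | P | DPL | S | TYPE | BASE 23:16
Desc., Call gate, TSS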
![Page 11: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/11.jpg)
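[Figure: two-level paging with 4K pages. The linear address is split into directory (ddd), table (ttt), and offset (ooo) fields. CR3 locates the page directory; the PDE (yyyyy000) locates a page table; the PTE (zzzzz000) locates the page, and zzzzz000 + ooo is the physical address. Each PDE/PTE holds a Page Addr. and a present bit (P).]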
![Page 12: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/12.jpg)
[Figure: a 4GB Linear Address Space (including the OS region) backed by Physical memory and Disk.]
![Page 13: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/13.jpg)
[Figure: privilege levels 3, 2, 1, 0; a Call Gate is the controlled entry point into a more privileged level.]
![Page 14: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/14.jpg)
Calling a TSS gate causes a context switch
[Figure: a TSS Gate refers to a TSS desc. in the GDT; the TSS holds CS, DS, ES, …, IP, SP0, SP1, SP2, SP3, CR3, …, which are exchanged with the CPU state.]
![Page 15: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/15.jpg)
i386 Initialization
• #RESET
– real-address mode
– self-test
– EAX contains error code
– EDX contains CPU id
– CR0
[Figure: CR0 layout: PG (bit 31), RESERVED, TS (bit 3), EM (bit 2), MP (bit 1), PE (bit 0).]
![Page 16: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/16.jpg)
Register State
EFLAGS        0XXXX0002H
EIP           0000FFF0H
CS*           0F000H
DS**          0000H
SS            0000H
ES**          0000H
FS            0000H
GS            0000H
IDTR (base)   00000000H
IDTR (limit)  03FFH
DR7           0000H
* invisible part: 0FFFF0000 (base), 0FFFF (limit)
** invisible part: 0 (base), 0FFFF (limit)
![Page 17: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/17.jpg)
FFFF0H: ROM-BIOS address
* do some tests
* initialize the interrupt vectors at physical address 0
* load the first sector of a bootable device to 0x7C00 (boot/bootsect.S)
* jump to 0x7C00 and run
![Page 18: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/18.jpg)
Linux Kernel on Disk (vmlinux, 1,133,665 bytes)
/usr/src/linux/arch/i386/boot/zImage consists of:
• bootsect.S (1 sector)
• setup.S (4 sectors)
• Self-extracted Kernel Image
– Decompression module
– Compressed Kernel Image (vmlinux.out, 455,321 bytes), produced from vmlinux (executable)
![Page 19: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/19.jpg)
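[Figure: boot disk and CPU (A20 line), with the real-mode memory map below 1M: 64K, 7C00, 90000, A0000 (I/O & BIOS), 1M; IP marks the current execution address.]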
![Page 20: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/20.jpg)
[Figure: the BIOS loads bootsect.S (0.5K bytes) at 7C00 and sets IP to 7C00; bootsect.S then copies its 0.5K bytes to 90000 and IP continues at 90000.]
![Page 21: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/21.jpg)
[Figure: with bootsect.S (0.5K bytes) at 7C00 and its copy at 90000 (IP), setup.S (2K bytes) is loaded at 90200; then vmlinux (508K bytes) is loaded at 10000.]
![Page 22: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/22.jpg)
SETUPSECS = 4            ! nr of setup-sectors
BOOTSEG   = 0x07C0       ! original address of boot-sector
INITSEG   = DEF_INITSEG  ! we move boot here - out of the way (0x9000)
SETUPSEG  = DEF_SETUPSEG ! setup starts here (0x9020)
SYSSEG    = DEF_SYSSEG   ! system loaded at 0x10000 (65536)
<omitted>
        mov ax,#BOOTSEG
        mov ds,ax
        mov ax,#INITSEG
        mov es,ax
        mov cx,#256
        sub si,si
        sub di,di
        cld
        rep
        movsw
        jmpi go,INITSEG  ! Execute moved bootsect
go:
Copy bootsect.S to 0x90000
![Page 23: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/23.jpg)
<omit>load_setup:
        xor dx, dx          ! drive 0, head 0
        mov cl,#0x02        ! sector 2, track 0
        mov bx,#0x0200      ! address = 512, in INITSEG
        mov ah,#0x02        ! service 2, nr of sectors
        mov al,setup_sects  ! (assume all on head 0, track 0); setup_sects=4
        int 0x13            ! read it (BIOS routine)
        jnc ok_load_setup   ! ok - continue
        push ax             ! dump error code
        call print_nl
        mov bp, sp
        call print_hex
        pop ax
        jmp load_setup
ok_load_setup:
Try to load setup.S from (drive 0, head 0, sector 2, track 0) to memory 0x90200
![Page 24: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/24.jpg)
<omit>
! Print some inane message
        mov ah,#0x03      ! read cursor pos
        xor bh,bh
        int 0x10
        mov cx,#9
        mov bx,#0x0007    ! page 0, attribute 7 (normal)
        mov bp,#msg1      ! .byte 13,10 .ascii "Loading"
        mov ax,#0x1301    ! write string, move cursor
        int 0x10          ! BIOS routine
! ok, we've written the message, now
! we want to load the system (at 0x10000)
        mov ax,#SYSSEG
        mov es,ax         ! segment of 0x010000
        call read_it      ! Read 508K to 0x10000 (64K), one . per track
        call kill_motor   ! Stop floppy motor
        call print_nl
<omit>
        jmpi 0,SETUPSEG   ! Jump to 0x90200 (setup.S)
Print "\nLoading"
![Page 25: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/25.jpg)
setup.S
• Check memory size
• set keyboard, video adapter, get HD data
• switch to protected mode– set GDT– set IDT– set PE bit (flush pipe)
![Page 26: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/26.jpg)
start:
        jmp start_of_setup
! ------------------------ start of header --------------------------------
!
! SETUP-header, must start at CS:2 (old 0x9020:2)
!
        .ascii "HdrS"     ! Signature for SETUP-header
        .word 0x0201      ! Version number of header format
                          ! (must be >= 0x0105
                          ! else old loadlin-1.5 will fail)
<omit>
start_of_setup:
        …………… (check signature)
good_sig:
        mov ax,cs               ! aka #SETUPSEG
        sub ax,#DELTA_INITSEG   ! aka #INITSEG
        mov ds,ax               ! DS=9000
![Page 27: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/27.jpg)
loader_ok:
! Get memory size (extended mem, kB)
        mov ah,#0x88
        int 0x15
        mov [2],ax        ! Store memory size in 0x90002 (bootsect.S)
<omit>
(disable interrupts)
(move kernel image to 0x1000)
end_move_self:
        lidt idt_48       ! load idt with 0,0
        lgdt gdt_48       ! load gdt with whatever appropriate
idt_48:
        .word 0
        .word 0, 0
gdt_48:
        .word 0x800
        .word 512+gdt, 0x9
![Page 28: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/28.jpg)
          BASE            Limit
idt_48    0,0             0
gdt_48    0x9, 512+gdt    0x800 (2048)
gdt:
        .word 0,0,0,0     ! dummy
        .word 0,0,0,0     ! unused
        .word 0xFFFF      ! 4Gb - (0x100000*0x1000 = 4Gb)
        .word 0x0000      ! base address=0
        .word 0x9A00      ! code read/exec
        .word 0x00CF      ! granularity=4096, 386 (+5th nibble of limit)
        .word 0xFFFF      ! 4Gb - (0x100000*0x1000 = 4Gb)
        .word 0x0000      ! base address=0
        .word 0x9200      ! data read/write
        .word 0x00CF      ! granularity=4096, 386 (+5th nibble of limit)
![Page 29: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/29.jpg)
Descriptor format:
bits 0-31: BASE 15:0 | LIMIT 15:0
bits 32-63: BASE 31:24 | G | D | 0 | AVL | LIMIT 19:16 | P | DPL | S | TYPE | BASE 23:16
GDT entries set up by setup.S:
0: null
1: not used
2: code. BASE=0x00000000, LIMIT=FFFFF, G=1 (4G), DPL=0, type=1010 (code, non-conforming, r/x, not accessed)
3: data. BASE=0x00000000, LIMIT=FFFFF, G=1 (4G), DPL=0, type=0010 (data, r/w, not accessed)
![Page 30: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/30.jpg)
! that was painless, now we enable A20 (no wraparound)
        call empty_8042
        mov al,#0xD1      ! command write
        out #0x64,al
        call empty_8042
        mov al,#0xDF      ! A20 on
        out #0x60,al
        call empty_8042
<omit>
        mov ax,#1         ! protected mode (PE) bit
        lmsw ax           ! This is it! Load into CR0
        jmp flush_instr   ! Flush pipe
flush_instr:
        xor bx,bx         ! Flag to indicate a boot
![Page 31: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/31.jpg)
! NOTE: For high loaded big kernels we need a
!       jmpi 0x100000,KERNEL_CS
!
! but we yet haven't reloaded the CS register, so the default size
! of the target offset still is 16 bit.
! However, using an operand prefix (0x66), the CPU will properly
! take our 48 bit far pointer. (INTeL 80386 Programmer's Reference
! Manual, Mixing 16-bit and 32-bit code, page 16-6)
        db 0x66,0xea      ! prefix + jmpi-opcode
code32: dd 0x1000         ! will be set to 0x100000 for big kernels
        dw KERNEL_CS      ! KERNEL_CS=0x10
[Figure: selector 0x10 = 0000 0000 0001 0000b. Bits 15-3: INDEX; bit 2: TI (0: GDT, 1: LDT); bits 1-0: RPL.]
![Page 32: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/32.jpg)
Decompress Kernel
startup_32:                             # (gcc entry point)
        cld
        cli
        movl $(KERNEL_DS),%eax          # KERNEL_DS=0x18
        mov %ax,%ds
        mov %ax,%es
        mov %ax,%fs
        mov %ax,%gs
<omit>
        lss SYMBOL_NAME(stack_start),%esp
        xorl %eax,%eax
1:      incl %eax                       # check that A20 really IS enabled
        movl %eax,0x000000              # loop forever if it isn't
        cmpl %eax,0x100000
        je 1b
![Page 33: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/33.jpg)
( clear BSS )
/*
 * Do the decompression, and jump to the new kernel..
 */
        subl $16,%esp                       # place for structure on the stack
        pushl %esp                          # address of structure as first arg
        call SYMBOL_NAME(decompress_kernel) # decompress kernel to 100000
        orl %eax,%eax                       # gunzip 1.0.3
        jnz 3f
        xorl %ebx,%ebx
        ljmp $(KERNEL_CS), $0x100000        # jump to decompressed kernel
![Page 34: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/34.jpg)
head.S
100000  head.S code (EIP)
101000  swapper_pg_dir
102000  pg0
103000  empty_bad_page
104000  empty_bad_page_table
105000  empty_zero_page (copy parameters from 0x90000)
106000  stack, idt, gdt
![Page 35: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/35.jpg)
Setup Paging Table & Enable Paging
[Figure: kernel image layout from 100000 to 106000 (PG_DIR, PG0, empty_bad_page, empty_bad_page_table, empty_zero_page, stack, idt, gdt). CR3 points to PG_DIR; PG_DIR entries 0 and 768 both point to PG0, which maps the first 4M of Physical Memory.]
![Page 36: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/36.jpg)
Setup GDT
[Figure: kernel image layout from 100000 to 106000 (PG_DIR, PG0, empty_bad_page, empty_bad_page_table, empty_zero_page, stack, idt, gdt). GDTR points to the gdt.]
GDT contents:
        NULL
        0 (not used)
0x10    C0000000 1G DPL=0 code
0x18    C0000000 1G DPL=0 data
0x23    00000000 3G DPL=3 code
0x2b    00000000 3G DPL=3 data
        0
        0
        2*NR_TASKS entries
![Page 37: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/37.jpg)
Setup IDT
[Figure: kernel image layout from 100000 to 106000 (PG_DIR, PG0, empty_bad_page, empty_bad_page_table, empty_zero_page, stack, idt, gdt). IDTR points to the idt; entries 0 through 255 all point to ignore_int through the GDT.]
![Page 38: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/38.jpg)
call setup_paging
setup_paging:
        movl $1024*2,%ecx               /* 2 pages - swapper_pg_dir+1 page table */
        xorl %eax,%eax
        movl $ SYMBOL_NAME(swapper_pg_dir),%edi /* swapper_pg_dir is at 0x1000 */
        cld;rep;stosl
/* Identity-map the kernel in low 4MB memory for ease of transition */
/* set present bit/user r/w */
        movl $ SYMBOL_NAME(pg0)+7,SYMBOL_NAME(swapper_pg_dir)
/* But the real place is at 0xC0000000 */
/* set present bit/user r/w */
        movl $ SYMBOL_NAME(pg0)+7,SYMBOL_NAME(swapper_pg_dir)+3072
        movl $ SYMBOL_NAME(pg0)+4092,%edi
        movl $0x03ff007,%eax            /* 4Mb - 4096 + 7 (r/w user,p) */
        std
1:      stosl                           /* fill the page backwards - more efficient :-) */
        subl $0x1000,%eax
        jge 1b
        cld
![Page 39: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/39.jpg)
        movl $ SYMBOL_NAME(swapper_pg_dir),%eax
        movl %eax,%cr3                  /* cr3 - page directory start */
        movl %cr0,%eax
        orl $0x80000000,%eax
        movl %eax,%cr0                  /* set paging (PG) bit */
        ret                             /* this also flushes the prefetch-queue */
Format of PDE & PTE
bits 31-12: Page Address | bit 6: D | bit 5: A | bit 2: U/S | bit 1: R/W | bit 0: P
![Page 40: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/40.jpg)
        lgdt gdt_descr
gdt_descr:
        .word (8+2*NR_TASKS)*8-1
        .long 0xc0000000+SYMBOL_NAME(gdt)
ENTRY(gdt)
        .quad 0x0000000000000000        /* NULL descriptor */
        .quad 0x0000000000000000        /* not used */
        .quad 0xc0c39a000000ffff        /* 0x10 kernel 1GB code at 0xC0000000 */
        .quad 0xc0c392000000ffff        /* 0x18 kernel 1GB data at 0xC0000000 */
        .quad 0x00cbfa000000ffff        /* 0x23 user 3GB code at 0x00000000 */
        .quad 0x00cbf2000000ffff        /* 0x2b user 3GB data at 0x00000000 */
        .quad 0x0000000000000000        /* not used */
        .quad 0x0000000000000000        /* not used */
        .fill 2*NR_TASKS,8,0            /* space for LDT's and TSS's etc */
![Page 41: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/41.jpg)
(setup data segments and clear BSS)
        call setup_idt
setup_idt:
        lea ignore_int,%edx
        movl $(KERNEL_CS << 16),%eax
        movw %dx,%ax                    /* selector = 0x0010 = cs */
        movw $0x8E00,%dx                /* interrupt gate - dpl=0, present */
        lea SYMBOL_NAME(idt),%edi
        mov $256,%ecx
rp_sidt:
        movl %eax,(%edi)
        movl %edx,4(%edi)
        addl $8,%edi
        dec %ecx
        jne rp_sidt
        ret
![Page 42: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/42.jpg)
interrupt gate
bits 0-31: SELECTOR | OFFSET 15:0
bits 32-63: OFFSET 31:16 | 8E00
ignore_int: just prints "Unknown Interrupt"
![Page 43: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/43.jpg)
        lidt idt_descr
        ljmp $(KERNEL_CS),$1f
1:      movl $(KERNEL_DS),%eax          # reload all the segment registers
        mov %ax,%ds                     # after changing gdt.
        mov %ax,%es
        mov %ax,%fs
        mov %ax,%gs
        call SYMBOL_NAME(start_kernel)  # jump to C main routine
![Page 44: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/44.jpg)
start_kernel
asmlinkage void start_kernel(void)
{
        setup_arch(&command_line, &memory_start, &memory_end);
        memory_start = paging_init(memory_start,memory_end);
        trap_init();
        init_IRQ();
        <-------------- omit ---------------->
        memory_start = console_init(memory_start,memory_end);
        memory_start = kmalloc_init(memory_start,memory_end);
        sti();          /* enable interrupts */
![Page 45: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/45.jpg)
        memory_start = inode_init(memory_start,memory_end);
        memory_start = file_table_init(memory_start,memory_end);
        memory_start = name_cache_init(memory_start,memory_end);
        mem_init(memory_start,memory_end);
        <---------- omit ------------->
        printk(linux_banner);
        sysctl_init();
        kernel_thread(init, NULL, 0);
        cpu_idle(NULL);
}
![Page 46: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/46.jpg)
setup_arch
[Figure: the kernel is loaded at 1M; memory_start is the first address after the kernel image and memory_end is the top of physical memory.]
memory_start = (unsigned long) &_end;
memory_end = (1<<20) + (EXT_MEM_K<<10);
memory_end &= PAGE_MASK;
#define PARAM empty_zero_page
#define EXT_MEM_K (*(unsigned short *) (PARAM+2))
![Page 47: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/47.jpg)
init_task.mm->start_code = TASK_SIZE;   /* 0xC0000000 */
init_task.mm->end_code = TASK_SIZE + (unsigned long) &_etext;
init_task.mm->end_data = TASK_SIZE + (unsigned long) &_edata;
init_task.mm->brk = TASK_SIZE + (unsigned long) &_end;
/* "mem=XXX[kKmM]" overrides the BIOS-reported memory size */
if (c == ' ' && *(const unsigned long *)from == *(const unsigned long *)"mem=") {
        memory_end = simple_strtoul(from+4, &from, 0);
        if ( *from == 'K' || *from == 'k' ) {
                memory_end = memory_end << 10;
                from++;
        } else if ( *from == 'M' || *from == 'm' ) {
                memory_end = memory_end << 20;
                from++;
        }
}
![Page 48: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/48.jpg)
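paging_init
[Figure: the kernel sits at 1M; page tables pg1, pg2, …, pgn are allocated upward from memory_start. In pg_dir, entries 0, 1, …, n and entries 768, 769, … point to pg0, pg1, pg2, …, pgn; each page table maps 4M.]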
![Page 49: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/49.jpg)
start_mem = PAGE_ALIGN(start_mem);
address = 0;
pg_dir = swapper_pg_dir;
while (address < end_mem) {
        /* map the memory at virtual addr 0xC0000000 */
        pg_table = (pte_t *) (PAGE_MASK & pgd_val(pg_dir[768]));
        if (!pg_table) {
                pg_table = (pte_t *) start_mem;
                start_mem += PAGE_SIZE;
        }
        /* also map it temporarily at 0x0000000 for init */
        pgd_val(pg_dir[0]) = _PAGE_TABLE | (unsigned long) pg_table;
        pgd_val(pg_dir[768]) = _PAGE_TABLE | (unsigned long) pg_table;
        pg_dir++;
![Page 50: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/50.jpg)
        for (tmp = 0 ; tmp < PTRS_PER_PTE ; tmp++,pg_table++) {
                if (address < end_mem)
                        set_pte(pg_table, mk_pte(address, PAGE_SHARED));
                else
                        pte_clear(pg_table);
                address += PAGE_SIZE;
        }
}
local_flush_tlb();      /* move cr3, r?; mov r?, cr3; */
return free_area_init(start_mem, end_mem);
![Page 51: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/51.jpg)
free_area_init
1. Set min_free_pages
2. Initialize swap cache
3. Mark all pages reserved
4. Initialize Buddy system for free memory management
![Page 52: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/52.jpg)
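Free Memory Management (Tanenbaum)
• Bitmap
• Linked list (first-fit, next-fit, best-fit, quick-fit)
[Figure: 16 allocation units, numbered 0 to 16 along the axis.]
Bitmap:      0011000011100100
Linked list: P 0 2 → H 2 2 → P 4 4 → H 8 3 → P 11 2 → H 13 1 → P 14 2
(each node: P = process or H = hole, start unit, length)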
![Page 53: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/53.jpg)
Buddy System
[Figure: buddy-system allocation over pages 0 to 16, one row per step, showing which blocks A, B, C, and D occupy:
Initialization
request A (2)
request B (1)
request C (2)
free A*
request D (1)
free B
free D
free C]
![Page 54: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/54.jpg)
Request D (1)
[Figure: free_area lists (orders 0-3) and mem_map (pages 0-15) with blocks B and C allocated.]
![Page 55: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/55.jpg)
Free B
[Figure: free_area lists (orders 0-3) and mem_map (pages 0-15) with blocks B, D, and C shown.]
![Page 56: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/56.jpg)
Free D
[Figure: free_area lists (orders 0-3) and mem_map (pages 0-15) with blocks D and C shown.]
![Page 57: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/57.jpg)
Free C
[Figure: free_area lists (orders 0-3) and mem_map (pages 0-15) with block C shown.]
![Page 58: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/58.jpg)
Request 2
[Figure: free_area lists (orders 0-3) and mem_map (pages 0-15) with a single free block of 8 pages at page 0.]
![Page 59: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/59.jpg)
[Figure: free_area lists (orders 0-3) and mem_map (pages 0-15).]
![Page 60: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/60.jpg)
[Figure: memory layout upward from the kernel: Kernel, pg1-pgn, swap cache (4 bytes per page), mem_map, free_area[].bitmap, then start_mem.]
typedef struct page {
        /* these must be first (free area handling) */
        struct page *next;
        struct page *prev;
        struct inode *inode;
        unsigned long offset;
        ………..
        atomic_t count;
        unsigned flags;
        unsigned dirty:16, age:8;
        ……...
        unsigned long map_nr;   /* page->map_nr == page - mem_map */
} mem_map_t;
![Page 61: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/61.jpg)
[Figure: free_area lists (orders 0-3) and mem_map (pages 0-15).]
![Page 62: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/62.jpg)
unsigned long free_area_init(unsigned long start_mem, unsigned long end_mem)
{
        /*
         * select nr of pages we try to keep free for important stuff
         * with a minimum of 48 pages. This is totally arbitrary
         */
        i = (end_mem - PAGE_OFFSET) >> (PAGE_SHIFT+7);
        if (i < 24)
                i = 24;
        i += 24;        /* The limit for buffer pages in __get_free_pages is
                         * decreased by 12+(i>>3) */
        min_free_pages = i;
        start_mem = init_swap_cache(start_mem, end_mem);
        mem_map = (mem_map_t *) start_mem;
        p = mem_map + MAP_NR(end_mem);
        start_mem = LONG_ALIGN((unsigned long) p);
        memset(mem_map, 0, start_mem - (unsigned long) mem_map);
![Page 63: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/63.jpg)
        do {
                --p;
                p->flags = (1 << PG_DMA) | (1 << PG_reserved);
                p->map_nr = p - mem_map;
        } while (p > mem_map);
        /* 6 */
        for (i = 0 ; i < NR_MEM_LISTS ; i++) {
                unsigned long bitmap_size;
                init_mem_queue(free_area+i);
                mask += mask;   /* mask *=2 */
                end_mem = (end_mem + ~mask) & mask;     /* should be i+1 */
                bitmap_size = (end_mem - PAGE_OFFSET) >> (PAGE_SHIFT + i);
                bitmap_size = (bitmap_size + 7) >> 3;
                bitmap_size = LONG_ALIGN(bitmap_size);
                free_area[i].map = (unsigned int *) start_mem;
                memset((void *) start_mem, 0, bitmap_size);
                start_mem += bitmap_size;
        }
        return start_mem;
}
![Page 64: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/64.jpg)
trap_init
1. Set up interrupt routines
2. Int 0x80 for system calls
3. Set up TSS and LDT in GDT for each task
![Page 65: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/65.jpg)
486 Exceptions
0      Fault     Divide by Zero
1      Fault     Debug
…..
0B     Fault     Not Present
…..
0D     Fault     General Protection
0E     Fault     Page Fault
…..
20-FF  Int/Trap  Used for OS
![Page 66: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/66.jpg)
void trap_init(void)
{
        set_call_gate(&default_ldt,lcall7);
        set_trap_gate(0,&divide_error);
        set_trap_gate(1,&debug);
        set_trap_gate(2,&nmi);
        set_system_gate(3,&int3);       /* int3-5 can be called from all */
        set_system_gate(4,&overflow);
        set_system_gate(5,&bounds);
        set_trap_gate(6,&invalid_op);
        set_trap_gate(7,&device_not_available);
        set_trap_gate(8,&double_fault);
        set_trap_gate(9,&coprocessor_segment_overrun);
        set_trap_gate(10,&invalid_TSS);
        set_trap_gate(11,&segment_not_present);
        set_trap_gate(12,&stack_segment);
        set_trap_gate(13,&general_protection);
        set_trap_gate(14,&page_fault);
        set_trap_gate(15,&spurious_interrupt_bug);
        set_trap_gate(16,&coprocessor_error);
        set_trap_gate(17,&alignment_check);
![Page 67: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/67.jpg)
        for (i=18;i<48;i++)
                set_trap_gate(i,&reserved);
        set_system_gate(0x80,&system_call);
        /* set up GDT task & ldt entries */
        p = gdt+FIRST_TSS_ENTRY;
        set_tss_desc(p, &init_task.tss);        /* init_task: hardwired task #0 */
        p++;
        set_ldt_desc(p, &default_ldt, 1);
        p++;
        for(i=1 ; i<NR_TASKS ; i++) {
                p->a=p->b=0;
                p++;
                p->a=p->b=0;
                p++;
        }
![Page 68: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/68.jpg)
set_call_gate(a, addr)     _set_gate(a, 12, 3, addr)
set_trap_gate(n, addr)     _set_gate(&idt[n], 15, 0, addr)
set_system_gate(n, addr)   _set_gate(&idt[n], 15, 3, addr)
set_intr_gate(n, addr)     _set_gate(&idt[n], 14, 0, addr)
#define _set_gate(gate_addr,type,dpl,addr) \
__asm__ __volatile__ ("movw %%dx,%%ax\n\t" \
        "movw %2,%%dx\n\t" \
        "movl %%eax,%0\n\t" \
        "movl %%edx,%1" \
        :"=m" (*((long *) (gate_addr))), \
         "=m" (*(1+(long *) (gate_addr))) \
        :"i" ((short) (0x8000+(dpl<<13)+(type<<8))), \
         "d" ((char *) (addr)),"a" (KERNEL_CS << 16) \
        :"ax","dx")
![Page 69: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/69.jpg)
Descriptor in IDT
bits 0-31: SEGMENT SELECTOR | OFFSET 15:0
bits 32-63: OFFSET 31:16 | P | DPL | TYPE | 000 | RESERVED
![Page 70: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/70.jpg)
mem_init
• Reserve kernel and I/O pages
• Return all unused pages to buddy system
![Page 71: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/71.jpg)
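[Figure: physical memory as seen by mem_init. Labels: first 4K page, start_low_mem, 0xA0000 to 0x100000 (reserved), kernel code (up to end_text), kernel data, pg1-pgn, swap_cache, mem_map, free_area[].map, Console, PCI & FS allocations, start_mem, high_mem.]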
![Page 72: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/72.jpg)
void mem_init(unsigned long start_mem, unsigned long end_mem)
{
    end_mem &= PAGE_MASK;
    high_memory = end_mem;

    /* mark usable pages in the mem_map[] */
    start_low_mem = PAGE_ALIGN(start_low_mem);
    start_mem = PAGE_ALIGN(start_mem);

    /*
     * IBM messed up *AGAIN* in their thinkpad: 0xA0000 -> 0x9F000.
     * They seem to have done something stupid with the floppy
     * controller as well..
     */
    while (start_low_mem < 0x9f000) {
        clear_bit(PG_reserved, &mem_map[MAP_NR(start_low_mem)].flags);
        start_low_mem += PAGE_SIZE;
    }
![Page 73: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/73.jpg)
    while (start_mem < high_memory) {
        clear_bit(PG_reserved, &mem_map[MAP_NR(start_mem)].flags);
        start_mem += PAGE_SIZE;
    }
    for (tmp = 0 ; tmp < high_memory ; tmp += PAGE_SIZE) {
        if (tmp >= MAX_DMA_ADDRESS)     /* 16M */
            clear_bit(PG_DMA, &mem_map[MAP_NR(tmp)].flags);
        if (PageReserved(mem_map+MAP_NR(tmp))) {
            if (tmp >= 0xA0000 && tmp < 0x100000)
                reservedpages++;
            else if (tmp < (unsigned long) &_etext)
                codepages++;
            else
                datapages++;
            continue;
        }
        mem_map[MAP_NR(tmp)].count = 1;
        free_page(tmp);
    }
![Page 74: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/74.jpg)
    tmp = nr_free_pages << PAGE_SHIFT;
    printk("Memory: %luk/%luk available (%dk kernel code, %dk reserved, %dk data)\n",
        tmp >> 10,
        high_memory >> 10,
        codepages << (PAGE_SHIFT-10),
        reservedpages << (PAGE_SHIFT-10),
        datapages << (PAGE_SHIFT-10));
    return;
}
![Page 75: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/75.jpg)
#define free_page(addr) free_pages((addr),0)

void free_pages(unsigned long addr, unsigned long order)
{
    unsigned long map_nr = MAP_NR(addr);

    if (map_nr < MAP_NR(high_memory)) {
        mem_map_t * map = mem_map + map_nr;
        if (PageReserved(map))
            return;
        if (atomic_dec_and_test(&map->count)) {
            delete_from_swap_cache(map_nr);
            free_pages_ok(map_nr, order);
            return;
        }
    }
}
![Page 76: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/76.jpg)
static inline void free_pages_ok(unsigned long map_nr, unsigned long order)
{
    struct free_area_struct *area = free_area + order;
    unsigned long index = map_nr >> (1 + order);
    unsigned long mask = (~0UL) << order;

    cli();

#define list(x) (mem_map+(x))

    map_nr &= mask;
    nr_free_pages -= mask;      /* -mask = 1+~mask */
    while (mask + (1 << (NR_MEM_LISTS-1))) {
        if (!change_bit(index, area->map))
            break;
        remove_mem_queue(list(map_nr ^ -mask));     /* neighbor */
        mask <<= 1;
        area++;
        index >>= 1;
        map_nr &= mask;
    }
    add_mem_queue(area, list(map_nr));

#undef list
}
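The mask arithmetic in free_pages_ok is compact, so a small sketch may help. It assumes only what the code above shows: mask is ~0UL << order, hence -mask equals 1UL << order, and the buddy ("neighbor") of a block is found by XOR with that value. The function names are hypothetical.

```c
/* All ones above the low `order` bits; -mask is therefore 1UL << order. */
unsigned long block_mask(unsigned long order)
{
    return (~0UL) << order;
}

/* First page of the 2^order block that contains map_nr. */
unsigned long block_start(unsigned long map_nr, unsigned long order)
{
    return map_nr & block_mask(order);
}

/* The neighbouring block of the same size: flip the bit of weight 2^order. */
unsigned long buddy_of(unsigned long map_nr, unsigned long order)
{
    unsigned long mask = block_mask(order);
    return (map_nr & mask) ^ -mask;
}

/* Index of the bitmap bit shared by a block and its buddy
 * (the `index` variable in free_pages_ok). */
unsigned long pair_bit(unsigned long map_nr, unsigned long order)
{
    return map_nr >> (1 + order);
}
```

For order 2, pages 8 and 12 are buddies of each other and share bitmap bit 1, which is why a single change_bit call is enough to track the state of the pair.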
![Page 77: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/77.jpg)
extern inline unsigned long get_free_page(int priority)
{
    unsigned long page;

    page = __get_free_page(priority);
    if (page)
        memset((void *) page, 0, PAGE_SIZE);
    return page;
}

#define __get_free_page(priority) __get_free_pages((priority),0,0)
![Page 78: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/78.jpg)
unsigned long __get_free_pages(int priority, unsigned long order, int dma)
{
    unsigned long flags;
    int reserved_pages;

    if (order >= NR_MEM_LISTS)
        return 0;
    if (intr_count && priority != GFP_ATOMIC) {
        static int count = 0;
        if (++count < 5) {
            printk("gfp called nonatomically from interrupt %p\n",
                __builtin_return_address(0));
            priority = GFP_ATOMIC;
        }
    }
    reserved_pages = 5;
    if (priority != GFP_NFS)
        reserved_pages = min_free_pages;
    if ((priority == GFP_BUFFER || priority == GFP_IO) && reserved_pages >= 48)
        reserved_pages -= (12 + (reserved_pages>>3));
    save_flags(flags);
![Page 79: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/79.jpg)
repeat:
    cli();
    if ((priority==GFP_ATOMIC) || nr_free_pages > reserved_pages) {
        RMQUEUE(order, dma);
        restore_flags(flags);
        return 0;
    }
    restore_flags(flags);
    if (priority != GFP_BUFFER && try_to_free_page(priority, dma, 1))
        goto repeat;
    return 0;
}
![Page 80: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/80.jpg)
/*
 * Some ugly macros to speed up __get_free_pages()..
 */
#define MARK_USED(index, order, area) \
    change_bit((index) >> (1+(order)), (area)->map)
#define CAN_DMA(x) (PageDMA(x))
#define ADDRESS(x) (PAGE_OFFSET + ((x) << PAGE_SHIFT))
![Page 81: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/81.jpg)
#define RMQUEUE(order, dma) \
do { struct free_area_struct * area = free_area+order; \
     unsigned long new_order = order; \
    do { struct page *prev = memory_head(area), *ret; \
        while (memory_head(area) != (ret = prev->next)) { \
            if (!dma || CAN_DMA(ret)) { \
                unsigned long map_nr = ret->map_nr; \
                (prev->next = ret->next)->prev = prev; \
                MARK_USED(map_nr, new_order, area); \
                nr_free_pages -= 1 << order; \
                EXPAND(ret, map_nr, order, new_order, area); \
                restore_flags(flags); \
                return ADDRESS(map_nr); \
            } \
            prev = ret; \
        } \
        new_order++; area++; \
    } while (new_order < NR_MEM_LISTS); \
} while (0)
![Page 82: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/82.jpg)
#define EXPAND(map,index,low,high,area) \
do { unsigned long size = 1 << high; \
    while (high > low) { \
        area--; high--; size >>= 1; \
        add_mem_queue(area, map); \
        MARK_USED(index, high, area); \
        index += size; \
        map += size; \
    } \
    map->count = 1; \
    map->age = PAGE_INITIAL_AGE; \
} while (0)
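To see how RMQUEUE, EXPAND and free_pages_ok fit together, here is a toy buddy allocator over 8 pages. It is a simplified model, not the kernel code: free lists are replaced by a boolean table, there is no DMA or locking, and all names (`buddy_init`, `buddy_alloc`, `buddy_free`, `free_block`) are made up for this sketch. Like EXPAND, a split returns the lower half to the free table and keeps working on the upper half.

```c
#include <stdbool.h>

#define NR_ORDERS 4     /* orders 0..3, so the largest block is 8 pages */
#define NR_PAGES  8

/* free_block[o][p] is true when a free block of 2^o pages starts at page p. */
static bool free_block[NR_ORDERS][NR_PAGES];

/* Start with a single free block covering all pages. */
void buddy_init(void)
{
    for (int o = 0; o < NR_ORDERS; o++)
        for (int p = 0; p < NR_PAGES; p++)
            free_block[o][p] = false;
    free_block[NR_ORDERS - 1][0] = true;
}

/* Returns the first page of an allocated 2^order block, or -1 if none fits. */
int buddy_alloc(int order)
{
    for (int high = order; high < NR_ORDERS; high++) {
        for (int p = 0; p < NR_PAGES; p += 1 << high) {
            if (!free_block[high][p])
                continue;
            free_block[high][p] = false;
            /* Split as EXPAND does: free the lower half, keep the upper. */
            int index = p;
            while (high > order) {
                high--;
                free_block[high][index] = true;
                index += 1 << high;
            }
            return index;
        }
    }
    return -1;
}

/* Frees a 2^order block and merges it with its buddy while the buddy is free. */
void buddy_free(int page, int order)
{
    while (order < NR_ORDERS - 1) {
        int buddy = page ^ (1 << order);
        if (!free_block[order][buddy])
            break;
        free_block[order][buddy] = false;
        page &= ~(1 << order);      /* merged block starts at the lower buddy */
        order++;
    }
    free_block[order][page] = true;
}
```

After buddy_init(), two single-page allocations return pages 7 and 6, because each split hands out the upper half. Freeing both pages merges the blocks step by step back into one order-3 block starting at page 0.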
![Page 83: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/83.jpg)
kernel_thread
call sys_clone();
if (StackIsChanged() /* new process */) {
    call fn(args);
    sys_exit();
} else {
    /* do nothing */
    /* task[0] goes through here */
}
CPU_idle()
sys_idle()
schedule()
![Page 84: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/84.jpg)
static inline pid_t kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
{
    long retval;

    __asm__ __volatile__(
        "movl %%esp,%%esi\n\t"
        "int $0x80\n\t"         /* Linux/i386 system call */
        "cmpl %%esp,%%esi\n\t"  /* child or parent? */
        "je 1f\n\t"             /* parent - jump */
        "pushl %3\n\t"          /* push argument */
        "call *%4\n\t"          /* call fn */
        "movl %2,%0\n\t"        /* exit */
        "int $0x80\n"
        "1:\t"
        :"=a" (retval)
        :"0" (__NR_clone), "i" (__NR_exit),
         "r" (arg), "r" (fn),
         "b" (flags | CLONE_VM)
        :"si");
    return retval;
}
![Page 85: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/85.jpg)
System Calls
/*
 * This file contains the system call numbers. Unistd.h
 */
#define __NR_setup 0    /* used only by init, to get system going */
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
……..
#define __NR_clone 120
……..
#define __NR_sched_rr_get_interval 161
#define __NR_nanosleep 162
#define __NR_mremap 163
![Page 86: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/86.jpg)
.data       /* entry.S */
ENTRY(sys_call_table)
    .long SYMBOL_NAME(sys_setup)        /* 0 */
    .long SYMBOL_NAME(sys_exit)
    .long SYMBOL_NAME(sys_fork)
    .long SYMBOL_NAME(sys_read)
    .long SYMBOL_NAME(sys_write)
    .long SYMBOL_NAME(sys_open)         /* 5 */
    ……..
    .long SYMBOL_NAME(sys_clone)        /* 120 */
    ……..
    .long SYMBOL_NAME(sys_sched_rr_get_interval)
    .long SYMBOL_NAME(sys_nanosleep)
    .long SYMBOL_NAME(sys_mremap)
    .long 0,0
    .long SYMBOL_NAME(sys_vm86)
    .space (NR_syscalls-166)*4          /* 256 */
![Page 87: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/87.jpg)
Pseudo Code for System Call
if (sys_call_num >= NR_syscalls)
    return -ENOSYS;
else {
    if (sys_call_table[sys_call_num]==NULL)
        return -ENOSYS;
    if (PF_TRACESYS) {
        syscall_trace();
        call sys_call_table[sys_call_num];
        syscall_trace();
    } else
        call sys_call_table[sys_call_num];
}
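The same dispatch logic can be written as a small user-space C sketch with a table of function pointers. Everything here is illustrative: the table size, the two demo handlers and the `traced` flag (standing in for PF_TRACESYS) are assumptions, not kernel definitions.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_NR_SYSCALLS 8      /* hypothetical table size; the slides use 256 */

typedef long (*demo_syscall_fn)(long arg);

long demo_double(long arg) { return 2 * arg; }
long demo_negate(long arg) { return -arg; }

/* Unlisted slots are NULL, like the zero-filled tail of sys_call_table. */
const demo_syscall_fn demo_call_table[DEMO_NR_SYSCALLS] = {
    [1] = demo_double,
    [2] = demo_negate,
};

int demo_trace_events;          /* counts calls to the trace hook */

void demo_syscall_trace(void) { demo_trace_events++; }

/* Bounds check, NULL check, then call; traced calls are bracketed
 * by the trace hook before and after the handler runs. */
long demo_dispatch(unsigned long nr, long arg, int traced)
{
    long ret;

    if (nr >= DEMO_NR_SYSCALLS)
        return -ENOSYS;
    if (demo_call_table[nr] == NULL)
        return -ENOSYS;
    if (traced) {
        demo_syscall_trace();
        ret = demo_call_table[nr](arg);
        demo_syscall_trace();
        return ret;
    }
    return demo_call_table[nr](arg);
}
```

Calling demo_dispatch(1, 21, 0) returns 42, while an out-of-range or empty slot returns -ENOSYS without calling anything.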
![Page 88: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/88.jpg)
ENTRY(system_call)
    pushl %eax          # save orig_eax, for syscall_trace (strace)
    SAVE_ALL

STACK
 0(%esp) - %ebx
 4(%esp) - %ecx
 8(%esp) - %edx
 C(%esp) - %esi
10(%esp) - %edi
14(%esp) - %ebp       # SAVE_ALL
18(%esp) - %eax
1C(%esp) - %ds
20(%esp) - %es
24(%esp) - %fs
28(%esp) - %gs
2C(%esp) - orig_eax   # pushl %eax
30(%esp) - %eip
34(%esp) - %cs        # push by CPU, int 0x80
38(%esp) - %eflags
3C(%esp) - %oldesp    # push by CPU, stack switching
40(%esp) - %oldss
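The stack layout above can be expressed as a C struct whose field offsets match the listed slots. This is a sketch using fixed-width 32-bit fields for every slot (an assumption made so the offsets hold on any host); `struct saved_regs` is a made-up name for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One 4-byte slot per line of the stack listing, lowest address first. */
struct saved_regs {
    uint32_t ebx, ecx, edx, esi, edi, ebp, eax;    /* pushed by SAVE_ALL */
    uint32_t ds, es, fs, gs;                       /* pushed by SAVE_ALL */
    uint32_t orig_eax;                             /* pushl %eax */
    uint32_t eip, cs, eflags;                      /* pushed by the CPU on int 0x80 */
    uint32_t oldesp, oldss;                        /* pushed by the CPU on a stack switch */
};

/* Compile-time checks against the offsets shown in the listing. */
_Static_assert(offsetof(struct saved_regs, eax) == 0x18, "eax slot");
_Static_assert(offsetof(struct saved_regs, orig_eax) == 0x2C, "orig_eax slot");
_Static_assert(offsetof(struct saved_regs, oldss) == 0x40, "oldss slot");
```

Because the handler receives a pointer to this frame, a routine such as sys_clone can read the caller's registers by field name instead of by raw stack offset.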
![Page 89: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/89.jpg)
    movl $-ENOSYS,EAX(%esp)
    cmpl $(NR_syscalls),%eax            # EAX=SYS_CALL_NUM
    jae ret_from_sys_call
    movl SYMBOL_NAME(sys_call_table)(,%eax,4),%eax
    testl %eax,%eax
    je ret_from_sys_call
    ……..
    testb $0x20,flags(%ebx)             # PF_TRACESYS
    jne 1f
    call *%eax
    movl %eax,EAX(%esp)                 # save the return value
    jmp ret_from_sys_call
    ALIGN
1:  call SYMBOL_NAME(syscall_trace)
    movl ORIG_EAX(%esp),%eax
    call SYMBOL_NAME(sys_call_table)(,%eax,4)
    movl %eax,EAX(%esp)                 # save the return value
    call SYMBOL_NAME(syscall_trace)
![Page 90: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/90.jpg)
sys_clone
asmlinkage int sys_clone(struct pt_regs regs)
{
    unsigned long clone_flags;
    unsigned long newsp;

    clone_flags = regs.ebx;
    newsp = regs.ecx;
    if (!newsp)
        newsp = regs.esp;
    return do_fork(clone_flags, newsp, &regs);
}
![Page 91: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/91.jpg)
do_fork
• Copy process structure from parent
![Page 92: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/92.jpg)
int do_fork(unsigned long clone_flags, unsigned long usp, struct pt_regs *regs)
{
    int nr;
    int error = -ENOMEM;
    unsigned long new_stack;
    struct task_struct *p;

    p = (struct task_struct *) kmalloc(sizeof(*p), GFP_KERNEL);
    if (!p)
        goto bad_fork;
    new_stack = alloc_kernel_stack();   /* get_free_page(GFP_KERNEL) */
    if (!new_stack)
        goto bad_fork_free_p;
    error = -EAGAIN;
    nr = find_empty_process();
    if (nr < 0)
        goto bad_fork_free_stack;

    *p = *current;
![Page 93: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/93.jpg)
    if (p->exec_domain && p->exec_domain->use_count)
        (*p->exec_domain->use_count)++;
    if (p->binfmt && p->binfmt->use_count)
        (*p->binfmt->use_count)++;

    p->did_exec = 0;
    p->swappable = 0;
    p->kernel_stack_page = new_stack;
    *(unsigned long *) p->kernel_stack_page = STACK_MAGIC;
    p->state = TASK_UNINTERRUPTIBLE;
    p->flags &= ~(PF_PTRACED|PF_TRACESYS|PF_SUPERPRIV);
    p->flags |= PF_FORKNOEXEC;
    p->pid = get_pid(clone_flags);
    p->next_run = NULL;
    p->prev_run = NULL;
    p->p_pptr = p->p_opptr = current;
    p->p_cptr = NULL;
    init_waitqueue(&p->wait_chldexit);
    p->signal = 0;
![Page 94: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/94.jpg)
    p->it_real_value = p->it_virt_value = p->it_prof_value = 0;
    p->it_real_incr = p->it_virt_incr = p->it_prof_incr = 0;
    init_timer(&p->real_timer);
    p->real_timer.data = (unsigned long) p;
    p->leader = 0;      /* session leadership doesn't inherit */
    p->tty_old_pgrp = 0;
    p->utime = p->stime = 0;
    p->cutime = p->cstime = 0;
    p->start_time = jiffies;
    task[nr] = p;
    SET_LINKS(p);
    nr_tasks++;

    error = -ENOMEM;
    /* copy all the process information */
    if (copy_files(clone_flags, p))
        goto bad_fork_cleanup;
    if (copy_fs(clone_flags, p))
        goto bad_fork_cleanup_files;
![Page 95: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/95.jpg)
    if (copy_sighand(clone_flags, p))
        goto bad_fork_cleanup_fs;
    if (copy_mm(clone_flags, p))
        goto bad_fork_cleanup_sighand;
    copy_thread(nr, clone_flags, usp, p, regs);
    p->semundo = NULL;

    /* ok, now we should be set up.. */
    p->swappable = 1;
    p->exit_signal = clone_flags & CSIGNAL;
    p->counter = (current->counter >>= 1);
    wake_up_process(p);     /* state=TASK_RUNNING insert into run_queue */
    ++total_forks;
    return p->pid;
    /* error handler */
}
![Page 96: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/96.jpg)
Process’s Virtual Memory
[Figure: task_struct.mm points to an mm_struct (count, pgd, mmap, mmap_avl, mmap_sem). mmap heads a list of vm_area_struct entries (vm_end, vm_start, vm_flags, vm_inode, vm_ops, vm_next) chained by vm_next, covering the code and data regions. vm_ops points to the operations nopage, wppage, swapout, ….]
![Page 97: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/97.jpg)
struct mm_struct {
    int count;
    pgd_t * pgd;
    unsigned long start_code, end_code, start_data, end_data;
    unsigned long start_brk, brk, start_stack, start_mmap;
    unsigned long arg_start, arg_end, env_start, env_end;
    unsigned long rss, total_vm, locked_vm;
    unsigned long def_flags;
    struct vm_area_struct * mmap;
    struct vm_area_struct * mmap_avl;
    struct semaphore mmap_sem;
};

#define INIT_MM { \
    1, \
    swapper_pg_dir, \
    0, 0, 0, 0, \
    0, 0, 0, 0, \
    0, 0, 0, 0, \
    0, 0, 0, \
    0, \
    &init_mmap, &init_mmap, MUTEX }
![Page 98: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/98.jpg)
struct vm_area_struct {
    struct mm_struct * vm_mm;   /* VM area parameters */
    unsigned long vm_start;
    unsigned long vm_end;
    pgprot_t vm_page_prot;
    unsigned short vm_flags;
    /* AVL tree of VM areas per task, sorted by address */
    short vm_avl_height;
    struct vm_area_struct * vm_avl_left;
    struct vm_area_struct * vm_avl_right;
    /* linked list of VM areas per task, sorted by address */
    struct vm_area_struct * vm_next;
    /* more */
    struct vm_operations_struct * vm_ops;
    unsigned long vm_offset;
    struct inode * vm_inode;
    unsigned long vm_pte;       /* shared mem */
};

#define INIT_MMAP { &init_mm, 0, 0x40000000, PAGE_SHARED, VM_READ | VM_WRITE | VM_EXEC }
![Page 99: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/99.jpg)
copy_thread
Copy TSS from parent and set some private fields
![Page 100: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/100.jpg)
void copy_thread(int nr, unsigned long clone_flags, unsigned long esp,
    struct task_struct * p, struct pt_regs * regs)
{
    int i;
    struct pt_regs * childregs;

    p->tss.es = KERNEL_DS;
    p->tss.cs = KERNEL_CS;
    p->tss.ss = KERNEL_DS;
    p->tss.ds = KERNEL_DS;
    p->tss.fs = USER_DS;
    p->tss.gs = KERNEL_DS;
    p->tss.ss0 = KERNEL_DS;
    p->tss.esp0 = p->kernel_stack_page + PAGE_SIZE;
    p->tss.tr = _TSS(nr);
    childregs = ((struct pt_regs *) (p->kernel_stack_page + PAGE_SIZE)) - 1;
    p->tss.esp = (unsigned long) childregs;
    p->tss.eip = (unsigned long) ret_from_sys_call;
    *childregs = *regs;
![Page 101: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/101.jpg)
    childregs->eax = 0;
    childregs->esp = esp;
    p->tss.back_link = 0;
    p->tss.eflags = regs->eflags & 0xffffcfff;  /* iopl is always 0 for a new process */
    p->tss.ldt = _LDT(nr);
    set_tss_desc(gdt+(nr<<1)+FIRST_TSS_ENTRY,&(p->tss));
    p->tss.bitmap = offsetof(struct thread_struct,io_bitmap);
    for (i = 0; i < IO_BITMAP_SIZE+1 ; i++)     /* IO bitmap is actually SIZE+1 */
        p->tss.io_bitmap[i] = ~0;
}
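Two small calculations in copy_thread are easy to check by hand: where the child's register frame lands on its kernel stack page, and what the eflags mask does. The sketch below assumes a 4096-byte page and a 0x44-byte register frame (17 four-byte slots, matching the system-call stack listing); the helper names are hypothetical.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE    4096u     /* i386 page size */
#define DEMO_PT_REGS_SIZE 0x44u     /* 17 four-byte slots in the saved frame */

/* The child's saved registers sit at the very top of its kernel stack page:
 * one register frame below kernel_stack_page + PAGE_SIZE. */
uint32_t child_regs_addr(uint32_t kernel_stack_page)
{
    return kernel_stack_page + DEMO_PAGE_SIZE - DEMO_PT_REGS_SIZE;
}

/* 0xffffcfff clears bits 12 and 13 of EFLAGS, the I/O privilege level. */
uint32_t clear_iopl(uint32_t eflags)
{
    return eflags & 0xffffcfffu;
}
```

For a stack page at 0x00100000 the frame starts at 0x00100FBC, and an eflags value of 0x3202 (IOPL 3) becomes 0x0202.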
![Page 102: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/102.jpg)
ret_from_sys_call
• All slow interrupts and system calls end here
![Page 103: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/103.jpg)
ret_from_sys_call:
    cmpl $0,SYMBOL_NAME(intr_count)     /* handle interrupts */
    jne 2f
9:  movl SYMBOL_NAME(bh_mask),%eax
    andl SYMBOL_NAME(bh_active),%eax
    jne handle_bottom_half
1:  sti
    cmpl $0,SYMBOL_NAME(need_resched)   /* to see if we need reschedule*/
    jne reschedule
    ………….
2:  RESTORE_ALL
![Page 104: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/104.jpg)
#define RESTORE_ALL \
    ………….. 
    popl %ebx; \
    popl %ecx; \
    popl %edx; \
    popl %esi; \
    popl %edi; \
    popl %ebp; \
    popl %eax; \
    pop %ds; \
    pop %es; \
    pop %fs; \
    pop %gs; \
    addl $4,%esp; \
    iret
![Page 105: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/105.jpg)
schedule
• task->counter: dynamic priority
• task->priority: static priority
• timer interrupt (100 Hz):
    jiffies++
    if (current->counter <= 0)
        need_resched = 1;
• run queue: links all RUNNABLE tasks
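The timer-interrupt bullet can be sketched as a tick handler. The slide shows only the increment of jiffies and the check on the counter; decrementing the running task's counter on each tick is an assumption added here so that the check can ever fire. `struct tick_task` and the `demo_` names are hypothetical.

```c
/* Minimal stand-in for the scheduling fields of task_struct. */
struct tick_task {
    long counter;       /* dynamic priority: ticks left in this time slice */
    long priority;      /* static priority: size of a fresh time slice */
};

unsigned long demo_jiffies;     /* ticks since start */
int demo_need_resched;          /* set when the running task used up its slice */

/* One 100 Hz timer tick for the task that is currently running. */
void demo_timer_tick(struct tick_task *running)
{
    demo_jiffies++;
    if (running->counter > 0)
        running->counter--;     /* assumption: one tick consumes one unit */
    if (running->counter <= 0)
        demo_need_resched = 1;
}
```

The flag is only a request: the actual switch happens later, when ret_from_sys_call sees need_resched set and calls schedule().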
![Page 106: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/106.jpg)
asmlinkage void schedule(void)
{
    int c;
    struct task_struct * p;
    struct task_struct * prev, * next;
    unsigned long timeout = 0;

    /* check alarm, wake up any interruptible tasks that have got a signal */
    allow_interrupts();
    if (intr_count)
        goto scheduling_in_interrupt;
    if (bh_active & bh_mask) {
        intr_count = 1;
        do_bottom_half();
        intr_count = 0;
    }
![Page 107: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/107.jpg)
    need_resched = 0;
    prev = current;
    cli();
    /* move an exhausted RR process to be last.. */
    if (!prev->counter && prev->policy == SCHED_RR) {
        prev->counter = prev->priority;
        move_last_runqueue(prev);
    }
    ………….
    p = init_task.next_run;
    sti();
    c = -1000;
    next = idle_task;
    while (p != &init_task) {
        int weight = goodness(p, prev, this_cpu);
        if (weight > c)
            c = weight, next = p;
        p = p->next_run;
    }
![Page 108: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/108.jpg)
    /* if all runnable processes have "counter == 0", re-calculate counters */
    if (!c) {
        for_each_task(p)
            p->counter = (p->counter >> 1) + p->priority;
    }
    if (prev != next) {
        kstat.context_swtch++;
        …………..
        switch_to(prev,next);
    }
    return;
}

#define switch_to(prev,next) do { \
__asm__("movl %2,"SYMBOL_NAME_STR(current_set)"\n\t" \
    "ljmp %0\n\t" \
    …………….. 
    : /* no outputs */ \
    :"m" (*(((char *)&next->tss.tr)-4)), \
     "r" (prev), "r" (next)); \
} while (0)
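The selection loop in schedule() and the counter recalculation can be modelled on a plain array. This sketch makes one loud simplification: goodness is taken to be just the remaining counter, whereas the real goodness() also receives prev and this_cpu. The run queue is an array instead of a linked list, and all names are hypothetical.

```c
#include <stddef.h>

struct rq_task {
    long counter;       /* dynamic priority */
    long priority;      /* static priority */
};

/* Simplified goodness: only the remaining time slice counts (assumption). */
long demo_goodness(const struct rq_task *p)
{
    return p->counter;
}

/* Returns the index of the task to run next, or -1 to run the idle task.
 * If the best weight is 0, every counter is refilled as in schedule(). */
int demo_pick_next(struct rq_task *run_queue, size_t n)
{
    long c = -1000;
    int next = -1;

    for (size_t i = 0; i < n; i++) {
        long weight = demo_goodness(&run_queue[i]);
        if (weight > c) {
            c = weight;
            next = (int)i;
        }
    }
    if (c == 0) {
        for (size_t i = 0; i < n; i++)
            run_queue[i].counter =
                (run_queue[i].counter >> 1) + run_queue[i].priority;
    }
    return next;
}
```

The refill formula keeps half of any unused counter, so a task that slept through its slice accumulates a larger counter than one that ran it down to zero.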
![Page 109: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/109.jpg)
Process Switching
process #1: int 80 → system_call → ret_from_sys_call → need_resched → schedule → switch_to
process #2: return → ret_from_sys_call → iret
![Page 110: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/110.jpg)
Page Fault
When page fault occurs:
• Stack: error_code, EIP, CS, EFLAGS, old ESP, old SS
• error_code bits: U/S, W/R, P
• CR2: contains fault address
• Jump to interrupt handling routine for int 0x0E
![Page 111: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/111.jpg)
ENTRY(page_fault)
    pushl $ SYMBOL_NAME(do_page_fault)
    jmp error_code
![Page 112: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/112.jpg)
STACK
 0(%esp) - %ebx
 4(%esp) - %ecx
 8(%esp) - %edx
 C(%esp) - %esi
10(%esp) - %edi
14(%esp) - %ebp       # pushl …..
18(%esp) - %eax
1C(%esp) - %ds
20(%esp) - %es
24(%esp) - %fs
28(%esp) - %gs        # addr. of do_page_fault
2C(%esp) - orig_eax   # error_code pushed by CPU
30(%esp) - %eip
34(%esp) - %cs        # push by CPU, exception 0x0E
38(%esp) - %eflags
3C(%esp) - %oldesp    # push by CPU, stack switching
40(%esp) - %oldss
![Page 113: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/113.jpg)
error_code:
    push %fs
    push %es
    push %ds
    pushl %eax
    xorl %eax,%eax
    pushl %ebp
    pushl %edi
    pushl %esi
    pushl %edx
    decl %eax                       # eax = -1
    pushl %ecx
    pushl %ebx
    cld
    xorl %ebx,%ebx                  # zero ebx
    xchgl %eax, ORIG_EAX(%esp)      # orig_eax (get the error code. )
    mov %gs,%bx                     # get the lower order bits of gs
    movl %esp,%edx
    xchgl %ebx, GS(%esp)            # get the address and save gs.
    pushl %eax                      # push the error code (argument)
    pushl %edx
![Page 114: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/114.jpg)
    movl $(KERNEL_DS),%edx
    mov %dx,%ds
    mov %dx,%es
    movl $(USER_DS),%edx
    mov %dx,%fs
    movl SYMBOL_NAME(current_set),%eax
    call *%ebx                      # call do_page_fault
    addl $8,%esp                    # make a similar stack as system call
    jmp ret_from_sys_call
![Page 115: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/115.jpg)
do_page_fault
• This routine handles page faults. It determines the address, and the problem, and then passes it off to one of the appropriate routines.
• error_code:
bit 0 == 0 means no page found,
1 means protection fault
bit 1 == 0 means read, 1 means write
bit 2 == 0 means kernel, 1 means user-mode
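The three error_code bits listed above can be decoded with simple masks. The struct and function names in this sketch are hypothetical; only the bit meanings come from the slide.

```c
#include <stdbool.h>

/* Decoded view of the page-fault error code. */
struct fault_info {
    bool protection;    /* bit 0: false = page not present, true = protection fault */
    bool write;         /* bit 1: false = read access, true = write access */
    bool user;          /* bit 2: false = kernel mode, true = user mode */
};

struct fault_info decode_fault(unsigned long error_code)
{
    struct fault_info f;
    f.protection = (error_code & 1) != 0;
    f.write      = (error_code & 2) != 0;
    f.user       = (error_code & 4) != 0;
    return f;
}
```

For example, an error code of 6 describes a user-mode write to a page that is not present.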
![Page 116: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/116.jpg)
asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
    void (*handler)(struct task_struct *, struct vm_area_struct *, unsigned long, int);
    struct task_struct *tsk = current;
    struct mm_struct *mm = tsk->mm;
    struct vm_area_struct * vma;
    ….

    /* get the address */
    __asm__("movl %%cr2,%0":"=r" (address));
    vma = find_vma(mm, address);
    if (!vma)
        goto bad_area;
    if (vma->vm_start <= address)
        goto good_area;
    …...
![Page 117: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/117.jpg)
/*
 * Something tried to access memory that isn't in our memory map..
 * Fix it, but check if it's kernel or user first..
 */
bad_area:
    if (error_code & 4) {       /* user mode, kill it */
        tsk->tss.cr2 = address;
        tsk->tss.error_code = error_code;
        tsk->tss.trap_no = 14;
        force_sig(SIGSEGV, tsk);
        return;
    }
    …...
}
![Page 118: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/118.jpg)
good_area:
    handler = do_no_page;
    switch (error_code & 3) {
        default:    /* 3: write, present */
            handler = do_wp_page;
            /* fall through */
        case 2:     /* write, not present */
            if (!(vma->vm_flags & VM_WRITE))
                goto bad_area;
            break;
        case 1:     /* read, present */
            goto bad_area;
        case 0:     /* read, not present */
            if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
                goto bad_area;
    }
    handler(tsk, vma, address, write);
    .…..
    return;
![Page 119: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/119.jpg)
         not present                            present
write    check if you can write, do_no_page     check if you can write, do_wp_page
read     check if you can read, do_no_page      bad_area
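The decision table can also be written as a pure function, which makes the four cases easy to test. This is a sketch only: the enum, the function name and the DEMO_VM_* flag values are stand-ins chosen for this example, not taken from kernel headers.

```c
/* Possible outcomes of the good_area switch. */
enum fault_action { FAULT_BAD_AREA, FAULT_NO_PAGE, FAULT_WP_PAGE };

/* Stand-in permission flags for vma->vm_flags (values are assumptions). */
#define DEMO_VM_READ  0x1u
#define DEMO_VM_WRITE 0x2u
#define DEMO_VM_EXEC  0x4u

/* Bit 0 of error_code: present; bit 1: write. Other bits are ignored here. */
enum fault_action classify_fault(unsigned long error_code, unsigned vm_flags)
{
    switch (error_code & 3) {
    case 3:     /* write, present: copy-on-write candidate */
        return (vm_flags & DEMO_VM_WRITE) ? FAULT_WP_PAGE : FAULT_BAD_AREA;
    case 2:     /* write, not present */
        return (vm_flags & DEMO_VM_WRITE) ? FAULT_NO_PAGE : FAULT_BAD_AREA;
    case 1:     /* read, present: always an error */
        return FAULT_BAD_AREA;
    default:    /* 0: read, not present */
        return (vm_flags & (DEMO_VM_READ | DEMO_VM_EXEC))
                   ? FAULT_NO_PAGE : FAULT_BAD_AREA;
    }
}
```

A write to a present page in a writable area maps to do_wp_page, while the same write in a read-only area ends in bad_area.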
![Page 120: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/120.jpg)
do_no_page
1. Address is present in memory, just return
2. Address in swap area, call do_swap_page to swap it in
[Figure labels: cr3, tsk, page, disk]
![Page 121: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/121.jpg)
3. If no nopage routine is defined in the vm_area_struct, get a free page and link. (uninitialized data)
4. If a nopage routine is defined in the vm_area_struct, call it (file_mmap_nopage, tries to share pages with other tasks)
[Figure labels: cr3, tsk, page, get_free_page]
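The four do_no_page cases form a simple decision chain. The sketch below reduces the page-table entry and the vm_area_struct to three boolean inputs, which is a strong simplification; the enum and function names are hypothetical.

```c
/* What do_no_page ends up doing, in the order the cases are checked. */
enum no_page_action {
    NP_RETURN,          /* 1. page already present: nothing to do */
    NP_SWAP_IN,         /* 2. page is in the swap area: swap it in */
    NP_ANON_PAGE,       /* 3. no nopage routine: get a free page and link it */
    NP_CALL_NOPAGE      /* 4. nopage routine defined: call it */
};

enum no_page_action decide_no_page(int pte_present, int pte_in_swap,
                                   int vma_has_nopage)
{
    if (pte_present)
        return NP_RETURN;
    if (pte_in_swap)
        return NP_SWAP_IN;
    if (!vma_has_nopage)
        return NP_ANON_PAGE;
    return NP_CALL_NOPAGE;
}
```

Case 3 corresponds to uninitialized data, and case 4 to file-backed mappings where the nopage routine tries to share pages with other tasks.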
![Page 122: Operating System Design - Linux Instructor: Ching-Chi Hsu TA:Yung-Yu Chuang.](https://reader035.fdocuments.us/reader035/viewer/2022070414/5697c0251a28abf838cd5227/html5/thumbnails/122.jpg)
do_wp_page
1. Address not present, return
2. Page is PAGE_RW, return
3. If the page is referenced by only one task (count==1), make it PAGE_RW.
4. If the page is referenced by more than one task, copy a new page and make it PAGE_RW.
[Figure labels: cr3, tsk1, page, cr3, tsk, New page, set PAGE_RW, copy]
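The copy-on-write steps can be modelled in user space with a reference-counted page and a tiny page-table entry. This is a sketch of the four steps, not the kernel routine: `struct demo_page`, `struct demo_pte` and `demo_wp_page` are hypothetical, and malloc stands in for get_free_page.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096

struct demo_page {
    int count;                              /* number of tasks mapping this page */
    unsigned char data[DEMO_PAGE_SIZE];
};

struct demo_pte {
    struct demo_page *page;
    int present;
    int writable;                           /* stands in for PAGE_RW */
};

/* Returns 0 on success (including "nothing to do"), -1 if no memory. */
int demo_wp_page(struct demo_pte *pte)
{
    if (!pte->present)                      /* step 1 */
        return 0;
    if (pte->writable)                      /* step 2 */
        return 0;
    if (pte->page->count == 1) {            /* step 3: sole user, no copy */
        pte->writable = 1;
        return 0;
    }
    /* step 4: shared page, give this task a private writable copy */
    struct demo_page *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return -1;
    memcpy(copy->data, pte->page->data, DEMO_PAGE_SIZE);
    copy->count = 1;
    pte->page->count--;
    pte->page = copy;
    pte->writable = 1;
    return 0;
}
```

When two tasks share a page and both write to it, the first write takes the copy path and drops the count to 1, so the second write only has to flip the writable bit on the original page.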