Transcript of Disks: Structure and Scheduling (krj/os/lectures/L18-Disks.pdf)
Disks: Structure and Scheduling
COMS W4118 Prof. Kaustubh R. Joshi [email protected]
http://www.cs.columbia.edu/~krj/os
4/8/13, COMS W4118, Spring 2013, Columbia University. Instructor: Dr. Kaustubh Joshi, AT&T Labs.
References: Operating Systems Concepts (9e), Linux Kernel Development, previous W4118s. Copyright notice: care has been taken to use only those web images deemed by the instructor to be in the public domain. If you see a copyrighted image on any slide and are the copyright owner, please contact the instructor. It will be removed.
Outline
• Basic: enough to understand FS (now)
  – Disk characteristics
  – Disk scheduling
• Advanced (after studying file systems)
  – Disk Reliability
  – RAID
  – SSDs/Flash
Disk Structure
• Range from 30GB to 3TB per drive
• Aluminum platters with magnetic coating
• Commonly, 2-5 platters per drive
• Common platter sizes: 3.5”, 2.5”, and 1.8”
• Magnetic heads
The First Commercial Disk Drive (1956)
The IBM RAMAC computer included the IBM Model 350 disk storage system:
• 5M (7-bit) characters
• 50 x 24” platters
• Access time < 1 second
Disk Interface
• From the FS perspective, the disk is addressed as a one-dimensional array of logical sectors
• The disk controller maps each logical sector to a physical sector identified by track #, surface #, and sector #
• Note: old drives allowed direct C/H/S (cylinder/head/sector) addressing by the OS. Modern drives export LBA (logical block addresses) and do the mapping to C/H/S internally.
[Figure: disk platter diagram with logical sectors numbered 0-23]
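To make the note above concrete, here is a minimal sketch of the classic LBA-to-C/H/S arithmetic. The geometry constants are hypothetical: real drives vary sectors per track by zone and silently remap bad sectors, so only the controller knows the true mapping.

```python
# Classic LBA <-> C/H/S arithmetic for a hypothetical fixed geometry.
HEADS = 4               # surfaces (2 platters x 2 sides), assumed
SECTORS_PER_TRACK = 8   # constant here; not true on modern zoned drives

def lba_to_chs(lba):
    cylinder = lba // (HEADS * SECTORS_PER_TRACK)
    head = (lba // SECTORS_PER_TRACK) % HEADS
    sector = lba % SECTORS_PER_TRACK + 1   # CHS sector numbers are 1-based
    return cylinder, head, sector

def chs_to_lba(c, h, s):
    return (c * HEADS + h) * SECTORS_PER_TRACK + (s - 1)

assert chs_to_lba(*lba_to_chs(1000)) == 1000   # round trip
```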
Disk Latencies
• Rotational delay: rotate the disk to get to the right sector
• Seek time: move the disk arm to get to the right track
• Transfer time: get the bits off the disk
Seek Time
• Must move the arm to the right track
• Can take a while (e.g., 5-10ms)
  – Acceleration, coasting, settling (can be significant, e.g., 2ms)
Transfer Time
• Transfer bits out of the disk
• Actually pretty fast (e.g., 125MB/s)
I/O Time (T) and Rate (R)
• T = rotational delay + seek time + transfer time
• R = size of transfer / T
• Workload 1: large sequential accesses?
• Workload 2: small random accesses?
Example: I/O Time and Rate

|                         | Barracuda | Cheetah 15K.5 |
|-------------------------|-----------|---------------|
| Capacity                | 1TB       | 300GB         |
| Rotational speed        | 7200 RPM  | 15000 RPM     |
| Rotational latency (ms) | 4.2       | 2.0           |
| Avg seek (ms)           | 9         | 4             |
| Max transfer            | 105 MB/s  | 125 MB/s      |
| Platters                | 4         | 4             |
| Connects via            | SATA      | SCSI          |
• Random 4KB read
  – Barracuda: T = 13.2ms, R = 0.31MB/s
  – Cheetah: T = 6ms, R = 0.66MB/s
• Sequential 100MB read
  – Barracuda: T = 950ms, R = 105MB/s
  – Cheetah: T = 800ms, R = 125MB/s
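As a sanity check on these figures, here is a small sketch of the T/R model from the previous slide, applied to the spec values in the table. The slide's sequential T keeps essentially only the transfer term (which dominates), so the results match to within rounding.

```python
# T = rotational delay + seek time + transfer time; R = transfer size / T.
def io_time_ms(rot_ms, seek_ms, size_mb, bw_mb_s):
    return rot_ms + seek_ms + size_mb / bw_mb_s * 1000

def rate_mb_s(size_mb, t_ms):
    return size_mb / (t_ms / 1000)

for name, rot, seek, bw in (("Barracuda", 4.2, 9, 105),
                            ("Cheetah", 2.0, 4, 125)):
    t4k = io_time_ms(rot, seek, 4 / 1024, bw)   # random 4KB read
    t100m = io_time_ms(rot, seek, 100, bw)      # sequential 100MB read
    print(f"{name}: 4KB T={t4k:.1f}ms R={rate_mb_s(4 / 1024, t4k):.2f}MB/s; "
          f"100MB T={t100m:.0f}ms R={rate_mb_s(100, t100m):.0f}MB/s")
```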
Design Tip: Use Disks Sequentially!
• Disk performance differs by a factor of 200 to 300 between random and sequential access
• When possible, access disks sequentially
Mapping of logical sectors to physical
• Logical sector 0: the first sector of the first (outermost) track of the first surface
• Logical sector addresses increment within a track, then across the tracks of a cylinder, then across cylinders, from outermost to innermost
• Track skew: sector numbering on successive tracks is offset so a sequential read does not miss a full rotation while the head moves to the next track
Parallel Reading from Heads
• All heads point to the same place on the track
  – Why not read from all heads in parallel?
• Would need perfectly aligned heads
  – Hard to do in practice
  – Heads are not perfectly aligned because of:
    • Mechanical vibrations
    • Thermal gradients
    • Mechanical imperfections
  – High density makes the problem worse
  – Would also need high-throughput read/write circuitry
• Consequence: most drives have a single active head at a time
Pros and cons of default mapping
• Pros
  – Simple to program
  – Default mapping reduces seek time for sequential access
• Cons
  – FS can't see the mapping precisely
  – Reverse-engineering the mapping in the OS is difficult:
    • # of sectors per track changes across the disk
    • Disk silently remaps bad sectors
Disk cache
• Internal memory (8MB-32MB) used as a cache
• Read-ahead: the "track buffer"
  – Read the contents of the entire track into memory during the rotational delay
• Write caching with volatile memory
  – Write back (immediate reporting): claim data is written to disk when it is not yet
    • Faster, but data can be lost on power failure
  – Write through: ack only after data is written to the platter
Disk scheduling
• Goal: minimize positioning time
  – Performed by both the OS and the disk itself. Why?
• OS can control:
  – The sequence of workload requests
• Disk knows:
  – Geometry, accurate positioning times
FCFS Disk Scheduling
Schedule requests in the order received (FCFS).
• Advantage: fair
• Disadvantage: high seek and rotational cost
Total head movement: 640 cylinders
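The totals on these scheduling slides are consistent with the classic textbook example: pending requests at cylinders 98, 183, 37, 122, 14, 124, 65, 67 with the head starting at cylinder 53 (an assumption, but one that reproduces the 640-cylinder figure exactly). A minimal FCFS sketch:

```python
# FCFS: service requests strictly in arrival order.
def fcfs(head, requests):
    total = 0
    for r in requests:
        total += abs(r - head)   # seek distance to the next request
        head = r
    return total

QUEUE = [98, 183, 37, 122, 14, 124, 65, 67]   # assumed textbook queue
print(fcfs(53, QUEUE))   # -> 640 cylinders of head movement
```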
SSTF: Shortest Seek Time First
• A form of Shortest Job First (SJF) scheduling: handle the nearest cylinder next
• Advantage: reduces arm movement (seek time)
• Disadvantage: unfair; can starve some requests
Total head movement: 236 cylinders
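The same assumed queue under SSTF reproduces the 236-cylinder total:

```python
# SSTF: always service the pending request nearest the current head.
def sstf(head, requests):
    pending, total = list(requests), 0
    while pending:
        nearest = min(pending, key=lambda r: abs(r - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return total

print(sstf(53, [98, 183, 37, 122, 14, 124, 65, 67]))   # -> 236 cylinders
```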
Elevator (aka SCAN or C-SCAN)
• The disk arm sweeps across the disk
• If a request arrives for a block already passed in this sweep, it is queued for the next sweep
SCAN (Elevator) Disk Scheduling
Make up-and-down passes across all cylinders.
• Pros: efficient, simple
• Cons: unfair; the oldest requests (furthest away) also wait the longest
Total head movement: 208 cylinders
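A sweep sketch for the same assumed queue, with the head at 53 moving toward lower cylinders first. Note that the 208-cylinder total corresponds to reversing at the last pending request (strictly, the LOOK variant); a sweep that runs all the way to cylinder 0 before reversing would travel 53 + 183 = 236 cylinders.

```python
# Elevator sweep: service everything below the head moving down,
# then reverse and service the rest moving up. Reverses at the
# last pending request (i.e., LOOK).
def sweep_down_then_up(head, requests):
    below = sorted((r for r in requests if r <= head), reverse=True)
    above = sorted(r for r in requests if r > head)
    total = 0
    for r in below + above:
        total += abs(r - head)
        head = r
    return total

print(sweep_down_then_up(53, [98, 183, 37, 122, 14, 124, 65, 67]))  # -> 208
```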
C-SCAN
• Provides a more uniform wait time than SCAN
• The head moves from one end of the disk to the other, servicing requests as it goes
  – When it reaches the other end, it immediately returns to the beginning of the disk, without servicing any requests on the return trip
• Treats the cylinders as a circular list that wraps around from the last cylinder to the first
• Total number of cylinders?
C-SCAN (Elevator) Scheduling
The head services requests in one direction only.
Wraps around without servicing when the end is reached (like a circular linked list).
Provides a more uniform wait time than SCAN.
C-LOOK Scheduling
• In practice, there is no need to scan to the ends of the disk
• Wrap around when there are no more requests in the current direction
• Wrap back to the first (lowest) outstanding request
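A C-LOOK sketch for the same assumed queue: service in increasing cylinder order only, then seek back to the lowest outstanding request (the wrap seek still counts as head movement here):

```python
# C-LOOK: one service direction; wrap to the lowest pending request
# when nothing remains above the head.
def clook(head, requests):
    above = sorted(r for r in requests if r >= head)
    below = sorted(r for r in requests if r < head)
    total = 0
    for r in above + below:   # the above->below step is the wrap seek
        total += abs(r - head)
        head = r
    return total

print(clook(53, [98, 183, 37, 122, 14, 124, 65, 67]))   # -> 322 cylinders
```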
Modern disk scheduling issues
• Elevator (or SSTF) ignores rotation!
• Shortest positioning time first (SPTF): account for seek and rotational delay together
• The OS and disk work together to implement it
I/O Scheduling in Practice
• Simple: the Linus Elevator scheduler
  – Default until 2.4
  – A variant of the C-LOOK algorithm
  – Merge a new request with an existing one where possible
  – Otherwise, insert it in sorted order between existing requests
  – If no suitable location is found, insert at the queue tail (sketched below)
• In practice the situation is more complicated due to:
  – Interactions with the filesystem (data layout)
  – Interactions with caches (write-back vs. write-through)
  – Different priorities for write and read requests
  – Delay-sensitive applications such as multimedia
  – Additional algorithms: Deadline, Completely Fair Queuing (CFQ)
• Will look in a bit more depth later
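A simplified sketch of the Linus Elevator insertion policy listed above, using hypothetical request objects (the real kernel operates on sector ranges inside struct request and is considerably more involved):

```python
from dataclasses import dataclass

@dataclass
class Request:
    start: int   # first sector
    end: int     # one past the last sector

def elevator_add(queue, req):
    for q in queue:
        if q.end == req.start:       # back-merge into an existing request
            q.end = req.end
            return
        if req.end == q.start:       # front-merge into an existing request
            q.start = req.start
            return
    for i, q in enumerate(queue):    # otherwise, insert in sorted order
        if req.start < q.start:
            queue.insert(i, req)
            return
    queue.append(req)                # no suitable location: queue tail
```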
Disk technology trends
• Data → more dense
  – More bits per square inch
  – Disk head closer to the surface
  – Smaller disks with the same capacity
• Disk geometry → smaller
  – Spin faster → increased bandwidth, reduced rotational delay
  – Faster seeks
  – Lighter weight
• Disk price → cheaper
• Density is improving more than speed (mechanical limitations)
New mass storage technologies
• New memory-based mass storage technologies avoid seek time and rotational delay
  – No moving parts: more reliable, shock resistant
  – NAND flash: ubiquitous in mobile devices
  – Battery-backed DRAM (NVRAM)
• Disadvantages
  – Price: more expensive than a disk of the same capacity
  – Reliability: more likely to lose data
  – Other significant quirk: can't rewrite in place easily
• Open research question: how to effectively use flash in commercial storage systems
• Will look in more depth later