CPSC 231 D.H. 1
Learning Objectives
• Understanding of disk versus RAM performance gap.
• Understanding definition, design goals and design problems of file structure.
• Understanding of file structure research history.
• Understanding and naming key terms used in file structure.
Secondary Storage in Computer Systems
• Data can be stored on:
  – hard disks
  – floppy disks
  – tapes
  – CD-ROMs
  – Zip and Jaz disks
  – network servers
• Most data is stored on hard disks.
Disks
• Disks provide enormous capacity to store information.
• Disks are orders of magnitude slower than main memory (a single disk access can take a quarter of a million times longer than a single RAM access).
• DISK = LARGE and SLOW and CHEAP
• RAM = SMALL and FAST
RAM versus Disk Performance Gap
• Example:
  – 120 nanoseconds to access RAM (main memory)
  – 30 milliseconds to access disk
• Analogy:
  – 20 seconds versus 58 days
• CONCLUSION:
  – Application programs have to spend a lot of time waiting for data to be read from the disk or to be written to the disk.
Questions
• What is a millisecond, a microsecond and a nanosecond?
  – Millisecond = 1/1,000 s
  – Microsecond = 1/1,000,000 s
  – Nanosecond = 1/1,000,000,000 s
• How many times faster is RAM access than disk access?
• Assume:
  – 120 nanoseconds to access RAM (main memory)
  – 30 milliseconds to access disk
File Structure
• Definition:
  – A file structure is a combination of:
    • a representation for data in files, and
    • operations for accessing the data.
  – A file structure allows applications to read, write and modify data.
  – A good file structure design gives an application efficient (fast) access to the data it needs.
File Structure Design Goals
• Minimize the total disk access time:
  – by clustering related data together
  – by keeping adjacent blocks close to each other on the disk
  – ideally, get all the needed data in just ONE disk access
• Maximize the total disk space utilization:
  – disk defragmentation procedures
  – data compression
File structure design problems
• One of the most difficult problems in meeting the design goals of a file structure is the fact that files are quite dynamic, i.e. they:
• grow
• shrink
• change their data
• The design goals would be easier to meet if files were static. WHY?
Historical view of file structure design
• Early work
  – presumed that files were located on tapes
  – access was sequential
• Recent work
  – most files are stored on direct access devices (such as hard disks, floppy disks, CD-ROMs, Zip disks, etc.)
  – large files required indexing
  – indexes and keys allowed for speedy searches of data on the disk
File structure history cont.
• Indexed files grew and became slow to access => tree structures emerged.
• Unfortunately, some trees grew very unevenly, resulting in slow (almost sequential) searches => AVL trees emerged (self-balancing binary trees).
• AVL trees grew large and required multiple disk accesses => B-trees emerged.
[Figure: Tree File]
[Figure: B-Tree]
File structure history cont.
• B-trees provided excellent performance for non-sequential files, but sequential access was very slow => B+ trees emerged.
• B-trees and B+ trees became the basis for many commercial file systems, since they provide access times that grow in proportion to log_k N, where N is the number of entries in the file and k is the number of entries indexed in a single block of the B-tree.
[Figure: B+ Trees]
Hashing
• Hashing is a data access mechanism that is based on converting the search key into a storage address.
• A good hashing algorithm can significantly reduce the number of disk accesses.
• Extendible hashing is a hashing technique that works well with files whose size changes substantially over time.
Key Terms
• AVL tree - a self-balancing binary tree that can guarantee good access times for data stored in memory (but not on the disk).
• B-tree - a tree structure that provides fast access to data stored in files. A B-tree does NOT have to be a binary tree.
• B+ tree - a variation of the B-tree structure that provides fast sequential access to data as well as indexed access.
Key Terms Cont.
• File structure
  – the organization of data on secondary storage devices such as disks, together with the operations defined for the data
• Sequential access
  – access of data that takes records in serial order, looking at the first, second, and so on.
• Random access
  – access of data that takes records in any order, not necessarily serial.
Physical files and logical files
• Files are collections of related information.
• Physical files exist on secondary storage devices. Operating systems are responsible for managing physical files.
• Logical files are visible to application programs. Application programs do not know about the physical locations of the files (often they do not even know whether the data is coming from a file or from a keyboard).
Association between physical and logical files
• Applications have to make an association between physical and logical file names. In C++ this can be done in the following way:
• ofstream outClientFile("clients.dat", ios::out);
• The application writes to outClientFile, while the operating system sees clients.dat.
Special Characters in Files
• All computer systems reserve a number of characters for specific system functions.
• Examples:
  – Control-Z often indicates end-of-file in MS-DOS programs
  – Control-D often indicates end-of-file in Unix programs
  – CR (carriage return) and LF (line feed) characters together indicate end-of-line in MS-DOS text files; Unix uses LF alone
Directory Structures
• Files are stored in directories. Thus directories are collections of files.
• Most modern systems maintain a tree directory structure. (WHY?)
I/O Redirection
• I/O redirection allows the source of input to be changed so that it comes from a file instead of the keyboard:
  – program < file    /* program reads input from a file instead of the keyboard */
• I/O redirection allows the output to be directed to a file instead of the screen:
  – program > file    /* program writes to a file instead of the screen */
• The symbols < and > are the redirection operators.
Pipes
• The output of one program can be used as the input to another program by using pipes.
• Example:
  – program1 | program2
• The symbol | is the pipe operator.
[Figure: Pipe Operator]