Allocation Methods





SURVEY ON ALLOCATION METHODS

Introduction

Disk space in secondary storage is used for the physical storage of programs that are brought into main memory for execution. The operating system allocates space for a program in physical memory. The allocation of disk space involves several operations, such as searching for a range of free space, indexing data on disk, and retrieving and ordering content, which together enable efficient utilization of disk space. Much of the earlier research work has been carried out in terms of ordered pairs of sets, as in buddy systems.

Earlier work was carried out on contiguous allocation, linked allocation and indexed allocation, where the major contribution is in defining data structures and methods to enhance the effective utilization of disk space. The work carried out on various allocation methods since 1970 is given in detail below.

Prior to 1970, disc allocation was administered by means of a map held in core, with one bit per disc page, managed by a method due to J. L. Smith. B. J. Austin retained all of Smith's procedure, except that the technique for searching the map was improved. Allocation was dynamic, in that space was given to a file only as needed, not reserved in advance. Austin's method also maintained a list of holes in the disc map. This list was added to whenever a search for an allocation found a string of adjacent free pages whose length was less than that required for the request in hand.

File loading experiments showed that the proposed allocation algorithm produced about 80 page tables, compared with about 540 for previous allocation algorithms. A programmer may produce a file in which the logical addressing space is non-contiguous. Such a file always has a page table.
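The hole-list idea described above can be sketched in a few lines. The following is an illustrative reconstruction, not Austin's original implementation; all names and details are assumptions:

```python
# Sketch of a disc-map allocator that remembers "holes": runs of
# adjacent free pages too short for an earlier request.
# Illustrative only; not the original 1970s implementation.

class DiscMap:
    def __init__(self, num_pages):
        self.free = [True] * num_pages   # one bit per disc page
        self.holes = []                  # remembered short runs: (start, length)

    def _claim(self, start, n):
        for p in range(start, start + n):
            self.free[p] = False
        return start

    def allocate(self, n):
        """Return the first page of a run of n free pages, or None."""
        # Try remembered holes first: cheaper than rescanning the map.
        for i, (start, length) in enumerate(self.holes):
            if length >= n and all(self.free[start:start + n]):
                del self.holes[i]
                return self._claim(start, n)
        # Fall back to a linear scan of the map.
        run_start, run_len = 0, 0
        for page, is_free in enumerate(self.free):
            if is_free:
                if run_len == 0:
                    run_start = page
                run_len += 1
                if run_len == n:
                    return self._claim(run_start, n)
            else:
                if 0 < run_len < n:
                    # Run too short for this request: remember it as a hole.
                    self.holes.append((run_start, run_len))
                run_len = 0
        return None
```

The point of the hole list is that a later, smaller request can be satisfied from a remembered short run without rescanning the whole map.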
The time for the file loading process was considerably reduced for two reasons: the number of disc operations is reduced, and the time spent searching the map is reduced. Dumping and reloading of file storage takes a little less than two hours of machine time: the load takes about three quarters of an hour, and the dump, which involves verification of the tapes produced, takes about an hour. With the previous algorithm, the number of page tables immediately after a file load was 500 to 1,000, and during a day an additional 100 to 200 would be formed. The system no longer spends a large amount of time searching the allocation map when the disc is practically full, but no quantitative evaluation of this effect is available. The algorithm seems to be satisfactory for both file loading and normal operations.

D. E. Gold et al. proposed a basic permutation algorithm and its variations. Their work deals with masking rotational latency. Algorithms for the permutation of blocks in a memory hierarchy were proposed, such as a slow-to-fast transition algorithm and a fast-to-slow transition algorithm; addition and deletion of blocks are treated as special cases of permutation. Arbitrary permutation, standard permutation, and permutation assignment algorithms were also proposed. The method is implemented for a head-per-track disk system by performing row permutations in an intermediate (bulk storage) random access memory. The first row permutation is accomplished by accumulating blocks in an output buffer as they come from the primary memory. When a row is accumulated, these blocks start to be output to the disk in their permuted order. The final row permutations are performed similarly, by accumulating a row of blocks in the input buffer before transmission to primary memory in permuted order.

Howard Lee Morgan proposed an optimization model for the assignment of files to disk packs, and of packs to either resident or nonresident status.
Heuristics are suggested for those cases in which it is inefficient to compute the actual optimum. When the amount of space required for file storage exceeds the amount which can be kept online, decisions must be made as to which files are to be permanently resident and which mountable. These decisions affect the number of mount requests issued to the operators. This is often a bottleneck in a computing facility, and reducing the number of mounts thus decreases turnaround time. In summary, the problem of optimally using disk storage devices is a complex one, with effects contributed by queuing and scheduling as well as space allocation. The paper describes where space allocation may fit in and prescribes some methods for handling this problem. Test cases of multiple knapsack algorithms were executed.

Kenneth K. Shen et al. presented an extension of the buddy method, called the weighted buddy method, for dynamic storage allocation. The weighted buddy method allows block sizes of 2^k and 3·2^k, whereas the original buddy method allowed only block sizes of 2^k. This extension is achieved at an additional cost of only two bits per block. Simulation results are presented which compare this method with the buddy method. These results indicate that, for a uniform request distribution, the buddy system has less total memory fragmentation than the weighted buddy algorithm. However, the total fragmentation is smaller for the weighted buddy method when the requests are for exponentially distributed block sizes.

James A. Hinds proposed a simple scheme for determining the location of a block of storage relative to other blocks. This scheme is applicable to buddy-type storage allocation systems.

Warren Burton worked on a generalization of the buddy system for storage allocation.
The set of permitted block sizes {SIZE_i}, i = 0 to n, must satisfy the condition SIZE_i = SIZE_(i-1) + SIZE_(i-k(i)), where k may be any meaningful integral-valued function. This makes it possible to force logical storage blocks to coincide with physical storage blocks, such as tracks and cylinders.

James L. Peterson presented two algorithms for implementing any of a class of buddy systems for dynamic storage allocation. Each buddy system corresponds to a set of recurrence relations which relate the block sizes provided to each other. Analyses of the internal fragmentation of the binary buddy system, the Fibonacci buddy system, and the weighted buddy system are given. Comparative simulation results are also presented for internal, external, and total fragmentation. The total fragmentation of the weighted buddy system is generally worse than that of the binary or Fibonacci buddy system. The total fragmentation of the binary buddy system varies widely because of its internal fragmentation characteristics. Still, the variation among these buddy systems is not great, and the lower execution time of the binary buddy would therefore seem to recommend it for general use, although the execution time of the Fibonacci buddy system is not much greater. The weighted buddy system seems to be less desirable than either the binary or the Fibonacci system owing to its higher execution time and greater external fragmentation. In conclusion, they recommend that the memory management module of a system be constructed as either a binary or Fibonacci buddy system.

H. C. Du et al. presented Cartesian product files, which have been shown to exhibit attractive properties for partial match queries.
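The block-size families of these buddy systems all follow directly from the recurrence SIZE_i = SIZE_(i-1) + SIZE_(i-k). A small illustrative sketch (the function name and seed handling are assumptions, not from the papers):

```python
def buddy_sizes(n, k, seed):
    """Generate n permitted block sizes from SIZE_i = SIZE_(i-1) + SIZE_(i-k).

    With constant k: k = 1 gives the binary buddy system (sizes double),
    k = 2 gives the Fibonacci buddy system.  `seed` supplies the first
    k sizes needed to start the recurrence.
    """
    sizes = list(seed)
    while len(sizes) < n:
        sizes.append(sizes[-1] + sizes[-k])
    return sizes
```

For example, `buddy_sizes(5, 1, [1])` yields powers of two, while `buddy_sizes(6, 2, [1, 2])` yields Fibonacci numbers; a non-constant k(i) can be emulated by varying which earlier size is added at each step.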
This paper considers the file allocation problem for Cartesian product files, which can be stated as follows: given a k-attribute Cartesian product file and an m-disk system, allocate buckets among the m disks in such a way that, for all possible partial match queries, the concurrency of disk accesses is maximized. The Disk Modulo (DM) allocation method is described first, and it is shown to be strictly optimal under many conditions commonly occurring in practice, including all possible partial match queries when the number of disks is 2 or 3. It is also shown that although it has good performance, the DM allocation method is not strictly optimal for all possible partial match queries when the number of disks is greater than 3. The General Disk Modulo (GDM) allocation method is then described, and a sufficient but not necessary condition for strict optimality of the GDM method for all partial match queries and any number of disks is derived. Simulation studies comparing the DM and random allocation methods in terms of the average number of disk accesses, in response to various classes of partial match queries, show the former to be significantly more effective even when the number of disks is greater than 3, that is, even in cases where the DM method is not strictly optimal. The results that have been derived formally and shown by simulation can be used for more effective design of optimal file systems for partial match queries. When considering multiple-disk systems with independent access paths, it is important to ensure that similar records are clustered into the same or similar buckets, while similar buckets are dispersed uniformly among the disks.

Free Disk Space Management

Matthew S. Hecht et al. observe that schemes for managing free disk pages are so widely known that they must be considered part of the folklore of computer science. Two popular data structures are bitmaps and linked lists.
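The Disk Modulo rule described above is simple to state in code: bucket (i1, ..., ik) of the Cartesian product file is placed on disk (i1 + ... + ik) mod m. A minimal sketch (illustrative only):

```python
def disk_modulo(bucket, m):
    """Disk Modulo (DM) allocation: assign a k-attribute bucket
    (i1, ..., ik) to one of m disks as (i1 + ... + ik) mod m."""
    return sum(bucket) % m
```

Because adjacent buckets along any attribute land on consecutive disks, a partial match query that fixes some attributes and leaves others free spreads its bucket accesses across the disks, which is the source of the concurrency the method maximizes.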
The bitmap has bit position i set when disk page number i is free, and cleared when disk page number i is in use. The linked list contains the page numbers of free disk pages. Less widely known, however, are variations of such schemes that preserve the consistency of these data across failures. While recoverable schemes for managing free disk pages are all based on the principle of maintaining a copy (complete, or incremental with a base) on memory media with an independent failure mode, the details of such schemes vary considerably. The general problem considered here is how to make the free-disk-space data structures survive two kinds of failures: (1) failure of main memory (e.g., loss of power) resulting in loss of its contents, and (2) failure of a disk transfer resulting in an unreadable disk page. The paper presents a programming technique, using a linked list for managing the free disk pages of a file system and using shadowing (also known as careful replacement [10]) for failure recovery, which enjoys the following properties: (1) the state of allocation at the previous checkpoint (a consistent system state preserved on disk) is always maintained on the disk; (2) the data structure describing free space on disk is never copied during a checkpoint or recover (from a main-memory failure) operation; (3) a window of only two pages of main memory is required for accessing and maintaining the data describing free space; (4) system information need be written to disk only during a checkpoint, rather than every time it changes. Lorie [7] describes a scheme similar to theirs that uses bitmaps and shadowing. Gray [1, 2] describes the update-in-place with logging paradigm that can be applied to the problem of managing free disk pages across failures.
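The bitmap variant of the folklore free-space structures is easy to sketch. The following fragment is illustrative (class and method names are assumptions, not from the paper) and ignores the failure-recovery machinery discussed above:

```python
class FreePageBitmap:
    """Bitmap free-space manager: bit i is set while disk page i is free."""

    def __init__(self, num_pages):
        self.bits = (1 << num_pages) - 1   # all pages start out free

    def is_free(self, i):
        return bool(self.bits & (1 << i))

    def allocate(self):
        """Clear and return the lowest free page number, or None if full."""
        if self.bits == 0:
            return None
        i = (self.bits & -self.bits).bit_length() - 1  # lowest set bit
        self.bits &= ~(1 << i)
        return i

    def release(self, i):
        self.bits |= 1 << i
```

The linked-list alternative simply chains the free page numbers together; the bitmap wins when free pages are dense, the list when they are sparse.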
Sturgis, Mitchell, and Israel [9] (see also Mitchell and Dion [8]) describe an abstraction called stable storage whereby a page-allocation bitmap is recorded redundantly on disk; the second page is not written until the first has been written successfully.

Distributed File Systems

Bruce Walker et al. presented LOCUS, a distributed operating system which supports transparent access to data through a network-wide file system, permits automatic replication of storage, supports transparent distributed process execution, supplies a number of high-reliability functions such as nested transactions, and is upward compatible with Unix. Partitioned operation of subnets and their dynamic merge is also supported. The system had been operational for about two years at UCLA, and extensive experience in its use had been obtained. The complete system architecture is outlined in the paper, and that experience is summarized. The most obvious conclusion to be drawn from the LOCUS work is that a high-performance, network-transparent, distributed file system which contains all of the various functions indicated throughout the paper is feasible to design and implement, even in a small machine environment. Replication of storage is valuable, both from the user's and the system's point of view. However, much of the work is in recovery and in dealing with the various races and failures that can exist. Nothing is free. In order to avoid performance degradation when resources are local, the cost has been converted into additional code and substantial care in implementation architecture. LOCUS is approximately a third bigger than Unix and certainly more complex. The difficulties involved in dynamically reconfiguring an operating system are both intrinsic to the problem and dependent on the particular system. Rebuilding lock tables and synchronizing processes running in separate environments are problems of inherent difficulty.
Most of the system-dependent problems can be avoided, however, with careful design. The fact that LOCUS uses specialized protocols for operating-system-to-operating-system communication made it possible to control message traffic quite selectively. The ability to alter specific protocols to simplify the reconfiguration solution was particularly appreciated. The task of developing a protocol by which sites would agree about the membership of a partition proved to be surprisingly difficult. Balancing the needs of protocol synchronization and failure detection while maintaining good performance presented a considerable challenge. Since reconfiguration software is run precisely when the network is flaky, those problems are real, and not events that are unlikely. Nevertheless, it has been possible to design and implement a solution that exhibits reasonably high performance. Further work is still needed to assure that scaling to a large network will successfully maintain that performance characteristic, but experience with the present solution makes the authors quite optimistic. In summary, use of LOCUS indicates the enormous value of a highly transparent, distributed operating system. Since file activity often is the dominant part of the operating system load, it seems clear that the LOCUS architecture, constructed on a distributed file system base, is rather attractive.

Philip D. L. Koch noted that the buddy system is known for its speed and simplicity, but that high internal and external fragmentation have made it unattractive for use in operating system file layout. A variant of the binary buddy system that reduces fragmentation is described. Files are allocated in up to t extents, and suboptimally allocated files are periodically reallocated. The Dartmouth Time-Sharing System (DTSS) uses this method. Several installations of DTSS, representing different classes of workload, are studied to measure the method's performance.
Internal fragmentation varies from 2 to 6 percent, and external fragmentation varies from 0 to 10 percent, for expected request sizes. Less than 0.1 percent of the CPU is spent executing the algorithm. In addition, most files are stored contiguously on disk. The mean number of extents per file is less than 1.5, and the upper bound is t. Compared to the file layout method used by UNIX, the buddy system results in more efficient access but less efficient utilization of disk space. As disks become larger and less expensive per byte, strategies that achieve efficient I/O throughput at the expense of some storage loss become increasingly attractive.

E. Levy and A. Silberschatz presented the purpose of a distributed file system (DFS): to allow users of physically distributed computers to share data and storage resources by using a common file system. A typical configuration for a DFS is a collection of workstations and mainframes connected by a local area network (LAN). A DFS is implemented as part of the operating system of each of the connected computers. Their paper establishes a viewpoint that emphasizes the dispersed structure and decentralization of both data and control in the design of such systems. It defines the concepts of transparency, fault tolerance, and scalability and discusses them in the context of DFSs. The paper claims that the principle of distributed operation is fundamental for a fault-tolerant and scalable DFS design. It also presents alternatives for the semantics of sharing and methods for providing access to remote files. A survey of contemporary UNIX-based systems, namely UNIX United, Locus, Sprite, Sun's Network File System, and ITC's Andrew, illustrates the concepts and demonstrates various implementations and design alternatives.
Based on the assessment of these systems, the paper makes the point that a departure from the approach of extending centralized file systems over a communication network is necessary to accomplish sound distributed file system design.

Khaled A. S. Abdel-Ghaffar presented a coding-theoretic analysis of the disk allocation problem, showing the equivalence of the problem of strictly optimal disk allocation and the class of MDS codes. One main open problem in this area is the development of tight necessary and sufficient conditions for the existence of optimal disk allocations [8]. These results formalize the intuitive ideas developed by Faloutsos and Metaxas [6], as well as extend and generalize several other previous results, especially those presented by Sung [18]. Using coding theory, the minimum number of disks required was determined for binary Cartesian product files that have up to 16 attributes, assuming that the number of disks is a power of 2.

Sunil Prabhakar et al. presented a new scheme which provides good declustering for similarity searching. In particular, it does global declustering as opposed to local declustering, exploits the availability of extra disks, and does not limit the partitioning of the data space. The technique is based upon cyclic declustering schemes, which were developed for range and partial match queries. They establish, in general, that cyclic declustering techniques outperform previously proposed techniques. The problem of efficient similarity searching is becoming important for databases as non-textual information is stored; the problem reduces to one of finding nearest neighbors in high-dimensional spaces. In the paper, a new disk allocation method for declustering high-dimensional data to optimize nearest-neighbor queries is developed. The new scheme, called cyclic allocation, is simple to implement and is a general allocation method in that it imposes no restrictions on the partitioning of the data space.
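The cyclic allocation idea can be sketched as a modular sum of grid-cell coordinates weighted by per-dimension skip values. The following is an illustrative sketch; the skip values shown are chosen for demonstration, not the paper's optimized choices:

```python
def cyclic_disk(cell, skips, m):
    """Cyclic declustering sketch: map grid cell (x1, ..., xd) to a disk
    as (x1*s1 + ... + xd*sd) mod m, for skip values (s1, ..., sd).
    Illustrative only; skip selection is the tuned part of the scheme."""
    return sum(x * s for x, s in zip(cell, skips)) % m
```

With m = 5 disks and skips (1, 2), a row of adjacent cells, which is typical of the candidate set of a nearest-neighbor query, spreads across all five disks, so the candidates can be fetched in parallel.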
Furthermore, it exploits the availability of any number of disks to improve performance. Finally, by varying the skip values, the method can be adapted to yield allocations that are optimized for various criteria. The superior performance of the cyclic approach was demonstrated against existing schemes, both those originally designed for range queries (FX, DM and HCAM) and those designed specifically for nearest neighbors (NOD). The FX and DM schemes are found to be inappropriate for nearest-neighbor queries. HCAM performs reasonably well for odd numbers of disks, but extremely poorly for even numbers. NOD was found not to achieve as much parallelism as Cyclic in most cases, except when retrieving only direct neighbors with a small number of disks. NOD also has the potential to give better performance for some dimensions when the number of disks is close to that required to achieve near-optimality. On the other hand, NOD is restricted to 2-way partitioning of each dimension, and its cost remains the same even when more disks beyond those required for near-optimal declustering are available. This results in a saturation of the gains produced by NOD beyond this point. In contrast, the Cyclic approach is not restricted to 2-way partitioning and makes use of all available disks. In fact, its cost tracks the lower bound and reduces as the number of disks increases. Overall, the Cyclic scheme gives the best performance for nearest-neighbor queries more consistently than any other scheme. Given the success of the cyclic schemes for two-dimensional range queries [PAAE98], and their flexibility for nearest-neighbor queries, they are expected to give good performance for systems that require both types of queries.

Author | Methodologies | Results

B. J. Austin | Allocation map; dynamic allocation algorithm | 80 page tables, compared with 540 for the previous allocation algorithm

D. E. Gold et al. | Permutation algorithms: arbitrary permutation, standard permutation, permutation assignment | Rotational latency may be masked using a small amount of buffer memory

Howard Lee Morgan | Multiple knapsack algorithm | Minimizes the expected number of mount requests which must be satisfied to process a set of jobs

Kenneth K. Shen et al. | Weighted buddy method | For a uniform request distribution, the buddy system has less total memory fragmentation than the weighted buddy algorithm

James A. Hinds | Relative block storage scheme |

Warren Burton | Generalization of the buddy system |

James L. Peterson | Fibonacci buddy system | Total fragmentation varies because of internal fragmentation characteristics

H. C. Du et al. | Cartesian product files | Effective design of optimal file systems for partial match queries

Philip D. L. Koch | Variant of the binary buddy system | Reduces fragmentation

Khaled A. S. Abdel-Ghaffar | Coding-theoretic analysis of the disk allocation problem | Disk response time improved

Sunil Prabhakar et al. | Cyclic declustering | Cyclic scheme gives the best performance for nearest-neighbor queries more consistently than any other scheme

Table 1. Research work on disk allocation algorithms

*References [1]-[13] are on disk allocation methods; the rest of the papers are on file systems.

REFERENCES

1. Gold, D. E., & Kuck, D. J. (1974). A model for masking rotational latency by dynamic disk allocation. Communications of the ACM, 17, 278-288. doi:10.1145/360980.361006
2. Morgan, H. L. (1974). Optimal space allocation on disk storage devices. Communications of the ACM, 17(3), 139-142. doi:10.1145/360860.360867
3. Shen, K. K., & Peterson, J. L. (1974). A weighted buddy method for dynamic storage allocation. Communications of the ACM, 17, 558-562. doi:10.1145/355620.361164
4. Burton, W. (1976). A buddy system variation for disk storage allocation. Communications of the ACM, 19(7), 416-417. doi:10.1145/360248.360259
5. Peterson, J. L., & Norman, T. A. (1977). Buddy systems. Communications of the ACM, 20, 421-431. doi:10.1145/359605.359626
6. Du, H. C., & Sobolewski, J. S. (1982). Disk allocation for Cartesian product files on multiple-disk systems. ACM Transactions on Database Systems, 7(1), 82-101. doi:10.1145/319682.319698
7. Hecht, M. S., & Gabbe, J. D. (1983). Shadowed management of free disk pages with a linked list. ACM Transactions on Database Systems, 8(4), 503-514. doi:10.1145/319996.320002
8. Walker, B., Popek, G., English, R., Kline, C., & Thiel, G. (1983). The LOCUS distributed operating system. University of California at Los Angeles, 49-70.
9. Koch, P. D. L. (1987). Disk file allocation based on the buddy system. ACM Transactions on Computer Systems, 5(4), 352-370. doi:10.1145/29868.29871
10. Levy, E., & Silberschatz, A. (1990). Distributed file systems: Concepts and examples. ACM Computing Surveys, 22(4), 321-374.
11. Abdel-Ghaffar, K. A. S., & El Abbadi, A. (1993). Optimal disk allocation for partial match queries. ACM Transactions on Database Systems, 18(1), 132-156. doi:10.1145/151284.151288
12. Krieger, O., & Stumm, M. (1997). HFS: A performance-oriented flexible file system based on building-block compositions. ACM Transactions on Computer Systems, 15(3), 286-321. doi:10.1145/263326.263356
13. Prabhakar, S., Agrawal, D., & El Abbadi, A. (1998). Efficient disk allocation for fast similarity searching. University of California, 78-87.
14. Hess, C. K., & Campbell, R. H. (2003). An application of a context-aware file system, 339-352. doi:10.1007/s00779-003-0250-y
15. Ghemawat, S., Gobioff, H., & Leung, S.-T. (2003). The Google file system. ACM SIGOPS Operating Systems Review, 37, 29-43. doi:10.1145/1165389.945450
16. Gal, E., & Toledo, S. (2005). Algorithms and data structures for flash memories. ACM Computing Surveys, 37(2), 138-163. doi:10.1145/1089733.1089735
17. Sivathanu, M., Prabhakaran, V., Arpaci-Dusseau, A. C., & Arpaci-Dusseau, R. H. (2005). Improving storage system availability with D-GRAID. ACM Transactions on Storage, 1(2), 133-170. doi:10.1145/1063786.1063787
18. Kang, S., & Reddy, A. L. N. (2006). An approach to virtual allocation in storage systems. ACM Transactions on Storage, 2(4), 371-399. doi:10.1145/1210596.1210597
19. Wang, A.-I. A., Kuenning, G., Reiher, P., & Popek, G. (2006). The Conquest file system. ACM Transactions on Storage, 2(3), 309-348. doi:10.1145/1168910.1168914
20. Agrawal, N., Bolosky, W. J., Douceur, J. R., & Lorch, J. R. (2007). A five-year study of file-system metadata. ACM Transactions on Storage, 3(3).
21. Cipar, J., Corner, M. D., & Berger, E. D. (2007). Contributing storage using the transparent file system. ACM Transactions on Storage, 3(3), article 12. doi:10.1145/1288783.1288787
22. Batsakis, A., Burns, R., Kanevsky, A., Lentini, J., & Talpey, T. (2009). CA-NFS. ACM Transactions on Storage, 5(4), 1-24. doi:10.1145/1629080.1629085
23. Thomasian, A., & Blaum, M. (2009). Higher reliability redundant disk arrays. ACM Transactions on Storage, 5(3), 1-59. doi:10.1145/1629075.1629076
24. Ryu, S., Lee, C., Yoo, S., & Seo, S. (2010). Flash-aware cluster allocation method based on filename extension for FAT file system. Proceedings of the 2010 ACM Symposium on Applied Computing (SAC '10), 502. doi:10.1145/1774088.1774192
25. Jung, J., Won, Y., Kim, E., Shin, H., & Jeon, B. (2010). FRASH. ACM Transactions on Storage, 6(1), 1-25. doi:10.1145/1714454.1714457
26. Shin, D. I., Yu, Y. J., Kim, H. S., Eom, H., & Yeom, H. Y. (2011). Request bridging and interleaving. ACM Transactions on Storage, 7(2), 1-31. doi:10.1145/1970348.1970349
27. Wu, X., & Reddy, A. L. N. (2011). SCMFS: A file system for storage class memory. 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 1-11. doi:10.1145/2063384.2063436
28. Hsieh, J.-W., Wu, C.-H., & Chiu, G.-M. (2012). MFTL. ACM Transactions on Storage, 8(2), 1-29. doi:10.1145/2180905.2180908
29. Paulo, J., & Pereira, J. (2014). A survey and classification of storage deduplication systems. ACM Computing Surveys, 47(1), 1-30. doi:10.1145/2611778
