Clustering Technology
Clustering Schematic
Cluster Components
• Cluster hardware (processor, main memory, hard disk, …)
• Cluster network (Fast Ethernet, Gigabit Ethernet, Myrinet, …)
• Cluster software (operating system, programming environment, …)
Cluster Operating System Characteristics
• Manageability: An absolute necessity is remote and intuitive system administration; this is often associated with a Single System Image (SSI) which can be realized on different levels, ranging from a high-level set of special scripts down to real state-sharing on the OS level.
• Stability: The most important characteristics are robustness against crashing processes, failure recovery by dynamic reconfiguration, and usability under heavy load.
• Performance: The performance-critical parts of the OS, such as memory management, the process and thread scheduler, file I/O and communication protocols, should work as efficiently as possible.
• Extensibility: The OS should allow the easy integration of cluster-specific extensions, which will most likely be related to inter-node cooperation. A good example of this is the MOSIX system, which is based on Linux.
• Scalability: The scalability of a cluster is mainly influenced by the number of nodes it can accommodate, which in turn is dominated by the performance characteristics of the interconnect.
• Support: Many intelligent and technically superior approaches in computing have failed due to a lack of support in its various aspects: which tools, hardware drivers and middleware environments are available.
• Heterogeneity: Clusters provide a dynamic and evolving environment in that they can be extended or updated with standard hardware as the user needs to or can afford. Therefore, a cluster environment does not necessarily consist of homogeneous hardware.
Cluster Solution
• The cluster nodes are typical PCs (this choice reduces the cost-per-performance ratio).
• Fast Ethernet was used as the cluster interconnect (17 nodes connected by a Fast Ethernet infrastructure).
• Linux was chosen as the cluster OS because it can be adapted to our needs as much as possible (we can recompile and tune the kernel to meet them).
• VMware software is used to virtualize our computing resources.
• The Message Passing Interface (MPI) was selected as the parallel programming environment; a minimal example follows below.
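A minimal MPI program in C (a generic sketch, not part of the thesis code) illustrates the programming model and doubles as a smoke test that every node responds:

```c
/* Minimal MPI smoke test (generic sketch, not the thesis code):
 * every node reports its rank in the communicator. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this node's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of nodes */
    printf("node %d of %d is up\n", rank, size);
    MPI_Finalize();
    return 0;
}
```

Such a program is compiled with mpicc and launched across the cluster with, for example, mpirun -np 17.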
Configuring the Cluster
• Configuring the cluster nodes (network configuration, package installation, …).
• Optimizing and securing the Linux OS to extract the maximum utilization from the cluster resources.
• Cluster administration (Samba service, ssh, rlogin, rcp, administration scripts, …).
Algorithm Identification
Integer Factorization
Sieving
Trial Division
QS Algorithm
MPQS Algorithm
SIQS Algorithm
Algorithm Complexity Improvement
QS → MPQS → SIQS → NFS
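For context, the standard heuristic running times behind this progression (well-known results, not taken from the slides) can be written in L-notation:

\[
T_{\mathrm{QS}}(n) = \exp\bigl((1+o(1))\sqrt{\ln n\,\ln\ln n}\bigr) = L_n[\tfrac{1}{2},\,1]
\]
\[
T_{\mathrm{NFS}}(n) = \exp\bigl(((64/9)^{1/3}+o(1))(\ln n)^{1/3}(\ln\ln n)^{2/3}\bigr) = L_n[\tfrac{1}{3},\,(64/9)^{1/3}]
\]

MPQS and SIQS share the QS exponent \(L_n[\tfrac{1}{2},1]\) but improve the constant factors substantially; NFS improves the exponent itself from 1/2 to 1/3.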
Optimizing Serial Implementation
• Algorithm level optimizations (the most important step in optimizing serial code is to reduce the complexity of the algorithm as far as possible).
• Code level optimizations (in this phase we use low-level coding techniques such as loop unrolling, function inlining and gcc builtins, described below).
• Compiler level optimizations.
Algorithm Optimization
• In computation-intensive software programs, we will often find that 99% of the CPU time is used in the innermost loop.
• Identifying the most critical part of your software (with a profiler) is therefore necessary if you want to improve the speed of computation.
• Study the algorithm used in the critical part of your code and see if it can be improved.
Innermost Loop (Conventional Sieving)
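As a generic illustration of this loop (a sketch with assumed names, not the original listing), QS-style sieving adds a log approximation at every p-th position of the sieve array for each factor-base prime p:

```c
/* Conventional sieving inner loop (illustrative sketch; the array and
 * parameter names are assumptions, not the original code). */
void sieve_interval(unsigned char *sieve, unsigned int M,
                    const unsigned int *primes, const unsigned char *logs,
                    const unsigned int *roots, unsigned int nprimes)
{
    for (unsigned int j = 0; j < nprimes; j++) {
        unsigned int p = primes[j];
        unsigned char lg = logs[j];
        /* strided pass: touches memory every p bytes, so for large p
         * nearly every access misses the cache */
        for (unsigned int i = roots[j]; i < M; i += p)
            sieve[i] += lg;
    }
}
```

The strided access pattern is what makes this loop sensitive to the memory access times discussed under "Pentium4 Memory Access Times" below.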
Pentium4 Memory Access Times
An Optimized Sieving Approach (1)
An Optimized Sieving Approach (2)
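One common optimization of this kind (an assumption about the approach, based on the preceding memory-access discussion, not the slides' own code) is to process the sieve interval in cache-sized blocks so the working set stays resident:

```c
/* Cache-blocked sieving sketch (an assumed reconstruction, not the
 * slides' code): the interval [0, M) is processed in BLOCK-sized
 * chunks that fit in the cache; next[] carries each prime's resume
 * position from one block to the next. */
#define BLOCK 65536u   /* illustrative, roughly L2-cache sized */

void sieve_blocked(unsigned char *sieve, unsigned int M,
                   const unsigned int *primes, const unsigned char *logs,
                   unsigned int *next, unsigned int nprimes)
{
    for (unsigned int base = 0; base < M; base += BLOCK) {
        unsigned int end = (base + BLOCK < M) ? base + BLOCK : M;
        for (unsigned int j = 0; j < nprimes; j++) {
            unsigned int p = primes[j];
            unsigned int i = next[j];        /* first hit inside this block */
            for (; i < end; i += p)
                sieve[i] += logs[j];
            next[j] = i;                     /* resume point for next block */
        }
    }
}
```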
Code level optimization techniques
• Loop unrolling (unrolling amortizes the branch overhead, since it eliminates branches and some of the code that manages induction variables; it also lets you schedule (or pipeline) the loop aggressively to hide latencies).
• Function inlining (we can instruct the compiler to insert the body of a function into the code of its callers, at the point where the call would otherwise be made; inlining removes the function-call overhead and exposes more opportunities for optimization to the compiler).
• gcc inline assembly (assembly routines written as inline functions: handy, fast and very useful for the hottest code paths).
• Builtin gcc functions such as __builtin_prefetch (this function is used to minimize cache-miss latency by moving data into a cache before it is accessed; used in the sketch below).
• Using the "unsigned int" type only (use 32-bit integers instead of integers with smaller sizes (16-bit or 8-bit) to reduce the machine cycles needed).
• Division-free arithmetic (replace divisions with multiplications by precomputed reciprocals; see the sketch below).
• Releasing allocated memory blocks when they are no longer needed.
• and …
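Two of these techniques in a self-contained sketch (illustrative, not the thesis code; the constant 0xAAAAAAAB is the standard fixed-point reciprocal of 3, approximately 2^33/3):

```c
/* Illustrative sketch of __builtin_prefetch and division-free
 * arithmetic (not the thesis code). */
#include <stdint.h>

/* x / 3 without a division: multiply by the precomputed fixed-point
 * reciprocal of 3 and shift (exact for all uint32_t x). */
static inline uint32_t div3(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}

unsigned long sum_of_thirds(const uint32_t *data, unsigned int n)
{
    unsigned long s = 0;
    for (unsigned int i = 0; i < n; i++) {
        /* pull data 64 elements ahead into the cache before it is
         * needed; args: address, 0 = read, 0 = low temporal locality */
        if (i + 64 < n)
            __builtin_prefetch(&data[i + 64], 0, 0);
        s += div3(data[i]);
    }
    return s;
}
```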
Loop unrolling (code level opt.)
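A generic four-way unrolled loop (illustrative, not the slide's listing) shows the shape of the transformation: one branch now covers four additions, and the independent statements give the scheduler more freedom.

```c
/* Four-way loop unrolling sketch (illustrative, not the slide's code). */
unsigned int sum_unrolled(const unsigned int *a, unsigned int n)
{
    unsigned int s = 0, i = 0;
    for (; i + 4 <= n; i += 4) {    /* unrolled body: 1 branch per 4 adds */
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)              /* remainder iterations */
        s += a[i];
    return s;
}
```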
gcc compiler optimizations
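Typical gcc options in this setting (standard gcc flags, listed as illustration rather than as the slide's exact choices) include -O2 or -O3, -funroll-loops, -fomit-frame-pointer, and an architecture selection such as -march=pentium4.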
Parallel Algorithm Design
Parallel Algorithm Design Methodology
• Partitioning (Domain decomposition or Functional Decomposition)
• Communication
• Agglomeration
• Mapping
Methodical Design (1)
• Partitioning: The computation that is to be performed and the data operated on by this computation are decomposed into small tasks. Practical issues such as the number of processors in the target computer are ignored, and attention is focused on recognizing opportunities for parallel execution.
• Communication: The communication required to coordinate task execution is determined, and appropriate communication structures and algorithms are defined.
• Agglomeration: The task and communication structures defined in the first two stages of a design are evaluated with respect to performance requirements and implementation costs. If necessary, tasks are combined into larger tasks to improve performance or to reduce development costs.
• Mapping: Each task is assigned to a processor in a manner that attempts to satisfy the competing goals of maximizing processor utilization and minimizing communication costs. Mapping can be specified statically or determined at runtime by load-balancing algorithms.
Methodical Design (2)
Load Balancing Mechanism
• For load balancing, a master/slave mechanism was used.
• The master node sends the initial data and assigns the jobs to the slave nodes, as in the skeleton below.
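A master/slave skeleton in MPI (a generic sketch, not the thesis code; the tags and job encoding are illustrative) hands out job ids on demand, which is the usual way to implement this mechanism:

```c
/* Master/slave load-balancing skeleton (generic sketch, not the
 * thesis code). Rank 0 hands out job ids on demand; slaves request
 * work until a stop message arrives. */
#include <mpi.h>

#define TAG_WORK 1
#define TAG_STOP 2

int main(int argc, char **argv)
{
    int rank, size, njobs = 100;          /* illustrative job count */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                      /* master */
        int next = 0, stopped = 0, result;
        MPI_Status st;
        while (stopped < size - 1) {
            /* a slave reports in (result of its last job, -1 at startup) */
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (next < njobs) {           /* more work: send the next job id */
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                next++;
            } else {                      /* no work left: stop this slave */
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                         MPI_COMM_WORLD);
                stopped++;
            }
        }
    } else {                              /* slave */
        int job, result = -1;
        MPI_Status st;
        for (;;) {
            MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
            MPI_Recv(&job, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP)
                break;
            result = job;                 /* placeholder for real sieving work */
        }
    }
    MPI_Finalize();
    return 0;
}
```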
Data Decomposition Algorithm
• The SPMD model is used (the SPMD program creates exactly one task per processor).
• We can sieve with multiple polynomials in SIQS.
• To generate these polynomials, we must first compute the 'a' factors.
• Sieving with separate 'a' values can be done independently on different processors.
• Thus we need to build the 'a' values on the several tasks without any coordination.
• Duplicated 'a' factors would lead to weak concurrency (the same work done twice); see the sketch below.
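One coordination-free way to keep the 'a' values disjoint (an illustrative sketch, not the thesis algorithm) is to have each task stride over the candidate indices by its MPI rank:

```c
/* Coordination-free decomposition sketch (illustrative, not the thesis
 * algorithm): task `rank` of `size` enumerates every size-th candidate
 * index. The index sets are disjoint by construction, so no two tasks
 * build the same 'a' value and no messages are needed. */
void enumerate_a_candidates(int rank, int size, unsigned int ncandidates)
{
    for (unsigned int k = (unsigned int)rank; k < ncandidates;
         k += (unsigned int)size) {
        /* hypothetical step: select the k-th combination of factor-base
         * primes as the factors of this task's 'a' value */
        (void)k;
    }
}
```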
Data Decomposition Algorithm (1) (Initialization Data)
Data Decomposition Algorithm (2) (Determining the number and size of the 'a' values' factors)
Data Decomposition Algorithm (3) (Computing the factors of the 'a' values)
Master Node Algorithm
Slave Node Algorithm
Double Large Prime Variation Effects
Master Node Algorithm (1) (Improved version)
Master Node Algorithm (2) (Improved version)
Slave Node Algorithm (Improved version)
Cluster Benchmarks
Performance Evaluation
• Speedup: The speedup of a p-node computation is \(S_p = T_1 / T_p\), where \(T_1\) is the serial execution time and \(T_p\) the execution time on p nodes. Amdahl's law gives the ideal speedup \(S_p\) in terms of the serial fraction \(f\) of the code:
\[ S_p = \frac{1}{f + (1 - f)/p} \]
• Efficiency: The efficiency \(E_p\) of a p-node computation with speedup \(S_p\) is given by:
\[ E_p = \frac{S_p}{p} \]
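For instance (an illustrative calculation, not one of the measured results): with \(p = 16\) nodes and a serial fraction \(f = 0.05\), Amdahl's law gives \(S_p = 1/(0.05 + 0.95/16) \approx 9.1\) and \(E_p \approx 9.1/16 \approx 0.57\), so even a 5% serial part roughly halves the achievable efficiency.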
Total Execution Time
Sieving Execution Time
Total Speedup
Sieving Speedup
Total Efficiency
Sieving Efficiency
Sources of inefficiency
• Static load balancing (we overestimate our needs and waste cluster resources).
• Communication overhead (Ethernet protocol and TCP/IP stack operations).
• Load imbalance.
• Non-parallelized stages of the application, such as the linear algebra stage (Amdahl's law).
• Inefficient software and hardware technologies (MPI overhead and OS inefficiencies).
Conclusions and Future Work
• Distributed computing with cluster technology is still an open problem. The MOSIX project is one of the most successful solutions: it builds a single system image on modified Linux kernels to simulate one single system for running processes.
• But MOSIX and other similar solutions do not work optimally for every desired application.
• In fact, supplying an SSI at the operating-system level, while a definite boon in terms of manageability, drastically inhibits scalability.
• The availability of the source code, together with the possibility to extend (and thus modify) the operating system on this basis, has a negative influence on the stability and manageability of the system: over time, many variants of the operating system will develop, and the different extensions may conflict when there is no single supplier.
Conclusions and Future Work
• In this thesis our goal was to present a global approach to running computational applications on distributed systems optimally.
• Algorithm optimization is the major step in the presented approach. In computation-intensive software, the first step is to identify the most critical part of the program and optimize its algorithm as much as possible (this is the most effective step).
• In the second phase we implement the best optimized algorithm serially, to make maximum use of local resources. In fact, distributed computing only pays off when the local computation cost is larger than the cost of remote execution.
• In the last stage, the serial algorithm is parallelized optimally: focus on the most computation-intensive part of the serial algorithm and try to parallelize this part efficiently.
Conclusions and Future Work
• The performance benchmarks show that our strategy works well, and therefore this method can be applied to other applications in a similar way.
• Using the NFS algorithm for RSA key cracking.
• Parallelizing the linear algebra stage, to factor very large numbers (greater than 120 digits).