Platform as a Service Standard for Hadoop Environment

Presented By: Abhay Nitin Pai



Content

Why a PaaS Standard for Hadoop?

Literature Survey

My Proposal

Conclusion

References


Why a PaaS Standard for Hadoop?

• Enterprise-level data is growing rapidly, especially data generated by cloud applications

• Storing, managing, and mining enterprise data requires tools such as Hadoop

• Setting up Hadoop can be a tedious task

• Currently, AWS provides a Hadoop environment as a service, known as Amazon Elastic MapReduce (EMR)

• Other IaaS providers have yet to offer a Hadoop environment as a service to their customers

• This makes it the right time to define a standard for deploying Hadoop environments


Literature Survey

• Cloud Storage vs. Traditional Storage [2]

• Cloud storage supports high-performance computation, transaction-processing applications, and multiple types of network storage services

• Cloud storage can provide high security, high reliability, and high efficiency, and it is suited to handling large numbers of users and complex business network environments

• Cloud storage not only provides traditional file access methods but also supports massive data management, and it provides public service support functions that facilitate the management and maintenance of cloud data storage systems

(Figures from [2])

(Figure from [3])
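Reference [3] introduces the MapReduce programming model discussed on the following slides. As a minimal single-process sketch (not Hadoop's API; the function names are hypothetical), the classic word-count example below shows the three phases: map emits intermediate key/value pairs, the shuffle groups them by key, and reduce combines each group. In a real Hadoop cluster the framework runs map and reduce tasks on many nodes and performs the shuffle itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # Each split is mapped independently; on a cluster these would run on different nodes.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The fox"]))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```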


• MapReduce was developed by Google and implemented by Yahoo! Inc. in Hadoop [1]

• Yahoo! built its own Hadoop environment with 3,500 nodes and 25,000 VMs, on which it ran MapReduce jobs over 25 PB of data [1]

(1 PB = 1,000,000,000,000,000 bytes, so 25 PB is 25,000,000,000,000,000 bytes!)

• Well-known virtualization tools include the Xen hypervisor, VMware vSphere, KVM, and Microsoft Hyper-V [1]

• Performance depends on the number of VMs rather than on the number of physical servers, because of the I/O controller in the VM hypervisor [1]
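As a quick check of the numbers above, the sketch below converts 25 PB to bytes and divides it across the reported 3,500 nodes and 25,000 VMs. It assumes decimal (SI) units and a perfectly even spread of data, and it ignores HDFS replication (which stores each block three times by default), so the per-node and per-VM figures are idealized illustrations, not measurements from [1]. Under these assumptions each VM holds exactly 1 TB and each node about 7.14 TB.

```python
# Decimal (SI) units: 1 PB = 10**15 bytes, 1 TB = 10**12 bytes.
PB = 10**15
TB = 10**12

# Figures quoted on the slide from [1].
DATA_PB = 25
NODES = 3_500
VMS = 25_000


def even_share_bytes(total_bytes: int, workers: int) -> int:
    """Bytes per worker if data were spread perfectly evenly (an idealizing assumption)."""
    return total_bytes // workers


total_bytes = DATA_PB * PB
per_vm = even_share_bytes(total_bytes, VMS)
per_node = even_share_bytes(total_bytes, NODES)

if __name__ == "__main__":
    print(f"total:    {total_bytes:,} bytes")
    print(f"per VM:   {per_vm / TB:.2f} TB")
    print(f"per node: {per_node / TB:.2f} TB")
```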


• The performance of I/O-intensive jobs is more sensitive to virtualization overhead than that of CPU-intensive jobs [1]

• For I/O-intensive jobs, the best practice is to increase the number of VMs [1]

• KVM shows about 7% write degradation and 0% read degradation [1]

• Xen shows about 15% degradation in both read and write [1]

• VMware reported a performance improvement of unknown size [1]

• The point is…
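To make the percentages above concrete, the sketch below applies them to a hypothetical bare-metal disk throughput of 100 MB/s. It interprets "degradation" as a proportional drop in throughput, which is an assumption about how [1] measured it, and it omits VMware because no figure was given. The resulting values (KVM: 100 MB/s read, 93 MB/s write; Xen: 85 MB/s for both) are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Degradation:
    """Approximate I/O slowdown versus bare metal, as a fraction (0.07 means 7%)."""
    read: float
    write: float


# Approximate figures reported in [1]; VMware is omitted because no number was given.
HYPERVISORS = {
    "KVM": Degradation(read=0.00, write=0.07),
    "Xen": Degradation(read=0.15, write=0.15),
}


def effective_throughput(native_mb_s: float, degradation: float) -> float:
    """Throughput remaining after a fractional slowdown is applied to a native baseline."""
    return native_mb_s * (1.0 - degradation)


if __name__ == "__main__":
    native = 100.0  # hypothetical bare-metal throughput in MB/s
    for name, d in HYPERVISORS.items():
        read = effective_throughput(native, d.read)
        write = effective_throughput(native, d.write)
        print(f"{name}: read {read:.1f} MB/s, write {write:.1f} MB/s")
```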


My Proposal

• Server-side daemon process

• Client-side common architecture

• Cloud controller

• XML SOAP-like request and response, with CLI and API features

• VM templates

• Scripts for VMs

• Job store

• Job templates

• Benchmark reports

Architectural Components
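The slides list the proposal's components without defining a message format. The sketch below shows one hypothetical way a client could send an XML, SOAP-like job request built from a job template and a VM template, and how a server-side daemon could validate it, record it in a job store, and reply in XML. All element names (JobRequest, JobTemplate, VMTemplate, Param, JobResponse, Status, JobId), the template names, and the in-memory JOB_TEMPLATES and JOB_STORE are assumptions for illustration, not part of the original proposal or of the SOAP specification. It uses only Python's standard xml.etree.ElementTree module.

```python
import uuid
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the proposal's "Job Templates" and "Job Store".
JOB_TEMPLATES: dict[str, tuple[str, ...]] = {
    "wordcount": ("input", "output"),  # template name -> required parameters
}
JOB_STORE: dict[str, dict[str, str]] = {}


def build_request(job_template: str, vm_template: str, params: dict[str, str]) -> bytes:
    """Client side: encode a job submission as an XML message."""
    root = ET.Element("JobRequest", action="submit")
    ET.SubElement(root, "JobTemplate").text = job_template
    ET.SubElement(root, "VMTemplate").text = vm_template
    params_el = ET.SubElement(root, "Parameters")
    for name, value in params.items():
        ET.SubElement(params_el, "Param", name=name).text = value
    return ET.tostring(root, encoding="utf-8")


def handle_request(message: bytes) -> bytes:
    """Server-side daemon: validate the request, store the job, and reply in XML."""
    request = ET.fromstring(message)
    job_template = request.findtext("JobTemplate", default="")
    params = {p.get("name", ""): p.text or "" for p in request.iter("Param")}

    response = ET.Element("JobResponse")
    status = ET.SubElement(response, "Status")
    if job_template not in JOB_TEMPLATES:
        status.text = "error"
        ET.SubElement(response, "Message").text = f"unknown job template: {job_template}"
        return ET.tostring(response, encoding="utf-8")

    missing = [name for name in JOB_TEMPLATES[job_template] if name not in params]
    if missing:
        status.text = "error"
        ET.SubElement(response, "Message").text = "missing parameters: " + ", ".join(missing)
        return ET.tostring(response, encoding="utf-8")

    # Record the accepted job so it can later be tracked or benchmarked.
    job_id = str(uuid.uuid4())
    JOB_STORE[job_id] = {
        "job_template": job_template,
        "vm_template": request.findtext("VMTemplate", default=""),
        **params,
    }
    status.text = "accepted"
    ET.SubElement(response, "JobId").text = job_id
    return ET.tostring(response, encoding="utf-8")


if __name__ == "__main__":
    request = build_request("wordcount", "small-vm", {"input": "/data/in", "output": "/data/out"})
    reply = ET.fromstring(handle_request(request))
    print(reply.findtext("Status"), reply.findtext("JobId"))
```

Because both CLI and API clients would produce the same XML message, the daemon needs only one parsing path regardless of how a job was submitted.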


Conclusion

• For now, Hadoop is just one more platform to be offered as a service

• In the future, any kind of application could be offered as a PaaS

• This PaaS architecture provides a general standard for developing PaaS clusters

• For a company or an individual, setting up a platform on which to deploy an application is a tedious job

• Testing every PaaS service offered by the various cloud providers is not feasible

• Switching to a different PaaS would require porting the entire application

• A common architecture would therefore fulfill this need
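The portability argument above can be illustrated with a provider-neutral interface: application code is written once against the interface, and each cloud supplies its own adapter, so changing providers means swapping the adapter rather than porting the application. The sketch below is hypothetical throughout (HadoopPaaSProvider, FakeProvider, the method names, and the template names are illustrative) and does not call any real cloud API.

```python
from abc import ABC, abstractmethod


class HadoopPaaSProvider(ABC):
    """Hypothetical provider-neutral interface; each cloud would implement an adapter."""

    @abstractmethod
    def create_cluster(self, vm_template: str, nodes: int) -> str:
        """Provision a cluster from a VM template and return its identifier."""

    @abstractmethod
    def submit_job(self, cluster_id: str, job_template: str, params: dict[str, str]) -> str:
        """Submit a job built from a job template and return a job identifier."""


class FakeProvider(HadoopPaaSProvider):
    """In-memory stand-in for a real cloud adapter, used only to illustrate the idea."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.jobs: list[tuple[str, str]] = []

    def create_cluster(self, vm_template: str, nodes: int) -> str:
        return f"{self.name}-{vm_template}-{nodes}"

    def submit_job(self, cluster_id: str, job_template: str, params: dict[str, str]) -> str:
        self.jobs.append((cluster_id, job_template))
        return f"{cluster_id}/job-{len(self.jobs)}"


def run_wordcount(provider: HadoopPaaSProvider) -> str:
    """Application code written once against the interface, not against a specific cloud."""
    cluster = provider.create_cluster("small-vm", nodes=4)
    return provider.submit_job(cluster, "wordcount", {"input": "/data/in", "output": "/data/out"})


if __name__ == "__main__":
    # Switching providers changes only which adapter is passed in.
    for provider in (FakeProvider("cloud-a"), FakeProvider("cloud-b")):
        print(run_wordcount(provider))
```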


References

1. “Design and Performance Evaluation for Hadoop Clusters on Virtualized Environments,” Masakuni Ishii, Jungkyu Han and Hiroyuki Makino, ICOIN 2013, IEEE 978-1-4673-5742-5

2. “Research on Hadoop Based Enterprise File Cloud Storage System,” Da-Wei Zhan, Fu-Quan Sun, Xu Cheng and Chao Liu

3. “MapReduce: Simplified Data Processing on Large Clusters,” Jeffrey Dean and Sanjay Ghemawat, Google Inc.

4. https://developer.yahoo.com/hadoop/tutorial/


Some Interesting Facts
