Big Data in the Cloud: An example using Ansible, R, RHadoop, and AppScale to deploy a big data...
eucalyptus-systems-inc
Big Data in the Clouds
An example using Ansible, R, RHadoop, and AppScale to deploy a big data environment on AWS/Eucalyptus
Big Data Environment
● Why? Why R? Why AppScale? Why AWS/Eucalyptus?
● Environments needing to process “big data” are in high demand
● Flexibility in deploying big data environments - AWS has Elastic MapReduce; Eucalyptus has ?
Goals
● Deploy an open source big data environment on IaaS
● The same deployment method can be used on both public and private IaaS (hybrid?)
Ansible
● http://www.ansibleworks.com/
● Open source configuration management using SSH
● Flexible, powerful, efficient, secure
● http://ansible.cc/docs/
R and RHadoop
● http://www.r-project.org/
    ● Open source statistics software; very flexible and powerful
● http://www.revolutionanalytics.com/
    ● Provides enterprise analytics software using R
● https://github.com/RevolutionAnalytics/RHadoop/wiki
AppScale
● http://www.appscale.com
● PaaS that implements the Google App Engine APIs on public/private IaaS and virtual environments
● http://www.slideshare.net/shatteredNirvana/intro-to-app-engine-and-appscale
● Ships with Cloudera for back-end support of its Google App Engine MapReduce API implementation
AWS EC2/Eucalyptus
● http://aws.amazon.com
● Cloud API that has become a de facto standard
● http://www.eucalyptus.com
● Closely follows the AWS APIs for EC2, S3, and IAM (soon ELB, CloudWatch, and AutoScaling)
AWS/Eucalyptus
● Account/User Credentials
    ● EC2_ACCESS_KEY
    ● EC2_SECRET_KEY
    ● EC2_URL
● IAM policy granting the EC2 permissions needed to launch instances, create security groups, authorize ports, and manage images (bundle, upload, and register)
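The three credential variables listed on this slide have to be present in the environment before any of the tools can reach the cloud endpoint. A minimal Python sketch of a pre-flight check follows; only the variable names come from the slide, and the helper name `missing_credentials` is hypothetical.

```python
import os

# Variable names taken from the slide; everything else here is illustrative.
REQUIRED_VARS = ("EC2_ACCESS_KEY", "EC2_SECRET_KEY", "EC2_URL")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("All EC2/Eucalyptus credential variables are set.")
```

Running the check before launching anything makes a forgotten export show up as a clear message rather than as an authentication error later on.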
AppScale
● Pre-built AppScale images
    ● AWS - ami-4e472227
    ● Eucalyptus - AppScale image found at http://emis-catalog.s3.amazonaws.com/index.html
● appscale-tools - https://github.com/AppScale/appscale-tools
    ● appscale init cloud
    ● edit AppScalefile
    ● appscale up
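The three appscale-tools steps above run in a fixed order, with a manual edit in the middle. The Python sketch below picks the next command to run, assuming that `appscale init cloud` writes a file named `AppScalefile` into the working directory. The helper names `plan` and `run_next` are hypothetical, and the default is a dry run that only prints the command.

```python
import os
import shutil
import subprocess

# Commands as listed on the slide; editing the AppScalefile between them is manual.
INIT_CMD = ["appscale", "init", "cloud"]
UP_CMD = ["appscale", "up"]


def plan(appscalefile_exists):
    """Return the next command: init if no AppScalefile exists yet, else up."""
    return UP_CMD if appscalefile_exists else INIT_CMD


def run_next(workdir=".", dry_run=True):
    """Run the next step in workdir, or only print it when dry_run is set."""
    # Assumption: the configuration file is named AppScalefile and lives in workdir.
    cmd = plan(os.path.exists(os.path.join(workdir, "AppScalefile")))
    if dry_run or shutil.which("appscale") is None:
        print("would run: " + " ".join(cmd))
        return cmd
    subprocess.run(cmd, cwd=workdir, check=True)
    return cmd


if __name__ == "__main__":
    run_next()
```

Calling `run_next(dry_run=False)` once generates the configuration file. After the file has been edited by hand, calling it again starts the deployment.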
Ansible, R, RHadoop
● Use git to grab the Ansible playbook - https://github.com/hspencer77/ansible-r-appscale-playbook
● Playbook installs R and grabs rhdfs and rmr2 from RHadoop
    ● https://github.com/downloads/RevolutionAnalytics/RHadoop/rhdfs_1.0.5.tar.gz
    ● https://github.com/downloads/RevolutionAnalytics/RHadoop/rmr2_2.0.2.tar.gz
● Test the deployment using a wordcount program written in R - wordcount.R
● SSH into the head node and extract the wordcount.R file - tar zxf rmr2_2.0.2.tar.gz rmr2/tests/wordcount.R
● Execute it - Rscript rmr2/tests/wordcount.R
Test - Wordcount.R
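A word count is a common MapReduce smoke test: the map step emits a (word, 1) pair for every word, and the reduce step sums the pairs for each distinct word. The sketch below illustrates that logic in plain Python, run locally. It is not the contents of wordcount.R and does not use rmr2 or Hadoop; the function names and the tokenising rule are assumptions made for illustration.

```python
from collections import defaultdict
import re


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in a line."""
    return [(word, 1) for word in re.findall(r"[a-z']+", line.lower())]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def wordcount(lines):
    """Apply the map step to every line, then reduce all emitted pairs."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return reduce_counts(pairs)


if __name__ == "__main__":
    sample = ["big data in the cloud", "Big data in the clouds"]
    for word, count in sorted(wordcount(sample).items()):
        print(word, count)
```

On a real cluster the map and reduce functions run on different nodes and the framework groups the pairs by key between the two steps; here the grouping happens inside `reduce_counts`.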