Cloud Computing: What it is, DOs and DON'Ts
Svet Ivantchev, eFaber
Fourth Workshop on Advanced Computing Techniques in the Microworld,
April 2011
Sunday, 1 May 2011
Our plan for today
• What Is Cloud Computing?
• Enabling technologies
• Public vs Private Clouds
• Idea of MapReduce with two examples
Our plan for tomorrow
• Create an HPC cluster with:
• 184 GB RAM
• 13 TB local disk space and 800 GB persistent storage
• 64 cores @ 2.9 GHz, Intel Nehalem = 268 EC2 Compute Units (ECUs) (~268 2007-era 1.2 GHz Xeons)
• 10 Gbit/s (10 Gigabit Ethernet) network connection between them
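The totals are consistent with a cluster of eight identical nodes. As a sketch, assuming (the slide does not say so) eight Amazon EC2 Cluster Compute Quadruple Extra Large instances (`cc1.4xlarge`), each with 23 GB RAM, 33.5 ECUs, 8 Nehalem cores and 1690 GB of local instance storage, the arithmetic works out as follows:

```python
# Aggregate the resources of a homogeneous cluster.
# ASSUMPTION: the per-instance figures are those of the 2011 cc1.4xlarge
# instance type; the slide itself only gives the cluster totals.
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    ram_gb: float
    ecus: float
    cores: int
    local_disk_gb: int


def cluster_totals(itype: InstanceType, count: int) -> dict:
    """Multiply each per-instance resource by the number of instances."""
    return {
        "ram_gb": itype.ram_gb * count,
        "ecus": itype.ecus * count,
        "cores": itype.cores * count,
        "local_disk_gb": itype.local_disk_gb * count,
    }


CC1_4XLARGE = InstanceType(ram_gb=23, ecus=33.5, cores=8, local_disk_gb=1690)

if __name__ == "__main__":
    print(cluster_totals(CC1_4XLARGE, 8))
    # -> {'ram_gb': 184, 'ecus': 268.0, 'cores': 64, 'local_disk_gb': 13520}
```

13520 GB is the "13 TB" of local disk on the slide; the 800 GB of persistent storage would be separate network-attached volumes and is not modelled here.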
(Kind of) Evolution
• Grid Computing
• Utility Computing
• Cloud Computing
• Software as a Service (SaaS)
Grid Computing
Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that
involve a large number of files.
http://en.wikipedia.org/wiki/Grid_computing
Utility Computing
Utility Computing is the packaging of computing resources, such as computation, storage and services, as a metered service similar to a traditional public utility (such as electricity, water, natural gas, or telephone network).
http://en.wikipedia.org/wiki/Utility_computing
Cloud Computing
McKinsey & Co. Report
Cloud Computing
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable
computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service
provider interaction.
NIST
Cloud Computing
1. The illusion of infinite computing resources...
2. The elimination of an up-front commitment...
3. The ability to pay for use ... as needed.
UC Berkeley RAD Lab
So, what is it?
• Pay-per-use
• Resources are abstracted (virtualized)
• Upscale and downscale on demand
• Self-service interface (API included)
Enabling technologies
• Virtualisation
• Virtualised Storage
• Web Services
Virtualisation
• Xen
• KVM
• VMware
• more...
Abstracted Storage
• Distributed File Systems; examples:
• Amazon S3
• Rackspace Cloud Files
• HDFS
Stack
Software as a Service (SaaS)
Platform as a Service (PaaS)
Infrastructure as a Service (IaaS)
Cloud Enabler(s)
Hardware
Public Cloud Services
• Amazon EC2
• Rackspace
• 100s more ...
Amazon Web Services (AWS)
AWS EC2 Prices
• on demand instances
• reserved instances
• spot instances
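Reserved instances trade an up-front fee for a lower hourly rate, so whether they pay off depends on how many hours the instance actually runs. A minimal sketch of the break-even calculation; the prices used below are hypothetical placeholders, not actual AWS rates:

```python
# Break-even point between reserved and on-demand pricing.
# All prices used here are HYPOTHETICAL, for illustration only.


def break_even_hours(upfront: float, reserved_hourly: float,
                     on_demand_hourly: float) -> float:
    """Hours of use after which the reserved instance becomes cheaper."""
    saving_per_hour = on_demand_hourly - reserved_hourly
    if saving_per_hour <= 0:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    return upfront / saving_per_hour


def cheaper_option(hours: float, upfront: float, reserved_hourly: float,
                   on_demand_hourly: float) -> str:
    """Compare the total cost of both options for a given number of hours."""
    on_demand_cost = hours * on_demand_hourly
    reserved_cost = upfront + hours * reserved_hourly
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


if __name__ == "__main__":
    # Hypothetical: $350 up front and $0.125/h reserved vs $0.25/h on demand.
    print(break_even_hours(350.0, 0.125, 0.25))  # -> 2800.0
```

With these made-up prices, an instance used around the clock (8760 hours a year) favours the reserved option, while one used for a few hundred hours does not.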
AWS EC2 prices
Spot Instances
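In 2011, spot instances worked roughly like this: you set a maximum price (the bid); the instance keeps running while the spot price is at or below the bid, is terminated once the spot price rises above it, and each hour is charged at the spot price in effect, not at the bid. A simplified sketch of that rule over an hourly price series; the prices are hypothetical, and billing details such as partial hours are ignored:

```python
# Simplified spot-instance model: run hour by hour until the spot price
# exceeds the bid; each hour is billed at that hour's spot price.
# The price series used below is HYPOTHETICAL.
from typing import List, Tuple


def run_spot(prices: List[float], bid: float) -> Tuple[int, float]:
    """Return (hours run, total cost) for an instance started at hour 0."""
    hours = 0
    cost = 0.0
    for price in prices:
        if price > bid:
            break  # the instance is interrupted once the price exceeds the bid
        hours += 1
        cost += price  # billed at the spot price, not at the bid
    return hours, cost


if __name__ == "__main__":
    hourly_prices = [0.25, 0.25, 0.5, 0.75, 0.25]  # hypothetical $/hour
    print(run_spot(hourly_prices, bid=0.5))  # -> (3, 1.0)
```

A higher bid buys longer uninterrupted runs without raising the hourly rate actually paid, which is why spot instances suit interruptible batch work such as MapReduce jobs.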
Private
• Eucalyptus
• OpenNebula
• Nimbus
• OpenStack
• Hadoop & friends
Public or private? Better mixed.
MapReduce
• High level vs low level languages
• Example: MPI/PVM vs MapReduce
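MapReduce's “Hello world”, Unix-style
“en un lugar de la Mancha de cuyo nombre no quiero acordarme no ha mucho tiempo que vivía un hidalgo ...”
(“In a place in La Mancha, whose name I do not care to remember, not long ago there lived a hidalgo ...”)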
$ cat i.txt | tr ' ' '\n' | sort | uniq -c
   1 Mancha
   1 acordarme
   1 cuyo
   2 de
 ...
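The pipeline can be read as a tiny MapReduce job: `tr` acts as the mapper (one word per line), `sort` as the shuffle (bringing equal words together) and `uniq -c` as the reducer (counting each run). A minimal Python sketch of the same three steps; the function name is chosen for illustration:

```python
# Emulate: tr ' ' '\n' | sort | uniq -c
from itertools import groupby
from typing import List, Tuple

QUOTE = ("en un lugar de la Mancha de cuyo nombre no quiero acordarme "
         "no ha mucho tiempo que vivía un hidalgo")


def unix_word_count(text: str) -> List[Tuple[int, str]]:
    words = text.split(" ")        # tr ' ' '\n': one word per line
    words.sort()                   # sort: equal words become adjacent
    return [(len(list(run)), word)  # uniq -c: count each run of equal words
            for word, run in groupby(words)]


if __name__ == "__main__":
    for count, word in unix_word_count(QUOTE)[:4]:
        print(count, word)
```

Python compares strings by code point, so "Mancha" sorts before the lowercase words, as `sort` does in the C locale.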
Google Books
• 129 000 000 books have been published so far
• 15 000 000 books scanned (1700-2010)
• 5 000 000 classified and with metadata
Science, Vol. 331, no. 6014, pp. 176-182 (Jan 14, 2011)
http://ngrams.googlelabs.com/
MapReduce
$$\text{map}: (k_1, v_1) \rightarrow \text{list}(k_2, v_2)$$
$$\text{reduce}: (k_2, \text{list}(v_2)) \rightarrow \text{list}(v_2)$$
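The two signatures can be read as a recipe for a tiny in-memory driver: apply map to every input pair, group the intermediate pairs by key (the shuffle), then apply reduce to each group. A minimal single-process sketch; `run_mapreduce` is a hypothetical name, nothing of Hadoop's distribution or fault tolerance is modelled, and the example job (an inverted index, word → documents containing it) is our own choice rather than one from the slides:

```python
# In-memory MapReduce driver following
#   map:    (k1, v1)       -> list(k2, v2)
#   reduce: (k2, list(v2)) -> list(v2)
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2", bound=Hashable)
V2 = TypeVar("V2")


def run_mapreduce(
    inputs: Iterable[Tuple[K1, V1]],
    mapper: Callable[[K1, V1], Iterable[Tuple[K2, V2]]],
    reducer: Callable[[K2, List[V2]], Iterable[V2]],
) -> Dict[K2, List[V2]]:
    groups: Dict[K2, List[V2]] = defaultdict(list)
    for k1, v1 in inputs:                # map phase
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)        # shuffle: group values by key
    return {k2: list(reducer(k2, vs))    # reduce phase
            for k2, vs in groups.items()}


def index_map(doc_name: str, contents: str) -> List[Tuple[str, str]]:
    return [(word, doc_name) for word in contents.split()]


def index_reduce(word: str, doc_names: List[str]) -> List[str]:
    return sorted(set(doc_names))


if __name__ == "__main__":
    docs = [("d1", "en un lugar"), ("d2", "un hidalgo")]
    print(run_mapreduce(docs, index_map, index_reduce))
    # -> {'en': ['d1'], 'un': ['d1', 'd2'], 'lugar': ['d1'], 'hidalgo': ['d2']}
```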
MapReduce: Mapper
map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, 1);
“en un lugar de la Mancha de cuyo nombre no quiero acordarme no ha mucho tiempo que vivía un hidalgo”
“en”, 1  “un”, 1  “lugar”, 1  “de”, 1  “la”, 1  “Mancha”, 1  “de”, 1 ...
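A direct Python transcription of the mapper pseudocode, as a sketch: `EmitIntermediate` becomes a `yield`, and words are split on whitespace (an assumption, since the pseudocode leaves tokenization unspecified):

```python
# Word-count mapper: emit (word, 1) for every word in the document.
from typing import Iterator, Tuple

QUOTE = ("en un lugar de la Mancha de cuyo nombre no quiero acordarme "
         "no ha mucho tiempo que vivía un hidalgo")


def word_count_map(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    # doc_name (the key) is not used, exactly as in the pseudocode
    for word in contents.split():
        yield word, 1  # EmitIntermediate(w, 1)


if __name__ == "__main__":
    print(list(word_count_map("quijote", QUOTE))[:4])
    # -> [('en', 1), ('un', 1), ('lugar', 1), ('de', 1)]
```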
MapReduce: Reducer
reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  result = 0;
  for each v in values:
    result += v;
  Emit(result);
“en”, [1]  “un”, [1, 1]  “lugar”, [1]  “de”, [1, 1] ...
“en”, 1  “un”, 2  “lugar”, 1  “de”, 2 ...
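Between the mapper and the reducer sits the shuffle, which groups the intermediate pairs by key so that each reduce call sees one word and all of its counts. A sketch of that grouping followed by the reducer, fed with (word, 1) pairs for the same sentence (split on whitespace, as an assumption):

```python
# Shuffle (group intermediate pairs by key) and reduce (sum the counts).
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

QUOTE = ("en un lugar de la Mancha de cuyo nombre no quiero acordarme "
         "no ha mucho tiempo que vivía un hidalgo")


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def word_count_reduce(key: str, values: Iterable[int]) -> int:
    result = 0
    for v in values:  # result += v, as in the pseudocode
        result += v
    return result


if __name__ == "__main__":
    grouped = shuffle((w, 1) for w in QUOTE.split())
    print(grouped["en"], grouped["un"], grouped["de"])  # -> [1] [1, 1] [1, 1]
    print(word_count_reduce("un", grouped["un"]))        # -> 2
```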
Dean, J. and Ghemawat, S., “MapReduce: Simplified Data Processing on Large Clusters”, Comm. ACM, Vol. 51, no. 1, pp. 107-113 (2008)
Our input
$ ls -l donquijote_s?.txt
-rw-r--r--  1 svet  staff  1037413 23 abr 18:26 donquijote_s1.txt
-rw-r--r--  1 svet  staff  1099078 23 abr 18:22 donquijote_s2.txt
$ head -6 donquijote_s1.txt
El ingenioso hidalgo don Quijote de la Mancha
TASA
Yo, Juan Gallo de Andrada, escribano de Camara del Rey nuestro senor, de los que residen en su Consejo, certifico y doy fe que, habiendo visto por los senores del un libro
Python Mapper
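#!/usr/bin/env python3
# Hadoop streaming mapper: reads text on stdin and emits one
# "LongValueSum:<word>\t1" record per word. The "LongValueSum:" prefix is
# understood by Hadoop streaming's built-in "aggregate" reducer, which sums
# the values for each word.
import re
import sys

# A word starts with an ASCII letter, followed by letters or digits
# (accented characters such as "ñ" are not matched).
WORD = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")


def main():
    for line in sys.stdin:
        for word in WORD.findall(line):
            print("LongValueSum:" + word.lower() + "\t" + "1")


if __name__ == "__main__":
    main()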
Test the mapper
$ cat donquijote_s1.txt | ./wsplit.py
LongValueSum:el	1
LongValueSum:ingenioso	1
LongValueSum:hidalgo	1
LongValueSum:don	1
LongValueSum:quijote	1
LongValueSum:de	1
LongValueSum:la	1
LongValueSum:mancha	1
LongValueSum:tasa	1
LongValueSum:yo	1
LongValueSum:juan	1
LongValueSum:gallo	1
LongValueSum:de	1
LongValueSum:andrada	1
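The `LongValueSum:` records are meant for Hadoop streaming's built-in `aggregate` reducer, which sums the values per key. To check the mapper end to end without a cluster, a small local stand-in for that reducer can help; this is a sketch that only understands the `LongValueSum` aggregator and is not Hadoop's implementation:

```python
# Local stand-in for Hadoop streaming's "aggregate" reducer, restricted to
# the LongValueSum aggregator: sum the values of "LongValueSum:<key>\t<n>" lines.
from collections import Counter
from typing import Iterable

PREFIX = "LongValueSum:"


def long_value_sum(lines: Iterable[str]) -> Counter:
    totals: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(PREFIX):
            continue  # other aggregator types are not handled in this sketch
        key, _, value = line[len(PREFIX):].partition("\t")
        totals[key] += int(value)
    return totals


if __name__ == "__main__":
    sample = ["LongValueSum:de\t1", "LongValueSum:la\t1", "LongValueSum:de\t1"]
    print(dict(long_value_sum(sample)))  # -> {'de': 2, 'la': 1}
```

In a local pipeline, the function could be fed `sys.stdin` after `cat donquijote_s1.txt | ./wsplit.py`.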
Preparing the S3 bucket
Run
Final result
$ awk '{print $2 " " $1}' part-00000 | sort -r -n
21477 que      18297 de       18189 y        10363 la
 9824 a         9490 el        8243 en        6335 no
 5079 se        4748 los       4202 con       3940 por
 3468 las       3461 lo        3398 le        3352 su
 2647 don       2623 del       2539 como      2345 me
 2312 si        2284 mas       2207 mi        2175 quijote
 2148 sancho    2142 es        2077 yo        1938 un
 1808 dijo      1740 al        1463 para      1400 porque
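The `awk | sort` step only swaps the two columns of the reducer output (`word<TAB>count`) and sorts numerically in descending order. The same ranking in Python, as a sketch that assumes that two-column format:

```python
# Rank words by count from reducer output lines of the form "word\tcount",
# like: awk '{print $2 " " $1}' part-00000 | sort -r -n
from typing import Iterable, List, Tuple


def rank_words(lines: Iterable[str]) -> List[Tuple[int, str]]:
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        word, count = fields
        pairs.append((int(count), word))
    # Highest count first; Python's sort is stable, so ties keep input order.
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return pairs


if __name__ == "__main__":
    print(rank_words(["de\t18297", "que\t21477", "y\t18189"]))
    # -> [(21477, 'que'), (18297, 'de'), (18189, 'y')]
```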
Command-line alternative
$ elastic-mapreduce --create \
    --stream \
    --input s3n://mrbg/input \
    --mapper s3://mrbg/prog/wsplit.py \
    --output s3n://mrbg/output/run2
$ elastic-mapreduce --create
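MapReduce, example 2: estimating π
$$\pi \approx 4\,\frac{M}{N}$$
where N points are drawn uniformly at random in the unit square and M of them fall inside the quarter circle $x^2 + y^2 < 1$.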
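The formula follows from areas: the quarter circle covers π/4 of the unit square, so the fraction M/N of uniformly drawn points that land inside it approximates π/4. A minimal single-process Python sketch of the estimator; the fixed seed is there only to make runs repeatable:

```python
# Monte Carlo estimate of pi: the fraction of random points in the unit
# square that land inside the quarter circle approximates pi / 4.
import math
import random


def count_inside(n: int, rng: random.Random) -> int:
    """M: how many of n uniform points satisfy x^2 + y^2 < 1."""
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if math.hypot(x, y) < 1.0:
            inside += 1
    return inside


def estimate_pi(n: int, seed: int = 0) -> float:
    m = count_inside(n, random.Random(seed))
    return 4 * m / n


if __name__ == "__main__":
    print(estimate_pi(100_000))
```

The per-point work is independent, which is what makes the problem easy to split across mappers: each one counts M for its own N, and the counts are simply added.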
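MapReduce: Mapper
#!/usr/bin/ruby
# Streaming mapper: each input line holds a number of Monte Carlo steps;
# print how many random points fall inside the quarter circle.
ARGF.each do |line|
  mcsteps = line.strip
  next if mcsteps.empty?
  begin
    inside = 0
    # Integer() raises ArgumentError on bad input (String#to_i would return 0)
    Integer(mcsteps).times do
      x, y = rand, rand
      inside += 1 if Math.hypot(x, y) < 1.0
    end
    puts inside
  rescue ArgumentError
    # couldn't parse mc steps: skip the line
  end
end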
Pi
$ cat mcs.txt
1000
... create more mcs.txt files, for example one containing:
200_000_000
$ cat mcs.txt | ./mc-pi-mr.rb
776
200_000_000
MapReduce: Reducer
#!/usr/bin/ruby
# Streaming reducer: add up the per-mapper counts of points inside the circle.
count = 0
ARGF.each do |line|
  count += line.to_i
end
puts "#{count} points inside"
Prepare the EMR job
• upload mcsnn.txt to mrbg/mcinput/
• upload mc-mapper.rb to mrbg/prog/
• upload mc-reducer.rb to mrbg/prog/
Estimate: 109955955 / 140000000 × 4 ≈ 3.14159871
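The figure follows from π ≈ 4M/N with M = 109955955 and N = 140000000. Because M is binomially distributed with p = π/4, the standard error of the estimate is about 4·sqrt(p(1 − p)/N), roughly 1.4 × 10⁻⁴ for this N. The observed error (about 6 × 10⁻⁶) is much smaller than that, so the matching digits should not be read as the method's typical precision. A sketch of both computations:

```python
# Estimate of pi from the job's totals, with its binomial standard error.
import math


def pi_estimate(inside: int, total: int) -> float:
    return 4 * inside / total


def pi_standard_error(total: int) -> float:
    # M ~ Binomial(total, p) with p = pi/4, so sd(4M/N) = 4 * sqrt(p(1-p)/N)
    p = math.pi / 4
    return 4 * math.sqrt(p * (1 - p) / total)


if __name__ == "__main__":
    est = pi_estimate(109_955_955, 140_000_000)
    print(f"{est:.8f}")                             # -> 3.14159871
    print(f"{pi_standard_error(140_000_000):.1e}")  # -> 1.4e-04
```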
• Hadoop Common
• HDFS
• MapReduce
Thank you
Q & A