CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle,...

19
CS15-319 / 15-619 Cloud Computing Recitation 3 September 10 th & 13 th , 2013

Transcript of CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle,...

Page 1: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

CS15-319 / 15-619 Cloud Computing

Recitation 3

September 10th & 13th, 2013

Page 2: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Bugs!

• Checkpoint 1 grading

– If you suspect a bug in a question, do let us know.

– We have manually graded some questions in Checkpoint Quiz 1

• Encounter a general bug:

– Post on Piazza

• Encounter a grading bug:

– Post Privately on Piazza

Page 3: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

UNIT 1: Checkpoint Quiz 1 • Stats

– Average is 89%

– Grades:

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

1 6

11

16

21

26

31

36

41

46

51

56

61

66

71

76

81

86

91

96

10

1

10

6

11

1

11

6

12

1

12

6

13

1

13

6

14

1

14

6

15

1

15

6

16

1

16

6

Page 4: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Unit 2: Data Centers • Start reading first 2 modules:

– Module 3: Data Center Trends

– Module 4: Data Center Components

– Module 5: Design Considerations

– Unit 2: Checkpoint Quiz

UNIT 2: Data Centers

Module 3: Data Center Trends

Module 4: Data Center Components

Module 5: Design Considerations

Quiz 2: Data Centers Checkpoint

Not yet assigned Due date TBD by instructor

Page 5: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Amazon Web Services (AWS) Account

• For students who just enrolled:

– Create your AWS account

– Send you account information to Jason Boles (jboles@cmu) with Email Subject: Request to add account to Consolidated Bill

Page 6: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Student Questions on Piazza

• Why was my script killed?

– Running out of memory

– Use larger instances? Maybe, maybe not.

Please ask public questions when possible.

Page 7: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Project 1 Student Progress

• Introduction to Big Data:

– Sequential Analysis: Average is: 98%

– Elastic MapReduce: this week

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%1 6

11

16

21

26

31

36

41

46

51

56

61

66

71

76

81

86

91

96

10

1

10

6

11

1

11

6

12

1

12

6

13

1

13

6

14

1

14

6

15

1

15

6

Page 8: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Primer on MapReduce

• The idea of MapReduce

Please tell me how many times does the word “Apple” appear in these books?

Page 9: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Primer on MapReduce

• The idea of MapReduce

How many times does the word “Apple” appear in these books?

I heard 6 “Apple”s !

Apple,1

Apple,1 Apple,1 Apple,1

Apple,1 Apple,1

Page 10: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Primer on MapReduce

• The idea of MapReduce

Orange,1 Blueberry,1 Blueberry,1 Apple,1

Apple,1 Apple,1 Apple,1 Orange,1

Apple,1 Apple,1 Orange,1 Blueberry,1

Apple ?

Blueberry ?

Orange ?

How Do I know Who is the “Apple” Man? No, you Don’t!

Page 11: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Primer on MapReduce

• The idea of MapReduce

Orange,1 Blueberry,1 Blueberry,1 Apple,1

Apple,1 Apple,1 Apple,1 Orange,1

Apple,1 Apple,1 Orange,1 Blueberry,1

Apple ?

Blueberry ?

Orange ?

Magic Box (Shuffle,

sort, merge)

Map Phase Reduce Phase

Mapper

Reducer

Page 12: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Primer on MapReduce

• The idea of MapReduce

A full, simplified view of the phases, stages, tasks, data input, data output and data flow in the MapReduce analytics engine. (from Unit 4 Page 102)

Page 13: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Primer on MapReduce

• The idea of MapReduce

Apple ?

Blueberry ?

Orange ?

Map Phase Reduce Phase

Black Box (Shuffle,

sort, merge)

Orange,1 Blueberry,1 Blueberry,1 Apple,1

Apple,1 Apple,1 Apple,1 Orange,1

Apple,1 Apple,1 Orange,1 Blueberry,1

Page 14: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Primer on MapReduce

• Mapper – Input: lines in files in our project

– Output: key-value pairs • Keys are used in Shuffling and Merge to find the

Reducer that handles the intermediate output for that specific key. (in our example, Apple, Orange and Blueberry are keys)

• Values are messages sent from mapper to reducer (in our case it is always 1)

• Mappers’ output is intermediate because reducers will receive the key-value pairs and take them as input.

Page 15: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Primer on MapReduce

• Reducer

– Input: key-value pairs

– Output: the final result we need

• Depends on what we want, our code should process the value in the key-value pairs that we got accordingly (in the word count example, we just add up all the values).

Page 16: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Primer on MapReduce

• In the projects of this course we are going to run MapReduce on 2 Platforms

1. Amazon Elastic MapReduce (EMR)

2. Hadoop (later projects)

Page 17: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Demos

• Wordcount in MapReduce

• S3 buckets and permissions

• Introduction to EMR

• Debugging EMR using logs

Page 18: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Discussions

• Your questions…

Page 19: CS15-319 / 15-619 Cloud Computingmsakr/15319-f13/... · Map Phase Reduce Phase Black Box (Shuffle, sort, merge) Orange,1 Blueberry,1 Blueberry,1 Apple,1 Apple,1 Apple,1 ... Amazon

Upcoming Deadlines

Project 1

Introduction to Big Data Analysis

Sequential Analysis Checkpoint

Available Now Due 9/8/13 11:59 PM

Elastic MapReduce Checkpoint

Available 9/9/13 12:01 AM Due 9/15/13 11:59 PM

UNIT 2: Data Centers

Module 3: Data Center Trends

Module 4: Data Center Components

Module 5: Design Considerations

Quiz 2: Data Centers Checkpoint

Available 9/16/13 12:01 AM

Due 9/19/13 11:59 PM