CSEP 544: Lecture 06
Parallel DB and MR, Transactions Part 1 (Recovery)
CSEP544 - Fall 2015
Outline
• Finish parallel databases and MapReduce
• Begin transactions
Big Data
• Gartner report*
  – High Volume
  – High Variety
  – High Velocity
• Stonebraker:
  – Big volumes, small analytics
  – Big analytics, on big volumes
  – Big velocity
  – Big variety
* http://www.gartner.com/newsroom/id/1731916
Famous Example of Big Data Analysis
Kumar et al., The Web as a Graph
• Question 1: is the Web like a “random graph”?
  – Random graphs were introduced by Erdős and Rényi in the late 1950s
  – Extensively studied in mathematics, well understood
  – If the Web is a “random graph”, then we have mathematical tools to understand it: clusters, communities, diameter, etc.
• Question 2: what does the Web graph look like?
Announcement
• Homework 3 (AWS) due this Friday!
• Remember to turn your instances off!
Graph Databases
Many large databases are graphs
• Give examples in class
• The Web
• The Internet
• Social Networks
• Flights between airports
• Etc.

[Figure: directed graph on nodes a, b, c, d, e, f, g]

| Source | Target |
|---|---|
| a | b |
| b | a |
| a | f |
| b | f |
| b | e |
| b | d |
| d | e |
| d | c |
| e | g |
| g | c |
| c | g |
Data Analytics on Big Graphs
Queries expressible in SQL:
• How many nodes (edges)?
• How many nodes have > 4 neighbors?
• Which are “most connected nodes”?
Queries requiring recursion:
• Is the graph connected?
• What is the diameter of the graph?
• Compute PageRank
• Compute the Centrality of each node
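The first group of queries can be sketched with plain SQL over the Source/Target edge list from the Graph Databases slide. The block below is only an illustration: the table name `Edge`, the column names, the helper `graph_stats`, and the choice to read "neighbors" as outgoing neighbors are all assumptions, and the threshold is a parameter because no node in the small example has more than 4 neighbors.

```python
import sqlite3

# Edge list copied from the Source/Target table on the slide.
EDGES = [("a", "b"), ("b", "a"), ("a", "f"), ("b", "f"), ("b", "e"), ("b", "d"),
         ("d", "e"), ("d", "c"), ("e", "g"), ("g", "c"), ("c", "g")]


def graph_stats(edges, more_than):
    """Return (node count, edge count, nodes with more than `more_than` out-neighbors)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per directed edge.
    conn.execute("CREATE TABLE Edge (source TEXT, target TEXT)")
    conn.executemany("INSERT INTO Edge VALUES (?, ?)", edges)

    n_edges = conn.execute("SELECT COUNT(*) FROM Edge").fetchone()[0]
    # A node is anything that appears as a source or a target; UNION removes duplicates.
    n_nodes = conn.execute(
        "SELECT COUNT(*) FROM "
        "(SELECT source AS v FROM Edge UNION SELECT target FROM Edge)"
    ).fetchone()[0]
    # Group by source to count outgoing edges per node.
    busy = [row[0] for row in conn.execute(
        "SELECT source FROM Edge GROUP BY source "
        "HAVING COUNT(*) > ? ORDER BY source", (more_than,))]
    conn.close()
    return n_nodes, n_edges, busy


print(graph_stats(EDGES, 1))
```

None of these queries needs recursion: each is a single aggregate over the edge table, which is what separates them from the connectivity, diameter, and PageRank questions.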
Example: the Histogram of a Graph
• Outdegree of a node = number of outgoing edges
• For each d, let n(d) = number of nodes with outdegree d
• The outdegree histogram of a graph = the scatterplot (d, n(d))
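[Figure: the example graph with each node labeled by its outdegree (0, 2, 4, 2, 1, 1, 1)]

| d | n(d) |
|---|---|
| 0 | 1 |
| 1 | 3 |
| 2 | 2 |
| 3 | 0 |
| 4 | 1 |

[Figure: scatterplot of n against d; callout: “Outdegree 1 is seen at 3 nodes”]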
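The definition above can be checked against the example graph with a few lines of Python. This is a minimal sketch: the function name `outdegree_histogram` is made up for illustration, and the edge list is copied from the Source/Target table shown earlier.

```python
from collections import Counter

# Edge list copied from the Source/Target table on the earlier slide.
EDGES = [("a", "b"), ("b", "a"), ("a", "f"), ("b", "f"), ("b", "e"), ("b", "d"),
         ("d", "e"), ("d", "c"), ("e", "g"), ("g", "c"), ("c", "g")]


def outdegree_histogram(edges):
    """Map each degree d (0..max) to n(d), the number of nodes with outdegree d."""
    nodes = {v for edge in edges for v in edge}
    # Outdegree = number of edges leaving a node; nodes with none count as 0.
    outdegree = Counter(source for source, _ in edges)
    per_degree = Counter(outdegree[v] for v in nodes)
    return {d: per_degree[d] for d in range(max(per_degree) + 1)}


print(outdegree_histogram(EDGES))
```

Node f only ever appears as a target, so it has to be collected from both columns to be counted under d = 0.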
Histograms Tell Us Something About the Graph
What can you say about these graphs?
[Figure: three histograms; horizontal axis from 0 to 10, vertical axis from 0 to 120, scaled by 10,000]
Exponential Distribution
• $n(d) \cong c/2^d$ (generally, $c x^d$, for some $x < 1$)
• A random graph has exponential distribution
• Best seen when n is on a log scale
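A short derivation shows why the log scale helps. It uses only the slide's symbols and assumes $c > 0$ and $0 < x < 1$:

$$
n(d) \approx c\,x^{d}
\quad\Longrightarrow\quad
\log n(d) \approx \log c + d \log x, \qquad \log x < 0 .
$$

With n on a log scale, the points therefore fall on a straight line with negative slope $\log x$. For the $c/2^d$ case the slope is $-\log 2$.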
[Figure: two plots of n (# nodes with degree d) against d from 0 to 10, one with n on a log scale (1 to 1,000,000) and one with n on a linear scale (0 to 1,200,000); annotation: “Quickly vanishing”]
Power Law Distribution (Zipf)
• $n(d) \cong 1/d^x$, for some value $x > 0$
• Human-generated data follows power law: letters in alphabet, words in vocabulary, etc.
• Best seen in a log-log scale
[Figure annotation: “Long tail”]
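The log-log remark can be made concrete: if n(d) = c/d^x, then log n(d) = log c − x·log d, a straight line with slope −x in log-log coordinates. The sketch below estimates that slope by ordinary least squares. The helper name `loglog_slope` and the synthetic data are illustrative assumptions, not part of the lecture.

```python
import math


def loglog_slope(points):
    """Least-squares slope of log n against log d for (d, n) pairs with d, n > 0."""
    xs = [math.log(d) for d, _ in points]
    ys = [math.log(n) for _, n in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic power law with exponent x = 2 (made-up data for illustration).
synthetic = [(d, 1000 / d ** 2) for d in range(1, 11)]
print(loglog_slope(synthetic))  # close to -2
```

On real histograms the fit is noisier, and degrees with n(d) = 0 have to be dropped before taking logarithms.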
[Figure: two plots of n against d for a power-law distribution: a log-log plot (d from 1 to 16, n from 1,000 to 100,000) and a plot with only n on a log scale (d from 0 to 18, n from 10 to 100,000)]
The Histogram of the Web
Late 1990s, 200M webpages
Exponential? Power law?
The Bowtie Structure of the Web
Executing a Large MapReduce Job
Anatomy of a Query Execution
• Running problem #4
• 20 nodes = 1 master + 19 workers
• Using PARALLEL 50
March 2013

Hadoop job_201303091944_0001 on domU-12-31-39-06-75-A1
User: hadoop
Job Name: PigLatin:DefaultJobName
Job File: hdfs://10.208.122.79:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201303091944_0001/job.xml
Submit Host: domU-12-31-39-06-75-A1.compute-1.internal
Submit Host Address: 10.208.122.79
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Succeeded
Started at: Sat Mar 09 19:49:21 UTC 2013
Finished at: Sat Mar 09 23:33:14 UTC 2013
Finished in: 3hrs, 43mins, 52sec
Job Cleanup: Successful
Black-listed TaskTrackers: 1

| Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts |
|---|---|---|---|---|---|---|---|
| map | 100.00% | 7908 | 0 | 0 | 7908 | 0 | 14 / 16 |
| reduce | 100.00% | 50 | 0 | 0 | 50 | 0 | 0 / 8 |

| Group | Counter | Map | Reduce | Total |
|---|---|---|---|---|
| Job Counters | SLOTS_MILLIS_MAPS | 0 | 0 | 454,162,761 |
| Job Counters | Launched reduce tasks | 0 | 0 | 58 |
| Job Counters | Total time spent by all reduces waiting after reserving slots (ms) | 0 | 0 | 0 |
| Job Counters | Rack-local map tasks | 0 | 0 | 7,938 |
| Job Counters | Total time spent by all maps waiting after reserving slots (ms) | 0 | 0 | 0 |
| Job Counters | Launched map tasks | 0 | 0 | 7,938 |
| Job Counters | SLOTS_MILLIS_REDUCES | 0 | 0 | 239,044,219 |
| File Output Format Counters | Bytes Written | 0 | 0 | 0 |
| File Input Format Counters | Bytes Read | 0 | 0 | 0 |
| FileSystemCounters | S3N_BYTES_READ | 530,555,718,547 | 0 | 530,555,718,547 |
| FileSystemCounters | FILE_BYTES_READ | 44,900,010,884 | 2,044,310,266 | 46,944,321,150 |
| FileSystemCounters | HDFS_BYTES_READ | 2,797,236 | 0 | 2,797,236 |
| FileSystemCounters | FILE_BYTES_WRITTEN | 15,198,970,239 | 2,053,439,376 | 17,252,409,615 |
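One way to read the task table against the job counters is that every launched attempt either completed its task, failed, or was killed. That relationship is inferred from the numbers on this page rather than taken from Hadoop documentation, and the helper name `launched_attempts` is hypothetical. A small arithmetic check:

```python
def launched_attempts(completed_tasks, failed_attempts, killed_attempts):
    """Attempts launched, assuming each one completed, failed, or was killed."""
    return completed_tasks + failed_attempts + killed_attempts


# Values copied from the March 2013 job page above.
map_launched = launched_attempts(7908, 14, 16)
reduce_launched = launched_attempts(50, 0, 8)
print(map_launched, reduce_launched)  # 7938 58
```

Both results match the "Launched map tasks" (7,938) and "Launched reduce tasks" (58) counters.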
Some other time (March 2012)
• Let’s see what happened…
Snapshot at 1h 16min

Hadoop job_201203041905_0001 on ip-10-203-30-146
User: hadoop
Job Name: PigLatin:DefaultJobName
Job File: hdfs://10.203.30.146:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201203041905_0001/job.xml
Submit Host: ip-10-203-30-146.ec2.internal
Submit Host Address: 10.203.30.146
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Running
Started at: Sun Mar 04 19:08:29 UTC 2012
Running for: 1hrs, 16mins, 33sec
Job Cleanup: Pending

| Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts |
|---|---|---|---|---|---|---|---|
| map | 33.17% | 15816 | 10549 | 38 | 5229 | 0 | 0 / 0 |
| reduce | 4.17% | 50 | 31 | 19 | 0 | 0 | 0 / 0 |

| Group | Counter | Map | Reduce | Total |
|---|---|---|---|---|
| Job Counters | SLOTS_MILLIS_MAPS | 0 | 0 | 164,620,372 |
| Job Counters | Launched reduce tasks | 0 | 0 | 19 |
| Job Counters | Rack-local map tasks | 0 | 0 | 5,267 |
| Job Counters | Launched map tasks | 0 | 0 | 5,267 |
| File Input Format Counters | Bytes Read | 0 | 0 | 0 |
| FileSystemCounters | S3N_BYTES_READ | 175,523,148,980 | 0 | 175,523,148,980 |
| FileSystemCounters | HDFS_BYTES_READ | 1,845,837 | 0 | 1,845,837 |
| FileSystemCounters | FILE_BYTES_WRITTEN | 3,206,602,012 | 145,356,233 | 3,351,958,245 |
| Map-Reduce Framework | Map output materialized bytes | 2,444,314,273 | 0 | 2,444,314,273 |
| Map-Reduce Framework | Map input records | 805,225,193 | 0 | 805,225,193 |
| Map-Reduce Framework | Reduce shuffle bytes | 0 | 909,468,723 | 909,468,723 |
| Map-Reduce Framework | Spilled Records | 173,820,131 | 0 | 173,820,131 |
| Map-Reduce Framework | Map output bytes | 62,732,457,803 | 0 | 62,732,457,803 |
| Map-Reduce Framework | CPU time spent (ms) | 55,277,520 | 2,656,940 | 57,934,460 |
| Map-Reduce Framework | Total committed heap usage (bytes) | 1,956,086,312,960 | 3,042,803,712 | 1,959,129,116,672 |
| Map-Reduce Framework | Combine input records | 805,225,193 | 62,442,816 | 867,668,009 |
| Map-Reduce Framework | SPLIT_RAW_BYTES | 1,845,837 | 0 | 1,845,837 |
| Map-Reduce Framework | Reduce input records | 0 | 0 | 0 |
| Map-Reduce Framework | Reduce input groups | 0 | 0 | 0 |
| Map-Reduce Framework | Combine output records | 173,820,131 | 9,112,575 | 182,932,706 |
| Map-Reduce Framework | Physical memory (bytes) snapshot | 1,912,514,703,360 | 3,980,988,416 | 1,916,495,691,776 |
| Map-Reduce Framework | Reduce output records | 0 | 0 | 0 |
| Map-Reduce Framework | Virtual memory (bytes) snapshot | 2,975,862,571,008 | 11,173,437,440 | 2,987,036,008,448 |
| Map-Reduce Framework | Map output records | 805,225,193 | 0 | 805,225,193 |

[Figures: Map Completion Graph; Reduce Completion Graph (legend: copy, sort, reduce)]

This is Apache Hadoop release 0.20.205

Slide annotations:
• Only 19 reducers active, out of 50. Why?
• When will the other 31 reducers be scheduled?
• Copying by 19 reducers in parallel with mappers.
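The 1h 16min numbers already allow a rough estimate of when the map phase will finish. The sketch below assumes map tasks complete at a roughly constant rate, which is a simplification, and the helper names `estimate_total_seconds` and `format_hms` are made up for illustration.

```python
def estimate_total_seconds(elapsed_seconds, fraction_complete):
    """Linear extrapolation: total time = elapsed time / fraction of work done."""
    if not 0 < fraction_complete <= 1:
        raise ValueError("fraction_complete must be in (0, 1]")
    return elapsed_seconds / fraction_complete


def format_hms(seconds):
    """Render a duration in seconds as hours, minutes, seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}min {secs:02d}s"


# "Running for: 1hrs, 16mins, 33sec" with the map phase at 33.17%.
elapsed = 1 * 3600 + 16 * 60 + 33
map_eta = estimate_total_seconds(elapsed, 0.3317)
print(format_hms(map_eta))  # 3h 50min 46s
```

A later snapshot of the same job shows the map phase at 100% at 3hrs, 50mins, 12sec, so the simple extrapolation lands close for this run.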
Snapshot at 3h 50min (shown on the slide next to the 1h 16min snapshot)

Hadoop job_201203041905_0001 on ip-10-203-30-146
Status: Running
Started at: Sun Mar 04 19:08:29 UTC 2012
Running for: 3hrs, 50mins, 12sec
Job Cleanup: Pending

| Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts |
|---|---|---|---|---|---|---|---|
| map | 100.00% | 15816 | 0 | 0 | 15816 | 0 | 0 / 18 |
| reduce | 32.42% | 50 | 31 | 19 | 0 | 0 | 0 / 0 |

| Group | Counter | Map | Reduce | Total |
|---|---|---|---|---|
| Job Counters | SLOTS_MILLIS_MAPS | 0 | 0 | 495,799,522 |
| Job Counters | Launched reduce tasks | 0 | 0 | 19 |
| Job Counters | Rack-local map tasks | 0 | 0 | 15,834 |
| Job Counters | Launched map tasks | 0 | 0 | 15,834 |
| File Output Format Counters | Bytes Written | 0 | 0 | 0 |
| File Input Format Counters | Bytes Read | 0 | 0 | 0 |
| FileSystemCounters | S3N_BYTES_READ | 530,591,875,823 | 0 | 530,591,875,823 |
| FileSystemCounters | FILE_BYTES_READ | 0 | 309,198,848 | 309,198,848 |
| FileSystemCounters | HDFS_BYTES_READ | 5,587,893 | 0 | 5,587,893 |
| FileSystemCounters | FILE_BYTES_WRITTEN | 9,616,982,133 | 850,567,984 | 10,467,550,117 |
| FileSystemCounters | HDFS_BYTES_WRITTEN | 0 | 946,814,498 | 946,814,498 |
| Map-Reduce Framework | Map output materialized bytes | 7,311,305,131 | 0 | 7,311,305,131 |
| Map-Reduce Framework | Map input records | 2,501,793,030 | 0 | 2,501,793,030 |
| Map-Reduce Framework | Reduce shuffle bytes | 0 | 2,755,605,871 | 2,755,605,871 |
| Map-Reduce Framework | Spilled Records | 465,817,710 | 0 | 465,817,710 |
| Map-Reduce Framework | Map output bytes | 199,575,247,017 | 0 | 199,575,247,017 |
| Map-Reduce Framework | CPU time spent (ms) | 165,894,080 | 9,129,070 | 175,023,150 |
| Map-Reduce Framework | Total committed heap usage (bytes) | 5,922,097,602,560 | 3,008,761,856 | 5,925,106,364,416 |
| Map-Reduce Framework | Combine input records | 2,501,793,030 | 168,420,895 | 2,670,213,925 |
| Map-Reduce Framework | SPLIT_RAW_BYTES | 5,587,893 | 0 | 5,587,893 |
| Map-Reduce Framework | Reduce input records | 0 | 21,039,080 | 21,039,080 |
| Map-Reduce Framework | Reduce input groups | 0 | 13,593,157 | 13,593,157 |
| Map-Reduce Framework | Combine output records | 465,817,710 | 47,802,630 | 513,620,340 |
| Map-Reduce Framework | Physical memory (bytes) snapshot | 5,790,488,764,416 | 4,018,405,376 | 5,794,507,169,792 |
| Map-Reduce Framework | Reduce output records | 0 | 13,593,139 | 13,593,139 |
| Map-Reduce Framework | Virtual memory (bytes) snapshot | 9,001,329,868,800 | 11,175,534,592 | 9,012,505,403,392 |
| Map-Reduce Framework | Map output records | 2,501,793,030 | 0 | 2,501,793,030 |

[Figures: Map Completion Graph; Reduce Completion Graph (legend: copy, sort, reduce)]

Slide annotations (1h 16min and 3h 50min):
• Only 19 reducers active, out of 50. Why?
• Speculative Execution
• When will the other 31 reducers be scheduled?
• Completed. Sorting, and the rest of Reduce may proceed now
• Copying by 19 reducers in parallel with mappers.
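The counters at 3h 50min also show how much the map-side combiner shrinks the data before the shuffle. The sketch below only divides numbers copied from the Map column. The helper name `reduction_factor` is hypothetical, and the slide does not say what accounts for the gap between "Map output bytes" and "Map output materialized bytes", so the second ratio is reported without an explanation.

```python
def reduction_factor(amount_in, amount_out):
    """How many times smaller the output is than the input."""
    return amount_in / amount_out


# Map-column values from the 3h 50min snapshot.
combine_input_records = 2_501_793_030
combine_output_records = 465_817_710
map_output_bytes = 199_575_247_017
map_output_materialized_bytes = 7_311_305_131

record_factor = reduction_factor(combine_input_records, combine_output_records)
byte_factor = reduction_factor(map_output_bytes, map_output_materialized_bytes)
print(f"records: {record_factor:.2f}x, bytes: {byte_factor:.2f}x")
```

On these numbers the combiner emits roughly one record for every five it reads on the map side.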
Snapshot at 3h 51min

Hadoop job_201203041905_0001 on ip-10-203-30-146
Status: Running
Started at: Sun Mar 04 19:08:29 UTC 2012
Running for: 3hrs, 51mins, 19sec
Job Cleanup: Pending

| Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts |
|---|---|---|---|---|---|---|---|
| map | 100.00% | 15816 | 0 | 0 | 15816 | 0 | 0 / 18 |
| reduce | 37.72% | 50 | 19 | 22 | 9 | 0 | 0 / 0 |

| Group | Counter | Map | Reduce | Total |
|---|---|---|---|---|
| Job Counters | SLOTS_MILLIS_MAPS | 0 | 0 | 495,799,522 |
| Job Counters | Launched reduce tasks | 0 | 0 | 31 |
| Job Counters | Rack-local map tasks | 0 | 0 | 15,834 |
| Job Counters | Launched map tasks | 0 | 0 | 15,834 |
| Job Counters | SLOTS_MILLIS_REDUCES | 0 | 0 | 118,328,830 |
| File Output Format Counters | Bytes Written | 0 | 0 | 0 |
| File Input Format Counters | Bytes Read | 0 | 0 | 0 |
| FileSystemCounters | S3N_BYTES_READ | 530,591,875,823 | 0 | 530,591,875,823 |
| FileSystemCounters | FILE_BYTES_READ | 0 | 754,835,408 | 754,835,408 |
| FileSystemCounters | HDFS_BYTES_READ | 5,587,893 | 0 | 5,587,893 |
| FileSystemCounters | FILE_BYTES_WRITTEN | 9,616,982,133 | 850,567,984 | 10,467,550,117 |
| FileSystemCounters | HDFS_BYTES_WRITTEN | 0 | 3,400,371,086 | 3,400,371,086 |
| Map-Reduce Framework | Map output materialized bytes | 7,311,305,131 | 0 | 7,311,305,131 |
| Map-Reduce Framework | Map input records | 2,501,793,030 | 0 | 2,501,793,030 |
| Map-Reduce Framework | Reduce shuffle bytes | 0 | 2,755,605,871 | 2,755,605,871 |
| Map-Reduce Framework | Spilled Records | 465,817,710 | 26,163,538 | 491,981,248 |
| Map-Reduce Framework | Map output bytes | 199,575,247,017 | 0 | 199,575,247,017 |
| Map-Reduce Framework | CPU time spent (ms) | 165,894,080 | 10,013,680 | 175,907,760 |
| Map-Reduce Framework | Total committed heap usage (bytes) | 5,922,097,602,560 | 3,008,761,856 | 5,925,106,364,416 |
| Map-Reduce Framework | Combine input records | 2,501,793,030 | 168,420,895 | 2,670,213,925 |
| Map-Reduce Framework | SPLIT_RAW_BYTES | 5,587,893 | 0 | 5,587,893 |
| Map-Reduce Framework | Reduce input records | 0 | 49,680,950 | 49,680,950 |
| Map-Reduce Framework | Reduce input groups | 0 | 39,612,536 | 39,612,536 |
| Map-Reduce Framework | Combine output records | 465,817,710 | 47,802,630 | 513,620,340 |
| Map-Reduce Framework | Physical memory (bytes) snapshot | 5,790,488,764,416 | 4,020,133,888 | 5,794,508,898,304 |
| Map-Reduce Framework | Reduce output records | 0 | 39,612,527 | 39,612,527 |
| Map-Reduce Framework | Virtual memory (bytes) snapshot | 9,001,329,868,800 | 11,175,473,152 | 9,012,505,341,952 |
| Map-Reduce Framework | Map output records | 2,501,793,030 | 0 | 2,501,793,030 |

[Figures: Map Completion Graph; Reduce Completion Graph (legend: copy, sort, reduce)]

Slide annotations:
• Some of the 19 reducers have finished…
• …Next batch of reducers started
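The reduce rows of the three snapshots so far can be cross-checked with two simple invariants: pending + running + complete should equal the 50 reduce tasks, and running + complete should equal the "Launched reduce tasks" counter (there are no failed or killed reduce attempts yet). The second invariant is inferred from these numbers rather than from Hadoop documentation, and the names `SNAPSHOTS` and `is_consistent` are made up for this sketch.

```python
NUM_REDUCE_TASKS = 50

# Reduce-row values and "Launched reduce tasks" copied from each snapshot.
SNAPSHOTS = {
    "1h 16min": {"pending": 31, "running": 19, "complete": 0, "launched": 19},
    "3h 50min": {"pending": 31, "running": 19, "complete": 0, "launched": 19},
    "3h 51min": {"pending": 19, "running": 22, "complete": 9, "launched": 31},
}


def is_consistent(snapshot):
    """Check that the task states add up and match the launched-task counter."""
    all_tasks = snapshot["pending"] + snapshot["running"] + snapshot["complete"]
    started = snapshot["running"] + snapshot["complete"]
    return all_tasks == NUM_REDUCE_TASKS and started == snapshot["launched"]


for label, snapshot in SNAPSHOTS.items():
    print(label, is_consistent(snapshot))
```

The launched count stays at 19 until the first reducers finish, then jumps to 31 within about a minute, which is the "next batch" the slide points at.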
| Counter | Map | Reduce | Total |
| --- | --- | --- | --- |
| Map-Reduce Framework | | | |
| CPU time spent (ms) | 165,894,080 | 10,725,020 | 176,619,100 |
| Total committed heap usage (bytes) | 5,922,097,602,560 | 9,412,485,120 | 5,931,510,087,680 |
| Combine input records | 2,501,793,030 | 175,243,866 | 2,677,036,896 |
| SPLIT_RAW_BYTES | 5,587,893 | 0 | 5,587,893 |
| Reduce input records | 0 | 54,940,866 | 54,940,866 |
| Reduce input groups | 0 | 44,756,179 | 44,756,179 |
| Combine output records | 465,817,710 | 48,604,128 | 514,421,838 |
| Physical memory (bytes) snapshot | 5,790,488,764,416 | 11,311,841,280 | 5,801,800,605,696 |
| Reduce output records | 0 | 44,756,179 | 44,756,179 |
| Virtual memory (bytes) snapshot | 9,001,329,868,800 | 21,805,244,416 | 9,023,135,113,216 |
| Map output records | 2,501,793,030 | 0 | 2,501,793,030 |
[Figure: Map Completion Graph (x-axis: 0 to 15820, y-axis: 0 to 100) and Reduce Completion Graph (x-axis: 0 to 50, y-axis: 0 to 100; legend: copy, sort, reduce)]
Hadoop job_201203041905_0001 on ip-10-203-30-146
User: hadoop
Job Name: PigLatin:DefaultJobName
Job File: hdfs://10.203.30.146:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201203041905_0001/job.xml
Submit Host: ip-10-203-30-146.ec2.internal
Submit Host Address: 10.203.30.146
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Running
Started at: Sun Mar 04 19:08:29 UTC 2012
Running for: 3hrs, 52mins, 51sec
Job Cleanup: Pending

| Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| map | 100.00% | 15816 | 0 | 0 | 15816 | 0 | 0 / 18 |
| reduce | 42.35% | 50 | 11 | 20 | 19 | 0 | 0 / 0 |

| Counter | Map | Reduce | Total |
| --- | --- | --- | --- |
| Job Counters | | | |
| SLOTS_MILLIS_MAPS | 0 | 0 | 495,799,522 |
| Launched reduce tasks | 0 | 0 | 39 |
| Rack-local map tasks | 0 | 0 | 15,834 |
| Launched map tasks | 0 | 0 | 15,834 |
| SLOTS_MILLIS_REDUCES | 0 | 0 | 250,004,109 |
| File Output Format Counters: Bytes Written | 0 | 0 | 0 |
| File Input Format Counters: Bytes Read | 0 | 0 | 0 |
| FileSystemCounters | | | |
| S3N_BYTES_READ | 530,591,875,823 | 0 | 530,591,875,823 |
| FILE_BYTES_READ | 0 | 847,821,126 | 847,821,126 |
| HDFS_BYTES_READ | 5,587,893 | 0 | 5,587,893 |
| FILE_BYTES_WRITTEN | 9,616,982,133 | 864,512,016 | 10,481,494,149 |
| HDFS_BYTES_WRITTEN | 0 | 3,967,197,533 | 3,967,197,533 |
| Map output materialized bytes | 7,311,305,131 | 0 | 7,311,305,131 |
| Map input records | 2,501,793,030 | 0 | 2,501,793,030 |
| Reduce shuffle bytes | 0 | 3,489,678,276 | 3,489,678,276 |
| Spilled Records | 465,817,710 | 54,940,866 | 520,758,576 |
| Map output bytes | 199,575,247,017 | 0 | 199,575,247,017 |
Hadoop job_201203041905_0001 on ip-10-203-30-146
User: hadoop
Job Name: PigLatin:DefaultJobName
Job File: hdfs://10.203.30.146:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201203041905_0001/job.xml
Submit Host: ip-10-203-30-146.ec2.internal
Submit Host Address: 10.203.30.146
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Running
Started at: Sun Mar 04 19:08:29 UTC 2012
Running for: 3hrs, 51mins, 19sec
Job Cleanup: Pending

| Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| map | 100.00% | 15816 | 0 | 0 | 15816 | 0 | 0 / 18 |
| reduce | 37.72% | 50 | 19 | 22 | 9 | 0 | 0 / 0 |

| Counter | Map | Reduce | Total |
| --- | --- | --- | --- |
| Job Counters | | | |
| SLOTS_MILLIS_MAPS | 0 | 0 | 495,799,522 |
| Launched reduce tasks | 0 | 0 | 31 |
| Rack-local map tasks | 0 | 0 | 15,834 |
| Launched map tasks | 0 | 0 | 15,834 |
| SLOTS_MILLIS_REDUCES | 0 | 0 | 118,328,830 |
| File Output Format Counters: Bytes Written | 0 | 0 | 0 |
| File Input Format Counters: Bytes Read | 0 | 0 | 0 |
| FileSystemCounters | | | |
| S3N_BYTES_READ | 530,591,875,823 | 0 | 530,591,875,823 |
| FILE_BYTES_READ | 0 | 754,835,408 | 754,835,408 |
| HDFS_BYTES_READ | 5,587,893 | 0 | 5,587,893 |
| FILE_BYTES_WRITTEN | 9,616,982,133 | 850,567,984 | 10,467,550,117 |
| HDFS_BYTES_WRITTEN | 0 | 3,400,371,086 | 3,400,371,086 |
| Map output materialized bytes | 7,311,305,131 | 0 | 7,311,305,131 |
| Map input records | 2,501,793,030 | 0 | 2,501,793,030 |
| Reduce shuffle bytes | 0 | 2,755,605,871 | 2,755,605,871 |
| Spilled Records | 465,817,710 | 26,163,538 | 491,981,248 |
| Map output bytes | 199,575,247,017 | 0 | 199,575,247,017 |
…Next Batch of Reducers started
Some of the 19 reducers have finished…
Next Batch of 19 reducers
3h 52min 3h 51min
![Page 26: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/26.jpg)
Hadoop job_201203041905_0001 on ip-10-203-30-146
User: hadoop
Job Name: PigLatin:DefaultJobName
Job File: hdfs://10.203.30.146:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201203041905_0001/job.xml
Submit Host: ip-10-203-30-146.ec2.internal
Submit Host Address: 10.203.30.146
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Running
Started at: Sun Mar 04 19:08:29 UTC 2012
Running for: 4hrs, 18mins, 22sec
Job Cleanup: Pending
Black-listed TaskTrackers: 1

| Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| map | 99.88% | 15816 | 2638 | 30 | 13148 | 0 | 15 / 3337 |
| reduce | 48.42% | 50 | 15 | 16 | 19 | 0 | 0 / 0 |

| Counter | Map | Reduce | Total |
| --- | --- | --- | --- |
| Job Counters | | | |
| SLOTS_MILLIS_MAPS | 0 | 0 | 520,840,319 |
| Launched reduce tasks | 0 | 0 | 39 |
| Rack-local map tasks | 0 | 0 | 16,530 |
| Launched map tasks | 0 | 0 | 16,530 |
| SLOTS_MILLIS_REDUCES | 0 | 0 | 250,004,109 |
| File Output Format Counters: Bytes Written | 0 | 0 | 0 |
| File Input Format Counters: Bytes Read | 0 | 0 | 0 |
| FileSystemCounters | | | |
| S3N_BYTES_READ | 441,403,920,262 | 0 | 441,403,920,262 |
| FILE_BYTES_READ | 0 | 847,821,126 | 847,821,126 |
| HDFS_BYTES_READ | 4,650,415 | 0 | 4,650,415 |
| FILE_BYTES_WRITTEN | 8,001,044,946 | 1,403,559,708 | 9,404,604,654 |
| HDFS_BYTES_WRITTEN | 0 | 3,967,197,533 | 3,967,197,533 |
| Map output materialized bytes | 6,082,144,011 | 0 | 6,082,144,011 |
| Map input records | 2,078,999,323 | 0 | 2,078,999,323 |
| Reduce shuffle bytes | 0 | 5,045,223,844 | 5,045,223,844 |
| Spilled Records | 389,005,699 | 54,940,866 | 443,946,565 |
| Map output bytes | 165,741,477,602 | 0 | 165,741,477,602 |
| Map-Reduce Framework | | | |
| CPU time spent (ms) | 137,792,860 | 20,822,400 | 158,615,260 |
| Total committed heap usage (bytes) | 4,923,491,106,816 | 9,237,303,296 | 4,932,728,410,112 |
| Combine input records | 2,077,586,535 | 308,803,126 | 2,386,389,661 |
| SPLIT_RAW_BYTES | 4,650,415 | 0 | 4,650,415 |
| Reduce input records | 0 | 54,940,866 | 54,940,866 |
| Reduce input groups | 0 | 44,756,179 | 44,756,179 |
| Combine output records | 389,005,699 | 83,268,384 | 472,274,083 |
| Physical memory (bytes) snapshot | 4,811,045,253,120 | 11,161,067,520 | 4,822,206,320,640 |
| Reduce output records | 0 | 44,756,179 | 44,756,179 |
| Virtual memory (bytes) snapshot | 7,488,476,110,848 | 20,624,834,560 | 7,509,100,945,408 |
| Map output records | 2,079,000,720 | 0 | 2,079,000,720 |
[Figure: Map Completion Graph (x-axis: 0 to 15820, y-axis: 0 to 100) and Reduce Completion Graph (x-axis: 0 to 50, y-axis: 0 to 100; legend: copy, sort, reduce)]
Several servers failed: “fetch error”. Their map tasks need to be rerun. All reducers are waiting….
4h 18min
![Page 27: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/27.jpg)
Why did we lose some reducers?
![Page 28: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/28.jpg)
Hadoop job_201203041905_0001 on ip-10-203-30-146
User: hadoop
Job Name: PigLatin:DefaultJobName
Job File: hdfs://10.203.30.146:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201203041905_0001/job.xml
Submit Host: ip-10-203-30-146.ec2.internal
Submit Host Address: 10.203.30.146
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Running
Started at: Sun Mar 04 19:08:29 UTC 2012
Running for: 7hrs, 10mins, 54sec
Job Cleanup: Pending
Black-listed TaskTrackers: 3

| Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| map | 100.00% | 15816 | 0 | 0 | 15816 | 0 | 26 / 5968 |
| reduce | 94.15% | 50 | 0 | 6 | 44 | 0 | 0 / 8 |

| Counter | Map | Reduce | Total |
| --- | --- | --- | --- |
| Job Counters | | | |
| SLOTS_MILLIS_MAPS | 0 | 0 | 676,845,552 |
| Launched reduce tasks | 0 | 0 | 62 |
| Rack-local map tasks | 0 | 0 | 21,810 |
| Launched map tasks | 0 | 0 | 21,810 |
| SLOTS_MILLIS_REDUCES | 0 | 0 | 390,018,556 |
| File Output Format Counters: Bytes Written | 0 | 0 | 0 |
| File Input Format Counters: Bytes Read | 0 | 0 | 0 |
| FileSystemCounters | | | |
| S3N_BYTES_READ | 530,591,952,796 | 0 | 530,591,952,796 |
| FILE_BYTES_READ | 0 | 1,921,632,609 | 1,921,632,609 |
| HDFS_BYTES_READ | 5,587,893 | 0 | 5,587,893 |
| FILE_BYTES_WRITTEN | 9,616,982,133 | 2,051,943,740 | 11,668,925,873 |
| HDFS_BYTES_WRITTEN | 0 | 9,411,137,927 | 9,411,137,927 |
| Map output materialized bytes | 7,311,305,131 | 0 | 7,311,305,131 |
| Map input records | 2,501,793,030 | 0 | 2,501,793,030 |
| Reduce shuffle bytes | 0 | 7,226,095,915 | 7,226,095,915 |
| Spilled Records | 465,817,710 | 122,997,587 | 588,815,297 |
| Map output bytes | 199,575,247,017 | 0 | 199,575,247,017 |
| Map-Reduce Framework | | | |
| CPU time spent (ms) | 165,059,320 | 36,329,450 | 201,388,770 |
| Total committed heap usage (bytes) | 5,920,284,372,992 | 15,076,560,896 | 5,935,360,933,888 |
| Combine input records | 2,501,793,030 | 437,117,972 | 2,938,911,002 |
| SPLIT_RAW_BYTES | 5,587,893 | 0 | 5,587,893 |
| Reduce input records | 0 | 126,918,315 | 126,918,315 |
| Reduce input groups | 0 | 106,505,013 | 106,505,013 |
| Combine output records | 465,817,710 | 117,266,617 | 583,084,327 |
| Physical memory (bytes) snapshot | 5,781,194,698,752 | 17,890,435,072 | 5,799,085,133,824 |
| Reduce output records | 0 | 106,505,011 | 106,505,011 |
| Virtual memory (bytes) snapshot | 8,999,333,040,128 | 29,498,195,968 | 9,028,831,236,096 |
| Map output records | 2,501,793,030 | 0 | 2,501,793,030 |
[Figure: Map Completion Graph (x-axis: 0 to 15820, y-axis: 0 to 100) and Reduce Completion Graph (x-axis: 0 to 50, y-axis: 0 to 100; legend: copy, sort, reduce)]
4h 18min: Several servers failed: “fetch error”. Their map tasks need to be rerun. All reducers are waiting….
7h 10min: Mappers finished, reducers resumed.
Why did we lose some reducers?
![Page 29: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/29.jpg)
[Figure: completion graph (x-axis: 0 to 50, y-axis: 0 to 100; legend: copy, sort, reduce)]
Hadoop job_201203041905_0001 on ip-10-203-30-146
User: hadoop
Job Name: PigLatin:DefaultJobName
Job File: hdfs://10.203.30.146:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201203041905_0001/job.xml
Submit Host: ip-10-203-30-146.ec2.internal
Submit Host Address: 10.203.30.146
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Succeeded
Started at: Sun Mar 04 19:08:29 UTC 2012
Finished at: Mon Mar 05 02:28:39 UTC 2012
Finished in: 7hrs, 20mins, 10sec
Job Cleanup: Successful
Black-listed TaskTrackers: 3

| Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| map | 100.00% | 15816 | 0 | 0 | 15816 | 0 | 26 / 5968 |
| reduce | 100.00% | 50 | 0 | 0 | 50 | 0 | 0 / 14 |

| Counter | Map | Reduce | Total |
| --- | --- | --- | --- |
| Job Counters | | | |
| SLOTS_MILLIS_MAPS | 0 | 0 | 676,850,579 |
| Launched reduce tasks | 0 | 0 | 64 |
| Total time spent by all reduces waiting after reserving slots (ms) | 0 | 0 | 0 |
| Rack-local map tasks | 0 | 0 | 21,810 |
| Total time spent by all maps waiting after reserving slots (ms) | 0 | 0 | 0 |
| Launched map tasks | 0 | 0 | 21,810 |
| SLOTS_MILLIS_REDUCES | 0 | 0 | 397,936,187 |
| File Output Format Counters: Bytes Written | 0 | 0 | 0 |
| File Input Format Counters: Bytes Read | 0 | 0 | 0 |
| FileSystemCounters | | | |
| S3N_BYTES_READ | 530,591,952,796 | 0 | 530,591,952,796 |
| FILE_BYTES_READ | 0 | 2,112,335,501 | 2,112,335,501 |
| HDFS_BYTES_READ | 5,587,893 | 0 | 5,587,893 |
| FILE_BYTES_WRITTEN | 9,616,982,133 | 2,119,564,091 | 11,736,546,224 |
| HDFS_BYTES_WRITTEN | 0 | 10,432,880,333 | 10,432,880,333 |
Success! 7hrs, 20mins.
7h 20min
![Page 30: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/30.jpg)
Parallel Query Processing
How do we compute these operations on a shared-nothing parallel db?
• Selection: σA=123(R) (that’s easy, won’t discuss…)
• Group-by: γA,sum(B)(R)
• Join: R ⋈ S
Before we answer that: how do we store R (and S) on a shared-nothing parallel db?
![Page 31: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/31.jpg)
Review
• Shared memory / disk / nothing
• Speedup / Scaleup
• Interquery-, intraquery-, intraoperator parallelism
• Horizontal data partitioning
![Page 32: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/32.jpg)
Horizontal Data Partitioning
[Figure: Data: one table with columns K, A, B. Servers: 1, 2, …, P]
![Page 33: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/33.jpg)
Horizontal Data Partitioning
[Figure: Data: one table with columns K, A, B. Servers: 1, 2, …, P, each holding a horizontal chunk of the table with the same columns K, A, B]
![Page 34: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/34.jpg)
Horizontal Data Partitioning
[Figure: Data: one table with columns K, A, B. Servers: 1, 2, …, P, each holding a horizontal chunk of the table with the same columns K, A, B]
Which tuples go to what server?
![Page 35: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/35.jpg)
Horizontal Data Partitioning
• Block Partition:
  – Partition tuples arbitrarily s.t. size(R_1) ≈ … ≈ size(R_P)
• Hash partitioned on attribute A:
  – Tuple t goes to chunk i, where i = h(t.A) mod P + 1
• Range partitioned on attribute A:
  – Partition the range of A into -∞ = v_0 < v_1 < … < v_P = ∞
  – Tuple t goes to chunk i, if v_{i-1} < t.A ≤ v_i
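The three schemes can be sketched in a few lines of Python. This is an illustration only, under these assumptions: tuples are dictionaries, the hash function h is a CRC32 stand-in, block partitioning is done round-robin (one of many ways to get equal-sized chunks), and a value equal to a range boundary goes to the lower chunk so that every tuple lands somewhere.

```python
import bisect
import zlib


def h(value):
    """Deterministic stand-in for the hash function h (assumption: CRC32)."""
    return zlib.crc32(str(value).encode("utf-8"))


def block_partition(tuples, P):
    """Deal tuples out round-robin, so chunk sizes differ by at most one."""
    chunks = [[] for _ in range(P)]
    for n, t in enumerate(tuples):
        chunks[n % P].append(t)
    return chunks


def hash_partition(tuples, P, attr):
    """Tuple t goes to chunk i = h(t.attr) mod P + 1 (chunks numbered 1..P)."""
    chunks = [[] for _ in range(P)]
    for t in tuples:
        i = h(t[attr]) % P + 1
        chunks[i - 1].append(t)  # Python lists are 0-based
    return chunks


def range_partition(tuples, boundaries, attr):
    """boundaries = [v_1, ..., v_{P-1}], sorted; v_0 = -inf, v_P = +inf.

    Tuple t goes to chunk i when v_{i-1} < t.attr <= v_i.
    """
    chunks = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        chunks[bisect.bisect_left(boundaries, t[attr])].append(t)
    return chunks


R = [{"K": k, "A": k % 3, "B": 10 * k} for k in range(10)]
print([len(c) for c in block_partition(R, 3)])                          # [4, 3, 3]
print([[t["K"] for t in c] for c in range_partition(R, [2, 5], "K")])   # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

Hash partitioning guarantees that tuples with the same value of the attribute end up in the same chunk, which is what the group-by and join plans on the following slides rely on.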
![Page 36: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/36.jpg)
Parallel Hash-Partitioned GroupBy
Data: R(K,A,B,C)
Query: γA,sum(C)(R)
Discuss in class how to compute in each case:
• R is hash-partitioned on A
• R is block-partitioned
• R is hash-partitioned on K
![Page 37: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/37.jpg)
Parallel Hash-Partitioned GroupBy
Data: R(K,A,B,C)
Query: γA,sum(C)(R)
• R is block-partitioned or hash-partitioned on K
Reshuffle R on attribute A
[Figure: chunks R1, R2, …, RP are reshuffled into chunks R1’, R2’, …, RP’]
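A minimal Python sketch of this plan, assuming tuples are dictionaries, servers are simulated as lists, and h is a CRC32 stand-in (all names here are illustrative). The flag `already_partitioned_on_a` covers the first case from the previous slide: if R is already hash-partitioned on A, all tuples of a group sit on one server and the reshuffle can be skipped.

```python
import zlib
from collections import defaultdict


def h(value):
    """Deterministic stand-in for the hash function (assumption: CRC32)."""
    return zlib.crc32(str(value).encode("utf-8"))


def reshuffle_on(chunks, attr):
    """Every server scans its chunk and sends each tuple to server h(t.attr) mod P."""
    P = len(chunks)
    new_chunks = [[] for _ in range(P)]
    for chunk in chunks:
        for t in chunk:
            new_chunks[h(t[attr]) % P].append(t)
    return new_chunks


def local_group_sum(chunk):
    """Sequential group-by on one server: A -> sum(C)."""
    sums = defaultdict(int)
    for t in chunk:
        sums[t["A"]] += t["C"]
    return dict(sums)


def parallel_group_sum(chunks, already_partitioned_on_a=False):
    if not already_partitioned_on_a:
        chunks = reshuffle_on(chunks, "A")
    result = {}
    for chunk in chunks:
        # After the reshuffle, groups on different servers are disjoint,
        # so the final answer is just the union of the local answers.
        result.update(local_group_sum(chunk))
    return result


R = [{"K": k, "A": "a%d" % (k % 4), "B": 0, "C": k} for k in range(12)]
chunks = [R[0:4], R[4:8], R[8:12]]   # block-partitioned on 3 servers
print(sorted(parallel_group_sum(chunks).items()))
# [('a0', 12), ('a1', 15), ('a2', 18), ('a3', 21)]
```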
![Page 38: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/38.jpg)
Parallel Hash-Partitioned Join
• Data: R(K1,A,B), S(K2,B,C)
• Query: R(K1,A,B) ⋈ S(K2,B,C)
Initially, R is horizontally partitioned on K1 and S on K2
[Figure: servers 1 to P hold the chunk pairs (R1, S1), (R2, S2), …, (RP, SP)]
![Page 39: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/39.jpg)
Parallel Hash-Partitioned Join
• Data: R(K1,A,B), S(K2,B,C)
• Query: R(K1,A,B) ⋈ S(K2,B,C)
Initially, R is horizontally partitioned on K1 and S on K2
Reshuffle R on R.B and S on S.B
Each server computes the join locally
[Figure: chunk pairs (R1, S1), (R2, S2), …, (RP, SP) are reshuffled into (R’1, S’1), (R’2, S’2), …, (R’P, S’P)]
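The two steps (reshuffle both relations on B, then join locally) can be sketched as follows. This is a single-process simulation under stated assumptions: tuples are dictionaries, h is a CRC32 stand-in, and the local join is a simple in-memory hash join; the function names are illustrative.

```python
import zlib
from collections import defaultdict


def h(value):
    """Deterministic stand-in for the hash function (assumption: CRC32)."""
    return zlib.crc32(str(value).encode("utf-8"))


def reshuffle(chunks, attr, P):
    """Send every tuple to server h(t.attr) mod P."""
    out = [[] for _ in range(P)]
    for chunk in chunks:
        for t in chunk:
            out[h(t[attr]) % P].append(t)
    return out


def local_hash_join(r_chunk, s_chunk):
    """Join on B within one server: build a hash table on S, probe with R."""
    index = defaultdict(list)
    for s in s_chunk:
        index[s["B"]].append(s)
    return [{**r, **s} for r in r_chunk for s in index.get(r["B"], [])]


def parallel_hash_join(r_chunks, s_chunks):
    P = len(r_chunks)
    r_new = reshuffle(r_chunks, "B", P)   # R'1 .. R'P
    s_new = reshuffle(s_chunks, "B", P)   # S'1 .. S'P
    # Matching tuples share the same B, hence the same server.
    return [row for i in range(P) for row in local_hash_join(r_new[i], s_new[i])]


R = [{"K1": i, "A": "a%d" % i, "B": i % 3} for i in range(6)]
S = [{"K2": j, "B": j % 2, "C": "c%d" % j} for j in range(4)]
r_chunks = [R[0:2], R[2:4], R[4:6]]
s_chunks = [S[0:2], S[2:3], S[3:4]]
print(len(parallel_hash_join(r_chunks, s_chunks)))   # 8
```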
![Page 40: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/40.jpg)
Speedup and Scaleup
• Consider:
  – Query: γA,sum(C)(R)
  – Runtime: dominated by reading chunks from disk
• If we double the number of nodes P, what is the new running time?
• If we double both P and the size of R, what is the new running time?
![Page 41: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/41.jpg)
Speedup and Scaleup
• Consider:
  – Query: γA,sum(C)(R)
  – Runtime: dominated by reading chunks from disk
• If we double the number of nodes P, what is the new running time?
  – Half (each server holds ½ as many chunks)
• If we double both P and the size of R, what is the new running time?
  – Same (each server holds the same # of chunks)
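The arithmetic behind both answers fits in a one-line cost model. The sketch below assumes the data is spread evenly, every server reads its share in parallel at the same rate, and nothing else contributes to the runtime; the sizes and the disk rate are made-up numbers for illustration.

```python
def runtime_seconds(size_bytes, P, bytes_per_second_per_server):
    """Runtime when it is dominated by each server reading size/P bytes from disk."""
    return size_bytes / (P * bytes_per_second_per_server)


SIZE = 10**12   # hypothetical: 1 TB relation
RATE = 10**8    # hypothetical: 100 MB/s per server

base = runtime_seconds(SIZE, 10, RATE)
speedup_case = runtime_seconds(SIZE, 20, RATE)       # double P
scaleup_case = runtime_seconds(2 * SIZE, 20, RATE)   # double P and the size of R

print(base, speedup_case, scaleup_case)   # 1000.0 500.0 1000.0
```

Skew, network cost during a reshuffle, and startup overhead all break this ideal model in practice, which is why measured speedup is usually below linear.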
![Page 42: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/42.jpg)
Uniform Data vs. Skewed Data
• Let R(K,A,B,C); which of the following partition methods may result in skewed partitions?
• Block partition
• Hash-partition
  – On the key K
  – On the attribute A
![Page 43: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/43.jpg)
Uniform Data vs. Skewed Data
• Let R(K,A,B,C); which of the following partition methods may result in skewed partitions?
• Block partition: uniform
• Hash-partition
  – On the key K: uniform, assuming a good hash function
  – On the attribute A: may be skewed. E.g. when all records have the same value of the attribute A, then all records end up in the same partition
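A small experiment makes the skewed case concrete. The sketch assumes round-robin block partitioning and a CRC32 stand-in for the hash function, and builds a deliberately extreme relation in which every record has the same value of A.

```python
import zlib


def h(value):
    """Deterministic stand-in for the hash function (assumption: CRC32)."""
    return zlib.crc32(str(value).encode("utf-8"))


def partition_sizes(tuples, P, attr=None):
    """Chunk sizes under block partitioning (attr=None) or hash partitioning on attr."""
    sizes = [0] * P
    for n, t in enumerate(tuples):
        i = n % P if attr is None else h(t[attr]) % P
        sizes[i] += 1
    return sizes


# Extreme case: K is a key, but every record has A = "x".
R = [{"K": k, "A": "x", "B": k, "C": 1} for k in range(9000)]

block = partition_sizes(R, 3)
on_key = partition_sizes(R, 3, "K")
on_a = partition_sizes(R, 3, "A")

print("block partition:", block)            # [3000, 3000, 3000]
print("hash on key K:  ", on_key)           # roughly even
print("hash on attr A: ", sorted(on_a))     # [0, 0, 9000]
```

With the hash on A, one server does all the work and the other two sit idle, so adding servers gives no speedup for this relation.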
![Page 44: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/44.jpg)
Broadcast Join
• Data: R(K1,A,B), S(K2,B,C)
• Query: R(K1,A,B) ⋈ S(K2,B,C)
Initially, R is horizontally partitioned on K1 and S on K2
Keep R in place
Broadcast S
Each server computes the join locally
[Figure: chunk pairs (R1, S1), (R2, S2), …, (RP, SP) become (R1, S), (R2, S), …, (RP, S)]
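A sketch of the broadcast join in the same simulated setting (dictionaries for tuples, lists for servers; the function name is illustrative). R never moves; every server receives all of S and joins it with its local chunk of R. The plan is usually attractive when S is small, since the network cost is P copies of S rather than one pass over both relations.

```python
from collections import defaultdict


def broadcast_join(r_chunks, s_chunks):
    # Broadcast: every server ends up with a full copy of S.
    s_full = [s for chunk in s_chunks for s in chunk]
    output = []
    for r_chunk in r_chunks:          # R stays in place
        index = defaultdict(list)     # each server builds its own hash table on S.B
        for s in s_full:
            index[s["B"]].append(s)
        for r in r_chunk:             # probe with the local chunk of R
            for s in index.get(r["B"], []):
                output.append({**r, **s})
    return output


R = [{"K1": i, "A": "a%d" % i, "B": i % 3} for i in range(6)]
S = [{"K2": j, "B": j % 2, "C": "c%d" % j} for j in range(4)]
rows = broadcast_join([R[0:2], R[2:4], R[4:6]], [S[0:2], S[2:3], S[3:4]])
print(len(rows))   # 8
```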
![Page 45: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/45.jpg)
Example: Teradata – Loading
AMP = “Access Module Processor” = unit of parallelism
![Page 46: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/46.jpg)
Example: Teradata – Query Execution
Find all orders from today, along with the items ordered
Order(oid, item, date), Line(item, …)
SELECT *
FROM Order o, Line i
WHERE o.item = i.item
  AND o.date = today()
[Figure: query plan. scan Order o → select date = today(); scan Item i; both feed the join o.item = i.item]
![Page 47: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/47.jpg)
Query Execution
Order(oid, item, date), Line(item, …)
[Figure: on each of AMP 1, AMP 2, AMP 3: scan Order o → select date=today() → hash h(o.item); the hashed tuples are sent to AMP 1, AMP 2, AMP 3. Plan fragment covered: scan Order o, select date = today(), input to the join o.item = i.item]
![Page 48: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/48.jpg)
Query Execution
Order(oid, item, date), Line(item, …)
[Figure: on each of AMP 1, AMP 2, AMP 3: scan Item i → hash h(i.item); the hashed tuples are sent to AMP 1, AMP 2, AMP 3. Plan fragment covered: scan Item i, input to the join o.item = i.item]
![Page 49: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/49.jpg)
Query Execution
Order(oid, item, date), Line(item, …)
[Figure: AMP 1, AMP 2, AMP 3 each compute the join o.item = i.item locally]
• AMP 1 contains all orders and all lines where hash(item) = 1
• AMP 2 contains all orders and all lines where hash(item) = 2
• AMP 3 contains all orders and all lines where hash(item) = 3
![Page 50: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/50.jpg)
MapReduce
![Page 51: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/51.jpg)
Cluster Computing
• Commodity servers, high speed network
• Servers → Racks → Data centers
• Massive parallelism:
  – 100s, or 1000s, or 10000s servers
  – Many hours
• Failure:
  – If the mean time between failures (MTBF) is 1 year
  – Then 10000 servers have about one failure / hour
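The failure estimate is a one-line calculation. The sketch assumes servers fail independently and that failures are spread evenly over the year, which is a simplification.

```python
HOURS_PER_YEAR = 365 * 24   # 8760


def expected_failures_per_hour(num_servers, mtbf_hours):
    """Each server fails on average once per mtbf_hours, so the rates add up."""
    return num_servers / mtbf_hours


rate = expected_failures_per_hour(10_000, HOURS_PER_YEAR)
print(round(rate, 2))   # 1.14, i.e. roughly one failure per hour
```

At that rate a job that runs for many hours should expect to see several server failures, so the framework has to tolerate them rather than restart the whole job.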
![Page 52: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/52.jpg)
Distributed File System (DFS)
• For very large files: TBs, PBs
• File is partitioned into chunks, e.g. 64MB
• Each chunk is replicated, e.g. 3 times
• Implementations:
  – Google’s DFS: GFS, proprietary
  – Hadoop’s DFS: HDFS, open source
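To get a feel for the numbers, the sketch below counts chunks and replicas for a 1 TB file using the slide's example values (64 MB chunks, 3 replicas). The placement function is purely hypothetical (consecutive servers, wrapping around) and is not the policy used by GFS or HDFS; it only illustrates that the replicas of a chunk live on different servers.

```python
CHUNK_BYTES = 64 * 2**20   # 64 MB chunks, as in the slide's example
REPLICAS = 3               # each chunk stored 3 times


def chunk_count(file_bytes, chunk_bytes=CHUNK_BYTES):
    """Number of chunks needed for the file (ceiling division)."""
    return -(-file_bytes // chunk_bytes)


def place_replicas(chunk_id, num_servers, replicas=REPLICAS):
    """Hypothetical placement: consecutive servers, wrapping around."""
    return [(chunk_id + j) % num_servers for j in range(replicas)]


one_tb = 2**40
n = chunk_count(one_tb)
print(n, n * REPLICAS)          # 16384 49152
print(place_replicas(9, 10))    # [9, 0, 1]
```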
![Page 53: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/53.jpg)
Map Reduce
• Google: paper published 2004
• Free variant: Hadoop
• Map-reduce = high-level programming model and implementation for large-scale parallel data processing
![Page 54: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/54.jpg)
Data Model
Files!
A file = a bag of (key, value) pairs
A MapReduce program:
• Input: a bag of (inputkey, value) pairs
• Output: a bag of (outputkey, value) pairs
![Page 55: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/55.jpg)
Step 1: the MAP Phase
User provides the MAP function:
• Input: (input key, value)
• Output: bag of (intermediate key, value)
System applies the map function in parallel to all (input key, value) pairs in the input file
![Page 56: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/56.jpg)
Step 2: the REDUCE Phase
User provides the REDUCE function:
• Input: (intermediate key, bag of values)
• Output: bag of output (values)
System groups all pairs with the same intermediate key, and passes the bag of values to the REDUCE function
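The division of labor between user and system can be sketched with a tiny single-process driver. Here `map_reduce` plays the role of the system and is an illustrative name, not an API of Hadoop or of Google's implementation; the user-supplied functions compute the group-by query γA,sum(C)(R) from the earlier slides, with each input value being a pair (A, C).

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Toy driver: apply map_fn everywhere, group by intermediate key, apply reduce_fn."""
    # MAP phase: one call per (input key, value) pair.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))

    # The system groups all pairs with the same intermediate key.
    groups = defaultdict(list)
    for ikey, ivalue in intermediate:
        groups[ikey].append(ivalue)

    # REDUCE phase: one call per (intermediate key, bag of values).
    output = []
    for ikey, values in groups.items():
        output.extend(reduce_fn(ikey, values))
    return output


def map_fn(k, value):
    a, c = value
    return [(a, c)]               # intermediate key = A, value = C


def reduce_fn(a, cs):
    return [(a, sum(cs))]         # one output per group


R = [(1, ("x", 10)), (2, ("y", 5)), (3, ("x", 7))]
print(sorted(map_reduce(R, map_fn, reduce_fn)))   # [('x', 17), ('y', 5)]
```

In a real cluster the map calls and the reduce calls run on many workers, and the grouping step involves moving data across the network.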
![Page 57: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/57.jpg)
Example
• Counting the number of occurrences of each word in a large collection of documents
• Each Document
  – The key = document id (did)
  – The value = set of words (word)

    map(String key, String value):
      // key: document name
      // value: document contents
      for each word w in value:
        EmitIntermediate(w, "1");

    reduce(String key, Iterator values):
      // key: a word
      // values: a list of counts
      int result = 0;
      for each v in values:
        result += ParseInt(v);
      Emit(AsString(result));
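A runnable Python rendering of the word-count pseudocode above, for experimentation on a single machine. Two choices here are assumptions of this sketch rather than part of the slide: the grouping step is simulated by sorting the intermediate pairs and grouping adjacent equal keys, and the reduce function emits the word together with its count so the output is self-describing.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(doc_id, contents):
    """key: document id, value: document contents."""
    for w in contents.split():
        yield (w, "1")


def reduce_fn(word, counts):
    """key: a word, values: an iterator over counts (as strings)."""
    result = 0
    for v in counts:
        result += int(v)
    yield (word, str(result))


def run_word_count(documents):
    # MAP phase over all (did, contents) pairs.
    intermediate = [pair for did, text in documents.items()
                    for pair in map_fn(did, text)]
    # Simulated shuffle: sort by word, then group adjacent equal words.
    intermediate.sort(key=itemgetter(0))
    output = {}
    for word, group in groupby(intermediate, key=itemgetter(0)):
        for w, total in reduce_fn(word, (v for _, v in group)):
            output[w] = total
    return output


docs = {"did1": "Bob saw the cat", "did2": "the cat saw Bob", "did3": "the end"}
print(run_word_count(docs))
# {'Bob': '2', 'cat': '2', 'end': '1', 'saw': '2', 'the': '3'}
```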
![Page 58: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/58.jpg)
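[Figure: word-count dataflow. MAP turns the input pairs (did1,v1), (did2,v2), (did3,v3), … into intermediate pairs (Bob,1), (the,1), (Bob,1), …, (of,1), (to,1), …. Shuffle groups them by key into (of, (1,1,1,…,1)), (the, (1,1,…)), (Bob, (1…)), …. REDUCE outputs (of, 25), (the, 77), (Bob, 12), …]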
![Page 59: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/59.jpg)
Jobs vs. Tasks
What is the difference?
• A MapReduce Job
  – One single "query", e.g. count the words in all docs
  – More complex queries may consist of multiple jobs
• A Map Task, or a Reduce Task
  – A group of instantiations of the map or reduce function, which are scheduled on a single worker
![Page 61: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/61.jpg)
Map Tasks, Reduce Tasks
• What are they?
• How is their number determined?
• What are the pros/cons in having small/large number of tasks?
![Page 62: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/62.jpg)
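[Figure: the word-count data flow again, (did1,v1), (did2,v2), (did3,v3), … → (Bob,1), (the,1), (of,1), (to,1), … → Shuffle → (of, (1,1,1,…,1)), (the, (1,1,…)), (Bob,(1…)), … → (of, 25), (the, 77), (Bob, 12), …. The map side is labeled MAP Tasks, the reduce side is labeled REDUCE Tasks, and the whole pipeline is labeled MapReduce Job.]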
![Page 63: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/63.jpg)
MapReduce Execution Details
[Figure: Map tasks → (Shuffle) → Reduce tasks. Annotations: data not necessarily local; intermediate data goes to local disk; output to disk, replicated in cluster; file system: GFS or HDFS.]
![Page 64: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/64.jpg)
Workers
• A worker is a process that executes one task at a time
• Typically there is one worker per processor, hence 4 or 8 per node
![Page 65: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/65.jpg)
MR Phases
• Each Map and Reduce task has multiple phases:
[Figure: phases of a Map task and a Reduce task; surviving label: "Local storage"]
![Page 66: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/66.jpg)
Implementation
• There is one master node
• Master partitions input file into M splits, by key
• Master assigns workers (=servers) to the M map tasks, keeps track of their progress
• Workers write their output to local disk, partition into R regions
• Master assigns workers to the R reduce tasks
• Reduce workers read regions from the map workers' local disks
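The "partition into R regions" step can be sketched as follows. This is an illustration only: the slides do not say which partitioning function is used, so hashing the key with CRC32 modulo R is an assumption, and `region_of` and `partition_map_output` are hypothetical helper names.

```python
import zlib

def region_of(key, R):
    """Pick the reduce region for an intermediate key (stable across processes)."""
    return zlib.crc32(key.encode("utf-8")) % R

def partition_map_output(pairs, R):
    """Split one map task's output into R regions, one per reduce task."""
    regions = [[] for _ in range(R)]
    for key, value in pairs:
        regions[region_of(key, R)].append((key, value))
    return regions

# Hypothetical output of one map task in the word-count job.
pairs = [("Bob", 1), ("the", 1), ("Bob", 1), ("of", 1), ("to", 1)]
regions = partition_map_output(pairs, R=3)
```

Because every map worker uses the same function, all pairs with the same key land in the same region number, so reduce task i only has to fetch region i from each map worker's local disk.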
![Page 67: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/67.jpg)
Interesting Implementation Details
Worker failure:
• Master pings workers periodically
• If a worker is down, the master reassigns its task to another worker
![Page 68: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/68.jpg)
Interesting Implementation Details
Backup tasks:
• Straggler = a machine that takes an unusually long time to complete one of the last tasks. E.g.:
  – Bad disk forces frequent correctable errors (30MB/s → 1MB/s)
  – The cluster scheduler has scheduled other tasks on that machine
• Stragglers are a main reason for slowdown
• Solution: pre-emptive backup execution of the last few remaining in-progress tasks
![Page 69: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/69.jpg)
MapReduce Summary
• Hides scheduling and parallelization details
• However, very limited queries
  – Difficult to write more complex queries
  – Need multiple MapReduce jobs
• Solution: declarative query language
  – PigLatin, Dremel (SQL), HiveQL (SQL)
![Page 71: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/71.jpg)
Hash Join in MR

map([String key], String value):
  // value.relation is either 'User' or 'Page'
  if value.relation='User':
    EmitIntermediate(value.name, (1, value));
  else // value.relation='Page':
    EmitIntermediate(value.user, (2, value));

reduce(String user, Iterator values):
  User = empty; Page = empty;
  for each v in values:
    if v.type = 1: User.insert(v) else Page.insert(v);
  foreach v1 in User, v2 in Page
    Emit(v1,v2);

Relying entirely on the MR system to do the hashing
User(name, age) ⋈_{name=user} Page(user, url)
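The join above can be tried out with a small single-process simulation. This is a sketch under assumptions: tuples are modeled as Python dictionaries with a `relation` field, the shuffle is an in-memory dictionary, and `hash_join` and the sample tuples are hypothetical.

```python
from collections import defaultdict

def map_fn(record):
    """Tag each tuple with its relation and emit it under the join key."""
    if record["relation"] == "User":
        return [(record["name"], (1, record))]
    return [(record["user"], (2, record))]      # relation == "Page"

def reduce_fn(user, values):
    """Separate the two relations, then emit their cross product for this key."""
    users = [v for tag, v in values if tag == 1]
    pages = [v for tag, v in values if tag == 2]
    return [(u, p) for u in users for p in pages]

def hash_join(records):
    groups = defaultdict(list)                  # stands in for the shuffle
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reduce_fn(key, values))
    return output

# Hypothetical sample data: carol has a page but no User tuple.
records = [
    {"relation": "User", "name": "alice", "age": 30},
    {"relation": "User", "name": "bob", "age": 25},
    {"relation": "Page", "user": "alice", "url": "a.example/1"},
    {"relation": "Page", "user": "alice", "url": "a.example/2"},
    {"relation": "Page", "user": "carol", "url": "c.example"},
]
joined = hash_join(records)
```

Since the intermediate key is the join attribute itself, every reduce call sees exactly one user name, so no equality check is needed inside the nested loop.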
![Page 72: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/72.jpg)
Hash Join in MR
Controlling the hash function

map([String key], String value):
  // value.relation is either 'User' or 'Page'
  if value.relation='User':
    EmitIntermediate(h(value.name), (1, value));
  else // value.relation='Page':
    EmitIntermediate(h(value.user), (2, value));

reduce(String user, Iterator values):
  User = empty; Page = empty;
  foreach v in values:
    if v.type = 1: User.insert(v) else Page.insert(v);
  foreach v1 in User, v2 in Page
    if v1.name=v2.user: Emit(v1,v2);

User(name, age) ⋈_{name=user} Page(user, url)
![Page 73: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/73.jpg)
Broadcast Join in MR
Assume Page is huge, User is smaller
Broadcast join does not shuffle Page; instead broadcasts User
Sketch the Map and Reduce functions (in class):
User(name, age) ⋈_{name=user} Page(user, url)
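One possible answer to the exercise, offered as a sketch rather than as the solution given in class: each map task loads the whole (small) User relation into a hash table and probes it with its own split of Page, so nothing is shuffled and no reduce work is needed. The names `build_user_index` and `broadcast_join` and the sample data are hypothetical.

```python
def build_user_index(users):
    """Every map task loads the full (small) User relation into a hash table."""
    index = {}
    for user in users:
        index.setdefault(user["name"], []).append(user)
    return index

def map_fn(page, user_index):
    """Probe the broadcast hash table with one Page tuple; emit joined tuples."""
    return [(user, page) for user in user_index.get(page["user"], [])]

def broadcast_join(users, page_splits):
    output = []
    for split in page_splits:                   # one map task per split of Page
        user_index = build_user_index(users)    # broadcast copy, local to the task
        for page in split:
            output.extend(map_fn(page, user_index))
    return output                               # no shuffle and no reduce work

# Hypothetical sample data: Page is stored in two splits.
users = [{"name": "alice", "age": 30}, {"name": "bob", "age": 25}]
page_splits = [
    [{"user": "alice", "url": "a.example/1"}, {"user": "carol", "url": "c.example"}],
    [{"user": "bob", "url": "b.example"}, {"user": "alice", "url": "a.example/2"}],
]
joined = broadcast_join(users, page_splits)
```

The trade-off is that User is copied to every map task, which only pays off when User is much smaller than Page and fits in each worker's memory.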
![Page 74: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/74.jpg)
Transactions
![Page 75: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/75.jpg)
Outline
• Transaction basics
• Recovery – Start today, continue next week
• Concurrency control
![Page 76: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/76.jpg)
Reading Material for Lectures 6/7
Textbook (Ramakrishnan): Ch. 16, 17, 18
Second textbook (Garcia-Molina)
• Ch. 17.2, 17.3, 17.4
• Ch. 18.1, 18.2, 18.3, 18.8, 18.9
Optional: M. Franklin, Concurrency Control and Recovery
![Page 77: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/77.jpg)
Transaction
BEGIN TRANSACTION
   [SQL statements]
COMMIT or ROLLBACK (=ABORT)
BEGIN TRANSACTION may be omitted: first SQL query starts txn
In ad-hoc SQL: each statement = one transaction
Definition: a transaction is a sequence of updates to the database with the property that either all complete, or none completes (all-or-nothing).
![Page 78: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/78.jpg)
Implementing Transactions
• System crash
  – Software failure (e.g. division by 0)
  – Hardware failure (e.g. power failure)
• Interferences with other users
  – "Anomalies": 3 have famous names
![Page 79: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/79.jpg)
System Crash
Client 1:
BEGIN TRANSACTION
UPDATE Account1 SET balance = balance - 500
UPDATE Account2 SET balance = balance + 500
COMMIT
Crash !
![Page 80: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/80.jpg)
1st Famous Anomaly: Lost Update
Client 1:
BEGIN TRANSACTION
UPDATE Account1 SET balance = balance+$100
COMMIT
Client 2:
BEGIN TRANSACTION
UPDATE Account1 SET balance = balance-$100
COMMIT
Lost update: two TXN's update the same element, but only one succeeds.
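A lost update can be reproduced with a tiny schedule replay. This sketch assumes that each UPDATE executes as a separate read step followed by a write step and that nothing prevents the two clients from interleaving; the slide does not spell out the interleaving, and `run_schedule` and the starting balance of 500 are hypothetical.

```python
def run_schedule(schedule, initial_balance):
    """Replay read/write steps of two clients against one shared balance."""
    db = {"Account1": initial_balance}
    local = {}                                  # each client's private copy
    for client, action, delta in schedule:
        if action == "read":
            local[client] = db["Account1"]
        else:                                   # "write": store local value + delta
            db["Account1"] = local[client] + delta
    return db["Account1"]

# Client 1 adds 100, client 2 subtracts 100.
serial = [(1, "read", 0), (1, "write", +100), (2, "read", 0), (2, "write", -100)]
interleaved = [(1, "read", 0), (2, "read", 0), (1, "write", +100), (2, "write", -100)]

serial_result = run_schedule(serial, 500)            # 500: both updates applied
interleaved_result = run_schedule(interleaved, 500)  # 400: client 1's update is lost
```

In the interleaved schedule both clients read 500, so client 2's write of 400 overwrites client 1's 600 and the +100 disappears.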
![Page 81: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/81.jpg)
2nd Famous Anomaly: Inconsistent Read
Client 1: transfer $100
BEGIN TRANSACTION
UPDATE Account1 SET balance = balance+$100
UPDATE Account2 SET balance = balance-$100
COMMIT
Client 2: check total balance
BEGIN TRANSACTION
SELECT sum(balance) FROM All_Accounts
COMMIT
Inconsistent read: TXN sees some updates by another TXN, but not all updates.
![Page 82: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/82.jpg)
3rd Famous Anomaly: Dirty Reads
Dirty read: TXN reads a value written by another transaction that later aborts.
-- Client 1:
BEGIN TRANSACTION
UPDATE Account1 SET balance = balance+$100
. . .
ROLLBACK
-- Client 2: get cash $100
BEGIN TRANSACTION
X = Account1.balance
If (X>=100) {
   …dispense money…
   COMMIT
}
. . .
![Page 83: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/83.jpg)
ACID Properties
• Atomic
  – State shows either all the effects of txn, or none of them
• Consistent
  – Txn moves from a state where integrity holds, to another where integrity holds
• Isolated
  – Effect of txns is the same as txns running one after another (i.e. looks like batch mode)
• Durable
  – Once a txn has committed, its effects remain in the database
![Page 84: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/84.jpg)
Outline
• Recovery from failures (the A in ACID)
  – Today and next week
• Concurrency Control (the I in ACID)
  – Next week
![Page 85: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/85.jpg)
Log-based Recovery
Basics (based on Garcia-Molina Ch. 17.2, 17.3, 17.4)
• Undo logging
• Redo logging
Aries: (Ramakrishnan Ch. 18)
![Page 86: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/86.jpg)
Transaction Abstraction
• Database is composed of elements.
• 1 element can be either:
  – 1 page = physical logging
  – 1 record = logical logging
• Aries uses both (will discuss later)
![Page 87: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/87.jpg)
Primitive Operations of Transactions
• READ(X,t) – copy element X to transaction local variable t
• WRITE(X,t) – copy transaction local variable t to element X
• INPUT(X) – read element X to memory buffer
• OUTPUT(X) – write element X to disk
![Page 88: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/88.jpg)
Running Example
Initially, A=B=8. Atomicity requires that either (1) T commits and A=B=16, or (2) T does not commit and A=B=8.
BEGIN TRANSACTION
READ(A,t); t := t*2; WRITE(A,t);
READ(B,t); t := t*2; WRITE(B,t)
COMMIT;
![Page 89: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/89.jpg)
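Transaction: READ(A,t); t := t*2; WRITE(A,t); READ(B,t); t := t*2; WRITE(B,t)
Columns: t is the transaction's local variable, Mem A and Mem B are in the buffer pool, Disk A and Disk B are on disk.

| Action | t | Mem A | Mem B | Disk A | Disk B |
|---|---|---|---|---|---|
| INPUT(A) | | 8 | | 8 | 8 |
| READ(A,t) | 8 | 8 | | 8 | 8 |
| t:=t*2 | 16 | 8 | | 8 | 8 |
| WRITE(A,t) | 16 | 16 | | 8 | 8 |
| INPUT(B) | 16 | 16 | 8 | 8 | 8 |
| READ(B,t) | 8 | 16 | 8 | 8 | 8 |
| t:=t*2 | 16 | 16 | 8 | 8 | 8 |
| WRITE(B,t) | 16 | 16 | 16 | 8 | 8 |
| OUTPUT(A) | 16 | 16 | 16 | 16 | 8 |
| OUTPUT(B) | 16 | 16 | 16 | 16 | 16 |
| COMMIT | | | | | |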
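The trace above can be reproduced with a toy model of the four primitives. This is only a sketch: the `Database` class and the `disk_after` helper are hypothetical, a crash is modeled simply as "stop executing and look at what is on disk", and no log is kept yet.

```python
class Database:
    """Toy model of the four primitives: disk, buffer pool, one local variable t."""
    def __init__(self, disk):
        self.disk = dict(disk)   # persistent elements
        self.mem = {}            # buffer pool (lost on a crash)
        self.t = None            # transaction-local variable

    def INPUT(self, x):          # disk -> buffer pool
        self.mem[x] = self.disk[x]

    def READ(self, x):           # buffer pool -> t (fetch the element first if needed)
        if x not in self.mem:
            self.INPUT(x)
        self.t = self.mem[x]

    def WRITE(self, x):          # t -> buffer pool
        self.mem[x] = self.t

    def OUTPUT(self, x):         # buffer pool -> disk
        self.disk[x] = self.mem[x]

def disk_after(num_steps):
    """Run the first num_steps actions of T, then crash; return what is on disk."""
    db = Database({"A": 8, "B": 8})
    def double():
        db.t = db.t * 2
    actions = [
        lambda: db.INPUT("A"), lambda: db.READ("A"), double, lambda: db.WRITE("A"),
        lambda: db.INPUT("B"), lambda: db.READ("B"), double, lambda: db.WRITE("B"),
        lambda: db.OUTPUT("A"), lambda: db.OUTPUT("B"),
    ]
    for action in actions[:num_steps]:
        action()
    return db.disk

before_outputs = disk_after(8)    # crash before OUTPUT(A)
between_outputs = disk_after(9)   # crash between OUTPUT(A) and OUTPUT(B)
after_outputs = disk_after(10)    # crash after OUTPUT(B)
```

Only the middle case leaves the disk in a state (A=16, B=8) that no complete or absent execution of T could produce, which is why recovery needs extra information such as a log.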
![Page 91: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/91.jpg)
Is this bad ?
Crash ! (same schedule as before; the crash comes after OUTPUT(A) and before OUTPUT(B))
Yes it's bad: A=16, B=8 on disk
![Page 93: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/93.jpg)
Is this bad ?
Crash ! (same schedule as before; the crash comes after OUTPUT(B) and before COMMIT)
Yes it's bad: A=B=16, but not committed
![Page 95: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/95.jpg)
Is this bad ?
Crash ! (same schedule as before)
No: that's OK
For example, a crash before OUTPUT(A) leaves A=B=8 on disk with T not committed, and a crash after COMMIT leaves A=B=16 with T committed; both states are consistent.
![Page 97: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/97.jpg)
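Typically, OUTPUT is after COMMIT (why?)

| Action | t | Mem A | Mem B | Disk A | Disk B |
|---|---|---|---|---|---|
| INPUT(A) | | 8 | | 8 | 8 |
| READ(A,t) | 8 | 8 | | 8 | 8 |
| t:=t*2 | 16 | 8 | | 8 | 8 |
| WRITE(A,t) | 16 | 16 | | 8 | 8 |
| INPUT(B) | 16 | 16 | 8 | 8 | 8 |
| READ(B,t) | 8 | 16 | 8 | 8 | 8 |
| t:=t*2 | 16 | 16 | 8 | 8 | 8 |
| WRITE(B,t) | 16 | 16 | 16 | 8 | 8 |
| COMMIT | | | | | |
| OUTPUT(A) | 16 | 16 | 16 | 16 | 8 |
| OUTPUT(B) | 16 | 16 | 16 | 16 | 16 |

Crash !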
![Page 98: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/98.jpg)
Atomic Transactions
• FORCE or NO-FORCE
  – Should all updates of a transaction be forced to disk before the transaction commits?
• STEAL or NO-STEAL
  – Can an update made by an uncommitted transaction overwrite the most recent committed value of a data item on disk?
![Page 99: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/99.jpg)
Force/No-steal
• FORCE: Pages of committed transactions must be forced to disk before commit
• NO-STEAL: Pages of uncommitted transactions cannot be written to disk
Easy to implement (how?) and ensures atomicity
![Page 100: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/100.jpg)
No-Force/Steal
• NO-FORCE: Pages of committed transactions need not be written to disk
• STEAL: Pages of uncommitted transactions may be written to disk
In either case, Atomicity is violated; need WAL
![Page 101: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/101.jpg)
Write-Ahead Log
The Log: append-only file containing log records
• Records every single action of every TXN
• Force log entry to disk
• After a system crash, use log to recover
Three types: UNDO, REDO, UNDO-REDO
![Page 102: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/102.jpg)
UNDO Log
FORCE and STEAL
![Page 103: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/103.jpg)
Undo Logging
Log records
• <START T>
  – transaction T has begun
• <COMMIT T>
  – T has committed
• <ABORT T>
  – T has aborted
• <T,X,v>
  – T has updated element X, and its old value was v
![Page 104: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/104.jpg)
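| Action | t | Mem A | Mem B | Disk A | Disk B | UNDO Log |
|---|---|---|---|---|---|---|
| | | | | | | <START T> |
| INPUT(A) | | 8 | | 8 | 8 | |
| READ(A,t) | 8 | 8 | | 8 | 8 | |
| t:=t*2 | 16 | 8 | | 8 | 8 | |
| WRITE(A,t) | 16 | 16 | | 8 | 8 | <T,A,8> |
| INPUT(B) | 16 | 16 | 8 | 8 | 8 | |
| READ(B,t) | 8 | 16 | 8 | 8 | 8 | |
| t:=t*2 | 16 | 16 | 8 | 8 | 8 | |
| WRITE(B,t) | 16 | 16 | 16 | 8 | 8 | <T,B,8> |
| OUTPUT(A) | 16 | 16 | 16 | 16 | 8 | |
| OUTPUT(B) | 16 | 16 | 16 | 16 | 16 | |
| COMMIT | | | | | | <COMMIT T> |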
![Page 106: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/106.jpg)
WHAT DO WE DO ?
Crash ! (same schedule and undo log as before; the log on disk has no <COMMIT T>)
We UNDO by setting B=8 and A=8
![Page 108: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/108.jpg)
What do we do now ?
Crash ! (same schedule and undo log as before; the crash comes after <COMMIT T> has reached the log)
Nothing: log contains COMMIT
![Page 109: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/109.jpg)
Recovery with Undo Log
…
…
<T6,X6,v6>
…
…
<START T5>
<START T4>
<T1,X1,v1>
<T5,X5,v5>
<T4,X4,v4>
<COMMIT T5>
<T3,X3,v3>
<T2,X2,v2>
Crash !
Question 1: Which updates are undone ?
Question 2: How far back do we need to read in the log ?
Question 3: What happens if there is a second crash, during recovery ?
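One way to think about these questions is a backward scan of the log. The following is a simplified sketch of undo recovery, assuming the whole log is scanned (no checkpoints) and that a transaction with an <ABORT T> record has already been rolled back; the tuple encoding of log records and the name `undo_recover` are hypothetical.

```python
def undo_recover(log, disk):
    """Scan the undo log backwards and restore old values of unfinished txns."""
    finished = set()                       # txns with COMMIT or ABORT in the log
    for record in reversed(log):
        kind = record[0]
        if kind in ("COMMIT", "ABORT"):
            finished.add(record[1])
        elif kind == "UPDATE":             # <T,X,v>: v is the OLD value of X
            _, txn, element, old_value = record
            if txn not in finished:
                disk[element] = old_value  # same result no matter how often it runs
    return disk

# Running example: T doubled A and B, and the crash hit before <COMMIT T>.
log = [("START", "T"), ("UPDATE", "T", "A", 8), ("UPDATE", "T", "B", 8)]
recovered = undo_recover(log, {"A": 16, "B": 8})
recovered_again = undo_recover(log, dict(recovered))   # a second run changes nothing
committed = undo_recover(log + [("COMMIT", "T")], {"A": 16, "B": 16})
```

Because a COMMIT record is always written after the transaction's update records, the backward scan sees it first and can skip those updates. Restoring old values is idempotent, so a second crash during recovery is handled by simply running recovery again.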
![Page 110: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/110.jpg)
When must we force pages to disk ?
![Page 111: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/111.jpg)
RULES: log entry before OUTPUT before COMMIT
FORCE
![Page 112: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/112.jpg)
Undo-Logging Rules
U1: If T modifies X, then <T,X,v> must be written to disk before OUTPUT(X)
U2: If T commits, then OUTPUT(X) must be written to disk before <COMMIT T>
• Hence: OUTPUTs are done early, before the transaction commits
FORCE
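The two rules can be expressed as a small checker over the order in which things reach the disk. This is an illustrative sketch for a single transaction T; the event encoding and the function name `obeys_undo_rules` are hypothetical.

```python
def obeys_undo_rules(events):
    """Check rules U1 and U2 for a single transaction T.

    events is a list of ("LOG", X), ("OUTPUT", X) and ("COMMIT",) tuples, in the
    order in which they reach the disk.
    """
    logged = set()     # elements whose <T,X,v> record is on disk
    written = set()    # elements already OUTPUT to disk
    for event in events:
        if event[0] == "LOG":
            logged.add(event[1])
        elif event[0] == "OUTPUT":
            if event[1] not in logged:     # U1 violated: no undo record yet
                return False
            written.add(event[1])
        elif event[0] == "COMMIT":
            if not logged <= written:      # U2 violated: an update is not on disk
                return False
    return True

good = [("LOG", "A"), ("LOG", "B"), ("OUTPUT", "A"), ("OUTPUT", "B"), ("COMMIT",)]
output_before_log = [("OUTPUT", "A"), ("LOG", "A"), ("COMMIT",)]
commit_too_early = [("LOG", "A"), ("LOG", "B"), ("OUTPUT", "A"), ("COMMIT",), ("OUTPUT", "B")]
```

The first sequence matches the running example's order (log entry, then OUTPUT, then COMMIT); the other two each break one rule.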
![Page 113: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/113.jpg)
REDO Log
NO-FORCE and NO-STEAL
![Page 115: CSEP 544: Lecture 06 · CSEP544 - Fall 2015 4 . Announcement • Homework 3 (AWS) due this Friday! ... Bytes Written 0 0 0 File Input Format Counters Bytes Read 0 0 0 FileSystemCounters](https://reader034.fdocuments.us/reader034/viewer/2022050604/5fab53968bff9b491d26010a/html5/thumbnails/115.jpg)
115
Action t Mem A Mem B Disk A Disk B
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8
COMMIT
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
Crash between OUTPUT(A) and OUTPUT(B): is this bad? Yes, it’s bad: the disk has A=16, B=8
Crash after COMMIT but before OUTPUT(A): is this bad? Yes, it’s bad: lost update (T committed, but the disk still has A=8, B=8)
Crash before COMMIT: is this bad? No: that’s OK (T did not commit, and nothing was written to disk)
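The three crash cases above can be reproduced with a small simulation. This is a sketch under stated assumptions: it replays the schedule from the table with no log at all, crashes after a chosen number of actions, and inspects what is left on disk. The names `SCHEDULE`, `disk_after_crash` and `verdict` are made up for this illustration, and the verdict logic is specific to this example (A and B both start at 8 and are both doubled).

```python
# The schedule from the table: (action, element or None).
SCHEDULE = [
    ("READ", "A"), ("DOUBLE", None), ("WRITE", "A"),
    ("READ", "B"), ("DOUBLE", None), ("WRITE", "B"),
    ("COMMIT", None),
    ("OUTPUT", "A"), ("OUTPUT", "B"),
]


def disk_after_crash(steps_completed):
    """Run the first `steps_completed` actions, then crash (memory is lost)."""
    disk = {"A": 8, "B": 8}
    mem = {}
    t = None
    committed = False
    for action, elem in SCHEDULE[:steps_completed]:
        if action == "READ":
            mem.setdefault(elem, disk[elem])  # bring the element into memory
            t = mem[elem]
        elif action == "DOUBLE":
            t = t * 2
        elif action == "WRITE":
            mem[elem] = t
        elif action == "COMMIT":
            committed = True
        elif action == "OUTPUT":
            disk[elem] = mem[elem]
    return disk, committed


def verdict(disk, committed):
    """Classify the surviving disk state for this particular example."""
    if disk["A"] != disk["B"]:
        return "bad: partial update on disk"
    if committed and disk["A"] == 8:
        return "bad: lost update"
    return "ok"


for n in range(len(SCHEDULE) + 1):
    disk, committed = disk_after_crash(n)
    print(n, disk, "committed" if committed else "not committed", verdict(disk, committed))
```

Without a log, the only safe crash points are before COMMIT and after the last OUTPUT, which is the motivation for the redo log introduced next.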
Redo Logging
One minor change to the undo log:
• <T,X,v> = T has updated element X, and its new value is v (the undo log stores the old value instead)
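| Action | t | Mem A | Mem B | Disk A | Disk B | REDO Log |
|---|---|---|---|---|---|---|
| | | | | | | <START T> |
| READ(A,t) | 8 | 8 | | 8 | 8 | |
| t:=t*2 | 16 | 8 | | 8 | 8 | |
| WRITE(A,t) | 16 | 16 | | 8 | 8 | <T,A,16> |
| READ(B,t) | 8 | 16 | 8 | 8 | 8 | |
| t:=t*2 | 16 | 16 | 8 | 8 | 8 | |
| WRITE(B,t) | 16 | 16 | 16 | 8 | 8 | <T,B,16> |
| COMMIT | | | | | | <COMMIT T> |
| OUTPUT(A) | 16 | 16 | 16 | 16 | 8 | |
| OUTPUT(B) | 16 | 16 | 16 | 16 | 16 | |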
Crash!
How do we recover? <COMMIT T> is in the log, so we REDO by setting A=16 and B=16
Recovery with Redo Log
<START T1>
<T1,X1,v1>
<START T2>
<T2,X2,v2>
<START T3>
<T1,X3,v3>
<COMMIT T2>
<T3,X4,v4>
<T1,X5,v5>
Crash!
Show actions during recovery
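One way to work through the exercise above is to code up the usual redo recovery procedure: find the transactions whose COMMIT record is in the log, then scan the log from the beginning and re-apply only their writes. The sketch below assumes a simple tuple encoding of the log records; `redo_recover` and the encoding are illustrative names, not taken from the lecture, and the values `v1` to `v5` are kept symbolic as on the slide.

```python
def redo_recover(log, disk):
    """Redo-log recovery sketch: replay the writes of committed transactions.

    log  -- records in log order: ("START", T), ("COMMIT", T),
            or ("WRITE", T, X, new_value)
    disk -- dict from element name to its value on disk (updated in place)
    """
    # Pass 1: which transactions have <COMMIT T> in the log?
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Pass 2: scan forward from the start and redo only committed writes.
    actions = []
    for rec in log:
        if rec[0] == "WRITE" and rec[1] in committed:
            _, txn, elem, value = rec
            disk[elem] = value
            actions.append(f"redo {elem} := {value}")
    return actions


# The log from the slide, up to the crash.
LOG = [
    ("START", "T1"), ("WRITE", "T1", "X1", "v1"),
    ("START", "T2"), ("WRITE", "T2", "X2", "v2"),
    ("START", "T3"), ("WRITE", "T1", "X3", "v3"),
    ("COMMIT", "T2"), ("WRITE", "T3", "X4", "v4"),
    ("WRITE", "T1", "X5", "v5"),
]

disk = {}
print(redo_recover(LOG, disk))  # ['redo X2 := v2']
```

Only T2 committed before the crash, so only its write is replayed. The records of T1 and T3 are ignored, which is safe because under redo logging none of their changes can have reached disk.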
When must we force pages to disk ?
RULE: OUTPUT after COMMIT
NO-STEAL
Redo-Logging Rules
R1: If T modifies X, then both <T,X,v> and <COMMIT T> must be written to disk before OUTPUT(X)
• Hence: OUTPUTs are done late
NO-STEAL
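Rule R1 can also be read as an ordering constraint: an element may be written to disk only after both its log record and the COMMIT record are on disk. The following sketch flags OUTPUTs that come too early. The trace format and the name `early_outputs` are assumptions for this example only.

```python
def early_outputs(trace):
    """Return the elements whose OUTPUT violates redo rule R1.

    trace lists events in the order they reach disk (hypothetical format):
      ("log", T, X), ("commit", T), ("output", T, X)
    """
    logged = set()     # (T, X) pairs whose redo record is on disk
    committed = set()  # transactions whose <COMMIT T> is on disk
    early = []
    for event in trace:
        kind, txn = event[0], event[1]
        if kind == "log":
            logged.add((txn, event[2]))
        elif kind == "commit":
            committed.add(txn)
        elif kind == "output":
            # R1: both <T,X,v> and <COMMIT T> must precede OUTPUT(X).
            if (txn, event[2]) not in logged or txn not in committed:
                early.append(event[2])
    return early


# The order from the redo-log table: OUTPUTs come after COMMIT.
late = [("log", "T", "A"), ("log", "T", "B"), ("commit", "T"),
        ("output", "T", "A"), ("output", "T", "B")]
# A page of an uncommitted transaction is written out (a "steal").
steal = [("log", "T", "A"), ("output", "T", "A"), ("log", "T", "B"),
         ("commit", "T"), ("output", "T", "B")]

print(early_outputs(late))   # []
print(early_outputs(steal))  # ['A']
```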
Comparison Undo/Redo
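• Undo logging: OUTPUT must be done early (before commit)
  – Inefficient: every modified page must be forced to disk before the transaction can commit
• Redo logging: OUTPUT must be done late (after commit)
  – Inflexible: dirty pages must stay in the buffer pool until the transaction commits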