Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example...
Transcript of Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example...
![Page 1: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/1.jpg)
Database Management SystemsCSEP 544
Lecture 7: Parallel Data processing
Conceptual Design
1CSEP 544 - Fall 2017
![Page 2: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/2.jpg)
2
Class overview• Data models
– Relational: SQL, RA, and Datalog– NoSQL: SQL++
• RDMBS internals– Query processing and optimization– Physical design
• Parallel query processing– Spark and Hadoop
• Conceptual design– E/R diagrams– Schema normalization
• Transactions– Locking and schedules– Writing DB applications
CSEP 544 - Fall 2017
Data models
UsingDBMS
Query Processing
![Page 3: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/3.jpg)
Parallel Data Processing @ 1990
CSEP 544 - Fall 2017 3
![Page 4: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/4.jpg)
CSEP 544 - Fall 2017 4
Data: R(A, B), S(C, D)Query: R(A,B) ⋈B=C S(C,D)
Broadcast Join
R1 R2 RP. . .
R’1, S R’2, S R’P, S. . .
Reshuffle R on R.B
Broadcast S
S
Why would you want to do this?
![Page 5: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/5.jpg)
Parallel Execution of RA Operators:Partitioned Hash-Join
• Data: R(K1, A, B), S(K2, B, C)• Query: R(K1, A, B) ⋈ S(K2, B, C)
– Initially, both R and S are partitioned on K1 and K2
5
R1, S1 R2, S2 RP, SP . . .
R’1, S’1 R’2, S’2 R’P, S’P . . .
Reshuffle R on R.Band S on S.B
Each server computesthe join locally
CSEP 544 - Fall 2017
![Page 6: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/6.jpg)
HyperCube Join• Have P number of servers (say P=27 or P=1000)• How do we compute this Datalog query in one step?
Q(x,y,z) = R(x,y),S(y,z),T(z,x)• Organize the P servers into a cube with side P⅓
– Thus, each server is uniquely identified by (i,j,k), i,j,k≤P⅓
• Step 1:– Each server sends R(x,y) to all servers (h(x),h(y),*)– Each server sends S(y,z) to all servers (*,h(y),h(z))– Each server sends T(x,z) to all servers (h(x),*,h(z))
• Final output:– Each server (i,j,k) computes the query R(x,y),S(y,z),T(z,x) locally
• Analysis: each tuple R(x,y) is replicated at most P⅓ times
i
j
CSEP 544 - Fall 2017 6
![Page 7: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/7.jpg)
Parallel Data Processing @ 2000
CSEP 544 - Fall 2017 7
![Page 8: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/8.jpg)
Example
• Counting the number of occurrences of each word in a large collection of documents
• Each Document– The key = document id (did)– The value = set of words (word)
map(String key, String value):// key: document name// value: document contentsfor each word w in value:
EmitIntermediate(w, “1”);8
reduce(String key, Iterator values):// key: a word// values: a list of countsint result = 0;for each v in values:
result += ParseInt(v);Emit(AsString(result));
![Page 9: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/9.jpg)
Interesting Implementation Details
Worker failure:
• Master pings workers periodically,
• If down then reassigns the task to another worker
CSEP 544 - Fall 2017 9
![Page 10: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/10.jpg)
Interesting Implementation Details
Backup tasks:• Straggler = a machine that takes unusually long
time to complete one of the last tasks. E.g.:– Bad disk forces frequent correctable errors (30MB/s à
1MB/s)– The cluster scheduler has scheduled other tasks on
that machine• Stragglers are a main reason for slowdown• Solution: pre-emptive backup execution of the
last few remaining in-progress tasks
CSEP 544 - Fall 2017 10
![Page 11: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/11.jpg)
Straggler Example
CSEP 544 - Fall 2017 11
time
Worker 3
Worker 2
Worker 1
Straggler
Backup execution
Killed
Killed
![Page 12: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/12.jpg)
Using MapReduce in Practice:
Implementing RA Operators in MR
![Page 13: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/13.jpg)
Relational Operators in MapReduce
Given relations R(A,B) and S(B, C) compute:
• Selection: σA=123(R)
• Group-by: γA,sum(B)(R)
• Join: R ⋈ S
CSEP 544 - Fall 2017 13
![Page 14: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/14.jpg)
Selection σA=123(R)
14
map(String value):if value.A = 123:
EmitIntermediate(value.key, value);
reduce(String k, Iterator values):for each v in values:
Emit(v);
![Page 15: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/15.jpg)
Selection σA=123(R)
15
map(String value):if value.A = 123:
EmitIntermediate(value.key, value);
reduce(String k, Iterator values):for each v in values:
Emit(v);No need for reduce.But need system hacking in Hadoopto remove reduce from MapReduce
![Page 16: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/16.jpg)
Group By γA,sum(B)(R)
16
map(String value):EmitIntermediate(value.A, value.B);
reduce(String k, Iterator values):s = 0for each v in values:
s = s + vEmit(k, s);
![Page 17: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/17.jpg)
Join
Two simple parallel join algorithms:
• Partitioned hash-join (we saw it, will recap)
• Broadcast join
CSEP 544 - Fall 2017 17
![Page 18: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/18.jpg)
Partitioned Hash-Join
CSEP 544 - Fall 2017 18
R1, S1 R2, S2 RP, SP . . .
R’1, S’1 R’2, S’2 R’P, S’P . . .
Reshuffle R on R.Band S on S.B
Each server computesthe join locally
Initially, both R and S are horizontally partitioned
R(A,B) ⋈B=C S(C,D)
![Page 19: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/19.jpg)
Partitioned Hash-Join
19
map(String value):case value.relationName of
‘R’: EmitIntermediate(value.B, (‘R’, value));‘S’: EmitIntermediate(value.C, (‘S’, value));
reduce(String k, Iterator values):R = empty; S = empty;for each v in values:
case v.type of:‘R’: R.insert(v)‘S’: S.insert(v);
for v1 in R, for v2 in SEmit(v1,v2);
R(A,B) ⋈B=C S(C,D)
![Page 20: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/20.jpg)
Broadcast Join
CSEP 544 - Fall 2017 20
R1 R2 RP. . .
R’1, S R’2, S R’P, S. . .
Reshuffle R on R.B
Broadcast S
S
R(A,B) ⋈B=C S(C,D)
![Page 21: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/21.jpg)
Broadcast Join
21
map(String value):open(S); /* over the network */hashTbl = new()for each w in S:
hashTbl.insert(w.B, w)close(S);
for each v in value:for each w in hashTbl.find(v.B)
Emit(v,w); reduce(…):/* empty: map-side only */
map should readseveral records of R:value = some group
of records
Read entire table S,build a Hash Table
R(A,B) ⋈B=C S(C,D)
![Page 22: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/22.jpg)
HW6
• HW6 will ask you to write SQL queries and MapReduce tasks using Spark
• You will get to “implement” SQL using MapReduce tasks– Can you beat Spark’s implementation?
![Page 23: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/23.jpg)
Conclusions
• MapReduce offers a simple abstraction, and handles distribution + fault tolerance
• Speedup/scaleup achieved by allocating dynamically map tasks and reduce tasks to available server. However, skew is possible (e.g., one huge reduce task)
• Writing intermediate results to disk is necessary for fault tolerance, but very slow.
• Spark replaces this with “Resilient Distributed Datasets” = main memory + lineage
![Page 24: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/24.jpg)
SparkA Case Study of the MapReduce
Programming Paradigm
CSEP 544 - Fall 2017 24
![Page 25: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/25.jpg)
Parallel Data Processing @ 2010
CSEP 544 - Fall 2017 25
![Page 26: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/26.jpg)
Issues with MapReduce
• Difficult to write more complex queries
• Need multiple MapReduce jobs: dramatically slows down because it writes all results to disk
26CSEP 544 - Fall 2017
![Page 27: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/27.jpg)
Spark
• Open source system from UC Berkeley• Distributed processing over HDFS• Differences from MapReduce:
– Multiple steps, including iterations– Stores intermediate results in main memory– Closer to relational algebra (familiar to you)
• Details: http://spark.apache.org/examples.html
![Page 28: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/28.jpg)
Spark• Spark supports interfaces in Java, Scala, and
Python– Scala: extension of Java with functions/closures
• We will illustrate use the Spark Java interface in this class
• Spark also supports a SQL interface (SparkSQL), and compiles SQL to its native Java interface
CSEP 544 - Fall 2017 28
![Page 29: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/29.jpg)
Resilient Distributed Datasets• RDD = Resilient Distributed Datasets
– A distributed, immutable relation, together with its lineage
– Lineage = expression that says how that relation was computed = a relational algebra plan
• Spark stores intermediate results as RDD• If a server crashes, its RDD in main memory
is lost. However, the driver (=master node) knows the lineage, and will simply recomputethe lost partition of the RDD
CSEP 544 - Fall 2017 29
![Page 30: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/30.jpg)
Programming in Spark• A Spark program consists of:
– Transformations (map, reduce, join…). Lazy– Actions (count, reduce, save...). Eager
• Eager: operators are executed immediately
• Lazy: operators are not executed immediately– A operator tree is constructed in memory instead– Similar to a relational algebra tree
• What are the benefits of lazy execution?
![Page 31: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/31.jpg)
The RDD Interface
![Page 32: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/32.jpg)
Programming in Spark
• RDD<T> = an RDD collection of type T– Partitioned, recoverable (through lineage), not
nested
• Seq<T> = a sequence– Local to a server, may be nested
![Page 33: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/33.jpg)
ExampleGiven a large log file hdfs://logfile.logretrieve all lines that:• Start with “ERROR”• Contain the string “sqlite”
s = SparkSession.builder()...getOrCreate();
lines = s.read().textFile(“hdfs://logfile.log”);
errors = lines.filter(l -> l.startsWith(“ERROR”));
sqlerrors = errors.filter(l -> l.contains(“sqlite”));
sqlerrors.collect();
![Page 34: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/34.jpg)
ExampleGiven a large log file hdfs://logfile.logretrieve all lines that:• Start with “ERROR”• Contain the string “sqlite”
s = SparkSession.builder()...getOrCreate();
lines = s.read().textFile(“hdfs://logfile.log”);
errors = lines.filter(l -> l.startsWith(“ERROR”));
sqlerrors = errors.filter(l -> l.contains(“sqlite”));
sqlerrors.collect();
lines, errors, sqlerrorshave type JavaRDD<String>
![Page 35: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/35.jpg)
s = SparkSession.builder()...getOrCreate();
lines = s.read().textFile(“hdfs://logfile.log”);
errors = lines.filter(l -> l.startsWith(“ERROR”));
sqlerrors = errors.filter(l -> l.contains(“sqlite”));
sqlerrors.collect();
TransformationsNot executed yet…TransformationsNot executed yet…Transformation:Not executed yet…
Action:triggers executionof entire program
Given a large log file hdfs://logfile.logretrieve all lines that:• Start with “ERROR”• Contain the string “sqlite”
Example
![Page 36: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/36.jpg)
errors = lines.filter(l -> l.startsWith(“ERROR”));
Recall: anonymous functions (lambda expressions) starting in Java 8
Example
class FilterFn implements Function<Row, Boolean>{ Boolean call (Row r) { return r.startsWith(“ERROR”); }
}
errors = lines.filter(new FilterFn());
is the same as:
![Page 37: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/37.jpg)
s = SparkSession.builder()...getOrCreate();
sqlerrors = s.read().textFile(“hdfs://logfile.log”).filter(l -> l.startsWith(“ERROR”)).filter(l -> l.contains(“sqlite”)).collect();
Given a large log file hdfs://logfile.logretrieve all lines that:• Start with “ERROR”• Contain the string “sqlite”
Example
“Call chaining” style
![Page 38: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/38.jpg)
MapReduce Again…
Steps in Spark resemble MapReduce:• col.filter(p) applies in parallel the
predicate p to all elements x of the partitioned collection, and returns collection with those x where p(x) = true
• col.map(f) applies in parallel the function f to all elements x of the partitioned collection, and returns a new partitioned collection
38
![Page 39: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/39.jpg)
Persistence
If any server fails before the end, then Spark must restart
lines = s.read().textFile(“hdfs://logfile.log”);errors = lines.filter(l->l.startsWith(“ERROR”));sqlerrors = errors.filter(l->l.contains(“sqlite”));sqlerrors.collect();
![Page 40: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/40.jpg)
lines = s.read().textFile(“hdfs://logfile.log”);errors = lines.filter(l->l.startsWith(“ERROR”));sqlerrors = errors.filter(l->l.contains(“sqlite”));sqlerrors.collect();
Persistence
If any server fails before the end, then Spark must restart
hdfs://logfile.log
result
filter(...startsWith(“ERROR”)filter(...contains(“sqlite”)
RDD:
![Page 41: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/41.jpg)
Persistence
If any server fails before the end, then Spark must restart
hdfs://logfile.log
result
filter(...startsWith(“ERROR”)filter(...contains(“sqlite”)
Spark can recompute the result from errors
RDD:
lines = s.read().textFile(“hdfs://logfile.log”);errors = lines.filter(l->l.startsWith(“ERROR”));sqlerrors = errors.filter(l->l.contains(“sqlite”));sqlerrors.collect();
lines = s.read().textFile(“hdfs://logfile.log”);errors = lines.filter(l->l.startsWith(“ERROR”));errors.persist();sqlerrors = errors.filter(l->l.contains(“sqlite”));sqlerrors.collect()
New RDD
![Page 42: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/42.jpg)
Persistence
If any server fails before the end, then Spark must restart
hdfs://logfile.log
result
Spark can recompute the result from errors
hdfs://logfile.log
errors
filter(..startsWith(“ERROR”)
result
filter(...contains(“sqlite”)
RDD:
filter(...startsWith(“ERROR”)filter(...contains(“sqlite”)
lines = s.read().textFile(“hdfs://logfile.log”);errors = lines.filter(l->l.startsWith(“ERROR”));errors.persist();sqlerrors = errors.filter(l->l.contains(“sqlite”));sqlerrors.collect()
New RDD
lines = s.read().textFile(“hdfs://logfile.log”);errors = lines.filter(l->l.startsWith(“ERROR”));sqlerrors = errors.filter(l->l.contains(“sqlite”));sqlerrors.collect();
![Page 43: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/43.jpg)
Example
44
SELECT count(*) FROM R, SWHERE R.B > 200 and S.C < 100 and R.A = S.A
R(A,B)S(A,C)
R = s.read().textFile(“R.csv”).map(parseRecord).persist();S = s.read().textFile(“S.csv”).map(parseRecord).persist();
Parses each line into an object
persisting on disk
![Page 44: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/44.jpg)
Example
45
SELECT count(*) FROM R, SWHERE R.B > 200 and S.C < 100 and R.A = S.A
R(A,B)S(A,C)
R = s.read().textFile(“R.csv”).map(parseRecord).persist();S = s.read().textFile(“S.csv”).map(parseRecord).persist();RB = R.filter(t -> t.b > 200).persist();SC = S.filter(t -> t.c < 100).persist();J = RB.join(SC).persist();J.count();
R
RB
filter((a,b)->b>200)
S
SC
filter((b,c)->c<100)
J
join
action
transformationstransformations
![Page 45: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/45.jpg)
Recap: Programming in Spark
• A Spark/Scala program consists of:– Transformations (map, reduce, join…). Lazy– Actions (count, reduce, save...). Eager
• RDD<T> = an RDD collection of type T– Partitioned, recoverable (through lineage), not
nested• Seq<T> = a sequence
– Local to a server, may be nested
![Page 46: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/46.jpg)
Transformations:map(f : T -> U): RDD<T> -> RDD<U>
flatMap(f: T -> Seq(U)): RDD<T> -> RDD<U>
filter(f:T->Bool): RDD<T> -> RDD<T>
groupByKey(): RDD<(K,V)> -> RDD<(K,Seq[V])>
reduceByKey(F:(V,V)-> V): RDD<(K,V)> -> RDD<(K,V)>
union(): (RDD<T>,RDD<T>) -> RDD<T>
join(): (RDD<(K,V)>,RDD<(K,W)>) -> RDD<(K,(V,W))>
cogroup(): (RDD<(K,V)>,RDD<(K,W)>)-> RDD<(K,(Seq<V>,Seq<W>))>
crossProduct(): (RDD<T>,RDD<U>) -> RDD<(T,U)>
Actions:count(): RDD<T> -> Longcollect(): RDD<T> -> Seq<T>reduce(f:(T,T)->T): RDD<T> -> Tsave(path:String): Outputs RDD to a storage system e.g., HDFS
![Page 47: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/47.jpg)
Spark 2.0
The DataFrame and Dataset Interfaces
![Page 48: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/48.jpg)
DataFrames• Like RDD, also an immutable distributed
collection of data
• Organized into named columns rather than individual objects– Just like a relation– Elements are untyped objects called Row’s
• Similar API as RDDs with additional methods– people = spark.read().textFile(…);
ageCol = people.col(“age”);ageCol.plus(10); // creates a new DataFrame
![Page 49: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/49.jpg)
Datasets• Similar to DataFrames, except that elements must be typed
objects
• E.g.: Dataset<People> rather than Dataset<Row>
• Can detect errors during compilation time
• DataFrames are aliased as Dataset<Row> (as of Spark 2.0)
• You will use both Datasets and RDD APIs in HW6
![Page 50: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/50.jpg)
Datasets API: Sample Methods• Functional API
– agg(Column expr, Column... exprs)Aggregates on the entire Dataset without groups.
– groupBy(String col1, String... cols)Groups the Dataset using the specified columns, so that we can run aggregation on them.
– join(Dataset<?> right)Join with another DataFrame.
– orderBy(Column... sortExprs)Returns a new Dataset sorted by the given expressions.
– select(Column... cols)Selects a set of column based expressions.
• “SQL” API– SparkSession.sql(“select * from R”);
• Look familiar?
![Page 51: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/51.jpg)
An Example Application
![Page 52: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/52.jpg)
PageRank
• Page Rank is an algorithm that assigns to each page a score such that pages have higher scores if more pages with high scores link to them
• Page Rank was introduced by Google, and, essentially, defined Google
CSEP 544 - Fall 2017 53
![Page 53: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/53.jpg)
PageRank
CSEP 544 - Fall 2017 54
PageRank toy example
A B C
.33 .33 .33
.17.17
.33.17Superstep 0
.17
.17 .50 .34
.09.09
.34.25Superstep 1
.25
.25 .43 .34
.13.13
.34.22Superstep 2
.22
Input graph
http://www.slideshare.net/sscdotopen/large-scale/20
![Page 54: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/54.jpg)
PageRank
55
for i = 1 to n:r[i] = 1/n
repeatfor j = 1 to n: contribs[j] = 0for i = 1 to n:
k = links[i].length()for j in links[i]:
contribs[j] += r[i] / kfor i = 1 to n: r[i] = contribs[i]
until convergence/* usually 10-20 iterations */
Random walk interpretation:
Start at a random node iAt each step, randomly choosean outgoing link and follow it.
Repeat for a very long time
r[i] = prob. that we are at node i
![Page 55: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/55.jpg)
PageRank
56
for i = 1 to n:r[i] = 1/n
repeatfor j = 1 to n: contribs[j] = 0for i = 1 to n:
k = links[i].length()for j in links[i]:
contribs[j] += r[i] / kfor i = 1 to n: r[i] = contribs[i]
until convergence/* usually 10-20 iterations */
r[i] = a/N + (1-a)*contribs[i]
where a ∈(0,1)is the restartprobability
Random walk interpretation:
Start at a random node iAt each step, randomly choosean outgoing link and follow it.
Improvement: with small prob. arestart at a random node.
![Page 56: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/56.jpg)
PageRank
57
for i = 1 to n:r[i] = 1/n
repeatfor j = 1 to n: contribs[j] = 0for i = 1 to n:
k = links[i].length()for j in links[i]:
contribs[j] += r[i] / kfor i = 1 to n: r[i] = a/N + (1-a)*contribs[i]
until convergence/* usually 10-20 iterations */
// spark
links = spark.read().textFile(..)...ranks = // RDD of (URL, 1/n) pairs
for (k = 1 to ITERATIONS) {
// Build RDD of (targetURL, float) pairs// with contributions sent by each pagecontribs = links.join(ranks).flatMap {
(url, lr) -> // lr: a (link, rank) pairlinks.map(dest ->
(dest, lr._2/outlinks.size))}
// Sum contributions by URL and get new ranksranks = contribs.reduceByKey((x,y) -> x+y)
.mapValues(sum -> a/n + (1-a)*sum)}
links: RDD<url:string, outlinks:SEQ<string>>ranks: RDD<url:string, rank:float>
![Page 57: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/57.jpg)
PageRank
58
for i = 1 to n:r[i] = 1/n
repeatfor j = 1 to n: contribs[j] = 0for i = 1 to n:
k = links[i].length()for j in links[i]:
contribs[j] += r[i] / kfor i = 1 to n: r[i] = a/N + (1-a)*contribs[i]
until convergence/* usually 10-20 iterations */
// spark
links = spark.read().textFile(..).map(...);ranks = // RDD of (URL, 1/n) pairs
for (k = 1 to ITERATIONS) {
// Build RDD of (targetURL, float) pairs// with contributions sent by each pagecontribs = links.join(ranks).flatMap {
(url, lr) -> // lr: a (link, rank) pairlinks.map(dest ->
(dest, lr._2/outlinks.size))}
// Sum contributions by URL and get new ranksranks = contribs.reduceByKey((x,y) -> x+y)
.mapValues(sum -> a/n + (1-a)*sum)}
links: RDD<url:string, outlinks:SEQ<string>>ranks: RDD<url:string, rank:float>
Key: url1, Value: ([outlink1, outlink2, …], rank1)
![Page 58: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/58.jpg)
PageRank
59
for i = 1 to n:r[i] = 1/n
repeatfor j = 1 to n: contribs[j] = 0for i = 1 to n:
k = links[i].length()for j in links[i]:
contribs[j] += r[i] / kfor i = 1 to n: r[i] = a/N + (1-a)*contribs[i]
until convergence/* usually 10-20 iterations */
// spark
links = spark.read().textFile(..)...ranks = // RDD of (URL, 1/n) pairs
for (k = 1 to ITERATIONS) {
// Build RDD of (targetURL, float) pairs// with contributions sent by each pagecontribs = links.join(ranks).flatMap {
(url, lr) -> // lr: a (link, rank) pairlinks.map(dest ->
(dest, lr._2/outlinks.size()))}
// Sum contributions by URL and get new ranksranks = contribs.reduceByKey((x,y) -> x+y)
.mapValues(sum -> a/n + (1-a)*sum)}
links: RDD<url:string, outlinks:SEQ<string>>ranks: RDD<url:string, rank:float>
Key: url1, Value: rank1/outlink1.size)
![Page 59: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/59.jpg)
Conclusions
• Parallel databases– Predefined relational operators– Optimization– Transactions
• MapReduce– User-defined map and reduce functions– Must implement/optimize manually relational ops– No updates/transactions
• Spark– Predefined relational operators– Must optimize manually– No updates/transactions
60
![Page 60: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/60.jpg)
Conceptual Design
CSEP 544 - Fall 2017 61
![Page 61: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/61.jpg)
62
Class overview• Data models
– Relational: SQL, RA, and Datalog– NoSQL: SQL++
• RDBMS internals– Query processing and optimization– Physical design
• Parallel query processing– Spark and Hadoop
• Conceptual design– E/R diagrams– Schema normalization
• Transactions– Locking and schedules– Writing DB applications
CSEP 544 - Fall 2017
Data models
UsingDBMS
Query Processing
![Page 62: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/62.jpg)
Database Design
What it is:• Starting from scratch, design the database
schema: relation, attributes, keys, foreign keys, constraints etc
Why it’s hard• The database will be in operation for a very
long time (years). Updating the schema while in production is very expensive (why?)
CSEP 544 - Fall 2017 63
![Page 63: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/63.jpg)
Database Design
• Consider issues such as:– What entities to model– How entities are related– What constraints exist in the domain
• Several formalisms exists– We discuss E/R diagrams– UML, model-driven architecture
• Reading: Sec. 4.1-4.6
CSEP 544 - Fall 2017 64
![Page 64: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/64.jpg)
Database Design Processcompanymakesproduct
name
price name address
Conceptual Model:
Relational Model:Tables + constraintsAnd also functional dep.
Normalization:Eliminates anomalies
Conceptual Schema
Physical SchemaPhysical storage details
![Page 65: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/65.jpg)
Entity / Relationship Diagrams
• Entity set = a class– An entity = an object
• Attribute
• Relationship
CSEP 544 - Fall 2017 66
Product
city
makes
![Page 66: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/66.jpg)
Person
CompanyProduct
buys
makes
employs
name CEO
price
address name ssn
address
name
67
![Page 67: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/67.jpg)
Keys in E/R Diagrams
• Every entity set must have a key
Product
name
price
CSEP 544 - Fall 2017 68
![Page 68: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/68.jpg)
What is a Relation ?
• A mathematical definition:– if A, B are sets, then a relation R is a subset of A X B
• A={1,2,3}, B={a,b,c,d},A X B = {(1,a),(1,b), . . ., (3,d)}R = {(1,a), (1,c), (3,b)}
• makes is a subset of Product X Company:
1
2
3
a
b
c
d
A=
B=
makes CompanyProduct
CSEP 544 - Fall 2017 69
![Page 69: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/69.jpg)
Multiplicity of E/R Relations
• one-one:
• many-one
• many-many
123
abcd
123
abcd
123
abcd
CSEP 544 - Fall 2017 70
![Page 70: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/70.jpg)
Person
CompanyProduct
buys
makes
employs
name CEO
price
address name ssn
address
name
71
What doesthis say ?
![Page 71: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/71.jpg)
Multi-way RelationshipsHow do we model a purchase relationship between buyers,products and stores?
Purchase
Product
Person
Store
Can still model as a mathematical set (How?)72As a set of triples ⊆ Person X Product X Store
![Page 72: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/72.jpg)
Q: What does the arrow mean ?
Arrows in Multiway Relationships
A: A given person buys a given product from at most one store
Purchase
Product
Person
Store
73
[Fine print: Arrow pointing to E means that if we select one entity from each of the other entity sets in the relationship, those entities are related to at most one entity in E]
CSEP 544 - Fall 2017
![Page 73: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/73.jpg)
Q: What does the arrow mean ?
Arrows in Multiway Relationships
A: A given person buys a given product from at most one storeAND every store sells to every person at most one product
Purchase
Product
Person
Store
CSEP 544 - Fall 2017 74
![Page 74: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/74.jpg)
Converting Multi-way Relationships to Binary
Purchase
Person
Store
Product
StoreOf
ProductOf
BuyerOf
date
Arrows go in which direction? 75
![Page 75: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/75.jpg)
Converting Multi-way Relationships to Binary
Purchase
Person
Store
Product
StoreOf
ProductOf
BuyerOf
date
Make sure you understand why! 76
![Page 76: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/76.jpg)
3. Design Principles
PurchaseProduct Person
What’s wrong?
President PersonCountry
Moral: Be faithful to the specifications of the application!
CSEP 544 - Fall 2017 77
![Page 77: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/77.jpg)
Design Principles:What’s Wrong?
Purchase
Product
Store
date
personNamepersonAddr
Moral: pick the rightkind of entities.
CSEP 544 - Fall 2017 78
![Page 78: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/78.jpg)
Design Principles:What’s Wrong?
Purchase
Product
Person
Store
dateDates
Moral: don’t complicate life morethan it already is.
79
![Page 79: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/79.jpg)
From E/R Diagramsto Relational Schema
• Entity set à relation• Relationship à relation
CSEP 544 - Fall 2017 80
![Page 80: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/80.jpg)
Entity Set to Relation
Product
prod-ID category
price
Product(prod-ID, category, price)
prod-ID category priceGizmo55 Camera 99.99Pokemn19 Toy 29.99 81
![Page 81: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/81.jpg)
N-N Relationships to Relations
Orders
prod-ID cust-ID
date
Shipment Shipping-Co
address
namedate
Represent this in relationsCSEP 544 - Fall 2017 82
![Page 82: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/82.jpg)
prod-ID cust-ID name date
Gizmo55 Joe12 UPS 4/10/2011
Gizmo55 Joe12 FEDEX 4/9/2011
N-N Relationships to Relations
Orders
prod-ID cust-ID
date
Shipment Shipping-Co
address
name
Orders(prod-ID,cust-ID, date)Shipment(prod-ID,cust-ID, name, date)Shipping-Co(name, address)
date
![Page 83: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/83.jpg)
N-1 Relationships to Relations
Orders
prod-ID cust-ID
date
Shipment Shipping-Co
address
namedate
Represent this in relationsCSEP 544 - Fall 2017 84
![Page 84: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/84.jpg)
N-1 Relationships to Relations
Orders
prod-ID cust-ID
date
Shipment Shipping-Co
address
name
Orders(prod-ID,cust-ID, date1, name, date2) Shipping-Co(name, address)
date
85Remember: no separate relations for many-one relationship
![Page 85: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/85.jpg)
Multi-way Relationships to Relations
Purchase
Product
Person
Storeprod-ID price
ssn name
name address
86
Purchase(prod-ID, ssn, name) CSEP 544 - Fall 2017
Try this at home!
![Page 86: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/86.jpg)
Modeling Subclasses
Some objects in a class may be special• define a new class• better: define a subclass
Products
Software products
Educational products
So --- we define subclasses in E/RCSEP 544 - Fall 2017 87
![Page 87: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/87.jpg)
Product
name category
price
isa isa
Educational ProductSoftware Product
Age Groupplatforms
Subclasses
CSEP 544 - Fall 2017
![Page 88: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/88.jpg)
Subclasses to Relations
Product
name category
price
isa isa
Educational ProductSoftware Product
Age Groupplatforms
Name Price Category
Gizmo 99 gadget
Camera 49 photo
Toy 39 gadget
Name platforms
Gizmo unix
Product
Sw.Product
Ed.Product
Other ways to convert are possibleCSEP 544 - Fall 2017 89
Name Age Group
Gizmo toddler
Toy retired
![Page 89: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/89.jpg)
Modeling Union Types with Subclasses
FurniturePiece
Person Company
Say: each piece of furniture is owned either by a person or by a company
CSEP 544 - Fall 2017 90
![Page 90: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/90.jpg)
Modeling Union Types with Subclasses
Say: each piece of furniture is owned either by a person or by a companySolution 1. Acceptable but imperfect (What’s wrong ?)
FurniturePiecePerson Company
ownedByPerson ownedByComp.
CSEP 544 - Fall 2017 91
![Page 91: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/91.jpg)
Modeling Union Types with Subclasses
Solution 2: better, more laborious
isa
FurniturePiece
Person CompanyownedBy
Owner
isa
CSEP 544 - Fall 2017 92
![Page 92: Database Management Systems CSEP 544 · 2017-11-21 · CSEP 544 - Fall 2017 10. Straggler Example CSEP 544 - Fall 2017 11 time Worker 3 Worker 2 Worker 1 Straggler Backup execution](https://reader034.fdocuments.us/reader034/viewer/2022042808/5f8b98c75f195334736d8a37/html5/thumbnails/92.jpg)
93
Weak Entity SetsEntity sets are weak when their key comes from otherclasses to which they are related.
UniversityTeam affiliation
numbersport name
Team(sport, number, universityName)University(name)
CSEP 544 - Fall 2017