Big Data Management and Analytics Assignment 8
Transcript of Big Data Management and Analytics Assignment 8
![Page 1: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/1.jpg)
DATABASESYSTEMSGROUP
Big Data Management and AnalyticsAssignment 8
1
![Page 2: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/2.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
Finding similar items
Suppose that the universal set is given by {1,β¦,10}. Construct minhashsignatures for the following sets:
(a) π1 = {3,6,9}
(b) π2 = {2,4,6,8}
(c) π3 = {2,3,4}
1. Construct the signatures for the sets using the following list ofpermutations:
β’ (1,2,3,4,5,6,7,8,9,10)
β’ (10,8,6,4,2,9,7,5,3,1)
β’ (4,7,2,9,1,5,3,10,6,8)
2
![Page 3: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/3.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
i. Create first a characteristic matrix foreach permutation
(a) Set one column with the shingles
(b) Set columns for each document
(c) Fill the document columns by setting a 1 where there is an occurence for each shingle, and 0 else
3
Element π1 π2 π3
1 0 0 0
2 0 1 1
3 1 0 1
4 0 1 1
5 0 0 0
6 1 1 0
7 0 0 0
8 0 1 0
9 1 0 0
10 0 0 0
![Page 4: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/4.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
i. Create first a characteristic matrix for each permutation
4
Element π1 π2 π3
1 0 0 0
2 0 1 1
3 1 0 1
4 0 1 1
5 0 0 0
6 1 1 0
7 0 0 0
8 0 1 0
9 1 0 0
10 0 0 0
Element π1 π2 π3
10 0 0 0
8 0 1 0
6 1 1 0
4 0 1 1
2 0 1 1
9 1 0 0
7 0 0 0
5 0 0 0
3 1 0 1
1 0 0 0
Element π1 π2 π3
4 0 1 1
7 0 0 0
2 0 1 1
9 1 0 0
1 0 0 0
5 0 1 0
3 1 0 1
10 0 1 0
6 1 1 0
8 0 1 0
1st permutation 2nd permutation 3rd permutation
![Page 5: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/5.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
ii. Compute the minhash for each permutation
5
Element π1 π2 π3
1 0 0 0
2 0 1 1
3 1 0 1
4 0 1 1
5 0 0 0
6 1 1 0
7 0 0 0
8 0 1 0
9 1 0 0
10 0 0 0
1st permutation
Select the first occurence of 1 per setand get the elements at which they canbe found
Which leads to the following minhash:β π1 = 3β π2 = 2β π3 = 2
![Page 6: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/6.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
ii. Compute the minhash for each permutation
6
The same procedure for the 2nd permutationβ¦
β π1 = 6β π2 = 8β π3 = 4
Element π1 π2 π3
10 0 0 0
8 0 1 0
6 1 1 0
4 0 1 1
2 0 1 1
9 1 0 0
7 0 0 0
5 0 0 0
3 1 0 1
1 0 0 0
2nd permutation
![Page 7: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/7.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
ii. Compute the minhash for each permutation
7
β¦and for the 3rd permutation
Which leads to the following minhash:β π1 = 9β π2 = 4β π3 = 4
Element π1 π2 π3
4 0 1 1
7 0 0 0
2 0 1 1
9 1 0 0
1 0 0 0
5 0 1 0
3 1 0 1
10 0 1 0
6 1 1 0
8 0 1 0
3rd permutation
This yields the following signatures:ππΌπΊ π1 = {3,6,9}ππΌπΊ π2 = {2,8,4}ππΌπΊ π3 = {2,4,4}
![Page 8: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/8.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
2. Instead of using the previously given permutations use hash functions:
β1 π₯ = π₯ πππ 10β2 π₯ = (2π₯ + 1) πππ 10β3 π₯ = (3π₯ + 2) πππ 10
8
![Page 9: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/9.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
2. Instead of using the previously given permutations use hash functions:
i. Set up a table:
9
Element πΊπ πΊπ πΊπ ππ(π) ππ(π) ππ(π)
1 0 0 0
2 0 1 1
3 1 0 1
4 0 1 1
5 0 0 0
6 1 1 0
7 0 0 0
8 0 1 0
9 1 0 0
10 0 0 0
![Page 10: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/10.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
ii. Compute the hash values (except for zero-rows):
10
Element πΊπ πΊπ πΊπ ππ(π) ππ(π) ππ(π)
1 0 0 0 - - -
2 0 1 1 2 5 8
3 1 0 1 3 7 1
4 0 1 1 4 9 4
5 0 0 0 - - -
6 1 1 0 6 3 0
7 0 0 0 - - -
8 0 1 0 8 7 6
9 1 0 0 9 9 9
10 0 0 0 - - -
![Page 11: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/11.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
iii. Create a table for all hash functions and sets and initialize them withinfinite distance:
11
πΊπ πΊπ πΊπ
ππ β β β
ππ β β β
ππ β β β
e πΊπ πΊπ πΊπ ππ(π) ππ(π) ππ(π)
1 0 0 0 - - -
2 0 1 1 2 5 8
3 1 0 1 3 7 1
4 0 1 1 4 9 4
5 0 0 0 - - -
6 1 1 0 6 3 0
7 0 0 0 - - -
8 0 1 0 8 7 6
9 1 0 0 9 9 9
10 0 0 0 - - -
![Page 12: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/12.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
iii. Create a table for all hash functions and sets and initialize them withinfinite distance:
12
πΊπ πΊπ πΊπ
ππ β 2 2
ππ β 5 5
ππ β 8 8
e πΊπ πΊπ πΊπ ππ(π) ππ(π) ππ(π)
1 0 0 0 - - -
2 0 1 1 2 5 8
3 1 0 1 3 7 1
4 0 1 1 4 9 4
5 0 0 0 - - -
6 1 1 0 6 3 0
7 0 0 0 - - -
8 0 1 0 8 7 6
9 1 0 0 9 9 9
10 0 0 0 - - -
Update of 2nd row:
Update only π2, π3!
π2:min β, 2 = 2min β, 5 = 5min β, 8 = 8
π3:min β, 2 = 2min β, 5 = 5min β, 8 = 8
![Page 13: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/13.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
iii. Create a table for all hash functions and sets and initialize them withinfinite distance:
13
πΊπ πΊπ πΊπ
ππ 3 2 2
ππ 7 5 5
ππ 1 8 1
e πΊπ πΊπ πΊπ ππ(π) ππ(π) ππ(π)
1 0 0 0 - - -
2 0 1 1 2 5 8
3 1 0 1 3 7 1
4 0 1 1 4 9 4
5 0 0 0 - - -
6 1 1 0 6 3 0
7 0 0 0 - - -
8 0 1 0 8 7 6
9 1 0 0 9 9 9
Update of 3rd row:
Update only π1, π3!
π1:min β, 3 = 3min β, 7 = 7min β, 1 = 1
π3:min 2,3 = 2min 5,7 = 5min 8,1 = 1
![Page 14: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/14.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
iii. Create a table for all hash functions and sets and initialize them withinfinite distance:
14
πΊπ πΊπ πΊπ
ππ 3 2 2
ππ 7 5 5
ππ 1 4 1
e πΊπ πΊπ πΊπ ππ(π) ππ(π) ππ(π)
1 0 0 0 - - -
2 0 1 1 2 5 8
3 1 0 1 3 7 1
4 0 1 1 4 9 4
5 0 0 0 - - -
6 1 1 0 6 3 0
7 0 0 0 - - -
8 0 1 0 8 7 6
9 1 0 0 9 9 9
10 0 0 0 - - -
Update of 4th row:
Update only π2, π3!
π2:min 2,4 = 2min 5,9 = 5min 8,4 = 4
π3:min 2,4 = 2min 5,9 = 5min 1,4 = 1
![Page 15: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/15.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
iii. Create a table for all hash functions and sets and initialize them withinfinite distance:
15
πΊπ πΊπ πΊπ
ππ 3 2 2
ππ 3 3 5
ππ 0 0 1
e πΊπ πΊπ πΊπ ππ(π) ππ(π) ππ(π)
1 0 0 0 - - -
2 0 1 1 2 5 8
3 1 0 1 3 7 1
4 0 1 1 4 9 4
5 0 0 0 - - -
6 1 1 0 6 3 0
7 0 0 0 - - -
8 0 1 0 8 7 6
9 1 0 0 9 9 9
10 0 0 0 - - -
Update of 6th row:
Update only π1, π2!
π1:min 3,6 = 3min 7,3 = 3min 1,0 = 0
π2:min 2,6 = 2min 5,3 = 3min 4,0 = 0
![Page 16: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/16.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
iii. Create a table for all hash functions and sets and initialize them withinfinite distance:
16
πΊπ πΊπ πΊπ
ππ 3 2 2
ππ 3 3 5
ππ 0 0 1
e πΊπ πΊπ πΊπ ππ(π) ππ(π) ππ(π)
1 0 0 0 - - -
2 0 1 1 2 5 8
3 1 0 1 3 7 1
4 0 1 1 4 9 4
5 0 0 0 - - -
6 1 1 0 6 3 0
7 0 0 0 - - -
8 0 1 0 8 7 6
9 1 0 0 9 9 9
10 0 0 0 - - -
Update of 8th row:
Update only π2!
π2:min 2,8 = 2min 3,7 = 3min 0,6 = 0
![Page 17: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/17.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
iii. Create a table for all hash functions and sets and initialize them withinfinite distance:
17
πΊπ πΊπ πΊπ
ππ 3 2 2
ππ 3 3 5
ππ 0 0 1
e πΊπ πΊπ πΊπ ππ(π) ππ(π) ππ(π)
1 0 0 0 - - -
2 0 1 1 2 5 8
3 1 0 1 3 7 1
4 0 1 1 4 9 4
5 0 0 0 - - -
6 1 1 0 6 3 0
7 0 0 0 - - -
8 0 1 0 8 7 6
9 1 0 0 9 9 9
10 0 0 0 - - -
Update of 8th row:
Update only π1!
π1:min 3,9 = 3min 3,9 = 3min 0,9 = 0
![Page 18: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/18.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
iii. Create a table for all hash functions and sets and initialize them withinfinite distance:
This yields the following signatures:ππΌπΊ π1 = 3,3,0ππΌπΊ π2 = 2,3,0ππΌπΊ π3 = 2,5,1
18
πΊπ πΊπ πΊπ
ππ 3 2 2
ππ 3 3 5
ππ 0 0 1
![Page 19: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/19.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
3. How does the estimated Jaccard similarity from (1.) and (2.) comparewith the true Jaccard similarity of the original data? How to reducedeviations in the approximated Jaccard similarities?
19
RECAP: Jaccard similarity:
ππ½ππππππ π·1, π·2 = π·1β©π·2π·1βͺπ·2
![Page 20: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/20.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
i. Actual Jaccard similarity:
20
RECAP: Jaccard similarity:
ππ½ππππππ π·1, π·2 = π·1β©π·2π·1βͺπ·2
πΊπ πΊπ πΊπ
πΊπ - 1 6 1 5
πΊπ - - 2 5
πΊπ - - -
(a) π1 = {3,6,9}(b) π2 = {2,4,6,8}(c) π3 = {2,3,4}
![Page 21: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/21.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
ii. Permutation estimated Jaccard similarity:
21
πΊπ πΊπ πΊπ
πΊπ - 0 6 0 5
πΊπ - - 2 3
πΊπ - - -
(a) π1 = {3,6,9}(b) π2 = {2,8,4}(c) π3 = {2,4,4}
![Page 22: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/22.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
iii. Hash estimated Jaccard similarity:
22
πΊπ πΊπ πΊπ
πΊπ - 2 3 0 5
πΊπ - - 1 5
πΊπ - - -
(a) π1 = {3,3,0}(b) π2 = {2,3,0}(c) π3 = {2,5,1}
![Page 23: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/23.jpg)
DATABASESYSTEMSGROUP
Assignment 8-1
iv. Comparison:
23
πΊπ πΊπ πΊπ
πΊπ - 2 3 0 3
πΊπ - - 1 3
πΊπ - - -
πΊπ πΊπ πΊπ
πΊπ - 0 3 0 3
πΊπ - - 2 3
πΊπ - - -
πΊπ πΊπ πΊπ
πΊπ - 1 6 1 5
πΊπ - - 2 5
πΊπ - - -
Actual J. similarity Perm. est. J similarity Hash. est. J similarity
Estimation of the actual J. similarity is rather poor, why?
Too small minhash vectors. Get more permutations of the universal set ormore hash functions to extend the minhash vectors!
![Page 24: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/24.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
24
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
Wait for the first 6 points to arrive
![Page 25: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/25.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
25
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
Wait for the first 6 points to arrive
![Page 26: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/26.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
26
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
Wait for the first 6 points to arrive
![Page 27: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/27.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
27
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
Wait for the first 6 points to arrive
![Page 28: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/28.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
28
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
Wait for the first 6 points to arrive
![Page 29: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/29.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
29
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
Wait for the first 6 points to arrive
![Page 30: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/30.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
30
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
Apply standard k-Means to create q clusters.
![Page 31: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/31.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
31
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
For each discovered cluster, assign a uniqueID and create ist micro-cluster summary.
![Page 32: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/32.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
32
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
For each discovered cluster, assign a uniqueID and create ist micro-cluster summary.
πΆπΉπ1 = (πΆπΉ2π, πΆπΉ1π, πΆπΉ2π‘, πΆπΉ1π‘, π)
12 + 22
12 + 12=5
2
1 + 2
1 + 1=3
2
12 + 22 = 5
1 + 2 = 3
2
![Page 33: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/33.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
33
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
For each discovered cluster, assign a uniqueID and create ist micro-cluster summary.
πΆπΉπ1 = (5
2,3
2, 5,3,2)
cπππ‘ππ = 322 =
1.51
![Page 34: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/34.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
34
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
πΆπΉπ1 = (5
2,3
2, 5,3,2)
πππππ’π = πΆπΉ2ππ β πΆπΉ1π
π2
= 522 β
322
2
=2.51
β1.51
2
=2.51
β2.251
= 2.5 β 2.25 + (1 β 1) = 0.5
![Page 35: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/35.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
35
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
For each discovered cluster, assign a uniqueID and create ist micro-cluster summary.
ππ πͺπππΏ πͺπππΏ πͺπππ πͺπππ π πππ πππ
1 5
2
3
2
5 3 2 1.5
1
0.5
![Page 36: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/36.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
36
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
For each discovered cluster, assign a uniqueID and create ist micro-cluster summary.
ππ πͺπππΏ πͺπππΏ πͺπππ πͺπππ π πππ πππ
1 5
2
3
2
5 3 2 1.5
1
0.5
2 32
145
8
17
25 7 2 4
8.5
0.5
![Page 37: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/37.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
37
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
For each discovered cluster, assign a uniqueID and create ist micro-cluster summary.
ππ πͺπππΏ πͺπππΏ πͺπππ πͺπππ π πππ πππ
1 5
2
3
2
5 3 2 1.5
1
0.5
2 32
145
8
17
25 7 2 4
8.5
0.5
3 181
25
19
7
61 11 2 9.5
3.5
0.7 := centroids
![Page 38: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/38.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
38
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
Compute distance between point p and eachof the q maintained micro-cluster centroids.
πππΏ2(πΆ1, π) = 2.06
πΆ1
πΆ2
πΆ3ππΏ2(πΆ2, π) = 5.85
ππΏ2(πΆ3, π) = 7.51
![Page 39: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/39.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
39
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
Select a cluster clu as the closestmicro-cluster to p.
πππΏ2(πΆ1, π) = 2.06
πΆ1
πΆ2
πΆ3ππΏ2(πΆ2, π) = 5.85
ππΏ2(πΆ3, π) = 7.51
![Page 40: Big Data Management and Analytics Assignment 8](https://reader034.fdocuments.us/reader034/viewer/2022051315/627a341c61302a2d401620d5/html5/thumbnails/40.jpg)
DATABASESYSTEMSGROUP
Assignment 8-2 - CluStream
40
β’ initPoints = 6
β’ q = 3
β’ Factor of clu radius t = 5
t 1 2 3 4 5 6 7 8 9 10 11 12
p 1
1
2
1
4
9
4
8
10
4
9
3
2
3
11
3
12
12
12
11
11
12
4
2
0 1 2 3 4 5 6 7 8 9101112
123456789
101112
Find the maximum boundary of clu:= factor of t of clu radius= 5 β 0.5 = 2.5
π
ππΏ2(πΆ1, π) = 2.06
πΆ1
πΆ2
πΆ3is p within the maximum cluster boundary of πΆ1?
(π₯ β π₯π1ππππ‘ππππ)2+(π¦ β π¦π1ππππ‘ππππ)
2β€ (π‘ β ππππΆ1)2?
(2 β 1.5)2+(3 β 1)2β€ (5 β 0.5)2?