What is the benefit of using BST?
04/21/23 ITK 275 1
What is the benefit of using BST?
[Figure: a BST with N = 30 nodes (keys between 0 and 38); $\log_2 30 \approx 5$. Example searches: 17, 29, 3.]
What is the real benefit of using BST?
[Figure: a BST on the same range of keys, annotated with $O(\log_2 n)$. Example searches: 18, 2, 26.]
What is the benefit of using BST?
The average height of a binary search tree of n nodes is $O(\log_2 n)$ if the n nodes arrive according to a uniform distribution over the space of possible keys.
This is as good as binary search.
Proof: skipped.
f(n): the average internal path length of an n-node BST
The average height of a BST: $O(\log_2 n)$
f(n): the average internal path length of an n-node BST
f(3): (3 + 2 + 3) / 3 = 2.67
[Figure: three 3-node BSTs with their node depths summed: a chain, 0 + 1 + 2 = 3; a balanced tree, 0 + 1 + 1 = 2; a chain, 0 + 1 + 2 = 3.]
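The value f(3) = 2.67 can be checked by brute force. The sketch below is only an illustration: it assumes the average is taken over all insertion orders of n distinct keys, and the function names are hypothetical, not from the slides.

```python
from itertools import permutations

def internal_path_length(keys):
    """Insert keys in order into a plain BST; return the sum of node depths."""
    root = None  # each node is a list: [key, left, right]
    total = 0
    for key in keys:
        if root is None:
            root = [key, None, None]  # the root has depth 0
            continue
        node, depth = root, 0
        while True:
            depth += 1
            side = 1 if key < node[0] else 2  # 1 = left child, 2 = right child
            if node[side] is None:
                node[side] = [key, None, None]
                break
            node = node[side]
        total += depth
    return total

def average_internal_path_length(n):
    """Average over every insertion order of the keys 0..n-1 (an assumption)."""
    orders = list(permutations(range(n)))
    return sum(internal_path_length(order) for order in orders) / len(orders)

print(round(average_internal_path_length(3), 2))  # 2.67
```

Grouping the six insertion orders by their first key (the root) gives the three cases on the slide: two orders with sum 3, two with sum 2, two with sum 3.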
The sequence of node insertion will affect the shape of the BST
[Figure: the highly unbalanced BST built by inserting 1, 7, 12, 25, 27, 13, 23, 17, 15 in that order.]
This situation is not uncommon, e.g., when the data is roughly sorted.
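To see the effect of insertion order, the sketch below builds a plain BST from the slide's sequence and from a second order of the same keys. The second order and the helper name `bst_height` are hypothetical, chosen only for illustration.

```python
def bst_height(keys):
    """Height (in edges) of the plain BST built by inserting keys in order."""
    root = None  # each node is a list: [key, left, right]
    height = 0
    for key in keys:
        if root is None:
            root = [key, None, None]
            continue
        node, depth = root, 0
        while True:
            depth += 1
            side = 1 if key < node[0] else 2  # 1 = left child, 2 = right child
            if node[side] is None:
                node[side] = [key, None, None]
                break
            node = node[side]
        height = max(height, depth)
    return height

slide_order = [1, 7, 12, 25, 27, 13, 23, 17, 15]   # order from the slide
other_order = [15, 7, 23, 1, 12, 17, 25, 13, 27]   # hypothetical order, same keys

print(bst_height(slide_order))  # 7
print(bst_height(other_order))  # 3
```

With nine keys, a height of 3 is the best possible, while the roughly sorted order from the slide gives a height of 7.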
[Figure: the highly unbalanced BST from the previous slide next to a balanced BST on the same keys 1, 7, 12, 13, 15, 17, 23, 25, 27.]
If R is too big, then shift a node from R to L
[Figure: a BST whose root has left subtree L and right subtree R.]
1. Insert the root's key into L.
2. Find the minimum in R.
3. Copy it to the root and delete it from R.
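The three steps can be sketched in Python as follows. This is a minimal illustration, assuming distinct keys and a plain (unbalanced) insert into L; the names `Node`, `insert`, `delete_min`, and `shift_to_left` are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(node, key):
    """Plain BST insert; returns the (possibly new) subtree root."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

def delete_min(node):
    """Remove the smallest key of a non-empty subtree; return (subtree, key)."""
    if node.left is None:
        return node.right, node.key
    node.left, smallest = delete_min(node.left)
    return node, smallest

def shift_to_left(root):
    """Move one node from R to L; R must be non-empty."""
    root.left = insert(root.left, root.key)        # 1. insert the root's key into L
    root.right, root.key = delete_min(root.right)  # 2-3. minimum of R becomes the root
    return root

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

root = None
for key in [1, 2, 3, 4, 5]:       # sorted input: everything ends up in R
    root = insert(root, key)
shift_to_left(root)
print(root.key, inorder(root))    # 2 [1, 2, 3, 4, 5]
```

The in-order sequence is unchanged, so the result is still a BST; only the split between L and R moves by one node. The symmetric shift from L to R uses the maximum of L instead.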
How big is “too big”?
How to measure the unbalance?
How unbalanced do we allow a BST to be?
Chung-Chih Li. An immediate approach to balancing nodes in binary search trees. Journal of Computing Sciences in Colleges, 21(3):238-245, April 2006.
Definition: NBk
Node-balanced of degree k
[Figure: a BST T with left subtree T.L and right subtree T.R.]
n(T): the number of nodes in T
T is NBk if $|n(T.L) - n(T.R)| \le n(T) / k$
Insert a node to an NBk
Insert(T, t): // Insert t into T; n(T) is the number of nodes in T
1. If (T = empty) then
   (a) T ← new node;
   (b) T.data ← t;
   (c) return T;
2. If (T.data > t) then // Insert t into T.L
   (a) If (n(T.L) - n(T.R) + 1 > n(T) / k) then
       (1) ShiftToRight(T);
       (2) If (T.data < t) then swap T.data and t;
   (b) T.L = Insert(T.L, t);
   else // Insert t into T.R
   (a) If (n(T.R) - n(T.L) + 1 > n(T) / k) then
       (1) ShiftToLeft(T);
       (2) If (T.data > t) then swap T.data and t;
   (b) T.R = Insert(T.R, t);
3. Return T;
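A runnable Python sketch of the insertion routine follows. It is not the author's implementation: it assumes distinct keys, stores the subtree size in each node, uses the size before insertion as n, removes the minimum or maximum without rebalancing, and adds a guard (not in the pseudocode) that skips a shift when the heavy subtree is empty. All names are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None
        self.n = 1  # number of nodes in the subtree rooted here

def size(t):
    return t.n if t is not None else 0

def delete_min(t):
    """Remove the smallest key of non-empty t; return (subtree, key)."""
    if t.left is None:
        return t.right, t.key
    t.left, smallest = delete_min(t.left)
    t.n -= 1
    return t, smallest

def delete_max(t):
    """Remove the largest key of non-empty t; return (subtree, key)."""
    if t.right is None:
        return t.left, t.key
    t.right, largest = delete_max(t.right)
    t.n -= 1
    return t, largest

def shift_to_left(t, k):
    t.left = insert(t.left, t.key, k)     # root key moves into L
    t.right, t.key = delete_min(t.right)  # minimum of R becomes the root

def shift_to_right(t, k):
    t.right = insert(t.right, t.key, k)   # root key moves into R
    t.left, t.key = delete_max(t.left)    # maximum of L becomes the root

def insert(t, key, k):
    if t is None:
        return Node(key)
    if t.key > key:                       # insert into L
        # Guard on t.left is an added assumption: nothing to shift if L is empty.
        if t.left is not None and size(t.left) - size(t.right) + 1 > t.n / k:
            shift_to_right(t, k)
            if t.key < key:
                t.key, key = key, t.key   # keep the BST order after the shift
        t.left = insert(t.left, key, k)
    else:                                 # insert into R
        if t.right is not None and size(t.right) - size(t.left) + 1 > t.n / k:
            shift_to_left(t, k)
            if t.key > key:
                t.key, key = key, t.key
        t.right = insert(t.right, key, k)
    t.n += 1
    return t

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

root = None
for key in range(1, 7):                   # sorted input, k = 2
    root = insert(root, key, k=2)
print(inorder(root), height(root))        # [1, 2, 3, 4, 5, 6] 2
```

A plain BST would turn the same sorted input into a chain of height 5; here the shifts keep the two sides of every node close in size.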
Delete a node from an NBk
Delete(T, t): // Delete t from T; n(T) is the number of nodes in T
1. If (T = empty) then return T;
2. If (T.data > t) then // Search t in T.L
   (a) T.L = Delete(T.L, t);
   (b) If (n(T.R) - n(T.L) + 1 > n(T) / k) then ShiftToLeft(T);
   (c) Return T;
3. If (T.data < t) then // Search t in T.R
   (a) T.R = Delete(T.R, t);
   (b) If (n(T.L) - n(T.R) + 1 > n(T) / k) then ShiftToRight(T);
   (c) Return T;
4. // t = T.data, i.e., T needs to be deleted
   If (n(T) = 1) then delete T and return empty;
5. If (n(T.L) > n(T.R)) then
   (a) b = the maximum node in T.L;
   (b) T.data ← b.data;
   (c) T.L = Delete(T.L, b.data);
   else
   (a) b = the minimum node in T.R;
   (b) T.data ← b.data;
   (c) T.R = Delete(T.R, b.data);
6. Return T;
Analysis
BST Average Heights on n Random Keys
1. $O(\log n)$
2. $\alpha \log n + O(\log\log n)$, $\alpha = 4.31107\ldots$ (Devroye and Reed, SIAM J. Comput., 1995)
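Analysis of NBk with n Random Keys
[Figure: an NBk tree with n nodes whose root has a larger subtree of X nodes and a smaller subtree of a nodes; X_2, a_2 and X_3, a_3 label the corresponding subtrees at depths 2 and 3.]
$X - a \le n/k$ and $X + a = n - 1$
The worst case: $X - a = n/k$
$\Rightarrow 2X \le n/k + n - 1 \Rightarrow X < \frac{k+1}{2k}\, n$
$X_2 < \left(\frac{k+1}{2k}\right)^2 n$
$X_3 < \left(\frac{k+1}{2k}\right)^3 n$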
Analysis of NBk with n Random Keys
the worst case
At depth h: $X_h < \left(\frac{k+1}{2k}\right)^h n$
If $X_h > 0$: $h\,(\log(2k) - \log(k+1)) < \log n$
$h < \frac{\log n}{\log(2k) - \log(k+1)}$
Analysis of NBk with n Random Keys
the worst case
At depth h, $X_h > 0$: $h < \frac{\log n}{\log(2k) - \log(k+1)}$
k     h <
2     2.4094 log(n)
3     1.7095 log(n)
4     1.4748 log(n)
8     1.2047 log(n)
16    1.0958 log(n)
∞     log(n)
BST: $\alpha \log n + O(\log\log n)$, $\alpha = 4.3\ldots$
AVL: $h < c \log(n+2) - b$, $c = 1.44$, $b = 0.33$
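The coefficients in the table follow from the worst-case bound h < log n / (log(2k) - log(k+1)) with base-2 logarithms. The short sketch below recomputes them; the function name is hypothetical.

```python
import math

def nbk_height_coefficient(k):
    """Coefficient c in the worst-case bound h < c * log2(n) for an NBk tree."""
    return 1.0 / (math.log2(2 * k) - math.log2(k + 1))

for k in (2, 3, 4, 8, 16):
    print(k, round(nbk_height_coefficient(k), 4))
# 2 2.4094
# 3 1.7095
# 4 1.4748
# 8 1.2047
# 16 1.0958
```

As k grows, the coefficient approaches 1, i.e., the bound approaches log(n).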
Experimental results
Computational Cost
1. AVL --- algorithm is complicated
2. NBk --- shifting operations
3. BST --- traveling long paths
Random and no duplication in data
[Figure: tree heights versus log(n), where n is the number of nodes and all keys are distinct; x-axis from 9 to 18; series: BST, NB2, AVL, NB8; annotations: log(n), 1.2 M.]
Random and no duplication in data
[Figure: running time in seconds versus number of nodes in millions (0.1 to 1.2), all keys distinct; series: NB8, AVL, NB2, BST. Test setup: 2.66 GHz P4, 1 GB MM, MS XP, Visual C++.]
Random, each key has n/100 duplicates
[Figure: tree heights (logarithmic axis, 1 to 1000) versus log(n), where n is the number of nodes and each key has n/100 duplicates; x-axis from 9 to 18; series: BST, NB2, AVL, NB8; annotations: log(n), 1.2 M.]
Random, each key has n/100 duplicates
[Figure: running time in seconds (logarithmic axis, 1 to 1000) versus number of nodes in millions (0.1 to 1.2), each key has n/100 duplicates; series: BST, NB2, NB8, AVL. Test setup: 2.66 GHz P4, 1 GB MM, MS XP, Visual C++.]
Random, arrive in batches of 32 sorted records
[Figure: tree heights (logarithmic axis, 1 to 1000) versus log(n), where n is the number of nodes and data arrives in batches of size 32; x-axis from 9 to 18; series: BST, NB2, AVL, NB8; annotations: log(n), 1.2 M.]
Random, arrive in batches of 32 sorted records
[Figure: running time in seconds (0 to 120) versus number of nodes in millions (0.1 to 1.2), data arrives in batches of size 32; series: NB8, AVL, NB2, BST. Test setup: 2.66 GHz P4, 1 GB MM, MS XP, Visual C++.]
Conclusion
NBk
1. can build a near-optimal BST even when k is small
2. is easy to analyze
3. is easy to implement
4. is practical in most conditions
While NBk's computational cost is much better than BST's and close to or better than AVL's, there is no guarantee for arbitrary data. In other words, it is not as robust as AVL.
AVL Tree: A BST in which the height difference between the two children of any node is always less than 2.
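[Figure: two AVL nodes, one with subtrees of heights h and h+1 (balance factor +1) and one with subtrees of heights h'+1 and h' (balance factor -1).]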
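The height condition in the definition can be checked recursively. The sketch below is an illustration with hypothetical names; it tests only the height condition and assumes the tree already satisfies the BST ordering.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def check_avl(node):
    """Return (is_height_balanced, height); an empty tree has height -1."""
    if node is None:
        return True, -1
    left_ok, left_h = check_avl(node.left)
    right_ok, right_h = check_avl(node.right)
    # AVL condition: the two subtree heights differ by less than 2.
    ok = left_ok and right_ok and abs(left_h - right_h) < 2
    return ok, 1 + max(left_h, right_h)

balanced = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))

print(check_avl(balanced))  # (True, 1)
print(check_avl(chain))     # (False, 2)
```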
Rotations: RR
[Figure: an RR rotation; nodes are labeled with balance factors +1 and +2.]
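A single rotation can be sketched as below. This assumes that RR denotes the case where the right child's right subtree is too tall, which is repaired by one left rotation at the unbalanced node; the stored `height` field and all names are hypothetical.

```python
def height(node):
    return node.height if node is not None else -1

def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        update_height(self)

def rotate_left(x):
    """Left rotation at x; returns the new subtree root (x's old right child)."""
    y = x.right
    x.right = y.left      # y's left subtree moves under x
    y.left = x            # x becomes y's left child
    update_height(x)      # x is now lower, so update it first
    update_height(y)
    return y

# Right-leaning chain 1 -> 2 -> 3: the subtree heights at node 1 differ by 2.
root = Node(1, None, Node(2, None, Node(3)))
root = rotate_left(root)
print(root.key, root.left.key, root.right.key, root.height)  # 2 1 3 1
```

The caller must re-attach the returned node to the parent, and heights must be refreshed bottom-up, which is where most of the bookkeeping effort in an AVL implementation goes.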
Rotations: RL
[Figure: an RL rotation on subtrees of heights h and h+1; nodes are labeled with balance factors before and after the rotation.]
Rotations: LL
[Figure: an LL rotation on subtrees of heights h-1, h, and h+1; nodes are labeled with balance factors before and after the rotation.]
Possible complications
[Figure: an unbalanced node with balance factor +2 over subtrees of heights h and h+1.]
Re-assign the links
Tracking the heights and balance-factors
AVL
n random keys; h: average height
$\log(n+1) \le h < c \log(n+2) - b$, $c = 1.4402$, $b = 0.32824$