Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future
Rudolf Bayer, Technische Universität München
B-Trees and Databases, Past & Future
R. Bayer B-Trees and Databases
Computing Technology in 1969 vs 2001
                                     1969       2001      Factor
main memory                          200 KB     200 MB    10^3
cache                                20 KB      20 MB     10^3
cache pages                          20         5000      <10^3
disk size                            7.5 MB     20 GB     3*10^3
disk/memory size                     40         100       -2.5
transfer rate                        150 KB/s   15 MB/s   10^2
random access                        50 ms      5 ms      10
scanning full disk (accessibility)   130 s      1300 s    -10
Challenge of Applications in 1969
Space Industry: Supersonic Transport SST, C5A, Boeing 747
Manufacturing: parts explosion, (spare) parts management
Commerce: bank check management, credit card management
Basics of B-Trees
root:    5 11 16 21
leaves:  1 2 3 4 | 6 7 8 10 | 12 13 15 | 17 18 19 20 | 22 24 25
insert:  9
Basics of B-Trees: Insertion
root:    5 11 16 21
leaves:  1 2 3 4 | 6 7 8 9 10 | 12 13 15 | 17 18 19 20 | 22 24 25
Basics of B-Trees: the Split
promoted: 8
root:    5 11 16 21
leaves:  1 2 3 4 | 6 7 | 9 10 | 12 13 15 | 17 18 19 20 | 22 24 25
Basics of B-Trees: recursive Split
root:    5 8 11 16 21
leaves:  1 2 3 4 | 6 7 | 9 10 | 12 13 15 | 17 18 19 20 | 22 24 25
Basics of B-Trees: Growth at Root
root:    11
inner:   5 8 | 16 21
leaves:  1 2 3 4 | 6 7 | 9 10 | 12 13 15 | 17 18 19 20 | 22 24 25
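The insertion sequence on the preceding slides (insert 9, split the leaf, split the root, grow a new root) can be replayed with a small sketch. It assumes a node capacity of 4 keys, which is read off the example tree rather than stated on the slides; `Node`, `insert`, and `levels` are illustrative names, not taken from the talk.

```python
from bisect import bisect_left, insort

MAX_KEYS = 4  # assumption: capacity inferred from the example tree on the slides


class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or []  # an empty list marks a leaf

    @property
    def is_leaf(self):
        return not self.children


def _insert(node, key):
    """Insert key below node; return (median, right_node) if node split, else None."""
    if node.is_leaf:
        insort(node.keys, key)
    else:
        i = bisect_left(node.keys, key)          # child whose key range holds key
        promoted = _insert(node.children[i], key)
        if promoted:                             # child split: absorb its median
            median, right = promoted
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2                    # overflow: split around the median
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    promoted = _insert(root, key)
    if promoted:                                 # root split: the tree grows at the root
        median, right = promoted
        return Node([median], [root, right])
    return root


def levels(root):
    """Return the keys of every node, level by level, for inspection."""
    out, frontier = [], [root]
    while frontier:
        out.append([n.keys for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return out


leaves = [[1, 2, 3, 4], [6, 7, 8, 10], [12, 13, 15], [17, 18, 19, 20], [22, 24, 25]]
root = Node([5, 11, 16, 21], [Node(keys) for keys in leaves])
root = insert(root, 9)
for level in levels(root):
    print(level)
```

The three printed levels correspond to the "Growth at Root" slide: a new root holding 11, two inner nodes (5 8 and 16 21), and six leaves.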
Scientific American 1984
Fundamental Properties of B-Trees
Time & I/O Complexity: O(log_k n); k > 400
for all elementary operations
• find
• insert & delete
Storage Utilization ~ 83%
Growth:
height   nodes     size
1        1         8 KB
2        400       3.2 MB
3        16*10^4   1.3 GB
4        64*10^6   512 GB
< 4 logical I/O per operation !!
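The growth figures can be checked with a few lines of arithmetic. This sketch assumes a fanout of exactly 400 and an 8 KB node counted as 8000 bytes (decimal units, which is what reproduces the slide's sizes); `growth` is an illustrative name.

```python
FANOUT = 400        # assumption: k taken as exactly 400 (the slide says k > 400)
PAGE_BYTES = 8_000  # assumption: 8 KB per node in decimal units


def growth(max_height):
    """Return (height, nodes on the lowest level, bytes on that level) per height."""
    rows = []
    for height in range(1, max_height + 1):
        nodes = FANOUT ** (height - 1)
        rows.append((height, nodes, nodes * PAGE_BYTES))
    return rows


for height, nodes, size in growth(4):
    print(height, nodes, size)
```

At height 4 the lowest level holds 64 million nodes, or 512 GB, so a lookup in a file of that size touches at most four nodes.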
Independence of DB Size
cached
. . .
Index part: 1% of file
remember: since 1969, disk size / memory size ≈ const ≈ 100
< 2 physical I/O per operation !!
DB-Models in 1969
IMS: hierarchical, commercial success
CODASYL: network model, M. Senko, C. Bachman
Relational: E. F. Codd, theory only
Senko, Codd in same department: Information Systems Department, IBM Research Lab, San José
Senko: Efficiency
Codd: Simplicity
Relational DB-Model, Ted Codd
Research in 1969, published in 1970 CACM
Relational Algebra = Tables + Operators
today: x set operators
Codd: lossless restriction by table tie
algebraic laws for query optimization
(Codd does not mention this aspect)
2 Languages
• imperative, procedural: algebraic expressions
• declarative, non-procedural: applied predicate calculus, DSL/Alpha (1971)
no implementation of acceptable efficiency in sight!
Hard Questions from 1969-1974
• which model?
• which language?
• which implementation?
infighting, Codd to Systems Department
defer decisions: rel. Storage System RSS to support all models and languages
1971-1974 Leonard Liu: CS Dept
1974 Cargese Workshop, Frank King
Which Language?
DL/I: IMS
CODASYL: COBOL + pointer chasing and currency indicators, Chamberlin
Rel. Algebra: Codd
DSL/Alpha: Codd
SQUARE: Chamberlin, et al.
SEQUEL: Chamberlin, Boyce, Reisner
QBE: Moshe Zloof
Rendezvous: Codd
3 survivors: DL/I, SQL, Rel. Algebra
Implementation: System R, IBM
SQL: Chamberlin, Reisner
Schemata: normalization, Codd, Boyce
Rel. Algebra: Codd et al.
Optimization: Blasgen, Selinger, Eswaran
Cost Models:
Transactions: Gray, Traiger
B-Trees: Bayer, Schkolnick, Blasgen
Recovery: Lorie, Putzolu
Factors for Product Success
• simple, formalized model
• simple user interface: SQL
• algebra + laws for optimization
• performance: B-trees
• multiuser: transactions (Gray)
• robustness: transactions + recovery, self-organization of B-trees
• scalability: B-trees with logarithmic growth, parallelism
Prefix B-Trees
full keys:  ... (Smith, Bernie), (Smith, Henry) ...
separator:  ... (Smith, C)
left leaf:  ... (Smith, Bernie)
right leaf: (Smith, Henry) ...
• store shortest separators: Simple Prefix B-trees
• trim common prefixes: Prefix B-trees
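One way to pick a short separator between two adjacent leaf keys is sketched below. It only considers prefixes of the right-hand key, which is an assumption of this sketch and not necessarily the rule used in the talk; `shortest_separator` is an illustrative name.

```python
def shortest_separator(left: str, right: str) -> str:
    """Return the shortest prefix s of right with left < s <= right."""
    if not left < right:
        raise ValueError("keys must be distinct and in ascending order")
    for n in range(1, len(right) + 1):
        candidate = right[:n]
        if candidate > left:   # first prefix that sorts after the left key
            return candidate
    return right               # not reached when left < right


sep = shortest_separator("Smith, Bernie", "Smith, Henry")
print(sep)
```

For the slide's keys this yields "Smith, H". The slide's "(Smith, C)" is an equally short separator that is not a prefix of either key; any string s with left < s <= right routes searches correctly.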
Concurrency and B-Trees
Bayer, Schkolnick: Acta Informatica 9, 1-21 (1977)
...
• everybody reads root
• root almost never changes
• low probability of conflicts near leaves
• combination of synchronization protocols
• no chance of testing real general case
UB-Trees: Multidimensional Indexing
• geographic databases (GIS)
• Data-Warehousing: Star Schema
• all relational databases with n:m relationships
  R n:m S, (r, s)
• XML
• mobile, location based applications
Basic Idea of UB-Tree
• linearize multidimensional space by space filling curve, e.g. Z-curve or Hilbert
• Use Z-address to store objects in B-Tree
Response time for query is proportional to size of the answer!
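The linearization step can be illustrated for two dimensions. This is a minimal sketch that computes a Z-address by bit interleaving and uses a sorted list as a stand-in for the B-tree; it does not implement the UB-tree range-query algorithm. The names `z_address` and `z_range`, the 16-bit coordinate width, and the choice of which dimension takes the low bit are all assumptions.

```python
from bisect import bisect_left, bisect_right

BITS = 16  # assumption: coordinates are non-negative integers below 2**16


def z_address(x: int, y: int, bits: int = BITS) -> int:
    """Interleave the bits of x and y into one Z-curve (Morton) address."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # bits of y go to odd positions
    return z


# Sorted list as a stand-in for a B-tree keyed by Z-address.
points = [(x, y) for x in range(4) for y in range(4)]
index = sorted(points, key=lambda p: z_address(*p))
keys = [z_address(*p) for p in index]


def z_range(z_lo: int, z_hi: int):
    """Return all stored points whose Z-address lies in [z_lo, z_hi]."""
    return index[bisect_left(keys, z_lo):bisect_right(keys, z_hi)]


print(z_address(3, 5))
print(z_range(4, 7))
```

An interval of Z-addresses such as [4, 7] covers a compact block of the plane (here the 2x2 square with x in 2..3 and y in 0..1), which is why contiguous key ranges in the B-tree correspond to spatial regions.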
UB-Tree: Regions and Query-Box
Z-region [0.1 : 1.1.1]
World as self balancing UB-Tree