Realization of DBS
5. Storage Structures
Theo Härder, www.haerder.de
© 2011 AG DBIS
Realization of Database Systems – SS 2011
Main reference: Theo Härder, Erhard Rahm: Datenbanksysteme – Konzepte und Techniken der Implementierung, Springer, 2001, Chapter 6.
Jim Gray, Andreas Reuter: Transaction Processing – Concepts and Techniques, 5th printing, Morgan Kaufmann Publ., 1993, Chapter 14.
Storage Structures
Goal: design of
• Storage structures for records and complex objects
• Auxiliary structures such as free-placement administration, addressing, etc.
Free-placement administration
Disk-based record addressing
• TID, allocation table, indexing of tables
Memory-based record addressing
• Classification of the solution concepts, pointer swizzling methods
Mapping of records
• Fixed/variable fields, partitioning
Storage structures for complex objects
• List and set constructors, tuple constructor
Problems of large objects
Storage structures for LOBs
• Segments of fixed and variable size, access via B*-tree, pointer list, ...
DB connection for external data
Storage Structures (2)
Operations
• Insert <record> at <location> with <database-key>
• Retrieve <record> with <database-key>
• Add <entry> to <B*-tree>
• Retrieve <address-list> from <B*-tree> for <value>
Mapping functions
• record identifier → address
• attribute value → record-id. list
• record identifier → record-id. list
• address → {occupied, free}
FIX Pi, FIX Pj, UNFIX Pj, FIX Pk, UNFIX Pi, ...
Properties of the upper interface
• Non-volatile storage with “addressing assistance”
• Free-placement administration
• Addressing methods of and between physical records
• Access paths supporting content addressability
Free-Placement Administration
Free-placement administration (FPA) for
• External storage (allocation of files)
• Segments (allocation of internal records)
• Pages (administration of allocated/free entries)
For all pages of a segment:
• Insert/update → search for n free bytes
• Delete/update → release or allocation of storage space
• In general: search, allocation, and release of storage space in Sj
[Figure: segment Sj as a sequence of pages; each page Pi has length LP and holds a page header (PH) followed by records]
In PH (page header): ID of Pi, free-placement info, type, org. data
Free-placement table F in Sj: one entry fi of length Lf per page Pi (i = 1, 2, ...), with fi = # free bytes
Free-Placement Administration (2)
Size of F: $k = \lfloor (L_P - L_{PH}) / L_f \rfloor$ entries per page of size LP; with s = # pages in segment, $n = \lceil s / k \rceil$ pages are needed for F
Location of F
• Beginning of segment
• Equidistant distribution: i · k + 1 (i = 0, 1, 2, ...)
• End of segment
Kind of FPA
• Exact: Lf = 2 bytes
• Fuzzy: Lf = 1 byte (or less); fi is kept in units (multiples) of LP/256, i.e., 16 bytes for LP = 4 KB
FPA within Pi
• Exact fi in PH
• Contiguous administration (displacements!)
• Free-storage chain (best-fit / first-fit)
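The sizing rule and the fuzzy variant can be illustrated with a small Python sketch. It assumes that the size formula reads k = ⌊(LP − LPH) / Lf⌋ and n = ⌈s / k⌉ and that a fuzzy entry counts whole units of LP/256 rounded down; the function names and the sample numbers are hypothetical and not taken from the slides.

```python
import math

def fpa_table_pages(s, lp, lph, lf):
    """Return (k, n): entries per page of F and number of pages needed for F.

    Assumed formulas: k = floor((LP - LPH) / Lf), n = ceil(s / k).
    """
    k = (lp - lph) // lf          # entries that fit into one page of F
    n = math.ceil(s / k)          # pages of F for a segment with s pages
    return k, n

def fuzzy_entry(free_bytes, lp):
    """Encode free space in 1 byte as whole LP/256 units (rounded down)."""
    unit = lp // 256              # 16 bytes for LP = 4 KB
    return min(free_bytes // unit, 255)

def first_fit(f, n_bytes, lp):
    """Index of the first page whose fuzzy entry guarantees n_bytes, else None."""
    unit = lp // 256
    needed = math.ceil(n_bytes / unit)
    for i, fi in enumerate(f):
        if fi >= needed:
            return i
    return None
```

Because the fuzzy entry rounds down, a page may be skipped although the record would just fit; the search never selects a page that is too full.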
Disk-based Record Addressing
Problem statement
• Long-term storage of data records
• Avoidance of “technology dependencies”
• Support of migration, etc.
General form of a record address
• DBID, SID, TID and, if necessary, table selector (RID)
• If the table is completely stored in one segment: TID in the record; DBID, SID in the DB catalog
• If the table is stored in several segments: SID, TID
Goals of the addressing technique:
• Fast, possibly direct record access
• Stable against minor displacements (moves within a page without impact)
• Infrequent or no reorganizations
Addressing in segments
• Logically contiguous address space
• Direct addressing (logical byte address, RBA) → unstable under displacements
→ Indirect addressing is very important
Disk-based Record Addressing – Techniques
[Figure: classification of addressing methods]
• direct addressing: logical (relative) byte address in segment
• indirect addressing: TID, allocation table, primary key
• leaf level of the figure: DBK, DBK/PPP, PK/PPP, ARID/PPP
Record Addressing: TID Concept
TID (tuple identifier) consists of two components:
• Page number (3 bytes)
• Relative index position within page (1 byte)
• Serves for addressing within a segment (e.g., SID = A)
Migration of a record into another page without TID change
• Creation of a proxy TID in primary page
• Overflow chain: length <= 1
[Figure: segment A, page 123 with index positions for the TIDs (123, 3) and (123, 6); records in the page and an overflow record reached via a proxy TID]
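The proxy mechanism can be sketched in a few lines of Python. This is a simplified model under stated assumptions: pages are plain lists of slots, slot numbering starts at 0, and the class and method names are hypothetical. It shows why the overflow chain never grows beyond length 1: a repeated migration rewrites the proxy in the primary page instead of chaining proxies.

```python
class Segment:
    """Pages with slot arrays; a TID is (page number, slot index)."""

    def __init__(self):
        self.pages = {}           # page number -> list of (kind, payload) slots

    def insert(self, page_no, record):
        slots = self.pages.setdefault(page_no, [])
        slots.append(("rec", record))
        return (page_no, len(slots) - 1)      # the TID handed out to the outside

    def migrate(self, tid, new_page_no):
        """Move a record to another page; its TID stays valid via a proxy TID."""
        home_page, home_slot = tid
        kind, payload = self.pages[home_page][home_slot]
        if kind == "proxy":                   # already migrated: move the overflow record
            old_page, old_slot = payload
            _, record = self.pages[old_page][old_slot]
            self.pages[old_page][old_slot] = ("free", None)
        else:
            record = payload
        slots = self.pages.setdefault(new_page_no, [])
        slots.append(("overflow", record))
        # The proxy always lives in the primary page, so the chain length stays <= 1.
        self.pages[home_page][home_slot] = ("proxy", (new_page_no, len(slots) - 1))

    def read(self, tid):
        """Return (record, number of page accesses)."""
        page_no, slot = tid
        kind, payload = self.pages[page_no][slot]
        if kind == "proxy":
            p, s = payload
            return self.pages[p][s][1], 2
        return payload, 1
```

A record that was never migrated costs one page access; a migrated record costs at most two, no matter how often it has moved.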
Record Addressing via Allocation Table
Each record owns a unique logical identifier
• Database key (DBK)
• Allocation of DBKs done by the DBMS, in general
• System-internal references to records are exclusively made via DBK
Allocation table contains a page pointer (PP) for each DBK
• SID (1 byte), page number (3 bytes)
Hybrid method: use of “probable page pointers” (PPP) in access paths possibly saves accesses to the allocation table
[Figure: the allocation table for record type Y maps DBK → PP (Y003 → A 123, Y006 → A 124); in segment A, page 123 holds record Y003 (xxx …) and page 124 holds record Y006 (zzz …)]
Index structure with entries (search criterion, DBK, PPP): (xxx, Y003, A123), (zzz, Y006, A124)
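A minimal Python sketch of the hybrid method, using the example values of the figure (DBKs Y003 and Y006, pages A123 and A124). The class `RecordStore`, its methods, and the lookup counter are hypothetical illustrations; the point is that a stale PPP costs one extra allocation-table access but never yields a wrong record.

```python
class RecordStore:
    def __init__(self):
        self.pages = {}        # page address, e.g. "A123" -> {DBK: record}
        self.alloc = {}        # allocation table: DBK -> page pointer (PP)
        self.alloc_lookups = 0

    def store(self, dbk, page, record):
        self.pages.setdefault(page, {})[dbk] = record
        self.alloc[dbk] = page

    def move(self, dbk, new_page):
        """Relocate a record; only the allocation table is updated, not the indexes."""
        old_page = self.alloc[dbk]
        record = self.pages[old_page].pop(dbk)
        self.store(dbk, new_page, record)

    def fetch(self, dbk, ppp=None):
        """Try the probable page pointer first, fall back to the allocation table."""
        if ppp is not None and dbk in self.pages.get(ppp, {}):
            return self.pages[ppp][dbk]
        self.alloc_lookups += 1
        return self.pages[self.alloc[dbk]][dbk]


rs = RecordStore()
rs.store("Y003", "A123", "xxx")
rs.store("Y006", "A124", "zzz")
# Index entries: search criterion -> (DBK, PPP)
index = {"xxx": ("Y003", "A123"), "zzz": ("Y006", "A124")}
```

After `rs.move("Y003", "A124")` the index entry still carries the old PPP; `rs.fetch("Y003", "A123")` then misses on page A123 and resolves the record through the allocation table.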
Indexing of Tables
Storage of tables
• Unordered table: records (rows) are scattered across the pages of the segment (heap)
• Ordered table: records are embedded in a B*-tree (key-sequenced table); thereby, clustering is achieved
[Figure: B*-tree ITTab(PK) with root page (25, 61), internal pages (8, 13), (33, 45), (77, 85), and leaf pages]
We denote such a table as index-organized table (IT)
Indexing of tables
• With secondary indexes for columns Ai: ITab(Ai)
• Use of different addressing methods
  - TID (physical)
  - DBK (indirect: logical/physical)
  - PK (primary key: logical)
  - Hybrid methods
Indexing of Tables (2)
How do addressing and table allocation play together?
Unordered table
[Figure: the secondary index ITab(A5) holds, for value a57, a list of pointers (Ptr) to records in segment Tab, where the records are stored as a heap]
• Records are not (or hardly) displaced in case of modification
• Addressing methods (Ptr): TID, DBK, and DBK/PPP are conceivable
• Index support for unordered tables in DB2, Sybase, MS SQL Server, Oracle, ...
Indexing of Tables (3)
Index-organized table
[Figure: ITab(A5) entry (a57: Ptr, Ptr, ...) points into ITTab(PK), whose entries have the form (PK1, a11, ..., a57, ...)]
• Split in ITTab requires many address adjustments in ITab(Ai) when using
  - TID
  - DBK
  - DBK/PPP
• Improvement: logical addressing
[Figure: ITab(A5) entry (a57: PK1, PKj, ...) references the ITTab(PK) entries (PK1, a11, ..., a57, ...) by primary key]
• No maintenance of ITab(Ai) needed in case of splits/displacements in ITTab
• But: higher access costs for index scan, etc.
Indexing of Tables (4)
Use of a hybrid addressing method
• Reference has two components
  - Logical reference: PK
  - Physical reference: probable DB page (PPP, Guess DBA)
• Entry in index: (attribute value, PK, PPP); the attribute value is the index key, and (PK, PPP) forms the HRID (Hybrid Row Identifier)
[Figure: ITab(A5) entry (a57: (PK1, PPP1), ..., (PKj, PPPj)) references the ITTab(PK) entries (PK1, a11, ..., a57, ...)]
• Combined advantages of both methods
• What happens in case of long primary keys?
Indexing of Tables (5)
Optimization for long primary keys
• Example: table Order_Line of TPC-C benchmark: OL (ol-o-id, ol-w-id, ol-d-id, ol-number, ol-i-id, ...)
• Simplified notation: OL (A1, A2, A3, A4, A5, ...)
• Avoidance of PK storage in the index (solution of Oracle)
  - Use of a mapping table ATab
  - Reference to ATab by ARID
[Figure: IOL(A5) entry (a57: (ARID1, PPP1), ..., (ARIDj, PPPj)); ATab entries (PK1, PPP1), ..., (PKj, PPPj); ITOL(A1, A2, A3, A4) entries (a11, a21, a31, a41, ARID1, a57, ...)]
• If access via PPP fails, ATab is searched using ARID
• All IOL(Ai) use ATab
• Starting from ATab, access to ITOL can be performed via PPP or via PK
Memory-based Addressing
Task: Programs should be enabled to transparently process transient and persistent data objects in main memory
• Exclusive usage of direct addresses in main memory (virtual addressing), i.e., in main memory, access to persistent objects is as efficient as access to transient objects
• No additional costs for programs only accessing transient objects
• Mapping costs for persistent objects should not be paid for each access
Mapping of persistent objects residing on external storage (ES) to such in virtual storage (VS)
• Persistent addresses (e.g., SID, RID, TID) are long (e.g., 64 bit); in contrast, virtual addresses are shorter (e.g., 32 bit)
• Translation of pointers (pointer swizzling) from the long format (using indirect addressing) to the shorter format using an addressing method ‘as direct as possible’
Memory-based Addressing (2)
Goal: Fast processing of pointer sequences in VS, e.g., $10^5$ refs/sec
• Object processing: traversing sequences of references and navigation in meshed object structures
• Direct access in main memory is substantially cheaper than access via persistent addresses (localization of a page in DB buffer and search of the object in the page)
• Additional access paths to support search in main memory, if necessary: B*-tree access requires h+1 direct pointer references
Dimensions of Pointer Swizzling*
[Figure: dimensions and their alternatives]
• swizzling / no-swizzling
• software / hardware
• copy / in-place
• direct / indirect
• eager / lazy
• full / partial
• uncaching / no-uncaching
* White, S.J., DeWitt, D.J.: Quickstore: A High Performance Mapped Object Store, in: The VLDB Journal 4:4, Oct. 1995, pp. 629-674.
Pointer Swizzling
Classification of swizzling methods
• Most important criteria: location, point of time, and mode (orthogonal)
• Location:
  - In-Place Swizzling: retention of object formats and page structures
  - Copy Swizzling: copy of objects in a buffer and swizzling of pointers in the copies
• Point of time:
  - Eager Swizzling: swizzling of all pointers as soon as objects are placed in main memory
  - Lazy Swizzling: swizzling of pointers at the first reference or later (according to arbitrary criteria; magic number 3)
• Mode:
  - Direct Swizzling: use of virtual address of the object; using this method, replacement of the object can become very difficult or even impossible during processing
  - Indirect Swizzling: use of virtual address of the object descriptors
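One combination of these criteria, lazy and indirect swizzling, might look as follows in Python. This is only a sketch: Python object references stand in for virtual addresses, a dictionary stands in for external storage, and all class names are hypothetical. It makes the two run-time checks of this variant visible.

```python
class Descriptor:
    """Indirection cell: holds the in-memory object, or None if uncached."""
    def __init__(self, oid):
        self.oid = oid
        self.obj = None

class Ref:
    """A reference field that starts as a persistent OID and is swizzled lazily."""
    def __init__(self, oid):
        self.oid = oid
        self.descriptor = None     # set on first dereference (lazy swizzling)

class ObjectCache:
    def __init__(self, database):
        self.database = database   # stand-in for external storage: OID -> value
        self.descriptors = {}      # OID -> Descriptor
        self.loads = 0

    def deref(self, ref):
        # Check 1 (lazy): has this pointer been swizzled already?
        if ref.descriptor is None:
            ref.descriptor = self.descriptors.setdefault(ref.oid, Descriptor(ref.oid))
        # Check 2 (indirect): is the object already / still in main memory?
        d = ref.descriptor
        if d.obj is None:
            d.obj = self.database[d.oid]
            self.loads += 1
        return d.obj

    def uncache(self, oid):
        """Replacement is easy: only the descriptor has to be cleared."""
        if oid in self.descriptors:
            self.descriptors[oid].obj = None
```

With direct swizzling, `Ref` would hold the object itself; check 2 disappears, but uncaching would then require finding every reference that points to the object.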
Pointer Swizzling (2)
Classification criterion: location
• in-place: objects O1, O2 remain in the DB buffer
• copy: objects O1, O2 are copied into an object buffer (heap), which is typically allocated in the client computer
Classification criterion: point of time
• eager: as soon as in main memory (avalanche effect)
• lazy: many possibilities
Classification criterion: mode
• indirect: references between O1 and O2 lead via descriptor 1 and descriptor 2
• direct: references lead directly from O1 to O2
Pointer Swizzling (3)
Direct and indirect swizzling – Principle
[Figure with two panels: a) Symmetric references (objects 1, 2, 3); b) Referencing of descriptors (objects 1, 2, 4; descriptors 3, 4)]
Pointer Swizzling (4)
Direct and indirect variant using Copy Swizzling
[Figure with two panels: a) Direct swizzling in an object buffer: an allocation table OID/MM addr., accessed via h(OID1), leads to objects 1 to 5; b) Indirect swizzling in an object buffer: an allocation table OID/Handle, accessed via h(OID1), leads to descriptors 1 to 5, which in turn reference objects 1 to 5]
Pointer Swizzling (5)
Checks
• Lazy: check whether swizzling is already performed
• Eager: no check
• Indirect: check whether object is already / still there
• Direct: no check (but no uncaching of the object after swizzling or unswizzling)
Costs
• Eager/Direct: 0 checks (→ no uncaching)
• Eager/Indirect: 1 check (in the descriptor) (→ replacement using reference counter)
• Lazy/Direct: 1 check (+ cost of heuristics) (→ replacement using symmetric pointers)
• Lazy/Indirect: 2 checks (+ cost of heuristics (> 3 refs))
Pointer Swizzling (6)
Method:    1         2         3         4         5      6      7      8
Location:  In-Place  In-Place  In-Place  In-Place  Copy   Copy   Copy   Copy
Time:      Eager     Eager     Lazy      Lazy      Eager  Eager  Lazy   Lazy
Mode:      D         I         D         I         D      I      D      I
(D = direct, I = indirect)
Remarks:
• 1 + 5: swizzling of all pages/objects at checkout, no replacement (no uncaching)
• 2 + 4: cumbersome organization
Questions
- Which methods allow fastest processing (no consideration of swizzling cost)?
- Which methods enable object replacement (uncaching)?
- How can Lazy/Direct (3 + 7) be implemented?
Mapping of Records
Record manager
• Physical storage of records in pages
• Operations: read, insert, modify, delete
Record description
• Per attribute: attribute name, type, ..., length (metadata in catalog (DD)) and attribute value (instance in record), e.g., First_Name, varchar, ..., 5 and Xaver
• Description of records and access paths in the catalog
• Special methods for storing values
  - Blank/null suppression
  - Character compression
  - Cryptographic encoding
  - Symbol for undefined values
• Table substitution for values: KL = Kaiserslautern
Organization
• n record types per segment
• m records of different types per page
• Record size < page size: RL ≤ LP − LPH
Storage Structures for Records
Design goals
• Space economy
• Fast location of the i-th field (to a large extent, computation using catalog information)
• Dynamic extension (ALTER TABLE ...)
Concatenation of fixed-length fields
Catalog: f5 | f8 | f80 | f6 | ... |
Record: RID (e.g., TID) | f5 | f8 | f80 | f6 | ...
• space consuming
• inflexible
Pointer in prefix
Catalog: f5 | v | v | f6 | ... |
[Figure: record starting with RID; pointers of 2 bytes each in the prefix refer to the variable-length fields]
• inflexible
Storage Structures for Records (2)
Embedded length field
Catalog: f5 | v | v | f6 | f2 | v
Record: RID | TL | val | L val | L val | val | val | L val
• Increased use of the catalog
• Dynamic extension possible
Optimization: embedded length fields using pointers
Catalog: f5 | v | v | f6 | f2 | v |
Record: RID | TL | FL | val (f5) | val (f6) | val (f2) | L val | L val | L val
• Address of the n-th attribute can be computed
• Dynamic extensibility
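The embedded-length-field layout can be tried out with Python's `struct` module. The sketch below makes several assumptions that the slides do not state: TL is taken to be a 2-byte total length that includes itself, each L is a 1-byte length prefix, fixed-length fields are padded with blanks, and the RID is omitted. It shows why locating the i-th field needs the catalog plus a walk over all preceding variable-length fields.

```python
import struct

# Catalog f5 | v | v | f6 | f2 | v  ("f" = fixed length, "v" = variable length)
CATALOG = [("f", 5), ("v", None), ("v", None), ("f", 6), ("f", 2), ("v", None)]

def encode(values, catalog=CATALOG):
    """Serialize byte-string fields; variable fields get a 1-byte length prefix L."""
    body = b""
    for (kind, size), val in zip(catalog, values):
        if kind == "f":
            body += val.ljust(size, b" ")[:size]      # pad fixed-length fields
        else:
            body += struct.pack("B", len(val)) + val  # L followed by the value
    # TL (assumed meaning): total record length including the TL field itself
    return struct.pack(">H", 2 + len(body)) + body

def decode_field(record, i, catalog=CATALOG):
    """Locate the i-th field (0-based) by walking over the preceding fields."""
    pos = 2                                           # skip TL
    for idx, (kind, size) in enumerate(catalog):
        if kind == "v":
            size = record[pos]                        # read the embedded length L
            pos += 1
        if idx == i:
            return record[pos:pos + size]
        pos += size
    raise IndexError(i)
```

In the optimized variant, the fixed-length fields are grouped at the front, so their offsets follow from the catalog alone and only variable-length fields require the walk.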
Storage Structures for Records (3)
Record mapping: evaluation of methods
[Table: the methods concatenation of fixed-length fields, pointer in prefix, embedded length fields, and embedded length fields using pointers are rated by space economy, access speed within a record, and extensibility; the ratings are not preserved]
Special storage requirements
• 200 attributes/record?
• Relation of RL to LP − LPH
• Must fit for n relational DBMSs
• Indexing
[Figure: record with OID and the attributes First_Name, Name, Job; First_Name = Xaver]
Storage Structures for Records (4)
Extreme solution: AOV (built-in schema evolution)
AOV triple (A, OID, V), e.g., (First_Name, XYZ 0815, Xaver)
Mapping to n-ary relation DR
DR (AID, OID, V1, V2, V3, V4, V5) with value columns V1 ... V5 for the types INT, FLOAT, MONEY, VARCHAR, OWN
Example tuple: AID = A15, OID = XYZ 0815, value Xaver
• Search of the entire record having 200 attributes? → Via OID
• Index on all attributes
Search:
SELECT *
FROM Emp (DR)
WHERE First_Name = ’Xaver’
(OR) AND Job = ’Programmer’
(OR) AND Age > 50
How can this query be mapped onto DR?
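One possible answer, sketched with Python's built-in `sqlite3` module: each predicate selects a set of OIDs from DR, the sets are combined with INTERSECT for AND (UNION would serve for OR), and the complete record is then collected via the OID. The reduced table layout with only two value columns, the attribute names used as AID values, and the second sample employee are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified DR with only two of the typed value columns (assumption).
conn.execute("CREATE TABLE DR (AID TEXT, OID TEXT, V_INT INTEGER, V_VARCHAR TEXT)")
conn.executemany(
    "INSERT INTO DR VALUES (?, ?, ?, ?)",
    [
        ("First_Name", "XYZ 0815", None, "Xaver"),
        ("Job",        "XYZ 0815", None, "Programmer"),
        ("Age",        "XYZ 0815", 52,   None),
        ("First_Name", "XYZ 0816", None, "Xaver"),      # hypothetical second record
        ("Job",        "XYZ 0816", None, "Clerk"),
        ("Age",        "XYZ 0816", 61,   None),
    ],
)

# Each predicate selects a set of OIDs; AND becomes INTERSECT (OR would be UNION).
AND_QUERY = """
SELECT OID FROM DR WHERE AID = 'First_Name' AND V_VARCHAR = 'Xaver'
INTERSECT
SELECT OID FROM DR WHERE AID = 'Job' AND V_VARCHAR = 'Programmer'
INTERSECT
SELECT OID FROM DR WHERE AID = 'Age' AND V_INT > 50
"""

def matching_oids(connection):
    return [row[0] for row in connection.execute(AND_QUERY + " ORDER BY OID")]

def whole_record(connection, oid):
    """'SELECT *' is answered by collecting all triples of one OID."""
    rows = connection.execute(
        "SELECT AID, COALESCE(V_VARCHAR, V_INT) FROM DR WHERE OID = ?", (oid,))
    return dict(rows.fetchall())
```

Every predicate turns into a separate index lookup on (AID, value), and reassembling a record with 200 attributes means fetching 200 tuples.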
Storage Structures for Records (5)
Problem: dynamic growth / variable length
• Growth and shrinking in a page
• Overflow schemata, garbage collection
→ Methods of record storage introduced so far are to be combined with additional options
Strictly contiguous storage of records
• Numerous migrations needed in case of high update frequency
• Advantages for indirect addressing schemes
Splitting of the record
• Ordering according to reference frequencies
• Improvement of clustering
• Repeated overflow possible
• Is inevitable in case of storing attributes of type TEXT or IMAGE
[Figure: record with the fields F1 F2 F3 F4 F5 F6 F7]
Storage Structures for Complex Objects*
Complex objects composed of
• Atomic values and thereupon
• recursively applied set, list, and tuple constructors
Model for complex objects (eNF2)
[Figure: the constructors set, list, and tuple are applied to values and to each other]
Storage strategy
• Orthogonality is important: no enumeration of all possibilities
• Filing of frequently accessed substructures (possibly shared) in a single or a few storage units
• Rarely accessed substructures should be separated → Application knowledge!
Performance aspects
• of complex objects/operations are essentially affected by the storage structures used
• minimization of I/O → clustering, consideration of object growth
* Keßler, U., Dadam, P.: User-guided, flexible storage structures for complex objects, Proc. BTW’93, Braunschweig, 1993, pp. 206-225.
Storage Structures for Complex Objects (2)
Simple example
• Complex_object Employee [. . .]
  set [. . .] of tuple (Emp_No [. . .] : integer,
                        Name [. . .] : string (30),
                        Salary [. . .] : real,
                        CV [. . .] : var_string)
  [. . .] denotes location of storage structure description
Degrees of freedom for physical storage structures
1. Choice of storage structures for the implementation of sets, lists, and tuples (constructor data structure)
2. In-line storage or referencing of the elements of a set or list resp. the attribute values of a tuple in the constructor data structure
Each constructor has a constructor data structure
• Example: SimpleSet {Emp_1, Emp_2, Emp_3}
• Variable-length array as constructor data structure
[Figure: materialized storage (in-line): the array holds Emp_1, Emp_2, Emp_3 directly; referenced storage: the array holds pointers to Emp_1, Emp_2, Emp_3]
Storage Structures for Complex Objects (3)
Twofold application of the set constructors
• { {Emp_1, Emp_2}, {Emp_3, Emp_4} }
• Pre-setting: variable-length arrays as constructor data structures
Four implementations
1. Elements of outer set: referenced; elements of inner set: referenced
   [Figure: Anchor_Rec references Structure_Rec records, which reference the Emp_Rec records Emp_1, Emp_2, Emp_3, Emp_4]
2. Elements of outer set: materialized; elements of inner set: referenced
   [Figure: Anchor_Rec references the Emp_Rec records Emp_1, Emp_2, Emp_3, Emp_4]
Storage Structures for Complex Objects (4)
Four implementations (cont.)
3. Elements of outer set: referenced; elements of inner set: materialized
   [Figure: Anchor_Rec references records that hold Emp_1, Emp_2, Emp_3, Emp_4 in-line]
4. Elements of outer set: materialized; elements of inner set: materialized
   [Figure: Anchor_Rec holds Emp_1, Emp_2, Emp_3, Emp_4 in-line]
If, in addition, linked lists can be used as constructor data structures, 16 variants can be obtained in total
Storage Structures for Set and List Constructors
Independent degrees of freedom
• Constructor data structure
  - Variable-length array
  - Linked list
  - ...
• Mode of element storage
  - Directly in constructor data structure
  - Referencing of the elements via pointers
Independent specification of these degrees of freedom requires two parameters (in a data definition language):
object_type = . . .
/* definition of a set */
set [implementation = implementation_type,
     element_placement = placement_type] of object_type
…
/* definition of a list */
list [implementation = implementation_type,
      element_placement = placement_type] of object_type
...
Parameter values
implementation_type = array | linked_list
placement_type = inplace | referenced (record_type_name)
Complete definition of the storage structure (case 1)
complex_object Set_of_Set_of_Emp [anchor_record_type=Anchor_Rec]
  set [implementation=array, element_placement=referenced (Structure_Rec)] of
  set [implementation=array, element_placement=referenced (Emp_Rec)] of Emp
Storage Structures for Tuple Constructors
Here, the same degrees of freedom exist, in principle
• Choice of a constructor data structure: allocation of the tuple to a record or to several records
• Materialized or referenced storage of the attribute values
New parameter: “location”
attribute_description = attribute_name [location = location_type,
                                        element_placement = placement_type]
For each attribute, “location” and “element_placement” can be separately specified
“Location” allows for the optimization of record access according to access frequencies of the individual attributes
location_type = primary | secondary (record_type_name)
The constructor data structure of a tuple can be divided into several records.
Using “primary”, the related attribute is allocated in the primary block.
Storage Structures – Example
Instance of an Employee relation
Employee
Emp_No | Name | Salary | CV
77234 | Roberts | 4000 | Mrs. Julia Roberts is born …
77235 | Bond | 5000 | Mr. James Bond is born …
Definition of a related storage structure
1. Referenced Prim_Rec
complex_object Employee [anchor_record_type=Link_Rec]
  set [implementation=linked_list, element_placement=referenced (Prim_Rec)]
  of tuple
  (Emp_No [location=primary, element_placement=inplace] : integer,
   Name [location=primary, element_placement=inplace] : string (30),
   Salary [location=secondary (Sec_Rec), element_placement=inplace] : real,
   CV [location=secondary (Sec_Rec), element_placement=referenced (CV_Rec)] : var_string)
Storage Structures – Example (2)
Related storage structures for the Employee relation
1. Referenced Prim_Rec
[Figure: chain of Link_Rec entries (terminated by nil), each referencing a Prim_Rec (77234 Roberts; 77235 Bond), which leads to a Sec_Rec (4000; 5000) and a CV_Rec (Mrs. Julia Roberts is born …; Mr. James Bond is born …)]
2. Materialized Prim_Rec
complex_object Employee [anchor_record_type=Link_Rec]
  set [implementation=linked_list, element_placement=inplace] of . . .
[Figure: the Link_Rec entries (terminated by nil) hold 77234 Roberts and 77235 Bond in-line; each leads to a Sec_Rec (4000; 5000) and a CV_Rec (Mrs. Julia Roberts is born …; Mr. James Bond is born …)]
Large Objects
Requirements
• Ideally no size restriction
• General administration functions
• Tailor-made processing functions, . . .
Examples for large objects (today up to n (=4) TByte)
• Texts, CAD data
• Image data, audio sequences
• Videos, . . .
Principal possibilities of DB integration
• Storage as LOB in the DB (mostly indirect storage)
  [Figure: ORDBS server with table Employees (NAME, DEPT, PHOTO)]
• Storage using DataLinks concept in external file servers
  [Figure: ORDBS server with table Employees (NAME, DEPT, PHOTOID) and a separate image server]
Storage of Large Objects*
Representation of large storage objects
• Potentially consists of many pages or segments
• Is an uninterpreted byte sequence
• Address (OID, object identifier) points to object header
• OID is proxy in the record which the long field belongs to
• Required processing flexibility determines access paths and storage structure
Processing problems
• Is object size known in advance?
• Are many modifications anticipated during the lifetime of the object?
• Is fast sequential access needed? . . .
Mapping onto external storage
• Page based
  - Unit of storage allocation: page, “scattered” collection of pages
• Segment based (several pages)
  - Segments of fixed size (Exodus), segments of variable size (EOS)
  - Segments with a fixed growth pattern (Starburst)
• Access structure to the object
  - Chain of segments/pages
  - List of entries (descriptors), B*-tree
* Biliris, A.: The Performance of Three Database Storage Structures for Managing Large Objects, Proc. ACM SIGMOD’92 Conf., San Diego, Calif., 1992, pp. 276-285
Long Fields in Exodus
Storage of long fields
• Data are kept in (small) segments of fixed size
• Choice of segment sizes adjusted to the processing characteristics
• Insertion of byte sequences is simple and possible anywhere
• Performance degradation in case of sequential access
B*-tree as access structure
• Leaves are segments of fixed size (here 4 pages of 100 bytes)
• Internal nodes and root represent an index for byte positions
• For each child node, internal nodes and root store entries of the form (page-#, counter)
  - Counter maintains the maximum byte number of the corresponding subtree (page entries on the left-hand side belong to the subtree)
  - Object size: counter in the right-most entry of the root
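The search via counters can be sketched in Python. The tree below reuses the counter values that appear in the figures of this chapter (root 900 / 1810, internal nodes 350 / 600 / 900 and 400 / 680 / 910); the representation as nested lists, the 0-based byte positions, and the function names are assumptions of this sketch, and child references replace page numbers.

```python
from bisect import bisect_right

# Internal node: list of (counter, child); leaf: the bytes of one segment.
def find(node, pos):
    """Return (leaf, offset) for the 0-based byte position pos."""
    while isinstance(node, list):
        counters = [c for c, _ in node]
        i = bisect_right(counters, pos)        # first entry with counter > pos
        if i == len(node):
            raise IndexError("position beyond object size")
        if i > 0:
            pos -= counters[i - 1]             # make pos relative to the chosen subtree
        node = node[i][1]
    return node, pos

def object_size(root):
    return root[-1][0]                          # counter in the right-most root entry

# Example tree: leaf segments currently holding 350, 250, 300, 400, 280, 230 bytes
leaves = [bytes(n) for n in (350, 250, 300, 400, 280, 230)]
left = [(350, leaves[0]), (600, leaves[1]), (900, leaves[2])]
right = [(400, leaves[3]), (680, leaves[4]), (910, leaves[5])]
root = [(900, left), (1810, right)]
```

Since counters are relative to their node, an insertion or deletion only changes the counters on one root-to-leaf path; byte position 1000, for example, resolves to offset 100 in the fourth leaf.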
Long Fields in Exodus (2)
Representation of very long dynamic objects
• Up to n GBytes using three tree levels (even for small segments)
• Space occupancy typically ~ 80%
[Figure: OID points to the root; below it internal nodes (pages) and leaf nodes (segments) holding 350, 250, 300, 400, 280, and 230 bytes]
• How to determine object position of byte 100 in the last page?
Special operations
• Search for a byte interval
• Insertion/deletion of a byte sequence at a given position
• Attachment of a byte sequence at the end of the long field
Long Fields in Exodus* (3)
Support of versioned storage objects
• Labeling of the object header with a version number
• Copy and update only of those pages which differ in the new version (in update operations for which versioning is turned on)
Version-determining operation: deletion of 30 bytes at the end of V1
[Figure: V1 has root (900, 1810), internal nodes (350, 600, 900) and (400, 680, 910), and leaves of 350, 250, 300, 400, 280, 230 bytes; V2 has a new root (900, 1780), a new internal node (400, 680, 880), and a new last leaf of 200 bytes, and shares all other nodes with V1]
* M.J. Carey, D.J. DeWitt, J.E. Richardson, E.J. Shekita: Object and File Management in the EXODUS Extensible Database System. Proc. 12th VLDB Conf., 1986, pp. 91-100
Long Fields in Starburst
Enhanced requirements
• Efficient allocation and release of storage for fields of 100 MB up to 2 GB
• High I/O performance: write and read operations with raw-disk speed
Principal representation
• Descriptor containing list of segment specifications
• Long field consists of one or several segments
• Segments, also denoted as Buddy segments, are allocated using the Buddy method in large predefined extents of fixed size on external storage
[Figure: descriptor with the fields #segments = 5, first = 100, last = 310]
Segment allocation when object size is known in advance
• Object size G (in pages)
• G ≤ MaxSeg: a single segment is allocated
• G > MaxSeg: a sequence of maximum segments is allocated
• Last segment is reduced to remaining object size
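The allocation rule for a known object size can be sketched as follows. This is an illustration only: sizes are counted in pages, MAX_SEG = 2048 is taken from the slides, and the function name allocate_known is an assumption. The rounding of a segment to a Buddy size before it is trimmed is not modeled.

```python
MAX_SEG = 2048  # maximum segment size in pages (2^11)

def allocate_known(g, max_seg=MAX_SEG):
    """Segment sizes (in pages) for an object whose size g is known in advance."""
    if g <= 0:
        return []
    full, rest = divmod(g, max_seg)
    # g <= max_seg yields a single segment; otherwise maximum segments
    # followed by a last segment reduced to the remaining size
    return [max_seg] * full + ([rest] if rest else [])
```

For example, allocate_known(300) gives [300] and allocate_known(5000) gives [2048, 2048, 904].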
Long Fields in Starburst* (2)
Segment allocation when object size is not known
• Growth pattern of segment sizes as shown: 1, 2, 4, ..., 2^n pages are aggregated to a Buddy segment; MaxSeg = 2048 for n = 11
• If MaxSeg is reached, further segments are allocated with size MaxSeg
• Last segment is reduced to remaining object size
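The growth pattern can be sketched in a few lines. The function name allocate_growing is hypothetical, sizes are in pages, and the sketch computes only the resulting segment sizes for a final object size g; the actual Buddy allocation on disk is not modeled.

```python
def allocate_growing(g, max_seg=2048):
    """Segment sizes (in pages) for an object that grows to g pages step by step."""
    sizes, size, remaining = [], 1, g
    while remaining > 0:
        take = min(size, remaining)        # last segment is reduced to the remaining size
        sizes.append(take)
        remaining -= take
        size = min(size * 2, max_seg)      # double the segment size until MaxSeg is reached
    return sizes
```

An object of 10 pages is stored in segments of 1, 2, 4, and 3 pages; an object of 5000 pages ends with one segment of 2048 pages and a last segment of 905 pages.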
Allocation of Buddy segments using the binary Buddy method
[Figure: binary Buddy hierarchy over page addresses: size 2^0 with addresses 000, 001, 010, 011, ...; size 2^1 with prefixes 00*, 01*, ...; size 2^2 with prefix 0*, ...; up to size 2^n]
* T. J. Lehman, B. G. Lindsay: The Starburst Long Field Manager. Proc. 15th VLDB Conf., 1989, pp. 375-383
• Aggregation of two buddies of size 2^n → 2^(n+1) (n > 0)
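The aggregation of buddies can be illustrated by the usual address trick of the binary Buddy method: the buddy of a block of size 2^n differs from it exactly in address bit n. The sketch below is an assumption-laden illustration (free blocks kept in a Python set of (address, size) pairs, hypothetical names buddy_of and free_block) and not Starburst's allocation-page layout.

```python
def buddy_of(addr, size):
    """Start address of the buddy of block [addr, addr + size); size is a power of two."""
    return addr ^ size

def free_block(free, addr, size, max_size):
    """Return a block to the free set and merge it with free buddies as long as possible.

    `free` is a set of (addr, size) pairs; the block finally inserted is returned.
    """
    while size < max_size and (buddy_of(addr, size), size) in free:
        buddy = buddy_of(addr, size)
        free.remove((buddy, size))
        addr = min(addr, buddy)            # merged block starts at the lower buddy
        size *= 2                          # two buddies of size 2^n form one of size 2^(n+1)
    free.add((addr, size))
    return addr, size
```

Freeing page 000 while page 001 and the block 01* are free first yields 00* and then 0*, i.e. one free block of 4 pages.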
Processing properties
• Efficient support of sequential and random reads
• Simple attachment and removal of byte sequences at the end of object
• Difficult insertion and deletion of byte sequences within the object
Starburst: Storage Organization for Long Fields
[Figure: Long Field Descriptor stored in the relation with the fields DB Space #, Size (bytes), Number of BSEGS, Size of First, Size of Last, Offset #1, Offset #2, ..., Offset #N; the offsets refer to a DB Space; a Buddy Space is shown with Counts, Pointers, and Allocation Bit Array]
Implementation of a long field
• Long field descriptor (< 316 bytes) is stored in relation
• Long field consists of one or several Buddy segments, which are allocated in large predefined Buddy Spaces of fixed size on disk
• Buddy segments contain only data and no control information and consist of 1, 2, 4, 8, ... or 2048 pages (max. segment size 2 MB when using 1 KB pages)
• Buddy Spaces are allocated in (even larger) DB files (DB Spaces). They are composed of control page (allocation page) and data area
Storage Allocation Using Variable Segments
Generalization of the approaches of Exodus and Starburst in Eos
• Object is stored in a sequence of segments of variable size
• Segment consists of pages allocated in physical contiguity on external storage
• Only the last page of a segment can contain free space
Principal representation
[Figure: tree with the counter entries 1250, 1810 in the root and 950, 1250, 430, 560 on the next level, addressing segments of variable size]
The sizes of the various segments can widely differ
Processing properties
• The operational properties of both underlying approaches can be obtained
• Reorganization is possible, if adjacent segments become very small (page only)
Summary
Free-placement information at different levels required: device, segment (file), page
Goals of disk-based addressing
• Combination of direct-access speed and flexibility of indirection
• Record displacements in a page without side effects
TID, DBK (allocation table) or primary key
Indexing of tables
• Physical or hybrid methods in case of unordered tables
• Hybrid methods combined with primary key in case of ordered tables (index-organized tables)
Memory-based addressing (Pointer Swizzling)
• Transparent program access to persistent and transient objects
• Mapping of long disk addresses onto virtual addresses
• Orthogonal classification criteria: location, point of time, mode
Mapping of records
• Storing fields of variable length
• Dynamic extension possible
• Computation of field addresses
Summary (2)
Storage of complex objects
• Constructors for lists, sets, and tuples
• Application of constructors is orthogonal and recursive
Large objects need efficient DBMS support
• Tailor-made processing techniques & performance properties needed
• Transport to the application (minimization of copies needed)
• Query optimization, evaluation of LOB functions, synchronization, logging and recovery
Storage of large objects gains increasing importance
• B*-tree technique: flexible representation, moderate access speed
• Large segments (lists) of variable length: high I/O performance
• Choice of various techniques tailored to the processing characteristics
DB linkage for external files
• DB support desired for management, consistency control, and content-based search
• DataLinks concept provides referential integrity, access control, coordinated backup and recovery as well as transaction consistency
DB Linkage of External Data
Motivation
• Most data in an enterprise are stored in files
• Their number will keep increasing for a long time, and they will even grow in volume
• Because many applications are based on files, file access has to be supported, too (uniform access to DBs and other data sources)
Properties
• File systems do not provide sufficient meta-data for search functions and integrity preservation
• DBMSs support a wide spectrum of functions, but are currently not optimized for the storage of a large number of BLOBs (multimedia types)
• BLOBs need hierarchical storage management of powerful file systems (e.g., tertiary storage) which guarantee cost-effective processing of data for varying access patterns (frequent or rare changes)
Linkage of file systems and DBMSs should combine pros of both approaches!
Application examples
• CAD systems: synchronization of millions of components/files (proprietary format)
• Multimedia objects: management of libraries for images, documents, or videos
• HTML and XML files: DB support for the functionality of Web servers
DB Linkage of External Data* (2)
Storage model for DB linkage
[Figure: an SQL table in the DBMS holds URL1 and URL2, which reference files in file system 1 ... file system n]
Which problems have to be solved?
• Referential integrity
• Access control
• Coordinated backup and recovery
• Transaction consistency
• Additionally: search via conventional data types, contents of external data
• Performance aspects in DB and file applications
Participating file systems need an additional control component which cooperates with the DBMS via special protocols
* Information Technology - Database Language SQL - Part 9: Management of External Data, International Standard, May 2001 (www.jtc1sc32.org)
DB Linkage of External Data (3)
DataLinks concept for the management of external data
[Figure: applications use the SQL API for the Emp table (Name, DNo, Photo) and the file API for the images in external files; Photo is of DataLink type (URL)]
DataLinks File System Filter (DLFF)
• Enforces referential integrity when files are renamed or deleted
• Enforces DB-centric access control when a file is opened
• File API remains unchanged, no changes in the applications
• DLFF does not reside in the read/write path for external files (performance!)
DataLinks File Manager (DLFM)
• Executes Link/UnLink operations under transaction protection
• Guarantees referential integrity
• Supports coordinated backup/recovery
DBMS manages/coordinates operations on external files
• Via referenced URLs
• Via DLFM API
DB Linkage of External Data (5)
DataLinks architecture
[Figure: the application sends SQL to a DBMS using the DataLinks extension (DB with URLs) and receives a list of files; it obtains the data directly by standard data access to the standard file system (AIX, HP-UX, Solaris, Windows), which is connected to the DataLinks File Mgr and to hierarchical storage management]
Typical application
• Integration of unstructured and semi-structured data with applications based on DBMS use
• Reach: large number of files in computer networks
• Using function value indexing: files referenced via URLs remain unchanged
• User extracts features of images or videos and stores them in the DB to perform evaluations together with predicates on other DB data
• Query By Image Content (QBIC) supports extraction/search of such features.
DB Linkage of External Data (4)
Processing model from the viewpoint of the application
• SQL access to meta-data repository for external data
• Search is also possible via content of external data (function value indexing)
• List of references of searched objects
• Application references external data directly via file API
DataLinks data type in SQL:99 (example)
    CREATE TABLE Emp (
        Name   VARCHAR (30),
        DNo    INTEGER,
        Photo  DATALINK (200)
               LINKTYPE URL
               FILE LINK CONTROL
               INTEGRITY all
               READ PERMISSION DB
               WRITE PERMISSION blocked
               RECOVERY yes
               ON UNLINK restore);
• DBMS control can be activated in a leveled way
• URL: http://server name/pathname/filename/
• Integrity: URLs are kept consistent as references
• Read Permission: either at the file system or is delegated to the DBMS. Authorization is embedded as a token in the URL
• Write Permission: either at the file system or is blocked
• Recovery: coordinated backup and recovery is only possible for option WRITE PERMISSION blocked
• On Unlink: file can be deleted or can be returned under file system control
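The processing model from the application's viewpoint (SQL access to the meta-data, list of references, direct file access) can be imitated with standard tools. The sketch below is only an analogy under clear assumptions: SQLite has no DATALINK type, so a plain TEXT column holding a file URL stands in for it, and no link control (DLFF/DLFM) is involved. The function names find_photos, read_external, and demo are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname

def find_photos(conn, dno):
    """Step 1: SQL access to the meta-data; returns the list of file references."""
    rows = conn.execute(
        "SELECT Name, Photo FROM Emp WHERE DNo = ? ORDER BY Name", (dno,))
    return rows.fetchall()

def read_external(url):
    """Step 2: the application reads the external file directly via the file API."""
    path = url2pathname(urlparse(url).path)
    return Path(path).read_bytes()

def demo():
    with tempfile.TemporaryDirectory() as d:
        photo = Path(d) / "smith.jpg"
        photo.write_bytes(b"fake image bytes")          # stand-in for an image file
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE Emp (Name TEXT, DNo INTEGER, Photo TEXT)")
        conn.execute("INSERT INTO Emp VALUES (?, ?, ?)",
                     ("Smith", 7, photo.as_uri()))      # URL kept in the table
        result = [(name, read_external(url)) for name, url in find_photos(conn, 7)]
        conn.close()
        return result
```

The database is touched only for the search; the file content itself never passes through it, which corresponds to the point that the read/write path for external files stays outside the DBMS.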
Large Objects (2)
Principal possibilities of DB integration
Storage as LOB in the DB (mostly indirect storage)
• BLOB - Binary Large Object for audio, image data etc.
• CLOB - Character Large Object for text data
• DBCLOB - Double Byte Character Large Object (DB2) for special graphic data etc.
[Figure: ORDBS server with table Employees (NAME, ABT, PHOTO); the PHOTO values are stored as LOBs in the ORDBS]
Storage using DataLinks concept in external file servers
[Figure: ORDBS server with table Employees (NAME, ABT, PHOTOID); PHOTOID references an image file on a file server]
Large Objects (3)
Creation of LOB columns*
LOB column definition
    column-name { BLOB | CLOB | DBCLOB } ( n [ K | M | G ] ) [ LOGGED | NOT LOGGED ] [ NOT COMPACT | COMPACT ]
* The realization examples correspond to DB2 Universal Database
Large Objects (4)
Examples
    CREATE TABLE Graduate
        (RunNo   Integer,
         Name    Varchar (50),
         . . .
         Photo   BLOB (5 M) NOT LOGGED COMPACT,     -- image
         CV      CLOB (16 K) LOGGED NOT COMPACT);   -- text

    CREATE TABLE Design
        (Pno             Char (18),
         Time_of_Update  Timestamp,
         Updated_By      Varchar (50),
         Drawing         BLOB (2 M) LOGGED NOT COMPACT);   -- graphic

    ALTER TABLE Graduate
        ADD COLUMN MasterThesis CLOB (500 K)
            LOGGED NOT COMPACT;
Large Objects (5)
Specification of LOBs requires care
• Maximum length
  - Reservation of an application buffer
  - Clustering and optimization using indirect storage allocation; descriptor in the tuple is dependent on the LOB size (72 bytes when < 1K up to 316 bytes for 2G)
  - For smaller LOBs (< page size), direct storage allocation possible
• Compact storage
  - COMPACT reserves no space for later growth
    What may happen in case of a LOB modification?
  - NOT COMPACT is default
• Logging
  - LOGGED: in case of updates, LOB column is treated like all other columns (ACID!)
    What does this mean for the log file?
  - NOT LOGGED: updates are not recorded in the log file. So-called shadow pages (shadowing) guarantee atomicity until Commit
    [Figure: Lob1 with pages 1 2 3 4 and its new version Lob1' with pages 1 2 3 4]
    What happens in case of a device failure?
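The shadowing idea for NOT LOGGED columns can be sketched as follows. This is a toy in-memory model under stated assumptions, not DB2's implementation: a LOB is a page table over a page store, an update writes to a new page and keeps the old page until Commit, and the class name ShadowedLob and its methods are hypothetical.

```python
class ShadowedLob:
    """LOB as a page table over a page store; old pages are kept until commit."""

    def __init__(self, pages):
        self.store = dict(enumerate(pages))      # page id -> page content
        self.committed = list(self.store)        # page table of the committed version
        self.current = list(self.committed)      # page table seen by the running update
        self.next_id = len(self.store)

    def write(self, page_no, data):
        old = self.current[page_no]
        if old not in self.committed:            # page already replaced in this update
            del self.store[old]
        self.store[self.next_id] = data          # committed page stays untouched
        self.current[page_no] = self.next_id
        self.next_id += 1

    def commit(self):
        obsolete = set(self.committed) - set(self.current)
        self.committed = list(self.current)      # switching the table makes the update valid
        for pid in obsolete:
            del self.store[pid]

    def abort(self):
        new = set(self.current) - set(self.committed)
        self.current = list(self.committed)      # drop the uncommitted version
        for pid in new:
            del self.store[pid]

    def read(self, committed=True):
        table = self.committed if committed else self.current
        return b"".join(self.store[pid] for pid in table)
```

Until commit() is called, read() still returns the old LOB value, so an abort needs no log information; the sketch says nothing about the device-failure question raised above.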
Large Objects (6)
How are large objects processed?
• BLOB and CLOB are not types of the host language
  Special declaration of BLOB, CLOB, ... by SQL TYPE is required, because they use the same host language types. Furthermore, it can be guaranteed that the length expected by the DBMS is met exactly.
Preparations required in the AP
• SQL TYPE IS CLOB (2 K) c1 (or BLOB (2 K))
  is translated by the C-precompiler into

    static struct c1_t {
        unsigned long length;    /* number of bytes actually used in data */
        char data[2048];         /* 2 K buffer for the CLOB value */
    } c1;

• Creation of a CLOB

    strcpy(c1.data, "Hello");           /* needs <string.h>; arrays cannot be assigned in C */
    c1.length = sizeof("Hello") - 1;    /* length without the terminating '\0' */

  can be hidden by the use of macros (e.g., c1 = SQL_CLOB_INIT("Hello");)
Large Objects (7)
Insert, delete, and update can be performed similarly to other types, if sufficiently large AP buffers exist
Fetch the data for Graduate having RunNo 17 into AP
    . . .
    SELECT Name, Photo, CV
    INTO   :x, :y :yindicator, :z :zindicator
    FROM   Graduate
    WHERE  RunNo = 17;
Large Objects (8)
Which operations can be applied to LOBs?
• Comparison predicates: =, <>, <, <=, >, >=, IN, BETWEEN
• LIKE predicate
• Uniqueness or sequence for LOB values
  - PRIMARY KEY, UNIQUE, FOREIGN KEY
  - SELECT DISTINCT, . . ., COUNT (DISTINCT)
  - GROUP BY, ORDER BY
• Use of aggregate functions like MIN, MAX
• Operations
  - UNION, INTERSECT, EXCEPT
  - joins of LOB attributes
• Index structures across LOB columns
Large Objects (9)
How can LOBs be indexed?
• User-defined function assigns index values to LOBs
• Function value indexing
[Figure: the index entry f(blob1) = x refers to blob1 in the BLOB column]
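Function value indexing can be sketched with a small in-memory index. Everything here is illustrative: the class FunctionValueIndex, the dictionary used as index structure, and the example function word_count are assumptions, chosen only to show that the index stores f(LOB) and never the LOB value itself.

```python
from collections import defaultdict

class FunctionValueIndex:
    """Index over f(LOB) instead of the LOB value itself."""

    def __init__(self, func):
        self.func = func                       # user-defined function f
        self.entries = defaultdict(set)        # f(lob) -> row ids

    def insert(self, row_id, lob):
        self.entries[self.func(lob)].add(row_id)

    def lookup(self, value):
        """Row ids of all LOBs with f(lob) == value."""
        return sorted(self.entries.get(value, ()))

def word_count(clob):
    """Example of a user-defined function assigning an index value to a CLOB."""
    return len(clob.split())
```

After inserting the texts "a b c", "x y", and "d e f" under the row ids 1, 2, and 3, a lookup for the value 3 returns the rows 1 and 3 without reading any LOB.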
Is direct processing of LOBs in AP realistic?
    Books (Title     Varchar (200),
           BNO       ISBN,
           Abstract  CLOB (32 K),
           Booktext  CLOB (20 M),
           Video     BLOB (2 G))

    EXEC SQL
        SELECT Abstract, Booktext, Video
        INTO   :kilobuffer, :megabuffer, :gigabuffer
        FROM   Books
        WHERE  Title = 'American Beauty'
Large Objects (10)
Client/Server architecture
[Figure: client with AP and AP buffer; server with DBMS, DB buffer, and DB]
• Allocation of buffers?
• Transfer of an entire LOB into the AP?
• Should the transfer be performed via the DB buffer?
• “Piece-wise” processing of LOBs required by AP!
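Piece-wise processing can be sketched with an ordinary byte stream. The sketch assumes that the server side offers some handle with a read(n) operation; here io.BytesIO stands in for it, and the function names read_pieces and lob_length are hypothetical.

```python
import io

def read_pieces(lob_stream, piece_size=64 * 1024):
    """Yield a LOB piece by piece so that the AP needs only one small buffer."""
    while True:
        piece = lob_stream.read(piece_size)
        if not piece:                          # empty result: end of the LOB
            break
        yield piece

def lob_length(lob_stream, piece_size=64 * 1024):
    """Process a whole LOB without ever holding more than one piece."""
    return sum(len(piece) for piece in read_pieces(lob_stream, piece_size))
```

A LOB of 150000 bytes is delivered as two pieces of 65536 bytes and a final piece of 18928 bytes, so the AP buffer size is independent of the LOB size.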
Locator concept for the access to LOBs