Chap 7. Indexing
Chapter Objectives(1)
Introduce concepts of indexing that have broad applications in the design of file systems
Introduce the use of a simple linear index to provide rapid access to records in an entry-sequenced, variable-length record file
Investigate the implementation of the use of indexes for file maintenance
Introduce the template features of C++ for object I/O
Describe the object-oriented approach to indexed sequential files
Chapter Objectives(2)
Describe the use of indexes to provide access to records by more than one key
Introduce the idea of an inverted list, illustrating Boolean operations on lists
Discuss when to bind an index key to an address in the data file
Introduce and investigate the implications of self-indexing files
Contents(1)
7.1 What is an Index?
7.2 A Simple Index for Entry-Sequenced Files
7.3 Using Template Classes in C++ for Object I/O
7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects
7.5 Indexes That Are Too Large to Hold in Memory
Contents(2)
7.6 Indexing to Provide Access by Multiple Keys
7.7 Retrieval Using Combinations of Secondary Keys
7.8 Improving the Secondary Index Structure: Inverted Lists
7.9 Selective Indexes
7.10 Binding
Overview: Index(1)
Index: a data structure that associates given key values with the corresponding record numbers
It is usually physically separate from the file (unlike indexed sequential files, which bind the index tightly to the data)
Linear indexes (like the indexes found at the back of books)
Index records are ordered by key value, as in an ordered relative file
The best algorithm for finding a record with a specific key value is binary search
Adding an entry requires reorganizing the index
Overview: Index(2)
[Figure: an index file holding the sorted keys k1, k2, k4, k5, k7, k9, each referring to a record (AAA, ZZZ, CCC, XXX, EEE, FFF) in the data file]
Overview: Index(3)
Tree Indexes (like those of indexed sequential files)
Hierarchical: each level, beginning with the root level, points to the next level
Leaves point only to the data file
Indexed Sequential File
Binary Tree Index
AVL Tree Index
B+ tree Index
Roles of Index?
Index: keys and reference fields
Fast Random Accesses
Uniform Access Speed
Allow users to impose order on a file without actually rearranging the file
Provide multiple access paths to a file
Give user keyed access to variable-length record files
A Simple Index(1)
Data file: entry-sequenced, variable-length records
Primary key: unique for each entry in the file
Searching a file by key is a common need, but binary search cannot be used on a variable-length record file (there is no way to know where the middle record starts)
Solution: construct an index object for the file
Index object: key field + byte-offset field
A Simple Index (2)
Index file (sorted by key)
Key         Reference field
ANG3795     167
COL31809    353
COL38358    211
DG139201    396
DG18807     256
FF245       442
LON2312     32
MER75016    300
RCA2626     77
WAR23699    132

Data file (entry-sequenced)
Address of record   Actual data record
32                  LON|2312|Romeo and Juliet|Prokofiev . . .
77                  RCA|2626|Quartet in C Sharp Minor . . .
132                 WAR|23699|Touchstone|Corea . . .
167                 ANG|3795|Symphony No. 9|Beethoven . . .
211                 COL|38358|Nebraska|Springsteen . . .
256                 DG|18807|Symphony No. 9|Beethoven . . .
300                 MER|75016|Coq d'or Suite|Rimsky . . .
353                 COL|31809|Symphony No. 9|Dvorak . . .
396                 DG|139201|Violin Concerto|Beethoven . . .
442                 FF|245|Good News|Sweet Honey In The . . .
A Simple Index (3)
Index file: fixed-size records, sorted by key
Data file: not sorted, because it is entry-sequenced
Record addition is quick (faster than with a sorted data file)
The index can be kept in memory
Records are found more quickly through the index than by searching a sorted data file
Class TextIndex encapsulates the index data (key and reference field) and the index operations
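As a rough illustration of the idea on this slide (not the TextIndex class itself), the sketch below keeps the index as a sorted vector of key and byte-offset pairs and looks a key up with a binary search. The names IndexEntry, lookup, and sampleIndex are assumptions made for this example; the entries are the ones shown in the index figure.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One index entry: a key plus the byte offset of its record in the data file.
struct IndexEntry {
    std::string key;
    long offset;
};

// Binary search over an index kept sorted by key.
// Returns the byte offset of the record, or -1 if the key is absent.
long lookup(const std::vector<IndexEntry>& index, const std::string& key) {
    auto byKey = [](const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; };
    assert(std::is_sorted(index.begin(), index.end(), byKey));
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it == index.end() || it->key != key) return -1;
    return it->offset;
}

// The ten entries from the index figure, in key order.
std::vector<IndexEntry> sampleIndex() {
    return {
        {"ANG3795", 167}, {"COL31809", 353}, {"COL38358", 211},
        {"DG139201", 396}, {"DG18807", 256}, {"FF245", 442},
        {"LON2312", 32},  {"MER75016", 300}, {"RCA2626", 77},
        {"WAR23699", 132},
    };
}
```

Once the offset is known, a single seek into the entry-sequenced data file reaches the record, which is why the data file itself never has to be sorted.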
Let’s See Figure 7.4
class TextIndex
{ public:
    TextIndex (int maxKeys = 100, int unique = 1);
    ~TextIndex ();
    int Insert (const char * key, int recAddr); // add to index
    int Remove (const char * key); // remove key from index
    int Search (const char * key) const; // search for key, return recAddr
    void Print (ostream &) const;
  protected:
    int MaxKeys; // maximum number of entries
    int NumKeys; // actual number of entries
    char ** Keys; // array of key values
    int * RecAddrs; // array of record references
    int Find (const char * key) const;
    int Init (int maxKeys, int unique);
    int Unique; // if true, each key must be unique
    friend class TextIndexBuffer; // Pack and Unpack read these members directly
};
TextIndex::TextIndex
TextIndex:: TextIndex (int maxKeys, int unique)
: NumKeys (0), Keys(0), RecAddrs(0)
{Init (maxKeys, unique);}
TextIndex :: ~TextIndex ()
{delete [] Keys; delete [] RecAddrs;} // arrays allocated with new [] must be released with delete []
TextIndex::Init
int TextIndex :: Init (int maxKeys, int unique)
{
Unique = unique != 0;
if (maxKeys <= 0)
{
MaxKeys = 0;
return 0;
}
MaxKeys = maxKeys;
Keys = new char *[maxKeys];
RecAddrs = new int [maxKeys];
return 1;
}
TextIndex::Insert
int TextIndex :: Insert (const char * key, int recAddr)
{
    int i;
    int index = Find (key);
    if (Unique && index >= 0) return 0; // key already in
    if (NumKeys == MaxKeys) return 0; // no room for another key
    for (i = NumKeys-1; i >= 0; i--)
    {
        if (strcmp(key, Keys[i])>0) break; // insert into location i+1
        Keys[i+1] = Keys[i];
        RecAddrs[i+1] = RecAddrs[i];
    }
    Keys[i+1] = strdup(key);
    RecAddrs[i+1] = recAddr;
    NumKeys ++;
    return 1;
}
TextIndex::Remove
int TextIndex :: Remove (const char * key)
{
int index = Find (key);
if (index < 0) return 0; // key not in index
for (int i = index; i < NumKeys-1; i++) // stop one short so Keys[i+1] stays in bounds
{
Keys[i] = Keys[i+1];
RecAddrs[i] = RecAddrs[i+1];
}
NumKeys --;
return 1;
}
TextIndex::Search
int TextIndex :: Search (const char * key) const
{
int index = Find (key);
if (index < 0) return index;
return RecAddrs[index];
}
TextIndex::Find
int TextIndex :: Find (const char * key) const
{
for (int i = 0; i < NumKeys; i++)
if (strcmp(Keys[i], key)==0) return i;// key found
else if (strcmp(Keys[i], key)>0) return -1;// not found
return -1;// not found
}
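Find above scans the keys sequentially. Because Insert keeps the keys in strcmp order, the same lookup can be done with a binary search. The free function below is a minimal sketch of that variant, not the class's actual code; the name findSorted and its parameter list are assumptions made for this example.

```cpp
#include <cassert>
#include <cstring>

// Binary-search counterpart of a sequential key scan.
// keys must be sorted in strcmp order.
// Returns the position of key in the array, or -1 if it is not present.
int findSorted(const char* const* keys, int numKeys, const char* key) {
    assert(numKeys >= 0);
    int low = 0, high = numKeys - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;      // midpoint without overflow
        int cmp = std::strcmp(keys[mid], key);
        if (cmp == 0) return mid;              // key found
        if (cmp < 0) low = mid + 1;            // key lies in the upper half
        else high = mid - 1;                   // key lies in the lower half
    }
    return -1;                                 // not found
}
```

With at most MaxKeys entries this takes about log2(MaxKeys) comparisons instead of up to MaxKeys.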
Index Implementation
Page 706~709
G.1 Recording.h
G.2 Recording.cpp
G.3 Makerec.cpp
Page 710~712
G.4 Textind.h
G.5 Textind.cpp
IndexRecordingFile
int IndexRecordingFile (char * myfile, TextIndex & RecordingIndex)
{
    Recording rec; int recaddr, result;
    DelimFieldBuffer Buffer; // create a buffer
    BufferFile RecordingFile(Buffer);
    result = RecordingFile . Open (myfile,ios::in);
    if (!result)
    {
        cout << "Unable to open file "<<myfile<<endl;
        return 0;
    }
    while (1) // loop until the read fails
    {
        recaddr = RecordingFile . Read (); // read next record
        if (recaddr < 0) break;
        rec. Unpack (Buffer);
        RecordingIndex . Insert(rec.Key(), recaddr);
        cout << recaddr <<'\t'<<rec<<endl;
    }
    RecordingIndex . Print (cout);
    result = RetrieveRecording (rec, "LON2312", RecordingIndex, RecordingFile);
    cout <<"Found record: "<<rec;
    return result; // the function is declared int, so report the retrieval result
}
RetrieveRecording
int RetrieveRecording (Recording & recording, char * key,
TextIndex & RecordingIndex, BufferFile & RecordingFile)
// read and unpack the recording, return TRUE if succeeds
{ int result;
cout <<"Retrieve "<<key<<" at recaddr "<<RecordingIndex.Search(key)<<endl;
result = RecordingFile . Read (RecordingIndex.Search(key));
cout <<"read result: "<<result<<endl;
if (result == -1) return FALSE;
result = recording.Unpack (RecordingFile.GetBuffer());
return result;
}
Template Class for I/O Object(1)
Template Class RecordFile
we want to make the following code possible
– Person p; RecordFile pFile; pFile.Read(p);
– Recording r; RecordFile rFile; rFile.Read(r);
without templates, it is difficult to support files for different record types without modifying the class
solution: a template class derived from BufferFile
– the actual declarations and calls
– RecordFile <Person> pFile; pFile.Read(p);
– RecordFile <Recording> rFile; rFile.Read(r);
Template Class for I/O Object(2)
Template Class RecordFile
template <class RecType>
class RecordFile : public BufferFile
{ public:
    int Read (RecType & record, int recaddr = -1);
    int Write (const RecType & record, int recaddr = -1);
    int Append (const RecType & record);
    RecordFile (IOBuffer & buffer) : BufferFile (buffer) {}
};
// The template parameter RecType must have the following methods
// int Pack (IOBuffer &); pack record into buffer
// int Unpack (IOBuffer &); unpack record from buffer
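To show how a template parameter with Pack and Unpack methods lets one class serve any record type, here is a self-contained toy version. It is only a sketch: ToyBuffer and ToyRecordFile are hypothetical stand-ins for IOBuffer and RecordFile, the "file" is a vector in memory, and the fields of Person are invented for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for IOBuffer: holds one packed record as text.
struct ToyBuffer {
    std::string data;
};

// Hypothetical stand-in for RecordFile. The "file" is a vector of packed
// records, and a record address is simply a position in that vector.
template <class RecType>
class ToyRecordFile {
public:
    explicit ToyRecordFile(ToyBuffer& buffer) : buffer_(buffer) {}

    // Pack the record into the buffer, store it, and return its address.
    int Append(const RecType& record) {
        if (!record.Pack(buffer_)) return -1;
        records_.push_back(buffer_.data);
        return static_cast<int>(records_.size()) - 1;
    }

    // Load the record at recaddr into the buffer and unpack it.
    // Returns recaddr on success, -1 on failure.
    int Read(RecType& record, int recaddr) {
        if (recaddr < 0 || recaddr >= static_cast<int>(records_.size())) return -1;
        buffer_.data = records_[recaddr];
        return record.Unpack(buffer_) ? recaddr : -1;
    }

private:
    ToyBuffer& buffer_;
    std::vector<std::string> records_;
};

// Any type that supplies Pack and Unpack can be used as RecType.
struct Person {
    std::string last, first;
    int Pack(ToyBuffer& b) const { b.data = last + "|" + first; return 1; }
    int Unpack(const ToyBuffer& b) {
        std::string::size_type bar = b.data.find('|');
        if (bar == std::string::npos) return 0;   // malformed record
        last = b.data.substr(0, bar);
        first = b.data.substr(bar + 1);
        return 1;
    }
};
```

The file class never names Person; the compiler checks at instantiation time that the chosen record type really has the two methods.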
Template Class for I/O Object(3)
Adding I/O to an existing class with RecordFile
add methods Pack and Unpack to class Recording
create a buffer object to use in the I/O
– DelimFieldBuffer Buffer;
declare an object of type RecordFile<Recording>
– RecordFile<Recording> rFile (Buffer);
Declaration and Calls
Recording r1, r2;
rFile.Open("myfile");
rFile.Read(r1);
rFile.Write(r2);
Directly open a file and read and write objects of class Recording
Object-Oriented Approach to I/O
Class IndexedFile
add indexed access to the sequential access provided by class RecordFile
extends RecordFile with Update, Append and Read method
– Update & Append : maintain a primary key index of data file
– Read : supports access to object by key
TextIndex, RecordFile ==> IndexedFile
Issues of IndexedFile
– how to make a persistent index of a file
– how to guarantee that the index is an accurate reflection of the contents of the data file
Basic Operations of IndexedFile(1)
Create the original empty index and data files
Load the index file into memory
Rewrite the index file from memory
Add records to the data file and index
Delete records from the data file
Update records in the data file
Update the index to reflect changes in the data file
Retrieve records
Basic Operations of TextIndexedFile (1)
Creating the files
initially empty files (index file and data file) created as empty files with header records
implementation ( makeind.cpp in Appendix G ) Create method in class BufferFile
Loading the index into memory
loading/storing objects are supported in the IOBuffer classes
need to choose a particular buffer class to use for an index file ( tindbuff.cpp in Appendix G )
– define class TextIndexBuffer as a derived class of FixedFieldBuffer to support reading and writing of index objects
Basic Operations of TextIndexedFile(2)
Rewriting the index file from memory
part of the Close operation on an IndexedFile
write the index object back to the index file
should protect the index against failure
write changes only when the index file is out of date (use a status flag)
implementation: Rewind and Write operations of class BufferFile
Record Addition
add a new record to the data file
– using RecordFile<Recording>::Write
add an entry to the index
– using TextIndex::Insert
– requires rearrangement of the index, but no file access if the index is in memory
Basic Operations of TextIndexedFile(3)
Record Deletion
data file: the records need not be moved
index: either really delete the entry or just mark it
– using TextIndex::Remove
Record Updating (2 categories)
the update changes the value of the key field
– delete/add approach
– reorder both the index and the data file
the update does not affect the key field
– no rearrangement of the index file
– may need to reconstruct the data file
Class TextIndexedFile(1)
Members
methods
– Create, Open, Close, Read (sequential & indexed), Append, and Update operations
protected members
– ensure the correlation between the index in memory (Index), the index file (IndexFile), and the data file (DataFile)
char* Key()
– the template parameter RecType must have the Key method
– used to extract the key value from the record
Class TextIndexedFile(2)
template <class RecType>
class TextIndexedFile
{ public:
    int Read (RecType & record); // read next record
    int Read (char * key, RecType & record); // read by key
    int Append (const RecType & record);
    int Update (char * oldKey, const RecType & record);
    int Create (char * name, int mode=ios::in|ios::out);
    int Open (char * name, int mode=ios::in|ios::out);
    int Close ();
    TextIndexedFile (IOBuffer & buffer, int keySize, int maxKeys=100);
    ~TextIndexedFile (); // close and delete
  protected:
    TextIndex Index;
    BufferFile IndexFile;
    TextIndexBuffer IndexBuffer;
    RecordFile<RecType> DataFile;
    char * FileName; // base file name for file
    int SetFileName (char * fName, char *& dFileName, char *& IdxFName);
};
TextIndexedFile Constructor / Destructor
template <class RecType>
TextIndexedFile<RecType>::TextIndexedFile (IOBuffer & buffer,
int keySize, int maxKeys) : DataFile(buffer), Index (maxKeys),
IndexBuffer(keySize, maxKeys),
IndexFile(IndexBuffer)
{
FileName = 0;
}
template <class RecType>
TextIndexedFile<RecType>::~TextIndexedFile (){ Close(); }
TextIndexedFile::Create
template <class RecType>
int TextIndexedFile<RecType>::Create (char * fileName, int mode)
// use fileName.dat and fileName.ind
{
    int result;
    char * dataFileName, * indexFileName;
    result = SetFileName (fileName, dataFileName, indexFileName);
    if (result == -1) return 0; // check before the names are used
    cout <<"file names "<<dataFileName<<" "<<indexFileName<<endl;
    result = DataFile.Create (dataFileName, mode);
    if (!result)
    {
        FileName = 0; // remove connection
        return 0;
    }
    result = IndexFile.Create (indexFileName, ios::out|ios::in);
    if (!result)
    {
        DataFile . Close(); // close the data file
        FileName = 0; // remove connection
        return 0;
    }
    return 1;
}
TextIndexedFile::Open
template <class RecType>
int TextIndexedFile<RecType>::Open (char * fileName, int mode)
// open data and index file and read index file
{
    int result;
    char * dataFileName, * indexFileName;
    result = SetFileName (fileName, dataFileName, indexFileName);
    if (!result) return 0;
    // open files
    result = DataFile.Open (dataFileName, mode);
    if (!result) { FileName = 0; return 0; }
    result = IndexFile.Open (indexFileName, ios::out);
    if (!result) { DataFile . Close(); FileName = 0; return 0; }
    // read index into memory
    result = IndexFile . Read ();
    if (result != -1)
    {
        result = IndexBuffer . Unpack (Index);
        if (result != -1) return 1;
    }
    DataFile.Close();
    IndexFile.Close();
    FileName = 0;
    return 0;
}
TextIndexedFile::Read
template <class RecType>
int TextIndexedFile<RecType>::Read (RecType & record)
{ return DataFile . Read (record, -1);} // read the next record sequentially
template <class RecType>
int TextIndexedFile<RecType>::Read (char * key, RecType & record)
{
int ref = Index.Search(key);
if (ref < 0) return -1;
int result = DataFile . Read (record, ref);
return result;
}
TextIndexedFile::Append
template <class RecType>
int TextIndexedFile<RecType>::Append (const RecType & record)
{
char * key = record.Key();
int ref = Index.Search(key);
if (ref != -1) // key already in file
return -1;
ref = DataFile . Append(record);
int result = Index . Insert (key, ref);
return ref;
}
TextIndexedFile::Close
template <class RecType>
int TextIndexedFile<RecType>::Close ()
{ int result;
if (!FileName) return 0; // already closed!
DataFile . Close();
IndexFile . Rewind();
IndexBuffer.Pack (Index);
result = IndexFile . Write ();
cout <<"result of index write: "<<result<<endl;
IndexFile . Close ();
FileName = 0;
return 1;
}
TextIndexBuffer
class TextIndexBuffer: public FixedFieldBuffer
{public:
TextIndexBuffer(int keySize, int maxKeys = 100,
int extraFields = 0, int extraSize=0);
// extraSize is included to allow derived classes to extend
// the buffer with extra fields.
// Required because the buffer size is exact.
int Pack (const TextIndex &);
int Unpack (TextIndex &);
void Print (ostream &) const;
protected:
int MaxKeys;
int KeySize;
char * Dummy; // space for dummy in pack and unpack
};
TextIndexBuffer::TextIndexBuffer
TextIndexBuffer::TextIndexBuffer (int keySize, int maxKeys, int extraFields, int extraSpace)
: FixedFieldBuffer (1+2*maxKeys+extraFields,
sizeof(int)+maxKeys*keySize+maxKeys*sizeof(int) + extraSpace)
// buffer fields consist of numKeys, actual number of keys
// Keys [maxKeys] key fields size = maxKeys * keySize
// RecAddrs [maxKeys] record address fields size = maxKeys*sizeof(int)
{
MaxKeys = maxKeys;
KeySize = keySize;
AddField (sizeof(int));
for (int i = 0; i < maxKeys; i++)
{
AddField (KeySize);
AddField (sizeof(int));
}
Dummy = new char[keySize+1];
}
TextIndexBuffer::Pack
int TextIndexBuffer::Pack (const TextIndex & index)
{
int result;
Clear ();
result = FixedFieldBuffer::Pack (&index.NumKeys);
for (int i = 0; i < index.NumKeys; i++)
{// note only pack the actual keys and recaddrs
result = result && FixedFieldBuffer::Pack (index.Keys[i]);
result = result && FixedFieldBuffer::Pack (&index.RecAddrs[i]);
}
for (int j = 0; j<index.MaxKeys-index.NumKeys; j++)
{// pack dummy values for other fields
result = result && FixedFieldBuffer::Pack (Dummy);
result = result && FixedFieldBuffer::Pack (Dummy);
}
return result;
}
TextIndexBuffer::Unpack
int TextIndexBuffer::Unpack(TextIndex & index)
{
int result;
result = FixedFieldBuffer::Unpack (&index.NumKeys);
for (int i = 0; i < index.NumKeys; i++)
{// note only unpack the actual keys and recaddrs
index.Keys[i] = new char[KeySize+1]; // one extra byte in case a terminating null is written
result = result && FixedFieldBuffer::Unpack (index.Keys[i]);
result = result && FixedFieldBuffer::Unpack (&index.RecAddrs[i]);
}
for (int j = 0; j<index.MaxKeys-index.NumKeys; j++)
{// unpack dummy values for other fields
result = result && FixedFieldBuffer::Unpack (Dummy);
result = result && FixedFieldBuffer::Unpack (Dummy);
}
return result;
}
IndexRecordingFile
int IndexRecordingFile (char * myfile, TextIndexedFile<Recording> & indexFile)
{
    Recording rec; int recaddr, result;
    DelimFieldBuffer Buffer; // create a buffer
    BufferFile RecFile(Buffer);
    result = RecFile . Open (myfile,ios::in);
    if (!result)
    {
        cout << "Unable to open file "<<myfile<<endl;
        return 0;
    }
    while (1) // loop until the read fails
    {
        recaddr = RecFile . Read (); // read next record
        if (recaddr < 0) break;
        rec. Unpack (Buffer);
        indexFile . Append(rec);
    }
    Recording rec1;
    result = indexFile.Read ("LON2312", rec1);
    cout <<"Found record: "<<rec1; // print the record that was just read by key
    return result;
}
Enhancements to TextIndexedFile(1)
Support other types of keys
Restriction: the key type is restricted to string (char *)
Relaxation: support a template class SimpleIndex with a parameter for the key type
Support data object class hierarchies
Restriction: every object must be of the same type in RecordFile
Relaxation: the type hierarchy supports virtual pack methods
Enhancements to TextIndexedFile(2)
Support multirecord index files
Restriction: the entire index must fit in a single record
Relaxation: add protected methods Insert, Delete, and Search to manipulate the arrays of index objects
Active optimization of operations
Obvious: the most obvious optimization is to use binary search in the Find method
Active: add a flag to the index object to avoid writing the index record back to the index file when it has not been changed
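The "changed" flag can be sketched as follows. This is an illustration under assumptions rather than part of TextIndexedFile: TrackedIndex is a hypothetical class, it stores entries in a std::map, and the write to the index file is only counted, not performed.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory index that remembers whether it has been modified
// since it was last written out.
class TrackedIndex {
public:
    void Insert(const std::string& key, int recAddr) {
        entries_[key] = recAddr;
        changed_ = true;
    }

    bool Remove(const std::string& key) {
        if (entries_.erase(key) == 0) return false;  // nothing was removed
        changed_ = true;
        return true;
    }

    // Intended to be called from Close. Returns true if a write was needed.
    // The write to the index file is only counted in this sketch.
    bool Flush() {
        if (!changed_) return false;  // index file is already up to date
        ++writes_;
        changed_ = false;
        return true;
    }

    int Writes() const { return writes_; }

private:
    std::map<std::string, int> entries_;
    bool changed_ = false;
    int writes_ = 0;
};
```

Opening a file only to read it then costs no index write at Close time.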
Where are we going?
Plain Stream File
Persistency ==> Buffer support ==> BufferFile
<incremental approach> Deriving BufferFile using
various other classes
Random Access ==> Index support => IndexedFile
<incremental approach> : Deriving TextIndexedFile using RecordFile and
TextIndex
Too Large Index(1)
On secondary storage (large linear index)
Disadvantages
binary searching of the index requires several seeks (not substantially faster than binary searching a sorted file)
index rearrangement requires shifting or sorting records on secondary storage
Alternatives (to be considered later)
hashed organization
tree-structured index (e.g. B-tree)
Too Large Index (2)
Advantages over a data file sorted by key, even if the index is on secondary storage
can use a binary search, even on a variable-length record file
sorting and maintaining the index is less expensive than sorting and maintaining the data file
can rearrange the keys without moving the data records, which matters if there are pinned records
Index by Multiple Keys(1)
DB-Schema = ( ID-No, Title, Composer, Artist, Label)
Find the record with ID-No “COL38358” (primary key: ID-No)
Find all the recordings of “Beethoven” (secondary key: Composer)
Find all the recordings titled “Violin Concerto” (secondary key: Title)
Index by Multiple Keys(2)
Most people don’t want to search only by primary key
Secondary key
can be duplicated
Secondary key index
a search by secondary key must consult one additional index (the primary key index)

Composer index
Secondary key           Primary key
BEETHOVEN               ANG3795
BEETHOVEN               DG139201
BEETHOVEN               DG18807
BEETHOVEN               RCA2626
COREA                   WAR23699
DVORAK                  COL31809
PROKOFIEV               LON2312
RIMSKY-KORSAKOV         MER75016
SPRINGSTEEN             COL38358
SWEET HONEY IN THE R    FF245
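The two-step lookup (secondary index, then primary index) might look like the sketch below. The container choices and the names PrimaryIndex, SecondaryIndex, and offsetsFor are assumptions made for illustration; the deck's own classes are not used here.

```cpp
#include <map>
#include <string>
#include <vector>

using PrimaryIndex = std::map<std::string, long>;                // Label ID -> byte offset
using SecondaryIndex = std::multimap<std::string, std::string>;  // composer -> Label ID

// Resolve a secondary key in two steps: the secondary index yields primary
// keys, and the primary index turns each primary key into a byte offset.
std::vector<long> offsetsFor(const SecondaryIndex& secondary,
                             const PrimaryIndex& primary,
                             const std::string& secondaryKey) {
    std::vector<long> offsets;
    auto range = secondary.equal_range(secondaryKey);
    for (auto it = range.first; it != range.second; ++it) {
        auto hit = primary.find(it->second);
        if (hit != primary.end())          // skip references to deleted records
            offsets.push_back(hit->second);
    }
    return offsets;
}
```

Because the secondary index stores primary keys rather than addresses, a primary key that no longer appears in the primary index is simply skipped.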
Secondary Index:Basic Operations(1)
Record Addition
similar to the case of adding to primary index
secondary index is stored in canonical form
– fixed length (so it can be truncated)
– original name can be obtained from the data file
can contain duplicate keys
local ordering in the same key group
Secondary Index:Basic Operations (2)
Record Deletion (2 cases)
Secondary index references the record directly
– delete from both the primary index and the secondary index
– rearrange both indexes
Secondary index references the primary key
– delete only from the primary index
– leave the reference to the deleted record intact in the secondary index
– advantage: fast
– disadvantage: references to deleted records take up space
Secondary Index: Basic Operations (3)
Record Updating
primary key index serves as a kind of protective buffer
Secondary index references the record directly
– update all files containing the record’s location
Secondary index references the primary key (1)
– the secondary index is affected only when either the primary or the secondary key is changed
Secondary Index: Basic Operations (4)
Secondary index references the primary key (2)
when the update changes the secondary key
– rearrange the secondary key index
when the update changes the primary key
– update all reference fields
– may require reordering the secondary index
when the update is confined to other fields
– the secondary key index is not affected
Retrieval of Records
Types
primary key access
secondary key access
combination of above
Combination of keys
using secondary key index, it is easy
boolean operation (AND, OR)
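Combining secondary keys comes down to set operations on sorted lists of primary keys. The sketch below uses the standard library merge-style algorithms; the names KeyList, matchAnd, and matchOr are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

using KeyList = std::vector<std::string>;  // primary keys, sorted ascending

// AND: primary keys that appear in both lists.
KeyList matchAnd(const KeyList& a, const KeyList& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    KeyList out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// OR: primary keys that appear in either list, without duplicates.
KeyList matchOr(const KeyList& a, const KeyList& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    KeyList out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```

For example, intersecting the composer list for BEETHOVEN (ANG3795, DG139201, DG18807, RCA2626) with the title list for Symphony No. 9 (ANG3795, COL31809, DG18807) leaves ANG3795 and DG18807, and only those two records need to be read from the data file.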
Inverted Lists(1)
Inverted list
a secondary key leads to a set of one or more primary keys
Disadvantages of the secondary index structure
the index must be rearranged whenever a record is added
the secondary key is repeated in every entry that shares it
Solution A: an array of references
Solution B: a linked list of references
Array of References
BEETHOVEN ANG3795 DG139201 DG18807 RCA2626
COREA WAR23699
DVORAK COL31809
PROKOFIEV LON2312
RIMSKY-KORSAKOV MER75016
SPRINGSTEEN COL38358
SWEET HONEY IN THE R FF245
Secondary key Set of primary key references
Revised composer index
* no need to rearrange
* limited reference array
* internal fragmentation
Inverted Lists (2)
Guidelines for better solution
no reorganization when adding
no limitation for duplicate key
no internal fragmentation
Solution B: by Linking the list of references
A list of primary key references
secondary key field + relative record number of the first corresponding primary key reference
e.g. PROKOFIEV leads to the list ANG36193, LON2312
Linking List of References (1)
Improved revision of the composer index

Secondary Index file
RRN   Secondary key           First reference (RRN in list file)
0     BEETHOVEN               3
1     COREA                   2
2     DVORAK                  7
3     PROKOFIEV               10
4     RIMSKY-KORSAKOV         6
5     SPRINGSTEEN             4
6     SWEET HONEY IN THE R    9

Label ID List file
RRN   Label ID    Next reference
0     LON2312     -1
1     RCA2626     -1
2     WAR23699    -1
3     ANG3795     8
4     COL38358    -1
5     DG18807     1
6     MER75016    -1
7     COL31809    -1
8     DG139201    5
9     FF245       -1
10    ANG36193    0
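Following a chain in the Label ID List file can be sketched as below. The names ListEntry, collect, and sampleListFile are assumptions made for this example, and the sample data is a reading of the figure in which each chain agrees with the composer index (BEETHOVEN starts at record 3, PROKOFIEV at record 10).

```cpp
#include <cassert>
#include <string>
#include <vector>

// One record of the Label ID List file: a primary key plus the relative
// record number of the next reference for the same secondary key (-1 = end).
struct ListEntry {
    std::string labelId;
    int next;
};

// Follow the chain that starts at head and collect the primary keys.
std::vector<std::string> collect(const std::vector<ListEntry>& listFile, int head) {
    std::vector<std::string> keys;
    for (int rrn = head; rrn != -1; rrn = listFile[rrn].next) {
        assert(rrn >= 0 && rrn < static_cast<int>(listFile.size()));
        keys.push_back(listFile[rrn].labelId);
    }
    return keys;
}

// Sample list file, indexed by relative record number 0 through 10.
std::vector<ListEntry> sampleListFile() {
    return {
        {"LON2312", -1}, {"RCA2626", -1}, {"WAR23699", -1}, {"ANG3795", 8},
        {"COL38358", -1}, {"DG18807", 1}, {"MER75016", -1}, {"COL31809", -1},
        {"DG139201", 5}, {"FF245", -1}, {"ANG36193", 0},
    };
}
```

Each hop may land anywhere in the list file, which is the locality cost of this structure when the list file is not held in memory.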
Linking List of References (2)
The primary key references are kept in a separate, entry-sequenced file
Advantages
the secondary index is rearranged only when a secondary key is added or changed
rearrangement is quick
less penalty associated with keeping the secondary index file on secondary storage (less need for sorting)
the Label ID List file does not need to be sorted
reusing the space of deleted records is easy
Linking List of References (3)
Disadvantage
references for the same secondary key may not be physically grouped
– lack of locality
– could involve a large amount of seeking
– solution: keep the Label ID List file in memory
– the same Label ID List file can hold the lists of a number of secondary index files
– if it is too large for memory, only part of it can be loaded
Selective Indexes
Selective Index: Index on a subset of records
Selective index contains only some part of entire index
provide a selective view
useful when contents of a file fall into several categories
– e.g. 20 < Age < 30 and $1000 < Salary
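A selective index can be built by applying a predicate while the index is constructed, as in this sketch. The Employee record, its fields, and buildSelectiveIndex are hypothetical names chosen to match the Age and Salary example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical record type for the Age/Salary example.
struct Employee {
    std::string id;
    int age;
    double salary;
    long offset;   // byte offset of the record in the data file
};

// Build an index that covers only the records accepted by the predicate.
std::map<std::string, long> buildSelectiveIndex(
        const std::vector<Employee>& records,
        const std::function<bool(const Employee&)>& selected) {
    assert(selected && "predicate must be callable");
    std::map<std::string, long> index;
    for (const Employee& e : records)
        if (selected(e)) index[e.id] = e.offset;
    return index;
}
```

Several such indexes, each with its own predicate, can sit over the same data file and give different selective views of it.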
Index Binding(1)
When should a key be bound to the physical address of its associated record?
File construction time binding
(tight, in-the-data binding)
tight binding and faster access
the usual case for the primary key
when a secondary key is bound at construction time
– simpler and faster retrieval
– reorganization of the data file requires modification of all bound index files
Index Binding (2)
Postpone binding until a record is actually retrieved
(retrieval-time binding)
minimal reorganization and a safe approach
mostly for secondary keys
Tight, in-the-data binding is good when
the file is static, with little or no change
rapid performance during retrieval is needed
e.g. mass-produced, read-only optical disks
Let’s Review (1)
7.1 What is an Index?
7.2 A Simple Index for Entry-Sequenced Files
7.3 Using Template Classes in C++ for Object I/O
7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects
7.5 Indexes That Are Too Large to Hold in Memory
Let’s Review(2)
7.6 Indexing to Provide Access by Multiple Keys
7.7 Retrieval Using Combinations of Secondary Keys
7.8 Improving the Secondary Index Structure: Inverted Lists
7.9 Selective Indexes
7.10 Binding