
Implementation of Extended Indexes in Postgres

This is a summary of the original paper by Paul M. Aoki

Computer Science Department of EECS, University of California, Berkeley

[email protected]

Keywords

• IR – Information Retrieval

• RDBMS – Relational DataBase Management System

Abstract

• The vaunted "Spartan simplicity"

• There is no natural way to model a keyword index

Abstract

• Focusing on two issues

– General problems

– Features

Section One: Introduction

• Technology does not meet the needs

• Some new approaches

Introduction

• Some extensions don't fit precisely

• This paper is a case study of the implementation of one such extension.

Introduction

Outline

– Section 2 describes extended indexing as it was originally proposed, including a discussion of its advantages over other solutions and some implementation difficulties that it presents.

– Section 3 gives an overview of the extensibility features of POSTGRES.

– Section 4 provides details of an implementation of this type of indexing under POSTGRES, such as the modifications made to the original proposal.

Section Two: Relational System for Information Retrieval

• There are two common choices

– Inverted-file system

– Relational system

Section Two: Relational System for Information Retrieval

Inverted-File System

• Stores collections in an ordered data structure

– Disadvantage: the user must generate code or queries that make specific use of its properties.

Section Two: Relational System for Information Retrieval

Relational Systems

Present collections of records as tables (relations).

Advantages:

• Data independence

• Hides the storage structure

Section Two: Relational System for Information Retrieval

• The computer searches for the best method

Section Two: Relational System for Information Retrieval

• Index:

– In DBMS terminology,

– For example, Q1: one might extract the values of a particular field from each record in a table

Section Two: Relational System for Information Retrieval

• That is, one can build an index over the column "emp.salary" but not over an expression such as "taxable_income(emp.salary)".

• This limits the usefulness of indexes to certain applications.
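The contrast between indexing a stored column and indexing a function of that column can be sketched in a few lines of Python. This is only an illustration, not the POSTGRES mechanism: `build_index`, the sample `emp` rows, and the tax rule inside `taxable_income` are all assumptions made up for the example.

```python
from collections import defaultdict


def taxable_income(salary):
    # Hypothetical rule, for illustration only: the first 10,000 is exempt.
    return max(0, salary - 10_000)


def build_index(rows, key_func):
    """Map each key value to the positions of the rows that produce it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[key_func(row)].append(pos)
    return index


# Made-up sample table.
emp = [
    {"name": "ann", "salary": 30_000},
    {"name": "bob", "salary": 8_000},
    {"name": "cat", "salary": 30_000},
]

# Conventional index: the key is the stored column value itself.
by_salary = build_index(emp, lambda row: row["salary"])

# Extended index: the key is the result of a function applied to the column.
by_taxable = build_index(emp, lambda row: taxable_income(row["salary"]))
```

With the second index, a lookup on a taxable-income value reads matching row positions directly instead of recomputing the function for every row.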

Section 2.1 : Extended Indexing

• User can add new index access methods to a DBMS.

• It must be associated with an ordering/partitioning class.

• The class information is used by the query optimizer

Section 2.1 : Extended Indexing

– Example: BOXes

• Build a set of binary Boolean operators <, <=, =, >, >=

• Define an ordering on Box columns

• Associate the "box-area-operators" class

• Associate the B-tree access method
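The BOX example can be approximated in Python as follows. This is a sketch under stated assumptions: the `Box` class and `boxes_at_least` are hypothetical names, the comparison operators order boxes by area, and a sorted list searched with `bisect` stands in for the B-tree.

```python
import bisect
from dataclasses import dataclass
from functools import total_ordering


@total_ordering
@dataclass(eq=False)
class Box:
    """Axis-aligned box given by two opposite corners (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self):
        return abs(self.x2 - self.x1) * abs(self.y2 - self.y1)

    # The "box area operators": every comparison is defined on area.
    def __eq__(self, other):
        return self.area == other.area

    def __lt__(self, other):
        return self.area < other.area


# Sorted list standing in for a B-tree built with the area ordering.
boxes = sorted([Box(0, 0, 4, 4), Box(0, 0, 1, 1), Box(0, 0, 2, 3), Box(1, 1, 3, 2)])


def boxes_at_least(sorted_boxes, probe):
    """Return boxes whose area is >= the probe's area, found by binary search."""
    start = bisect.bisect_left(sorted_boxes, probe)
    return sorted_boxes[start:]
```

Because the ordering is supplied by the operators rather than by the storage structure, the same search code would work for any type whose comparison operators define a total order.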

Section 2.1 Extended Indexing

• The query optimizer sees a query that uses "box area operators"

• All metadata is stored in the system catalog

• Used on the fly

Section 2.1 : Extended Indexing

• A more realistic example: bibliographic searches

Section 3: Extensibility in Postgres

• Extend the system

• Example: "Box" type, "box-equality" function, "box-equality-operator" =, R-tree

Section 3: Extensibility in Postgres

• Operators and access methods are assigned to classes

• Overloaded

• No recompilation needed

Section 4: The Implementation

• Three stages

– Type-function-operator definition

– Access method implementation

– Modification of Postgres internals

Section 4: The Implementation

Type Function/Operators definition

• Keyword and KeywordList

• The function returns a list
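A function that returns a list of keywords, with one index entry per returned keyword, might look like the sketch below. The names `keywords` and `build_keyword_index`, the stop-word set, and the sample titles are assumptions for illustration; they are not the definitions used in the paper.

```python
import re
from collections import defaultdict

# Illustrative stop-word set only.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the"}


def keywords(text):
    """Return the distinct keywords of a text, in first-seen order."""
    seen = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word not in STOP_WORDS and word not in seen:
            seen.append(word)
    return seen


def build_keyword_index(titles):
    """Index each document under every keyword the function returns for it."""
    index = defaultdict(set)
    for doc_id, title in titles.items():
        for kw in keywords(title):
            index[kw].add(doc_id)
    return index


# Made-up sample data.
titles = {
    1: "Extended indexes in a relational system",
    2: "Indexes for information retrieval",
}
index = build_keyword_index(titles)
```

The point of the sketch is that a single record yields several index keys, which is what distinguishes a keyword index from an index on a single-valued column.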

Section 4: The Implementation

Modifications of Postgres internals

• System catalogs modifications

Section 5: Other modifications

• Changes to the query optimizer were minimal

• No changes to the query processor

Conclusions

Any questions?

[email protected]

Identifying Algebraic Properties to Support Optimization of Unary Similarity Queries

[email protected]@ucsp.edu.pe

Introduction

In 1970, Codd introduced the relational model, which is the foundation for most current commercial Database Management Systems (DBMS).

It is based on mathematical relation theory: the database is represented as a set of relations, where each relation is a table with tuples (or rows) and attributes (or columns).

Initially, the relational model supported only traditional data, i.e., numerical and string data types.

Elements of these types can be compared using the exact-match and ordering operators =, <, <=, >, >=

Introduction

Now, with the advent of multimedia and spatial applications, the relational DBMS (RDBMS) must be able to support new data types, operators, and kinds of queries.

– Thus, similarity emerges as the natural way to compare elements in complex domains, such as images, audio, video, genomic sequences, and time series. Consequently, handling operations based on similarity (or distance) between data becomes a must.

Introduction

• To illustrate this:

– Q1: In a health-care information system: "Given a mammography exam with images of the left and right breast from cranio-caudal (RCC) and medio-lateral oblique (RMLO) views of a patient, show the exams whose textures do not differ by more than 10 units from those in the exam".

Example

• Q2: In a health-care information system: "Given a head tomography exam of a patient showing a pathology, retrieve the 5 most similar exams that do not present the pathology and whose textures do not differ by more than 5 units from those in the exam".

• Q3: In Geographic Information Systems (GIS): "Find the 15 districts nearest to 'Arequipa' that are not farther than 15 miles, and where the population aged 21 to 64 is greater than the population aged 65 and over".
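The example queries combine two similarity predicates: a range condition ("do not differ by more than r units") and a k-nearest-neighbor condition ("the k most similar"). A minimal Python sketch of both, and of their combination, follows. The function names, the Euclidean distance, and the two-dimensional feature vectors are assumptions for illustration; the presentation does not fix a particular distance function or feature representation.

```python
import math


def euclidean(a, b):
    """Distance between two feature vectors (assumed metric)."""
    return math.dist(a, b)


def range_query(items, center, radius, dist=euclidean):
    """Keys of all items whose distance to center is at most radius."""
    return [key for key, vec in items.items() if dist(vec, center) <= radius]


def knn_query(items, center, k, dist=euclidean):
    """Keys of the k items closest to center, nearest first."""
    ranked = sorted(items, key=lambda key: dist(items[key], center))
    return ranked[:k]


def knn_within(items, center, k, radius, dist=euclidean):
    """The k nearest items among those within radius, as in Q2 and Q3."""
    near = range_query(items, center, radius, dist)
    return sorted(near, key=lambda key: dist(items[key], center))[:k]


# Made-up feature vectors keyed by exam identifier.
exams = {"e1": (0.0, 0.0), "e2": (3.0, 4.0), "e3": (6.0, 8.0), "e4": (1.0, 0.0)}
query = (0.0, 0.0)
```

Here `knn_within` filters by range first and then ranks. Deciding when such reorderings of similarity operators preserve the result is the kind of question an algebra for these queries has to answer.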

Partial Solution

• Multi-Similarity Algebra (MSA)

– It has been designed to integrate different interpretations of similarity values

– It has a higher abstraction level and thus does not address the problem of an "operational" algebra.

Introduction

• None of these previous works has addressed optimizations based on query rewriting for the similarity-based select operators in complex expressions

Similarity Algebra