Hagenberg - Linz - Prague - Vienna

iiWAS 2002, 10-12 September, Bandung, Indonesia

ε-ISA: AN INCREMENTAL LOWER BOUND APPROACH FOR EFFICIENTLY FINDING APPROXIMATE NEAREST NEIGHBOR OF COMPLEX VAGUE QUERIES

DANG Tran Khanh, KÜNG Josef, WAGNER Roland

Institute for Applied Knowledge Processing (FAW)

Johannes Kepler University of Linz

Austria

OUTLINE

Complex Vague Queries in the Vague Query System (VQS)

Similarity search problem of the VQS in the conventional DBMSs

Incremental hyper-Sphere Approach (ISA)

Overcome shortcomings of Incremental hyper-Cube Approach (ICA)

ε-ISA: Finding Approximate Nearest Neighbors of Complex Vague Queries

The issue of the dimensionality curse

The issue of increasing the query condition number

Experimental Results

Conclusions

COMPLEX VAGUE QUERIES IN THE VAGUE QUERY SYSTEM

The VQS:

Introduced by Kueng and Palkoska 1997

Supports similarity search in conventional DBMSs: returns to users the records that are semantically close to a given query

One of the VQS's basic ideas:

• NCR-Tables (Numeric-Coordinate-Representation-Tables): keep numeric semantic information about non-numeric attributes

NCR-Tables – an example

NCR-table Colors (Name is the NCR-key; red, green and blue are the NCR-columns):

Name        red  green  blue
black         0      0     0
blue          0      0   255
light blue  173    216   230
dark blue     0      0   139
...         ...    ...   ...

Table Car (Col is the fuzzy field):

Nr      Typ   Col
L-1234  VW    blue
W-5679  Opel  black
...     ...   ...

SELECT FROM Car
WHERE Col IS 'dark blue'
INTO myResultTable;
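A minimal Python sketch of how such a query could be answered with the NCR-table of this example. The Euclidean distance on the NCR-columns and the name `rank_cars_by_color` are assumptions made for illustration; the slides do not fix a distance function at this point.

```python
import math

# NCR-table "Colors": NCR-key (Name) -> NCR-columns (red, green, blue)
COLORS = {
    "black": (0, 0, 0),
    "blue": (0, 0, 255),
    "light blue": (173, 216, 230),
    "dark blue": (0, 0, 139),
}

# Query relation "Car": (Nr, Typ, Col); Col is the fuzzy field
CARS = [
    ("L-1234", "VW", "blue"),
    ("W-5679", "Opel", "black"),
]


def rank_cars_by_color(query_color):
    """Rank cars by the distance between their color and the query color.

    Assumption: Euclidean distance on the NCR-columns.
    """
    q = COLORS[query_color]
    scored = [(math.dist(q, COLORS[col]), nr, typ, col) for nr, typ, col in CARS]
    return sorted(scored)


result = rank_cars_by_color("dark blue")
print(result[0])  # closest car first
```

No car in the example is exactly 'dark blue', so the blue VW (distance 116) is ranked ahead of the black Opel (distance 139).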

Complex Vague Queries in VQS: A simplified view of the problem

[Figure: a query relation with columns Attribute 1 ... Attribute n holding the values Value_11 ... Value_nk; each attribute refers to its own NCR-Table 1 ... n, which is accessed through Index 1 ... n by the vague query processing module]

The issue of the dimensionality curse [Weber et al 1998; Beyer et al 1999]

NCR-Tables with high-dimensional data:

• The probability of overlaps between a query and data regions is very high, and thus the performance of multidimensional access methods (MAMs) decreases significantly

• A linear scan over the whole data set would perform better than the MAMs

Approximate nearest neighbor problem:

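$\mathrm{dist}(Q, P) \le (1+\varepsilon)\,\mathrm{dist}(Q, P')$ (1)

where P' is the exact nearest neighbor of Q, P is the returned point, and ε > 0 is the tolerated error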

• Studied mostly for single data sets: single-feature nearest neighbor (S-FNN) queries [Arya et al 1998, Kleinberg 1997, Amato et al 2000, Ciaccia and Patella 2000, etc.]
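As a small illustration of the approximate nearest neighbor condition for a single feature space, the following Python sketch checks whether a candidate point is a (1+ε)-approximate nearest neighbor. The Euclidean distance, the function name `is_approximate_nn` and the sample points are assumptions made for this example only.

```python
import math


def is_approximate_nn(query, candidate, points, eps):
    """Return True if dist(Q, P) <= (1 + eps) * dist(Q, P') holds,
    where P' is the exact nearest neighbor of the query among `points`."""
    exact = min(math.dist(query, p) for p in points)
    return math.dist(query, candidate) <= (1 + eps) * exact


points = [(1.0, 0.0), (0.0, 1.1), (5.0, 5.0)]
query = (0.0, 0.0)

# The exact nearest neighbor is (1.0, 0.0) at distance 1.0;
# (0.0, 1.1) is only about 10% farther away.
loose = is_approximate_nn(query, (0.0, 1.1), points, eps=0.2)
strict = is_approximate_nn(query, (0.0, 1.1), points, eps=0.05)
print(loose, strict)
```

With ε = 0.2 the slightly farther point is accepted; with ε = 0.05 it is rejected.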

Solving Complex Vague Queries in VQS: “Random access“ [Fagin 1996] is impossible

[Figure: a query relation (Attr1, Attr2, ...) and the associated domain tables (Attr1, Domain1, [Values])]

Incremental hyper-Cube Approach (ICA) [Kueng and Palkoska 1999]

Issues with the ICA: see [Dang et al 2002a, Dang et al 2002b] for the details

• How to determine the initial hyper-cubes?

• How to extend the hyper-cubes when necessary?

• Accessing unnecessary disk pages and objects

• Repeated disk accesses

• Only the best match record is returned (not the top-k records)

INCREMENTAL HYPER-SPHERE APPROACH (ISA)

Input:

• A query relation/view S

• A complex vague query Q with n query conditions qi (i = 1, 2, ..., n)

• Assume each feature space (or NCR-Table) related to Q is managed by a multidimensional index structure Fi

Output: Best match record/tuple Tmin for Q, Tmin ∈ S. Ties are broken arbitrarily.

Step 1: Search on each Fi for the corresponding qi using the adapted incremental algorithm for hyper-sphere range queries.

Step 2: Combine the search results from all qi to find at least one appropriate record in S, i.e. a record that contains the returned NCR-Values for every query condition. If no appropriate record is found, go back to step 1.

Step 3: Compute the total distances/scores of the found records using formula 2 and find the record Tmin with the minimum total distance TDcur. Ties are broken arbitrarily.

Step 4: Compute the maximum searching radius for each qi with respect to TDcur using formula 3 and continue searching as in steps 1, 2 and 3 until one of the following two conditions holds: (a) the current searching radius of each qi is greater than or equal to its maximum searching radius; (b) a new appropriate record Tnew with total distance TDnew < TDcur is found.

Step 5: If condition (a) holds, return Tmin as the best match for Q. Otherwise, i.e. condition (b) holds, replace Tmin with Tnew (so TDcur is replaced with the smaller value TDnew) and go back to step 4.
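The five steps above can be sketched in Python as follows. This is a simplified sketch under assumptions that are not in the slides: the total distance (formula 2) is taken to be the plain sum of the per-condition distances, so the maximum searching radius (formula 3) reduces to TDcur; all feature spaces share one radius that grows by a fixed `step`; and each feature space is a dictionary standing in for the index Fi. The name `isa_best_match` is hypothetical.

```python
def isa_best_match(relation, spaces, step=1.0):
    """relation: list of records, each a tuple with one NCR-key per condition.
    spaces: one dict per condition, mapping NCR-key -> distance to q_i
            (a stand-in for an incremental range search on the index F_i).
    Assumption: total distance = sum of the per-condition distances."""
    n = len(spaces)
    farthest = max(max(s.values()) for s in spaces)
    radius, best, td_cur = 0.0, None, float("inf")
    while True:
        # Step 1: hyper-sphere range query with the current radius
        found = [{k for k, d in s.items() if d <= radius} for s in spaces]
        # Steps 2-3: keep the best record whose values were all returned
        for rec in relation:
            if all(rec[i] in found[i] for i in range(n)):
                td = sum(spaces[i][rec[i]] for i in range(n))
                if td < td_cur:  # condition (b): a better record was found
                    best, td_cur = rec, td
        # Steps 4-5, condition (a): under the sum assumption no record
        # outside radius TDcur can have a total distance below TDcur
        if radius >= td_cur or radius >= farthest:
            return best, td_cur
        radius += step


space1 = {"a": 1.0, "b": 4.0, "c": 0.5}
space2 = {"x": 3.0, "y": 0.5, "z": 2.0}
relation = [("a", "x"), ("b", "y"), ("c", "z")]
answer = isa_best_match(relation, [space1, space2])
print(answer)
```

In this toy run the record ("c", "z") with total distance 2.5 is found at radius 2, and the search stops at radius 3 because no record outside that radius can be better.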

Modifying ISA to retrieve top-k records: see [Dang et al 2002b]

High-dimensional feature spaces and/or an increasing number of query conditions → ISA performance decreases

ε-ISA: FINDING APPROXIMATE NEAREST NEIGHBORS OF COMPLEX VAGUE QUERIES

CVQ = M-FNN (Multi-Feature Nearest Neighbor) query

Using lower bound total distance (LBTD)

Input:

• A query relation/view S

• A complex vague query Q with n query conditions qi (i = 1, 2, ..., n)

• Assume each feature space (or NCR-Table) related to Q is managed by a multidimensional index structure Fi

• A real ε > 0 used as the error tolerance

Output: (1+ε)-approximate NN record/tuple Tapp for Q, Tapp ∈ S. Ties are broken arbitrarily.

Step 1: Search on each Fi for the corresponding qi using the adapted incremental algorithm for hyper-sphere range queries.

Step 2: Combine the search results from all qi to find at least one appropriate record in S, i.e. a record that contains the returned NCR-Values for every query condition. If no appropriate record is found, go back to step 1.

Step 3: Compute the total distances/scores of the found records using formula 2 and find the record Tapp with the minimum total distance TDcur. Ties are broken arbitrarily.

Step 4: Let di be the distance from query condition qi to the last NCR-Value returned in the corresponding feature space, which is managed by Fi. Compute LBTD as follows:

$\mathrm{LBTD} = \min\{TD_{cur}, d_i\},\quad i = 1, 2, \dots, n$ (5)

Step 5: If TDcur ≤ (1+ε)·LBTD, return Tapp as a (1+ε)-approximate NN record for Q. Otherwise, go to step 6.

Step 6: Compute the maximum searching radius for each qi with respect to TDcur using formula 3 and repeat steps 1 to 5 until the algorithm stops at step 5. If the current searching radius of some qi is greater than or equal to its maximum searching radius, searching on Fi is stopped.
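The lower-bound stopping test of steps 4 and 5 can be sketched in Python as follows. The same simplifying assumptions as for any toy model apply and are not taken from the slides: the total distance (formula 2) is the plain sum of the per-condition distances, the maximum searching radius (formula 3) is then TDcur, one shared radius grows by a fixed `step`, di is the largest distance returned so far in feature space i, and dictionaries stand in for the indexes Fi. The name `eps_isa` is hypothetical.

```python
def eps_isa(relation, spaces, eps, step=1.0):
    """Return a (1 + eps)-approximate best record and its total distance.
    relation: list of records (tuples of NCR-keys, one per condition).
    spaces: one dict per condition, mapping NCR-key -> distance to q_i.
    Assumption: total distance = sum of the per-condition distances."""
    n = len(spaces)
    farthest = max(max(s.values()) for s in spaces)
    radius, best, td_cur = 0.0, None, float("inf")
    while True:
        # Steps 1-3: range search, combine, keep the best record so far
        found = [{k for k, d in s.items() if d <= radius} for s in spaces]
        for rec in relation:
            if all(rec[i] in found[i] for i in range(n)):
                td = sum(spaces[i][rec[i]] for i in range(n))
                if td < td_cur:
                    best, td_cur = rec, td
        # Step 4: d_i = distance of the last (farthest) value returned so far
        d = [max((s[k] for k in f), default=0.0) for s, f in zip(spaces, found)]
        lbtd = min([td_cur] + d)
        # Step 5: stop early once the lower bound is close enough
        if best is not None and td_cur <= (1 + eps) * lbtd:
            return best, td_cur
        # Step 6 (simplified): every search has reached its maximum radius
        if radius >= td_cur or radius >= farthest:
            return best, td_cur
        radius += step


space1 = {"a": 1.0, "c": 0.25}
space2 = {"x": 1.0, "z": 1.5}
relation = [("a", "x"), ("c", "z")]
approx = eps_isa(relation, [space1, space2], eps=1.0)
exact = eps_isa(relation, [space1, space2], eps=0.0)
print(approx, exact)
```

With ε = 1 the sketch stops at radius 1 and returns ("a", "x") with total distance 2.0, which is within a factor of 2 of the exact answer ("c", "z") with total distance 1.75; with ε = 0 it keeps searching until the exact answer is found.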

See next slide

Lower Bound Total Distance - An example

[Figure: query relation QR with attributes Attr1 and Attr2]

Approximate k-nearest neighbors

See our paper for more details

EXPERIMENTAL RESULTS

Data sets:

• Uniformly distributed: 2, 4, and 8 dimensions (100K objects for each of them)

• Real: 9 and 16 dimensions (more than 64K feature vectors of images, URL: http://kdd.ics.uci.edu/)

Using the SH-tree [Dang et al 2001a] to manage multidimensional data

Page size: 8KB

100 query points were randomly selected from each corresponding data set

2-condition (4-d and 8-d) NN queries, different ε values

2-condition (4-d) k-NN queries, ε = 0.2

3-condition (2-d) NN queries, different ε values

2-condition NN queries (9-d and 16-d real data sets), ε = 1

ε = 1 means that an error of up to 100% is tolerated. For the 16-d data set, ε-ISA saved about 4.5% and 1% of the affected objects and disk accesses, respectively, while keeping the accuracy at 71%

One notable fact here is that the effective epsilon calculated as introduced in (Arya et al. 1998) is quite low, only 0.23. This is a very promising result.
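To make the notion concrete, the following Python sketch computes an effective epsilon, assuming it is defined as the average relative error of the returned distances with respect to the exact nearest-neighbor distances. That definition, the function name `effective_epsilon` and the sample distances are assumptions for illustration; they are not the measurements reported on this slide.

```python
def effective_epsilon(returned, exact):
    """Average of dist(returned) / dist(exact) - 1 over all queries.

    Assumption: queries whose exact distance is 0 are skipped to avoid
    dividing by zero."""
    errors = [r / e - 1.0 for r, e in zip(returned, exact) if e > 0]
    return sum(errors) / len(errors)


# Hypothetical total distances for three queries (not experimental data)
returned_distances = [2.0, 3.0, 5.0]
exact_distances = [2.0, 2.0, 4.0]
eff = effective_epsilon(returned_distances, exact_distances)
print(eff)
```

The per-query errors here are 0, 0.5 and 0.25, so the effective epsilon of this made-up sample is 0.25 even though a larger ε might have been permitted.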

CONCLUSIONS

ε-ISA: An Incremental Lower Bound Approach for Efficiently Finding Approximate Nearest Neighbor of Multi-Feature Queries in VQS

ε-ISA is one of the pioneering solutions to this problem

ε-ISA is very useful for application domains in which the returned results need not be exact but only similar or approximately similar (within a certain error tolerance) to a given query. The experimental results confirm this: with a suitable ε value, ε-ISA can save a very high percentage of both the IO-cost and the CPU-cost while still keeping the accuracy of the returned results very high

ε-ISA is applicable not only to numeric domains such as NCR-tables, but also to any ranked input

Application areas: TIS (tourist information systems), GIS, digital libraries, multimedia systems, etc.

More information

• URL: http://www.faw.uni-linz.ac.at/

• E-mail: {khanh, jkueng, rwagner}@faw.uni-linz.ac.at

Research related to dealing with complex vague queries

The A0 algorithm [Fagin 1996] (there are some improvements of Fagin's algorithm, see the paper for more details): finds the top-k matches for a user query involving several multimedia attributes

Problem: this algorithm assumes that random access is possible in the system. This assumption is correct only if the following three conditions hold:

1. there is at least one key for each subsystem,

2. there is a mapping between the keys,

3. and the mapping is one-to-one

In VQS: condition (1) is always satisfied (each fuzzy field is the key for the corresponding NCR-table), but there is no one-to-one mapping between the fuzzy fields

Cannot be applied to our problem
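To show where random access enters, here is a simplified Python sketch of the general idea behind Fagin's algorithm: sorted access on every subsystem until k objects have been seen in all of them, followed by random access for the missing grades. The function name `fagin_top_k`, the use of `min` as the aggregation function and the sample grades are assumptions for illustration, not details from the slides.

```python
def fagin_top_k(lists, k, agg=min):
    """lists: one dict per subsystem, mapping object id -> grade
    (higher is better). Every object must appear in every dict."""
    ranked = [sorted(g.items(), key=lambda kv: -kv[1]) for g in lists]
    seen = [set() for _ in lists]
    depth, longest = 0, max(len(r) for r in ranked)
    # Phase 1: sorted access in parallel until k objects were seen everywhere
    while len(set.intersection(*seen)) < k and depth < longest:
        for i, r in enumerate(ranked):
            if depth < len(r):
                seen[i].add(r[depth][0])
        depth += 1
    # Phase 2: random access by object id to fetch the missing grades;
    # this lookup needs a one-to-one mapping of keys between subsystems
    candidates = set.union(*seen)
    scored = [(agg(g[obj] for g in lists), obj) for obj in candidates]
    return sorted(scored, reverse=True)[:k]


grades1 = {"o1": 0.9, "o2": 0.8, "o3": 0.1}
grades2 = {"o1": 0.2, "o2": 0.7, "o3": 0.95}
top = fagin_top_k([grades1, grades2], k=1)
print(top)
```

The lookup `g[obj]` in the second phase is the random access step: it only works because every object id identifies exactly one entry in every subsystem, which is the mapping that the fuzzy fields in VQS do not provide.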

Other approaches for multimedia databases: [Ortega et al 1997, Chaudhuri et al 1996, Boehm K. et al 2001] (see our paper)

Chaudhuri et al. 1999 introduced a solution that translates a top-k multi-feature query into a range query that a conventional DBMS can process. This approach uses the information in the histograms kept by a relational system

Research related to dealing with complex vague queries (cont.)

ISA and J* algorithm

The ISA:

• The input is ranked with the support of the incremental algorithm adapted for range queries

• Reduces the database access cost first; this cost and the processed states are reduced by taking into account the hyper-sphere range queries and computing the maximum searching radii

• Derived from the ICA, which had been introduced earlier and had the same overall goals as the J* algorithm

The J* algorithm:

• Assumes that the ranked input is available; does not show how to deal with it

• Reduces the processed states first; the database access cost is alleviated by the iterative deepening technique (S. Russell and P. Norvig: Artificial Intelligence: A Modern Approach. Prentice Hall, Inc., 1995)

• Claimed to be the first algorithm that can process "joins" of ranked input and multi-level joins
