NUITS: A Novel User Interface for Efficient Keyword Search over Databases The integration of DB and...
-
Upload
marilynn-armstrong -
Category
Documents
-
view
213 -
download
1
Transcript of NUITS: A Novel User Interface for Efficient Keyword Search over Databases The integration of DB and...
![Page 1: NUITS: A Novel User Interface for Efficient Keyword Search over Databases The integration of DB and IR provides users with a wide range of high quality.](https://reader035.fdocuments.us/reader035/viewer/2022072011/56649e225503460f94b0f925/html5/thumbnails/1.jpg)
NUITS: A Novel User Interface for Efficient Keyword Search over Databases
The integration of DB and IR provides users with a wide range of high quality services
Critical Challenges:
Efficiency
Result presentation
……
1. Internet users search information with search engines
Expect to query databases in the same way
Not know the database schema and SQL
2. Hidden Web problem
Most of data on the Web are stored in databases. They are hidden to Web search engines because of the mismatch of search interfaces between Web search engines and databases
3. Integrating different classes of information systems
Modern information systems manage many kinds of datastructured relational data
semi-structured XML documents unstructured text documents
……
Keyword search: to provide a unified query language/interface
NUITS High efficiency User-friendly result display Supporting advanced keyword queries
Characteristics of NUITS
Input:
In addition to simple keyword queries (a set of keywords), advanced queries with specified conditions are also supported
Output:
Search results can be organized into clusters, facilitating users' quick browsing
Search Engine:
Adopt a new and efficient search algorithm
Architecture
![Page 2: NUITS: A Novel User Interface for Efficient Keyword Search over Databases The integration of DB and IR provides users with a wide range of high quality.](https://reader035.fdocuments.us/reader035/viewer/2022072011/56649e225503460f94b0f925/html5/thumbnails/2.jpg)
Keyword Query Specification Simple keyword: just a keyword
eg: database Typed keyword: a type can be either a relation-name or attribute-name
eg: Paper:database
Author:*
Writer:* Conditional keyword: conditions associated with a keyword
eg: database year>2000
database year~2000
Keyword query: a set of keywords associated with Boolean operators
Q ::= p | (Q) | Q AND Q | Q OR Q | NOT Q
p: a keyword which can be a simple keyword, typed keyword, or conditional keyword
Q: a keyword query
eg: Ullman AND (database OR algorithm)
TreeCluster: Clustering Results
In order to assist users to find the needed results, we propose to organize the result trees into two levels of clusters
Structural-Level Clustering: Clustering trees using tree isomorphism. The two trees below are isomorphic, and both mean the two authors Hristidis and Papakonstantinou coauthor papers
Search Algorithm NUITS adopts a data-graph-based search algorithm, and each result is a tuple-connection-tree The algorithm proposes a dynamic programming approach to find the optimal top-1 with a low time complexity It computes top-k minimum cost tuple-connection-trees one-by-one incrementally, and does not need to compute or sort all result trees in order to find the top-k results
An example of result tree Another example of result tree
The structural pattern of the above two trees
Content-Level Clustering: Based on keyword frequencies and content similarity. If the size of the cluster is larger than the user-given threshold after structural-level clustering, the content-level clustering further clusters result trees
eg: Considering a keyword query Gray Database, we cluster the results based on content of author names, because Gray occurs less frequently than Database in DBLP dataset. Then there may be clusters for different authors, such as Jim Gray or W.A. Gray
Note: All examples in this poster are based on DBLP dataset
![Page 3: NUITS: A Novel User Interface for Efficient Keyword Search over Databases The integration of DB and IR provides users with a wide range of high quality.](https://reader035.fdocuments.us/reader035/viewer/2022072011/56649e225503460f94b0f925/html5/thumbnails/3.jpg)
Structural pattern
Implementation of TreeCluster
TreeCLuster is cost effective:
Use labels to represent schema information of each result tree and reformulate the clustering problem as a problem of judging whether labeled trees are isomorphic
Rank user keywords according to their frequencies in databases, and further partition the large clusters based on keyword nodes
Give each cluster a readable description, and present the description and each result graphically
Structural-Level
Clusters
Content-Level
Clusters
Tuple-connection-tree
Architecture