DBease: Making Databases User-Friendly and Easily Accessible
Guoliang Li, Ju Fan, Hao Wu, Jiannan Wang, Jianhua Feng
Database Group, Department of Computer Science and Technology,
Tsinghua University, Beijing 100084, China
How to Access Databases?
• Traditional database-access methods:
– SQL
Select title, author, booktitle, year From dblp Where title Contains “search” And booktitle Contains “cidr”
– Query-by-example (Form)
– Keyword Search: “search cidr”
CIDR'11 - DBease
Comparison of Different Methods
[Figure: comparison of the access methods; axis label: Usability]
Keyword Search
• Is traditional keyword search good enough?
[Screenshot callouts: “Too many results!” and “No result!”]
Form-based Search
• Form-based search has the same problem.
[Screenshot callout: “Complicated and still no result!”]
Our Solution
Type-Ahead Search
Type-Ahead Search in Forms
SQL Suggestion
[Figure axis label: Usability]
What is Type-Ahead Search?
Type-Ahead Search
• Advantages
– Gives users instant feedback on the fly
– Helps users navigate the underlying data
– Tolerates inconsistencies between query and data
– Supports synonyms
– Supports XML data
– Supports multiple tables
Problem Formulation
• Data: A set of records
• Query
– Q = {p1, p2, …, pl}: a set of prefixes
– δ: Edit-distance threshold
• Result
– A set of records containing all query prefixes or their similar forms (conjunctive)
• Edit distance: the minimum number of edit operations (insertion, deletion, substitution) needed to transform one string into another
– ed(string, stang) = 2
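The edit-distance definition above can be sketched with the standard dynamic-programming recurrence. This is a minimal illustration, not the DBease implementation. The function names are hypothetical, and `prefix_edit_distance` assumes that a word is "similar" to a query prefix when some prefix of the word is within δ of it.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def prefix_edit_distance(query_prefix: str, word: str) -> int:
    """Assumed similarity measure: smallest distance to any prefix of word."""
    return min(edit_distance(query_prefix, word[:k]) for k in range(len(word) + 1))


print(edit_distance("string", "stang"))            # the slide's example
print(prefix_edit_distance("serch", "searching"))  # a misspelled prefix
```

With δ = 1, the misspelled prefix "serch" would still match the word "searching" under this assumed measure, because its prefix "search" is one insertion away.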
Indexing
• Trie index
• Words: paths from the root to leaves
• Inverted lists on leaves
Algorithm
• Step 1: Find similar prefixes incrementally
• Step 2: Retrieve the leaf nodes of similar prefixes
• Step 3: Compute union lists of inverted lists of leaf nodes
• Step 4: Intersect the union lists of query keywords
[Example from the figure: query = “cid r”]
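The four steps can be sketched on a small in-memory trie. This is a simplified illustration under stated assumptions, not the DBease code. It recomputes the similar prefixes from scratch for each query instead of maintaining them incrementally per keystroke. It also treats any node where a word ends as a "leaf", and the class, function names, and sample records are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.record_ids = set()  # inverted list; non-empty only where a word ends


def build_trie(records):
    """records: record id -> text. Each word becomes a root-to-node path."""
    root = TrieNode()
    for rid, text in records.items():
        for word in text.lower().split():
            node = root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.record_ids.add(rid)
    return root


def similar_prefix_nodes(root, prefix, delta):
    """Step 1 (non-incremental): nodes whose string is within delta edits of prefix."""
    matches = []
    stack = [(root, list(range(len(prefix) + 1)))]  # row[j] = ed(node string, prefix[:j])
    while stack:
        node, row = stack.pop()
        if row[-1] <= delta:
            matches.append(node)
        for ch, child in node.children.items():
            new_row = [row[0] + 1]
            for j in range(1, len(prefix) + 1):
                cost = 0 if prefix[j - 1] == ch else 1
                new_row.append(min(new_row[j - 1] + 1, row[j] + 1, row[j - 1] + cost))
            if min(new_row) <= delta:  # otherwise no descendant can match
                stack.append((child, new_row))
    return matches


def union_list(nodes):
    """Steps 2-3: union the inverted lists of all word-ending nodes below the matches."""
    ids, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        ids |= node.record_ids
        stack.extend(node.children.values())
    return ids


def type_ahead_search(root, prefixes, delta):
    """Step 4: intersect the union lists of all query prefixes (conjunctive)."""
    lists = [union_list(similar_prefix_nodes(root, p.lower(), delta)) for p in prefixes]
    return set.intersection(*lists) if lists else set()


# Hypothetical sample data
records = {1: "xml database", 2: "xml search",
           3: "keyword search in cidr", 4: "sigmod record"}
root = build_trie(records)
print(type_ahead_search(root, ["cid", "ser"], delta=1))
```

Here "ser" is within one edit of the trie prefixes "se", "sea", and "sear", so the misspelled query still reaches the word "search".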
Type-Ahead Search in Forms
What is Type-Ahead Search in Forms?
Type-Ahead Search in Forms
• Problem Formulation
– Data: A relation with multiple attributes
– Query: A set of prefixes on attributes in a form interface
– Answers:
  • Local results of the focused attribute
  • Global results of the relation
• Advantages
– On-the-fly faceted search
– Supports aggregation
Data Partition
• Global Table → Local Tables

Global table:
ID  Title         Conf.   Author
1   xml database  VLDB    albert
2   xml database  SIGMOD  bob
3   xml search    VLDB    albert
4   xml security  VLDB    alice
5   rdbms         SIGMOD  charlie

Local table (Conf.):
ID  Conf.
C1  VLDB
C2  SIGMOD

Local table (Author):
ID  Author
A1  albert
A2  bob
A3  alice
A4  charlie
Indexing
• Each attribute
– Trie
– Mapping tables
  • Local → Global
  • Global → Local

[Figure: trie over the Title attribute, with leaf nodes T1: xml database, T2: xml search, T3: xml security, ……]

L-G Mapping Table:
T1  1, 2
T2  3
T3  4
T4  5

G-L Mapping Table:
1  T1
2  T1
3  T2
4  T3
5  T4
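The partition of the global table into local tables, together with the two mapping tables, can be sketched as follows. This is an illustrative reconstruction from the tables on the slides. The function name `partition` is hypothetical, and assigning local IDs in order of first appearance is an assumption that happens to reproduce the slide's IDs.

```python
from collections import defaultdict

# Global table copied from the Data Partition slide
GLOBAL_TABLE = [
    {"ID": 1, "Title": "xml database", "Conf.": "VLDB", "Author": "albert"},
    {"ID": 2, "Title": "xml database", "Conf.": "SIGMOD", "Author": "bob"},
    {"ID": 3, "Title": "xml search", "Conf.": "VLDB", "Author": "albert"},
    {"ID": 4, "Title": "xml security", "Conf.": "VLDB", "Author": "alice"},
    {"ID": 5, "Title": "rdbms", "Conf.": "SIGMOD", "Author": "charlie"},
]


def partition(global_table, attribute, id_prefix):
    """Build the local table and both mapping tables for one attribute."""
    local_ids = {}              # distinct value -> local ID
    l_to_g = defaultdict(list)  # L-G mapping: local ID -> global record IDs
    g_to_l = {}                 # G-L mapping: global record ID -> local ID
    for row in global_table:
        value = row[attribute]
        if value not in local_ids:
            # Assumption: local IDs follow first-appearance order
            local_ids[value] = f"{id_prefix}{len(local_ids) + 1}"
        lid = local_ids[value]
        l_to_g[lid].append(row["ID"])
        g_to_l[row["ID"]] = lid
    local_table = {lid: value for value, lid in local_ids.items()}
    return local_table, dict(l_to_g), g_to_l


title_table, title_l_to_g, title_g_to_l = partition(GLOBAL_TABLE, "Title", "T")
author_table, _, _ = partition(GLOBAL_TABLE, "Author", "A")
print(title_l_to_g)
print(author_table)
```

Storing each distinct value once in a local table keeps the per-attribute trie small, while the mapping tables let a match in one attribute be translated back to global records and across to other attributes.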
Our Solution
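[Form example with fields Author and Title; the results shown are for the Title field]
Local results (Title): xml database, xml search, xml security
Global results: xml database (albert), xml database (bob), xml search (albert), xml security (alice)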
Our Solution
[Form example: Title = “xml”, Author = “al”; focused attribute: Author]
Local results (Author): albert, alice
Global results: xml database, albert; xml search, albert; xml security, alice
[Figure: trie over the Author attribute with leaf nodes 4: albert and 5: alice, shown with the Title trie's L-G and G-L mapping tables]
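The two form examples can be reproduced with a small sketch that returns local results for the focused attribute and global results for the relation. It is a simplification under stated assumptions: a linear scan with exact word-prefix matching stands in for the per-attribute tries and fuzzy matching, the L-G mapping is keyed by the attribute value itself rather than by a local ID, and the function names are hypothetical.

```python
from collections import defaultdict

# Global table from the Data Partition slide, keyed by global record ID
ROWS = {
    1: {"Title": "xml database", "Conf.": "VLDB", "Author": "albert"},
    2: {"Title": "xml database", "Conf.": "SIGMOD", "Author": "bob"},
    3: {"Title": "xml search", "Conf.": "VLDB", "Author": "albert"},
    4: {"Title": "xml security", "Conf.": "VLDB", "Author": "alice"},
    5: {"Title": "rdbms", "Conf.": "SIGMOD", "Author": "charlie"},
}


def build_l_to_g(rows, attribute):
    """L-G mapping keyed by the local value: value -> set of global IDs."""
    mapping = defaultdict(set)
    for gid, row in rows.items():
        mapping[row[attribute]].add(gid)
    return mapping


def form_search(rows, query, focused):
    """query maps attribute -> typed prefix; focused is the field being edited."""
    candidate_ids = set(rows)
    for attribute, prefix in query.items():
        if not prefix:
            continue  # an empty field does not constrain the answer
        matched = set()
        for value, gids in build_l_to_g(rows, attribute).items():
            # Exact word-prefix match; a stand-in for the fuzzy trie lookup
            if any(w.startswith(prefix.lower()) for w in value.lower().split()):
                matched |= gids
        candidate_ids &= matched
    global_results = sorted(candidate_ids)
    local_results = sorted({rows[gid][focused] for gid in global_results})
    return local_results, global_results


print(form_search(ROWS, {"Title": "xml"}, focused="Title"))
print(form_search(ROWS, {"Title": "xml", "Author": "al"}, focused="Author"))
```

The second call mirrors the slide where "al" is typed in Author after "xml" in Title: record 2 (bob) drops out of the global results, and only albert and alice remain as local results.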
SQL Suggestion
What is SQL Suggestion?
SQL Suggestion
• Problem Formulation
– Data: A database with multiple tables
– Query: A set of keywords
– Answers: Relevant SQL queries
• Advantages
– Suggests SQL queries based on keywords
– Helps users formulate SQL queries to find accurate results
– Designed for both SQL programmers and Internet users
– Groups answers based on SQL structures
– Supports aggregation
– Supports range queries
Our Solution
• Suggest Templates from Keywords
– A template is a structure in the databases
– Modeled as a graph
  • Nodes: entities (table names or attribute names)
  • Edges: foreign keys or membership
• Suggest SQL queries from Templates
– Mapping between keywords and templates
[Figure: (a) Query “keyword paper ir”; (b) Template; (c) SQL]
Template Suggestion
• Template Generation
– Extension from basic entities (tables)
• Template Ranking
– Template weight
  • PageRank
– Relevancy between a keyword and an entity
  • tf*idf
• Algorithms
– Fagin's algorithms
– Threshold-based pruning techniques
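One way threshold-based top-k retrieval could work here is Fagin's threshold algorithm (TA), sketched below. The slides do not give the details, so this is a generic TA under assumptions: one list of (template, score) pairs per keyword sorted by descending score, a sum as the aggregation function, and integer scores chosen only for readability. The template names and scores are hypothetical.

```python
import heapq


def threshold_top_k(sorted_lists, k):
    """Top-k items by summed score; each input list is sorted by score, descending."""
    lookup = [dict(lst) for lst in sorted_lists]  # random access by item
    totals = {}                                   # item -> full aggregated score
    depth = max((len(lst) for lst in sorted_lists), default=0)
    for pos in range(depth):
        threshold = 0  # upper bound on the score of any item not yet seen
        for lst in sorted_lists:
            if pos < len(lst):  # an exhausted list contributes 0 to the bound
                item, score = lst[pos]
                threshold += score
                if item not in totals:
                    totals[item] = sum(table.get(item, 0) for table in lookup)
        best = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        if len(best) == k and best[-1][1] >= threshold:
            return best  # no unseen item can beat the current k-th score
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])


# Hypothetical per-keyword relevance lists for three templates
paper_list = [("T1", 9), ("T2", 8), ("T3", 1)]
ir_list = [("T2", 7), ("T3", 6), ("T1", 2)]
print(threshold_top_k([paper_list, ir_list], k=1))
```

For k = 1 the scan stops after two positions: the best seen total (T2 with 15) already meets the threshold of 14, so the rest of both lists is never read.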
SQL Suggestion
• SQL suggestion model
– Mapping from keywords to templates
– Matching is a set of mappings with all keywords
– Weighted set-covering problem (NP-hard)
• SQL ranking
– Relevancy between keywords and attributes
– Attribute weight
• Algorithms
– Greedy algorithms
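The classic greedy heuristic for weighted set cover is sketched below as one possible reading of "greedy algorithms" on this slide. It is not taken from DBease. The keywords form the universe, each candidate mapping covers some keywords at a cost, and the heuristic repeatedly picks the candidate with the lowest cost per newly covered keyword. The candidate names and costs are hypothetical, and treating lower cost as higher relevance is an assumption.

```python
def greedy_weighted_set_cover(universe, candidates):
    """candidates: name -> (covered elements, cost). Returns chosen names, or None."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best_name, best_ratio = None, None
        for name, (covered, cost) in candidates.items():
            gain = len(uncovered & set(covered))
            if gain == 0:
                continue  # adds nothing new
            ratio = cost / gain  # cost per newly covered keyword
            if best_ratio is None or ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            return None  # some keyword cannot be covered at all
        chosen.append(best_name)
        uncovered -= set(candidates[best_name][0])
    return chosen


# Hypothetical mappings for the query "keyword paper ir"
keywords = {"keyword", "paper", "ir"}
mappings = {
    "paper (table name)": ({"paper"}, 0.5),
    "paper.title": ({"keyword", "ir"}, 2.0),
    "author.name": ({"keyword"}, 1.5),
    "paper.abstract": ({"keyword", "paper", "ir"}, 4.0),
}
print(greedy_weighted_set_cover(keywords, mappings))
```

The greedy choice is a heuristic for the NP-hard problem, so it does not always return the minimum-cost cover. In this example it selects the table-name mapping first and then `paper.title`, for a total cost of 2.5 against 4.0 for the single mapping that covers everything.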
Search: dbease
http://dbease.cs.tsinghua.edu.cn
Differences from Google Instant Search
• Fuzzy prefix matching
• Google first predicts queries and then uses the top queries to search the documents. Google may therefore miss answers (false negatives), while we can find the accurate top-k answers.
Differences from Complete Search
• Fuzzy prefix matching
• Different index structures
• More efficient
Differences from Keyword Search
• Effectiveness
– SQL Suggestion supports range queries and aggregation functions.
– SQL Suggestion can group answers.
– SQL Suggestion can help users express their query intent more accurately.
• Efficiency
– Faster