Yonsei University, Korea Seung-won Hwang ...
Transcript of Yonsei University, Korea Seung-won Hwang ...
![Page 1: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/1.jpg)
Data Intelligence LabYonsei University, Korea
Seung-won Hwanghttp://dilab.yonsei.ac.kr/~swhwang
![Page 2: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/2.jpg)
Research background
PhD(@2005) on left-hand side
Recent work on right-hand side
2
![Page 3: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/3.jpg)
Entity search as a platform
Browser Spreadsheet
3
![Page 4: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/4.jpg)
Entity search as a platform
Mobile search And beyond!!
4
![Page 5: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/5.jpg)
Knowledge WebSemantic Web Human readable vs machine r
eadable contents
Human defines standard for data formats and models
Explicit and precise specification of knowledge representation that everyone has to agree upon
Machine reads human readable contents
Machine learns to conflate different formats of the same thing
Latent and fuzzy representation of knowledge learned by mining big data
Excerpt from Kuansan Wang’s keynote slides@ICADL2015
![Page 6: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/6.jpg)
Recent Work
Harvesting, Completion (#1,#3)AAAI, ICDE, VLDB, VLDB Journal
Linking, Multilingual linking (#2)ACL, EMNLP, ACM TOIS, IEEE TKDE
PerformanceSIGIR, WSDM, VLDB
6
isA(Bordeaux, wine)=??isProperty(wine|age,texture,aroma)=0.8Verb?
황승원 黃升嫄
![Page 7: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/7.jpg)
Conflation: Graph
7
Rij is confidence of G.i matches G’.j
Propagate matching confidence of G.i and G’.j neighbors
Repeat #1 and #2 until convergence
Rij
Rij
![Page 8: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/8.jpg)
Search performance as a platform
8
Diverse software generate search queries
Consistent low latency is crucial
“Microsoft” Long
Short
Cost prediction
Resource manager
Prediction model
SIGIR14, WSDM15 Best paper runner-up
![Page 9: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/9.jpg)
Cost prediction features
9
Inverted index for “Microsoft”
Processing Not evaluated
Doc 1 Doc 2 Doc 3 ……. Doc N-2 Doc N-1 Doc N
Docs sorted by static rankHighest Lowest
……. …….
Score distribution (mean,max,var), #postings, etc
![Page 10: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/10.jpg)
Advanced features for automatic refinement
10
<Fields related to query execution plan>rank=BM25Fenablefresh=1 partialmatch=1language=en location=us ….
<Fields related to search keywords>Redmond (MS or Microsoft)
![Page 11: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/11.jpg)
Performance when deployed
11
50
100
150
2005
0
10
0
15
0
20
0
25
0
30
0
35
0
40
0
45
0
50
0
55
0
60
0
65
0
70
0
75
0
80
0
85
0
90
0
95
0
Re
spo
nse
Tim
e
(ms)
Query Arrival Rate (QPS)
Sequential
Degree=3
Predictive
50% throughput increase
![Page 12: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/12.jpg)
Spatial KB and search as a platform
Devices as a producer/consumer of information
Location as a first-class citizen context
12
![Page 13: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/13.jpg)
13
![Page 14: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/14.jpg)
Conflation for spatial entity [AAAI16, ICDM15]
KB harvesting Map translation
Intelligent query expansion (“seattle center” “seattle center” or “space needle” or “Chihuly museum”)
14
![Page 15: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/15.jpg)
![Page 16: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/16.jpg)
Performance [VLDB16]
Automatic query expansion restaurant restaurant OR banquet
“seattle center” “seattle center” or “space needle”
Multiple keywords Complex AND/OR with location
Example
T = ((restaurant OR banquet) AND (vegetarian OR halal)OR ((hotel OR resort) AND wifi)OR …S = user location (Seoul)
16
![Page 17: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/17.jpg)
Additional technical challenges
17
S2I: Text-first indexGood at selective keyword predicates
IR-tree: Augmented R-treeGood at selective spatial predicates
Crane:Good at narrow-necked vessel
Fox:Good at bowl
![Page 18: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/18.jpg)
Challenges Cost model design
Exponential possible ways(solution space)
Efficient optimization
Theoretic guarantee
······
Additional technical challenges
Our approach
18
Measuringthe problem(Cost model)
Proposingthe solution(Optimization)
![Page 19: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/19.jpg)
Additional technical challenges
Base mapping (spatial keyword processing part) Intersection (keyword predicate processing part)
Optimized solution Base mapping is optimized with the following five techniques.
19
Name Space Reduction(↑ better)
Alg. Cost(↓ better)
TheoreticBound(↓ better)
T1 Single verification pop up 2𝐾 Linear OPT
T2 Intersection push down 2𝐹 Linear 5
3
𝐹X OPT
T3 Least selective intersection first ෑ
𝑖=1
𝑁
𝑀𝑖! ⋅ 𝐶𝑀𝑖Sorting OPT
T4 Modified Huffman union tree 𝐶𝑁 Sorting OPT
T5 Verification selection 2σ𝑖=1𝑁 (𝑀𝑖−1)
Exp. OPT
Linear 2X OPT
······
![Page 20: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/20.jpg)
Additional technical challenges
20
Base mapping [134.7 ms] vs. Optimized solution [1.8 ms]
0
500
1000
99th Allqueries
99th Allqueries
99th Allqueries
Base kNN TopK
Res
po
nse
tim
e (m
s)
Baseline-I
Baseline-P
Base mapping
Optimized solution
Up to 11 times faster
∩
∪
𝒮𝐼 𝑆 𝒦𝛪(wifi)
V
∩
𝒦𝛪(halal)𝒦𝛪(banquet)
∩
∪
∩
V
𝒮𝐼 𝑆
𝒦𝛪(halal)
𝒦𝛪(wifi)
𝒦𝛪(banquet)
![Page 21: Yonsei University, Korea Seung-won Hwang ...](https://reader034.fdocuments.us/reader034/viewer/2022042421/6260f7302e82ab061a61a21b/html5/thumbnails/21.jpg)
Thanks!!! Understanding Emerging Spatial Entities, AAAI 2016 Fine-grained Semantic Conceptualization of FrameNe
t, AAAI 2016 Verb Pattern: A Probablistic Semantic Representation
of Verbs, AAAI 2016 Processing and Optimizing Main Memory Spatial-
Keyword Queries, VLDB 2016 KSTR: Keyword-aware Skyline Travel Route Recomme
ndation, ICDM 2015 Delayed-Dynamic-Selecive (DDS) Prediction
for Reducing Extreme Tail Latencies in Web Search, WSDM 2015 (Best Paper Runner-up)
Predictive Parallelization: Taming Tail Latencies in Web Search, SIGIR 2014
Overcoming Asymmetry in Entity Graphs, IEEE TKDE 14
ARIA: Asymmetry-Resistant Instance Alignment, AAAI 14
Bootstrapping Entity Translation on Weakly Comparable Corpora, ACL 13
Entity Translation Mining from Comparable Corpora: Combining Graph Mapping with Corpus Latent Features, IEEE TKDE 13
Efficient Entity Translation Mining: A Parallelized Graph Alignment Approach, ACM TOIS 12
Web Scale Taxonomy Cleansing, VLDB 2011 Mining Entity Translations from Comparable Corpora:
A Holistic Graph Mapping Approach ,CIKM 2011 SocialSearch: Enhancing Entity Search with Social Net
work Matching ,EDBT 2011
21
AnyQuestions?Visit dilab.yonsei.ac.kr/~swhwang