An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems
![Page 1: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/1.jpg)
ws.nju.edu.cn
An Empirical Study of Vocabulary Relatedness
and Its Application to Recommender Systems
Gong Cheng, Saisai Gong, Yuzhong Qu
State Key Laboratory for Novel Software Technology, Nanjing University, China
Presented at ISWC2011
![Page 2: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/2.jpg)
Gong Cheng (程龚)
Vocabulary matching
Measuring term similarity
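[Figure: term-level similarity between two vocabularies. First vocabulary: FullProfessor, FacultyMember, AssistantProfessor. Second vocabulary: Professor, Faculty, AssistantProfessor. Similarity values shown: 0.9, 0.8, 1.0]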
![Page 3: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/3.jpg)
Vocabulary matching
Vocabulary distance
Measuring vocabulary similarity
[Figure: pairwise similarity between vocabularies: Semantic Web for Research Communities (SWRC), eBiquity Person, Foundational Model of Anatomy (FMA), GALEN, NCBI organismal classification (NCBITaxon). Similarity values shown: 0.8, 0.5, 0.5, 0.6, 0.02]
![Page 4: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/4.jpg)
Vocabulary matching
Vocabulary distance
Vocabulary relatedness
Measuring vocabulary relatedness
[Figure: two vocabularies. First: FullProfessor, FacultyMember, AssistantProfessor. Second: PhD, Postgraduate-Research-Degree, EngD]
Not that similar, but somewhat related
![Page 5: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/5.jpg)
Contributions
How to measure vocabulary relatedness?
6 measures, from 4 aspects
How about vocabulary relatedness in real-life cases?
Empirical analysis of 2,996 vocabularies and over 4 billion RDF triples
Where to apply vocabulary relatedness?
Post-selection vocabulary recommendation in vocabulary search
![Page 6: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/6.jpg)
Outline
Data set
Vocabulary relatedness
Post-selection vocabulary recommendation
Conclusions
![Page 7: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/7.jpg)
Data set statistics
Crawled from February 2010 to May 2011 by
![Page 8: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/8.jpg)
Data set distributions
RDF documents over pay-level domains
![Page 9: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/9.jpg)
Data set distributions
Vocabularies over top-level domains
![Page 10: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/10.jpg)
Outline
Data set
Vocabulary relatedness
Post-selection vocabulary recommendation
Conclusions
![Page 11: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/11.jpg)
Vocabulary relatedness
6 numerical measures, from 4 aspects
Semantic relatedness
Explicit
Implicit
Hybrid
Content similarity
Expressivity closeness
Distributional relatedness
Comparison
![Page 12: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/12.jpg)
Measure 1: explicit semantic relatedness
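[Figure: graph GE over vocabularies v1, v2, v3. Relations shown: owl:imports, rdfs:seeAlso, owl:priorVersion. Edge weights shown: 1, 2]
$$R_E(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G_E^S}$$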
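The reciprocal shortest-path definition can be illustrated with a minimal Python sketch. It assumes the graph is undirected and stored as a nested dictionary of edge weights. The topology in the usage note, the handling of unreachable pairs (score 0.0), and all function names are assumptions for illustration, not details taken from the slides.

```python
import heapq


def shortest_path_weight(graph, source, target):
    """Dijkstra's algorithm on {node: {neighbor: weight}}; returns inf if unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return float("inf")


def path_relatedness(graph, vi, vj):
    """1 / (weight of a shortest path between vi and vj)."""
    if vi == vj:
        raise ValueError("relatedness is only sketched for distinct vocabularies")
    weight = shortest_path_weight(graph, vi, vj)
    if weight == float("inf"):
        return 0.0  # assumption: unconnected vocabularies score zero
    return 1.0 / weight


# Hypothetical graph: v1 and v2 joined with weight 1, v2 and v3 with weight 2.
example_graph = {
    "v1": {"v2": 1.0},
    "v2": {"v1": 1.0, "v3": 2.0},
    "v3": {"v2": 2.0},
    "v4": {},
}
```

With this hypothetical graph, `path_relatedness(example_graph, "v1", "v3")` follows the path through v2 (total weight 3), and the isolated `v4` scores 0.0 against every other vocabulary. The same function applies unchanged to GI or GE+I if they are stored in the same form.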
![Page 13: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/13.jpg)
Measure 2: implicit semantic relatedness
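[Figure: graph GI over vocabularies v2, v3, v4, with terms t2, t3, t4. Relations shown between terms: owl:inverseOf, rdfs:subClassOf. Edge weights shown: 1, 2]
$$R_I(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G_I^S}$$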
![Page 14: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/14.jpg)
Measure 3: hybrid semantic relatedness
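[Figure: graph GE+I over vocabularies v1, v2, v3, v4. Edge weights shown: 1, 2, 1]
$$R_{E+I}(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G_{E+I}^S}$$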
![Page 15: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/15.jpg)
Empirical analysis (1)
Statistical properties of GE, GI and GE+I
![Page 16: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/16.jpg)
Empirical analysis (2)
Explicit relations between vocabularies
![Page 17: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/17.jpg)
Measure 4: content similarity
Harmonic mean
Maximum similarity between their labels
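The transcript keeps only two ingredients of this measure: a harmonic mean, and the maximum similarity between labels. The sketch below shows one way these could fit together. The directional averaging over terms, the use of `difflib.SequenceMatcher` as the string similarity, and every function name are assumptions, not the authors' definition.

```python
from difflib import SequenceMatcher


def label_similarity(a, b):
    """Assumed string similarity in [0, 1]; a stand-in for the real label measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def term_similarity(labels_a, labels_b):
    """Maximum similarity between the labels of two terms."""
    return max(label_similarity(a, b) for a in labels_a for b in labels_b)


def directional_score(vocab_a, vocab_b):
    """Assumption: average, over terms of vocab_a, of the best match in vocab_b."""
    best = [max(term_similarity(ta, tb) for tb in vocab_b) for ta in vocab_a]
    return sum(best) / len(best)


def content_similarity(vocab_a, vocab_b):
    """Harmonic mean of the two directional scores."""
    ab = directional_score(vocab_a, vocab_b)
    ba = directional_score(vocab_b, vocab_a)
    if ab + ba == 0:
        return 0.0
    return 2 * ab * ba / (ab + ba)


# Each vocabulary is a list of terms; each term is a list of its labels.
university_a = [["FullProfessor"], ["FacultyMember"], ["AssistantProfessor"]]
university_b = [["Professor"], ["Faculty"], ["AssistantProfessor"]]
```

Calling `content_similarity(university_a, university_b)` yields a value between 0 and 1. The harmonic mean keeps the score low when only one direction matches well, for example when a small vocabulary is fully covered by a much larger one.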
![Page 18: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/18.jpg)
Empirical analysis (3)
86 label-like properties
rdfs:label, dc:title, and their subproperties (e.g. skos:prefLabel)
and local name
[Pie chart: Terms and their labels (w/ vs. w/o). Shares shown: 63.67%, 36.33%]
[Pie chart: Vocabulary distribution (w/ vs. w/o). Shares shown: 36.21%, 63.79%]
![Page 19: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/19.jpg)
Measure 5: expressivity closeness
[Figure: terms tp, tq, tr and the meta-level terms shown with them (rdfs:domain, owl:inverseOf, rdf:type, owl:TransitiveProperty), collected into MetaTerms]
Jaccard
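A Jaccard comparison of two MetaTerms sets can be sketched in a few lines of Python. The function name and the treatment of two empty sets (score 0.0) are assumptions, and the example sets are hypothetical.

```python
def expressivity_closeness(meta_terms_a, meta_terms_b):
    """Jaccard similarity: |A intersect B| / |A union B|."""
    a, b = set(meta_terms_a), set(meta_terms_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two vocabularies without meta-level terms score zero
    return len(a & b) / len(union)


# Hypothetical MetaTerms sets for two vocabularies.
meta_a = {"rdfs:domain", "owl:inverseOf", "owl:TransitiveProperty"}
meta_b = {"rdfs:domain", "rdfs:range"}
```

Here the two sets share one meta-level term out of four distinct ones, so `expressivity_closeness(meta_a, meta_b)` is 0.25.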
![Page 20: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/20.jpg)
Empirical analysis (4)
4,978 meta-level terms, 469 (9.42%) in >1 vocabulary
Most popular meta-level terms
1. rdf:type
2. rdfs:domain
3. rdfs:range
4. …
and after excluding language constructs
10.13 meta-level terms per vocabulary
≤20 meta-level terms in 92.96% of vocabularies
but hundreds in Cyc
![Page 21: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/21.jpg)
Measure 6: distributional relatedness
Distributional profile
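$$\mathrm{DP}(v) = \begin{pmatrix} p(v_1 \mid v) \\ p(v_2 \mid v) \\ \vdots \\ p(v_n \mid v) \end{pmatrix}$$
$$R_D(v_i, v_j) = \cos\bigl(\mathrm{DP}(v_i), \mathrm{DP}(v_j)\bigr)$$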
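The cosine comparison of distributional profiles can be sketched as follows. The transcript does not say how the conditional probabilities are estimated, so this sketch assumes they are normalized co-instantiation counts. The count table, the function names, and the zero score for empty profiles are all hypothetical.

```python
import math


def distributional_profile(counts, v, vocabularies):
    """Assumed estimate: p(u | v) = count(v, u) / sum of counts for v."""
    row = counts.get(v, {})
    total = sum(row.get(u, 0) for u in vocabularies)
    if total == 0:
        return [0.0] * len(vocabularies)
    return [row.get(u, 0) / total for u in vocabularies]


def cosine(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an empty profile is unrelated to everything
    return dot / (norm_x * norm_y)


def distributional_relatedness(counts, vi, vj, vocabularies):
    """cos(DP(vi), DP(vj)) over a fixed ordering of vocabularies."""
    return cosine(
        distributional_profile(counts, vi, vocabularies),
        distributional_profile(counts, vj, vocabularies),
    )


# Hypothetical co-instantiation counts: counts[v][u] = times v and u are used together.
context = ["x", "y", "z"]
counts = {"a": {"x": 2, "y": 2}, "b": {"x": 5, "y": 5}, "c": {"z": 3}}
```

In this toy table, `a` and `b` co-occur with the same vocabularies in the same proportions, so their relatedness is close to 1, while `c` shares no context with `a` and scores 0.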
![Page 22: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/22.jpg)
Empirical analysis (5)
Instantiation found for 1,874 (62.55%) vocabularies
Most popular vocabularies (excluding languages)
![Page 23: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/23.jpg)
Empirical analysis (6)
Co-instantiation found for 9,763 pairs of vocabularies
Most popular vocabulary co-instantiation (excluding languages)
![Page 24: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/24.jpg)
Vocabulary relatedness
6 numerical measures, from 4 aspects
Semantic relatedness
Explicit
Implicit
Hybrid
Content similarity
Expressivity closeness
Distributional relatedness
Comparison
![Page 25: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/25.jpg)
Agreement between measures
Spearman’s rank correlation coefficient (ρ∈[-1,1])
Single-link hierarchical clustering
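Spearman's ρ between the scores that two measures give to the same vocabulary pairs can be computed as the Pearson correlation of their ranks. The sketch below uses average ranks for ties. The function names and sample scores are hypothetical, and nothing here reproduces the study's actual values.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman_rho(scores_a, scores_b):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(scores_a), average_ranks(scores_b))


# Hypothetical scores from two measures over the same four vocabulary pairs.
measure_a = [0.10, 0.40, 0.40, 0.90]
measure_b = [0.05, 0.20, 0.30, 0.80]
```

A value near 1 means the two measures order vocabulary pairs almost identically, and a value near -1 means they order them in reverse.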
![Page 26: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/26.jpg)
Outline
Data set
Vocabulary relatedness
Post-selection vocabulary recommendation
Conclusions
![Page 27: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/27.jpg)
Relatedness-based ranking
Ranking by single measure:
Ranking by multiple measures:
![Page 28: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/28.jpg)
Popularity-based re-ranking
Number of pay-level domains instantiating vi
Degree of influence of popularity
![Page 29: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/29.jpg)
Evaluation settings
20 “selections” randomly selected from 1,302 moderate-sized vocabularies
Depth-10 pooling with
2 experts
Ratings
Closely related: 2
Somewhat related: 1
Unrelated: 0
Metric: NDCG
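NDCG over the 0/1/2 ratings can be sketched as below. The slides do not state which gain and discount functions were used. This sketch assumes the rating itself as gain and a log2 position discount, which is one common variant. Function names and the sample ratings are hypothetical.

```python
import math


def dcg(ratings, k):
    """Discounted cumulative gain of the first k ratings (gain = rating, log2 discount)."""
    return sum(r / math.log2(position + 2) for position, r in enumerate(ratings[:k]))


def ndcg(ranked_ratings, pooled_ratings, k):
    """DCG of the produced ranking divided by the DCG of the ideal ordering of the pool."""
    ideal = dcg(sorted(pooled_ratings, reverse=True), k)
    if ideal == 0:
        return 0.0  # assumption: no related vocabulary in the pool gives a score of zero
    return dcg(ranked_ratings, k) / ideal


# Hypothetical ratings of a recommended list: 2 = closely, 1 = somewhat, 0 = unrelated.
recommended = [1, 2, 0]
```

For this list, `ndcg(recommended, recommended, 1)` compares the top result (rating 1) with the best available rating (2). A perfectly ordered list scores 1.0 at every cutoff.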
![Page 30: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/30.jpg)
Gold standard
739 assessments
Agreement between experts
80%
or 91% when “closely related = somewhat related = related”
[Pie chart: Assessments. Categories: Closely related, Somewhat related, Unrelated. Shares shown: 7.85%, 10.55%, 81.60%]
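The two agreement figures differ only in whether "closely related" and "somewhat related" are merged before comparing the experts. A minimal sketch, assuming agreement means the plain fraction of items rated identically (the transcript does not name the statistic); the ratings below are made up and are not the study's data.

```python
def percent_agreement(ratings_a, ratings_b, collapse_related=False):
    """Fraction of items on which two raters agree.

    With collapse_related=True, ratings 1 and 2 are both treated as "related".
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must rate the same items")
    if not ratings_a:
        raise ValueError("no assessments to compare")

    def normalize(rating):
        return min(rating, 1) if collapse_related else rating

    matches = sum(normalize(a) == normalize(b) for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


# Hypothetical ratings from two experts: 2 = closely, 1 = somewhat, 0 = unrelated.
expert_1 = [2, 1, 0, 0, 2]
expert_2 = [1, 1, 0, 0, 0]
```

On these made-up ratings, strict agreement is 3 of 5 items, and merging the two "related" levels raises it to 4 of 5, the same direction of change as on the slide.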
![Page 31: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/31.jpg)
Evaluation results --- individual measures
56.88% isolated vocabularies in GE
37.45% uninstantiated vocabularies
![Page 32: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/32.jpg)
Evaluation results --- combinations of measures
![Page 33: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/33.jpg)
Relatedness vs. popularity
NDCG@1 vs. number of pay-level domains instantiating it
![Page 34: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/34.jpg)
Outline
Data set
Vocabulary relatedness
Post-selection vocabulary recommendation
Conclusions
![Page 35: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/35.jpg)
Conclusions
Vocabulary-level relatedness
4 aspects, 6 measures
Empirical analysis
Statistical findings
Comparison
Post-selection vocabulary recommendation
Relatedness-based ranking
Popularity-based re-ranking
Evaluation
Falcons Ontology Search
http://ws.nju.edu.cn/falcons/ontologysearch/
![Page 36: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems](https://reader033.fdocuments.us/reader033/viewer/2022060122/559665e81a28abf9338b4872/html5/thumbnails/36.jpg)
Take away
Vocabulary meta-descriptions are incomplete.
Terms lack labels.
Co-instantiated ∝ explicitly related
http://ws.nju.edu.cn/falcons/ontologysearch/