Andisheh Keikha, Ryerson University
Ebrahim Bagheri, Ryerson University
May 7th, 2014
Slide 2
Outline:
- Search Process: Query Processing, Document Ranking, Search Result Clustering and Diversification
- What is the Goal
- Contributions
Slide 3
Simple search. Query: a set of keywords. Find the documents that contain those keywords and rank them with respect to the query. Result: a ranked list of documents.
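The simple-search pipeline above can be sketched in a few lines. The corpus, tokenization, and scoring here are illustrative; a real engine would use an inverted index and a proper ranking function:

```python
def simple_search(query, docs):
    """Return (doc_id, score) pairs for docs containing any query keyword,
    ranked by total keyword frequency."""
    keywords = query.lower().split()
    results = []
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        score = sum(tokens.count(k) for k in keywords)
        if score > 0:
            results.append((doc_id, score))
    # Higher score first: documents mentioning the keywords more often rank higher.
    return sorted(results, key=lambda pair: pair[1], reverse=True)

docs = {
    "d1": "weight gain and muscle mass",
    "d2": "losing weight fast",
    "d3": "stock market gain",
}
print(simple_search("gain weight", docs))  # d1 matches both keywords, ranks first
```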
Slide 5
Query length is correlated with performance in the search task. A query is a small collection of keywords, and it is hard to find relevant documents based on only 2-3 words. Solutions: query reformulation and query expansion.
Slide 6
Query Expansion: selection of new terms
- from relevant documents
- from WordNet (synonyms, hyponyms, ...)
- with disambiguation
Slide 7
Query Expansion: selection of new terms, and weighting of those terms.
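A minimal sketch of both decisions, term selection and term weighting. The RELATED table is a toy stand-in for a real source such as WordNet synonyms or hyponyms, and the discount weight is an arbitrary illustration:

```python
RELATED = {
    "gain": ["increase", "growth"],
    "weight": ["mass", "body mass"],
}

def expand_query(query, expansion_weight=0.5):
    """Return {term: weight}: original terms get weight 1.0,
    selected expansion terms get a discounted weight."""
    weighted = {}
    for term in query.lower().split():
        weighted[term] = 1.0
        for candidate in RELATED.get(term, []):
            # Keep the higher weight if a candidate appears more than once.
            weighted[candidate] = max(weighted.get(candidate, 0.0), expansion_weight)
    return weighted

print(expand_query("gain weight"))
```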
Slide 9
Probabilistic Methods: what is the probability that this document is relevant to this query? Two ingredients: R, the event that the document is judged relevant to the query, and D, the document description; the task is to estimate P(R | D).
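One classical instantiation of this idea is the relevance weight of the probabilistic model by Sparck Jones, Walker and Robertson cited in the references. A minimal sketch with toy counts (the numbers are illustrative):

```python
import math

def rsj_weight(n_t, N, r_t, R):
    """Relevance weight of a term: n_t of N documents contain it;
    r_t of the R known-relevant documents contain it (0.5 smoothing)."""
    return math.log(
        ((r_t + 0.5) * (N - n_t - R + r_t + 0.5)) /
        ((n_t - r_t + 0.5) * (R - r_t + 0.5))
    )

# A term in 3 of 4 relevant docs but only 10 of 100 docs overall gets a
# strongly positive weight: its presence is evidence of relevance.
print(rsj_weight(n_t=10, N=100, r_t=3, R=4))
```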
Slide 10
Language Models: what is the probability of generating query Q, given document d with language model M_d? Each query term's probability under M_d is obtained as a maximum likelihood estimate.
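A minimal sketch of the query-likelihood model, with the maximum likelihood estimate tf(t, d) / |d| smoothed against a collection model (Jelinek-Mercer style) so unseen query terms do not zero out the score; the mixing weight and toy collection are illustrative:

```python
from collections import Counter

def query_likelihood(query, doc, collection, lam=0.5):
    """P(Q | M_d) as a product of smoothed per-term probabilities."""
    doc_tf = Counter(doc.lower().split())
    doc_len = sum(doc_tf.values())
    coll_tf = Counter(collection.lower().split())
    coll_len = sum(coll_tf.values())
    score = 1.0
    for t in query.lower().split():
        p_ml_doc = doc_tf[t] / doc_len    # maximum likelihood, document model
        p_ml_coll = coll_tf[t] / coll_len  # maximum likelihood, collection model
        score *= lam * p_ml_doc + (1 - lam) * p_ml_coll
    return score

collection = "gain weight muscle mass fat diet exercise stock market gain"
d1 = "gain weight with muscle mass"
d2 = "stock market gain"
# d1 contains both query terms, so it should score higher than d2.
print(query_likelihood("gain weight", d1, collection) >
      query_likelihood("gain weight", d2, collection))
```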
Slide 14
Searching on Google.
Slide 15
Searching on Google: I want all of these searches to show the same results, since they have the same meaning; when searching for one of them, the user's intent is to see all of them.
Slide 16
Outline:
- Search Process: Query Processing, Document Ranking, Search Result Clustering and Diversification
- What is the Goal
- Contributions: Query Expansion, Query Expansion (Tasks to Decide), Document Ranking
Slide 17
How? A new semantic query expansion method and a new semantic document ranking method.
Slide 19
Example: "gain weight". Desirable keywords in the expanded query: gain, weight, muscle, mass, fat. The query "gain weight" is connected to Muscle, Mass, and Fat; what are these relations?
Slide 20
Digging into DBpedia and Wikipedia:
http://en.wikipedia.org/wiki/Weight_gain
http://dbpedia.org/page/Muscle
http://dbpedia.org/page/Adipose_tissue
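The kind of evidence these pages provide can be sketched with a toy triple set. The entity and property names mimic DBpedia's (e.g. wikiPageWikiLink), but the data here is hand-made for illustration:

```python
TRIPLES = [
    ("Weight_gain", "wikiPageWikiLink", "Muscle"),
    ("Weight_gain", "wikiPageWikiLink", "Adipose_tissue"),
    ("Weight_gain", "wikiPageWikiLink", "Body_mass_index"),
    ("Muscle", "wikiPageWikiLink", "Myocyte"),
]

def related_entities(entity, properties=("wikiPageWikiLink",)):
    """Entities reachable from `entity` via the selected properties:
    these are the candidate expansion entities for the query."""
    return [obj for subj, prop, obj in TRIPLES
            if subj == entity and prop in properties]

print(related_entities("Weight_gain"))
```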
Slide 22
How can query phrases be mapped onto Wikipedia components? Which properties and their related entities should be selected? Can those properties be selected automatically for each phrase, or should they be fixed for the whole algorithm? If automatic, what is the process?
Slide 23
Are DBpedia and Wikipedia enough to decide, or should other ontologies be used? How should the extracted entities (terms, senses) be weighted in order to select the expanded query among them?
Slide 25
Are the documents annotated?
- Yes: rank documents using the entities extracted during the query expansion phase.
- No: rank documents based on the semantics of the expanded query rather than its terms or phrases; define probabilities over senses rather than over terms in the query and the documents.
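For the annotated case, a minimal sketch: score each document by the weighted overlap between its entity annotations and the entities extracted during query expansion. The annotations and weights here are illustrative:

```python
def rank_annotated(query_entities, docs):
    """query_entities: {entity: weight}; docs: {doc_id: set of annotations}.
    Score = total weight of query entities annotated on the document."""
    scores = {
        doc_id: sum(query_entities.get(e, 0.0) for e in entities)
        for doc_id, entities in docs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

query_entities = {"Weight_gain": 1.0, "Muscle": 0.5, "Adipose_tissue": 0.5}
docs = {
    "d1": {"Weight_gain", "Muscle"},
    "d2": {"Stock_market"},
    "d3": {"Adipose_tissue"},
}
print(rank_annotated(query_entities, docs))
```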
Slide 26
Documents are not annotated, so how?
Slide 27
Semantic similarity between two non-annotated documents (the expanded query and the document): there are papers that use the WordNet ontology with a topic-specific PageRank algorithm to measure the similarity of two sentences (phrases or words). This approach has not yet been applied to information retrieval.
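A minimal sketch of the topic-specific (personalized) PageRank idea: seed the random walk with one text's concepts, take the stationary distribution as that text's semantic signature, and compare signatures. The concept graph here is a tiny hand-made stand-in for WordNet or DBpedia:

```python
GRAPH = {
    "gain": ["weight", "muscle"],
    "weight": ["gain", "mass", "fat"],
    "muscle": ["mass", "gain"],
    "mass": ["weight", "muscle"],
    "fat": ["weight"],
}

def personalized_pagerank(seeds, damping=0.85, iters=50):
    """Power iteration with teleportation restricted to the seed concepts."""
    nodes = list(GRAPH)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        rank = {
            n: (1 - damping) * teleport[n]
               + damping * sum(rank[m] / len(GRAPH[m]) for m in nodes if n in GRAPH[m])
            for n in nodes
        }
    return rank

def cosine(u, v):
    dot = sum(u[n] * v[n] for n in u)
    norm = lambda w: sum(x * x for x in w.values()) ** 0.5
    return dot / (norm(u) * norm(v))

# Signature of one text's concepts vs. another's.
sig_a = personalized_pagerank({"gain", "weight"})
sig_b = personalized_pagerank({"muscle", "mass"})
print(cosine(sig_a, sig_b))
```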
Slide 28
Find the aspects of the different algorithms that are most beneficial in the information retrieval domain (two large documents).
Slide 29
More reasonable: apply the algorithm on DBpedia (instead of WordNet) in the entity domain (instead of the sense domain).
Slide 30
Apply search result clustering and diversification based on the different semantics of the query.
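A minimal sketch of sense-based clustering and diversification: group results by which sense of the query they match, then interleave one result per sense so every interpretation surfaces early. The hand-made sense signatures stand in for induced senses:

```python
SENSES = {
    "money": {"stock", "market", "profit"},
    "body": {"muscle", "mass", "fat"},
}

def diversify(results):
    """Cluster result titles by the best-matching query sense, then interleave."""
    clusters = {sense: [] for sense in SENSES}
    for title in results:
        words = set(title.lower().split())
        # Assign the result to the sense whose signature it overlaps most.
        best = max(SENSES, key=lambda s: len(words & SENSES[s]))
        clusters[best].append(title)
    # Round-robin across sense clusters so each sense appears near the top.
    queues, diversified = [list(c) for c in clusters.values()], []
    while any(queues):
        for q in queues:
            if q:
                diversified.append(q.pop(0))
    return diversified

results = ["stock market gain", "gain muscle mass", "market profit gain", "fat gain"]
print(diversify(results))
```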
Slide 31
1. Selvaretnam, B., and M. Belkhatir. "Natural language technology and query expansion: issues, state-of-the-art and perspectives." Journal of Intelligent Information Systems 38(3) (2011): 709-740.
2. Carpineto, C., and G. Romano. "A Survey of Automatic Query Expansion in Information Retrieval." ACM Computing Surveys 44(1) (2012): 1-50.
3. Hiemstra, D. "A linguistically motivated probabilistic model of information retrieval." Research and Advanced Technology for Digital Libraries, pp. 569-584. Springer Berlin Heidelberg, 1998.
4. Sparck Jones, K., S. Walker, and S. E. Robertson. "A probabilistic model of information retrieval: development and comparative experiments: Part 1." Information Processing & Management 36(6) (2000): 779-808.
5. Sparck Jones, K., S. Walker, and S. E. Robertson. "A probabilistic model of information retrieval: development and comparative experiments: Part 2." Information Processing & Management 36(6) (2000): 809-840.
6. Di Marco, A., and R. Navigli. "Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction." Computational Linguistics 39(3) (2013): 709-754.
7. Pilehvar, M. T., D. Jurgens, and R. Navigli. "Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity." Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), 2013.