Improved search for Socially Annotated Data
Authors: Nikos Sarkas, Gautam Das, Nick Koudas
Presented by: Amanda Cohen Mostafavi

Introduction

•Social Annotation: A process where users collaboratively assign a short sequence of keywords (tags) to a number of resources
▫Each tag sequence is a concise and accurate summary of the resource’s content
▫Meant to aid navigation through a collection

•Leads to searching via tags
▫Enables relevant text retrieval
▫Allows accurate retrieval of non-textual objects
▫Presents a need for an efficient retrieval and ranking method based on user tags

RadING

•Ranking annotated data using Interpolated N-Grams

•Searching and ranking method based exclusively on user tags

•Uses interpolated n-grams to model tag sequences associated with every resource

•How does it rank?

Probabilistic Foundations

•Goal: To rank resources by the probability that they will be relevant to the query

•Given a keyword query Q and a resource R from the collection, we apply Bayes' theorem to get:

p(R is relevant | Q) = p(Q | R is relevant) p(R is relevant) / p(Q)

where p(R is relevant) is the probability that R is relevant, independent of the query posed, and p(Q) is the probability of the query being issued

Probabilistic Foundations

•p(R is relevant) is constant throughout the resource collection, and so is p(Q)
▫Meaning: ranking resources by p(R is relevant | Q) is equivalent to ranking by p(Q | R is relevant)

•In order to estimate the probability of the query being “generated” by each resource, resources need to be modeled based on knowledge of social annotation
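
The rank equivalence stated on this slide can be written out as a short worked step. This is only a restatement of Bayes' theorem under the slide's own assumption that p(R is relevant) and p(Q) do not vary across resources:

```latex
\begin{align*}
p(R \text{ is relevant} \mid Q)
  &= \frac{p(Q \mid R \text{ is relevant})\, p(R \text{ is relevant})}{p(Q)} \\
  &\propto p(Q \mid R \text{ is relevant})
\end{align*}
```

The factor p(R is relevant) / p(Q) is the same for every resource, so dropping it does not change the order of the ranked list.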

Dynamics and Properties of the Social Annotation Process

•The goal of the tagging process is to describe the resource’s content

•User opinions crystallize quickly, so annotation trends can be found after witnessing a small number of assignments

•Therefore we assume the following:
▫p(Q | R is relevant) = p(Q is used to tag R)
▫In English: Users will use keyword sequences derived from the same distribution to both tag and search for a resource

Social Annotation Process: Things to consider…

•Resources are rarely given assignments with only one tag

•Also, tag positions are not random: they progress from left to right, from more general to more specific

•Tags representing different perspectives on a resource are less likely to occur together in the same assignment

•N-gram models are used to model these co-occurrence patterns

N-gram Models

•Given an assignment made up of a sequence s of l tags t1…tl, the probability of this sequence being assigned to a resource is:
▫p(t1,…,tl) = p(t1)p(t2|t1)…p(tl|t1,…,tl-1)

•The purpose of using n-gram models is to approximate each conditional probability by conditioning on only the preceding n-1 tags
▫In the case of a bi-gram model, p(tk|t1,…,tk-1) is approximated by p(tk|tk-1)
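
As a small worked instance of the approximation above, take an assignment of three tags (the length three is chosen only for illustration). The exact chain rule and its bi-gram approximation are:

```latex
\begin{align*}
p(t_1, t_2, t_3) &= p(t_1)\, p(t_2 \mid t_1)\, p(t_3 \mid t_1, t_2) \\
                 &\approx p(t_1)\, p(t_2 \mid t_1)\, p(t_3 \mid t_2)
\end{align*}
```

Only the last factor changes: the bi-gram model drops t_1 from its conditioning context.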

N-gram Models

•Calculate the probability using the Maximum Likelihood equation:

$$\hat{p}(t_2 \mid t_1) = \frac{c(t_1, t_2)}{\sum_{t} c(t_1, t)}$$

•c(t1, t2) = the number of occurrences of the bi-gram (t1, t2)

•The summation in the denominator is the sum of the occurrences of all bi-grams with t1 as the first tag
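
The Maximum Likelihood estimate described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `bigram_mle` and the sample assignments are hypothetical, and an unseen first tag is simply given probability 0 here.

```python
from collections import Counter


def bigram_mle(assignments):
    """Build a function p(t2, t1) returning the ML estimate of p(t2 | t1).

    `assignments` is a list of tag sequences attached to one resource.
    """
    bigram_counts = Counter()  # c(t1, t2)
    first_counts = Counter()   # sum over t of c(t1, t)
    for tags in assignments:
        for t1, t2 in zip(tags, tags[1:]):
            bigram_counts[(t1, t2)] += 1
            first_counts[t1] += 1

    def p(t2, t1):
        total = first_counts[t1]
        # Assumption: a first tag never seen in a bi-gram yields probability 0.
        return bigram_counts[(t1, t2)] / total if total else 0.0

    return p


# Hypothetical assignments for a single resource.
assignments = [
    ["python", "tutorial", "beginner"],
    ["python", "tutorial"],
    ["python", "reference"],
]
p = bigram_mle(assignments)
print(p("tutorial", "python"))  # c(python, tutorial) = 2 out of 3 bi-grams starting with python
```

The zero probabilities this produces for unseen bi-grams are what motivates smoothing the estimate.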

Interpolation

•Interpolation is used to compensate for sparse data: it redistributes probability mass from high counts to low counts

•Used the Jelinek-Mercer interpolation technique. Applied to a bi-gram, it yields:

$$p(t_2 \mid t_1) = \lambda_1 \hat{p}(t_2 \mid t_1) + \lambda_2 \hat{p}(t_2) + \lambda_0 p_{bg}(t_2)$$
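
A minimal sketch of the interpolated bi-gram probability follows. It assumes, as is usual for Jelinek-Mercer interpolation but not stated on the slide, that the three weights are non-negative and sum to 1, so that λ0 = 1 − λ1 − λ2, and that the "bg" term is a background probability of the tag. The function name and the sample numbers are hypothetical.

```python
def interpolate(p_bigram, p_unigram, p_background, lam1, lam2):
    """Mix three estimates of p(t2 | t1) with weights lam1, lam2 and lam0.

    Assumption: lam0 = 1 - lam1 - lam2 and all three weights are >= 0.
    """
    lam0 = 1.0 - lam1 - lam2
    if lam1 < 0 or lam2 < 0 or lam0 < -1e-12:
        raise ValueError("weights must be non-negative and sum to 1")
    lam0 = max(lam0, 0.0)  # absorb floating-point rounding error
    return lam1 * p_bigram + lam2 * p_unigram + lam0 * p_background


# A bi-gram never seen for this resource still gets non-zero probability.
print(interpolate(p_bigram=0.0, p_unigram=0.1, p_background=0.001,
                  lam1=0.6, lam2=0.3))
```

With these illustrative numbers the bi-gram term contributes nothing, and the result comes entirely from the unigram and background terms.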

Parameter Optimization

•Goal: to maximize the likelihood function L(λ1,λ2) in order to find the ideal interpolation parameters

•Definitions:
▫D*: The constrained domain of λ1 and λ2

▫λ*: The global maximum of L(λ1,λ2)

▫λc : The point at which L(λ1,λ2) evaluates to its maximum value within D*, which must be found to optimize parameters

RadING Optimization Framework

•Step 1: If L(λ1,λ2) is unbounded, perform 1D optimization to locate λc

•Step 2: If L(λ1,λ2) is bounded, apply 2D optimization to find λ*

•Step 3: If λ* is not in D*, locate λc
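
To make the objects in this framework concrete, the sketch below evaluates a log-likelihood L(λ1,λ2) and searches the constrained domain for its maximizer. It is deliberately not the RadING procedure: it swaps in a plain grid search for the 1D/2D optimization steps. It also assumes that D* is the region λ1 ≥ 0, λ2 ≥ 0, λ1 + λ2 ≤ 1, and that the likelihood is computed over tag events from held-out assignments; neither is stated on the slides. All names and numbers are hypothetical.

```python
import math


def log_likelihood(events, lam1, lam2):
    """Log-likelihood of held-out tag events under the interpolated model.

    `events` is a list of (p_bigram, p_unigram, p_background) triples,
    one per observed tag. Assumption: lam0 = 1 - lam1 - lam2.
    """
    lam0 = max(1.0 - lam1 - lam2, 0.0)
    total = 0.0
    for p_bi, p_uni, p_bg in events:
        p = lam1 * p_bi + lam2 * p_uni + lam0 * p_bg
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total


def grid_search(events, steps=20):
    """Stand-in for the optimizer: scan a grid over the assumed domain D*."""
    best = (-math.inf, 0.0, 0.0)
    for i in range(steps + 1):
        for j in range(steps + 1 - i):  # keeps lam1 + lam2 <= 1
            lam1, lam2 = i / steps, j / steps
            ll = log_likelihood(events, lam1, lam2)
            if ll > best[0]:
                best = (ll, lam1, lam2)
    return best


# Hypothetical held-out events: (bi-gram, unigram, background) estimates.
events = [(0.5, 0.2, 0.01), (0.0, 0.1, 0.01), (0.8, 0.3, 0.02)]
ll, lam1, lam2 = grid_search(events)
print(lam1, lam2, ll)
```

A grid search costs far more evaluations than a dedicated optimizer, which is presumably why the framework distinguishes the 2D and 1D cases.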

Searching Process

•Step 1: Train a bi-gram model for each resource
▫Compute the bi-gram and unigram probabilities and optimize the interpolation parameters

•Step 2: At query time, compute the probability of the query keyword sequence being generated by each resource’s bi-gram model

$$p(q_1, \ldots, q_k \mid R) = \prod_{j=1}^{k} p(q_j \mid q_{j-1})$$

•Use Threshold Algorithm to compute top-k results
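
The query-time step can be sketched as follows. This is a simplified illustration under stated assumptions: each resource's model is represented by a hypothetical function `cond_prob(q, prev)`, the first keyword is conditioned on `None` as a stand-in for a start-of-sequence marker, and every resource is scored exhaustively with `heapq.nlargest` instead of using the Threshold Algorithm.

```python
import heapq


def query_probability(query, cond_prob):
    """Multiply p(q_j | q_{j-1}) over the query keywords.

    Assumption: `prev` is None for the first keyword (start marker).
    """
    prob = 1.0
    prev = None
    for q in query:
        prob *= cond_prob(q, prev)
        prev = q
    return prob


def top_k(query, models, k):
    """Score every resource and keep the k most probable (exhaustive scan)."""
    scored = ((query_probability(query, model), name)
              for name, model in models.items())
    return heapq.nlargest(k, scored)


def make_model(table, floor=1e-4):
    """Hypothetical toy model: table lookup with a small smoothing floor."""
    return lambda q, prev: table.get((prev, q), floor)


models = {
    "urlA": make_model({(None, "python"): 0.5, ("python", "tutorial"): 0.4}),
    "urlB": make_model({(None, "python"): 0.2, ("python", "tutorial"): 0.1}),
}
results = top_k(["python", "tutorial"], models, k=1)
print(results)
```

The exhaustive scan touches every resource for every query; the Threshold Algorithm named on the slide is what avoids that cost on a large collection.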

Searching Example

Experimental Evaluation

•Test data: web crawl of del.icio.us
▫70,658,851 assignments
▫Posted by 567,539 users
▫Attached to 24,245,248 unique URLs
▫Average length of assignment: 2.77
▫Standard deviation: 2.70
▫Median: 2

Optimization Efficiency

Ranking Effectiveness

•Compares the RadING ranking method to adaptations of tf/idf ranking
▫Tf/Idf: concatenates a resource’s assignments into a document and ranks resources by the tf/idf similarity of the query to each document
▫Tf/Idf+: computes the tf/idf similarity of each individual assignment and ranks resources based on average similarity

•10 judges contacted through Amazon Mechanical Turk to measure precision

Ranking Effectiveness
