Truth Discovery with Multiple Conflicting Information Providers on the Web
Xiaoxin Yin, Jiawei Han, Philip S. Yu
Industrial and Government Track short paper
Advisor: Dr. Koh Jia-Ling
Speaker: Che-Wei Liang
Date: 2007.11.20
Outline
• Introduction
• Problem Definitions
• Computational Model
  – Web Site Trustworthiness and Fact Confidence
  – Iterative Computation
• Empirical Study
• Conclusions
Introduction
• World-wide web
  – A necessary part of our lives.
  – Ex: Amazon.com, ShopZilla.com.
• Is the world-wide web always trustworthy?
  – There is no guarantee of the correctness of information on the web.
Introduction
• Ranking web pages
  – According to authority based on hyperlinks.
  – Ex: Authority-Hub analysis, PageRank, and more general link-based analysis.
• Does the authority or popularity of a web site imply that its information is accurate?
Problem Definitions
• Definition 1: Confidence of facts.
  – The probability that a fact f is correct, denoted by s(f).
• Definition 2: Trustworthiness of web sites.
  – The expected confidence of the facts provided by a web site w, denoted by t(w).
Problem Definitions
• Facts may conflict with or support each other.
  – Ex: “Jennifer Widom”, “J. Widom”
• Concept of implication
  – imp(f1 → f2): f1’s influence on f2’s confidence.
Basic heuristic
• Basic heuristics
  1. Usually there is only one true fact for a property of an object.
  2. This true fact appears to be the same or similar on different web sites.
Basic heuristic (cont.)
• Basic heuristics
  3. The false facts on different web sites are less likely to be the same or similar.
  4. In a certain domain, a web site that provides mostly true facts for many objects will likely provide true facts for other objects.
Web Site Trustworthiness and Fact Confidence
• Trustworthiness t(w): the average confidence of the facts provided by w
  t(w) = ( Σ_{f ∈ F(w)} s(f) ) / |F(w)|
  where F(w) is the set of facts provided by w.
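As a rough illustration, the expected confidence of a site's facts can be computed as a plain average of s(f) over F(w). This is a minimal sketch; the function name `trustworthiness` and the sample confidences are hypothetical and chosen only for illustration.

```python
def trustworthiness(facts_of_site, fact_confidence):
    """Average confidence s(f) over the facts F(w) provided by one web site."""
    if not facts_of_site:
        raise ValueError("a web site must provide at least one fact")
    return sum(fact_confidence[f] for f in facts_of_site) / len(facts_of_site)


# Hypothetical confidences s(f) for three facts.
fact_confidence = {"f1": 0.9, "f2": 0.6, "f3": 0.3}

# A site providing f1 and f2 gets the mean of their confidences.
t_w = trustworthiness(["f1", "f2"], fact_confidence)
print(t_w)
```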
Web Site Trustworthiness and Fact Confidence
• It is more difficult to estimate the confidence of a fact.
Web Site Trustworthiness and Fact Confidence
• Simple case
  – f1 is the only fact about object o1.
  – Assume w1 and w2 are independent.
• Confidence s(f)
  s(f) = 1 − Π_{w ∈ W(f)} (1 − t(w))
  where W(f) is the set of web sites providing f.
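Under the independence assumption of the simple case, a fact is wrong only if every site providing it is wrong, which suggests s(f) = 1 − Π (1 − t(w)) over the sites in W(f). The sketch below works through that arithmetic; the function name `confidence_simple` and the trust values 0.9 and 0.8 are hypothetical.

```python
import math


def confidence_simple(site_trust):
    """s(f) = 1 - prod(1 - t(w)) over the sites providing f, assuming independence."""
    return 1.0 - math.prod(1.0 - t for t in site_trust)


# Two independent sites w1 and w2 with hypothetical trustworthiness 0.9 and 0.8.
s_f1 = confidence_simple([0.9, 0.8])
print(round(s_f1, 4))
```

Each additional independent site can only raise the confidence, since every factor (1 − t(w)) is at most 1.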
Web Site Trustworthiness and Fact Confidence
• Trustworthiness score of a web site
  τ(w) = −ln(1 − t(w))
• τ(w) is between 0 and +∞ and better characterizes how accurate w is.
  – Ex: t(w1) = 0.9, t(w2) = 0.99
    t(w2) = 1.1 × t(w1)
    τ(w2) = 2 × τ(w1)
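The example can be checked numerically, assuming the score is defined as τ(w) = −ln(1 − t(w)), which is the form consistent with the stated ratios. The function name `trust_score` is hypothetical.

```python
import math


def trust_score(t):
    """tau(w) = -ln(1 - t(w)); maps t in [0, 1) onto [0, +inf)."""
    if not 0.0 <= t < 1.0:
        raise ValueError("t(w) must lie in [0, 1)")
    return -math.log1p(-t)  # log1p(-t) computes ln(1 - t)


tau1 = trust_score(0.9)
tau2 = trust_score(0.99)

# t(w2) is only 1.1 times t(w1), but w2 has one tenth the error rate,
# and the score reflects that by doubling.
ratio = tau2 / tau1
print(ratio)
```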
Web Site Trustworthiness and Fact Confidence
• Compute the confidence of f based on σ*(f) in the same way as computing it based on σ(f).
• This assumes that different web sites are independent, which is incorrect.
  – Add a dampening factor γ, 0 < γ < 1.
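The effect of the dampening factor can be sketched as follows. Two assumptions are made here that the slides do not spell out: the confidence score mirrors the trustworthiness score, so that s(f) = 1 − e^(−σ(f)), and γ simply multiplies the score, giving s*(f) = 1 − e^(−γ · σ*(f)). The function name `confidence_from_score` is hypothetical, and the score is taken as given.

```python
import math


def confidence_from_score(sigma, gamma=1.0):
    """Assumed form: 1 - exp(-gamma * sigma); gamma = 1 means no dampening."""
    return 1.0 - math.exp(-gamma * sigma)


# Score of a fact provided by two sites with t(w) = 0.9 each,
# assuming the score is the sum of the sites' tau(w) = -ln(1 - t(w)).
sigma = 2 * -math.log(1.0 - 0.9)

undamped = confidence_from_score(sigma)           # treats the sites as independent
damped = confidence_from_score(sigma, gamma=0.3)  # discounts possibly copied facts
print(round(undamped, 4), round(damped, 4))
```

With γ < 1, agreement among sites still raises the confidence, but more slowly, which compensates for sites that copy from each other.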
Web Site Trustworthiness and Fact Confidence
• Negative-confidence problem
  – A fact f may conflict with some facts provided by trustworthy web sites.
  – Then σ*(f) < 0 and s*(f) < 0, which is unreasonable.
• Adjusted confidence s(f)
  – If γ · σ*(f) > 0, s(f) is very close to s*(f).
  – If γ · σ*(f) < 0, s(f) is close to zero but still positive.
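One function with both properties is the logistic function, s(f) = 1 / (1 + e^(−γ · σ*(f))). The sketch below compares it with the assumed dampened form s*(f) = 1 − e^(−γ · σ*(f)); both formulas are assumptions here, since the slides only state the properties, and the function names are hypothetical.

```python
import math


def dampened_confidence(sigma_star, gamma=0.3):
    """Assumed s*(f) = 1 - exp(-gamma * sigma*); negative when sigma* < 0."""
    return 1.0 - math.exp(-gamma * sigma_star)


def logistic_confidence(sigma_star, gamma=0.3):
    """Assumed s(f) = 1 / (1 + exp(-gamma * sigma*)); always in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-gamma * sigma_star))


for sigma_star in (20.0, -20.0):
    print(sigma_star,
          dampened_confidence(sigma_star),
          logistic_confidence(sigma_star))
```

For a clearly positive score the two values nearly coincide, while for a negative score the logistic value stays small and positive instead of dropping below zero.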
Iterative Computation
• TRUTHFINDER: an iterative method
  – Initially, TruthFinder has little information about the web sites and the facts.
  – In each iteration, it improves its knowledge about trustworthiness and confidence.
  – It stops when the computation reaches a stable state.
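The iteration can be sketched as a loop that alternates between fact confidence and site trustworthiness. This is a simplified illustration, not the authors' implementation: it leaves out implication between facts, assumes the logistic form for confidence, starts every site at an arbitrary trustworthiness `t0`, and treats "stable" as the largest change in t(w) falling below `tol`. All names and the toy data are hypothetical.

```python
import math


def truthfinder_sketch(site_facts, gamma=0.3, t0=0.9, tol=1e-6, max_iter=100):
    """Alternate between s(f) and t(w) until t(w) stops changing.

    site_facts maps each web site to the non-empty set of facts it provides.
    """
    trust = {w: t0 for w in site_facts}
    facts = set().union(*site_facts.values())
    conf = {}
    for _ in range(max_iter):
        # Fact confidence from the trust scores tau(w) of the sites providing it.
        for f in facts:
            sigma = sum(-math.log(1.0 - trust[w])
                        for w, fs in site_facts.items() if f in fs)
            conf[f] = 1.0 / (1.0 + math.exp(-gamma * sigma))
        # Site trustworthiness as the average confidence of its facts.
        new_trust = {w: sum(conf[f] for f in fs) / len(fs)
                     for w, fs in site_facts.items()}
        change = max(abs(new_trust[w] - trust[w]) for w in site_facts)
        trust = new_trust
        if change < tol:
            break
    return trust, conf


# Toy data: three sites agree on one value, a fourth site disagrees.
site_facts = {"A": {"x=1"}, "B": {"x=1"}, "C": {"x=1"}, "D": {"x=2"}}
trust, conf = truthfinder_sketch(site_facts)
print(conf["x=1"] > conf["x=2"], trust["A"] > trust["D"])
```

In this toy run the fact with more supporters ends with higher confidence, and the sites providing it end with higher trustworthiness than the dissenting site.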
Empirical Study
• Compare with VOTING
  – VOTING chooses the fact that is provided by the most web sites.
• Setup: Intel PC with a 1.66GHz dual-core processor, 1GB memory, Windows XP Professional.
• Parameters: ρ = 0.5 and γ = 0.3.
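For reference, the VOTING baseline amounts to a majority count per object. The sketch below is a minimal illustration with hypothetical site and fact names; how ties are broken is not stated in the slides, so the sketch simply keeps whichever top fact `Counter` returns first.

```python
from collections import Counter


def vote(claims):
    """Return the fact claimed by the most web sites.

    claims is a list of (site, fact) pairs for a single object.
    """
    counts = Counter(fact for _site, fact in claims)
    return counts.most_common(1)[0][0]


# Hypothetical claims about one object from three sites.
claims = [("w1", "fact_a"), ("w2", "fact_a"), ("w3", "fact_b")]
winner = vote(claims)
print(winner)
```

Unlike the iterative method, this baseline gives every site the same weight, whatever its track record.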
Conclusions
• Introduce and formulate the Veracity problem
  – Resolving conflicting facts from multiple web sites.
  – Finding the true facts among them.
• Propose TRUTHFINDER
  – Utilizes web site trustworthiness and fact confidence to find trustworthy web sites and true facts.
• Experiments show that TRUTHFINDER achieves high accuracy.