YZUCSE SYSLAB
A Study of Web Search Engine Bias and its Assessment
Ing-Xiang Chen and Cheng-Zen Yang
Dept. of Computer Science and Engineering
Yuan Ze University, Taiwan, ROC
2006/5/22
Models of Trust for the Web (MTW'06)
Outline
• Introduction
• Related Work
• A 2-D Bias Assessment
• Results and Discussion
• Conclusions
Introduction
• Web search engines have become a significant gateway to the Internet.
• People tend to rely habitually on a few particular search engines.
• Users may thus be affected by biased search results without realizing it.
Search Engine Bias
• Search engine bias arises from:
– diverse operating policies and business strategies, e.g., the "Falun Gong" event in China,
– limitations of crawling, indexing, and ranking techniques, e.g., the "Googlewashed" event,
– opposing political standpoints, diverse cultural backgrounds, and different social customs.
A Query Example
Search engines compared: About, AltaVista, Excite, Google, Inktomi, Lycos, MSN, Overture, Teoma, Yahoo
• Amana: X O X X X O X
• ConservExplorer: O O O O
• Frigidaire: X X
• GE: X O X X X
• Inglis: X X X
• Kitchenaid: X X
• Klondike: (none)
• MAC-GRAY: (none)
• Maytag: X X X O X X
• Roper: O O X O X
• Sun Frost: O O O O O
• Whirlpool: X X X X
Note. O indicates that the brand name appears both in the URL string and in the page content; X indicates that it appears only in the content.
Our Research
• Establish a new mechanism for assessing the bias of Web search engines.
• Provide a two-dimensional scheme that combines indexical bias and content bias to assess search engine bias.
Indexical Bias vs. Content Bias
• The bias of a search engine represents the “deviation of the norm from the result of a search engine”.
• The deviation of the set of URLs retrieved by a search engine from the URLs retrieved by most Web search engines is termed indexical bias.
• The deviation of the content provided by a search engine from the content provided by most Web search engines is termed content bias.
Related Work
• The assessment of indexical bias (proposed by Mowshowitz and Kawaguchi):
1. Select a pool of search engines as the norm.
2. Transform the URLs into vectors.
3. Calculate the similarity between the URL vector of the search engine being compared and that of the norm.
4. Subtract the similarity value from 1 to obtain the bias value.
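The four steps above can be sketched in Python roughly as follows. This is a minimal sketch, not Mowshowitz and Kawaguchi's own implementation. It assumes that each URL is reduced to its host name (one Web site per host, matching the site-level vectors in the example, where 48 distinct URLs map to 44 Web sites), that the norm pools the results of every engine in the pool including the one being assessed (as in the example, whose norm components add up to 60), and that similarity is the cosine measure. The helper names `site_of`, `site_vector`, and `indexical_bias` are hypothetical.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def site_of(url: str) -> str:
    """Reduce a URL to its host name (assumption: one Web site per host)."""
    return urlparse(url).netloc.lower()


def site_vector(results: list[list[str]]) -> Counter:
    """Step 2: count how often each site occurs in one engine's result lists (one list per query)."""
    return Counter(site_of(u) for query_results in results for u in query_results)


def cosine(a: Counter, b: Counter) -> float:
    """Step 3: cosine similarity between two sparse site-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def indexical_bias(engine: str, pool: dict[str, list[list[str]]]) -> float:
    # Step 1: the norm is the pooled site counts of every engine in the pool.
    norm = Counter()
    for results in pool.values():
        norm += site_vector(results)
    # Step 4: bias = 1 - similarity between the engine's vector and the norm.
    return 1.0 - cosine(site_vector(pool[engine]), norm)
```

For instance, with two engines where engine "A" returns `x.com` and `y.com` and engine "B" returns `x.com` and `z.com` for a single query, the norm is (x: 2, y: 1, z: 1) and the bias of "A" is $1 - 3/\sqrt{12} \approx 0.134$.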
An Example
– The first 10 URLs were retrieved from 3 search engines using 2 queries (10 × 3 × 2 = 60 URLs).
– The 48 distinct URLs represent 44 Web sites.
– The norm = (7, 3, 2, 2, 2, 2, …, 1, 1, 1, 1, 1)
– Google = (3, 1, 1, 0, 2, 1, …, 0, 1, 1, 0, 0)
– Similarity = $48 / \sqrt{124 \times 28} = 0.8146$
– Bias value = 1 - 0.8146 = 0.1854
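The arithmetic in this example can be checked directly. Reading the figures as a cosine similarity, 48 is taken to be the dot product of the Google and norm vectors, 124 the squared length of the norm vector, and 28 the squared length of the Google vector; this interpretation is an assumption, though it is consistent with the vectors shown (the norm components sum to 60 and Google's to 20). Only these three summary numbers are used, since the full vectors are elided on the slide.

```python
import math

# Assumed reading of the slide's figures: dot product, |norm|^2, |Google|^2.
dot, norm_sq, google_sq = 48, 124, 28

similarity = dot / math.sqrt(norm_sq * google_sq)  # cosine of the two vectors
bias = 1 - similarity                              # indexical bias of Google
print(f"similarity = {similarity:.4f}, bias = {bias:.4f}")
# prints: similarity = 0.8146, bias = 0.1854
```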
Our Considerations
• The method proposed by Mowshowitz and Kawaguchi captures deviations in the retrieved Web sites, not in their contents.
• Examining bias from both the indexical view and the content view gives a more complete picture of search engine bias.
Selection of the Norm
• An explicit norm is mainly derived from careful examination by subject experts.
• Manual examination is impractical in an extremely large and fast-changing Web environment.
• An implicit norm is defined by choosing a collection of search results from several representative search engines.
The Selection Criteria
• The search engines
– are generally designed for different subject areas,
– are comparable to each other, and
– have their own processing rules.
The Process of Bias Assessment
[Figure: bias assessment process, with components Query, URL Locator, several Search Engines, Document Parser, Web Pages, and Vocabulary Entries, producing Indexical Bias and Content Bias]
The Assessment Algorithm (I)
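• Scores are calculated as follows:
– $\text{Score} = f\,(d + t\,W_t + H\,W_H + h\,W_h)\,\log(n/d)$
• f: term frequency
• d: document frequency
• t: title
• H: H1 heading
• h: H2 heading
• $W_t$, $W_H$, $W_h$: weights for the title, H1, and H2
• n: total number of documents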
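A minimal sketch of the scoring formula as transcribed above. Several details are assumptions, not taken from the slides. The weight values `W_T`, `W_H1`, and `W_H2` are hypothetical placeholders, since no values are given. `t`, `h1`, and `h2` are treated as counts of the term's occurrences in the title, H1, and H2. The natural logarithm is used because the base is not stated. The helpers `term_score` and `score_vector` are illustrative names.

```python
import math

# Hypothetical weights: the slide names W_t, W_H and W_h but gives no values.
W_T, W_H1, W_H2 = 3.0, 2.0, 1.5


def term_score(f: int, d: int, t: int, h1: int, h2: int, n: int) -> float:
    """Score = f * (d + t*W_t + H*W_H + h*W_h) * log(n / d), transcribed from the slide.

    f: term frequency; d: document frequency; t, h1, h2: occurrences of the term
    in the title, H1 and H2 headings (assumed to be counts); n: total documents.
    The natural log is assumed because the slide does not state the base.
    """
    return f * (d + t * W_T + h1 * W_H1 + h2 * W_H2) * math.log(n / d)


def score_vector(vocabulary: list[str],
                 stats: dict[str, tuple[int, int, int, int, int]],
                 n: int) -> list[float]:
    """Score each vocabulary entry; stats maps term -> (f, d, t, h1, h2), absent terms score 0."""
    return [term_score(*stats[w], n) if w in stats else 0.0 for w in vocabulary]
```

Because of the $\log(n/d)$ factor, a term that occurs in every document ($d = n$) scores 0 regardless of its frequency or position, so ubiquitous terms do not contribute to the comparison.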
The Assessment Algorithm (II)
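• $X = (x_1, x_2, x_3, \ldots, x_n)$: score vector of the search engine being assessed
• $N = (n_1, n_2, n_3, \ldots, n_n)$: score vector of the norm
• $\text{Bias} = 1 - \cos(X, N)$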
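The bias formula can be sketched as a cosine comparison of two score vectors. This is a minimal sketch under stated assumptions: `X` and `N` are taken to be term-score vectors aligned on the same vocabulary order (for example, one computed from the assessed engine's pages and one from the pooled pages of the norm engines), and a zero vector is treated as having similarity 0 (bias 1). The zero-vector convention and the names `cosine` and `content_bias` are hypothetical, not from the slides.

```python
import math


def cosine(x: list[float], n: list[float]) -> float:
    """Cosine of the angle between two equal-length score vectors."""
    if len(x) != len(n):
        raise ValueError("vectors must share the same vocabulary order")
    dot = sum(a * b for a, b in zip(x, n))
    nx = math.sqrt(sum(a * a for a in x))
    nn = math.sqrt(sum(b * b for b in n))
    # Assumed convention: a zero vector has similarity 0 with anything.
    return dot / (nx * nn) if nx and nn else 0.0


def content_bias(x: list[float], n: list[float]) -> float:
    """Bias = 1 - cos(X, N): 0 when X points the same way as the norm, 1 when orthogonal."""
    return 1.0 - cosine(x, n)
```

Since all scores are non-negative, the cosine lies in [0, 1], so the bias value also falls between 0 and 1, matching the axes of the 2-D plots.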
Experimental Environment
• 10 popular Web search engines were chosen:
– About, AltaVista, Excite, Google, Inktomi, Lycos, MSN, Overture, Teoma, and Yahoo.
• The top 10 URLs from each engine are retrieved for further calculation (Silverstein et al. showed that 85% of queries request only the first screen of results).
The Averaged Indexical Bias
The Averaged Content Bias
2-D Analysis for Hot Queries
[Figure: scatter plot of indexical bias (y-axis, 0.0 to 1.0) versus content bias (x-axis, 0.0 to 1.0); legend: About, AltaVista, Excite, Inktomi, Lycos, MSN, Overture, Teoma, Yahoo!]
ANOVA Results (I)
• The averaged bias results
– Indexical Bias
– Content Bias
ANOVA Results (II)
• Between search engines over the ten hot query terms
– Indexical Bias
– Content Bias
The Case of “Second Superpower”
[Figure: scatter plot of indexical bias (y-axis, 0.0 to 1.0) versus content bias (x-axis, 0.0 to 1.0) for the query "Second Superpower"; legend: About, AltaVista, Excite, Inktomi, Lycos, MSN, Overture, Teoma, Yahoo]
Conclusions
• The bias of Web search engines has a profound effect on Internet users.
• Assessing indexical bias by considering only URLs may not reveal the full picture of search engine bias.
• We provide users with a more comprehensive reference that exposes the blind spot of one-dimensional bias assessment.
• Statistical analyses further show that a two-dimensional scheme can fulfill the task of bias assessment.
Thank You!
If you have any questions, please email the authors.