
YZUCSE SYSLAB

A Study of Web Search Engine Bias and its Assessment

Ing-Xiang Chen and Cheng-Zen Yang

Dept. of Computer Science and Engineering

Yuan Ze University, Taiwan, ROC

2006/5/22

Models of Trust for the Web (MTW'06)


Outline

• Introduction

• Related Work

• A 2-D Bias Assessment

• Results and Discussion

• Conclusions


Introduction

• Web search engines have become a significant gateway to the Internet.

• People tend to rely on a few particular search engines.

• Users may thus be unknowingly affected by biased search results.


Search Engine Bias

• Search engine bias arises from:
– diverse operating policies and business strategies, e.g., the “Falun Gong” event in China,
– limitations of crawling, indexing, and ranking techniques, e.g., the Googlewashed event, and
– opposing political standpoints, diverse cultural backgrounds, and different social customs.


A Query Example

Brand names versus search engines (About, AltaVista, Excite, Google, Inktomi, Lycos, MSN, Overture, Teoma, Yahoo):
– Amana: X O X X X O X
– Conserv: none
– Explorer: O O O O
– Frigidaire: X X
– GE: X O X X X
– Inglis: X X X
– Kitchenaid: X X
– Klondike: none
– MAC-GRAY: none
– Maytag: X X X O X X
– Roper: O O X O X
– Sun Frost: O O O O O
– Whirlpool: X X X X
Note: O indicates that the brand name appears both in the URL string and in the page content; X indicates that it appears only in the content.
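The O/X marks in this query example can be assigned mechanically for a single search result. The helper below is hypothetical (`brand_mark` is not from the study) and is only a sketch, assuming case-insensitive whole-word matching in the page text and, for the URL, matching after lowercasing and dropping non-alphanumeric characters so that "Sun Frost" matches `sunfrost.com`.

```python
import re


def normalize_for_url(text: str) -> str:
    """Lowercase and drop non-alphanumerics, so "Sun Frost" matches "sunfrost.com"."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def brand_mark(brand: str, url: str, content: str) -> str:
    """Return "O" if the brand is in both URL and content, "X" if only in content, else ""."""
    # Assumption: URL match is a substring test on the normalized strings.
    in_url = normalize_for_url(brand) in normalize_for_url(url)
    # Assumption: content match is a whole-word, case-insensitive search.
    pattern = r"\b" + re.escape(brand) + r"\b"
    in_content = re.search(pattern, content, re.IGNORECASE) is not None
    if in_url and in_content:
        return "O"
    if in_content:
        return "X"
    # URL-only or absent: the note defines no mark for these cases.
    return ""


if __name__ == "__main__":
    print(brand_mark("Sun Frost", "http://www.sunfrost.com/", "Sun Frost refrigerators"))
    print(brand_mark("Maytag", "http://example.com/fridges", "Maytag and GE models"))
```

Substring matching on URLs can give false positives for short names such as "GE" (it occurs inside "page"), so a real assessment would need a stricter rule; the slide does not say how URL matches were decided.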


Our Research

• Establish a new mechanism for assessing the bias of Web search engines.

• Provide a two-dimensional scheme that combines indexical bias and content bias to assess search engine bias.


Indexical Bias vs. Content Bias

• The bias of a search engine is the deviation of its results from a norm.

• The deviation of the set of URLs retrieved by a search engine from the URLs retrieved by most Web search engines is termed indexical bias.

• The deviation of the contents provided by a search engine from the contents provided by most Web search engines is termed content bias.


Related Work

• The assessment of indexical bias (proposed by Mowshowitz and Kawaguchi):

1. Select a pool of search engines as the norm.

2. Transform the URLs into vectors.

3. Calculate the similarity between the URL vector of the search engine being assessed and that of the norm.

4. Subtract the similarity value from 1 to obtain the bias value.
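The four steps above can be sketched as follows. This is a minimal illustration, not Mowshowitz and Kawaguchi's implementation: it assumes the norm is the pooled results of all selected engines (including the one being assessed) and, following the worked example, that each vector counts how often a Web site appears, with a site approximated here by the URL's host name. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from urllib.parse import urlparse


def site_of(url: str) -> str:
    # Assumption: a "Web site" is identified by the URL's host name.
    return urlparse(url).netloc.lower()


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def indexical_bias(results: dict[str, list[str]], engine: str) -> float:
    """results maps engine name -> URLs it returned (all queries pooled)."""
    # Step 1: the norm pools the results of every engine in `results`.
    norm = Counter(site_of(u) for urls in results.values() for u in urls)
    # Step 2: the assessed engine's vector counts how often it returned each site.
    vec = Counter(site_of(u) for u in results[engine])
    # Steps 3-4: bias = 1 - cosine similarity to the norm.
    return 1.0 - cosine(vec, norm)
```

For a toy pool `{"A": ["http://a.com/"], "B": ["http://b.com/"]}`, engine A's bias is 1 - 1/√2, roughly 0.2929, because it shares only half of the norm's sites.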


An Example

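– The first 10 URLs were retrieved from each of 3 search engines for 2 queries (10 × 3 × 2 = 60 URLs).

– The 48 distinct URLs represent 44 Web sites.
– The norm = (7, 3, 2, 2, 2, 2, ……, 1, 1, 1, 1, 1)
– Google = (3, 1, 1, 0, 2, 1, ……, 0, 1, 1, 0, 0)

– Similarity = $48 / \sqrt{124 \times 28} = 0.8146$, where 48 is the inner product of the two vectors, and 124 and 28 are the sums of the squared entries of the norm and of the Google vector, respectively.
– Bias value = 1 - 0.8146 = 0.1854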
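The arithmetic in this example can be checked directly. Because the full vectors are elided on the slide, the sketch uses only the three aggregate quantities the slide reports: the inner product (48) and the sums of squared entries of the norm (124) and of the Google vector (28).

```python
from math import sqrt

# Aggregates from the slide's example (the full vectors are elided there).
dot = 48            # inner product of the Google vector and the norm vector
norm_sq_sum = 124   # sum of squared entries of the norm vector
google_sq_sum = 28  # sum of squared entries of the Google vector

# Cosine similarity, then bias = 1 - similarity.
similarity = dot / sqrt(norm_sq_sum * google_sq_sum)
bias = 1 - similarity
print(f"similarity = {similarity:.4f}, bias = {bias:.4f}")
# prints: similarity = 0.8146, bias = 0.1854
```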


Our Considerations

• The method proposed by Mowshowitz and Kawaguchi measures deviations in which Web sites are retrieved, not in their actual contents.

• Examining bias from both the indexical view and the content view may give a fuller panorama of search engine bias.


Selection of the Norm

• An explicit norm is mainly derived from careful examination by subject experts.

• Manual examination is impractical in an extremely large and fast-changing Web environment.

• An implicit norm is defined by choosing a collection of search results from several representative search engines.


The Selection Criteria

• The search engines
– are generally designed for different subject areas,
– are comparable to each other, and
– have their own processing rules.


The Process of Bias Assessment

[Figure: the bias assessment process, with components Query, URL Locator, Search Engines, Web Pages, Document Parser, Vocabulary Entries, Indexical Bias, and Content Bias.]
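The Document Parser stage has to recover, for each term, its overall frequency and its occurrences in the title, H1, and H2, which the scoring formula of the assessment algorithm uses. Below is a minimal sketch with Python's standard `html.parser`, assuming a simple lowercase alphanumeric tokenizer; `FieldTermParser` and `parse_terms` are hypothetical names, and the slides do not describe the actual parser.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class FieldTermParser(HTMLParser):
    """Hypothetical parser: counts terms in the whole page and in <title>, <h1>, <h2>."""

    FIELDS = ("title", "h1", "h2")

    def __init__(self) -> None:
        super().__init__()
        self.counts = {"all": Counter(), **{f: Counter() for f in self.FIELDS}}
        self._open: list[str] = []  # tracked fields currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.FIELDS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            self._open.remove(tag)

    def handle_data(self, data):
        # Assumption: terms are lowercase runs of letters and digits.
        terms = re.findall(r"[a-z0-9]+", data.lower())
        self.counts["all"].update(terms)
        for field in set(self._open):
            self.counts[field].update(terms)


def parse_terms(html: str) -> dict[str, Counter]:
    """Return term counts for the whole page and for each tracked field."""
    parser = FieldTermParser()
    parser.feed(html)
    parser.close()
    return parser.counts
```

Text inside `<script>` or `<style>` is counted as ordinary text here; a fuller parser would skip it.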


The Assessment Algorithm (I)

• Scores are calculated as follows:
– $\text{Score} = f\,(d + t W_t + H W_H + h W_h) \cdot \log(n/d)$

• f: term frequency

• d: document frequency

• t: title

• H: H1

• h: H2

• n: total document number
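The score formula can be read literally as a function. The slide does not define W_t, W_H, and W_h or say whether t, H, and h are counts or indicators, so this sketch assumes they are occurrence counts in the title, H1, and H2 with caller-supplied weights, and uses the natural logarithm; the default weights are hypothetical.

```python
from math import log


def term_score(f, d, n, t=0, H=0, h=0, w_t=1.0, w_H=1.0, w_h=1.0):
    """Score = f * (d + t*W_t + H*W_H + h*W_h) * log(n / d), as written on the slide.

    f: term frequency, d: document frequency, n: total number of documents.
    Assumptions (not on the slide): t, H, h are occurrence counts in the
    title, H1, and H2; w_t, w_H, w_h are their weights; log is natural.
    """
    if d <= 0 or n < d:
        raise ValueError("need 0 < d <= n")
    return f * (d + t * w_t + H * w_H + h * w_h) * log(n / d)
```

As with TF-IDF, the log(n/d) factor makes the score zero for a term that appears in every document (d = n), so such terms do not contribute to the content vector.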


The Assessment Algorithm (II)

• $X = (x_1, x_2, x_3, \ldots, x_n)$

• $N = (n_1, n_2, n_3, \ldots, n_n)$

• $\text{Bias} = 1 - \cos(X, N)$
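Given term-score vectors, the content bias is the same 1 - cos computation, applied to vocabulary entries instead of Web sites. A minimal sketch, assuming each vector is a mapping from term to score, terms missing from a vector count as zero, and the norm N is the element-wise sum of the score vectors of all engines in the pool; the slides do not say how N is aggregated, and the function names are hypothetical.

```python
from math import sqrt


def cosine(x: dict[str, float], n: dict[str, float]) -> float:
    """Cosine similarity over the union of both vocabularies."""
    vocab = x.keys() | n.keys()  # missing terms count as 0
    dot = sum(x.get(k, 0.0) * n.get(k, 0.0) for k in vocab)
    nx = sqrt(sum(v * v for v in x.values()))
    nn = sqrt(sum(v * v for v in n.values()))
    return dot / (nx * nn) if nx and nn else 0.0


def content_bias(engine_scores: dict[str, float],
                 all_engine_scores: list[dict[str, float]]) -> float:
    """Bias = 1 - cos(X, N).

    Assumption: N is the element-wise sum of all engines' term-score vectors.
    """
    norm: dict[str, float] = {}
    for scores in all_engine_scores:
        for term, s in scores.items():
            norm[term] = norm.get(term, 0.0) + s
    return 1.0 - cosine(engine_scores, norm)
```

Because cosine similarity ignores vector length, an engine whose term scores are proportional to the norm has zero content bias regardless of how many documents it returned.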


Experimental Environment

• 10 popular Web search engines are chosen:
– About, AltaVista, Excite, Google, Inktomi, Lycos, MSN, Overture, Teoma, and Yahoo.

• The top 10 URLs are retrieved for further calculation. (Silverstein et al. showed that for 85% of queries, only the first result screen is viewed.)


The Averaged Indexical Bias


The Averaged Content Bias


2-D Analysis for Hot Queries

[Figure: indexical bias (vertical axis, 0.0 to 1.0) versus content bias (horizontal axis, 0.0 to 1.0) for About, AltaVista, Excite, Google, Inktomi, Lycos, MSN, Overture, Teoma, and Yahoo!]


ANOVA Results (I)

• The averaged bias result

– Indexical Bias

– Content Bias


ANOVA Results (II)

• Comparison between search engines over the ten hot query terms

– Indexical Bias

– Content Bias
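The slides report ANOVA results without showing the computation. For reference, the one-way ANOVA F statistic compares between-group variance with within-group variance; the sketch below assumes each group holds one engine's bias values over the query terms, and `one_way_anova_f` is a hypothetical name. It returns only F; a library routine such as `scipy.stats.f_oneway` also gives the p-value.

```python
def one_way_anova_f(groups: list[list[float]]) -> float:
    """F statistic of a one-way ANOVA (e.g., one group of bias values per engine)."""
    k = len(groups)
    total_n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / total_n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, total_n - k
    return (ss_between / df_between) / (ss_within / df_within)
```

The sketch divides by zero when every group is constant (no within-group variation); a large F indicates that the engines' mean biases differ more than the query-to-query spread would explain.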


The Case of “Second Superpower”

[Figure: indexical bias (vertical axis, 0.0 to 1.0) versus content bias (horizontal axis, 0.0 to 1.0) for About, AltaVista, Excite, Google, Inktomi, Lycos, MSN, Overture, Teoma, and Yahoo]


Conclusions

• The bias of Web search engines can deeply affect Internet users.

• Assessing indexical bias by considering only URLs may not reveal the full panorama of search engine bias.

• We provide users with a more comprehensive reference that exposes the blind spot of one-dimensional bias assessment.

• Statistical analyses further indicate that a two-dimensional scheme can fulfill the task of bias assessment.


Thank You!

If you have any questions, please email [email protected].