Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings

Baoning Wu and Brian D. Davison

Lehigh University

Symposium on Applied Computing 2006

Motivation

Link-based ranking algorithms are important to popular search engines (e.g., HITS for Teoma).

Link farms degrade the performance of link-based ranking algorithms.

HITS algorithm

Each page has two measures: the authority score a indicates how good the page is for a query, and the hub score h indicates how likely the page is to point to good authority pages. E is the adjacency matrix of the link graph.

$a = E^{T} h$

$h = E a$
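
The two update rules above can be sketched as a simple iteration. This is a minimal illustration, not the authors' implementation: the function names, the fixed iteration count, the L2 normalization, and the three-page toy graph are all assumptions made here.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v

def hits(E, iterations=50):
    """E[i][j] = 1 if page i links to page j. Returns (authority, hub)."""
    n = len(E)
    a = [1.0] * n
    h = [1.0] * n
    for _ in range(iterations):
        # a = E^T h: a page's authority is the sum of the hub scores pointing at it
        a = normalize([sum(E[i][j] * h[i] for i in range(n)) for j in range(n)])
        # h = E a: a page's hub score is the sum of the authorities it points to
        h = normalize([sum(E[i][j] * a[j] for j in range(n)) for i in range(n)])
    return a, h

# Hypothetical toy graph: pages 0 and 1 both link to page 2.
E = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
]
authority, hub = hits(E)
```

In this toy graph page 2 collects all of the authority score, while pages 0 and 1 share the hub score equally. A link farm exploits exactly this mutual reinforcement by adding many pages that point at one another.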

Example: for query “weather”

http://www.tripadvisor.com/
http://www.virtualtourist.com/
http://www.abed.com/memoryfoam.html
http://www.abed.com/furniture.html
http://www.rental-car.us/
http://www.accommodation-specials.com/
http://www.lasikeyesurgery.com/
http://www.lasikeyesurgery.com/lasik-surgery.asp
http://mortgage-rate-refinancing.com/
http://mortgage-rate-refinancing.com/mortgage-calculator.html

Factors that degrade HITS

Mutually reinforcing relationships

Duplicate pages

Link farms

Complete hyperlink

Definition: a hyperlink's target URL together with its anchor text, treated as a single unit.

Duplication of a complete link is a much stronger sign of copying behavior on the Web than a duplicate link target.
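
To make the distinction concrete, the sketch below indexes documents by complete link and reports which complete links appear in more than one document. It is an illustration only: the function name, the example URLs, and the anchor-text normalization (lowercasing and collapsing whitespace) are assumptions introduced here, not details given in the slides.

```python
from collections import defaultdict

def complete_link_index(pages):
    """Map each complete link (target URL, anchor text) to the documents containing it.

    pages: dict of document URL -> list of (target URL, anchor text) pairs.
    """
    index = defaultdict(set)
    for doc, links in pages.items():
        for target, anchor in links:
            # Assumed normalization: lowercase and collapse runs of whitespace.
            key = (target, " ".join(anchor.lower().split()))
            index[key].add(doc)
    return index

# Hypothetical pages: both link to the same two targets, but only one
# link is copied together with its anchor text.
pages = {
    "http://a.example/": [("http://t.example/", "Cheap Hotels"),
                          ("http://u.example/", "maps")],
    "http://b.example/": [("http://t.example/", "cheap  hotels"),
                          ("http://u.example/", "road atlas")],
}
index = complete_link_index(pages)
shared = {cl: docs for cl, docs in index.items() if len(docs) > 1}
```

Both pages link to `http://u.example/`, yet only the link to `http://t.example/` counts as a duplicated complete link, because its anchor text matches as well.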

Document - Complete link Matrix

Bipartite Graph

Two disjoint sets X and Y; each edge starts at an element of X and ends at an element of Y.

Link farms

Link farms are usually densely connected via multiple overlapping small bipartite cores.

Task: detect densely connected bipartite components in the “document - complete link” matrix.
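
One way this task could be approached is iterative pruning, sketched below. This is not a reproduction of the algorithm on the slides (its figure is not part of this transcript). The function name is hypothetical, and the reading of k as the minimum number of shared complete links per document and l as the minimum number of documents per complete link is an assumption.

```python
def prune_bipartite(doc_links, k=2, l=2):
    """Keep only documents and complete links that survive repeated pruning.

    doc_links: dict of document -> set of complete links.
    Assumed thresholds: a complete link must appear in at least l surviving
    documents, and a document must keep at least k surviving complete links.
    """
    docs = {d: set(links) for d, links in doc_links.items()}
    changed = True
    while changed:
        changed = False
        # Count how many surviving documents contain each complete link.
        counts = {}
        for links in docs.values():
            for cl in links:
                counts[cl] = counts.get(cl, 0) + 1
        for d in list(docs):
            kept = {cl for cl in docs[d] if counts[cl] >= l}
            if len(kept) < k:
                kept = set()  # too few shared links: drop the document
            if kept != docs[d]:
                changed = True
            if kept:
                docs[d] = kept
            else:
                del docs[d]
    return docs

# Hypothetical input: d1 and d2 share two complete links, d3 shares only one.
doc_links = {
    "d1": {"x", "y"},
    "d2": {"x", "y"},
    "d3": {"x", "z"},
    "d4": {"w"},
}
core = prune_bipartite(doc_links, k=2, l=2)
```

With k=2 and l=2, only d1 and d2 remain, each holding the complete links x and y. Documents that survive are candidates for down-weighting rather than proven spam.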

Algorithm for finding bipartite components

Result: k=2 and l=2

Adjustment: document-document matrix

Final matrix

Weighted adjacency matrix

Experiment: HITS result of “rental car”

http://www.discountcars.net/
http://www.motel-discounts.com/
http://www.stlouishoteldeals.com/
http://www.richmondhoteldeals.com/
http://www.jacksonvillehoteldeals.com/
http://www.jacksonhoteldeals.com/
http://www.keywesthoteldeals.com/
http://www.austinhoteldeals.com/
http://www.gatlinburghoteldeals.com/
http://www.ashevillehoteldeals.com/

Experiment: B&H HITS result of “rental car”

http://www.rentadeal.com/
http://www.allaboutstlouis.com/
http://www.allaboutboston.com/
https://travel2.securesites.com/about_travelguides/addlisting.html
http://www.allaboutsanfranciscoca.com/
http://www.allaboutwashingtondc.com/
http://www.allaboutalbuquerque.com/
http://www.allabout-losangeles.com/
http://www.allabout-denver.com/
http://www.allabout-chicago.com/

Experiment: CL-HITS result of “rental car”

http://www.hertz.com/
http://www.avis.com/
http://www.nationalcar.com/
http://www.thrifty.com/
http://www.dollar.com/
http://www.alamo.com/
http://www.budget.com/
http://www.enterprise.com/
http://www.budgetrentacar.com/
http://www.europcar.com/

Experiment: B&H HITS result of “translation online”

http://www.no-gambling.com/
http://www.teleorg.org/
http://ong.altervista.org/
http://bx.b0x.com/
http://video-poker.batcave.net/
http://www.websamba.com/marketing-campaigns
http://online-casino.o-f.com/
http://caribbean-poker.webxis.com/
http://roulette.zomi.net/
http://teleservices.netfirms.com/

Experiment: CL-HITS result of “translation online”

http://www.freetranslation.com/
http://www.systransoft.com/
http://babelfish.altavista.com/
http://www.yourdictionary.com/
http://dictionaries.travlang.com/
http://www.google.com/
http://www.foreignword.com/
http://www.babylon.com/
http://www.worldlingo.com/products_services/worldlingo_translator.html
http://www.allwords.com/

Duplicate example: BH-HITS result of “maps”

http://www.maps.com/
http://www.mapsworldwide.com/
http://www.cartographic.com/
http://www.amaps.com/
http://www.cdmaps.com/
http://www.ewpnet.com/maps.htm
http://mapsguidesandmore.com/
http://www.njdiningguide.com/maps.html
http://www.stanfords.co.uk/
http://www.delorme.com/

Duplicate example: CL-HITS result of “maps”

http://www.maps.com/
http://maps.yahoo.com/
http://www.delorme.com/
http://tiger.census.gov/
http://www.davidrumsey.com/
http://memory.loc.gov/ammem/gmdhtml/gmdhome.html
http://www.esri.com/
http://www.maptech.com/
http://www.streetmap.co.uk/
http://www.libs.uga.edu/darchive/hargrett/maps/maps.html

User evaluation

Category            HITS    BHITS   CL-HITS  CL-POP
Quite relevant      12.9%   24.5%   48.4%    46.3%
Relevant            10.7%   18.3%   28.8%    26.2%
Not sure             6.6%   10.5%    6.7%     6.4%
Irrelevant          26.8%   14.8%   11.3%    12.7%
Totally irrelevant  42.8%   31.9%    4.6%     8.1%
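
For a quick comparison of the four columns, the short sketch below adds the “Quite relevant” and “Relevant” shares for each algorithm. Treating those two categories together as “relevant” is an assumption made here for illustration; the slides do not say how the categories map to a precision figure.

```python
# (Quite relevant %, Relevant %) per algorithm, copied from the user evaluation table.
top_two = {
    "HITS": (12.9, 10.7),
    "BHITS": (24.5, 18.3),
    "CL-HITS": (48.4, 28.8),
    "CL-POP": (46.3, 26.2),
}

# Assumed aggregate: share of results rated "Quite relevant" or "Relevant".
relevant_share = {name: round(q + r, 1) for name, (q, r) in top_two.items()}

for name, share in relevant_share.items():
    print(f"{name}: {share}%")
```

Under that assumption the combined share rises from 23.6% for HITS and 42.8% for BHITS to 77.2% for CL-HITS and 72.5% for CL-POP.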

Discussion

Using the link target alone, the precision at 10 is 66.4%, much lower than when using the “complete link”.

Random anchor texts.