Advanced Keyword Research SMX Toronto March 2013


Slide from LIS 544 / IMT 542 / INSC 544 by Jeff Huang (lazyjeff@uw.edu) and Shawn Walker (stw3@uw.edu).

The document with the highest proportion of terms which are part of the query is most relevant:
• Documents containing more of the query term(s) score higher
• Longer documents are discounted
• Rare terms are weighted higher
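
Those three ideas are the core of classic term-weighting retrieval. As a rough illustration (not taken from the deck), the Python sketch below scores a document with a BM25-style formula: more query-term occurrences raise the score, longer documents are discounted via length normalization, and rare terms get a larger IDF weight. The function name, parameters, and toy corpus are made up for the example.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """BM25-style relevance: more query-term occurrences raise the score,
    longer documents are discounted, and rare terms are weighted higher."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)              # docs containing the term
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)  # rare terms -> larger weight
        length_norm = k1 * (1 - b + b * len(doc_terms) / avg_len)  # discount long docs
        score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
    return score

# Tiny pre-tokenized corpus (invented example).
corpus = [["advanced", "keyword", "research", "for", "seo"],
          ["keyword", "keyword", "stuffing", "does", "not", "help", "rankings"],
          ["link", "building", "basics"]]
query = ["keyword", "research"]
for doc in corpus:
    print(round(bm25_score(query, doc, corpus), 3), doc)
```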

Hilltop was one of the first algorithms to introduce the concept of machine-mediated “authority” to combat the human manipulation of results for commercial gain (using link blast services, viral distribution of misleading links, and the like). It is used by all of the search engines in some way, shape or form. Hilltop:
• Is performed on a small subset of the corpus that best represents the nature of the whole
• Defines authorities as pages with lots of unaffiliated expert documents on the same subject pointing to them
• Ranks pages according to the number of non-affiliated “experts” pointing to them, i.e. experts not in the same site or directory
• Treats affiliation as transitive [if A=B and B=C then A=C]
The beauty of Hilltop is that, unlike PageRank, it is query-specific and reinforces the relationship between the authority and the user’s query. You don’t have to be big or have a thousand links from auto parts sites to be an “authority.” Google’s 2003 Florida update, rumored to contain Hilltop reasoning, resulted in a lot of sites with extraneous links falling from their previously lofty placements. Google artificially inflates the placement of results from Wikipedia because it perceives Wikipedia as an authoritative resource due to social mediation and commercial agnosticism. Wikipedia is not infallible; however, someone finding it among the “most relevant” top results will certainly see it as such.
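
As an illustration only (Hilltop's actual expert-selection step is more involved), the sketch below captures the counting rule described in the notes: affiliated hosts are merged transitively, and a target page's score is the number of distinct unaffiliated expert groups pointing to it. All host names are invented.

```python
from collections import defaultdict

class Affiliation:
    """Union-find over hosts: affiliation is transitive (if A=B and B=C then A=C)."""
    def __init__(self):
        self.parent = {}
    def find(self, host):
        self.parent.setdefault(host, host)
        while self.parent[host] != host:
            self.parent[host] = self.parent[self.parent[host]]  # path compression
            host = self.parent[host]
        return host
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def rank_by_expert_support(expert_links, affiliated_pairs):
    """expert_links: (expert_host, target_page) edges from 'expert' documents.
    A target's score is the number of distinct affiliation groups pointing to it."""
    aff = Affiliation()
    for a, b in affiliated_pairs:
        aff.union(a, b)
    supporters = defaultdict(set)
    for expert_host, target in expert_links:
        supporters[target].add(aff.find(expert_host))
    return sorted(supporters.items(), key=lambda kv: len(kv[1]), reverse=True)

# Two unaffiliated experts beat three links from one affiliated network.
links = [("blogA.com", "widgets.example"), ("newsB.org", "widgets.example"),
         ("spam1.net", "rival.example"), ("spam2.net", "rival.example"),
         ("spam3.net", "rival.example")]
affiliated = [("spam1.net", "spam2.net"), ("spam2.net", "spam3.net")]
for target, experts in rank_by_expert_support(links, affiliated):
    print(target, len(experts))
```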

Topic-Sensitive PageRank computes PR based on a set of representative topics [i.e. it augments PR with content analysis]:
• Topics are derived from the Open Directory Project
• It uses a set of ranking vectors: topics are selected before query time, and at query time the similarity of the query to each topic determines how the vectors are combined
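
A minimal numpy sketch of that idea follows: per-topic teleport vectors yield pre-computed topic-biased PageRank vectors, which are combined at query time according to the query's similarity to each topic. The toy graph, topic assignments, and similarity weights are invented for illustration and are not Google's implementation.

```python
import numpy as np

def personalized_pagerank(adj, teleport, damping=0.85, iters=100):
    """Power iteration for PageRank with a topic-biased teleport vector."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # Column-stochastic transition matrix; dangling pages hand their mass to the teleport set.
    trans = np.divide(adj, out_deg, out=np.zeros_like(adj, dtype=float), where=out_deg > 0).T
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        dangling_mass = r[out_deg.ravel() == 0].sum()
        r = damping * (trans @ r + dangling_mass * teleport) + (1 - damping) * teleport
    return r

# Toy web graph: adj[i, j] = 1 means page i links to page j.
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [1, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)

# Hypothetical topics: pages 0-1 are "sports", pages 2-3 are "autos".
topic_teleports = {
    "sports": np.array([0.5, 0.5, 0.0, 0.0]),
    "autos":  np.array([0.0, 0.0, 0.5, 0.5]),
}
topic_ranks = {t: personalized_pagerank(adj, v) for t, v in topic_teleports.items()}

# At query time, weight each topic vector by the query's similarity to the topic (hard-coded here).
query_topic_similarity = {"sports": 0.2, "autos": 0.8}
combined = sum(w * topic_ranks[t] for t, w in query_topic_similarity.items())
print(combined)
```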

Pew Internet Trust Study of search engine behavior: http://www.pewinternet.org/Reports/2012/Search-Engine-Use-2012/Summary-of-findings.aspx
Moreover, users report generally good outcomes and relatively high confidence in the capabilities of search engines:
• 91% of search engine users say they always or most of the time find the information they are seeking when they use search engines
• 73% of search engine users say that most or all the information they find as they use search engines is accurate and trustworthy
• 66% of search engine users say search engines are a fair and unbiased source of information
• 55% of search engine users say that, in their experience, the quality of search results is getting better over time, while just 4% say it has gotten worse
• 52% of search engine users say search engine results have gotten more relevant and useful over time, while just 7% report that results have gotten less relevant

Using the Internet: Skill Related Problems in Users' Online Behavior; van Deursen & van Dijk; 2009:
• 56% constructed poor queries
• 55% selected irrelevant results one or more times
• 38% were overwhelmed by the amount of information in results
• 34% found critical information missing from results
