Ao-Jan Su † Y. Charlie Hu ‡ Aleksandar Kuzmanovic † Cheng-Kok Koh ‡ † Northwestern...

Ao-Jan Su Y. Charlie Hu Aleksandar Kuzmanovic Cheng-Kok Koh Northwestern University Purdue University How to Improve Your Google Ranking: Myths and Reality


Ao-Jan Su†

Y. Charlie Hu‡

Aleksandar Kuzmanovic†

Cheng-Kok Koh‡

† Northwestern University ‡ Purdue University

How to Improve Your Google Ranking: Myths and Reality


Motivation

● Internet search engines (e.g. Google) drive users to highly ranked pages

● Search engines’ ranking results greatly influence how people acquire knowledge from the Internet [Pan ‘07]

● It is desirable to understand how a search engine ranks web pages

● Search engines’ ranking algorithms are proprietary
  ■ Publicly available information is very limited and outdated


Current Approaches

● Guesswork by webmasters
  ■ Trial and error
  ■ Inefficient

● Based on experience of search engine optimization (SEO) experts

Lack of systematic studies leads to folklore


Various Ranking Feature Opinions

[Figure: ranking feature opinions from three sources: SEO experts, a survey of Internet users, and an individual Internet marketing expert]


Goals & Challenges

● Goals
  ■ Systematically approximate a search engine’s ranking results
  ■ Identify the importance of ranking factors

● Reverse-engineering a search engine’s ranking algorithm can be very complicated
  ■ Numerous ranking factors
    − Google claims to have over 200 ranking factors
  ■ Sophisticated ranking functions


Our Approach

● Build our own ranking system to approximate search engines’ ranking results

Learning models:
  • Linear programming
  • SVM

Recursive partitioning algorithm:
  • Capture non-equational behavior of ranking functions

New ranking system: generate our own ranking results and compare them to Google’s


System Architecture

● Components of our ranking system
  ■ Crawler
  ■ Ranking Engine

Can we approximate Google’s ranking results (top 10 pages) by using our own ranking system?


Ranking Features


Learning Models

● Linear programming model
  ■ Minimize the distance between our ranking system and Google’s
  ■ Minimize objective function
    [Equation: objective function, a double sum over indices i and j (up to n) combining the terms W, c and D(i, j)]
    − Weight: highly ranked pages are more important
    − Ranking difference between the 2 pages
    − Decision function: out of order => penalty
    − Sum up the penalties

● Support vector machine (SVM) learning models
  ■ General technique for learning to rank programs
  ■ Support linear and polynomial kernels
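The pairwise penalty idea on this slide can be illustrated with a small Python sketch. It assumes a linear score (a weighted sum of feature values), a weight of 1/i for the page Google ranks i-th, the rank gap j - i as the ranking difference, and a 0/1 decision function. These specific choices and all names (`linear_score`, `pairwise_penalty`) are illustrative assumptions, not the authors' exact objective. The sketch only evaluates the penalty for a given weight vector and does not solve the linear program.

```python
def linear_score(weights, features):
    """Weighted sum of feature values (assumed linear ranking function)."""
    return sum(w * f for w, f in zip(weights, features))


def pairwise_penalty(weights, pages):
    """Sum of penalties over all page pairs.

    pages: list of feature vectors ordered by Google's rank
           (index 0 is the page Google ranks first).
    """
    scores = [linear_score(weights, f) for f in pages]
    n = len(pages)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Decision function (assumed 0/1): penalize when the page Google
            # ranks higher does not receive a strictly higher score.
            out_of_order = 1 if scores[i] <= scores[j] else 0
            # Assumed weight: pairs involving highly ranked pages count more.
            rank_weight = 1.0 / (i + 1)
            # Assumed ranking difference: gap between the two Google ranks.
            rank_gap = j - i
            total += rank_weight * rank_gap * out_of_order
    return total
```

A weight vector that reproduces Google's order yields a penalty of zero, and the penalty grows as more pairs, especially pairs near the top, come out in the wrong order.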


Recursive Partitioning Algorithm

● Multiple layers of indices

● Non-equational ranking algorithm

While we need to partition the set of |S| pages:
  ■ Train or apply ranking models to the set of |S| pages
  ■ Partition the |S| pages into top half and bottom half
  ■ Return top half of the |S| pages and continue the recursion

The algorithm ends when we have found the top X pages
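The loop on this slide can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each round has its own already-trained scoring function (higher score means better rank), the set is halved each round but never cut below X, and the function and parameter names are hypothetical rather than taken from the authors' system.

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def recursive_partition(
    pages: Sequence[P],
    round_scorers: Sequence[Callable[[P], float]],
    top_x: int,
) -> List[P]:
    """Repeatedly rank the current set and keep its top half.

    round_scorers holds one scoring function per round (assumption:
    the models were trained beforehand, one per round).
    """
    current = list(pages)
    for scorer in round_scorers:
        # Apply this round's ranking model to the current set.
        current = sorted(current, key=scorer, reverse=True)
        if len(current) <= top_x:
            break  # Nothing left to partition.
        # Keep the top half, but never fewer than top_x pages.
        keep = max(top_x, len(current) // 2)
        current = current[:keep]
    # The list is ordered by the last scorer applied.
    return current[:top_x]
```

With 100 crawled pages and X = 10, three rounds shrink the candidate set to 50, 25 and then 12 pages before the top 10 are returned.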


Experimental Evaluation

● Evaluate different ranking models
  ■ Which model has better prediction accuracy?

● Evaluate the effectiveness of the recursive partitioning algorithm
  ■ Can the recursive partitioning algorithm improve prediction accuracy?

● Evaluate the relative weights of ranking features
  ■ Which ranking feature is more important?


Experimental Setup

● Crawl the top 100 pages for each of 60 random keywords

● Randomly select 15 keywords as the training set, with the remaining 45 keywords as the testing set

● Evaluate the accuracy of our ranking system by predicting Google’s top 10 pages for each keyword in the testing set
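The setup above can be sketched in Python. The sketch assumes that accuracy is measured as the number of pages shared between the predicted top 10 and Google's top 10, regardless of order. The helper names (`split_keywords`, `top_k_hits`) and the fixed random seed are illustrative assumptions.

```python
import random


def split_keywords(keywords, n_train, seed=0):
    """Randomly split keywords into a training set and a testing set."""
    rng = random.Random(seed)  # Fixed seed only to make the sketch repeatable.
    shuffled = list(keywords)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def top_k_hits(predicted, reference, k=10):
    """Count pages that appear in both top-k lists (order is ignored)."""
    return len(set(predicted[:k]) & set(reference[:k]))
```

For example, splitting 60 keywords with `n_train=15` gives 15 training keywords and 45 testing keywords, and `top_k_hits` is then computed once per testing keyword.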


Comparisons of Ranking Models

The performance of our customized linear learning model is better than that of the SVM-linear model

The performance of the polynomial model is better than both linear models, at the cost of:
(1) Significant increase of learning time
(2) No human-readable equations

For 78% of the explored keywords, our ranking system successfully predicts 7 or more pages within the top 10 pages
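A summary figure of this kind (the share of keywords with at least 7 correctly predicted pages) can be computed from per-keyword hit counts. The following Python sketch assumes one hit count per testing keyword. The function name is hypothetical and the numbers in the usage note are made up for illustration, not taken from the study.

```python
def share_with_min_hits(hits_per_keyword, min_hits=7):
    """Fraction of keywords whose hit count reaches min_hits."""
    if not hits_per_keyword:
        return 0.0
    qualifying = sum(1 for hits in hits_per_keyword if hits >= min_hits)
    return qualifying / len(hits_per_keyword)
```

With illustrative hit counts `[10, 7, 6, 8]`, three of the four keywords reach 7 hits, so the function returns 0.75.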


The Power of Recursive Partitioning

The recursive partitioning algorithm does help to improve the accuracy of the ranking system in every round

3 rounds of recursive partitioning successfully “smooth out” the non-linearity of Google’s ranking algorithm and achieve a high prediction accuracy


Weights in Different Rounds in a Linear Model

In different rounds, the learning model produces a different set of weights

PageRank score, keyword in title and hostname are the top 3 ranking features

Keyword in the meta-description tag matters, but keyword in the meta-keyword tag does not


Case Studies

● Can we improve our ranking system’s accuracy by isolating a subset of ranking features?
  ■ Example: remove the age factor by focusing on “young” pages

● Can we use our ranking system to detect biases in search engines’ ranking algorithms?
  ■ Example: blogs

● Can we validate or disprove new ranking features?
  ■ Example: HTML syntax errors


Isolating Subsets of Ranking Features

We crawl web pages that are at most 24 hours old to remove the ranking features of age and PageRank

Our ranking system’s hit rate improves to 80% for 92% of evaluated keywords

When the ranking features are more specific, our ranking system performs better


Negative Bias Toward Blogs

We categorize web pages into different categories (e.g. blogs, news and music) and add a new ranking feature (hypothesis) to our ranking system

The accuracy of our ranking system improves and the weight of the new ranking feature (blog) is negative


HTML Syntax Errors do not Matter

We add a new ranking feature (hypothesis) for the number of HTML syntax errors in each web page

The performance of the new ranking model is very close to the original one -> the new ranking feature does not make an impact


Conclusions

● In this work, we show that it is possible to systematically approximate Google’s ranking results with high accuracy
  ■ By a linear learning model combined with a recursive partitioning scheme

● We reveal the relative importance of ranking features in Google’s ranking function

● We illustrate that our system can validate or disprove ranking features and detect ranking bias


Thank you!


Backup Slides


Linear Programming Model


Query Keywords