Information technology in business and society

INFORMATION TECHNOLOGY IN BUSINESS AND SOCIETY SESSION 9 – SEARCH AND ADVERTISING SEAN J. TAYLOR


Page 1: Information technology in business and society

INFORMATION TECHNOLOGY IN BUSINESS AND SOCIETY
SESSION 9 – SEARCH AND ADVERTISING

SEAN J. TAYLOR

Page 2: Information technology in business and society

ADMINISTRIVIA

• Assignment 2 is online, due Saturday 2/25 at 1am

• Assignment 2 resources

• Assignment 3 preview

• Guest speaker on Tuesday 2/28: Chrys Wu discussing IT and Journalism

• Substitute on Thursday 3/1: Professor Dylan Walker

Page 3: Information technology in business and society

LEARNING OBJECTIVES

1. Learn how search engines rank pages

2. Learn how to design effectively for high rankings

3. Learn how online advertising works, especially search ads and keyword auctions

4. The future of search

Page 4: Information technology in business and society

SEARCH ENGINES AND WEB DIRECTORIES

Resources on the Web that help you find sites with the information and/or services you want.

• Directory search engine - organizes listings of Web sites into hierarchical lists.

• Search engine - uses software agent technologies (or “spiders”, or “bots”) to search the Web for key words and place them into indexes.

Page 5: Information technology in business and society

WEB DIRECTORIES EXAMPLE

Advantages? Disadvantages?

Page 6: Information technology in business and society

SEARCH ENGINE EXAMPLES

Advantages? Disadvantages?

Page 7: Information technology in business and society

SEARCH ENGINES DRIVE ECOMMERCE!

Page 8: Information technology in business and society

WHERE IS CONSUMERS' ATTENTION?

Page 9: Information technology in business and society
Page 10: Information technology in business and society

EYETRACKING STUDY OF GOOGLE RESULTS

Page 11: Information technology in business and society

HOW SEARCH ENGINES WORK

– Search engines discover new pages by following links

– They keep track of the words that appear in pages; when you enter a query, the search engine returns a ranked list

– Text content is important! But it is not enough! (Why?)

How do search engines rank pages? (Why does this matter?)
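The indexing and querying steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any production search engine works: the three pages in `PAGES` are hypothetical, and the ranking simply counts query-word occurrences. That also hints at why text alone is not enough, since a page can raise its score just by repeating a word.

```python
from collections import Counter, defaultdict

# Hypothetical toy pages; a real crawler would discover these by following links.
PAGES = {
    "a.html": "cheap london hotels and london hotel booking",
    "b.html": "london travel guide",
    "c.html": "hotel reviews",
}

def build_index(pages):
    """Map each word to a Counter of {page: number of occurrences}."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] += 1
    return index

def search(index, query):
    """Rank pages by how often the query words appear (text content only)."""
    scores = Counter()
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return [url for url, _ in scores.most_common()]

index = build_index(PAGES)
print(search(index, "london hotel"))
```

Here `a.html` comes first because it mentions "london" twice and "hotel" once; nothing in the score reflects whether other pages consider it important.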

Page 12: Information technology in business and society

PAGERANK IS REALLY A “RANDOM SURFER” MODEL

Random Surfer Model: let’s count the surfers that pass through each point.

Transfer matrix: the probability that a surfer follows a link from webpage i to webpage j is [prob. you were not “picked up”] * [prob. of following link i->j].

$$T = 1 + (1-\alpha)W + (1-\alpha)^2 W^2 + \cdots = \frac{1}{1-(1-\alpha)W}$$

Here α is the probability that the surfer is “picked up”, and W is the link matrix: W_ij = 1/(number of links leaving page i) if page i links to page j, and 0 otherwise.

What about getting stuck in loops? α takes care of that.
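A minimal Python sketch of the random-surfer idea follows. It assumes a pick-up probability of `alpha = 0.15` (a commonly quoted value, not one given in these slides), drops picked-up surfers on a uniformly random page, and borrows the four-page book graph from the worked example later in the deck. The function name `pagerank` and its parameters are illustrative.

```python
def pagerank(links, alpha=0.15, iterations=100):
    """Power iteration for the random-surfer model.

    links maps each page to the list of pages it links to.
    alpha is the assumed probability that the surfer is "picked up"
    and dropped on a random page instead of following a link.
    Every page in this sketch must have at least one outgoing link.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Surfers who were picked up land uniformly at random.
        new_rank = {page: alpha / n for page in pages}
        # Surfers who were not picked up follow one outgoing link at random.
        for page, outgoing in links.items():
            share = (1 - alpha) * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

LINKS = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(LINKS)
for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

Because a fraction of surfers is redistributed on every step, the ranks always sum to 1 and no loop can trap all the surfers.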

Page 13: Information technology in business and society

MEASURING IMPORTANCE OF LINKING

PageRank Algorithm

Idea: important pages are pointed to by other important pages

Method:

• Each link from one page to another is counted as a “vote” for the destination page

• The number of incoming links is important! But it is not enough!

• Each “vote” is different! PageRank gives more weight to votes that come from pages with a large number of votes (and so on, and so on)

Compare, for example, the circled page in cases A and B.

[Figure: two link graphs, labeled A and B]

Page 14: Information technology in business and society

COMPUTING PAGERANK

(ignoring damping factor for illustration)

Each book page links to the books listed under “People who bought this also bought…”:

• BOOK A: book B, book C, book D
• BOOK B: book A, book C
• BOOK C: book A
• BOOK D: book C

Page 16: Information technology in business and society

PAGERANK

(ignoring damping factor for illustration)

“People who bought this also bought…” links:

• BOOK A: book B, book C, book D
• BOOK B: book A, book C
• BOOK C: book A
• BOOK D: book C

Starting ranks: A = .250, B = .250, C = .250, D = .250

Page 17: Information technology in business and society

PAGERANK

(ignoring damping factor for illustration)

“People who bought this also bought…” links:

• BOOK A: book B, book C, book D
• BOOK B: book A, book C
• BOOK C: book A
• BOOK D: book C

Current ranks: A = .250, B = .250, C = .250, D = .250

Votes sent along each link:

• BOOK A sends .250/3 to each of book B, book C, book D
• BOOK B sends .250/2 to each of book A, book C
• BOOK C sends .250 to book A
• BOOK D sends .250 to book C

Page 18: Information technology in business and society

PAGERANK

(ignoring damping factor for illustration)

“People who bought this also bought…” links:

• BOOK A: book B, book C, book D
• BOOK B: book A, book C
• BOOK C: book A
• BOOK D: book C

Votes sent along each link:

• BOOK A sends .250/3 to each of book B, book C, book D
• BOOK B sends .250/2 to each of book A, book C
• BOOK C sends .250 to book A
• BOOK D sends .250 to book C

New ranks: A = .375, B = .083, C = .458, D = .083

Page 19: Information technology in business and society

PAGERANK

(ignoring damping factor for illustration)

“People who bought this also bought…” links:

• BOOK A: book B, book C, book D
• BOOK B: book A, book C
• BOOK C: book A
• BOOK D: book C

Current ranks: A = .375, B = .083, C = .458, D = .083

Votes sent along each link:

• BOOK A sends .375/3 to each of book B, book C, book D
• BOOK B sends .083/2 to each of book A, book C
• BOOK C sends .458 to book A
• BOOK D sends .083 to book C

Page 20: Information technology in business and society

PAGERANK

(ignoring damping factor for illustration)

“People who bought this also bought…” links:

• BOOK A: book B, book C, book D
• BOOK B: book A, book C
• BOOK C: book A
• BOOK D: book C

Votes sent along each link:

• BOOK A sends .375/3 to each of book B, book C, book D
• BOOK B sends .083/2 to each of book A, book C
• BOOK C sends .458 to book A
• BOOK D sends .083 to book C

New ranks: A = .500, B = .125, C = .250, D = .125

Page 21: Information technology in business and society

PAGERANK

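(ignoring damping factor for illustration)

“People who bought this also bought…” links:

• BOOK A: book B, book C, book D
• BOOK B: book A, book C
• BOOK C: book A
• BOOK D: book C

Ranks after many rounds: A = .400, B = .133, C = .333, D = .133

Votes sent along each link:

• BOOK A sends .400/3 to each of book B, book C, book D
• BOOK B sends .133/2 to each of book A, book C
• BOOK C sends .333 to book A
• BOOK D sends .133 to book C

These ranks no longer change: the votes each page receives add up to the rank it already has.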
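The voting rounds on these slides can be replayed with a short Python sketch. Like the slides, it ignores the damping factor; the names `LINKS`, `step`, and `history` are illustrative, and 100 rounds is an arbitrary number that is more than enough for this small graph to settle.

```python
# Link graph from the book example: page -> pages it links to.
LINKS = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A"], "D": ["C"]}

def step(rank, links):
    """One round of voting: each page splits its rank evenly among its links."""
    new_rank = {page: 0.0 for page in links}
    for page, outgoing in links.items():
        for target in outgoing:
            new_rank[target] += rank[page] / len(outgoing)
    return new_rank

rank = {page: 0.25 for page in LINKS}  # every page starts at .250
history = [rank]
for _ in range(100):
    rank = step(rank, LINKS)
    history.append(rank)

for round_number in (1, 2, 100):
    rounded = {page: round(value, 3) for page, value in sorted(history[round_number].items())}
    print(round_number, rounded)
```

Rounds 1 and 2 give the intermediate values shown on the slides (.375/.083/.458/.083, then .500/.125/.250/.125), and the later rounds settle at .400/.133/.333/.133.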

Page 22: Information technology in business and society

GAMING PAGERANK AND TRUST

TrustRank Algorithm

Initial votes come only from trusted pages

Compare, for example, the circled page in cases A and B.

[Figure: cases A and B, with the labels “trusted page” (twice) and “Links from untrusted sources”]
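One way to sketch "initial votes come only from trusted pages" is to run the random-surfer iteration but send picked-up surfers only to the trusted seed pages. The graph, the page names, and `alpha = 0.15` below are all hypothetical, and this is a simplified illustration rather than the published TrustRank procedure in full.

```python
def trust_scores(links, trusted, alpha=0.15, iterations=100):
    """Random-surfer iteration in which restarts land only on trusted pages."""
    pages = list(links)
    seed = {p: (1.0 / len(trusted) if p in trusted else 0.0) for p in pages}
    score = dict(seed)
    for _ in range(iterations):
        new_score = {p: alpha * seed[p] for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # Pages without links hand their score back to the trusted seeds.
                for p in pages:
                    new_score[p] += (1 - alpha) * score[page] * seed[p]
                continue
            share = (1 - alpha) * score[page] / len(outgoing)
            for target in outgoing:
                new_score[target] += share
        score = new_score
    return score

# Hypothetical graph: shop_b has more incoming links, but only from untrusted pages.
GRAPH = {
    "news": ["shop_a"],
    "spam1": ["shop_b"],
    "spam2": ["shop_b"],
    "spam3": ["shop_b"],
    "shop_a": [],
    "shop_b": [],
}
scores = trust_scores(GRAPH, trusted={"news"})
print(round(scores["shop_a"], 3), round(scores["shop_b"], 3))
```

Counting incoming links alone would favor `shop_b` (three links against one), but no trust ever reaches the spam pages, so their votes carry no weight.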

Page 23: Information technology in business and society

SIMULATING CHANGES IN PAGERANK

“People who bought this also bought…” links:

• BOOK A: book B, book C, book D
• BOOK B: book A, book C
• BOOK C: book A
• BOOK D: book C

Current ranks: A = .400, B = .133, C = .333, D = .133

Change              PR of A   PR of C
C cuts link to A    0.18      0.50
C links to B        0.38      0.33
C links to D        0.24      0.40
C links to B & D    0.22      0.38
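What-if changes like these can be explored by editing the link graph and recomputing. The sketch below is undamped, like the worked example, and only covers scenarios that add links from C. "C cuts link to A" leaves C with no outgoing links, which needs an extra convention that this sketch does not attempt. The names `BASE_LINKS`, `stationary_rank`, and `add_links` are illustrative.

```python
BASE_LINKS = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A"], "D": ["C"]}

def stationary_rank(links, iterations=200):
    """Repeat the undamped voting step until the ranks settle."""
    rank = {page: 1.0 / len(links) for page in links}
    for _ in range(iterations):
        new_rank = {page: 0.0 for page in links}
        for page, outgoing in links.items():
            for target in outgoing:
                new_rank[target] += rank[page] / len(outgoing)
        rank = new_rank
    return rank

def add_links(links, page, extra_targets):
    """Return a copy of the graph in which `page` also links to extra_targets."""
    changed = {p: list(out) for p, out in links.items()}
    changed[page].extend(extra_targets)
    return changed

SCENARIOS = {
    "no change": [],
    "C links to D": ["D"],
    "C links to B & D": ["B", "D"],
}
results = {
    label: stationary_rank(add_links(BASE_LINKS, "C", extra))
    for label, extra in SCENARIOS.items()
}
for label, rank in results.items():
    print(f"{label}: PR of A = {rank['A']:.3f}, PR of C = {rank['C']:.3f}")
```

Under these assumptions the "C links to D" and "C links to B & D" scenarios land on the table's values up to rounding, which shows how a page's own outgoing links change where its votes go.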

Page 24: Information technology in business and society

IMPORTANCE OF ANCHOR TEXT

<a href=http://www.sims…>INFOSYS 141</a>

<a href=http://www.sims…>A terrific course on search engines</a>

The anchor text summarizes what the website is about.
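A search engine can collect anchor text while it parses pages and file it under the page being linked to. The sketch below uses Python's standard `html.parser`; the URL `http://example.edu/course` is a placeholder, not the address elided on the slide, and the class name `AnchorTextCollector` is illustrative.

```python
from collections import defaultdict
from html.parser import HTMLParser

class AnchorTextCollector(HTMLParser):
    """Collect the anchor text of every link, grouped by destination URL."""

    def __init__(self):
        super().__init__()
        self.anchor_text = defaultdict(list)
        self._href = None    # destination of the link currently being read
        self._chunks = []    # text seen so far inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchor_text[self._href].append("".join(self._chunks).strip())
            self._href = None

# Placeholder URL; two different pages might link to the same course like this.
HTML = (
    '<a href="http://example.edu/course">INFOSYS 141</a> '
    '<a href="http://example.edu/course">A terrific course on search engines</a>'
)
collector = AnchorTextCollector()
collector.feed(HTML)
print(dict(collector.anchor_text))
```

The second link tells the engine that the destination is about "search engines" even if those words never appear on the destination page itself.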

Page 25: Information technology in business and society

OTHER RANKING FACTORS

Location, Location, Location... and Frequency

• Query words in the title, or in the first few sentences
• The more frequent the query words, the better

Click-through measurement

• How often users click on your URL when they see it
• How long they stay (using toolbars!)

Page 26: Information technology in business and society

OUTLINE

1. Learn how search engines rank pages

2. Learn how to design effectively for high rankings

3. Learn how online advertising works, especially search ads and keyword auctions

4. The future of search

Page 27: Information technology in business and society

ACHIEVING HIGHER RESULTS RANKINGS

• Position your keywords (title, headings, early on page)
• Make text visible (no tiny fonts, no white-on-white)
• Frames can kill
• Have relevant content
• Do not change topics
• Just say no to search engine spamming
• Submit your key pages
• Verify your listing often

Page 28: Information technology in business and society

MANIPULATING RANKINGS

Motives

• Commercial, political, religious, lobbies
• Promotion funded by advertising budget

Operators

• Contractors (Search Engine Optimizers) for lobbies, companies
• Web masters
• Hosting services

What are the techniques used by rankings manipulators?

Page 29: Information technology in business and society

MANIPULATION TECHNOLOGIES

Cloaking

• Serve fake content to search engine robot
• DNS cloaking: Switch IP address. Impersonate

[Figure: cloaking flowchart with the decision “Is this a search engine spider?” (Y/N) and the outcomes SPAM and FakeDoc]

Doorway pages

• Pages optimized for a single keyword that re-direct to the real target page

Keyword spam

• Misleading meta-keywords, excessive repetition of a term, fake “anchor text”
• Hidden text with colors, CSS tricks, etc.

Meta-Keywords = “… London hotels, hotel, holiday inn, hilton, discount, booking, reservation, sex, mp3, britney spears, viagra, …”

Link spamming

• Mutual admiration societies, hidden links, awards
• Domain flooding: numerous domains that point or re-direct to a target page

Robots

• Fake click stream
• Fake query stream

It is risky to use any of these, as search engines are getting better at detecting and punishing them.
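To give a feel for how "excessive repetition of a term" might be detected, here is a deliberately crude Python sketch based on keyword density. The 0.3 threshold and the sample texts are arbitrary assumptions; real search engines combine many more signals and do not publish their rules.

```python
from collections import Counter

def keyword_density(text):
    """Return the share of the text taken up by each distinct word."""
    words = text.lower().split()
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}

def looks_stuffed(text, threshold=0.3):
    """Flag text in which a single word exceeds an (arbitrary) density threshold."""
    density = keyword_density(text)
    return bool(density) and max(density.values()) > threshold

stuffed = "hotel hotel hotel hotel london hotel cheap hotel"
normal = "our london hotel offers quiet rooms near the river"
print(looks_stuffed(stuffed), looks_stuffed(normal))
```

In the first text "hotel" makes up six of eight words, so it is flagged; in the second every word appears once.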

Page 30: Information technology in business and society

OUTLINE

1. Learn how search engines rank pages

2. Learn how to design effectively for high rankings

3. Learn how online advertising works, especially search ads and keyword auctions

4. The future of search

Page 31: Information technology in business and society

PAID RANKING

Promoting without manipulation: paid placement

Keyword bidding for targeted ads

• Pay-per-click
• Higher bids result in higher ranks for the ad
• A higher percentage of clicks on the ad increases the rank as well (why?)

Google's AdWords is the biggest player

• Google’s 2007 revenue was more than $16 billion, 2008 ~ $22 billion, mostly from such ads
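One way to see why click-through rate matters under pay-per-click: the search engine only earns money when an ad is clicked, so its expected revenue per showing is roughly bid × click-through rate. The sketch below ranks ads by that product and charges each ad the smallest bid that would keep its position, a simplified second-price style rule. It is an illustrative model with made-up bids and rates, not Google's actual formula.

```python
def rank_ads(ads):
    """Rank ads by bid * click-through rate and compute a price per click.

    ads is a list of (name, bid_per_click, click_through_rate) tuples.
    Each ad pays the smallest bid that would still beat the ad ranked below it.
    """
    ranked = sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)
    results = []
    for position, (name, bid, ctr) in enumerate(ranked):
        if position + 1 < len(ranked):
            _, next_bid, next_ctr = ranked[position + 1]
            price = next_bid * next_ctr / ctr
        else:
            price = 0.0  # no competitor below; a real system would set a minimum price
        results.append((name, round(price, 2)))
    return results

# Hypothetical advertisers: (name, bid in dollars, click-through rate).
ADS = [("A", 2.00, 0.01), ("B", 1.00, 0.05), ("C", 1.50, 0.02)]
placements = rank_ads(ADS)
print(placements)
```

Advertiser B bids the least but is clicked most often, so it takes the top slot and pays less per click than its own bid.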

Page 32: Information technology in business and society

EXAMPLE

[Screenshot of a search results page, labeled: AdWords placement (two areas) and most relevant sites]

Page 33: Information technology in business and society
Page 34: Information technology in business and society

FUND YOUR WEBSITE: ADSENSE

Google also delivers ads to other websites. Sign up for Google AdSense, and Google delivers ads to your website (a common source of income for “professional” bloggers).

How ads are delivered:

• If the website is the best match for the targeted keywords

• If users of the website click on results

Strategies for successful ads:

• Place the ads on top

• Blend with the rest of the website

• Ads at the bottom are ignored consistently

Page 35: Information technology in business and society

EXAMPLE: WASHINGTON POST WEBSITE

Page 36: Information technology in business and society

Analysis of Washington Post Website

Page 37: Information technology in business and society

TARGETING BANNER ADS

Request for ad from ad server, which can carry:

• IP address
• Country, domain, company
• Browser, operating system
• Surfing behavior from cookies
• Demographic data?

Targeted ad is delivered to user

Example:

• Context: movie reviews
• User profile: NYU user, New York

Page 38: Information technology in business and society

CLOSED LOOP MARKETING

[Diagram with the following labels:]

• User visits publisher sites
• Ads delivered by DART for Advertisers
• User clicks & visits advertiser’s site
• Boomerang captures user action data
• Data analysis
• Databank
• Boomerang compiles & reports response for future targeting

Source: Doubleclick, Inc.

Page 39: Information technology in business and society

FUTURE OF SEARCH

1. Information Extraction: Search on Structured Data

2. Social Search

3. Privacy Preserving Search

Page 40: Information technology in business and society

INFORMATION EXTRACTION

Information extraction applications extract structured relations from unstructured text.

Example input:

May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire, is finding itself hard pressed to cope with the crisis…

Information Extraction System (e.g., NYU’s Proteus)

Output: Disease Outbreaks in The New York Times

Date        Disease Name      Location
Jan. 1995   Malaria           Ethiopia
July 1995   Mad Cow Disease   U.K.
Feb. 1995   Pneumonia         U.S.
May 1995    Ebola             Zaire
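The idea of turning a news sentence into a table row can be illustrated with a toy Python sketch. It relies on a date pattern and two small hand-written word lists (`DISEASES`, `LOCATIONS`), which are assumptions made for this example. A real information extraction system uses far richer linguistic analysis than simple lookups.

```python
import re

# Hypothetical word lists for this example only.
DISEASES = ["Ebola", "Malaria", "Pneumonia", "Mad Cow Disease"]
LOCATIONS = ["Zaire", "Ethiopia", "U.K.", "U.S."]

# Month name, optional day of month, then a four-digit year.
DATE = re.compile(
    r"\b(Jan|Feb|Mar|Apr|May|June|July|Aug|Sept|Oct|Nov|Dec)\.? (?:\d{1,2} )?(\d{4})"
)

TEXT = (
    "May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, "
    "which is in the front line of the world's response to the deadly Ebola "
    "epidemic in Zaire, is finding itself hard pressed to cope with the crisis"
)

def extract_outbreak(text):
    """Pull a (date, disease, location) record out of free text, if present."""
    date = DATE.search(text)
    disease = next((d for d in DISEASES if d in text), None)
    location = next((place for place in LOCATIONS if place in text), None)
    return {
        "date": f"{date.group(1)} {date.group(2)}" if date else None,
        "disease": disease,
        "location": location,
    }

record = extract_outbreak(TEXT)
print(record)
```

For the sample sentence this yields the "May 1995, Ebola, Zaire" row of the table; note that the dateline city (Atlanta) is not the outbreak location, which is the kind of distinction a real system has to learn.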

Page 41: Information technology in business and society

RETURN STRUCTURED ANSWERS, NOT WEBPAGES

Page 42: Information technology in business and society

FUTURE OF SEARCH

1. Information Extraction: Search on Structured Data

2. Social Search

3. Privacy Preserving Search

Page 43: Information technology in business and society

Y! ANSWERS

Launched in second half of 2005

Incentive system based on points and voting for best answers

Questions grouped by category

Some statistics:

• over 60 million users
• over 120 million answers, available in 18 countries and in 6 languages

Page 44: Information technology in business and society
Page 45: Information technology in business and society

Y! ANSWERS

Page 46: Information technology in business and society

Y! ANSWERS

Page 47: Information technology in business and society

LONG-TERM PROSPECTS

Questions follow a power law:

• A large number of questions will be asked by many people (20% of questions -> 80% of requests)

• We only need one answer for each question

• Quickly acquire high-quality answers for 80% of queries

• …people will take care, in time, of the “long tail” of the remaining questions
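The "20% of questions, 80% of requests" rule of thumb can be checked against a simple power-law model. The sketch below assumes that the question at popularity rank r is requested in proportion to 1/r (a Zipf-like law with exponent 1) and that there are 1,000 distinct questions. Both are illustrative assumptions, and the exact share depends on them.

```python
def top_share(num_questions, top_fraction=0.2, exponent=1.0):
    """Share of all requests that go to the most popular questions.

    Assumes the question at popularity rank r receives requests
    in proportion to 1 / r**exponent.
    """
    weights = [1.0 / rank ** exponent for rank in range(1, num_questions + 1)]
    top_n = int(num_questions * top_fraction)
    return sum(weights[:top_n]) / sum(weights)

share = top_share(1000)
print(f"Top 20% of questions receive {share:.0%} of requests")
```

Under these assumptions the most popular fifth of the questions attracts close to four fifths of all requests, so a modest number of good answers covers most of what people ask.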

Page 48: Information technology in business and society

FUTURE OF SEARCH

1. Information Extraction: Search on Structured Data

2. Social Search

3. Privacy Preserving Search

Page 49: Information technology in business and society

PRIVACY PRESERVING SEARCH

Page 50: Information technology in business and society

NEXT CLASS: SOCIAL NETWORKS

• Work on Assignment 2