Implicit feedback: Good may be better than best

Steve Lawrence

Limitations of the web

• Dead links
• Lack of support for author royalties
• Poor indexing and navigation support
• Better system?
  – Enforce link consistency
  – Allow authors to collect royalties
  – Support for better navigation and indexing

Web

• Xanadu (1960)
  – “Improved” design, fixes all of these limitations
  – Essentially unused
• The web
  – Widely used
• Disadvantages of the “improved” design
  – Extra effort imposed on users
  – Added complexity in the system
  – Extended development time
  – e.g., if link consistency is enforced, it is no longer possible for anyone to make information available simply by putting a file in a specific directory

• The web has become very popular in part due to its limitations

• “Good may be better than best”

Web vs. Xanadu

• Ted Nelson
  – Much credit: “hypertext”, inspiration for the web, Lotus Notes, HyperCard
  – More to Xanadu not covered here (transclusion, bidirectional links, version management)
• According to Nelson:
  – “On both the desktop and world-wide scale, culturally and commercially, we are poorer for these bad tools [the web]”
  – “The World Wide Web is precisely what we were trying to prevent”

CiteSeer

• CiteSeer
  – Metadata not required for submission
  – Specific citation formats not required
• More optimal system?
  – Require manual submission which specifies title, author, etc. (CoRR)
  – Require citations to be submitted in a specific form (Cameron)
• CiteSeer is likely to contain more errors
• Error rate on articles not processed is 100%
  – Value of explicit feedback not obtained is 0

• Much lower overhead and complexity for users
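
The point that unprocessed articles have an error rate of 100% can be made concrete with a little arithmetic. The sketch below is only an illustration: the coverage and per-article error figures are invented (they are not CiteSeer or CoRR measurements), and `effective_error_rate` is a hypothetical helper name.

```python
def effective_error_rate(coverage: float, error_when_processed: float) -> float:
    """Error rate over the whole literature, counting every article the
    system never processed as a complete miss (error rate 100%)."""
    if not 0.0 <= coverage <= 1.0 or not 0.0 <= error_when_processed <= 1.0:
        raise ValueError("both arguments must be fractions in [0, 1]")
    return coverage * error_when_processed + (1.0 - coverage) * 1.0


# Hypothetical numbers, chosen only to illustrate the trade-off.
low_overhead = effective_error_rate(coverage=0.60, error_when_processed=0.10)
high_overhead = effective_error_rate(coverage=0.05, error_when_processed=0.01)
print(f"automatic, error-prone system: {low_overhead:.2%}")
print(f"manual, careful system:        {high_overhead:.2%}")
```

Under these assumed figures, the system that makes more mistakes per article still has the lower overall error, because it covers far more of the literature.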

Implicit vs. explicit feedback

• Explicit feedback
  – Overhead for the user
• Implicit feedback
  – No overhead for the user
• Implicit feedback may be better than explicit feedback because you may not be able to get sufficient explicit feedback
• Other issues: accuracy of feedback

“Good may be better than best”

• Not a binary choice
  – Often many possible systems
• Also
  – “Worse is better”
  – “Best is the worst enemy of good”
  – “MIT approach” vs. “New Jersey approach” for design (Gabriel)

• The increased overhead, complexity and/or cost (for the system and/or the users), and extended development times of more optimal systems may make them far less successful than alternatives

Convenience of access

119,924 conference articles (bibliographical data from DBLP)

Explicit metadata usage

• Only 34% of sites use description or keywords tags on their homepage
  – Analyzed 2,500 random servers

• 0.3% of sites contained Dublin Core tags

• “Attention is the scarce resource.” Herb Simon (1967)

• Difficult to obtain explicit feedback
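
A check of the kind behind these percentages could look roughly like the following. This is a minimal sketch, not the analysis actually used: it relies only on Python's standard `html.parser`, the class and function names are hypothetical, and it assumes Dublin Core elements appear as `<meta>` tags whose names start with `DC.`.

```python
from html.parser import HTMLParser


class MetaTagFinder(HTMLParser):
    """Collects the (lowercased) names of <meta name="..."> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.names: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        name = dict(attrs).get("name")
        if name:
            self.names.add(name.lower())


def metadata_summary(html: str) -> dict[str, bool]:
    """Report which kinds of explicit metadata a homepage carries."""
    finder = MetaTagFinder()
    finder.feed(html)
    return {
        "description_or_keywords": bool(finder.names & {"description", "keywords"}),
        # Assumption: Dublin Core tags are named "DC.title", "DC.creator", ...
        "dublin_core": any(n.startswith("dc.") for n in finder.names),
    }


sample = """<html><head>
<title>Example</title>
<meta name="Keywords" content="digital library, citations">
<meta name="DC.creator" content="A. Author">
</head><body></body></html>"""
print(metadata_summary(sample))
```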

Implicit vs. explicit feedback

• Limitations of implicit feedback
  – Hard to determine the meaning of a click. If the best link is not displayed, users will still click on something
  – Click duration may be misleading
    • People leave machines unattended
    • Opening multiple windows quickly, then reading them all slowly
    • Multitasking
• Limitations of explicit feedback
  – Spam
  – Inconsistent ratings

CiteSeer

• Scientific literature digital library
• Over 600,000 documents indexed
  – Earth’s largest free full-text index of scientific literature
  – (Los Alamos arXiv about 200,000 papers)
• Over 20,000 hosts accessing the site daily
• Accesses from over 150 countries per month
• Over 10 requests per second at peak times

Improving implicit feedback

• Users have to go to the details page before getting a link to the article
  – They have seen the abstract before downloading
  – They have been shown the context of citations before downloading

No download link

Document information page

Citation context

CiteSeer: explicit feedback

• Document ratings and comments

CiteSeer: explicit feedback

• Allow users to correct errors
• Authors may be motivated to correct errors relating to their own work
• How many explicit corrections? (About 600,000 papers)
• How many explicit ratings? (percentage of document accesses)

Explicit feedback

• Over 300,000 explicit corrections/updates
  – How many bogus updates?
  – (We require a validated email address)

• Explicit ratings: 0.17% of document accesses

Explicit corrections

• Over 100 bogus correction attempts

Comparison of feedback types

• How well do document accesses, document downloads, and explicit ratings predict high-citation papers?
• Low citation papers (<= 5 citations)
• High citation papers (> 5 citations)
• Ratio of downloads/accesses/ratings for high to low-citation papers
  – Accesses ?
  – Downloads ?
  – Ratings ?

Comparison of feedback types

• Low citation papers (<= 5 citations)
• High citation papers (> 5 citations)
• Ratio of downloads/accesses/ratings for high to low-citation papers
  – Accesses 2.5
  – Downloads 3.1
  – Ratings 0.96 (low 2.3, high 2.2)
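
One way such ratios could be computed is sketched below. This is an assumption about the procedure, not a description of the actual analysis: it takes the mean of each signal within the high-citation and low-citation groups and divides one by the other (for ratings, 2.2 / 2.3 gives the 0.96 quoted on the slide). The records and the `high_to_low_ratio` name are invented for illustration.

```python
from statistics import mean

CITATION_CUTOFF = 5  # low: <= 5 citations, high: > 5 citations (from the slide)


def high_to_low_ratio(papers, signal):
    """Mean of `signal` over high-citation papers divided by its mean over
    low-citation papers. A ratio near 1 means the signal does not separate
    the two groups."""
    high = [p[signal] for p in papers if p["citations"] > CITATION_CUTOFF]
    low = [p[signal] for p in papers if p["citations"] <= CITATION_CUTOFF]
    return mean(high) / mean(low)


# Hypothetical records, invented for illustration only.
papers = [
    {"citations": 0, "accesses": 40, "downloads": 10, "rating": 2.5},
    {"citations": 3, "accesses": 60, "downloads": 10, "rating": 2.0},
    {"citations": 12, "accesses": 120, "downloads": 30, "rating": 2.0},
    {"citations": 40, "accesses": 130, "downloads": 32, "rating": 2.5},
]
for signal in ("accesses", "downloads", "rating"):
    print(signal, round(high_to_low_ratio(papers, signal), 2))
```

In the toy data, as on the slide, the implicit signals separate the two groups while the explicit rating does not.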

CiteSeer: user profiling

• Profiling system not currently active (scale)
• Profile contains documents, citations, keywords, etc. of interest
• User notified of new related documents or citations by email or via the web interface
• Both implicit and explicit feedback
• Record the actions of a user for recommendations
  – View
  – Download
  – Ignore

CiteSeer: user profiling

• Implicit feedback should be more successful in CiteSeer due to citation context, query-sensitive summaries, document details pages, and the expense of document downloads
  – Users can better determine the relevance of documents before they request details or download articles
• Analyze co-viewed/downloaded documents to recommend documents related to a given document
  – Similar to one of Amazon’s book recommenders
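
The co-viewing idea can be sketched in a few lines. This is a generic "users who viewed x also viewed" counter under simple assumptions (one set of document ids per user, raw pair counts, no normalization); it is not CiteSeer's or Amazon's implementation, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_coview_counts(sessions):
    """Count how often each pair of documents was viewed or downloaded by
    the same user (one collection of document ids per user)."""
    counts = defaultdict(Counter)
    for docs in sessions:
        for a, b in combinations(sorted(set(docs)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def related_documents(counts, doc_id, k=3):
    """Top-k documents most often co-viewed with `doc_id`."""
    return [doc for doc, _ in counts[doc_id].most_common(k)]


# Hypothetical access logs, one set of document ids per user.
sessions = [
    {"p1", "p2", "p3"},
    {"p1", "p2"},
    {"p1", "p2", "p4"},
    {"p1", "p3"},
    {"p3", "p4"},
]
counts = build_coview_counts(sessions)
print(related_documents(counts, "p1", k=2))
```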

Profile creation

• (Pseudo)-documents added to user’s profile whenever a user performs an action in the profile editor or on a real document when browsing

• Action “interestingness” a(.)

Action                        a(.)
Explicitly added to profile   Very high positive
Downloaded                    High positive
Details viewed                Moderate positive
Recommendation ignored        Low negative
Removed from profile          Set to zero

Paper recommendations

• New papers recommended periodically via email or the web interface

• New paper d* recommended if it has a sufficiently high “interestingness”

• Threshold initially set at a small positive value

$$I_D(d^*) = \sum_{d \in D} w_d \, R(d, d^*)$$
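
The recommendation test can be sketched as follows, reading the slide's formula as a weighted sum of relatedness scores between the new paper d* and each (pseudo-)document d in the profile D, that is, I_D(d*) = Σ w_d R(d, d*). That reading, the Jaccard keyword overlap used as a stand-in for the relatedness measure R, the threshold value, and every name and data value below are assumptions made for this example.

```python
def interestingness(profile_weights, relatedness, candidate):
    """I_D(d*): sum over profile documents d of w_d * R(d, d*)."""
    return sum(w * relatedness(d, candidate) for d, w in profile_weights.items())


def should_recommend(profile_weights, relatedness, candidate, threshold=0.05):
    """Recommend d* when its interestingness exceeds a small positive threshold."""
    return interestingness(profile_weights, relatedness, candidate) > threshold


# Toy relatedness: Jaccard overlap of keyword sets (a stand-in measure).
KEYWORDS = {
    "d1": {"neural", "networks", "learning"},
    "d2": {"citation", "indexing"},
    "new_a": {"neural", "learning", "theory"},
    "new_b": {"databases", "transactions"},
}


def jaccard(doc, other):
    a, b = KEYWORDS[doc], KEYWORDS[other]
    return len(a & b) / len(a | b)


profile = {"d1": 1.0, "d2": 0.5}  # hypothetical profile weights w_d
for paper in ("new_a", "new_b"):
    print(paper, interestingness(profile, jaccard, paper),
          should_recommend(profile, jaccard, paper))
```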

Profile adaption

• Adaption occurs via manual adjustment and machine learning

• User can explicitly modify a profile by adjusting the weight of pseudo-documents

• Browsing actions implicitly modify the weight of corresponding pseudo-documents

• User response to recommendation of a paper d* is used to update weights that contributed to the recommendation

$$w_d \leftarrow w_d + \eta \, a(d^*) \, R(d, d^*)$$

• where $\eta$ is the learning rate

Weight update rule properties

• Weights modified according to their contribution to recommendations

• Overall precision/recall threshold automatically adapted. Ignoring recommendations raises the threshold for recommending a paper. Explicitly adding papers lowers the threshold

• The influence of different relatedness measures is adapted separately
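
These properties can be illustrated with a small sketch. It assumes the update rule has the form w_d ← w_d + η a(d*) R(d, d*), where a(d*) is the signed score of the user's action on the recommended paper and η is the learning rate. The numeric action scores, the learning rate, and the relatedness values are placeholders, since the slides give only qualitative levels.

```python
ETA = 0.1  # learning rate; an arbitrary value for this sketch

# Signed action scores a(.); placeholder numbers for the qualitative levels.
ACTION_SCORE = {"added": 1.0, "downloaded": 0.6, "viewed": 0.3, "ignored": -0.1}

# Hypothetical relatedness R(d, d*) between profile documents and one new paper.
RELATED = {("d1", "new"): 0.8, ("d2", "new"): 0.0}


def relatedness(doc, candidate):
    return RELATED.get((doc, candidate), 0.0)


def update_weights(weights, relatedness, candidate, action, eta=ETA):
    """Apply w_d <- w_d + eta * a(d*) * R(d, d*) to every profile document."""
    a = ACTION_SCORE[action]
    return {d: w + eta * a * relatedness(d, candidate) for d, w in weights.items()}


weights = {"d1": 1.0, "d2": 1.0}
after_ignore = update_weights(weights, relatedness, "new", "ignored")
after_add = update_weights(weights, relatedness, "new", "added")
print(after_ignore)
print(after_add)
```

Only `d1`, which contributed to the recommendation, changes. Ignoring the paper lowers its weight, so similar papers score lower next time, which has the same effect as a higher threshold; adding the paper does the opposite.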

REFEREE

• Recommender framework where outside groups can test recommendation systems live on CiteSeer

• Implemented a version of Pennock’s Personality Diagnosis recommender for initial testing

[Figure: REFEREE architecture. Components: ResearchIndex, RFD, Logs, RFD Client, Recommender Engine. Flows: events and recommendation requests; recommendations (Recs); broadcast events and requests for recommendations from specific engines; requests to join and recommendations when requested.]

REFEREE

• Statistics on recommender performance available quickly

• For evaluation we focus on measuring impact on user behavior

• Implicit feedback more effective because users see a lot of information about documents before they can download them

• Which recommenders are best?
  – Users who viewed x also viewed?
  – Exact sentence overlap?
  – Papers that cite this paper?
  – Citation similarity?

Recommendations followed

Recommendation type              Recommendations followed
Sentence overlap                 8.2%
Cited by                         5.1%
CCIDF (bibliographic coupling)   3.1%
PD-1                             2.1%
Users who viewed                 2.0%
PD-2                             2.0%
Co-citation                      1.9%
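
For readers unfamiliar with the CCIDF row, the following is a rough sketch of one way a "common citation × inverse document frequency" score can be computed: two papers are related through the references they share, and each shared reference counts for less the more papers cite it. The exact weighting used in CiteSeer may differ; the function name and data are hypothetical.

```python
from collections import Counter


def ccidf(citations_by_doc, a, b):
    """Sum, over the references shared by documents a and b, of the inverse
    of the number of documents in the collection that cite that reference."""
    cited_by_count = Counter(
        ref for refs in citations_by_doc.values() for ref in set(refs)
    )
    shared = set(citations_by_doc[a]) & set(citations_by_doc[b])
    return sum(1.0 / cited_by_count[ref] for ref in shared)


# Hypothetical collection: each document maps to the references it cites.
citations_by_doc = {
    "A": {"r1", "r2", "r3"},
    "B": {"r1", "r2"},
    "C": {"r1"},
    "D": {"r1", "r4"},
}
print(ccidf(citations_by_doc, "A", "B"))  # shares r1 (cited by 4) and r2 (cited by 2)
print(ccidf(citations_by_doc, "A", "C"))  # shares only the widely cited r1
```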

NewsSeer

• Primarily a single page with implicit feedback only

• Also supports explicit feedback but this is optional

NewsSeer statistics

• About 1 million pageviews
• About 10,000 users (>= 5 requests)
  – 5,000 users (>= 10 requests)
• How many users rated an article?
• What percentage of requests were ratings on the homepage?
• What percentage of requests were for the source ratings page?

NewsSeer statistics

• Of the 10,000 users with >= 5 requests, 1,000 rated an article
  – About 10%
  – About 20% of the top 2,500 users
  – About 30% of the top 1,000 users
  – 20 of 56 users that did >1,000 requests
  – 10 of 21 users that did >2,000 requests
• Share of requests by type
  – Homepage 51% (auto-reloaded)
  – View article 40%
  – Keyword query 4% (was not available initially)
  – Ratings on homepage 5%
  – Source rating page views 0.2%

MusicSeer

Music similarity

• Music similarity survey

• Erdős game

Music similarity

• Erdős game

Music similarity

• Simple survey

MusicSeer

• Survey
  – 713 users, 10,997 judgments
• Game
  – 680 users, 11,313 judgments

Summary

• Implicit feedback may be better because there is much lower overhead

• Much greater participation may more than compensate for the less accurate information received

• Can structure system to maximize implicit feedback gained

• Can obtain explicit feedback if there is enough incentive, or if giving it is easy enough