Skills, Reputation, and Search

Pete Skomoroch, Principal Data Scientist, LinkedIn

Description

This keynote presentation describes the critical role that search and Lucene play in building next-generation products that understand reputation and relevance. It also describes how data science and machine learning have been applied at LinkedIn to collect, interpret, and index data about topical reputation. Lucene Revolution is the biggest open source conference dedicated to Apache Lucene/Solr.

Transcript of Skills, Reputation, and Search

Page 1: Skills, Reputation, and Search

Skills, Reputation, and Search

Pete Skomoroch, Principal Data Scientist, LinkedIn

Page 2: Skills, Reputation, and Search

Vision: Create Economic Opportunity for Every Professional

Time

Location

Page 3: Skills, Reputation, and Search

©2012 LinkedIn Corporation. All Rights Reserved.

LinkedIn: The Professional Profile of Record

200M+ Members

200M Member Profiles

Page 4: Skills, Reputation, and Search

LinkedIn Search: Connecting Talent with Opportunity

Page 5: Skills, Reputation, and Search

Skills Correlated with the Job Title “Data Scientist”

Page 6: Skills, Reputation, and Search

Skills Related to “Big Data”

Page 7: Skills, Reputation, and Search

Information Retrieval

Page 8: Skills, Reputation, and Search

Soul Retrieval

Page 9: Skills, Reputation, and Search

Page 10: Skills, Reputation, and Search

Lucene on LinkedIn

Page 11: Skills, Reputation, and Search

Lucene Endorsement Graph

Page 12: Skills, Reputation, and Search

Solr on LinkedIn

Page 13: Skills, Reputation, and Search

Solr Endorsement Graph

Page 14: Skills, Reputation, and Search

Reputation: Building the Endorsement Graph

Page 15: Skills, Reputation, and Search

Viral Growth: 1 Billion Endorsements in 5 Months

Page 16: Skills, Reputation, and Search

How Did We Gather this Data?

1. Desire + Social Proof

2. Viral Loops + Network Effects

3. Data Foundation + Recommendation Algorithms

Page 17: Skills, Reputation, and Search

1) Desire & Social Proof

Page 18: Skills, Reputation, and Search

2) Viral Loops & Network Effects

A endorses B

B notified

B “accepts” endorsement

B endorses C

B endorses D

Endorsement recommendations

Email Notification

News Feed

Page 19: Skills, Reputation, and Search

3) Data Foundation: Skills & Suggested Skills

Page 20: Skills, Reputation, and Search

Data Foundation: LinkedIn Skills

Page 21: Skills, Reputation, and Search

Social Tagging Accelerates Adoption

Suggested endorsements

Skill recommendations

Skill marketing

Virality only

Page 22: Skills, Reputation, and Search

Outline

Skill discovery

Skill tagging

Skill recommendations

Suggested endorsements

Page 23: Skills, Reputation, and Search

Skill Discovery: Unsupervised Topics from Profiles

Extract

Page 24: Skills, Reputation, and Search

Topic Clustering & Phrase Sense Disambiguation

Page 25: Skills, Reputation, and Search

Deduplication Signals from Mechanical Turk

Page 26: Skills, Reputation, and Search

Sample Task for Mechanical Turk Workers

Page 27: Skills, Reputation, and Search

Skill Phrase Deduplication

Page 28: Skills, Reputation, and Search

Outline

Skill discovery

Skill tagging

Skill recommendations

Suggested endorsements

Page 29: Skills, Reputation, and Search

Tagging Skill Phrases

Lead designer and engineer for the implementation of a user-centric, fully-configurable UI for data aggregation and reporting. Developed over 20 SaaS custom applications using Python, Javascript and RoR.

Tagging: Extract potential skill phrases from text

Standardize unambiguous phrase variants

JavaScript RoR SaaS Python

ror

rubyonrails

ruby on rails development

ruby rails

ruby on rail

Ruby on Rails

Document (ex: Profile)

Tokenization

Skills Tagger

Phrases

(up to 6 words)

Skills Classifier

Skills

(unordered)

Skills

(ranked by relevance)
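
The tagging and standardization steps on this slide can be sketched roughly as follows. This is a minimal illustration, not LinkedIn's implementation: the `VARIANTS` table, the regular-expression tokenizer, and the function names are all assumptions made for the example, and the classifier stage that ranks skills by relevance is omitted because the slides do not describe it. The sketch tokenizes a document, scans for phrases of up to 6 words, and maps each known variant to one canonical skill name.

```python
import re

# Hypothetical variant table (normalized phrase -> canonical skill).
# The "Ruby on Rails" variants come from the slide; the rest are illustrative.
VARIANTS = {
    "ror": "Ruby on Rails",
    "rubyonrails": "Ruby on Rails",
    "ruby on rails development": "Ruby on Rails",
    "ruby rails": "Ruby on Rails",
    "ruby on rail": "Ruby on Rails",
    "ruby on rails": "Ruby on Rails",
    "javascript": "JavaScript",
    "python": "Python",
    "saas": "SaaS",
}
MAX_PHRASE_WORDS = 6  # the slide limits phrases to 6 words


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tag_skills(text):
    """Return canonical skills in order of first appearance (unranked)."""
    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in VARIANTS:
                skill = VARIANTS[phrase]
                if skill not in found:
                    found.append(skill)
                matched = n
                break
        # Skip past a matched phrase, otherwise advance one token.
        i += matched or 1
    return found


if __name__ == "__main__":
    sample = ("Developed over 20 SaaS custom applications "
              "using Python, Javascript and RoR.")
    print(tag_skills(sample))
```

Matching the longest phrase first keeps "ruby on rails development" from being tagged only as a shorter variant; a dictionary lookup like this handles only unambiguous variants, which is the case the slide calls out.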

Page 30: Skills, Reputation, and Search

Outline

Skill discovery

Skill tagging

Skill recommendations

Suggested endorsements

Page 31: Skills, Reputation, and Search

Skill Inference

How suggested/inferred skills work:

– The skill likelihood is a conditional model

– Probabilities are combined using a Naïve Bayes Classifier

If you are an engineer at Apple, you probably know about iPhone Development.

Profile

Extract attributes

- Company ID

- Title ID

- Groups ID

- Industry ID

- …

Skills Classifier

Skills

(ranked by likelihood)

Feature Vectors
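
As a rough illustration of the idea on this slide, the sketch below ranks skills for a profile by combining per-attribute conditional probabilities with a Naïve Bayes model. The slides do not give the model's details, so everything here is an assumption: the toy `PROFILES` data is made up, only two attributes are used, and add-one (Laplace) smoothing is one common choice rather than a documented one.

```python
import math
from collections import Counter, defaultdict

# Made-up training data: (profile attributes, skills listed on the profile).
PROFILES = [
    ({"company": "Apple", "title": "Engineer"},
     ["iPhone Development", "Objective-C"]),
    ({"company": "Apple", "title": "Engineer"}, ["iPhone Development"]),
    ({"company": "Apple", "title": "Designer"}, ["User Interface Design"]),
    ({"company": "Acme", "title": "Engineer"}, ["Java"]),
]


def train(profiles):
    """Count how often each skill co-occurs with each attribute value."""
    skill_counts = Counter()
    feature_counts = defaultdict(Counter)  # skill -> {(attr, value): count}
    values = {}                            # attr -> set of observed values
    for attrs, skills in profiles:
        for attr, value in attrs.items():
            values.setdefault(attr, set()).add(value)
        for skill in skills:
            skill_counts[skill] += 1
            for item in attrs.items():
                feature_counts[skill][item] += 1
    return skill_counts, feature_counts, values, len(profiles)


def rank_skills(attrs, model, alpha=1.0):
    """Rank skills by log P(skill) + sum of log P(attribute value | skill)."""
    skill_counts, feature_counts, values, total = model
    scores = {}
    for skill, count in skill_counts.items():
        log_p = math.log(count / total)  # prior from skill frequency
        for attr, value in attrs.items():
            if attr not in values:
                continue  # attribute never seen in training: ignore it
            # Smoothed conditional probability of this value given the skill.
            num = feature_counts[skill][(attr, value)] + alpha
            den = count + alpha * len(values[attr])
            log_p += math.log(num / den)
        scores[skill] = log_p
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    model = train(PROFILES)
    print(rank_skills({"company": "Apple", "title": "Engineer"}, model))
```

On this toy data an Apple engineer's top suggestion is "iPhone Development", mirroring the slide's example. Working in log space avoids underflow when many attributes (company, title, groups, industry, and so on) are multiplied together.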

Page 32: Skills, Reputation, and Search
Page 33: Skills, Reputation, and Search
Page 34: Skills, Reputation, and Search
Page 35: Skills, Reputation, and Search
Page 36: Skills, Reputation, and Search
Page 37: Skills, Reputation, and Search

Skill Recommendations for Your LinkedIn Profile

49% Conversion

4% Conversion

Page 38: Skills, Reputation, and Search

Outline

Skill discovery

Skill tagging

Skill recommendations

Suggested endorsements

Page 39: Skills, Reputation, and Search

Social Tagging via Skill Endorsements

Page 40: Skills, Reputation, and Search

Social Tagging Accelerates Adoption

Skill endorsements

Skill recommendations

Skill marketing

Page 41: Skills, Reputation, and Search

Data Amplifies Desire

1. Desire + Social Proof

2. Viral Loops + Network Effects

3. Data Catalyst + Recommendation Algorithms

Page 42: Skills, Reputation, and Search

Over 58 Million Profiles are now Tagged with Skills

Page 43: Skills, Reputation, and Search

All This Data Flows Back Into Our Lucene Index

Page 44: Skills, Reputation, and Search

Helping us Connect Talent & Opportunity

Time

Location

Page 45: Skills, Reputation, and Search

Questions?

We’re hiring: data.linkedin.com

@peteskomoroch

Page 46: Skills, Reputation, and Search

Contact: Pete Skomoroch, @peteskomoroch

http://data.linkedin.com