Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David...

29
Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei

Transcript of Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David...

Page 1: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Finding Tribes: Identifying Close-Knit Individuals fromEmployment Patterns

Lisa Friedland and David Jensen

Presented by Nick Mattei

Page 2: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Introduction

Tribes – groups with similar traits in a large graph

Distinguish those that work together and move together intentionally

Page 3: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Relationship Knowledge Discovery

Exploit connections among individuals to identify patterns and make predictions

Discover underlying dependencies Links must be inferred

Page 4: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Graph Mining

Discover Hidden Group Structures Animal Herds, Webpages, Employees

Time Series Analysis Co-integration (Economics)

Security and Intrusion Detection Dynamic Networks

Page 5: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Motivation

National Association of Securities Dealers

Fraud Collusion 4.8 Million Records 2.5 Million Reps at 560,000 Firms 100 Years of Data

Page 6: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Complications

Jobs not necessarily in order (or singletons) 20% of employees hold more than

one job at a time 10% begin multiple jobs (up to 16) on

one day Leave gaps between employment Mergers and acquisitions

Page 7: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Model

Page 8: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Finding Anomalously Related Entities Input:

Bipartite Graph: G = (R A, E) Entities: R = {r1, r2, …, rn} (People) Attributes: A = {a1, a2, …, am}

(Orgs.) Entities should connect several

attributes Model co-occurrence rates of pairs

of attributes

Page 9: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Algorithm

Page 10: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Simple Model Measures

JOBS = (Number of shared Jobs in the sequence)

YEARS = (Number of Years of overlap)

Page 11: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Example Sequences

Page 12: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Probabilistic Model

X = P(BrA -> BrB -> BrC -> BrD) = pa * tAB * tBC * tCD

Estimate: P(start branch i)

=(#reps ever at i) / (#reps in database) Tij = P(reps from i to j | #ever at i)

=(#reps leave i to go to j) / (ever at i)

ip

Page 13: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Probabilistic Model

Null Hypothesis of Independent Movement

Movement Not Random Split and Merge Markov Chains

Page 14: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Probabilistic Model (Different Paths)

Tij becomes Vij Vij = P(move to branch j at any point

after branch I | currently at i) = (# reps who go to branch j at any

point after working at i) / (# reps ever at i)

Now each vij >= tij and probabilities no longer sum to 1.

Page 15: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Probabilistic Model (Different Paths) Vij becomes Wij

Wij = P (move to branch j at any point simultaneous to or after branch i | currently at i)

= (# reps who start at j at any point simultaneous or after starting at i) / (# of reps ever at i)

Now less precise in respect to direct transitions but more general

Page 16: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

PROB - TIMEBINS Bins of 1 year or more 10 people worked at each branch in

a bin period PiX = # reps ever at i during time

X / # reps in DB yiXjY = # reps ever at I during time

X and at j during time Y, where Y >= X / # reps ever at i during time X

Page 17: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

PROB-NOTIME Ignores order of job moves Use original pi

Zij = raw number of reps who are at both branches I and j during career

Transition Pr from i to j: = (zij / # reps ever at i) != (zij / # reps ever at j) =transition Pr from j to i

Page 18: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Tribe Size

Page 19: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Pairs

Page 20: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Commonality of Job Sequence

Page 21: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Disclosure Scores

Page 22: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Homogenaity and Mobility

Page 23: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.
Page 24: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.
Page 25: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Discussion JOBS, PROB, PROB-TIME, PROB-

NOTIME create tribes with higher than average disclosure scores

PROB creates more cross zip code results

PROB-TIME has higher phi-squared than all others

PROB favors large firms

Page 26: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Discussion

JOBS and YEARS compute larger connected components

JOBS and PROB find same number of tribes but pick different groups as tribes

Page 27: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Conclusions

With no explicit knowledge we can discover: Job transitions Geography Career track

Page 28: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Conclusions

Needed: Ongoing process Multiple affiliations Arbitrary times Time is a paradox in domain

Page 29: Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei.

Thanks!

Time for: Questions Comments Smart Remarks