Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns
Lisa Friedland and David Jensen
Presented by Nick Mattei
Introduction
Tribes – groups with similar traits in a large graph
Distinguish those that work together and move together intentionally
Relationship Knowledge Discovery
Exploit connections among individuals to identify patterns and make predictions
Discover underlying dependencies
Links must be inferred
Graph Mining
Discover Hidden Group Structures: Animal Herds, Webpages, Employees
Time Series Analysis
Co-integration (Economics)
Security and Intrusion Detection
Dynamic Networks
Motivation
National Association of Securities Dealers
Fraud
Collusion
4.8 Million Records
2.5 Million Reps at 560,000 Firms
100 Years of Data
Complications
Jobs not necessarily in order (or singletons)
20% of employees hold more than one job at a time
10% begin multiple jobs (up to 16) on one day
Leave gaps between employment
Mergers and acquisitions
Model
Finding Anomalously Related Entities
Input:
Bipartite Graph: G = (R ∪ A, E)
Entities: R = {r1, r2, …, rn} (People)
Attributes: A = {a1, a2, …, am} (Orgs.)
Entities should connect several attributes
Model co-occurrence rates of pairs of attributes
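The input described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the toy edge list, the function names, and the choice to define a pair's co-occurrence rate as the fraction of all entities linked to both attributes are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy edge set E: (entity, attribute) pairs, i.e. (rep, organization).
edges = [
    ("r1", "a1"), ("r1", "a2"), ("r1", "a3"),
    ("r2", "a1"), ("r2", "a2"),
    ("r3", "a2"), ("r3", "a3"),
    ("r4", "a1"),
]

def build_bipartite(edges):
    """Map each entity in R to the set of attributes in A it is linked to."""
    adjacency = {}
    for rep, org in edges:
        adjacency.setdefault(rep, set()).add(org)
    return adjacency

def cooccurrence_rates(adjacency):
    """Assumed definition: fraction of all entities linked to both attributes of a pair."""
    counts = Counter()
    for orgs in adjacency.values():
        for pair in combinations(sorted(orgs), 2):
            counts[pair] += 1
    n_entities = len(adjacency)
    return {pair: count / n_entities for pair, count in counts.items()}

adjacency = build_bipartite(edges)
rates = cooccurrence_rates(adjacency)
print(rates[("a1", "a2")])  # 2 of the 4 reps are linked to both a1 and a2
```

Pairs of entities whose shared attributes have unusually low co-occurrence rates would then be the candidates for "anomalously related" entities.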
Algorithm
Simple Model Measures
JOBS = (Number of shared Jobs in the sequence)
YEARS = (Number of Years of overlap)
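As a rough sketch of the two simple measures, the following Python computes JOBS and YEARS for one pair of reps. The slides do not spell out when a job counts as shared, so the example assumes a job is shared when both reps were at the same branch during overlapping years; the (branch, start_year, end_year) record format and the function names are also hypothetical.

```python
def overlap_years(a, b):
    """Length in years of the overlap between two [start, end) year intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def jobs_and_years(history_a, history_b):
    """Return (JOBS, YEARS) for two job histories.

    Assumption: a job is shared when both reps were at the same branch
    at overlapping times. A rep who returns to the same branch twice
    would be counted once per matching stint.
    """
    jobs = 0
    years = 0
    for branch_a, start_a, end_a in history_a:
        for branch_b, start_b, end_b in history_b:
            if branch_a != branch_b:
                continue
            shared = overlap_years((start_a, end_a), (start_b, end_b))
            if shared > 0:
                jobs += 1
                years += shared
    return jobs, years

# Hypothetical histories as (branch, start_year, end_year) records.
rep_a = [("BrA", 1990, 1994), ("BrB", 1994, 1999), ("BrC", 1999, 2003)]
rep_b = [("BrA", 1992, 1994), ("BrB", 1994, 1997), ("BrD", 1997, 2003)]

print(jobs_and_years(rep_a, rep_b))  # (2, 5): two shared branches, 2 + 3 years
```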
Example Sequences
Probabilistic Model
X = P(BrA -> BrB -> BrC -> BrD) = p_A * t_AB * t_BC * t_CD
Estimate:
p_i = P(start at branch i) = (# reps ever at i) / (# reps in database)
t_ij = P(rep moves from i to j | rep ever at i) = (# reps who leave i to go to j) / (# reps ever at i)
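The estimates above can be illustrated with a short Python sketch. It assumes each rep's history is already an ordered list of branch names (the real data is messier, as the Complications slide notes), and the toy histories and function names are hypothetical.

```python
from collections import Counter

# Hypothetical ordered job sequences, one per rep.
histories = {
    "r1": ["A", "B", "C"],
    "r2": ["A", "B", "C"],
    "r3": ["A", "B"],
    "r4": ["B", "C"],
}

def estimate_p_and_t(histories):
    """Estimate p_i and t_ij using the counting formulas from the slide."""
    ever_at = Counter()
    moved = Counter()
    for seq in histories.values():
        for branch in set(seq):
            ever_at[branch] += 1
        # Count each direct transition i -> j at most once per rep.
        for pair in set(zip(seq, seq[1:])):
            moved[pair] += 1
    n_reps = len(histories)
    p = {i: count / n_reps for i, count in ever_at.items()}
    t = {(i, j): count / ever_at[i] for (i, j), count in moved.items()}
    return p, t

def sequence_probability(seq, p, t):
    """P(seq) = p of the first branch times the product of t over consecutive pairs."""
    prob = p.get(seq[0], 0.0)
    for i, j in zip(seq, seq[1:]):
        prob *= t.get((i, j), 0.0)
    return prob

p, t = estimate_p_and_t(histories)
print(sequence_probability(["A", "B", "C"], p, t))  # 0.75 * 1.0 * 0.75
```

A low probability for a long shared sequence means the sequence is unlikely under independent movement, which is what makes a pair that shares it interesting.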
Probabilistic Model
Null Hypothesis of Independent Movement
Movement Not Random
Split and Merge
Markov Chains
Probabilistic Model (Different Paths)
t_ij becomes v_ij
v_ij = P(move to branch j at any point after branch i | currently at i)
= (# reps who go to branch j at any point after working at i) / (# reps ever at i)
Now each v_ij >= t_ij, and the probabilities no longer sum to 1.
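A small Python sketch shows how the relaxed estimate differs from direct transitions. It again assumes ordered lists of branch names per rep; the toy data and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical ordered job sequences, one per rep.
histories = {
    "r1": ["A", "B", "C", "D"],
    "r2": ["A", "C", "D"],
    "r3": ["A", "B"],
    "r4": ["B", "D"],
}

def estimate_v(histories):
    """v_ij = (# reps at j at any point after i) / (# reps ever at i)."""
    ever_at = Counter()
    later = Counter()
    for seq in histories.values():
        for branch in set(seq):
            ever_at[branch] += 1
        pairs = set()
        for idx, i in enumerate(seq):
            for j in seq[idx + 1:]:
                if j != i:
                    pairs.add((i, j))
        # Count each (i, j) at most once per rep.
        for pair in pairs:
            later[pair] += 1
    return {(i, j): count / ever_at[i] for (i, j), count in later.items()}

v = estimate_v(histories)
print(v[("A", "D")])  # nobody moves directly from A to D, yet 2 of 3 reps reach D later
print(sum(val for (i, _), val in v.items() if i == "A"))  # exceeds 1
```

In this toy data the values out of branch A add up to 2, which illustrates why the v_ij no longer form a probability distribution.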
Probabilistic Model (Different Paths)
v_ij becomes w_ij
w_ij = P(move to branch j at any point simultaneous to or after branch i | currently at i)
= (# reps who start at j at any point simultaneous to or after starting at i) / (# reps ever at i)
Now less precise with respect to direct transitions, but more general
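Because this version depends on start times rather than sequence order, a sketch needs start dates. The Python below assumes each job is a hypothetical (branch, start_year) record and compares start years only; the data and names are illustrative.

```python
from collections import Counter

# Hypothetical jobs per rep as (branch, start_year); jobs may start in the same year.
job_starts = {
    "r1": [("A", 1990), ("B", 1990), ("C", 1995)],
    "r2": [("A", 1991), ("B", 1993)],
    "r3": [("B", 1992), ("A", 1992)],
}

def estimate_w(job_starts):
    """w_ij = (# reps who start at j simultaneous to or after starting at i) / (# reps ever at i)."""
    ever_at = Counter()
    counts = Counter()
    for jobs in job_starts.values():
        for branch in {b for b, _ in jobs}:
            ever_at[branch] += 1
        pairs = set()
        for branch_i, start_i in jobs:
            for branch_j, start_j in jobs:
                if branch_i != branch_j and start_j >= start_i:
                    pairs.add((branch_i, branch_j))
        # Count each (i, j) at most once per rep.
        for pair in pairs:
            counts[pair] += 1
    return {(i, j): count / ever_at[i] for (i, j), count in counts.items()}

w = estimate_w(job_starts)
print(w[("A", "B")])  # all three reps start at B no earlier than at A
```

Simultaneous starts make both w_AB and w_BA count the same rep, which is how this variant absorbs reps who begin several jobs on one day.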
PROB-TIMEBINS
Bins of 1 year or more
10 people worked at each branch in a bin period
p_iX = (# reps ever at i during time X) / (# reps in DB)
y_iXjY = (# reps ever at i during time X and at j during time Y, where Y >= X) / (# reps ever at i during time X)
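The time-binned estimates can be sketched as follows. The Python assumes bins have already been chosen and that each rep's history is a list of hypothetical (branch, bin_index) states; it does not implement the bin-sizing rule from the slide, and it excludes only the trivial pair of a state with itself.

```python
from collections import Counter

# Hypothetical (branch, bin_index) states per rep; bins are assumed precomputed.
visits = {
    "r1": [("A", 0), ("B", 1)],
    "r2": [("A", 0), ("B", 1)],
    "r3": [("A", 0), ("C", 1)],
    "r4": [("A", 1), ("B", 1)],
}

def estimate_timebins(visits):
    """Estimate p_iX and y_iXjY by counting reps per (branch, bin) state."""
    n_reps = len(visits)
    at_state = Counter()
    joint = Counter()
    for states in visits.values():
        states = set(states)
        for state in states:
            at_state[state] += 1
        for (i, x) in states:
            for (j, y_bin) in states:
                # Keep pairs where bin Y is the same as or later than bin X.
                if (i, x) != (j, y_bin) and y_bin >= x:
                    joint[((i, x), (j, y_bin))] += 1
    p = {state: count / n_reps for state, count in at_state.items()}
    y = {(s, t): count / at_state[s] for (s, t), count in joint.items()}
    return p, y

p_bins, y_bins = estimate_timebins(visits)
print(p_bins[("A", 0)])              # 3 of 4 reps were at A during bin 0
print(y_bins[(("A", 0), ("B", 1))])  # 2 of those 3 were at B during bin 1
```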
PROB-NOTIME
Ignores order of job moves
Use original p_i
z_ij = raw number of reps who are at both branches i and j during career
Transition Pr from i to j = (z_ij / # reps ever at i) != (z_ij / # reps ever at j) = transition Pr from j to i
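The asymmetry of the order-free transition estimate is easy to see in a small example. The Python below is a sketch under the assumption that each rep's career is just a set of branches; the toy data and helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical careers as unordered sets of branches.
careers = {
    "r1": {"A", "B"},
    "r2": {"A", "B"},
    "r3": {"A"},
    "r4": {"A", "C"},
}

def estimate_notime(careers):
    """Return (# reps ever at each branch, z_ij counts keyed by sorted branch pair)."""
    ever_at = Counter()
    z = Counter()
    for branches in careers.values():
        for branch in branches:
            ever_at[branch] += 1
        for pair in combinations(sorted(branches), 2):
            z[pair] += 1
    return ever_at, z

def transition(i, j, ever_at, z):
    """z_ij / (# reps ever at i); z is symmetric but the denominator is not."""
    if ever_at[i] == 0:
        return 0.0
    key = (i, j) if i < j else (j, i)
    return z[key] / ever_at[i]

ever_at, z = estimate_notime(careers)
print(transition("A", "B", ever_at, z))  # 2 / 4
print(transition("B", "A", ever_at, z))  # 2 / 2
```

The shared count z_AB is the same in both directions; only the denominator changes, so a small branch looks strongly tied to a large one but not the reverse.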
Tribe Size
Pairs
Commonality of Job Sequence
Disclosure Scores
Homogeneity and Mobility
Discussion
JOBS, PROB, PROB-TIME, PROB-NOTIME create tribes with higher than average disclosure scores
PROB creates more cross zip code results
PROB-TIME has higher phi-squared than all others
PROB favors large firms
Discussion
JOBS and YEARS compute larger connected components
JOBS and PROB find same number of tribes but pick different groups as tribes
Conclusions
With no explicit knowledge we can discover:
Job transitions
Geography
Career track
Conclusions
Needed:
Ongoing process
Multiple affiliations
Arbitrary times
Time is a paradox in domain
Thanks!
Time for: Questions Comments Smart Remarks