
Hunting the truffle: Predicting success in social systems by tracking key individuals

Computational Social Science Seminar

Zurich, 24.09.2019

Manuel Sebastian Mariani

URPP Social Networks (University of Zurich)

Institute of Fundamental and Frontier Sciences (UESTC, Chengdu)

| 31.01.2019

Hunting the truffle: Predicting success |

Individuals whose early adoptions predict success

Research questions

1. Are there individuals who repeatedly purchase in recently opened shops that later become successful? Who repeatedly adopt innovations early that later succeed?

2. If there are, how can we use them for success predictions?

3. If there are, which socioeconomic, demographic, and behavioral traits characterize them? Are they social hubs?


Contribution

Success prediction in social systems.

■ Linking individual-level behavioral patterns and the emergence of success.

■ Tracking and targeting the “right” individuals can help companies predict and enhance the success of their new products’ diffusion.

Important nodes in social systems.

■ Most of the literature on opinion leaders and influencers has focused on centrality.

■ Unlike these studies, we search for individuals whose adoptions predict success directly from the purchase time series, without looking into social network data.

■ We quantify the out-of-sample predictive power of different groups of individuals.


Outline

1 Success prediction in social systems: Background

2 Influencers: Do they have predictive power?

3 Discoverers of success

4 Quantifying the predictive power of different groups of individuals

5 Who are the discoverers of success?

6 Open challenges and take-home messages


Success prediction in social systems: Background


Prediction in social systems

There is increasing, interdisciplinary interest in prediction in social systems.

Topics covered: predicting online cascades, election outcomes, scientific impact, policy implications, and more.

■ Increasing availability of high-resolution data on socioeconomic and information systems (Lazer et al., Science, 2009; Gao et al., Physics Reports, 2019).

■ Growing interest among computational scientists in traditionally social-scientific topics, e.g., the evolution of social networks, the diffusion of information, and the generation of inequality (Hofman et al., Science, 2017).


Predicting success in social systems

Success prediction

■ Recent interdisciplinary efforts advanced our ability to predict success in diverse social systems.

■ Success is viewed as a collective phenomenon that emerges from the actions and interactions of the members of the social system.

■ It is typically measured in terms of popularity-based metrics.

Two main approaches to success prediction:

■ Modeling the success dynamics.

■ Machine learning.


Two main approaches in the literature on success

1. Modeling the success dynamics.

■ We start by unveiling the basic mechanisms that govern the dynamics of success.

■ We design an aggregate dynamic model that includes the observed mechanisms.

■ We validate the model by using it to predict future success based on early data.

2. Machine learning.

■ We design a classification/regression model that includes multiple features.

■ We aim to understand which combination of features leads to the best predictive accuracy.

■ We attempt to interpret the most predictive features.


The dynamics of success

In social systems, success can often be modeled as a combination of preferential attachment, fitness, and aging.

■ Preferential attachment. The future increase in popularity is proportional to current popularity.

■ Fitness. The popularity an item would get in the absence of preferential attachment.

■ Aging. Attractiveness decays over time.

Medo et al., PRL 2011


Success and fitness/performance/talent are not equivalent

■ Found in an experimental study by Salganik et al., Science 2006, and in empirical data on paper citations and the WWW.

■ Present in success dynamics models.

Relevance model

■ Probability that paper i receives a new citation at time t:

$P_i(t) \sim c_i(t)\, \eta_i\, f(t - t_i)$, (1)

where $c_i(t)$ denotes previous citations, $\eta_i$ the fitness, and $f(t - t_i)$ an aging function (Medo et al., PRL 2011).

■ According to mean-field theory, the expected citation count of paper i is

$c_i(\infty) \sim \exp(A\, \eta_i)$. (2)

■ Small differences in fitness can lead to wide differences in success.
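The two relations of the relevance model can be made concrete with a minimal sketch. The exponential aging function, the decay rate, and the value of the constant A are assumptions chosen only for illustration; they are not taken from the slides or from Medo et al.

```python
import math

def attachment_probabilities(citations, fitness, ages, decay=0.1):
    """Normalized version of Eq. (1): P_i proportional to c_i * eta_i * f(age_i).

    The aging function f(age) = exp(-decay * age) is an assumption of this sketch.
    """
    weights = [c * eta * math.exp(-decay * age)
               for c, eta, age in zip(citations, fitness, ages)]
    total = sum(weights)
    return [w / total for w in weights]

def success_ratio(eta_a, eta_b, A):
    """Ratio of expected final citation counts implied by Eq. (2)."""
    return math.exp(A * (eta_a - eta_b))

# Two papers with equal citations and age, but different fitness.
probs = attachment_probabilities(citations=[10, 10], fitness=[1.0, 2.0], ages=[5, 5])
print(probs)  # the paper with twice the fitness is twice as likely to be cited next

# With the assumed A = 10, a fitness gap of 0.2 gives a ratio of about exp(2).
print(success_ratio(1.2, 1.0, A=10.0))
```

Because fitness enters Eq. (2) through an exponential, the ratio of final citation counts grows exponentially with the fitness gap, which is the sense in which small fitness differences produce wide differences in success.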


Predicting scientific impact and bestseller sales

Success trajectories follow “universal” patterns. By properly rescaling the citation trajectories of different papers, a single dynamic curve is obtained.

Wang et al., Science (2013); Yucesoy et al., EPJ Data Science (2018).


Early detection of milestone papers and patents

Network centralities can substantially outperform citation counts in the early detection of milestone papers and patents.

Mariani et al., Journal of Informetrics (2016), Technol. Forecast. Soc. Change (2019).


Predicting individual scientific impact

The timing of a researcher’s “largest breakthrough” was found to be random.

Sinatra et al., Science (2016).

Predicting actors’ peak year

We can accurately predict whether an actor has already reached his/her productivity peak.

Williams et al., Nature Communications (2019).


Machine-learning approach

Example: Predicting the future success of an online cascade based on early activity.

Features

■ Temporal features. Early-adoption speed.

■ Structural features. The structure of the network around the early adopters.

■ Early adopters’ features. Specific information about the early adopters (popularity, reputation, activity levels).

■ Similarity features. How similar the early adopters are.

Difference between niche items and items of broad interest.


Predicting online cascades based on early adoptions

The speed of early adoption is the best predictor of content popularity, but properties of the early adopters might be predictive as well.

Chen et al., WWW (2014), Shulman et al., AAAI Conference on Web and Social Media (2016).


Predicting online cascades a priori

Content that evokes high-arousal positive (awe) or negative emotions (anger, anxiety) is more viral.

Berger et al., Journal of Marketing Research, 2011.


Researcher degrees of freedom

■ Choose the data and the items. For example, online cascades.

■ Define a success variable. For example, the total number of reshares.

Hofman, Sharma and Watts, Science 355, 486-488 (2017).


More degrees of freedom

Problem choice.

■ Ex-ante predictions. We attempt to use only information available before each store is opened.

■ Early detection. We are allowed to “peek” into early activity data on the store, e.g., the early purchases made in the store.

■ The general expectation is that predictive performance is higher in the latter scenario.

Which predictive model?

■ Simple models might have a clear interpretation.

■ Complex models with many features might lead to better performance, but reduced interpretability.

■ There can be a tradeoff between accuracy and interpretability.


Early detection: Peeking-based strategies

Problem

Given a set of items and data on their early adoptions, which are the most likely ones to become successful?

■ One question, a broad range of formulations.

■ How do we define the early-adoption window?

■ How much activity can we look into?

■ How do we define successful items?


Different studies, different choices

Shulman et al., Predictability of popularity: Gaps between prediction and understanding, 2016

The literature makes various choices, and the conclusions are not always consistent.


Predicting success in social systems: choices

■ Classification vs. regression.

■ Evaluation metrics?

■ Ex-ante predictions (explanation) vs. early detection.

■ Accuracy vs. interpretability.

Hybrid approach

Start from a question of interest, and then design the prediction problem (and the methods adopted) to answer the question.

Hofman, Sharma and Watts, Science 355, 486-488 (2017).


Predicting success in social systems: Main insights

■ “Success” can be quantified and predicted in diverse areas of human activity: Online content (cascades), Science (papers, researchers), Art (artists), Show business (actors), Bestsellers (authors, books).

■ Two main approaches to the success prediction problem:

  ■ Models of success dynamics.

  ■ Machine learning.

■ Through success dynamics models, researchers have found dynamical patterns that generalize across domains of human activity.

■ Machine learning approaches are typically used to predict the future popularity of online content.

■ Many researcher degrees of freedom need to be fixed, and the choice can substantially affect the results.


Influencers: Do they have predictive power?


Linking individual-level behavioral patterns and success prediction

■ Usually, research on success and innovation diffusion studies individual-level behavioral patterns and success/diffusion dynamics in isolation.

■ A long-standing assumption is that some individuals, referred to as influencers, can have a disproportionate influence on spreading processes.

Linking individuals and success

The main attempts to link individuals and success predictions have been made for the influencers in online systems.

■ Can the influencers accelerate a diffusion process?

■ Are the influencers’ adoptions predictive of diffusion success?


Influencers: three dimensions

■ Who one is. Personality traits, socio-demographic backgrounds, and lifestyles.

■ What one knows. Competence, such as one’s knowledge, expertise, or ability to provide information or guidance on particular issues.

■ Whom one knows. The structural position of the person in a network.

Muller and Peres, International Journal of Research in Marketing (2019), 36 (1): 3-19


The role of influencers: triggering larger cascades

According to diffusion models, influencers can trigger large spreading processes when targeted.

F. Iannelli, M. S. Mariani, and I. M. Sokolov. Physical Review E 98.6 (2018): 062302.

The role of influencers: rapidly dismantling a social network

F. Morone, and H. A. Makse. Nature 524.7563 (2015): 65.

The role of influencers: accelerating empirical diffusion

The influencers’ adoptions are predictive of success

■ Goldenberg et al. (2009) found that in an online social network, a small sample of social hubs offers accurate early-stage success predictions.

■ Chen et al. (2014) found that in predictive models with many different features, the early adoptions by central individuals are among the most significant predictors of success.


The role of influencers: inconsistent predictive signal

When using network-related traits for prediction, results do not generalize across platforms.

■ Weng et al. (2013) found that the number of communities infected by a piece of content is a more important predictor than the early adopters’ centrality.

■ Shulman et al. (2016) found that features based on the network structure around the early adopters are not consistently predictive of success.


The role of influencers: bottom line

■ According to diffusion models, influencers can trigger large spreading processes when targeted.

■ Removing the influencers can rapidly dismantle a network.

■ If our goal is to predict success, existing studies suggest that the predictive signal from the influencers is inconsistent.

Unlike most existing literature, we search for individuals relevant to success prediction directly from transaction data, without looking into social network data.


Discoverers of success


Defining the discoverers

Discoverers

Individuals who are repeatedly among the first ones to adopt innovations that later gain many adopters.

■ We will identify them from the purchase time series through a general procedure.

1. We count the number of discoveries, $d_i^*$, per individual.

2. We introduce a null model: everyone has the same likelihood of making a discovery.

3. Is $d_i^*$ surprising?

The identification procedure does not require social network data.


A nationwide socioeconomic system

We analyze two large, nationwide datasets:

■ Credit-card records (CCRs) from a large bank over a three-year temporal window (from June 2015 to May 2018).

■ Call data records (CDRs) from a large mobile phone operator over a one-year temporal window (2016).

Matching

We can partially match the two datasets:

■ The bank customers had to provide a mobile phone number to the bank.

■ Among the 251,405 telco customers, 146,762 (58.4%) are also bank customers.

■ Among the 1,417,937 bank customers, 146,762 (10.4%) are also telco customers.


Networks

We can build two networks:

■ CCRs → Individual-shop transaction network (18 months): Who bought where at which time.

We consider three networks corresponding to three shop categories:

  ■ Eating places. (Restaurants, bars, drinking places, etc.)

  ■ Clothing stores. (Children’s wear stores, shoe stores, etc.)

  ■ Food stores. (Grocery stores, supermarkets, candy stores, etc.)

■ CDRs → Individual-individual communication network (12 one-month snapshots of the undirected, unweighted network): Who is connected with whom.


Defining the discoverers I: Counting discoveries

■ The discoverers are defined in terms of discoveries.

■ Popular shops. A shop α is considered popular if, compared against shops of the same category launched in the same month, it is among the top-z% by final number of visitors, v.

■ Discovery. Individual i discovers shop α if i purchases in shop α no later than Δ days after α is opened, and α turns out to be a popular shop.

■ Δ and z are parameters of the method (in the following, z% = 10% and Δ = 90 days).

■ For each individual i, we count the number of discoveries, $d_i^*$.

■ Is $d_i^*$ surprising?
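The counting step can be sketched in a few lines. This is only an illustration under stated assumptions: the data layout (`shops`, `visits`), the function name `count_discoveries`, and the rounding and tie-breaking rule used to pick the top-z% are hypothetical and not taken from the slides.

```python
from collections import defaultdict

def count_discoveries(shops, visits, z=0.10, delta=90):
    """Count discoveries per individual.

    shops:  {shop_id: (category, launch_month, opening_day)}   (assumed layout)
    visits: iterable of (individual, shop_id, day), days as integers
    """
    # Final number of distinct visitors per shop.
    visitors = defaultdict(set)
    for person, shop, _day in visits:
        visitors[shop].add(person)

    # Shops are compared only within the same category and launch month.
    cohorts = defaultdict(list)
    for shop, (category, month, _opening) in shops.items():
        cohorts[(category, month)].append(shop)

    popular = set()
    for members in cohorts.values():
        ranked = sorted(members, key=lambda s: len(visitors[s]), reverse=True)
        n_top = max(1, round(z * len(ranked)))  # rounding rule is an assumption
        popular.update(ranked[:n_top])

    # A discovery: a purchase within delta days of opening in a popular shop.
    discovered = defaultdict(set)
    for person, shop, day in visits:
        opening_day = shops[shop][2]
        if shop in popular and day - opening_day <= delta:
            discovered[person].add(shop)
    return {person: len(found) for person, found in discovered.items()}

shops = {"a": ("eating", "2016-01", 0), "b": ("eating", "2016-01", 0)}
visits = [("u1", "a", 10), ("u2", "a", 90), ("u3", "a", 200), ("u1", "b", 5)]
print(count_discoveries(shops, visits))
```

In the toy data, shop "a" is the popular one; "u1" and "u2" visit it within 90 days and are each credited with one discovery, while "u3" arrives too late.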


Defining the discoverers II: Introducing a null model

We define a statistical null model where each individual is equally likely to make a discovery.

The null model

■ L is the total number of links, D the total number of discoveries, and $k_i$ the number of shops visited by i.

■ There are L marbles inside an urn, D of which are discoveries.

■ Individual i extracts $k_i$ marbles without replacement.

■ We expect $d_i = p\, k_i$ discoveries, where $p = D/L$.

■ The number of discoveries by individual i follows the hypergeometric distribution.
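The urn picture translates directly into the hypergeometric probability mass function. The sketch below uses only the Python standard library; the function names and the small numbers in the usage lines are illustrative assumptions.

```python
from math import comb

def hypergeom_pmf(d, L, D, k):
    """Probability of drawing exactly d discovery marbles when k marbles are
    extracted without replacement from an urn of L marbles, D of which are
    discoveries."""
    if d < max(0, k - (L - D)) or d > min(k, D):
        return 0.0
    return comb(D, d) * comb(L - D, k - d) / comb(L, k)

def expected_discoveries(L, D, k):
    """Mean of the hypergeometric distribution: p * k with p = D / L."""
    return k * D / L

# Toy urn: 50 links, 10 of which are discoveries; an individual with 5 visits.
L_links, D_disc, k_i = 50, 10, 5
pmf = [hypergeom_pmf(d, L_links, D_disc, k_i) for d in range(k_i + 1)]
mean_from_pmf = sum(d * prob for d, prob in enumerate(pmf))
print(mean_from_pmf, expected_discoveries(L_links, D_disc, k_i))
```

The mean computed from the probability mass function agrees with $p\, k_i$, which is the expectation quoted on the slide.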


Defining the discoverers III: Is $d_i^*$ surprising?

Observations vs. expectation

The discoverers’ number of discoveries is so high that it cannot be explained by chance (i.e., by the null model).

■ We define the statistical surprisal, $S_i(d_i^* \mid k_i)$, associated with the observed number of discoveries $d_i^*$:

$S_i(d_i^* \mid k_i) = -\log P(d_i \geq d_i^*)$

■ $P(d_i \geq d_i^*)$ is the probability that individual i would have made at least $d_i^*$ discoveries under the null model.

  ■ Low P → It is unlikely that individual i achieved $d_i^*$ discoveries by chance → High S.
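A minimal sketch of the surprisal computation follows, assuming the hypergeometric null model and a natural logarithm (the slides write only "log"). The function name and the toy numbers are illustrative.

```python
from math import comb, log, inf

def surprisal(d_obs, L, D, k):
    """S = -log P(d >= d_obs) under the hypergeometric null model.

    L: total links, D: total discoveries, k: shops visited by the individual.
    The natural logarithm is an assumption of this sketch.
    """
    upper = min(k, D)
    if d_obs > upper:
        return inf  # impossible under the null model
    # Exact integer arithmetic for the tail, one division at the end.
    tail = sum(comb(D, d) * comb(L - D, k - d) for d in range(d_obs, upper + 1))
    return -log(tail / comb(L, k))

# Toy urn: more observed discoveries means a smaller tail probability
# and therefore a larger surprisal.
for d_obs in range(0, 6):
    print(d_obs, round(surprisal(d_obs, L=50, D=10, k=5), 3))
```

Summing the tail with exact integers before dividing keeps very small probabilities accurate, which matters because the most interesting individuals are exactly those with tiny tail probabilities.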


How it works

■ Individual DB9594 ... (“Mark”) visited $k_{Mark} = 263$ eating places and collected $d^*_{Mark} = 44$ discoveries.

■ Mark’s expected number of discoveries under the null model was 13.74.

■ The probability that Mark achieved 44 or more discoveries under the null model was $P(d_{Mark} \geq d^*_{Mark}) \sim 10^{-11}$.

■ Mark’s surprisal is $S_{Mark} = 25.28$.
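The numbers quoted for Mark can be cross-checked with a few lines of arithmetic. The only assumption is that the surprisal uses the natural logarithm; under that reading the quoted surprisal and the quoted tail probability agree.

```python
import math

k_mark = 263          # eating places visited (from the slide)
expected = 13.74      # expected discoveries under the null model (from the slide)
surprisal_mark = 25.28

# Implied per-link discovery probability p = D / L, since expected = p * k.
p = expected / k_mark
print(round(p, 4))

# Inverting S = -ln P gives the tail probability implied by the surprisal.
tail_probability = math.exp(-surprisal_mark)
print(tail_probability)
```

The implied tail probability is of order $10^{-11}$, matching the slide, and the implied discovery probability is about 5% per visited eating place.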


Comparing against the null model

■ We resample the individuals’ number of discoveries by drawing from the distribution under the null hypothesis (bootstrap).

■ The surprisal values achieved by the top individuals are significantly larger than those achieved by the corresponding top individuals obtained with the bootstrap.
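The resampling step might look like the sketch below. It draws each individual's number of discoveries independently from the urn model, which is a simplifying assumption; the function names, the seed, and the toy inputs are hypothetical, and the authors' exact resampling procedure is not specified on the slide.

```python
import random
from math import comb, log

def tail_surprisal(d_obs, L, D, k):
    """-ln P(d >= d_obs) under the hypergeometric null model."""
    tail = sum(comb(D, d) * comb(L - D, k - d)
               for d in range(d_obs, min(k, D) + 1))
    return -log(tail / comb(L, k))

def sample_null_discoveries(L, D, k, rng):
    """Extract k of L marbles without replacement; marbles 0..D-1 are discoveries."""
    drawn = rng.sample(range(L), k)
    return sum(1 for marble in drawn if marble < D)

def top_null_surprisals(degrees, L, D, top_n, seed=0):
    """Surprisal of the top_n individuals in one resampled (null) dataset."""
    rng = random.Random(seed)
    values = [tail_surprisal(sample_null_discoveries(L, D, k, rng), L, D, k)
              for k in degrees]
    return sorted(values, reverse=True)[:top_n]

# Toy example: three individuals with 5, 10 and 20 visited shops.
null_top = top_null_surprisals(degrees=[5, 10, 20], L=1000, D=100, top_n=2)
print(null_top)
```

Comparing the observed top surprisal values against many such resampled top values is what supports the claim that the real top individuals are not explained by chance.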


Surprisal vs. activity

■ The surprisal metric is positively yet weakly correlated with the number of visited stores per individual.

■ Purchasing in many different stores does not necessarily make you a discoverer.


Quantifying the predictive power of different groups of individuals


Framing a predictive problem

Our predictive question

Are there specific groups of individuals whose presence/absence among the earliest customers of a shop is informative about the shop’s future success?

■ Our intuition: If the discoverers truly have the “habit” of visiting successful shops early, their presence (absence) among a shop’s early visitors might be a signal that the shop will (not) be popular.


Framing a predictive problem

■ Identification period (18 months). We identify four classes of relevant individuals:

  ■ From the CCRs: Discoverers, Store explorers.

  ■ From the CDRs: Social hubs / Influencers [Goldenberg et al., 2009; Morone and Makse, 2015], Explorers by mobility traits.

■ Validation period (12 months). We attempt to predict whether a previously unseen shop will be successful, given the presence/absence of previously identified individuals of a given class among its earliest visitors.

  ■ For each group I of individuals, we build a classifier that classifies a shop as successful if and only if an individual in I is found among the earliest V visitors.

  ■ The ground-truth group of successful shops comprises the shops that are ranked in the top-10% by total number of visitors, among shops of the same category and launched in the same month.
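The presence-based classifier described above is simple enough to sketch directly. The visit-record layout, the function names, and the tie-breaking among same-day visitors are assumptions made for illustration.

```python
def earliest_visitors(visits, shop, V):
    """Return the first V distinct visitors of a shop.

    visits: list of (individual, shop_id, day); same-day ties are broken by
    individual id, which is an arbitrary choice of this sketch.
    """
    ordered = sorted((day, person) for person, s, day in visits if s == shop)
    seen = []
    for _day, person in ordered:
        if person not in seen:
            seen.append(person)
    return seen[:V]

def predict_success(visits, shop, group, V=30):
    """Classify the shop as successful iff a member of `group` is among
    its earliest V visitors."""
    return any(person in group for person in earliest_visitors(visits, shop, V))

visits = [("u1", "s", 1), ("u2", "s", 2), ("u3", "s", 3), ("u1", "s", 4)]
discoverers = {"u3"}
print(predict_success(visits, "s", discoverers, V=2))  # u3 is the third visitor
print(predict_success(visits, "s", discoverers, V=3))
```

The classifier has no trainable parameters beyond the choice of group and of V, which is what makes the predictive power of different groups directly comparable.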


Evaluation metrics

For each group I, we build its classifier and we measure:

■ Precision / Success rate. Fraction of successful stores among the stores that received an early purchase by an individual in I.

  ■ We compare the success rate against the baseline given by a random classifier, obtaining the success-rate fold increase.

■ Recall. Fraction of successful stores that received an early visit by an individual in I.

■ Positive likelihood ratio. Probability that a store that received an early visit by an individual in I is successful, divided by the probability that a store that did not receive such a visit is successful.

■ Matthews correlation.
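The metrics above can be computed from the four confusion-matrix counts, as in the following sketch. The likelihood ratio is implemented as worded on the slide (a ratio of two success rates), which is not the same quantity as the textbook sensitivity / (1 - specificity) form. Taking the random-classifier baseline to be the overall fraction of successful shops is an assumption, as are the function and key names.

```python
from math import sqrt

def evaluate(predicted, actual):
    """predicted, actual: lists of booleans, one entry per shop.

    The sketch assumes every denominator below is nonzero.
    """
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))

    precision = tp / (tp + fp)              # success rate of flagged shops
    baseline = (tp + fn) / (tp + fp + fn + tn)  # assumed random baseline
    rate_without = fn / (fn + tn)           # success rate of unflagged shops
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "precision": precision,
        "fold_increase": precision / baseline,
        "recall": tp / (tp + fn),
        "likelihood_ratio": precision / rate_without,
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }

# Ten shops, two of them successful; the classifier flags two shops, one correctly.
predicted = [True, True] + [False] * 8
actual = [True, False, True] + [False] * 7
metrics = evaluate(predicted, actual)
print(metrics)
```

In the toy example the flagged shops succeed at 2.5 times the baseline rate, and at 4 times the rate of the shops that were not flagged.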


Predictive performance: Precision

Insights

■ The discoverers have the largest predictive power.

■ Explorers by radius of gyration are highly competitive for clothing stores.

■ The discoverers’ predictive power generalizes across categories, whereas the same does not hold for the other groups of individuals.

■ The social hubs’ predictive power is inconsistent.


Predictive performance: Matthews correlation

Insights

■ The discoverers have the largest predictive power.

■ Explorers by radius of gyration are highly competitive for clothing stores.

■ The discoverers’ predictive power generalizes across categories, whereas the same does not hold for the other groups of individuals.

■ The social hubs’ predictive power is inconsistent.


Combining groups of individuals

2-dimensional classifiers

For each pair $(I_1, I_2)$ of groups of individuals, we build a classifier that classifies a shop as successful if and only if both an individual in $I_1$ and an individual in $I_2$ are found among the earliest 30 visitors.

■ Can the presence of early customers from different groups of selected individuals be a stronger predictor of success?

■ Note: We are not attempting to maximize predictive performance, but to understand whether the co-presence of pairs of groups of individuals is a better predictor of success.
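The pairwise rule is a logical AND of two presence tests, as in this minimal sketch. The function name is hypothetical, and whether a single individual who belongs to both groups should satisfy both conditions is not stated on the slide; here it does, which is an assumption.

```python
def predict_success_pair(early_visitors, group_1, group_2):
    """Classify a shop as successful iff its earliest visitors include at
    least one member of group_1 and at least one member of group_2.

    early_visitors: the earliest visitors of the shop (e.g. the first 30).
    """
    early = set(early_visitors)
    return bool(early & set(group_1)) and bool(early & set(group_2))

discoverers = {"u1", "u7"}
social_hubs = {"u4"}
print(predict_success_pair(["u1", "u2", "u4"], discoverers, social_hubs))
print(predict_success_pair(["u1", "u2", "u3"], discoverers, social_hubs))
```

Because the rule is a conjunction, it can never flag more shops than either single-group classifier, so any gain in precision comes at a cost in recall.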


Combining groups of individuals

Insights

■ Combining the discoverers with other groups of individuals can improve the predictive power.

■ Combining the discoverers with the groups of social hubs does not improve the predictive power for clothing stores, where the hubs were underperforming when considered individually.


Who are the discoverers of success?


Who are the discoverers?

Once we have identified the discoverers and quantified their predictive performance, it is natural to ask which traits characterize them.

■ What is their typical age and gender?

■ How socially well-connected are they?

■ Do they explore many different stores?

■ Do they spend/travel a lot?

Our unique combination of bank and telco data allows us to answer these questions.


Socio-economic and demographic traits

■ The discoverers and store explorers tend to be above the median in network centrality, number of visited stores, and mobility diversity.

■ The discoverers are not outstanding in any of these traits, though.

■ The discoverers’ demographic traits are not consistent across categories.

■ The discoverers can have different traits than the store explorers (e.g., food stores).

■ The social hubs have above-median expenditures and number of visited stores.


Open challenges and take-home messages


Summary of results

We analyzed three years of CCRs covering an entire nation, and investigated whether there exist individuals whose early visits to a shop reveal increased odds of success for the visited shop.

■ Identification. The discoverers repeatedly make early visits to shops that later become successful. Their behavior cannot be explained by chance.

■ Predictive performance. The discoverers’ consistent performance cannot be achieved by any other group of top individuals. The social hubs’ performance is weaker and inconsistent.

■ Characterization. The discoverers exhibit consistent socio-economic traits (above-median centrality, expenditures, mobility), but they are not outstanding in any of them.


Limitations: Social network data

■ We implicitly assumed that mobile-phone communication data provide us with sufficiently good estimates of the individuals’ centrality in society.

■ On the other hand, such data can only provide us with a partial representation of the actual communication flows in society.

■ Obtaining a complete representation of the social communication patterns across an entire nation is not feasible.

■ At the same time, our study mimics a real-world problem where an organization only has access to incomplete social information about its customers.

■ We are attempting to generalize our results to online communities where the complete social network is available.


Limitations: Predictive performance

■ We aimed to uncover the predictive power hidden in the actions of specific groups of selected individuals.

■ This allowed us to address specific research questions.

■ At the same time, it might be limiting if one’s goal is purely predictive performance.

■ To that end, machine learning algorithms that include multiple features might achieve better performance.


Beyond the discoverers: Anticipating rising and declining popularity trends

■ Behaving as a discoverer is only one possible behavioral pattern.

■ We are currently searching for individuals with other interesting behavioral traits.

■ In particular, individuals who consistently

  ■ Anticipate rising popularity trends.

  ■ Anticipate declining popularity trends.

■ We can find these kinds of individuals (with little overlap with the discoverers), and they have out-of-sample predictive power.

■ We are developing an agent-based model of the heterogeneous adoption patterns of different groups of identified individuals.


Better understanding the discoverers

The discoverers’ number of contacts tends to be above the median, but the social mechanisms behind their emergence need to be clarified.

Research questions

■ Are they able to effectively influence their peers, acting as influencers?

■ Are they innovative individuals who adopt before all their peers?

■ Are they susceptible individuals who are quick to follow their peers’ actions?

■ Can we show that they have a “taste” that resonates well with their community or the population?


References: Background

▶ J. Goldenberg, et al. “The role of hubs in the adoption process.” Journal of Marketing 73.2 (2009): 1-13.

▶ B. Shulman, A. Sharma, and D. Cosley. “Predictability of popularity: Gaps between prediction and understanding.” Tenth International AAAI Conference on Web and Social Media. 2016.

▶ J. M. Hofman, A. Sharma, and D. J. Watts. “Prediction and explanation in social systems.” Science 355.6324 (2017): 486-488.


References: Early detection of seminal papers and patents

▶ M. S. Mariani, M. Medo, and Y.-C. Zhang. “Identification of milestone papers through time-balanced network centrality.” Journal of Informetrics 10.4 (2016): 1207-1223.

▶ M. S. Mariani, M. Medo, and F. Lafond. “Early identification of important patents: Design and validation of citation network metrics.” Technological Forecasting and Social Change 146 (2019): 644-654.


References: Discoverers

▶ M. Medo, et al. “Identification and impact of discoverers in online social systems.” Scientific Reports 6 (2016): 34218.

▶ M. S. Mariani, Y. Gimenez, J. Brea, M. Minnoni, R. Algesheimer, C. J. Tessone. “Predicting success in socioeconomic systems by tracking key individuals.” Working paper.


Thanks to my collaborators

URPP Social Networks: Rene Algesheimer, Francesco De Collibus, Claudio Juan Tessone

Grandata: Jorge Brea, Yanina Gimenez, Martin Minnoni


Take-home messages

1. We framed the problem of predicting the future success of a store based on its early customers. This lets us compare the predictive power of different groups of individuals.

2. The discoverers of success differ from the widely studied influencers / social hubs: they are not necessarily central in the social network, but their monetary transactions consistently reveal future success for the recipient of the transaction.

3. Once identified, agent-level patterns of adoptions/purchases can be leveraged to predict a collective phenomenon (success).


Manuel Sebastian Mariani

URPP Social Networks (University of Zurich)

Institute of Fundamental and Frontier Sciences (UESTC, Chengdu)

[email protected]

https://www.business.uzh.ch/en/research/professorships/networkscience/people/Dr-Manuel-Mariani.html
