Download - Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Transcript
Page 1: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Speaker: Enrico Palumbo, Data Scientist, Istituto Superiore Mario Boella

1

Page 2: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Innovation DiffusionPioneering work of Everett Rogers in 1962

2http://bryanmmathers.com/

Page 3: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Average Lifespan in S&P 500Average lifespan in S&P 500 has shrunk from 61 years in 1958 to 18 years in 2012

3

Page 4: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

S&P 500 turnoverSample companies that have entered and exited since 2002

4

Page 5: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Gartner’s Hype Cycle 2016

5

Predicting evolution of IT trends is paramount

Page 6: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Research questions

1. Are IT trends synchronous worldwide?

2. Is a good time for investing in the USA necessarily a good time for investing in Italy?

3. Is there a geographic diffusion process of IT trends?

4. Is there a delay between IT trends?

6

Page 7: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

A (Big) Data-driven approach

Datafication: digitalization is turning everything into data Google queries: enormous quantity of data (Big Data) reflecting people’s interestsGoogle Trend: measure of Google queries in a specific location and in a specific temporal interval

7

Study the geographic diffusion of IT trends through the analysis of Google Trend data

Page 8: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Google Trend Data

8

Page 9: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

From Google queries to Google Trend

1. Queries with semantically related keywords are aggregated in topic i2. Number of queries q(i,j,t) on topic i, in place j and at time t 3. Frequency of queries f(i,j,t) on topic i, in place j and at time t4. Rescaled frequency of queries: max f(i,j,t) = 100

9

Page 10: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Data Analysis

10

1. Raw Data

2. Trend

2a. Representation 2b. Peaks 2d. Delays

Smoothing

FittingMaxPlot

Country Time of peak

Japan 15-12-2013

2c. Clusters

Clustering

Page 11: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

SmoothingAdditive model: x(t) = m(t) + s(t) + e(t)

11

x(t)

m(t)

s(t)

e(t)

Trend

Raw data

Page 12: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Moving AverageMoving Average with annual resolution, to reduce random and seasonal effects

12

Page 13: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Data Analysis

13

1. Raw Data

2. Trend

2a. Representation 2b. Peaks 2d. Delays

Smoothing

FittingMaxPlot

Country Time of peak

Japan 15-12-2013

2c. Clusters

Clustering

Page 14: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

RepresentationDepiction of the Big Data trend from 2008 to 2015 for the eleven countries

14

Page 15: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Data Analysis

15

1. Raw Data

2. Trend

2a. Representation 2b. Peaks 2d. Delays

Smoothing

FittingMaxPlot

Country Time of peak

Japan 15-12-2013

2c. Clusters

Clustering

Page 16: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

PeaksCountry Time of peak Delay w.r.t. JAP [weeks]

Japan 15-12-2013 0

United States 21-12-2014 53

India 04-01-2015 55

Canada 14-06-2015 78

Great Britain 21-06-2015 79

Brazil 21-06-2015 79

Germany 21-06-2015 79

France 21-06-2015 79

Italy 21-06-2015 79

China 21-06-2015 79 16

Page 17: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Data Analysis

17

1. Raw Data

2. Trend

2a. Representation 2b. Peaks 2d. Delays

Smoothing

FittingMaxPlot

Country Time of peak

Japan 15-12-2013

2c. Clusters

Clustering

Page 18: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Clustering: algorithms

● Grouping similar curves together, taking into account global behavior● Time series = multi-dimensional vector (x(t0), x(t1) … x(tN))● K-means: need to set number of clusters and initial centroids upfront● Dbscan: need to set distance ε and min number of points upfront

Difficult to find a meaningful ε for time series

K-means with 3 clusters, initial centroids = (USA, Italy, China)18

Page 19: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Clustering: results

Grouping similar curves together, K-means algorithm: 3 groups

19

Page 20: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Data Analysis

20

1. Raw Data

2. Trend

2a. Representation 2b. Peaks 2d. Delays

Smoothing

FittingMaxPlot

Country Time of peak

Japan 15-12-2013

2c. Clusters

Clustering

Page 21: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Delays: model

● Model: Verhulst equation, model of growth with saturation, popular in innovation diffusion

EQUILIBRIASaturationP -> K: growth rate tend to 0

21

P-> 0: exponential growth with rate r

Page 22: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Delays: logistic Function

● Solution: with P(x0) = K/2, we obtain the logistic function

Initial growth rate

Carrying capacity

Mid-time

22

Page 23: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Delays: fitting

Method: non-linear least squaresModel: logistic function

Mid-time

23

Page 24: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Delays: formula

Delay: x0(Country 1) - x0(Country 2)

24

Page 25: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Delay with respect to JapanRank Country Delay w.r.t. JAP [weeks]

1 Japan 0

2 United States 30.4 ± 0.8

3 Great Britain 39.1 ± 0.9

4 Brazil 42.9 ± 1.2

5 India 50.1 ± 1.3

6 Germany 59.5 ± 0.9

7 Canada 64.9 ± 1.3

8 France 83.3 ± 0.9

9 Italy 116.3 ± 7.7

10 China 216.2 ± 5.9 25

Page 26: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Heatmap delay

CHI

JAP

CAN

FRA

ITA

DEU

IND

BRA

GBR

USA

CHIJAPUSA GBR BRA IND DEU ITA FRA CAN

200

150

100

50

0

-50

-100

-150

-200

26

Page 27: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Map of delay with respect to Japan

27

Page 28: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Other parametersCountry x0 K 1/r

Japan 188.9 土 0.8 48.0 土 0.3 22.8 土 0.7

United States 219.3 土 0.3 84.4 土 0.2 33.1 土 0.2

Great Britain 227.9 土 0.4 77.4 土 0.3 31.4 土 0.3

Brazil 231.7 土 0.9 42.8 土 0.4 22.0 土 0.8

India 239 土 1 84.1 土 0.7 36.5 土 0.7

Germany 248.3 土 0.5 77.7 土 0.4 33.5 土 0.4

Canada 254 土 1 80.4 土 0.7 41.7 土 0.6

France 272.1 土 0.4 70.5 土 0.3 38.0 土 0.3

Italy 305 土 8 84 土 4 66 土 3

China 405 土 6 221 土 20 44.3 土 0.7 28

Page 29: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Google search engine market share 2015Country Google market share

Japan 57%

United States 72%

Great Britain 90%

Brazil 95%

India 96%

Germany 94%

Canada 87%

France 92%

Italy 95%

China 9%*

http://returnonnow.com/internet-marketing-resources/2015-search-engine-market-share-by-country/ *http://www.chinainternetwatch.com/17415/search-engine-2012-2018e/

29

Page 30: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Correlation?

CH

JAUS

UKBR

IN

CA

FR

IT

GE

H0 = independenceKendall tau = -0.135P value = 0.653No correlation!

30

Page 31: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Summary: interest in Big Data

1. Clustering: USA, CAN, GBR, IND, DEU

2. Peak: JAP, USA, IND, CAN

3. Mid-time: JAP, USA, GBR, BRA, IND

USA, India, Canada, Great Britain, Japan31

Page 32: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Conclusions

1. Geography matters in studying the diffusion of trends, delays can be up to 4 years (Japan - China)

2. Big Data is still a growing trend in most countries under analysis

3. English speaking countries and Japan appear to be more responsive to the Big Data trend

32

Page 33: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Future work

1. Further validation extending the study to other trends and other countries

2. Use Google Trend Data to predict economic indicators (e.g. market share of the technology)

3. Model the diffusion process (e.g. complex network model)

33

Page 34: Innovation Diffusion: a (Big) Data-driven approach to the study of the geographic spreading of IT trends

Thank you!

Enrico Palumbo, [email protected], www.ismb.it, @enricopalumbo91

Innovation Development area, Istituto Superiore Mario Boella (ISMB)34