Analysis of Social Network Data: Estimating Peer Effects on Smoking

Huizi Xu, Mar. 2014 (Course Project for Stat-695)

Outline

Introduction

Exploring Network Features

How to model peer influences?

Introduction

• The data were collected through questionnaire surveys targeting 4094 students from six middle schools.

• The survey questions fell into these categories:
  • Friend nominations (the key information for constructing the social networks)
  • Demographics, economic status, academic status
  • Smoking status / attitudes / knowledge

• Goals of this presentation:
  • Explore the data and the network structure
  • Model peer influences on smoking behavior

2010 Adolescent Social Networks and Tobacco Use Survey

Figure 1: Word Cloud For The Questionnaire

Note: The survey was conducted in mainland China, where middle school students in the same class spend most of their school time in the same classroom. The class therefore forms a natural cluster in the network, and connections across classes are minimal.

Exploring Network Features

The Network of Friendships

Figure 2: Class 1 of School 1: measure of happiness

Level of happiness: 1 < 2 < 3 < 4 < 5 (white indicates ‘NA’).

Network Features

Subjects nominate up to 10 friends in the survey.

Figure 3 Figure 4

Table 1: Network features by school

School            1        2        3        4        5        6
Size              710      641      945      783      408      607
Edge count        4441     4578     7170     6487     2555     4398
Dyad count        503390   410240   892080   612306   166056   367842
Edge count/Size   6.3      7.1      7.6      8.3      6.3      7.2

An average student has 7 friends (overall edge-count-to-size ratio = 7.24).
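The derived rows of Table 1 can be recomputed from the sizes and edge counts. The sketch below assumes that the dyad count is the number of ordered pairs n(n − 1), i.e. that nominations are treated as directed; this is an inference from the table values, not something the slides state. The variable and function names are illustrative only.

```python
# Sizes and edge counts per school, copied from Table 1.
sizes = {1: 710, 2: 641, 3: 945, 4: 783, 5: 408, 6: 607}
edges = {1: 4441, 2: 4578, 3: 7170, 4: 6487, 5: 2555, 6: 4398}


def dyad_count(n):
    """Ordered pairs (i, j) with i != j among n students (assumed definition)."""
    return n * (n - 1)


# Recompute the two derived rows of the table.
dyads = {school: dyad_count(n) for school, n in sizes.items()}
ratios = {school: round(edges[school] / sizes[school], 1) for school in sizes}

# Overall ratio: total nominations divided by total number of students.
overall_ratio = sum(edges.values()) / sum(sizes.values())

for school in sizes:
    print(school, dyads[school], ratios[school])
print("overall:", round(overall_ratio, 2))
```

Under that assumption the recomputed dyad counts and ratios agree with the table, and the six school sizes add up to the 4094 students mentioned in the introduction.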

Animation (viewable in Adobe Reader)

Black node for smoker, white node for non-smoker.

Smoking status is based on both self-report and friend report, within 8 classes (one graph per class).

How to model peer influences?

A Weight Matrix to Capture the Network Structure

Table 2: A minimal example of the row-normalized adjacency matrix

     a    b    c    d    e    f    g    h    i
a    0    0.5  0.5  0    0    0    0    0    0
b    0.5  0    0.5  0    0    0    0    0    0
c    0.5  0.5  0    0    0    0    0    0    0
d    0    0    0    0    0    0    0    0    0
e    0    0    0    0    0    0    0    0    0
f    0    0    0    0    0    0    0    0    1
g    0    0    0    0    0    0    0    0    0
h    0    0    0    0    0    0    1    0    0
i    0    0    0    0    0    1    0    0    0
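A minimal sketch of how a matrix like Table 2 could be built from friend nominations. The `nominations` dictionary is a hypothetical input chosen so that its nonzero pattern matches Table 2; the function name and the choice to leave rows of students with no nominations as all zeros are assumptions made for illustration.

```python
labels = list("abcdefghi")

# Hypothetical directed nominations (ego -> nominated friends),
# chosen to match the nonzero entries of Table 2.
nominations = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "f": ["i"],
    "h": ["g"],
    "i": ["f"],
}


def row_normalized_weights(labels, nominations):
    """Build a 0/1 adjacency matrix, then divide each row by its row sum."""
    index = {name: k for k, name in enumerate(labels)}
    n = len(labels)
    W = [[0.0] * n for _ in range(n)]
    for ego, friends in nominations.items():
        for friend in friends:
            W[index[ego]][index[friend]] = 1.0
    for row in W:
        total = sum(row)
        if total > 0:  # students with no nominations keep an all-zero row
            for j in range(n):
                row[j] /= total
    return W


W = row_normalized_weights(labels, nominations)
for name, row in zip(labels, W):
    print(name, row)
```

Each nonzero row sums to 1, so multiplying W by an outcome vector gives, for every student, the average outcome among the friends that student nominated.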

Why Use Spatial Models?

• Highlight: modeling the correlation structure + controlling for common contextual effects

• In a weighted linear regression, the mean effect of multiple peers can hardly be identified separately from other social effects (e.g. the social context that is common to the whole group) (Manski, 1993).

• Spatial autoregressive models are more effective at identifying network effects from multiple peers (Anselin, 1988; Lee, 2007).

• There is a rich literature on spatial models for areal data:
  • Spatial autoregressive (SAR) model
  • Conditional autoregressive (CAR) model

Think of the network as a spatial random process:

The challenge is that, while real spatial data come with coordinates that locate the points, an absolute position can hardly be defined for network data.

A Brainstorm

Smoking Status as the Outcome

Construct one large weight matrix covering all the observations, and conduct a Moran’s I test to determine whether smoking status is spatially autocorrelated.

• Moran’s I statistic standard deviate = 50.4622, p-value < 2.2e-16

• Moran’s I statistic = 0.3983, Expectation = -2.443e-4, Variance = 6.238e-5
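For reference, a small sketch of the quantities reported above. It uses the textbook form of Moran's I, I = (n / S0) · (z'Wz) / (z'z), with z the centered outcome and S0 the sum of all weights, and the null expectation −1/(n − 1). The toy data are hypothetical, and the variance is taken as reported on the slide rather than recomputed, since the slides do not say which variance formula the test used.

```python
import math


def morans_i(y, W):
    """Moran's I for outcome list y and weight matrix W (list of rows)."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    s0 = sum(sum(row) for row in W)  # sum of all weights
    cross = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in z)


def expected_i(n):
    """Expectation of Moran's I under no spatial autocorrelation."""
    return -1.0 / (n - 1)


def standard_deviate(i_obs, i_exp, i_var):
    """Standardized statistic (observed - expected) / sqrt(variance)."""
    return (i_obs - i_exp) / math.sqrt(i_var)


# Hypothetical toy network: two mutual friend pairs, one pair smokes (1), one does not (0).
W_toy = [[0, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 1, 0]]
y_toy = [1, 1, 0, 0]
i_toy = morans_i(y_toy, W_toy)  # perfectly clustered outcome

# Consistency check of the reported figures, with n = 4094 students.
i_exp = expected_i(4094)
z_reported = standard_deviate(0.3983, -2.443e-4, 6.238e-5)
print(i_toy, i_exp, z_reported)
```

With n = 4094 the null expectation is about −2.443e-4, and plugging the reported statistic, expectation, and variance into the standardization gives roughly 50.46, in line with the standard deviate on the slide.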

Spatial Autoregressive (SAR) Model

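$$Y = \lambda W Y + X\beta + I_{\text{group}}\,\alpha + u \tag{1}$$

$$u = \rho W u + \varepsilon \tag{2}$$

Y: outcome variable (attitudes toward smoking)
W: row-normalized adjacency matrix (spatial weight matrix)

Rewriting the model to emphasize the “autocorrelation”:

$$(I - \lambda W)\,Y = (I - \lambda' W)\,X\beta + I_{\text{group}}\,\alpha + \varepsilon' \tag{3}$$

To be specific,

$$(I - \lambda W)\,\text{attitude} \sim (I - \lambda' W)\,[\beta_1\,\text{gender} + \beta_2\,\text{family} + \beta_3\,\text{weight} + \dots] + I_{\text{group}}\,\alpha + \varepsilon'$$

R package: spdep, function spautolm(). Ref: Lee (2010).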
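To illustrate what the λWY term in equation (1) does, here is a minimal sketch that solves Y = λWY + b for Y. It makes several simplifying assumptions that are not in the slides: the error autocorrelation ρ is set to 0, the group effects are dropped, and the vector b is a hypothetical stand-in for Xβ + ε. The value λ = 0.55 is borrowed from the fitted coefficient reported on the results slide, purely for illustration. This is not how spautolm() estimates the model; it only shows how one student's covariate effect propagates to friends.

```python
def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def solve_sar(W, b, lam, tol=1e-12, max_iter=10_000):
    """Solve Y = lam * W Y + b by fixed-point iteration.

    Converges when |lam| < 1 and every row of W sums to at most 1,
    which holds for a row-normalized weight matrix.
    """
    y = list(b)
    for _ in range(max_iter):
        wy = matvec(W, y)
        y_new = [lam * wy_i + b_i for wy_i, b_i in zip(wy, b)]
        if max(abs(u - v) for u, v in zip(y_new, y)) < tol:
            return y_new
        y = y_new
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical clique of three mutual friends with row-normalized weights.
W = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
lam = 0.55           # illustrative value, taken from the fitted coefficient
b = [1.0, 0.0, 0.0]  # stand-in for X beta + eps: only the first student is "pushed"

Y = solve_sar(W, b, lam)
residual = [y_i - lam * wy_i - b_i for y_i, wy_i, b_i in zip(Y, matvec(W, Y), b)]
print(Y)
```

Although only the first student has a nonzero entry in b, the other two end up with positive outcomes as well, which is the peer (spillover) effect that λ measures.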

Fitted SAR Model

• Existence of a peer effect, indicated by a fitted spatial coefficient of 0.55 (p < 0.001)

• More work needs to be done on model selection and model validation

Thank You!

Backup Slides

Just For Fun: How did those students pick their friends?

Question C16 in the survey questionnaire. Subjects could choose several of the 8 criteria.

The cluster dendrogram (on the left) is based on Euclidean distance.

Table: Count of subjects who chose each option
