Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users...

25
Semi-supervised Geolocation via Graph Convolutional Networks Afshin Rahimi, Trevor Cohn and Tim Baldwin July 16, 2018 1 / 25

Transcript of Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users...

Page 1: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Semi-supervised Geolocation via GraphConvolutional Networks

Afshin Rahimi, Trevor Cohn and Tim Baldwin

July 16, 2018

1 / 25

Page 2: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Location Lost in Translation

SocietySocial Media

impacts

impacts

2 / 25

Page 3: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Applications: Public Health Monitoring

Allergy Rates (Paul and Dredze, 2011)

3 / 25

Page 4: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Applications: Emergency Situation Awareness: Bushfires, Floods andEarthquakes

Fight bushfire with #fire: Alert hospital before anybody calls(Cameron et al., 2012)

4 / 25

Page 5: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Location Location Location

64%Profile

1%GPS

35%Nothing

0 50 100%users

Profile field is noisy (Hecht et. al, 2011), GPS data is scarce(Hecht and Stephens, 2014), and biased toward younger urban

users (Pavalanathan and Eisenstein, 2015)

5 / 25

Page 6: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Geolocation: The three Ls

Ain’t this place a geographical oddity; two weeks away from everywhere!

Link

Location

LanguageGeolocation

User geolocation is the task of identifying the “home” location of asocial media user using contextual information such asgeographical variation in language use and in social interactions.

6 / 25

Page 7: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Huge amounts of unlabelled data, little labelled data

Multiple views of Data: Text, Network

7 / 25

Page 8: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Previous Work (not exhaustive)

Text-based Supervised Classification

Backstrom et al. (2008)

Cheng et al. (2010)

Wing and Baldridge (2011, 2014)

Network-based Semi-supervised Regression

Backstrom et al. (2010)

Davis Jr et al. (2011)

Jurgens (2013)

No Text No NetworkJoint/Hybrid Text+Network

Rahimi et al. (2015)

Do et al. (2017)

Miura et al. (2017)Don’t utilise unlabelled text data

Our work: Text+Network Semi-supervised Geolocation8 / 25

Page 9: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Twitter Geolocation Datasets

GeoText TwitterUS TwitterWorld103

104

105

106

107

108

9k

420k

1.4m

370k

38m

12muserstweets

9 / 25

Page 10: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Discretisation of Labels

Cluster continuous lat/lon: cluster ids are labels.

Use the median training point of the predicted region as thefinal continuous prediction.

Evaluate using Mean and Median errors between the knownand the predicted coordinates.

10 / 25

Page 11: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Text and Network Views of Data

@-mention GraphStevenTrevor

MarkTim Karin

Normalised Adj. Matrix: A

0.5 0 0 0 0.5

0 0.5 0 0 0.5

0 0 1 0 0

0 0 0 0.5 0.5

0.25 0.25 0 0.25 0.25

Karin

Mark

Steven

Tim

Trevor

Kar

in

Mar

k

Stev

en

Tim

Tre

vor

Karin

Mark

Steven

Tim

Trevor

...

...

...

...

….

Mel

bour

ne

#ACL

2018

Text BoW: X

Two users are connected if they have a common @-mention.

11 / 25

Page 12: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Baseline 1: FeatConcat

Concatenate A and X , and feed them to a DNN:Y = f ([X ,A])

The dimensions of A, and consequently the number ofparameters grow with the number of samples.

12 / 25

Page 13: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Baseline 2: DCCA

maximally correlated

FC sigmoid

FC softmax

X : text BoW A: Neighbours

predicted location: y

FC linear

Unsupervised DCCA Supervised Geolocation

FC ReLU

CCA loss backprop

Learn a shared representation using Deep Canonical CorrelationAnalysis (Andrew et al., 2013):

ρ = corr(f1(X ), f2(A)) = cov(f1(X ),f2(A))√var(f1(X )).var(f2(A))

Y = f ([f1(X ), f 2(A)])

13 / 25

Page 14: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Proposed Model: GCN

Highway GCN:

Highway GCN: ,

Output GCN:

X

A

A

Atanh

tanh

softmax

H0

H1

H l−1

H l

predicted location: y

W l−1, bl−1, W l−1h , bl−1

h

W 1, b1, W 1h , b

1h

W l , bl

GCN Layer: H(l+1) = ReLU(AH(l)W (l) + b

)Adding more layers results in expanded neighbourhood smoothing:control with highway gates W l

h, blh

14 / 25

Page 15: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Highway GCN: Control Neighbourhood Smoothing

1 2 3 4 5 6 7 8 9 10

50

100

200

400

800

#layers

med

ian

erro

r(k

m)

-highway+highway

layer gates: T (~hl) = σ(W l

h~hl + blh

)layer output: ~hl+1 = ~hl+1 ◦ T (~hl) + ~hl ◦ (1− T (~hl))︸ ︷︷ ︸

weighted sum of layer input and output

15 / 25

Page 16: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Neighbourhood Smoothing

0.5 0 0 0 0.5

0 0.5 0 0 0.5

0 0 1 0 0

0 0 0 0.5 0.5

0.25 0.25 0 0.25 0.25

Karin

Mark

Steven

Tim

Trevor

Kar

in

Mar

k

Stev

en

Tim

Tre

vor

...

...

...

...

….

Text BoW: X

Karin

Mark

Steven

Tim

Trevor

Normalised Adj. Matrix: A

Smoothing immediate neighbourhood: A · Xsmoothing expanded neighbourhood: A · A · X

16 / 25

Page 17: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Sample Representation using t-SNE

FeatConcat [X, A]

1 GCN A · X

DCCA

2 GCN A · A · X17 / 25

Page 18: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Test Results: Median Error

GeoText TwitterUS TwitterWorld0

200

400

600

800

1000M

edia

nE

rror

inkm

TextSocialHybrid

18 / 25

Page 19: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Test Results: Median Error

GeoText TwitterUS TwitterWorld0

200

400

600

800

1000M

edia

nE

rror

inkm

TextSocialHybridJoint DCCAJoint FeatConcatJoint GCN

19 / 25

Page 20: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Test Results: Median Error

GeoText TwitterUS TwitterWorld0

200

400

600

800

1000M

edia

nE

rror

inkm

TextSocialHybridJoint DCCAJoint FeatConcatJoint GCNJoint Miura et. al (2017)Joint Do et. al (2017)

20 / 25

Page 21: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Top Features Learnt from Unlabelled Data (1% Supervision)

Seattle, WA Austin, TX Jacksonville, FL Columbus, OH

#goseahawks stubb unf laffayettesmock gsd ribault #weareohiotraffuck #meatsweats wahoowa #arcgisferran lanterna wjct #slamminpromissory pupper fscj #ouhcchowdown effaced floridian #cowckrib #austin #jacksonville mommyhood#uwhuskies lmfbo #mer beering

Top terms for a few regions detected by GCN using only 1% ofTwitter-US for supervision. The terms that existed in labelleddata are removed.

21 / 25

Page 22: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Dev. Results: How much labelled data do we really have?

6040201052140

400

800

1,200

labelled data (%samples)

med

ian

erro

r(k

m)

GCNDCCA

FeatConcat

GeoText

10050201052150

500

1,000

1,500

labelled data (%samples)

med

ianerror(km)

GCN

DCCA

FeatConcat

Twitter-World

10050201052140

500

1,000

1,500

2,000

labelled data (%samples)

med

ian

erro

r(k

m)

GCNDCCA

FeatConcat

Twitter-US

GeoText TwitterUS TwitterWorld0

500

1000

1500

2000

2500

Med

ian

Err

orin

km

Joint DCCA 1%Joint FeatConcat 1%Joint GCN 1%

Test results with 1% labelled data

22 / 25

Page 23: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Confusion Matrix Between True Location and Predicted Location

ME

MA RI NH VT CT NY NJ DE MD PA VA NC SC WV

OH FL GA MI

KY IN AL TN WI IL M

S LA MO AR MN IA KS NE OK TX SD ND WY

CO NM UT MT AZ ID NV CA WA

OR

Predicted

OR

WA

CA

NV

ID

AZ

MT

UT

NM

CO

WY

ND

SD

TX

OK

NE

KS

IA

MN

AR

MO

LA

MS

IL

WI

TN

AL

IN

KY

MI

GA

FL

OH

WV

SC

NC

VA

PA

MD

DE

NJ

NY

CT

VT

NH

RI

MA

ME

True

Users from smaller states are misclassified in nearby largerstates such as TX, NY, CA, and OH.

Users from FL are misclassified in several other states possiblybecause they are not born in FL, and are well connected totheir hometowns in other states.

23 / 25

Page 24: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Conclusion

Simple concatenation in FeatConcat is a strong baseline withlarge amounts of labelled data.

GCN performs well with both large and small amounts oflabelled data by effectively using unlabelled data.

Gating mechanisms (e.g. highway gates) are essential forcontrolling neighbourhood smoothing in GCN with multiplelayers.

The models proposed here are applicable to otherdemographic inference tasks.

24 / 25

Page 25: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,

Thank you!

Code available at:https://github.com/afshinrahimi/geographconv

25 / 25