Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users...
Transcript of Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users...
![Page 1: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/1.jpg)
Semi-supervised Geolocation via GraphConvolutional Networks
Afshin Rahimi, Trevor Cohn and Tim Baldwin
July 16, 2018
1 / 25
![Page 2: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/2.jpg)
Location Lost in Translation
SocietySocial Media
impacts
impacts
2 / 25
![Page 3: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/3.jpg)
Applications: Public Health Monitoring
Allergy Rates (Paul and Dredze, 2011)
3 / 25
![Page 4: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/4.jpg)
Applications: Emergency Situation Awareness: Bushfires, Floods andEarthquakes
Fight bushfire with #fire: Alert hospital before anybody calls(Cameron et al., 2012)
4 / 25
![Page 5: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/5.jpg)
Location Location Location
64%Profile
1%GPS
35%Nothing
0 50 100%users
Profile field is noisy (Hecht et. al, 2011), GPS data is scarce(Hecht and Stephens, 2014), and biased toward younger urban
users (Pavalanathan and Eisenstein, 2015)
5 / 25
![Page 6: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/6.jpg)
Geolocation: The three Ls
Ain’t this place a geographical oddity; two weeks away from everywhere!
Link
Location
LanguageGeolocation
User geolocation is the task of identifying the “home” location of asocial media user using contextual information such asgeographical variation in language use and in social interactions.
6 / 25
![Page 7: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/7.jpg)
Huge amounts of unlabelled data, little labelled data
Multiple views of Data: Text, Network
7 / 25
![Page 8: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/8.jpg)
Previous Work (not exhaustive)
Text-based Supervised Classification
Backstrom et al. (2008)
Cheng et al. (2010)
Wing and Baldridge (2011, 2014)
Network-based Semi-supervised Regression
Backstrom et al. (2010)
Davis Jr et al. (2011)
Jurgens (2013)
No Text No NetworkJoint/Hybrid Text+Network
Rahimi et al. (2015)
Do et al. (2017)
Miura et al. (2017)Don’t utilise unlabelled text data
Our work: Text+Network Semi-supervised Geolocation8 / 25
![Page 9: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/9.jpg)
Twitter Geolocation Datasets
GeoText TwitterUS TwitterWorld103
104
105
106
107
108
9k
420k
1.4m
370k
38m
12muserstweets
9 / 25
![Page 10: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/10.jpg)
Discretisation of Labels
Cluster continuous lat/lon: cluster ids are labels.
Use the median training point of the predicted region as thefinal continuous prediction.
Evaluate using Mean and Median errors between the knownand the predicted coordinates.
10 / 25
![Page 11: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/11.jpg)
Text and Network Views of Data
@-mention GraphStevenTrevor
MarkTim Karin
Normalised Adj. Matrix: A
0.5 0 0 0 0.5
0 0.5 0 0 0.5
0 0 1 0 0
0 0 0 0.5 0.5
0.25 0.25 0 0.25 0.25
Karin
Mark
Steven
Tim
Trevor
Kar
in
Mar
k
Stev
en
Tim
Tre
vor
Karin
Mark
Steven
Tim
Trevor
...
...
...
...
….
Mel
bour
ne
#ACL
2018
Text BoW: X
Two users are connected if they have a common @-mention.
11 / 25
![Page 12: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/12.jpg)
Baseline 1: FeatConcat
Concatenate A and X , and feed them to a DNN:Y = f ([X ,A])
The dimensions of A, and consequently the number ofparameters grow with the number of samples.
12 / 25
![Page 13: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/13.jpg)
Baseline 2: DCCA
maximally correlated
FC sigmoid
FC softmax
X : text BoW A: Neighbours
predicted location: y
FC linear
Unsupervised DCCA Supervised Geolocation
FC ReLU
CCA loss backprop
Learn a shared representation using Deep Canonical CorrelationAnalysis (Andrew et al., 2013):
ρ = corr(f1(X ), f2(A)) = cov(f1(X ),f2(A))√var(f1(X )).var(f2(A))
Y = f ([f1(X ), f 2(A)])
13 / 25
![Page 14: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/14.jpg)
Proposed Model: GCN
Highway GCN:
Highway GCN: ,
Output GCN:
X
A
A
Atanh
tanh
softmax
H0
H1
H l−1
H l
predicted location: y
W l−1, bl−1, W l−1h , bl−1
h
W 1, b1, W 1h , b
1h
W l , bl
GCN Layer: H(l+1) = ReLU(AH(l)W (l) + b
)Adding more layers results in expanded neighbourhood smoothing:control with highway gates W l
h, blh
14 / 25
![Page 15: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/15.jpg)
Highway GCN: Control Neighbourhood Smoothing
1 2 3 4 5 6 7 8 9 10
50
100
200
400
800
#layers
med
ian
erro
r(k
m)
-highway+highway
layer gates: T (~hl) = σ(W l
h~hl + blh
)layer output: ~hl+1 = ~hl+1 ◦ T (~hl) + ~hl ◦ (1− T (~hl))︸ ︷︷ ︸
weighted sum of layer input and output
15 / 25
![Page 16: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/16.jpg)
Neighbourhood Smoothing
0.5 0 0 0 0.5
0 0.5 0 0 0.5
0 0 1 0 0
0 0 0 0.5 0.5
0.25 0.25 0 0.25 0.25
Karin
Mark
Steven
Tim
Trevor
Kar
in
Mar
k
Stev
en
Tim
Tre
vor
...
...
...
...
….
Text BoW: X
✕
Karin
Mark
Steven
Tim
Trevor
Normalised Adj. Matrix: A
Smoothing immediate neighbourhood: A · Xsmoothing expanded neighbourhood: A · A · X
16 / 25
![Page 17: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/17.jpg)
Sample Representation using t-SNE
FeatConcat [X, A]
1 GCN A · X
DCCA
2 GCN A · A · X17 / 25
![Page 18: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/18.jpg)
Test Results: Median Error
GeoText TwitterUS TwitterWorld0
200
400
600
800
1000M
edia
nE
rror
inkm
TextSocialHybrid
18 / 25
![Page 19: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/19.jpg)
Test Results: Median Error
GeoText TwitterUS TwitterWorld0
200
400
600
800
1000M
edia
nE
rror
inkm
TextSocialHybridJoint DCCAJoint FeatConcatJoint GCN
19 / 25
![Page 20: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/20.jpg)
Test Results: Median Error
GeoText TwitterUS TwitterWorld0
200
400
600
800
1000M
edia
nE
rror
inkm
TextSocialHybridJoint DCCAJoint FeatConcatJoint GCNJoint Miura et. al (2017)Joint Do et. al (2017)
20 / 25
![Page 21: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/21.jpg)
Top Features Learnt from Unlabelled Data (1% Supervision)
Seattle, WA Austin, TX Jacksonville, FL Columbus, OH
#goseahawks stubb unf laffayettesmock gsd ribault #weareohiotraffuck #meatsweats wahoowa #arcgisferran lanterna wjct #slamminpromissory pupper fscj #ouhcchowdown effaced floridian #cowckrib #austin #jacksonville mommyhood#uwhuskies lmfbo #mer beering
Top terms for a few regions detected by GCN using only 1% ofTwitter-US for supervision. The terms that existed in labelleddata are removed.
21 / 25
![Page 22: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/22.jpg)
Dev. Results: How much labelled data do we really have?
6040201052140
400
800
1,200
labelled data (%samples)
med
ian
erro
r(k
m)
GCNDCCA
FeatConcat
GeoText
10050201052150
500
1,000
1,500
labelled data (%samples)
med
ianerror(km)
GCN
DCCA
FeatConcat
Twitter-World
10050201052140
500
1,000
1,500
2,000
labelled data (%samples)
med
ian
erro
r(k
m)
GCNDCCA
FeatConcat
Twitter-US
GeoText TwitterUS TwitterWorld0
500
1000
1500
2000
2500
Med
ian
Err
orin
km
Joint DCCA 1%Joint FeatConcat 1%Joint GCN 1%
Test results with 1% labelled data
22 / 25
![Page 23: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/23.jpg)
Confusion Matrix Between True Location and Predicted Location
ME
MA RI NH VT CT NY NJ DE MD PA VA NC SC WV
OH FL GA MI
KY IN AL TN WI IL M
S LA MO AR MN IA KS NE OK TX SD ND WY
CO NM UT MT AZ ID NV CA WA
OR
Predicted
OR
WA
CA
NV
ID
AZ
MT
UT
NM
CO
WY
ND
SD
TX
OK
NE
KS
IA
MN
AR
MO
LA
MS
IL
WI
TN
AL
IN
KY
MI
GA
FL
OH
WV
SC
NC
VA
PA
MD
DE
NJ
NY
CT
VT
NH
RI
MA
ME
True
Users from smaller states are misclassified in nearby largerstates such as TX, NY, CA, and OH.
Users from FL are misclassified in several other states possiblybecause they are not born in FL, and are well connected totheir hometowns in other states.
23 / 25
![Page 24: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/24.jpg)
Conclusion
Simple concatenation in FeatConcat is a strong baseline withlarge amounts of labelled data.
GCN performs well with both large and small amounts oflabelled data by effectively using unlabelled data.
Gating mechanisms (e.g. highway gates) are essential forcontrolling neighbourhood smoothing in GCN with multiplelayers.
The models proposed here are applicable to otherdemographic inference tasks.
24 / 25
![Page 25: Semi-supervised Geolocation via Graph Convolutional Networks · GPS 1% Nothing 35% 0 50 100 %users Pro le eld is noisy (Hecht et. al, 2011), GPS data is scarce (Hecht and Stephens,](https://reader033.fdocuments.us/reader033/viewer/2022060521/604faea6a2654b4f2b2281ba/html5/thumbnails/25.jpg)
Thank you!
Code available at:https://github.com/afshinrahimi/geographconv
25 / 25