Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.
![Page 1: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/1.jpg)
Mobility analysis from Twitter data
NTTS 2015 - satellite Workshop on Big Data
![Page 2: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/2.jpg)
Twitter as data source
NoSQL Database
Filter by: Geo-referenced Only
México
Real-time Tweets
INEGI
Twitter
![Page 3: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/3.jpg)
Why Twitter?
• Availability
• 1% of tweets available without cost
• Around 12 M accounts in Mexico
• 700,000 accounts are geo-referenced
• Collection of 150 M tweets since January 2014
![Page 4: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/4.jpg)
Devices generating tweets in Mexico
Android
iPhone
![Page 5: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/5.jpg)
Tweet collection infrastructure
Unix “Red Hat”
NoSQL Database “Elasticsearch”
Cluster (Hydra)
Big Data Layers
Proof of Concept
![Page 6: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/6.jpg)
General Process
Every Day Collection
Store Geo-Referenced Tweets
15M
?
Set an Objective
Filter and Process
Generate outputs
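The "filter and process" step can be sketched as below. This is a minimal illustration, not INEGI's pipeline: it assumes tweets shaped like the Twitter API v1.1 JSON payload, where `coordinates` is a GeoJSON point in longitude, latitude order. The bounding box values and the helper names `geo_point` and `filter_geo_referenced` are assumptions made for the example.

```python
# Rough bounding box for Mexico as (min_lon, min_lat, max_lon, max_lat).
# Illustrative values only; a real filter would use official boundaries.
MEXICO_BBOX = (-118.5, 14.5, -86.5, 32.8)


def geo_point(tweet):
    """Return (lon, lat) if the tweet carries a point location, else None."""
    coords = tweet.get("coordinates")
    if not coords or coords.get("type") != "Point":
        return None
    lon, lat = coords["coordinates"]  # GeoJSON order: longitude first
    return lon, lat


def filter_geo_referenced(tweets, bbox=MEXICO_BBOX):
    """Yield only tweets with a point location inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    for tweet in tweets:
        point = geo_point(tweet)
        if point is None:
            continue  # not geo-referenced
        lon, lat = point
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            yield tweet


# Hypothetical sample records.
sample = [
    {"id": 1, "coordinates": {"type": "Point", "coordinates": [-99.13, 19.43]}},
    {"id": 2, "coordinates": None},
    {"id": 3, "coordinates": {"type": "Point", "coordinates": [2.35, 48.86]}},
]
kept = [t["id"] for t in filter_geo_referenced(sample)]
```

A bounding box also covers parts of neighbouring countries, so an actual filter would probably need a point-in-polygon test against state boundaries.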
![Page 7: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/7.jpg)
Topics
• Mobility
– Internal flows
– Tourism
– Border commuting
– National Roads Network: use of roads (planned)
– Urban influence zones (planned)
• Subjective wellness
– Based on text
– Based on emoticons
![Page 8: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/8.jpg)
![Page 9: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/9.jpg)
Geo-referenced Tweets 2014
![Page 10: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/10.jpg)
![Page 11: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/11.jpg)
Internal mobility (from-to)
Where do we go when tweeting?
[Map panels: To Mexico City; From Mexico City. Labeled areas: DF, México State]
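One possible way to derive from-to flows is to order each account's tweets in time and count changes of state between consecutive tweets. The sketch below assumes each tweet has already been assigned to a state; the record layout `(user_id, timestamp, state)` and the function name `od_flows` are hypothetical, and the slides do not say which method was actually used.

```python
from collections import Counter, defaultdict


def od_flows(records):
    """Count origin-destination moves between consecutive tweets per user.

    records: iterable of (user_id, timestamp, state) tuples.
    Returns a Counter keyed by (origin_state, destination_state).
    """
    by_user = defaultdict(list)
    for user, ts, state in records:
        by_user[user].append((ts, state))

    flows = Counter()
    for visits in by_user.values():
        visits.sort()  # chronological order
        for (_, prev), (_, curr) in zip(visits, visits[1:]):
            if prev != curr:  # a change of state counts as one move
                flows[(prev, curr)] += 1
    return flows


# Hypothetical sample data.
records = [
    ("u1", 1, "México State"), ("u1", 2, "DF"),
    ("u1", 3, "DF"), ("u1", 4, "México State"),
    ("u2", 1, "DF"), ("u2", 2, "Puebla"),
]
flows = od_flows(records)
```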
![Page 12: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/12.jpg)
Internal Tourism
Origin of Tourists visiting
Guanajuato (1-3 February 2014)
![Page 13: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/13.jpg)
Internal Tourism
Origin of Tourists visiting
Puebla (1-3 February 2014)
![Page 14: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/14.jpg)
Use of Twitter on long weekends
Displacements to Puebla and Guanajuato before, during and after the 1-3 February period
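The origin of visitors can be approximated by assigning each account a "home" state and then counting the accounts seen in the destination during the period. The sketch below takes the most frequent state per account as home, which is an assumption of this example rather than a rule stated in the slides; the record layout and function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def home_states(records):
    """Assume a user's home is the state where they tweet most often."""
    counts = defaultdict(Counter)
    for user, day, state in records:
        counts[user][state] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


def visitor_origins(records, destination, start, end):
    """Count non-resident users seen in `destination` between start and end."""
    homes = home_states(records)
    visitors = {
        user for user, day, state in records
        if state == destination and start <= day <= end
    }
    return Counter(homes[u] for u in visitors if homes[u] != destination)


# Hypothetical sample data: (user_id, day, state).
records = [
    ("u1", date(2014, 1, 10), "DF"), ("u1", date(2014, 1, 20), "DF"),
    ("u1", date(2014, 2, 2), "Guanajuato"),
    ("u2", date(2014, 1, 15), "Guanajuato"),
    ("u2", date(2014, 2, 1), "Guanajuato"),
    ("u3", date(2014, 1, 5), "Jalisco"), ("u3", date(2014, 1, 6), "Jalisco"),
    ("u3", date(2014, 2, 3), "Guanajuato"),
    ("u4", date(2014, 1, 5), "DF"), ("u4", date(2014, 1, 8), "DF"),
    ("u4", date(2014, 3, 1), "Guanajuato"),
]
origins = visitor_origins(
    records, "Guanajuato", date(2014, 2, 1), date(2014, 2, 3)
)
```

Running the same count over the days before and after the long weekend would give the before, during and after comparison.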
![Page 15: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/15.jpg)
Border commuting
• México
• USA
![Page 16: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/16.jpg)
National Roads Network
![Page 17: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/17.jpg)
Urban Influence zones
![Page 18: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/18.jpg)
Subjective Wellness
• Complement of existing survey
– Subjective perceived wellness (monthly)
• Two approaches
– Based on emoticons (possible international comparability)
• Netherlands experiments
– Based on text (diversity of analysis, regionalisms)
• Text analysis infrastructure development
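The emoticon-based approach can be illustrated with a simple monthly balance of positive versus negative tweets. This is only a sketch: the emoticon lists, the balance formula and the names `emoticon_polarity` and `monthly_balance` are assumptions of the example, not the indicator described in the slides.

```python
from collections import defaultdict

# Small illustrative emoticon sets (assumed, far from exhaustive).
POSITIVE = (":)", ":-)", ":D", ";)")
NEGATIVE = (":(", ":-(", ":'(")


def emoticon_polarity(text):
    """Return 1, -1 or 0 depending on which emoticons dominate the text."""
    pos = sum(text.count(e) for e in POSITIVE)
    neg = sum(text.count(e) for e in NEGATIVE)
    if pos > neg:
        return 1
    if neg > pos:
        return -1
    return 0


def monthly_balance(tweets):
    """Return {month: (positive - negative) / (positive + negative)}.

    tweets: iterable of (month, text). Tweets without a clear polarity
    are ignored; months with no polarised tweets are omitted.
    """
    tallies = defaultdict(lambda: [0, 0])  # month -> [positive, negative]
    for month, text in tweets:
        polarity = emoticon_polarity(text)
        if polarity > 0:
            tallies[month][0] += 1
        elif polarity < 0:
            tallies[month][1] += 1
    return {m: (p - n) / (p + n) for m, (p, n) in tallies.items()}


# Hypothetical sample data.
tweets = [
    ("2014-01", "Gran día :)"),
    ("2014-01", "Tráfico otra vez :("),
    ("2014-01", "Fiesta :D :)"),
    ("2014-01", "sin emoticonos"),
    ("2014-02", "Lunes :-("),
]
balance = monthly_balance(tweets)
```

Because emoticons are largely language-independent, a score of this kind is what makes the international comparability mentioned above plausible.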
![Page 19: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/19.jpg)
Methods and Tools
• Pioanalisis: Tool for collection of the training set (crowdsourcing)
• Machine learning (supervised and unsupervised), Support Vector Machines, Incremental Learning
• Random forest, Latent Dirichlet Allocation (LDA)
• SOM Neural Networks (SOM: Self-Organizing Map)
• Classification Methods: Naive Bayes, Support Vector Machines (SVM), KNN, Word Count
• Dictionaries: Spanish Emotion Lexicon (SEL), KNN, AFINN, WordNet, ANEW
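As a rough illustration of the dictionary-based "word count" idea, the sketch below sums per-word valences from a lexicon. The tiny lexicon and its scores are invented for the example (they are not taken from SEL, AFINN or ANEW), and the function names are hypothetical.

```python
import re

# Toy Spanish lexicon with made-up valence scores, for illustration only.
TOY_LEXICON = {
    "feliz": 3, "excelente": 3, "bueno": 2,
    "triste": -2, "malo": -2, "horrible": -3,
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the valence of every token found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def classify(text, lexicon=TOY_LEXICON):
    """Label a text as positive, negative or neutral from its score."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = classify("Qué tráfico tan horrible")
```

A lexicon approach needs no training set, but it misses regionalisms and negation, which is one reason the supervised classifiers listed above may be preferred.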
![Page 20: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/20.jpg)
Partnerships
• International
– UNECE
• ICHEC
– UNSD
– LAMBDoop
– University of Pennsylvania
• National
– KioNetworks
• Dattlas
– TecMilenio
– INFOTEC
– Centro Geo
– CIDE
– CIMAT
– Sectur
• Internal
– INEGI General Directorates
![Page 21: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/21.jpg)
Conclusions
• We are in a discovery stage:
– Findings going from ‘interesting’ to ‘valuable’
• A lot of research needed:
– … but we are getting a lot of knowledge and experience
• Partnerships are a must
• Combining other big data sources is an imminent next step
• New challenges and threats will appear
– Cost increases?
– Legal issues?
– Re-engineering of methodologies and quality frameworks?
– Evolution of traditional statistics?
• A lot of etcetera?
![Page 22: Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.](https://reader037.fdocuments.us/reader037/viewer/2022110400/56649dd05503460f94ac4e82/html5/thumbnails/22.jpg)
New statistics production landscape?