Posted on 11-Jun-2015
Finding Your Friends and Following Them to Where You Are
Adam Sadilek, Henry Kautz, Jeffrey P. Bigham
University of Rochester, New York, USA
Presenter: Yoh Okuno #wsdm2012
About Presenter
• Name: Yoh Okuno
• R&D Engineer at Yahoo! Japan
• Interests: NLP (Natural Language Processing), machine learning, and data mining
• Skills: C/C++, Java, Python, and Hadoop
• Website: http://yoh.okuno.name/
Overview
1. Introduction
2. Friendship Prediction
3. Location Prediction
4. Evaluation
5. Conclusion
1. Introduction
“Check-in” Services or Posts with Geo-tags
Figure 1: Tweets with Geo-tags in New York City
http://cs.rochester.edu/u/sadilek/research
Summary: Predicting Friendships and Locations
• Tasks: friendship and location prediction
• Approach: model interaction between them
• Data: real-world Twitter dataset
• Problem: locations of private users are not available
• Result: 90% of private locations are revealed
Data: Crawled via the Twitter Search API for 1 Month
• Focus on users who have more than 100 geo-tagged tweets
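The user filter on this slide can be sketched in a few lines. This is a minimal sketch, assuming each crawled tweet has been reduced to a hypothetical `(user_id, has_geotag)` pair; the actual crawler's record format is not described in the talk.

```python
from collections import Counter

def geo_active_users(tweets, min_geo_tweets=100):
    """Return ids of users with MORE than `min_geo_tweets` geo-tagged tweets.

    `tweets` is assumed (hypothetically) to be an iterable of
    (user_id, has_geotag) pairs.
    """
    # Count only tweets that carry a geo-tag.
    counts = Counter(user for user, has_geotag in tweets if has_geotag)
    # Keep users strictly above the threshold (">100" on the slide).
    return {user for user, n in counts.items() if n > min_geo_tweets}
```

A user with exactly 100 geo-tagged tweets is excluded, matching the strict ">100" on the slide.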
FLAP: Friendship + Location Analysis and Prediction
Crawler
Visualizer
Learning and Inference
2. Friendship Prediction Task
Similarity Features: Text, Location, and Graph
1. Text: inner product of the users' word vectors, excluding stop words
2. Co-location: total time the two users spend in the same place
3. Graph: number of common friends (normalized)
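The three pairwise similarity features might look roughly like the sketch below. The stop-word list, the `(place, start, end)` visit format, and normalizing common friends by the smaller friend list are all illustrative assumptions; the slide does not give these details.

```python
from collections import Counter

# Tiny illustrative stop-word list (assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "at"}

def text_similarity(tweets_u, tweets_v):
    """Inner product of bag-of-words count vectors, ignoring stop words."""
    def bag(tweets):
        return Counter(w for t in tweets for w in t.lower().split()
                       if w not in STOP_WORDS)
    bu, bv = bag(tweets_u), bag(tweets_v)
    # Counter returns 0 for missing words, so only shared words contribute.
    return sum(count * bv[word] for word, count in bu.items())

def colocation_overlap(visits_u, visits_v):
    """Total time overlap at identical places; visits are (place, start, end)."""
    total = 0
    for place_u, s_u, e_u in visits_u:
        for place_v, s_v, e_v in visits_v:
            if place_u == place_v:
                total += max(0, min(e_u, e_v) - max(s_u, s_v))
    return total

def common_friends(friends_u, friends_v):
    """Number of common friends, normalized by the smaller friend set (assumed)."""
    if not friends_u or not friends_v:
        return 0.0
    return len(friends_u & friends_v) / min(len(friends_u), len(friends_v))
```

For example, `text_similarity(["the cat sat"], ["a cat ran cat"])` counts "cat" once on one side and twice on the other, giving 2.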
Learning: Regression Decision Tree (DT)
• The DT outputs the probability that two users are friends
• These 3 features had the highest information gain in the DT
• Other features, including the Jaccard coefficient, were not useful in this case
• Locality-sensitive hashing (LSH) avoids the O(n^2) comparison of all user pairs
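The talk does not say which LSH family was used. The sketch below assumes MinHash with banding over each user's set of items (for example words or visited places), which is one standard way to find candidate pairs without comparing all n^2 pairs. Python's built-in `hash` over `(seed, item)` stands in for a random hash function, which is an approximation; band and row counts are arbitrary.

```python
import random
from collections import defaultdict

def minhash_signature(items, seeds):
    # One minimum hash value per seed; hash((seed, x)) approximates a
    # random hash function. String hashes vary between interpreter runs
    # unless PYTHONHASHSEED is fixed, but are consistent within a run.
    return [min(hash((seed, x)) for x in items) for seed in seeds]

def lsh_candidate_pairs(user_items, num_bands=8, rows_per_band=4, seed=0):
    """Return user pairs whose signatures agree on at least one band."""
    rng = random.Random(seed)
    seeds = [rng.getrandbits(32) for _ in range(num_bands * rows_per_band)]
    buckets = defaultdict(set)
    for user, items in user_items.items():
        if not items:
            continue  # empty sets have no MinHash signature
        sig = minhash_signature(items, seeds)
        for b in range(num_bands):
            band = tuple(sig[b * rows_per_band:(b + 1) * rows_per_band])
            buckets[(b, band)].add(user)
    pairs = set()
    for users in buckets.values():
        ordered = sorted(users)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs
```

Only the returned candidate pairs would then be scored by the decision tree; more bands raise recall at the cost of more candidates.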
3. Location Prediction Task
Figure 3: Dynamic Bayesian Network (DBN)
• Each user may move between tweets t and t+1
– u_t: location of user u at tweet t
– f_t^i: location of friend i at tweet t
– td_t: time of day at tweet t
– w_t: whether tweet t falls on a workday
• All variables are discrete
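Since all DBN variables are discrete, raw timestamps and coordinates must be mapped to a finite set of values. This is a minimal sketch under stated assumptions: four time-of-day bins, Monday to Friday as workdays (holidays ignored), and a square latitude/longitude grid for locations. The talk does not specify any of these choices.

```python
import math
from datetime import datetime

def time_features(ts: datetime, bins_per_day: int = 4):
    """Map a timestamp to (td, w): a time-of-day bin and a workday flag."""
    td = ts.hour * bins_per_day // 24   # e.g. 0..3 for 6-hour bins
    w = ts.weekday() < 5                # Monday=0 ... Friday=4 (assumed rule)
    return td, w

def location_cell(lat: float, lon: float, cell_deg: float = 0.01):
    """Map coordinates to a discrete grid cell (hypothetical discretization)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
```

For example, `time_features(datetime(2012, 2, 8, 13, 30))` falls on a Wednesday afternoon and yields bin 2 with the workday flag set.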
Learning: Both Supervised and Unsupervised
• Supervised: learned separately for each geo-active user
• Unsupervised: simulate “virtual” private users
– EM algorithm with forward-backward inference
– Simulated annealing to avoid local optima
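The E-step of EM needs posteriors over the hidden locations, which forward-backward computes. The sketch below simplifies the DBN to a single hidden chain (the user's location) with one discrete observation per step, for example a friend's location. It omits the time-of-day and workday variables, the M-step, and simulated annealing, so it only illustrates the inference step, not the full FLAP model.

```python
def forward_backward(obs, init, trans, emit):
    """Posterior P(state_t | all observations) for a discrete HMM.

    init[i] = P(state_0 = i), trans[i][j] = P(next = j | current = i),
    emit[i][o] = P(observation o | state i). Messages are normalized at
    every step to avoid numerical underflow.
    """
    n, T = len(init), len(obs)
    # Forward pass: alpha[t][i] proportional to P(obs[0..t], state_t = i).
    alpha = []
    a = [init[i] * emit[i][obs[0]] for i in range(n)]
    s = sum(a)
    alpha.append([x / s for x in a])
    for t in range(1, T):
        a = [emit[j][obs[t]] * sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
             for j in range(n)]
        s = sum(a)
        alpha.append([x / s for x in a])
    # Backward pass: beta[t][i] proportional to P(obs[t+1..] | state_t = i).
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b = [sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
             for i in range(n)]
        s = sum(b)
        beta[t] = [x / s for x in b]
    # Combine: per-step scaling constants cancel after renormalization.
    posteriors = []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(n)]
        s = sum(g)
        posteriors.append([x / s for x in g])
    return posteriors
```

In a full EM loop, these posteriors (plus pairwise transition posteriors) would be used to re-estimate `trans` and `emit`; simulated annealing would then perturb that search to escape poor local optima.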
4. Evaluation
Evaluation for Friendship Prediction Task
• Evaluation settings
– Reconstructed friendship graphs with the models
– Randomly selected 0% to 50% of the edges as known input
• Evaluation results
– FLAP outperforms previous work
– FLAP works well even when no edges are given
• Note: texts and locations are still available as input
Figure 4: Averaged ROC Curve
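The evaluation protocol above can be sketched as two steps: reveal a random fraction of edges as known input, then score the hidden pairs and summarize with the area under the ROC curve. The function names and the rank-based AUC below are illustrative; the talk does not show the authors' evaluation code, and averaging over several random splits (as in Figure 4) is left to the caller.

```python
import random

def reveal_edges(edges, fraction, seed=0):
    """Randomly mark `fraction` of the edges as given; the rest stay hidden."""
    rng = random.Random(seed)
    edges = list(edges)
    k = round(fraction * len(edges))
    given = set(rng.sample(edges, k))
    hidden = [e for e in edges if e not in given]
    return given, hidden

def roc_auc(scores, labels):
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The quadratic AUC loop is fine for small test sets; a sort-based version would be preferable for large ones.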
Evaluation for Location Prediction Task
• Evaluation settings
– Data: first 20 days for training / following 6 days for testing
– Varied the number of friends that the system considers
• Evaluation results
– Supervised: 77% accuracy with only 2 friends
– Unsupervised: 57% accuracy with 9 friends
– “Locations can be inferred even for private accounts”
Table 6: Accuracy for Location Prediction Task
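The temporal split and accuracy metric can be sketched as follows. This assumes hypothetical `(day_index, features, location)` records with day 0 as the first crawled day; the most-frequent-location predictor is only a placeholder to show usage, not a model or baseline from the paper.

```python
from collections import Counter

def temporal_split(records, train_days=20, test_days=6):
    """Split (day_index, features, location) records by day, not at random."""
    train = [r for r in records if r[0] < train_days]
    test = [r for r in records if train_days <= r[0] < train_days + test_days]
    return train, test

def accuracy(predict, test):
    """Fraction of test records whose location is predicted exactly."""
    if not test:
        raise ValueError("empty test set")
    correct = sum(1 for _, features, loc in test if predict(features) == loc)
    return correct / len(test)

def most_frequent_location(train):
    """Hypothetical placeholder predictor: always the commonest training location."""
    top = Counter(loc for _, _, loc in train).most_common(1)[0][0]
    return lambda _features: top
```

Splitting by day rather than at random keeps future movements out of training, which matches the "first 20 days / following 6 days" setting.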
Conclusion
• For friendship prediction task:
– Combined text, location, and graph features
– Reconstructed the friendship graph with no seed edges
• For location prediction task:
– Exploited friends’ locations to infer a user’s location
– Unsupervised result shows that “private is not safe”
Future Work
• Text features (named entity recognition, NER) for location prediction
• Joint model of locations and friendships
• Evaluate semi-supervised learning (hopefully)
• Consider the privacy issue as a tradeoff
Any Questions?
More Precisely: Belief Propagation