Credibility, Identity Resolution, Privacy, and Policing in Online Social Media
IIT Guwahati, Sept 26, 2016
Ponnurangam Kumaraguru (“PK”), Associate Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Who am I?
• Associate Professor, IIIT-Delhi
• Ph.D. from School of Computer Science, Carnegie Mellon University (CMU)
• Research interests: Social Computing, Computational Social Science, Complex Networks pertaining to Human Behavior, specifically in the context of Security & Privacy
• Coordinate and manage Precog, precog.iiitd.edu.in
• ACM Distinguished Speaker
Training Data
• 500 tweets per event
• Used CrowdFlower

Event                                    Tweets        Users
Boston Marathon Blasts (2013)            7,888,374     3,677,531
Typhoon Haiyan / Yolanda (2013)          671,918       368,269
Cyclone Phailin (2013)                   76,136        34,776
Washington Navy Yard shootings (2013)    484,609       257,682
Polar vortex cold wave (2014)            143,959       116,141
Oklahoma Tornadoes (2013)                809,154       542,049
Total                                    10,074,150    4,996,448
Credibility Modeling
Feature set: Features (45)
• Tweet meta-data: Number of seconds since the tweet; Source of tweet (mobile / web / etc.); Tweet contains geo-coordinates
• Tweet content (simple): Number of characters; Number of words; Number of URLs; Number of hashtags; Number of unique characters; Presence of stock symbol; Presence of happy smiley; Presence of sad smiley; Tweet contains 'via'; Presence of colon symbol
• Tweet content (linguistic): Presence of swear words; Presence of negative emotion words; Presence of positive emotion words; Presence of pronouns; Mention of self words in tweet (I; my; mine)
• Tweet author: Number of followers; Number of friends; Time since the user has been on Twitter; etc.
• Tweet network: Number of retweets; Number of mentions; Tweet is a reply; Tweet is a retweet
• Tweet links: WOT score for the URL; Ratio of likes / dislikes for a YouTube video
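
As an illustration of the "Tweet content (simple)" feature set, the sketch below computes those ten features for a single tweet. It is not the implementation used in the talk: the function name `simple_content_features`, the dictionary keys, and every regular expression (what counts as a URL, a stock symbol, or a smiley) are assumptions chosen for the example.

```python
import re


def simple_content_features(text: str) -> dict:
    """Compute illustrative 'Tweet content (simple)' features for one tweet.

    All patterns below are assumptions made for this sketch, not the
    definitions used in the original work.
    """
    words = text.split()
    return {
        "num_chars": len(text),
        "num_words": len(words),
        # Assumed URL pattern: anything starting with http:// or https://
        "num_urls": len(re.findall(r"https?://\S+", text)),
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_unique_chars": len(set(text)),
        # Assumed stock symbol pattern: '$' followed by 1 to 5 letters
        "has_stock_symbol": bool(re.search(r"\$[A-Za-z]{1,5}\b", text)),
        # Assumed smileys: :) :-) :D :-D are happy, :( :-( are sad
        "has_happy_smiley": bool(re.search(r":-?\)|:-?D", text)),
        "has_sad_smiley": bool(re.search(r":-?\(", text)),
        "contains_via": bool(re.search(r"\bvia\b", text, re.IGNORECASE)),
        "has_colon": ":" in text,
    }


features = simple_content_features(
    "Pray for Boston :( #boston via @user http://t.co/abc"
)
```

Each tweet becomes one row of such values; the remaining feature sets (meta-data, author, network, links) would need data from the tweet object rather than the text alone.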
Harvard (1839) – Harvard – Harvard – Harvard – MIT – Northwestern – UIUC – WUSL – CMU (2009) – IIITD (2015)
Challenges
• Heterogeneous OSNs: Professional, Opinion, Dating, Personal
• Degree of Details: quality and descriptive personal and professional information on some OSNs; little personal information but descriptive opinions on others
• Attribute Evolution (over time):
  - Registration with same information on both OSNs: {paridhij, New Delhi}
  - Information evolved on one but not on the other: {jainpari, Bangalore}
Generic Identity Resolution
[Diagram: two stages, IDENTITY SEARCH and IDENTITY LINKING]
• Extract available & discriminative features
• IDENTITY SEARCH: Candidate Identities
• IDENTITY LINKING: Pairwise Comparisons
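
The linking stage compares a known identity against each candidate identity pairwise. A minimal sketch of that idea follows, assuming each identity is a dictionary of string attributes and using `difflib.SequenceMatcher` as a stand-in similarity measure. The function names, the averaging of per-attribute scores, and the 0.6 threshold are all hypothetical and not taken from the talk.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_identity(known: dict, candidates: list, threshold: float = 0.6):
    """Pairwise-compare `known` with every candidate identity.

    Returns (best_candidate, score), or (None, score) when no candidate
    reaches the assumed threshold.
    """
    best, best_score = None, 0.0
    for cand in candidates:
        # Compare only attributes available on both identities.
        shared = [key for key in known if key in cand]
        if not shared:
            continue
        score = sum(similarity(known[k], cand[k]) for k in shared) / len(shared)
        if score > best_score:
            best, best_score = cand, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


known = {"username": "paridhij", "location": "New Delhi"}
candidates = [
    {"username": "paridhij", "location": "New Delhi"},
    {"username": "someone_else", "location": "Mumbai"},
]
match, score = link_identity(known, candidates)
```

A real system would likely weight attributes by how discriminative they are instead of averaging them equally.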
Heuristic Identity Search
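cerc.iiitd.ac.in
[Diagram: heuristic identity search pipeline]
• Search methods: Profile, Content, Self-mention, Network
• Linking methods: Syntactic and Image
• Decision: if self-identified / returned by more than one search method (Yes / No)
• Output: Candidate Identities
• Attributes used: name, location, username, mobile no., post, friends, followers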
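
The decision step in the diagram (self-identified, or returned by more than one search method) can be sketched as a simple set operation. The function name `select_candidates`, the input shapes, and the account ids below are hypothetical, and the choice to discard accounts that fail the test is an assumption: the diagram does not say what the "No" branch does.

```python
from collections import Counter


def select_candidates(search_results: dict, self_identified: set) -> set:
    """Keep accounts that are self-identified or that were returned by
    more than one search method.

    search_results maps a search method name to the set of account ids
    it returned (a hypothetical data layout for this sketch).
    """
    counts = Counter()
    for account_ids in search_results.values():
        # set() guards against one method listing an account twice.
        counts.update(set(account_ids))
    returned_by_many = {acc for acc, n in counts.items() if n > 1}
    return returned_by_many | set(self_identified)


results = {
    "profile": {"a", "b"},
    "content": {"b", "c"},
    "network": {"d"},
}
candidates = select_candidates(results, self_identified={"e"})
```

Here account "b" is kept because two methods returned it and "e" is kept because the user identified it directly; "a", "c", and "d" are dropped.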
Paridhi Jain, Ponnurangam Kumaraguru, and Anupam Joshi. 2013. @I seek ‘fb.me’: Identifying Users across Multiple Online Social Networks. In Proceedings of the 22nd International Conference on World Wide Web, WWW ’13 Companion. ACM, New York, NY, USA, 1259-1268. DOI=http://dx.doi.org/10.1145/2487788.2488160 [Honorable Mention Award]
Harvard (1839) – Harvard – Harvard – Harvard – MIT – Northwestern – UIUC – WUSL – CMU (2009) – IIITD (2016)
How many of you have posted mobile numbers on Online Social Networks?
How many of you have seen mobile numbers being posted on Online Social Networks?
Data statistics
• Twitter: 12th October 2012 – 20th October 2013
• Facebook: 16th November 2012 – 20th April 2013

Numbers           Category +91          Category 0            Category void         Total
                  Twitter   Facebook    Twitter   Facebook    Twitter   Facebook    Twitter   Facebook
Mobile Numbers    885       2,191       14,909    8,873       25,566    25,294      41,360    36,358
User profiles     1,074     2,663       17,913    9,028       31,149    25,406      49,817    36,588
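
The slide does not define the three categories. One plausible reading is that they reflect how the number was written in the post: with the +91 country code, with a leading 0, or with no prefix ("void"). The sketch below classifies a string under that reading. The interpretation, the function name `categorize`, and the pattern for a mobile number (ten digits beginning with 6 to 9) are all assumptions, not details from the talk.

```python
import re
from typing import Optional

# Assumed shape: optional "+91" or "0" prefix, then ten digits starting 6-9.
_MOBILE = re.compile(r"^(\+91|0)?([6-9]\d{9})$")


def categorize(raw: str) -> Optional[str]:
    """Return '+91', '0', or 'void' for a mobile-number string,
    or None if it does not look like a mobile number at all."""
    # Drop common separators before matching.
    compact = re.sub(r"[\s\-()]", "", raw)
    match = _MOBILE.match(compact)
    if match is None:
        return None
    prefix = match.group(1)
    if prefix == "+91":
        return "+91"
    if prefix == "0":
        return "0"
    return "void"


labels = [categorize(s) for s in ("+91 98765 43210", "09876543210", "9876543210")]
```

Numbers extracted from free text would need stricter validation than this, since a bare ten-digit string can also be an order id or a timestamp.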
Takeaways
• Online Social Media is a different beast in terms of privacy, identity, and credibility; research and technologies should be developed for it
• Many interesting research, engineering, and innovation problems are waiting to be worked on
• Interested in hosting students: B.Tech., M.Tech., Ph.D.