Open Source Social Media Analytics for Intelligence and ... · Social Media Websites API Wikipedia...
Transcript of Open Source Social Media Analytics for Intelligence and ... · Social Media Websites API Wikipedia...
Open Source Social Media Analytics for Intelligence
and Security Informatics Applications
Invited Talk at 4th International Big Data Analytics Conference (BDA),
Hyderabad, India 2015
December 17, 2015
Swati AgarwalPhD Scholar at Information
Management and Data Analytics Group
IIIT Delhi, India
Ashish SurekaPrincipal Scientist at ABB Corporate
Research Center
Bangalore, India
Vikram GoyalAssociate Professor at IndraprasthaInstitute of Information Technology
New Delhi, India
Learning Objective
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
What
Why
How
• Open source social media Intelligence
• Focus of Tutorial (Sub-topics under the field of ISI Research)
• Need and Importance of research in the field of ISI
• Technical and Computational Challenges
• Case Studies showing the applications of CSR in the domain of ISI
• Future Directions
Learning Objective
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Remember
Understand
Application
Research
keywords and terminology
What is Intelligence and Security Informatics
Need and Importance of research in the field of ISI
Current state-of-the art and scope of research in the domain
What ideas of computer science
research you can apply in ISI
domain
Applying
CSR
methods for
solving the
problems
raised by ISI
Tutorial Structure
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Session I
Introduction to the field of Intelligence and Security Informatics
Need and Scope of Research in the field
Uniqueness of Problem/Area (Technical Challenges)
Session II
Technology Landscape and Current state-of-the-art
Case studies on open source social media intelligence applications in ISI
Leading and relevant venues for publication in the domain of ISI
Agenda
Importance of ISI
Introduction
Technical Challenges
Focus and Scope
Technology Landscape
Case Studies
ISI Leading Venues
Conclusions
Future Directions
References
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Importance of Intelligence
and Security Informatics
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
TerrorismAn illegal use of violence, aimed against civilians in
order to achieve different kind of political ends.
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Secret Intelligence
Service (MI6) in United Kingdom
Started after 9/11 attacks in United States
FBI, Nation’s Prime Federal
Law Enforcement Organization
Research and Analysis Wing(RAW), Started
after Sino-Indian and
Indo-Pakistan War
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Image Sources: Wikipedia
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Image Sources: RajComics, MarvelComics, Wikipedia
Intelligence and Security Informatics
Technologist
SociologistPsychologist
How Computer Science Research can be used
for border control, terrorism, crime prevention on
Cyber-Security
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
““The medium is the message”.
-Marshall McLuhan
Role of Social Media
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Role of Social Media
Al-Qaida encourages “homegrown terrorism” and
recruitment of young people
Information shared on social media has access to
more individuals to carry out “lone wolf” operations
against Western targets
42% of teenagers (15-17 yrs) in United States use
social networking websites.
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
50M tweets per day, 640M users on Twitter
100 hours of videos uploaded on YouTube
every minute
124.3B posts, 264.6M blogs on Tumblr
Popular Social Media Websites
Statistics
>75M daily active users, 40B photos uploaded
on Instagram
For any government organization, law
enforcement agency or security analysts-
constant monitoring and manual
annotation of each post made on social
media portals is overwhelmingly
impractical
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
3 men from Mumbai, Maharashtra join
ISIS in Iraq (age group 20-25)
Jake Bilardi, manipulated, radicalized,
Jihadist (18 yrs. old, 10th std. student)
Image Sources:
1. http://indiatoday.intoday.in/story/agencies-on-the-tail-of-
maharashtras-missing-jihadis/1/372442.html,
2. http://www.news.com.au/national/freshfaced-westerners-
are-being-lulled-into-terrorism-by-isis-propaganda/news-
story/8448148e3a0c33c01b95db4dc1bab492,
3. http://www.ibtimes.com/who-ali-shukri-amin-virginia-isis-
teenager-behind-pro-islamic-state-twitter-sentenced-
2073208
Ali Shukri Amin, (17 yrs. old) from
Virginia, sentenced to 11 yrs. of prison
for raising funds for ISIS through Twitter
@AmeerkiWitness
Introduction
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
“ Intelligence collected and inferred form
publicly available and overt sources of
information
Open Source Intelligence
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Infochimps
Social Media Websites API
Examples- Open Source
Data Portals
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Image Sources: Website Developer Page, Official Websites of Data Portals
“ A sub-field within OSINT with a focus on
extracting insights from publicly available
data in Web 2.0 platforms
Open Source Social Media
Intelligence
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Social Media Websites API
Wikipedia
Social Curation (Theme or topic based
content sharing)
Crowd Sourcing Websites
Examples- Web 2.0 Open
Source Data Portals
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Image Sources: Website Developer Page, Official Websites of Data Portals
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
“ Current generation of the web
Highly participative and collaborative
Allows users to post content, messages and
comments
Online Social Media
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Online Social MediaPopularity
Anonymity
High Reachability
Social Networking
Low Publication Barriers
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Image Sources: Google Images, http://maliaholleron.com/wp-content/uploads/2012/12/Social-Media-Icons-cloud-300x256.png
Video Sharing and
Hosting Website
Social Networking
Website
Micro-Blogging
Websites
Community Based
Questions and Answering
Social
Bookmarking
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Image Sources: Official Websites
“ A field of study concerning investigation and
development of counter terrorism, national
and international security support systems
and applications
Intelligence &
Security Informatics
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Online Radicalization
Image Sources: http://www.trbimg.com/img-5178e6b1/turbine/la-na-tt-internet-imams-20130425-001/600
“Online Radicalization
Hate and extremism promotion
Promoting certain ideology and beliefs
Forming virtual communities online
Recruiting members for hate promotion
Brainwashing people affected from several
incidents
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
boko$haram,$islam,$muslim,$radical$islam,$terrorist,$jihad,$isis,$liberals,$al$queda,$holy$quran,$je$suis$charlie,$Islamic$Jihad,$Israel,$ak47,$guns,$rifle,$hate,$an<=American,$violence,$ba@le,$weapon,$religion$of$peace$
Tags/Keywords-associated-with-the-post-
Online Radicalization
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Image Sources: tumblr.com
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Online Radicalization
Swarmcast: How Jihadist Networks Maintain a Persistent Online Presence- Ali Fisher (Perspectives on Terrorism, June 2015)
Image Sources: twitter.com
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Example- Constructing
Communities
A Focused Crawler for Mining Hate and Extremism Promoting
Users and Communities on YouTube Swati Agrawal, Ashish Sureka
Indraprastha Institute of Information Technology, Delhi (IIIT-D)
{swatia, ashish}@iiitd.ac.in
!
!
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Image Sources: Google Images,
antitrustlair.files
“ Planning and mobilizing of civil unrest related
events via web platforms
Examples: Protests, Public Demonstrations and
Riots
Online Civil Disobedience
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Online Civil Disobedience
Image Sources: twitter.com
“ Machine Learning Techniques
Social Network Analysis
Visualization
Text Mining and Analytics
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Technical Challenges
50M tweets per day, 640M users on Twitter
100 hours of videos uploaded on YouTube
every minute
124.3B posts, 264.6M blogs on Tumblr
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
1. Massive Size and High
Velocity
>75M daily active users, 40B photos uploaded
on Instagram
2. Rich User Interaction on
Social Media Websites
Following, Follower, Re-tweet, Favorite, Share
Like, Comment, Share, Subscriber, Subscription, Friend, Featured Channel
Ask, Follower, Follower, Like, Re-blog, Share, Submit, Reply
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
3. Multilingualism
Image Sources: twitter.com
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
4. Noisy Content
Presence of low quality messages and content of
low relevance
Image Sources: twitter.com
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
4. Noisy Content
Grammatical and spelling errors
Image Sources: twitter.com
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
4. Noisy Content
Presence of non-standard acronyms and
abbreviations
Image Sources: twitter.com
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
4. Noisy Content
Use of emoticons and incorrect capitalization-
informal nature of content
Image Sources: twitter.com
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
5. Data Annotation and Ground
Truth
Need/Importance
Effort Intensive
Imbalance Data
• basis for several machine learning algorithms- examining the performance
• Vast amount of data being uploaded on social media dataset in every second
• 1 in 100,000 posts is hate promoting or posted by extremist users
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
6. Manipulation, Fabrication
and Adversarial Behavior
Fake information
Image Sources: twitter.com
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
6. Manipulation, Fabrication
and Adversarial Behavior
Rumors
Image Sources: twitter.com
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
6. Manipulation, Fabrication
and Adversarial Behavior
Manipulative/Misleading Information
Commonly seen in videos
Textual
• Title
• Tags
• Description
Media
• Thumbnail
• Annotation
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
6. Manipulation, Fabrication
and Adversarial Behavior
Image Sources: youtube.com
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Focus and Scope
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Focus of Tutorial
Exploring two sub-problems within the broader area of
Intelligence and Security Informatics:
1. Online Radicalization Detection (presence of
extremist content, users and communities on
social media websites)
2. Online Civil Unrest Prediction (an early
prediction of civil unrest related events using
social media content)
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Technology Landscape
Online Radicalization
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
• KNN
• SVM
• Naïve Bayes
• Boosting
• Logistic
Regression
• Topical Crawler
• Decision Tree
• EDA
• SVM
• OSLOM
• Rocchio
• Naïve Bayes
• Keyword Based
Flagging
• Honeypots
• Face Recognition
• Rule Based Classifier
• Regularized Least
Square
• TC
• EDA
• BFS
• DFS Link
Analysis
• Clustering-
Blog Spider
• Topical
Crawler (TC)
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
• Clustering-
Blog Spider
• Exploratory
Data Analysis
(EDA)
• TREC
• EDA
• Support Vector
Machine (SVM)
• Blockmodeling
• Multi-dimensional
Scaling
• Spring Embedder
• SVM
• Naïve Bayes
• Adaboost
• EDA
• Naïve Bayes
• OSLOM
• N-gram
• KNN
• SVM
• Best First
Search
• Shark Search
• Language
Modeling
Content Identification
User and Community Identification
Online Civil Unrest Prediction
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
• Stochastic
Hybrid Dynamic
Model
• Avatar
ensembles of
decision trees
Colb
au
gh
et.
Al.
Hu
a e
t. A
l.
• Clustering
Algorithm Nare
n e
t.
Al. • Map Reduce
• Apache Pig
• Naïve Bayes
• Logistic Regression
• Maximum Likelihood
• Generalized Linear
Model
• Dynamic Query
Expansion Jie
jun
Xu
et.
Al.
Ch
en
et.
Al.
• Heterogeneous
Graph Modeling
Com
pto
n e
t. A
l.
• Logistic
Regression Filch
en
ko
v e
t. A
l.
• Mathematical and
Theoretical Model
Sath
appan
et.
Al.
• Probabilistic
Soft Logic
Case Studies
1. Identification of Online
Extremist Content, Users and
Hidden Communities on YouTube
S. Agarwal and A. Sureka, A Focused Crawler for Mining Hate and Extremism Promoting Users, Videos and Communities on YouTube, 25th ACM Conference on Hypertext and
Social Media (HT 2014) 1-4 Sep 2014, Santiago, Chile
YouTubeLargest and most popular free video hosting and
sharing website- Found in 2005
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Image Sources: youtube.com
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Image Sources: youtube.com
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
“ Mining user generated content on social
media platforms to identify topic based hate
promoting content and locating hidden &
virtual communities of extremist users.
Research Problem
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Research Contributions
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Application of focused crawler for identifying
extremist content and locating hate
promoting users on YouTube Best First Search
Shark Search
Content based characterization of hate
promoting videos Focus of the content shown in video
Targeted audiences
Keywords present in the contextual metadata and spoken in
the video
Focused/Topical Crawler
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Source: Dongyang Hou et. al; 2014
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
1. Subscriber
2. Featured Channel
3. Public Contact
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
>th
0.55
0.850.35
{Subscriber, Featured Channel, Public Contact}
0.25
0.32
Threshold: 0.30
0.32
Proposed Framework
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Titles'of'Videos:'
Image Sources: youtube.com
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Experimental Setup Discriminatory Features Set:
YouTube Profile Summary
User Activity Feeds (Title of Videos Uploaded, Shared, Commented
and Favorited)
Training Data: Discriminatory Features for 35 Hate Promoting Channel
(HinduismIslam, IndiaEternal, ISIcyberAGENT etc.)
Dynamic Parameters: 10 different YouTube Channels as Seeds (PakistanRoxxx,
hiddenpakistani, GreaterPakistan etc.)
Character N-Gram (3, 5)
Threshold Value for Relevance Computation (-2.0, -2.5, -3.0)
Focused Crawler: Best First Search and Shark Search
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Experimental Results-
Focused Crawler
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Best First Search
Approach
Predicted
Relevant Irrelevant
Actual
Relevant 921 314
Irrelevant 125 67
Shark Search
Approach
Predicted
Relevant Irrelevant
Actual
Relevant 991 295
Irrelevant 55 29
TPR OR
RecallTNR
PPV OR
PrecisionNPV F1-Score Accuracy
BFS 0.75 0.35 0.88 0.18 0.81 0.69
SSA 0.77 0.35 0.95 0.09 0.85 0.74
Confusion Matrix for Best First Search and Shark Search Algorithm (60 iterations each)
Accuracy Results for Best First Search and Shark Search Algorithm (60 iterations each)
Social Network Analysis
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Community Graph of YouTube Users After Applying Focused Crawlers: Best First Search
(Left), Shark Search (Right)
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Videos YT CategoryAvg. Length
(Sec)
Content
FocusTarget Audience Keywords
43News, Non-
profit151.68
Honor
Killing,
Harassment
Women, Refugee
People, Children
Child Marriage, Rape,
Responsibility, Protest,
Women, Asylum, Arrested,
Brutal, Slave
93
News, Auto-
Vehicle,
Politics
2526.16Islam
Promotion
Jewish, Muslim
People
Taliban, Bombs, Battle,
Courage, Allah, Islam,
Courage, Belief, Macca,
Money, Shaheed, Enemies
Of Islam
30Entertainme
nt319.61 Anti-India India Haters
Kashmir, Poverty, Liberate,
Hindu, Beggars, Pundit,
Untouchable, Extremism,
Attack, Killed, Anti-Muslim,
Anti-Pakistani, Hatred,
Masks, Freedom.
Examples- Content Based
Characterization of Videos
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Videos YT CategoryAvg. Length
(Sec)
Content
FocusTarget Audience Keywords
25
News,
Politics &
Education
1225.56Liberate
KashmirKashmiri People
Muslim, Army, Military,
1947, Partition, Azad
Kashmir, Liber- ate
Kashmir, Pakistan, India,
Killing, Murder, Border,
Fighting, Democracy,
Martyr, Torture.
83News &
Politics349.28
Anti-
MuslimsPakistan Haters
Kashmir, Jihad, Pakistan,
India, Quran, Muslim,
Hindu, Qatil, Zakir Naik,
Hate Speech, Masjid,
Pandit, Defence, Madarsa,
Tribute, Bharat, America,
Attack, Napak, Holy,
Kabba,
Examples- Content Based
Characterization of Videos
Content Based
Characterization of Videos
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Type of Video Content: Speech, News Segments,
Drawing, Interviews, Group Discussion, Animated
Videos, Lectures, Cartoon and Comics, Debate,
Recorded Videos, Textual Messages, Pictures with
Background Music
2. Early Prediction of
Civil Unrest Related
Events on Twitter
TwitterLargest and most popular free micro-blogging and
social networking website- Found in 2006
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
An expression of a moment or idea
Text, Photos, Videos, Location, Poll, External URLs, hashtag, @
Maximum of 140 character lengthTweet
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Comment on a Tweet and join the conversation
Share a Tweet with your followers and add your thoughts before sharing
Favorite a Tweet and let author to know that you like their post
Assigning a topic to the Tweet using ‘#’ and make Tweet easily searchable
Follow other micro-bloggers to receive their Tweets in your Newsfeeds
Activities
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
“Research Problem
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
A characterization study on open source
Twitter dataset to investigate the feasibility of
building event forecasting model
Evaluating the performance of machine
learning and statistical based forecasting
model on real world data
Research Problem
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
• location l where the protest is going to happenSpatial
• time expression ti (day, date and time) of the protestTemporal
• what the protest is about- to - root cause of the protestTopic
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Research Contributions
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
A content based characterization and semantic
enrichment on raw tweets to classify “crowd-buzz &
commentary” and “mobilization & planning”
microposts
Investigating the application of trend analysis
(captured along the sliding window) for event
forecasting.
Proposed Framework
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Learning(Models:(
Named(En2ty(Recogni2on:(• • •
WebObservatory: https://web-001.ecs.soton.ac.uk/wo/dataset
Experimental Setup Dataset
‘Immigration’ tweets dataset downloaded from Southampton University
2M tweets (October 1, 2013 to February 28, 2014)
Collection of tweets containing terms ‘immigration’, ‘migration’,
‘immigrant’, ‘migrant’
Civil Unrest Related Events FastForFamilies in National Mall, USA
Christmas Island Hunger Strike, Australia
Sliding Window 7 days
For 16th January event (9 to 15 January)
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Available Locations of User Profiles for Christmas Island Hunger Strike Event
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Lead Indicator Classifier
Crowd-Buzz and Commentary
Planning and Mobilization
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Event Forecasting Model
Pairs of named entities: (ti, to), (to, l)
and (l, ti)
Extract all expressions of x entity that
are θ frequent
Extract all expressions of paired entity
y that are Ψ frequent for at least one x
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
US#AU#
Right#of#Asylum#
Migrant#Worker#
Refugee#
EU#
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Experimental Results
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Crowd-Buzz
Classifier
Predicted
C&C NA
Actual
C&C 719 109
NA 137 82007
Mobilization and
Planning
Predicted
M&P_E M&P_G NA
Actual
M&P_E 127 6 9
M&P_G 7 1236 22
NA 13 18 81541
Precision Recall F1-Score
C&C 0.84 0.86 0.85
M&P
0.89 0.81 0.85
0.96 0.98 0.97
0.99 0.99 0.99
Confusion Matrix for Crowd-Buzz and Mobilization & Planning Classifiers
Accuracy Results for Crowd-Buzz and Mobilization & Planning Classifiers
Experimental Results
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Distribution of χ2 and p-value for Frequent Pairs of Locations (Australia (LEFT), U.S. (RIGHT)) and
Topics for 3 consecutive Days in Sliding Window (Christmas Island Hunger Strike)
ISI Leading Conferences
and Journals
SocialComm SASCVV
SocialComm, ISI, EISIC, WebSci, PAISI, ICWSM
Journals Security Informatics, CyberTerrorism, Criminal
Justice and Popular Culture
Machine Learning/NLP BDA, COMAD, ICDCIT
PAKDD, SIGKDD, CIKM, AAAI
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Conclusions
OSSINT is a growing research field and has attracted the
attention of several researchers in past decade
We discuss two major and important applications of
OSSINT- Online Radicalization and Online Civil
Disobedience
YouTube is most widely used platform for Online
Radicalization, Hate and Extremism Promotion
KNN, SVM, Clustering, Naïve Bayes, KBF, EDA are
commonly used techniques for hate promotion detection
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Conclusions Focused Crawler is an efficient approach to locate hate
promoting content on YouTube.
User activity feeds can be used as discriminatory features to
identify a hate promoting user channel.
SSA approach has higher precision, recall and f-score in
comparison to BFS approach.
Social Network Analysis helps us to locate hidden
communities and user playing major role in community.
Hate promoting users upload videos targeting some specific
audiences who are affected by the incidents shown in the
video
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Conclusions Twitter plays and important role in facilitating mobilization
and planning of civil unrest related events
Clustering, Logistic Regression and Dynamic Query
Expansion, Ensemble Learning are most commonly used
techniques for civil unrest event prediction
We present an approach for early detection of civil unrest
related events evaluated on real world dataset (open source
Twitter data)
We perform semantic enrichment on raw tweets and filter
event related tweets (crowd-buzz and mobilization)
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Conclusions We develop a frequency based model on enriched tweets
and find those pairs of location, topics and time that are
significantly correlated
Detecting trend analysis of spatial, temporal and topic based
entities in sliding window is an efficient approach for event
forecasting
Early identification of crowd-buzz and mobilization tweets is
value added for event forecasting
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
Future Directions
Investigating the application of parallel corporal for
detecting online radicalization communities over cross
platforms
Detecting protest related events in real time and events
with overlapping sliding window (multiple events occurring
in same time frame)
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
References
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
1. Agarwal, S., Sureka, A.: Copyright infringement detection of music videos on youtube by mining
video and uploader meta-data. In: Big Data Analytics (BDA). pp. 48–67 (2013)
2. Agarwal, S., Sureka, A.: A focused crawler for mining hate and extremism promot- ing videos on
youtube. In: 25th ACM Conference on Hypertext and Social Media (HT). pp. 294–296 (2014)
3. Agarwal, S., Sureka, A.: Learning to classify hate and extremism promoting tweets. In: Intelligence
and Security Informatics Conference (JISIC). pp. 320–320 (2014)
4. Agarwal, S., Sureka, A.: Topic-specific youtube crawling to detect online radicaliza- tion. In:
Databases in Networked Information Systems (DNIS), 2015. pp. 133–151 (2015)
5. Agarwal, S., Sureka, A.: A topical crawler for uncovering hidden communities of extremist micro-
bloggers on tumblr. In: 5th Workshop on Making Sense of Micro- posts (MICROPOSTS) (2015)
6. Agarwal, S., Sureka, A.: Using common-sense knowledge-base for detecting word obfuscation in
adversarial communication. In: Workshop on Future Information Security (FIS) (2015)
7. Agarwal, S., Sureka, A.: Using knn and svm based one-class classifier for detecting online
radicalization on twitter. In: Distributed Computing and Internet Technol- ogy (ICDCIT). pp. 431–442
(2015)
8. Aggarwal, N., Agarwal, S., Sureka, A.: Mining youtube metadata for detecting privacy invading
harassment and misdemeanor videos. In: Privacy, Security and Trust (PST). pp. 84–93 (2014)
References
Agarwal S., Sureka A., Goyal V. - Invited Talk @ 4th International Big Data Analytics Conference, Hyderabad, India (December 17, 2015)
9. Budak, C., Georgiou, T., Agrawal, D., El Abbadi, A.: Geoscope: Online detection of geo-correlated
information trends in social networks. Proceedings of the VLDB Endowment 7, 229–240 (2013)
10. Compton, R., Lee, C.: Detecting future social unrest in unprocessed twitter data:emerging
phenomena and big data. In: Intelligence and Security Informat- ics (ISI). pp. 56–60 (2013)
11. Fu, T., Huang, C.N., Chen, H.: Identification of extremist videos in online video sharing sites. In:
Intelligence and Security Informatics, 2009. ISI ’09. IEEE Inter- national Conference on. pp. 179–
181 (June 2009)
12. Hua, T., Lu, C.T., Ramakrishnan, N.: Analyzing civil unrest through social media. Computer 46(12),
80–84 (Dec 2013)
13. Kwok, I., Wang, Y.: Locate the hate: Detecting tweets against blacks. In: AAAI (2013)
14. Qazvinian, V., Rosengren, E., Radev, D.R., Mei, Q.: Rumor has it: Identifying mis- information in
microblogs. In: Proceedings of the Conference on Empirical Methods in Natural Language
Processing. pp. 1589–1599. Stroudsburg, PA, USA (2011)
15. Ramakrishnan, N., Butler, P., Muthiah, S.: ’beating the news’ with embers: Fore- casting civil unrest
using open source indicators. In: Proceedings of the 20th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining. pp. 1799–1808. KDD ’14, ACM, New York, NY, USA
(2014)
16. Wang, M., Alan, C.G.: Intelligence and security informatics. Pacific Asia Workshop (PAISI) (2011)