Apache Storm based Real Time Analytics for Recommending Trending Topics and Sentiment Analysis on...
Akhmedov Khumoyun
Storm based Real Time Analytics for Recommending Trending Topics and Sentiment Analysis on Cloud Computing Environment
Konkuk 2015
SMCCLab
Social Media Cloud Computing Research Center
Outline
• Motivation
• Real Time Systems and CEP
• Storm Introduction
• Used Technologies
• Related Work
• System Overview
• System Architecture
• Use Case: Social Media Analytics by SAS
Motivation
• Real-time computation is in demand
• Responding to problems almost instantly
• Business value
• Tightly connected to Cloud Computing
• Batch processing limitations
• and …
Real Time Systems and CEP
• Real Time System? A real-time system has been described as one which “controls an environment by receiving data, processing them, and returning the results sufficiently quickly to affect the environment at that time”. Real-time response latency is often on the order of milliseconds to seconds.
• CEP (Complex Event Processing)? CEP is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. The goal of CEP is to identify meaningful events (such as threats of attacks) and respond to them as quickly as possible.
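As a minimal sketch of the CEP idea above, the following Python snippet combines events from two sources and infers a higher-level event when a failed login and a port scan hit the same host within a short time window. The event kinds (`login_failed`, `port_scan`), the 10-second window, and the `detect_attacks` helper are illustrative assumptions, not part of the presented system.

```python
from collections import namedtuple

# Hypothetical event record; the field names are assumptions for illustration.
Event = namedtuple("Event", ["timestamp", "source", "kind", "host"])


def detect_attacks(events, window=10.0):
    """Yield (host, timestamp) when a failed login and a port scan
    are seen for the same host within `window` seconds of each other."""
    partner = {"login_failed": "port_scan", "port_scan": "login_failed"}
    last_seen = {}  # (host, kind) -> timestamp of the latest such event
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind not in partner:
            continue  # ignore event kinds this rule does not care about
        other_ts = last_seen.get((ev.host, partner[ev.kind]))
        if other_ts is not None and ev.timestamp - other_ts <= window:
            yield (ev.host, ev.timestamp)
        last_seen[(ev.host, ev.kind)] = ev.timestamp


events = [
    Event(1.0, "auth", "login_failed", "10.0.0.5"),
    Event(4.0, "firewall", "port_scan", "10.0.0.5"),   # 3 s apart: match
    Event(30.0, "auth", "login_failed", "10.0.0.9"),
    Event(50.0, "firewall", "port_scan", "10.0.0.9"),  # 20 s apart: no match
]
alerts = list(detect_attacks(events))
```

Neither input stream reports an attack on its own; the complex event only appears once the two streams are correlated per host.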
Apache Storm is
a distributed real-time computation system, originally developed by Nathan Marz at BackType (acquired by Twitter).
• Fast & scalable
• Fault-tolerant
• Guarantees messages will be processed
• Easy to set up & operate
• Free & open source
Conceptual and Physical View of Storm
Real Time Streaming
Apache Storm and Apache Kafka
Why we need Kafka
Apache Kafka is an ideal source for Storm topologies. It provides everything necessary for:
- At most once processing
- At least once processing
- Exactly once processing
Apache Storm includes Kafka spout implementations for all levels of reliability.
Kafka supports a wide variety of languages and integration points for both producers and consumers.
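The difference between the first two reliability levels comes down to when the consumer commits its offset relative to processing. The sketch below illustrates this with an in-memory list standing in for a Kafka partition; it makes no Kafka or Storm client calls, and `consume`, `simulate`, and the crash points are assumptions made for illustration.

```python
def consume(log, handler, committed, commit_first):
    """Process `log` starting at offset `committed`.
    Returns (new_committed_offset, finished). A RuntimeError raised by
    `handler` imitates a consumer crash and stops the loop."""
    offset = committed
    while offset < len(log):
        if commit_first:
            committed = offset + 1   # at most once: commit, then process
        try:
            handler(log[offset])
        except RuntimeError:
            return committed, False
        if not commit_first:
            committed = offset + 1   # at least once: process, then commit
        offset += 1
    return committed, True


def simulate(commit_first, crash_after_effect):
    """Run a consumer that crashes once on message "b" and then restarts
    from its last committed offset. Returns the messages whose side
    effect actually happened, in order."""
    log = ["a", "b", "c"]
    delivered, crashed = [], []

    def handler(msg):
        first_b = msg == "b" and not crashed
        if first_b and not crash_after_effect:
            crashed.append(True)
            raise RuntimeError("crash before the side effect")
        delivered.append(msg)        # the side effect
        if first_b and crash_after_effect:
            crashed.append(True)
            raise RuntimeError("crash after the side effect")

    committed, done = 0, False
    while not done:                  # each iteration is one (re)start
        committed, done = consume(log, handler, committed, commit_first)
    return delivered


at_most_once = simulate(commit_first=True, crash_after_effect=False)
at_least_once = simulate(commit_first=False, crash_after_effect=True)
```

With commit-first, message "b" is lost but never repeated; with process-first, "b" is repeated but never lost. Exactly-once behaviour generally needs something extra on top of at-least-once delivery, such as idempotent or transactional state updates.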
Used Technologies
• Apache Storm
• Apache HBase
• MySQL
• Hadoop 2
• Apache ZooKeeper
• Apache Kafka (message broker)
• Java and some Python
• jQuery and Bootstrap
• Play Framework (Java) or Django (Python)
System Overview
• Trending Topics? “Twitter Trends are automatically generated by an algorithm that attempts to identify topics that are being talked about more now than were previously.” The Trends list is designed to help people discover the hottest topics and breaking news from across the world, in real time.
• Sentiment Analysis? Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic, or the overall contextual polarity of a document.
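The Trending Topics definition above (topics talked about more now than previously) can be sketched by comparing term counts in the current time window with counts in the previous one. This is a simplified illustration, not Twitter's algorithm; the `count_topics` and `trending` helpers, the smoothing constant, and the sample tweets are assumptions, and stop-word filtering is left out for brevity.

```python
import re
from collections import Counter

TOKEN = re.compile(r"#?\w+")  # words and hashtags


def count_topics(tweets):
    """Count lower-cased words and hashtags over a batch of tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(tok.lower() for tok in TOKEN.findall(text))
    return counts


def trending(previous, current, top_n=3, smoothing=1.0):
    """Rank topics by how much more often they occur in `current`
    than in `previous`. The smoothing term keeps brand-new topics
    from dividing by zero."""
    scores = {
        topic: (current[topic] + smoothing) / (previous.get(topic, 0) + smoothing)
        for topic in current
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


previous = count_topics(["nato summit", "nato talks", "finance news"])
current = count_topics(
    ["#mers outbreak", "#mers update", "#mers alert", "nato talks"]
)
top = trending(previous, current, top_n=1)
```

Here `nato` still appears in the current window but is mentioned less than before, so it ranks below `#mers`, which was absent earlier and is now frequent.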
Trending Topics
Fashion: uniqlo, adidas, shanel, #armani
Politics: putin, NATO, #obama, ISIS
Sports: #messi, UEFA, #NBA, archery
Economics: crisis, #Greece, loan, finance
Health: #MERS, CDO, #Cardio, Cancer
[Figure: chart of trending topics (#MERS, obama, NATO, #cancer, shanel, #crisis); axis ticks 100, 1000, 10000, 100K and 0 to 10]
Trending Topics (real time feel)
Sentiment Analysis (of tweets)
• Positive• Negative• Neutral
Top Ten Trending Tweets
N | User | Tweets | Sentiment
1 | BigData | Red Hat Offers Apache Hadoop Big Data Services For Business Critical Workloads : http://tinyurl.com/qb83boj | Positive
2 | Checkmax | Secure your source code. http://bit.ly/1MnVRwQ Get a full vulnerability report and prevent security breaches | Negative
3 | Time.com | 5 players to follow in the Women’s World Cup http://ti.me/1LkM0Ku | Neutral
4 | …. | …. | ….
. | …. | …. | ….
. | …. | …. | ….
8 | …. | …. | ….
9 | Iran | #Iran, #Russia discuss regional development, #SCO membership http://theiranproject.com/blog/2015/06/20/iran-russia-discuss-regional-development-sco-membership/ … | Negative
10 | ….. | …. | ….
Sentiment Analysis
To find the sentiment of incoming tweets, I will use machine learning algorithms such as the Naïve Bayes classifier (predictive learning) and other related techniques.
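A minimal sketch of the Naïve Bayes approach mentioned above, written from scratch with add-one (Laplace) smoothing. The `NaiveBayes` class and the tiny training set are invented for illustration; a real model would be trained on a labelled tweet corpus, and a neutral class could be added in the same way.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.class_counts = Counter()            # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            words = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # log P(label) + sum of log P(word | label), add-one smoothed
            score = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, made up for this example.
training = [
    ("great service love it", "positive"),
    ("love this amazing phone", "positive"),
    ("terrible delay hate it", "negative"),
    ("hate this awful service", "negative"),
]
model = NaiveBayes()
model.fit(training)
label_a = model.predict("love this amazing service")
label_b = model.predict("hate this awful delay")
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.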
In addition, I will use a predefined reference sentiment dictionary as a model to determine the sentiment value of tweets efficiently.
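The dictionary-based approach can be sketched as a lookup of each word in a sentiment lexicon followed by summing the scores. The `SENTIMENT_LEXICON` entries and their weights below are hypothetical placeholders, not the reference dictionary used in this work.

```python
import re

# Hypothetical lexicon: word -> polarity weight (assumed values).
SENTIMENT_LEXICON = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "terrible": -2, "crisis": -2,
}


def lexicon_sentiment(text, lexicon=SENTIMENT_LEXICON):
    """Classify text as Positive, Negative, or Neutral by summing the
    lexicon weights of its words; unknown words contribute zero."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


examples = [
    lexicon_sentiment("I love this great phone"),
    lexicon_sentiment("Terrible crisis"),
    lexicon_sentiment("5 players to follow in the Women's World Cup"),
]
```

A lookup like this is cheap per tweet, which suits a streaming setting, but it ignores negation and context, so it is usually combined with a learned model.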
System Architecture
[Figure: system architecture diagram; labels: TCrawler (three instances), Dashboard]
System Workflow
[Figure: workflow diagram; component labels listed below]
• TweetSpout (shown twice)
• TweetManipulationBolt
• TrendingTopicsBolt
• SentimentAnalyserBolt
• DBWriterBolt
• AllTweets
• HBase
• MySQL
• Dashboard
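The workflow components can be sketched as a chain of plain Python functions. The connections between components did not survive in this transcript, so the wiring below (spout, then manipulation, then trending and sentiment, then a database writer) is an assumption, as are the placeholder word lists and the use of in-memory lists in place of HBase and MySQL. No Storm API is used.

```python
from collections import Counter


def tweet_spout(raw_tweets):
    """Stand-in for TweetSpout: emit one raw tweet at a time."""
    for tweet in raw_tweets:
        yield tweet


def tweet_manipulation_bolt(tweet):
    """Stand-in for TweetManipulationBolt: normalise text, extract hashtags."""
    words = tweet.lower().split()
    hashtags = [w for w in words if w.startswith("#")]
    return {"text": " ".join(words), "hashtags": hashtags}


def trending_topics_bolt(record, counts):
    """Stand-in for TrendingTopicsBolt: update running hashtag counts."""
    counts.update(record["hashtags"])
    return counts.most_common(3)


def sentiment_analyser_bolt(record):
    """Stand-in for SentimentAnalyserBolt, using placeholder word lists."""
    positive, negative = {"great", "love"}, {"crisis", "hate"}
    words = set(record["text"].split())
    score = len(words & positive) - len(words & negative)
    return "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"


def run_pipeline(raw_tweets):
    all_tweets = []    # assumed role of the AllTweets store
    results = []       # assumed role of the table read by the dashboard
    counts = Counter()
    for tweet in tweet_spout(raw_tweets):
        all_tweets.append(tweet)
        record = tweet_manipulation_bolt(tweet)
        results.append({                      # stand-in for DBWriterBolt
            "text": record["text"],
            "sentiment": sentiment_analyser_bolt(record),
            "trending": trending_topics_bolt(record, counts),
        })
    return all_tweets, results


all_tweets, results = run_pipeline(
    ["Love the #NBA finals", "#Greece debt crisis", "#NBA draft tonight"]
)
```

In a real Storm topology each function would become a spout or bolt class and the hand-offs would be stream groupings rather than direct calls.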
Social Media Analytics by SAS
THANK YOU
Any questions are welcome…