Business Case Analysis Finding Tweets With Criminal Intentions

ESCI-ISTTM
Business Case Analysis
Finding Tweets with Criminal Intentions

Vikas R, Askok K, Prem Kumar, Swetha Reddy, Ramya Sravanthi, Veenasri, T, Chaitanya
8/25/2015

This document presents a basic study of the business case of identifying tweets with criminal intentions, and of the approaches followed.

Business Case Objectives

To help the police force to identify the tweets with criminal intentions.
To classify the problem as Classification, Regression or Optimization.
To identify the problems that we think will become important in solving this.
To design a dashboard that will provide the needed insights.

Classification:

Here the main objective is to determine whether a given tweet expresses the intention of committing a criminal act. A tweet picked at random, once checked against some predefined models, can therefore fall into only two classes:

Criminal intention tweet
Not a Criminal intention tweet

Therefore the problem falls under the category of CLASSIFICATION.

To Identify the Problems:

Based on the R&D done, the Vs below were considered important in solving this business case:

1. VOLUME: Millions of tweets are posted on a daily basis, which adds up to a lot of data (from KB and MB up to GB and TB). We have to keep a constant check on the volume of the data.

2. VELOCITY: The rate at which end users tweet keeps increasing from minute to minute, so we should also keep a check on the velocity.

3. VARIETY: Users post different kinds of tweets, e.g. text, images, videos, hashtags etc.

4. VALUE: Reduction and control of damage to life and infrastructure in the nation.

5. VISUALIZATION: A classification problem is all about partitioning the space. Therefore we will use scatter plots to partition the space, where one partition shows the tweets with criminal intentions and the other shows the tweets with no criminal intentions.
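The volume and velocity checks in items 1 and 2 could be sketched as a per-minute tally of tweet count and payload size. This is only an illustration: the function name `per_minute_stats`, the `(timestamp, text)` input shape, and the sample tweets are assumptions, not part of the original study.

```python
from collections import defaultdict

def per_minute_stats(tweets):
    """Tally tweets per minute.

    tweets: iterable of (unix_timestamp_seconds, text) pairs (assumed shape).
    Returns {minute_index: {"count": tweets seen, "bytes": UTF-8 size}}.
    """
    stats = defaultdict(lambda: {"count": 0, "bytes": 0})
    for timestamp, text in tweets:
        minute = int(timestamp // 60)          # bucket by whole minute
        stats[minute]["count"] += 1            # velocity: tweets per minute
        stats[minute]["bytes"] += len(text.encode("utf-8"))  # volume
    return dict(stats)

# Made-up sample data, for illustration only.
sample = [(0, "hello"), (30, "hi"), (61, "third tweet")]
minute_stats = per_minute_stats(sample)
for minute, values in sorted(minute_stats.items()):
    print(minute, values["count"], values["bytes"])
```

A dashboard could plot these two series over time and raise an alert when either grows faster than the processing pipeline can handle.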

Initial Steps to be followed:

1. DATA COLLECTION: We will collect from the police department all the databases that contain criminal intention tweets, and try to find the words commonly used in these tweets, e.g. criminal intention words like Guns, Blast, Shooting, Havoc, Kill, Bomb, Suicide bomber.

For every such word we will plot a bar chart showing the number of criminal tweets that contain it.
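The per-word counts behind such a bar chart could be computed roughly as follows. This is a minimal sketch: the keyword set reuses the single-word examples from the text (the two-word phrase "Suicide bomber" is left out for simplicity), the sample tweets are made up, and a plain-text chart stands in for a real plotting library.

```python
import re
from collections import Counter

# Keyword set taken from the example words in the text (an assumption,
# not a vetted list).
CRIMINAL_WORDS = {"guns", "blast", "shooting", "havoc", "kill", "bomb"}

def keyword_tweet_counts(tweets, keywords=CRIMINAL_WORDS):
    """Count, for each keyword, how many tweets mention it at least once."""
    counts = Counter()
    for tweet in tweets:
        tokens = set(re.findall(r"[a-z]+", tweet.lower()))
        for word in tokens & keywords:
            counts[word] += 1
    return counts

def text_bar_chart(counts):
    """Render the counts as a plain-text bar chart, most frequent first."""
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{word:<10}{'#' * n} {n}" for word, n in rows)

# Made-up sample tweets, for illustration only.
sample_tweets = [
    "They will bring guns to the rally",
    "Plan the blast near the station, bring guns",
    "Lovely weather today",
]
sample_counts = keyword_tweet_counts(sample_tweets)
print(text_bar_chart(sample_counts))
```

Counting each tweet at most once per word matches the description of "number of criminal tweets with respect to that word" rather than the total number of occurrences.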

2. DATA COLLECTION: We will collect data (tweets, on a regular basis), analyze each tweet against the predefined criminal words, and classify it.

Criminal Tweet = F(Guns, Blast, Shoot, Havoc, Bombs, etc.)

If a tweet taken at random contains any of the above words (Guns, Blast, Shoot, Havoc, Bombs, etc.), it will be considered a Criminal Intention tweet; otherwise it will be considered Not a Criminal Intention tweet.
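The rule F described above amounts to a keyword lookup, which could be sketched as below. The function names and the exact-match tokenization are assumptions made for illustration; the word list is the one given in the formula.

```python
import re

# Words from the formula above; a real list would come from the police data.
CRIMINAL_WORDS = frozenset({"guns", "blast", "shoot", "havoc", "bombs"})

def is_criminal_intention(tweet, keywords=CRIMINAL_WORDS):
    """Return True if the tweet contains any keyword as a whole word."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    return any(token in keywords for token in tokens)

def classify(tweet):
    """Map a tweet to one of the two class labels used in the text."""
    if is_criminal_intention(tweet):
        return "Criminal Intention"
    return "Not a Criminal Intention"

print(classify("We will shoot at noon"))
print(classify("Good morning everyone"))
print(classify("Great photo shoot today"))
```

Because the match is purely lexical, the third example is flagged even though it is harmless, and a variant such as "shooting" is missed unless it is added to the list or the tokens are stemmed first.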

3. VISUALIZATION: Based on the Scatter Plot, we can apply a simple logistic regression and divide the space into two partitions.

Any tweet containing criminal words like Guns, Bombs, Kill etc. will be classified as a Criminal Intention tweet; all other tweets will be placed in the Non-Criminal Intention partition.

Attributes: Age, Gender, Demographics, Religion, Marital Status, Employment, Prior Criminal based tweets
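To make the logistic regression step concrete, here is a small sketch in plain Python that fits a two-feature model by gradient descent and splits the plane into two partitions. The two features (number of criminal words in the tweet, and number of prior criminal based tweets by the same user), the toy data points, and the learning rate and epoch count are all assumptions chosen for illustration, not results from the study.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(points, labels, lr=0.3, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b = [0.0, 0.0], 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for (x1, x2), y in zip(points, labels):
            error = sigmoid(w[0] * x1 + w[1] * x2 + b) - y
            grad_w[0] += error * x1
            grad_w[1] += error * x2
            grad_b += error
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, point):
    """Return 1 (criminal intention) or 0 (no criminal intention)."""
    return 1 if sigmoid(w[0] * point[0] + w[1] * point[1] + b) >= 0.5 else 0

# Made-up toy data: (criminal word count, prior criminal based tweets).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 2), (2, 3), (3, 3), (4, 1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic(points, labels)
print("decision boundary: %.2f*x1 + %.2f*x2 + %.2f = 0"
      % (weights[0], weights[1], bias))
```

The printed line is the boundary that would be drawn on the scatter plot; points on one side are labeled as criminal intention tweets and points on the other side are not.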

The above business case analysis only addresses whether a random tweet is a Criminal Intention tweet or Not a Criminal Intention tweet. However, some tweets that contain criminal words like Guns, Bombs, Kill etc. might not be Criminal Intention tweets, since they might be tweeted as jokes, as sarcasm, or just for fun.

Eliminating these kinds of tweets will require deeper analytics, which we are not aware of right now.