SA

Comprehensive sentiment data mining, analysis and visualization to improve business outcomes

The Team:

Arjit Sachdeva, Prashast Kumar Singh, Nishtha Pande, Dhruv Mahajan

The Problem:

Making sense of data to help consumer-centric companies, governments and similar organizations make decisions that improve their products.

Scenario #1

Imagine you’re a mobile manufacturer, say HTC. You just launched your flagship phone, the HTC X, which boomed on the Internet with users posting their reactions about it.

But is there any way you could actually go through hundreds of thousands of those reviews individually, and use that data for your organization at all?

Is it possible to analyze the sentiment of each user from what they wrote about the HTC X?

How about analyzing not just the positive/negative quotient in the posts, but also getting summarized feedback on what users liked the most, and disliked the most, about the HTC X?

Scenario #2

The government wants to connect with hundreds of thousands of people and analyze their views. How can it connect directly with people to answer questions like these?

The government wants to know how people are reacting to a new policy announcement.

• What parts of the policy do the voters like? (Example: tax cuts)

• What parts of the policy need to be changed or modified?

Getting feedback on proposed laws

• What do the people think about a proposed law (positive/negative response)?

• How can the proposal be improved?

• Analyse the negative comments.

Our approach towards a comprehensive sentiment analysis and visualization tool

Break up a review into sentences, and parse each sentence using the rules of English grammar.

Identify the various relationships (dependencies) existing between all pairs of words.

Filter the relevant relationships and make a list of relevant nouns and adjectives.

Assign scores using a self-learning scoring algorithm.

Use the generated data structures to visualize data to provide answers to businesses’ questions.

PARSING

Parsing is the process of assigning structural descriptions to sequences of words in a natural language.

IDENTIFYING RELATIONSHIPS

The Stanford typed dependencies representation was designed to provide a simple description of the grammatical relationships in a sentence that can easily be understood.
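As a rough illustration of filtering the relevant relationships, the sketch below works on hand-written triples in the Stanford style (`det`, `nsubj`, `cop`, `amod`) for two short phrases, rather than on output from the actual parser. The function name `noun_adjective_pairs` and the hand-supplied part-of-speech tags are assumptions made for this example only.

```python
# Hand-written typed dependencies as (relation, governor, dependent) triples.
# A real run would obtain these from a dependency parser; here they are typed in.
DEPENDENCIES = [
    ("det", "camera", "The"),
    ("nsubj", "great", "camera"),   # from "The camera is great"
    ("cop", "great", "is"),
    ("amod", "battery", "poor"),    # from "poor battery"
]

# Part-of-speech tags for the words above (also supplied by hand).
POS = {
    "The": "DT", "camera": "NN", "is": "VBZ",
    "great": "JJ", "battery": "NN", "poor": "JJ",
}


def noun_adjective_pairs(dependencies, pos):
    """Keep only relations that link a noun to an adjective describing it."""
    pairs = []
    for relation, governor, dependent in dependencies:
        gov_tag = pos.get(governor, "")
        dep_tag = pos.get(dependent, "")
        if relation == "amod" and gov_tag.startswith("NN") and dep_tag.startswith("JJ"):
            pairs.append((governor, dependent))   # amod(noun, adjective)
        elif relation == "nsubj" and gov_tag.startswith("JJ") and dep_tag.startswith("NN"):
            pairs.append((dependent, governor))   # nsubj(adjective, noun)
    return pairs


print(noun_adjective_pairs(DEPENDENCIES, POS))
```

Only `amod` and `nsubj` are kept here to keep the sketch short; a fuller filter would likely consider more relation types.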

SCORING NOUNS

We find the scores of the Adjectives present using the SentiWordNet API. These scores are then assigned to the corresponding Nouns and stored in Guava structures.
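A minimal sketch of this step, under stated assumptions: the adjective scores below are made-up (positive, negative) pairs in the spirit of SentiWordNet entries, not values read from the real lexicon, and a `defaultdict(list)` stands in for a multimap-like structure. Taking "positive minus negative" as the net score is one simple choice and not necessarily the scoring the team used.

```python
from collections import defaultdict

# Hypothetical (positive, negative) scores per adjective. These numbers are
# invented for illustration and are not taken from SentiWordNet.
ADJECTIVE_SCORES = {
    "great": (0.75, 0.0),
    "poor": (0.0, 0.625),
    "slow": (0.0, 0.5),
}


def score_nouns(pairs, adjective_scores):
    """Attach each adjective's net score (positive minus negative) to its noun."""
    noun_scores = defaultdict(list)   # noun -> list of scores, like a multimap
    for noun, adjective in pairs:
        if adjective in adjective_scores:
            positive, negative = adjective_scores[adjective]
            noun_scores[noun].append(positive - negative)
    return dict(noun_scores)


pairs = [("camera", "great"), ("battery", "poor"), ("battery", "slow")]
print(score_nouns(pairs, ADJECTIVE_SCORES))
```

Adjectives missing from the lexicon are skipped silently in this sketch.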

VISUALIZATION

Intuitive 2D and 3D visualizations of every aspect of data, mapping changes in sentiments about your brand, demographics and other analytics

Analyzing the sentiment in data is a very complex task for a machine because of the many, often unpredictable, soft and hard variables that come into play when interpreting it. The main problem is that the sentiment of a sentence only rarely lies in the sentence itself; it is instead rooted in the cultural context around that sentence.

A few challenges:

This requires the algorithm to process a vast amount of densely interconnected information to answer what is, in human terms, a fairly simple question. A few keywords taken separately won’t do the job. Consider, for example: "The Government is wrong in its decision because it is a racist one."

We need to consider many combinations of words together to figure out WHY people think the decision is wrong.

Retrieve data from various social media channels

Load the collected data into Database

ANALYSIS

Behavior

Segmentation

Share Of Voice

Affinity Relation

Summarized feedback with intuitive 2D and 3D visualizations of every aspect of data

PERT CHART

VISUALIZATION

How does STARK attempt to answer a few generic scenarios?

Company A: Can you summarize what the user said about my product, in specific detail?

STARK shows the summary of the reviews.

Company A: We had incorporated a new kind of camera with a super-fast zoom. How strongly did the user talk about the camera?

STARK processes the reviews and generates the following meter graph for CAMERA. The meter graph shows that the user has responded positively to the quality of the camera.

Company A: Overall, how strongly did he express his views about my product?

STARK shows the mean sentiment distribution of various components, i.e. the aggregated mean sentiment shown by all users towards each component.
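The aggregation behind such a chart could look roughly like the following. The component scores are invented for illustration, and `mean_sentiment` is a hypothetical helper name, not part of STARK.

```python
from statistics import mean

# Made-up per-mention scores for each component, pooled across all reviews.
COMPONENT_SCORES = {
    "camera": [0.75, 0.5, 0.25],
    "battery": [-0.5, -0.5],
    "screen": [0.25],
}


def mean_sentiment(component_scores):
    """Average the scores of each component; components with no scores are skipped."""
    return {
        component: mean(scores)
        for component, scores in component_scores.items()
        if scores
    }


print(mean_sentiment(COMPONENT_SCORES))
```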

Company A: Since we had many new things in our product this time, I'd like to know which feature he talked about the most.

STARK shows the percentage distribution of various components in the review. It gives an overview of the components that are being talked about and to what extent.
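One plausible way to compute such a percentage distribution is to count how often each component is mentioned and divide by the total. The list of mentions below is invented, and `mention_share` is a hypothetical name used only in this sketch.

```python
from collections import Counter

# Made-up sequence of component mentions extracted from a review.
MENTIONS = [
    "camera", "battery", "camera", "screen",
    "camera", "battery", "camera", "screen",
]


def mention_share(mentions):
    """Return each component's share of all mentions, as a percentage."""
    counts = Counter(mentions)
    total = sum(counts.values())
    return {component: 100.0 * n / total for component, n in counts.items()}


print(mention_share(MENTIONS))
```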

Company A: I still need one more detail. Did he talk about the camera only positively, only negatively, or both? How many times positively and how many negatively?

STARK shows the sentiment distribution of various components. Sentiment distribution means the sentiment shown by the user towards each component.
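Counting positive and negative mentions per component can be sketched as below, assuming each mention already carries a signed score (above zero for positive, below zero for negative). The data and the helper name `polarity_counts` are hypothetical.

```python
# Made-up signed scores for each mention of a component in one user's review.
REVIEW_SCORES = {
    "camera": [0.75, -0.25, 0.5],
    "battery": [-0.5, -0.5],
}


def polarity_counts(component_scores):
    """Count how many mentions of each component are positive and how many negative."""
    result = {}
    for component, scores in component_scores.items():
        result[component] = {
            "positive": sum(1 for s in scores if s > 0),
            "negative": sum(1 for s in scores if s < 0),
        }
    return result


print(polarity_counts(REVIEW_SCORES))
```

Scores of exactly zero are counted as neither positive nor negative in this sketch.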

Company A: Could you finally quantify the scores assigned to each feature?

STARK shows the scatter plot and line graph of all the features.

Cheers to BIG DATA in a SMALL WORLD

• Arjit Sachdeva

• Dhruv Mahajan

• Nishtha Pande

• Prashast Kumar Singh
