MingHsuanWu Graduate Reportsci.tamucc.edu/~cams/projects/431.pdfTwitter, an online social network,...
ABSTRACT
Social media has become important for social networking and content sharing.
Twitter, an online social network, allows users to post short text messages, known as
tweets, of up to 140 characters. Many people apply sentiment analysis to Twitter for
opinion mining, because Twitter's large user base spans different sociocultural zones.
The objective of sentiment analysis is to identify any clue of positive or negative
emotion in a piece of text that reflects the author's opinion on a subject.
In this project, the Twitter API is accessed through twitter4j to search for tweets about
selected popular electronic products. K-means clustering is used to find clusters of
similar sentences, where similar sentences are sentences that share the same keywords.
The tweets in a cluster therefore show what people think about similar features of the
selected products. Each cluster is entered into feature-based sentiment analysis to get
a score. After that, the full set of tweets is also processed by the sentiment analysis
system to analyze how people think about the selected products overall. The system uses
TF-IDF, the k-means algorithm, SentiWordNet, and the Stanford tool at different steps.
TABLE OF CONTENTS
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
2. Background and Rationale
2.1 Sentiment Computing and Classification
2.2 Clustering
2.2.1 Twitter Clusters System
2.2.2 K-means Algorithm
2.3 Sentiment Analysis
2.4 Feature-based Sentiment Analysis
3. Clustering and Sentiment Analysis
3.1 Problem Report
3.2 Project Objective
3.3 The Steps of the Project
3.3.1 TF-IDF
3.3.2 K-means Algorithm
3.3.3 Sentiment Analysis System
4. Implementation and Results
4.1 Environment
4.1.1 Microsoft Visual C#
4.1.2 Java Swing
4.1.3 Twitter4j
4.1.4 NetBeans IDE
4.2 Software Modules
4.3 Clustering Tweets
4.4 Sentiment Analysis
5. Testing and Evaluation
5.1 iPhone 6
5.2 Play Station 4
5.3 Xbox One
6. Conclusion and Future Work
Bibliography and References
LIST OF FIGURES
Figure 2.1. Sentiment Computing and Classification
Figure 2.2. Clustering
Figure 2.3. Twitter Clusters System Design
Figure 2.4. K-means Algorithm
Figure 2.5. Flow Diagram of the Proposed System
Figure 3.1. The TF * IDF of Term t in Document d is Calculated
Figure 3.2. Project Steps
Figure 4.1. Twitter4j Output
Figure 4.2. Tweets after Human Inspection
Figure 4.3. Clustering Interface
Figure 4.4. Sentiment Analysis Interface
Figure 4.5. Cluster Interface: Enter Cluster Number
Figure 4.6. Cluster Interface: Enter Text Document
Figure 4.7. Cluster 1
Figure 4.8. Cluster 2
Figure 4.9. Sentiment Analysis: Score of the Cluster 1
Figure 4.10. Sentiment Analysis: Score of the Cluster 2
Figure 4.11. Sentiment Analysis: Tagging of the Cluster 1
Figure 4.12. Sentiment Analysis: Tagging of the Cluster 2
Figure 5.1. U.S. Sales of PS4 and Xbox One
Figure 5.2. System Output for All Data
LIST OF TABLES
Table 5.1. iPhone 6 Clusters and Score
Table 5.2. Evaluation Report of iPhone 6
Table 5.3. PS4 Clusters and Score
Table 5.4. Evaluation Report of PS4
Table 5.5. Xbox One Clusters and Score
Table 5.6. Evaluation Report of Xbox One
Table 5.7. Compare PS4 and Xbox One
1. INTRODUCTION
Twitter is a microblogging website that has become increasingly popular with the
online community. Users post short messages, known as tweets, which are limited to
140 characters. Users frequently share their personal opinions on many subjects,
discuss current topics, and write about life events. The platform is favored by many
users because it is free from political and economic limitations and is easily available
to millions of people. As the number of users increases, microblogging platforms are
becoming a place to find strong viewpoints and sentiment.
Twitter data has been used to make predictions in many different areas. For example,
data from Twitter has been used to predict the stock market [1] and to forecast
box-office revenues for movies [2]. These case studies suggest that Twitter is useful
for predicting the reception of products, services, or markets. This is one important
reason why Twitter is chosen here to assess how people think about popular electronic
products. Another reason is that Twitter serves as a good platform for sentiment
analysis due to its large user base from a variety of social and cultural regions
worldwide. Twitter contains a vast number of tweets, with millions being added every
day. These tweets can be easily collected through its APIs (Application Programming
Interfaces), which makes it easy to build a large training set.
2. BACKGROUND AND RATIONALE
2.1 Sentiment Computing and Classification
Sina Weibo is a Chinese microblogging website, similar to Twitter, which allows
users to post with a 140-character limit, mention or talk to other people using the
"@UserName" format, and add hashtags with the "#HashName#" format. Weibo is one of
the most popular sites in China, in use by well over 30% of Internet users, with a market
penetration similar to that of Twitter in the United States [3].
The approach in [3] builds a Sentiment Dictionary by using the Word2vec tool, which
is modeled after the Semantic Orientation Pointwise Similarity Distance (SO-SD) model
[4]. Once this step is completed, the Emotional Dictionary is used to get the emotional
trends from messages posted by users on Weibo. In this approach, Weibo contents are
categorized into three groups: positive, negative, and neutral. After the grouping has
been completed, the approach uses the Paoding word-segmentation tool to separate Weibo
contents into individual Chinese words. Next, 70% of the processed words from Weibo are
used to train the Word2vec tool, which yields an extended Weibo Sentiment Dictionary.
The remaining 30% of the words are used to validate the approach. Last, the
Weibo Sentiment Dictionary is used to estimate the Weibo sentiment trends. Figure 2.1
illustrates the steps in this approach.
Figure 2.1. Sentiment Computing and Classification [3]
An easy way to examine the resulting representations is to find a closely
related word or common synonym for the word specified by the user. The distance tool
helps to complete this task. For example, if you enter 'Boston', the distance tool displays
the most closely related words and their distances to 'Boston'.
This approach allows 70% of the collected words to be used to train the
Word2vec tool. The remaining 30% of the collected words are used to estimate the Weibo
sentiment trends. The most useful data is not enough because there is so much data that is
used to extend the basic dictionary.
2.2 Clustering
One of the issues with Twitter is that users post many opinions and these opinions
are broad. Users discuss many different topics in their posts, so these posts focus on more
than just product reviews. Collecting such wide-ranging data would therefore give
inaccurate results, which is the reason clustering is needed first in order to help
discover data with similarities. Clustering can be considered one of the most
important machine learning problems. It is the task of grouping a set of objects in
such a way that objects in the same group, called a cluster, are more
similar to each other than to those in other clusters [5].
Figure 2.2. Clustering [6]
In Figure 2.2, the data can easily be separated into 3 clusters. Distance is an important
quantity because each object should belong to a cluster: two or more objects
belong to the same cluster if they are close, according to the distance.
2.2.1 Twitter Clusters System
Figure 2.3 shows the whole design for the method. In order to apply this method,
there is a set of steps to be followed. First, eight Twitter feeds must be selected so that all
tweets are in English and likely to form clusters. Second, approximately 10000 tweets are
collected from 9 days out of a two-month time frame. Third, the tweets must be organized,
and similar tweets with a minimum of 60 characters are removed in order to
prevent repetition of news tweets.
Fourth, spaces must be added around punctuation such as , ; : - but not . ’ because
splitting words such as “U.S.” or “don’t” is not wanted. Fifth, basic stop words and
Twitter-specific stop words such as “alert” and “breaking” need to be removed. Sixth, to
help in clustering, if we care about the word clusters making sense and may use them
for search, we should avoid stemming. Seventh, with these features, a “word
co-occurrence matrix W” can be created, where Wij is set to n if there are n tweets that
contain both features i and j. After that, the weight matrix is used to perform
“spectral clustering” on W to get word clusters. Last, in addition to using the words,
the reverse index is used to get tweet clusters [7].
Figure 2.3. Twitter Clusters System Design [7]
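The word co-occurrence matrix W and the reverse index from the steps above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the Java or C# used by this project, and it is not the implementation from [7]: the function names `cooccurrence_matrix` and `reverse_index`, the whitespace tokenization, and the sample tweets are all assumptions made for illustration. The spectral clustering step itself is not shown.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_matrix(tweets):
    """Return W as a sparse mapping: W[(i, j)] = number of tweets containing both i and j.

    Keys are alphabetically ordered word pairs, so each unordered pair is stored once.
    Tokenization by whitespace is an assumption of this sketch.
    """
    W = Counter()
    for tweet in tweets:
        features = sorted(set(tweet.lower().split()))  # each word counts once per tweet
        for i, j in combinations(features, 2):
            W[(i, j)] += 1
    return W


def reverse_index(tweets):
    """Map each word to the set of tweet positions in which it appears."""
    index = {}
    for position, tweet in enumerate(tweets):
        for word in set(tweet.lower().split()):
            index.setdefault(word, set()).add(position)
    return index


# Hypothetical sample data, not taken from the project's tweet collection.
tweets = ["iphone battery great", "iphone battery weak", "xbox controller great"]
W = cooccurrence_matrix(tweets)
index = reverse_index(tweets)
print(W[("battery", "iphone")])  # two tweets contain both words
print(sorted(index["great"]))    # tweets in which "great" appears
```

Once word clusters have been found from W, the reverse index makes it possible to go from the words of a cluster back to the tweets that contain them.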
Unfortunately, the negative side of this method is that too much time is
taken for data collection. Furthermore, this method uses clustering heavily, which
adds more to the amount of time used. Most of the clustering time is
spent on finding a good center point. Therefore, less clustering is usually a better
choice when trying to save time.
2.2.2 K-means Algorithm
The k-means clustering algorithm is known to be efficient in clustering large data
sets, and is one of the simplest and best-known machine learning algorithms that solve
the well-known clustering problem [8].
Figure 2.4. K-means Algorithm [9]
Figure 2.4 shows the four steps of the k-means algorithm. The first step
divides the items into k nonempty subgroups. The second step computes seed points as the
centroids of the clusters of the current partition. The centroid is the center, which
means the middle point, of the cluster. In the third step, each object is
assigned to the cluster with the nearest seed point. The fourth and last step goes back to
Step 2 and stops when the assignment no longer changes [9].
The advantage of k-means is its simplicity. All you need to do is choose k and
run it a number of times, and it works especially well if the clusters are circular in
shape. Most people do not need a more complex clustering algorithm.
The k-means process has some weaknesses. First, it is difficult to compare
the quality of the clusters produced. Second, because the number of clusters is fixed, it
can be hard to find out what k should be. Third, k-means only works well with circular
cluster shapes. Fourth, different initial partitions may lead to different final
clusters. It is useful to rerun the program with the same and with different values of k
and to compare the outcomes [9].
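The four steps of k-means described above can be sketched as follows. This is an illustrative Python sketch, not the project's implementation. As an assumption, it starts from k randomly chosen objects as seed points (the variant described in Section 3.3.2) instead of from an initial partition, and the helper names `squared_distance` and `centroid` and the `seed` and `max_iter` parameters are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean (the middle point) of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    """Return (assignment, centers); assignment[i] is the cluster index of points[i]."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialization: k random objects
    assignment = None
    for _ in range(max_iter):
        # Assign each object to the cluster with the nearest seed point.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Stop when the assignment does not change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each seed point as the centroid of its current cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = centroid(members)
    return assignment, centers


# Hypothetical 2-D data forming two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(data, k=2)
print(labels)
```

Running the function with different values of `seed` or `k` is one way to observe the fourth weakness listed above, namely that different starting points can lead to different final clusters.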
2.3 Sentiment Analysis
Sentiment analysis, also called opinion mining, is the field of study that analyzes
people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards
things such as products, services, organizations, individuals, issues, events, topics, and
their attributes. It represents a large problem space. There are also many names and
slightly different tasks, e.g., sentiment analysis, opinion mining, opinion extraction,
sentiment mining, subjectivity analysis, affect analysis, emotion analysis, review mining,
etc. However, they now all fall under the umbrella of sentiment analysis or opinion
mining [10].
Sentiment analysis shows how users feel about a product or service, which can
help corporations in particular with business decisions. Also, political
parties and social organizations can collect feedback about their programs. Furthermore,
entertainers such as actors, musicians, and artists can connect with their fans and learn
their fans' viewpoints on their work. In general, sentiment analysis can act as an
automatic surveying method, which does not require manual entry [11].
2.4 Feature-based Sentiment Analysis
A document of people’s opinions is made up of paragraphs, a paragraph is made up of
sentences, and a sentence is made up of words. Therefore, the first feature that
feature-based sentiment analysis models examine is the word in a sentence. The analysis
determines whether opinions are positive, negative, or neutral. The opinions can be about
a topic, event, product, service, etc. Sentiment analysis separates a document into
paragraphs, each paragraph into sentences, and each sentence into words. In the
next step, sentiment analysis derives features from the word level up through the
sentence level and paragraph level to the document level. Once this is complete, the
positive, negative, or neutral score is calculated at each level and the scores are added
together into a final score. In this way the opinion is converted to a number, and the
number is analyzed to understand what people really think.
This feature-based sentiment analysis system uses the Stanford tool and
SentiWordNet [12]. SentiWordNet is a resource for supporting opinion mining
applications. SentiWordNet tags all the WordNet synsets with positive, negative, and
neutral opinion scores [13]. The system has two steps: preparing data and building
processing components [14]. First, the system uses SentiWordNet to create positive and
negative word lists, and lists of words that can reverse, increase, or decrease the opinion.
Second, the system uses the processing components and takes text files from Twitter as
input to find the product and the comments. The system uses the open-source Stanford
tool for stemming and tagging the parts of speech.
Figure 2.5. Flow Diagram of the Proposed System [14]
First, in the stemming step, all data from the text document is collected and stemmed.
Second, the Stanford POS Tagger is used to do the POS tagging [15]. Third,
SentiWordNet 3.0 is used to make the positive and negative word lists. Fourth, the
enriching step adds special tags for the word lists that modify opinions. For example, a
negation word reverses the opinion, so a negated negative becomes positive. The increase
and decrease words are tagged to increase or decrease the opinion. Fifth, sentence-level
opinion mining sets all opinion values to begin at 0. The tags lpos, pos, and vpos are
worth +1, +2, and +3. The tags lneg, neg, and vneg are worth -1, -2, and -3. For
example, “good” and “easy to use” are each +2, and “bad” and “hard to use” are each -2.
Next, calculate the
score by using sentence-level opinion combination methods. Last, add all the
sentence-level opinion totals together. A table is used to decide whether the opinion
text is positive or negative. For instance, a final score of more than 60% indicates a
strong positive, whereas a final score of less than -60% indicates a strong negative.
For example, consider the sentence “this phone is good and easy to use”. After
processing, the sentence becomes:
This/[POS_DT|Stm_this] phone/[POS_NN|Stm_phone] is/[POS_VBZ|Stm_be]
good/[POS_JJ|Stm_good|Opn_positive|pos] and/ [POS_CC|Stm_and]
easy/[POS_JJ|Stm_easy|pos] to/[POS_TO|Stm_to|pos] use/[POS_VB|Stm_use|pos].
The POS tag shows whether the word is an adjective, a noun, a verb, and so on. The Stm
tag holds the stem of each word separated from the sentence. If the word carries
opinion, pos is appended to its tags. In this sentence, pos = +4 (+2 for “good” and +2
for “easy to use”) and neg = 0, so result = (4*100)/(4+0+1) = 80%. The score of the
sentence is therefore 80% after combining the scores of the positive and negative words.
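The sentence-level calculation above can be sketched in a few lines. This is a hedged Python illustration, not the scoring code of the system in [14]. The tag weights and the 60% thresholds come from the text, and the formula reproduces the worked example (pos = 4, neg = 0 gives 80%). The signed numerator `pos - neg`, the function names, and the label for scores between the two thresholds are assumptions, since the text only shows the case where neg = 0.

```python
# Tag weights as stated in the text: lpos/pos/vpos = +1/+2/+3, lneg/neg/vneg = -1/-2/-3.
TAG_WEIGHTS = {"lpos": 1, "pos": 2, "vpos": 3, "lneg": -1, "neg": -2, "vneg": -3}


def sentence_score(tags):
    """Combine opinion tags into a percentage score.

    pos and neg are the total positive and negative weights (both as non-negative
    numbers). The "+ 1" in the denominator follows the worked example in the text;
    using (pos - neg) in the numerator is an assumption of this sketch.
    """
    pos = sum(TAG_WEIGHTS[t] for t in tags if TAG_WEIGHTS[t] > 0)
    neg = -sum(TAG_WEIGHTS[t] for t in tags if TAG_WEIGHTS[t] < 0)
    return (pos - neg) * 100 / (pos + neg + 1)


def classify(score):
    """Apply the 60% / -60% thresholds from the text; the middle label is hypothetical."""
    if score > 60:
        return "strong positive"
    if score < -60:
        return "strong negative"
    return "not strong"


# "this phone is good and easy to use": "good" and "easy to use" are each tagged pos.
score = sentence_score(["pos", "pos"])
print(score, classify(score))  # 80.0 strong positive
```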
The drawback of this method is that it cannot manage wide-ranging
opinions from users. The data needs to be pre-processed at the beginning,
because this allows the sentiment analysis system to make better judgments about which
opinions are useful and whether they are positive or negative.
3. CLUSTERING AND SENTIMENT ANALYSIS
3.1 Problem Report
The feature-based sentiment analysis system already extends word-level and
sentence-level analysis to the text level. It is acceptable to use it on product reviews
on Amazon, because people focus on their experience after using the products when they
post a product review. On Twitter, people talk not only about the experience of
using a product but also about many other things. Tweets are very noisy and
more spread out than product reviews on Amazon. Therefore, we need to use
clustering to separate all tweets into clusters and check how people think about
particular features of products. This can make the approach more accurate and a better
fit for Twitter.
3.2 Project Objective
The objective of this project is to achieve high-accuracy sentiment analysis. First,
the Twitter API is used to collect tweets that include a popular electronic product
name and save them to a text document. In this paper, iPhone 6, Play Station 4, and
Xbox One are chosen as case studies. Second, clustering is used to pre-process the
text document and separate all tweets into clusters. Each cluster has similar
sentences or words. Third, each cluster is processed by the feature-based
sentiment analysis system to obtain a score for the cluster. Fourth, the full set of
tweets is also processed by the feature-based sentiment analysis system.
3.3 The Steps of the Project
Sentiment analysis has become a popular method for opinion mining on
social networks. Generally, this method is good enough to do the job. However, the
opinions on Twitter are complicated and, as a result, clustering is needed to
organize tweets into clusters that have similarities. The Twitter API library twitter4j
is used to get the tweets and save them to a text document [16]. K-means is chosen for
clustering, to see what people think about different features of the products. Each
cluster, whose sentences are closely related and similar, is entered into the
feature-based sentiment analysis system. In addition, the full set of tweets is also
processed by the feature-based sentiment analysis system.
Before k-means can be run on a series of text documents, the documents must be
represented as mutually comparable vectors. To accomplish this, the TF-IDF score is
computed for the documents.
3.3.1 TF-IDF
TF-IDF is short for term frequency-inverse document frequency. The main
idea of TF-IDF is this: if a word or phrase appears with high frequency (TF) in one
article and rarely appears in other articles, the word or phrase is considered to have a
good ability to distinguish between categories [17].
TF: the term frequency measures how many times a term occurs in a document. We
can calculate the term frequency of a word as the ratio of the number of times the word
occurs in the document to the total number of words in the document.
IDF: the inverse document frequency measures whether the term is common
or rare across all documents. It is obtained by dividing the total number of documents by the
numb
[18].
highe
calcu
multi
docum
3.3.2
determ
the d
steps
3.3.3
cluste
system
tweet
steps
Secon
incre
is stro
number of documents containing the term, and then taking the logarithm of that quotient.
Figure 3.1 shows how to calculate TF and IDF. First, the result is highest when the term t
occurs many times within a small number of documents. Second, the result is lower when
the term occurs fewer times in a document, or occurs in multiple documents. Third, the
result is lowest when the term occurs in almost all documents [19].
Figure 3.1. The TF * IDF of Term t in Document d is Calculated
K-means Algorithm
The k-means algorithm has several steps. First, choose k, the number of clusters to be
determined. Second, choose k objects randomly as the initial cluster centers. Third, assign
each object to its closest cluster according to its distance from the cluster centers. The
assignment is repeated, with the cluster centers updated each time, until the cluster
centers no longer change.
Sentiment Analysis System
Figure 3.2 demonstrates the project steps. Several clusters are obtained, and each
cluster has similar sentences. Each cluster is then put into the sentiment analysis system
to find out how people think about some features of the product. In addition, all of the
tweets are also processed by the sentiment analysis system. The sentiment analysis system
has five steps. First, POS tagging is the method of deciding whether a word is a verb,
adjective, or noun. Second, SentiWordNet is used for word-level opinion tagging. Third,
enriching tags increases or decreases the positive or negative score. For example, “very
good” is stronger than “good”. Fourth, sentence-level opinion mining calculates all positive
and negative scores in the sentence. Fifth, document-level opinion mining is similar to
sentence-level opinion mining, but at the document level it calculates the score of all
documents.
Figure 3.2. Project Steps
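As a concrete illustration of the TF-IDF weighting described earlier in this section, the following is a minimal Python sketch. It is not the project's own code (the system itself is written in C# and Java). The function name `tf_idf`, the use of raw term counts for TF, and the natural logarithm for IDF are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumptions (not taken from the report): TF is the raw count of the
    term in the document, and IDF is ln(N / df), where N is the number of
    documents and df is the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical three-document corpus.
docs = [
    "screen is big big",
    "battery is weak",
    "price is high",
]
w = tf_idf(docs)
# "is" occurs in every document, so its weight is ln(3/3) = 0.
# "big" occurs twice, and only in the first document, so it has the
# largest weight there.
```

The example mirrors the three cases in the text: a term concentrated in one document gets the highest weight, and a term that occurs in all documents gets the lowest.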
4. IMPLEMENTATION AND RESULTS
4.1 Environment
The proposed system is implemented in C# and Java. Java Swing and the Twitter4j
library are the main components used. Microsoft Visual C# and the NetBeans IDE are the
programming environments used because they are well suited to C# and Java
development, respectively.
4.1.1 Microsoft Visual C#
Microsoft Visual C# is Microsoft's implementation of the C# specification and is
part of the Microsoft Visual Studio product suite [20]. C# was created by Microsoft and
is a multi-paradigm programming language encompassing strong typing and imperative,
declarative, functional, generic, object-oriented, and component-oriented programming
disciplines [21].
4.1.2 Java Swing
Java Swing, which is now maintained by Oracle, is a graphical user interface (GUI)
toolkit [22]. It lets programmers build GUIs for Java applications. Its components are
described as lightweight and highly flexible. Swing offers many components, including
lists, tables, scroll panes and tabbed panels, as well as more familiar ones such as labels,
checkboxes and buttons. In addition, some of its components support drag and drop for
further ease of use.
4.1.3 Twitter4j
Twitter4J is an unofficial Java library for the Twitter API. With Twitter4J, you can
easily integrate your Java application with the Twitter service.
4.1.4 NetBeans IDE
NetBeans is an integrated development environment (IDE) that is used mainly with
Java, but it is also used with other languages, such as PHP, C/C++, and HTML5 [23].
Additionally, NetBeans is an application platform framework for not only Java desktop
applications but others as well. The NetBeans IDE is written in Java and can run on
Windows, OS X, Linux, Solaris and other platforms supporting a compatible JVM.
4.2 Software Modules
For this module, Twitter4j is used to collect the tweets. The important aspect is the
text, so the user name, location and time are all ignored. Figure 4.1 shows the results of
this process.
Figure 4.1. Twitter4j Output
Unfortunately, there are a lot of noisy tweets from Twitter, so it is beneficial to use a
combination of computer and human inspection to sort through the noisy tweets. The
noisy tweets are checked manually to identify and eliminate outliers. Tweets that include
a #HashName, an @UserName or a website link are deleted. Figure 4.2 displays the
tweets after human inspection.
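The computer-assisted part of this filtering could look roughly like the Python sketch below. It assumes the reading that a tweet containing a hashtag, a user mention, or a link is dropped entirely. The function names and the regular expressions are illustrative and are not taken from the project. Apart from the first sample tweet, which is quoted later in the report, the sample tweets are made up.

```python
import re

# Patterns for the three kinds of noise named in the text.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
LINK = re.compile(r"https?://\S+|www\.\S+")


def is_noisy(tweet):
    """True if the tweet contains a hashtag, a user mention, or a link."""
    return any(p.search(tweet) for p in (HASHTAG, MENTION, LINK))


def drop_noisy(tweets):
    """Keep only tweets that contain none of the noise patterns."""
    return [t for t in tweets if not is_noisy(t)]


sample = [
    "Just held an iPhone6 +",
    "Win a free phone http://example.com",
    "@someone check this out",
    "Love the new screen #iPhone6",
]
clean = drop_noisy(sample)
```

A filter like this only covers the mechanical cases. Outliers that are off-topic but syntactically clean would still need the manual check described above.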
Figure 4.2. Tweets after Human Inspection
The interface for clustering is written in C# and can be seen in Figure 4.3. At the
beginning, the number of clusters must be chosen. There are then two ways to enter the
text into the interface. The first way is to type the text into the text box fields, where
each field represents a new document. The next step is to click the Add button once the
text is entered. Then click the Start button after all text has been entered. If these steps
are followed, the clustering results appear on the right side.
Another way to enter tweets is from a text document. Click the File button to choose
the text document. Then click the Add button to enter the data from the text document.
After entering the data, click the Start button.
Figure 4.3. Clustering Interface
Figure 4.4 illustrates the user-interface module and input handler. To use it, first
enter the text in the text space above the slider bar. The text space under the slider bar
displays the sentence-level opinion mining output, and the slider bar displays the overall
document-level opinion mining output.
Figure 4.4. Sentiment Analysis Interface
4.3 Clustering Tweets
Figure 4.5 illustrates entering 3 as the number of clusters.
Figure 4.5. Cluster Interface: Enter Cluster Number
Figure 4.6 displays clicking the File button to choose the text document. After that,
click the Add button to add the tweets from the text document to the clustering.
Figure 4.6. Cluster Interface: Enter Text Document
Figure 4.7 displays cluster 1 once all tweets are entered and the clustering is
completed.
Figure 4.7. Cluster 1
Figure 4.8 shows cluster 2 once all data is entered and the clustering is completed.
Figure 4.8. Cluster 2
4.4 Sentiment Analysis
Figure 4.9 shows how cluster 1 is selected and how its tweets are input into the
sentiment analysis system to receive a score. The score ranges from 100% to -100%:
100% means the most positive opinion and -100% means the most negative opinion. The
score of each sentence is shown at the end of the sentence. After that, the system adds all
scores together and outputs the final score.
Figure 4.9. Sentiment Analysis: Score of the Cluster 1
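The scoring flow described in this section (word scores, enriching by intensifiers, a score per sentence, then a final score between -100% and 100%) can be sketched in Python as follows. This is only an illustration under stated assumptions: the tiny lexicon and the intensifier multipliers are hypothetical stand-ins for SentiWordNet, and the report does not say how the summed scores are kept within the range, so the sketch clips each sentence score to [-1, 1] and reports the mean as a percentage.

```python
# Hypothetical word scores in [-1, 1]; the real system uses SentiWordNet.
LEXICON = {"good": 0.5, "great": 0.75, "bad": -0.5, "expensive": -0.25}
# Hypothetical multipliers for the "enriching tags" step.
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}


def sentence_score(sentence):
    """Sum the word scores, scaling a word that follows an intensifier."""
    total = 0.0
    boost = 1.0
    for word in sentence.lower().split():
        if word in INTENSIFIERS:
            boost = INTENSIFIERS[word]
            continue
        total += boost * LEXICON.get(word, 0.0)
        boost = 1.0
    # Clip so that a single sentence stays within [-1, 1] (an assumption).
    return max(-1.0, min(1.0, total))


def final_score_percent(sentences):
    """Mean sentence score, expressed as a percentage in [-100, 100]."""
    if not sentences:
        return 0.0
    scores = [sentence_score(s) for s in sentences]
    return 100.0 * sum(scores) / len(scores)


example = ["very good", "bad"]
result = final_score_percent(example)
```

With these made-up values, “very good” scores higher than “good”, which is the behavior the enriching step is meant to produce.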
Figure 4.10 illustrates how cluster 2 is selected and how its tweets are input into
the sentiment analysis system to receive a score.
Figure 4.10. Sentiment Analysis: Score of the Cluster 2
Figure 4.11 illustrates stemming, POS tagging, word-level opinion tagging and
enriching tags. For example, the POS tagging of the sentence “Just held an iPhone6 +”
is “Just/[RB] held/[VBN] an/[DT] iPhone/[NNP] 6/[CD] +/[CC]”.
Figure 4.11. Sentiment Analysis: Tagging of the Cluster 1
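To make the tagged output format concrete, the short Python sketch below splits a string in the “token/[TAG]” form shown in the example into pairs and maps each Penn Treebank tag to the coarse classes mentioned in the report (verb, adjective, noun). The helper names and the regular expression are assumptions for illustration. The tagging itself is done by the Stanford tool in the project, not by this code.

```python
import re

# Matches "token/[TAG]" pairs as printed in the example.
TAGGED = re.compile(r"(\S+)/\[([A-Z$]+)\]")


def parse_tagged(text):
    """Return a list of (token, tag) pairs from 'token/[TAG] ...' text."""
    return TAGGED.findall(text)


def coarse_class(tag):
    """Map a Penn Treebank tag to verb, adjective, noun, or other."""
    if tag.startswith("VB"):
        return "verb"
    if tag.startswith("JJ"):
        return "adjective"
    if tag.startswith("NN"):
        return "noun"
    return "other"


tagged = "Just/[RB] held/[VBN] an/[DT] iPhone/[NNP] 6/[CD] +/[CC]"
pairs = parse_tagged(tagged)
classes = [(tok, coarse_class(tag)) for tok, tag in pairs]
```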
Figure 4.12 shows stemming, POS tagging, word-level opinion tagging and enriching
tags.
Figure 4.12. Sentiment Analysis: Tagging of the Cluster 2
5. TESTING AND EVALUATION
iPhone 6, PlayStation 4, and Xbox One were chosen as the keywords to search for on
Twitter. Tweets containing these keywords were collected and saved to a text document.
Once the tweets are collected, they are clustered and then processed by the sentiment
analysis system. Clustering comes first because the tweets relate to different features of
the products. The k-means algorithm is used for clustering, with k set to 3.
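For readers who want to see the clustering step in code, here is a minimal k-means sketch in Python with k = 3. It is a generic textbook version, not the project's C# implementation. The 2-D vectors are hypothetical stand-ins for the TF-IDF vectors of tweets, and the `seed` and `max_iter` parameters are assumptions added to make the example reproducible and bounded.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def k_means(points, k, seed=0, max_iter=100):
    """Cluster `points` (lists of floats) into k groups.

    Returns (labels, centers). The initial centers are k distinct points
    chosen at random.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its closest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments, and therefore centers, no longer change
        labels = new_labels
        # Recompute each center as the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = mean(members)
    return labels, centers


# Hypothetical 2-D vectors standing in for TF-IDF vectors of tweets.
vectors = [
    [0.0, 0.0], [0.0, 1.0],
    [10.0, 10.0], [10.0, 11.0],
    [20.0, 0.0], [20.0, 1.0],
]
labels, centers = k_means(vectors, k=3)
```

Because the initial centers are random, k-means can settle on different groupings for different seeds. Running it several times and keeping the best result is a common remedy.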
5.1 iPhone 6
For the iPhone 6, the data set has a total of 88 tweets after human inspection.
Clustering produces 3 clusters: cluster 1 has 31 tweets, cluster 2 has 37 tweets, and
cluster 3 has 20 tweets. Each cluster is entered into the sentiment analysis system to
compute its score. Table 5.1 shows the result of this computation.
Table 5.1. iPhone 6 Clusters and Score
Cluster   Tweets   Score (%)   Feature
1         31       77          screen
2         37       63          battery
3         20       71          price
Total     88       71
In cluster 1, 80.6% of the tweets relate to screen size (25 out of 31 tweets). In
cluster 2, 86.5% of the tweets relate to battery life (32 out of 37 tweets). In cluster 3,
85% of the tweets mention price (17 out of 20 tweets). The scores show that people are
more satisfied with the iPhone 6 screen size than with its battery life: the score for
screen size is 77%, whereas the score for battery life is only 63%.
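The percentages quoted above are simple ratios of matching tweets to cluster size. The small Python check below reproduces them from the counts given in the text. The helper name `share_percent` is illustrative only.

```python
def share_percent(matching, total):
    """Percentage of tweets in a cluster that relate to its main feature."""
    return round(100.0 * matching / total, 1)


# (matching tweets, cluster size) for the three iPhone 6 clusters.
iphone6 = {"screen": (25, 31), "battery": (32, 37), "price": (17, 20)}
shares = {feature: share_percent(m, n) for feature, (m, n) in iphone6.items()}
```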
A few people were asked to judge manually whether each tweet is positive or
negative. Classifier evaluation metrics and a confusion matrix are then used to compare
the scores from this project with the judgments of these reviewers [24].
Table 5.2 shows the evaluation report for the iPhone 6. True positive (TP) means
that the human check and the system output are both positive. True negative (TN) means
that the human check and the system output are both negative. TP and TN are the cases
in which the system output is correct. False negative (FN) means that the human check is
positive but the system output is negative. False positive (FP) means that the human
check is negative but the system output is positive. FN and FP are the cases in which the
system output is wrong. ~FN and ~FP are the tweets for which the system output is
neutral, that is, neither positive nor negative.
Table 5.2. Evaluation Report of iPhone 6
Manual (human) / System Output   Positive (Score > 0%)   Neutral (Score = 0%)   Negative (Score < 0%)
Positive                         42 (TP)                 15 (~FN)               2 (FN)
Negative                         3 (FP)                  14 (~FP)               12 (TN)
The accuracy of the system is the percentage of test set tuples that are correctly
classified. It is calculated using the following formula.
Opinion Extraction Accuracy = (TP+TN)/(TP+TN+FP+FN)
= (42 + 12) / (42 + 12 + 3 + 2)
= 91.5 %
Precision is the percentage of tuples labeled as positive by the classifier that are
actually positive. It is calculated using the following formula.
Precision = TP/(TP+FP)
= 42 / (42 + 3)
= 93.3 %
Recall is the percentage of positive tuples that the classifier labeled as positive. It is
calculated using the following formula.
Recall = TP/(TP+FN)
= 42 / (42 + 2)
= 95.5 %
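The three formulas can be checked with a few lines of Python. The function below is a sketch written for this illustration (the name `evaluate` is not from the project). It uses the counts from Table 5.2 and, like the formulas, leaves the neutral outputs (~FN and ~FP) out of the calculation.

```python
def evaluate(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) as percentages.

    Neutral system outputs (~FN, ~FP) are not part of these counts.
    """
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    return accuracy, precision, recall


# Counts from Table 5.2 (iPhone 6).
acc, prec, rec = evaluate(tp=42, tn=12, fp=3, fn=2)
print(f"accuracy={acc:.1f}% precision={prec:.1f}% recall={rec:.1f}%")
# prints: accuracy=91.5% precision=93.3% recall=95.5%
```

The same function applies unchanged to the counts in the evaluation reports for the other two products.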
5.2 PlayStation 4
For the PlayStation 4 (PS4), the data set has a total of 92 tweets after human
inspection. Clustering produces 3 clusters: cluster 1 has 34 tweets, cluster 2 has 21
tweets, and cluster 3 has 37 tweets. Each cluster is entered into the sentiment analysis
system to calculate the score. Table 5.3 shows the result.
Table 5.3. PS4 Clusters and Score
Cluster   Tweets   Score (%)   Feature
1         34       51          controller
2         21       67          game
3         37       72          price
Total     92       64
In cluster 1, 82.4% of the tweets relate to the PS4 controller (28 out of 34 tweets).
In cluster 2, 81% of the tweets are about PS4 games (17 out of 21 tweets). In cluster 3,
78.4% of the tweets mention price (29 out of 37 tweets). The scores show that people are
less satisfied with the PS4 controller than with the price: the score for the controller is
just 51%, whereas the score for the price is 72%.
Table 5.4 shows the evaluation report for the PS4.
Table 5.4. Evaluation Report of PS4
Manual (human) / System Output   Positive (Score > 0%)   Neutral (Score = 0%)   Negative (Score < 0%)
Positive                         30 (TP)                 22 (~FN)               3 (FN)
Negative                         9 (FP)                  11 (~FP)               17 (TN)
Opinion Extraction Accuracy = (30 + 17) / (30 + 17 + 9 + 3)
= 79.7 %
Precision = 30 / (30 + 9)
= 76.9 %
Recall = 30 / (30 + 3)
= 90.9 %
5.3 Xbox One
For the Xbox One, the data set has a total of 109 tweets after human inspection.
Clustering produces 3 clusters: cluster 1 has 38 tweets, cluster 2 has 23 tweets, and
cluster 3 has 48 tweets. Each cluster is entered into the sentiment analysis system to
calculate the score. Table 5.5 shows the result.
Table 5.5. Xbox One Clusters and Score
Cluster   Tweets   Score (%)   Feature
1         38       60          game
2         23       -59         price
3         48       55          controller
Total     109      53
In cluster 1, 86.8% of the tweets relate to Xbox One games (33 out of 38 tweets).
In cluster 2, 78.3% of the tweets are about price (18 out of 23 tweets). In cluster 3,
79.2% of the tweets mention the Xbox One controller (38 out of 48 tweets). People are
not satisfied with the price of the Xbox One and think it is too expensive: the score for
the price is negative (-59%).
Table 5.6 shows the evaluation report for the Xbox One.
Table 5.6. Evaluation Report of Xbox One
Manual (human) / System Output   Positive (Score > 0%)   Neutral (Score = 0%)   Negative (Score < 0%)
Positive                         27 (TP)                 22 (~FN)               10 (FN)
Negative                         4 (FP)                  17 (~FP)               29 (TN)
Opinion Extraction Accuracy = (27 + 29) / (27 + 29 + 4 + 10)
= 80 %
Precision = 27 / (27 + 4)
= 87.1 %
Recall = 27 / (27 + 10)
= 73 %
Table 5.7. Comparison of PS4 and Xbox One
Feature      PS4 score (%)   Xbox One score (%)
game         67              60
price        72              -59
controller   51              55
total        64              53
Table 5.7 compares the PS4 and the Xbox One. For games, people are more
satisfied with the PS4 than with the Xbox One. For price, most people think the price of
the PS4 is fine (72%), but they think the Xbox One is too expensive (-59%). For the
controller, people like the Xbox One controller a little more.
In fact, the PS4 has sold better than the Xbox One in the USA. Figure 5.1 shows
the cumulative U.S. sales since the release of Sony’s PS4 and Microsoft’s Xbox One.
Figure 5.1. U.S. Sales of PS4 and Xbox One [25]
Figure 5.2 shows the system output for all data.
Figure 5.2. System Output for All Data
6. CONCLUSION AND FUTURE WORK
This project finds out how people think about specific popular electronic products.
It converts people’s words into numbers, and these numbers can then be analyzed to
understand what different people think. The difficulty is making sure that the conversion
is correct. Therefore, I use clustering together with a feature-based sentiment analysis
system to help with the accuracy of the conversion.
The clustering and feature-based sentiment analysis system processes text
documents collected from Twitter. Because the opinions on Twitter are complex and
dispersed, clustering is needed to separate the data into clusters. In this project, the
Twitter API library twitter4j is used to get the data and save it to a text document. The
k-means algorithm is then used for clustering. After that, the feature-based sentiment
analysis system processes the data. The sentiment analysis is done in seven main steps:
stemming, POS tagging, word-level opinion tagging, enriching tags, sentence-level
opinion mining, document-level opinion mining, and time-level opinion mining. The
Stanford tool is used for stemming and POS tagging, and SentiWordNet is used for
word-level tagging and enriching tags.
Apart from the work done on this system, future work mainly comprises the
following objectives.
To handle noisy data without human inspection.
To improve speed for a large number of sentences and to handle huge data sets.
To run this project on cloud computing with Hadoop and Mahout.
To run sentiment analysis in Chinese on Weibo.
BIBLIOGRAPHY AND REFERENCES
[1] Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock
market. Journal of Computational Science, 2(1), 1-8.
[2] Asur, S., & Huberman, B. A. (2010, August). Predicting the future with social media.
In Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010
IEEE/WIC/ACM International Conference on (Vol. 1, pp. 492-499). IEEE.
[3] Weibo. http://en.wikipedia.org/wiki/Sina_Weibo
[4] Xue, B., Fu, C., & Shaobin, Z. (2014, June). A Study on Sentiment Computing and
Classification of Sina Weibo with Word2vec. In Big Data (BigData Congress),
2014 IEEE International Congress on (pp. 358-363). IEEE.
[5] Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Prentice-Hall, Inc.
[6] Text Documents Clustering using K-Means Algorithm.
http://www.codeproject.com/Articles/439890/Text-Documents-Clustering-using-K-Means-Algorithm
[7] Tushar Khot, Clustering Twitter Feeds using Word Co-occurrence. CS769 Project
Report. http://pages.cs.wisc.edu/~tushar/projects/cs769.pdf
[8] Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A k-means clustering
algorithm. Applied Statistics, 100-108.
[9] Han, J., & Kamber, M. (2006). Data Mining, Southeast Asia Edition: Concepts and
Techniques. Morgan Kaufmann.
[10] Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on
Human Language Technologies, 5(1), 1-167.
[11] Bora, N. N. (2011). Feature Based Sentiment Analysis on Twitter (Doctoral
dissertation, Indian Institute of Technology Guwahati).
[12] Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
(2008). “Introduction to Information Retrieval,” Cambridge University Press.
http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html
[13] Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani (2010). “SentiWordNet
3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining.”
[14] Srividya Venumbaka (Spring 2013). “An Enhanced Feature-Based Sentiment
Analysis System.” Graduate Project Report. Texas A&M University Corpus
Christi.
[15] The Stanford Natural Language Processing Group. (n.d.) “Stanford log-linear Part-
of-Speech Tagger.” http://nlp.stanford.edu/software/tagger.shtml
[16] Twitter4J. (2013). http://twitter4j.org/en/index.html
[17] Rajaraman, A., & Ullman, J. D. (2011). Mining of massive datasets. Cambridge
University Press.
[18] TF-IDF means. http://www.tfidf.com/
[19] The Stanford Natural Language Processing Group. TF-IDF weighting.
http://nlp.stanford.edu/IR-book/html/htmledition/tf-idf-weighting-1.html
[20] Microsoft Visual C#. http://en.wikipedia.org/wiki/Microsoft_Visual_C_Sharp
[21] C#. http://en.wikipedia.org/wiki/C_Sharp_(programming_language)
[22] Java Swing. http://en.wikibooks.org/wiki/Java_Swings
[23] NetBeans IDE. http://en.wikipedia.org/wiki/NetBeans
[24] Kohavi and Provost. (1998). Confusion Matrix.
http://www2.cs.uregina.ca/~dbd/cs831/notes/confusion_matrix/confusion_matrix.html
[25] Wall Street Journal. http://iknow.stpi.narl.org.tw/post/Read.aspx?PostID=9775