Text Mining in R - WordPress.com · 3/24/2017 · What is Text Mining? ... mother catherine...

Text Mining in R Sara Weston and Debbie Yee 03/24/2017

Text Mining in R

Sara Weston and Debbie Yee

03/24/2017

What is Text Mining?

- Text mining is the process by which textual data is broken down into pieces for analysis.
- Pieces can be words or phrases.
- Pieces can be analyzed as they are or as the sentiment they represent.
- Text mining can be used to test hypotheses or gain descriptive insight into data.

Necessary packages

library(tm)          # for reading in text documents
library(tidytext)    # for cleaning text and sentiments
library(topicmodels) # for topic analysis
library(janeaustenr) # for free data
library(dplyr)       # for data manipulation
library(tidyr)
library(stringr)     # for manipulating string/text data
library(ggplot2)     # for pretty graphs
library(wordcloud)   # duh

Load in fun data

We're using data from the janeaustenr package, which includes all six of Jane Austen's novels.

- The data require some preprocessing.
- We restructure the data so that each chapter is its own "observation", with data on which book and which chapter it comes from.
- Code is included in the Rmd file, but not shown here.
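
Since the preprocessing code is not shown, here is a minimal sketch of one way it could be done. This is an assumption, not the authors' code: it borrows the usual chapter-detection pattern for the austen_books() data and collapses each chapter into one row. The name original_books matches the object used on later slides; everything else is illustrative.

```r
library(janeaustenr)
library(dplyr)
library(stringr)

# Hypothetical reconstruction: the slides' own preprocessing is not shown.
original_books <- austen_books() %>%
  group_by(book) %>%
  # A line starting with "Chapter <digit or roman numeral>" opens a new chapter;
  # lines before the first heading (the title page) fall into chapter 0.
  mutate(chapter = cumsum(str_detect(text,
                                     regex("^chapter [\\divxlc]",
                                           ignore_case = TRUE)))) %>%
  group_by(book, chapter) %>%
  # Collapse all lines of a chapter into a single string.
  summarize(text = paste(text, collapse = " "), .groups = "drop")

head(original_books)
```

The result has one row per book and chapter, with the columns book, chapter, and text.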

## # A tibble: 6 × 3
##                  book chapter
##                <fctr>   <int>
## 1 Sense & Sensibility       0
## 2 Sense & Sensibility       1
## 3 Sense & Sensibility       2
## 4 Sense & Sensibility       3
## 5 Sense & Sensibility       4
## 6 Sense & Sensibility       5
## # ... with 1 more variables: text <chr>

Types of information

We can analyze text data in a lot of ways. Today we will talk about three ways to measure or 'code' text data:

- Word frequencies
- Word sentiment
- Topics, based on word clusters

Each of these ways of measuring data requires that the data be restructured in a different way.

- Make use of the dplyr and tidyr packages

Types of data structures

- Long form
  - Each word gets its own row in a data frame.
  - Sometimes each word in each document (person).
  - Columns contain information about the word (and document).
- Short form
  - Each document (person) gets its own row.
  - Columns contain information about the documents, plus there is one column for every unique word in the corpus.
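
To make the two structures concrete, here is a small sketch on made-up data. The object names toy_long and toy_short and the words in them are invented for illustration. It uses pivot_wider() from tidyr, the current counterpart of spread().

```r
library(dplyr)
library(tidyr)

# Toy long-form data: one row per word per document (made-up values).
toy_long <- tibble(
  doc  = c("d1", "d1", "d1", "d2", "d2"),
  word = c("love", "time", "love", "time", "miss")
)

# Count words within documents, then give each unique word its own column.
toy_short <- toy_long %>%
  count(doc, word) %>%
  pivot_wider(names_from = word, values_from = n, values_fill = 0)

toy_short
```

In toy_short each document is one row, and a word that never occurs in a document gets a count of 0 rather than NA.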

Long form

It's easy to get the data into a form where each word gets its own row.

long_austen <- original_books %>%
  unnest_tokens(output = word,
                input = text,
                token = "words")

head(long_austen)

## # A tibble: 6 × 3
##                  book chapter        word
##                <fctr>   <int>       <chr>
## 1 Sense & Sensibility       0       sense
## 2 Sense & Sensibility       0         and
## 3 Sense & Sensibility       0 sensibility
## 4 Sense & Sensibility       0          by
## 5 Sense & Sensibility       0        jane
## 6 Sense & Sensibility       0      austen

A note about stop words.

Stop words are words in the English language that connect other words, but often have little or no content.

Examples:

- Conjunctions
- Articles
- Prepositions

You will likely want to remove these words before you proceed.

Remove stop words

head(stop_words)

## # A tibble: 6 × 2
##        word lexicon
##       <chr>   <chr>
## 1         a   SMART
## 2       a's   SMART
## 3      able   SMART
## 4     about   SMART
## 5     above   SMART
## 6 according   SMART

long_austen <- long_austen %>%
  anti_join(stop_words)

## Joining, by = "word"

Short form

And you can use the long-form version to create the short form.

short_austen <- long_austen %>%
  mutate(bookchap = paste(book, chapter, sep = "_")) %>%
  select(-c(book, chapter)) %>%
  group_by(bookchap) %>%
  count(word) %>%
  cast_dtm(document = bookchap, term = word, value = n)

Back to information

We can use these two data structures to calculate frequencies, match sentiments, and estimate topics.

First, frequencies.

Frequencies

long_austen %>%
  count(word)

## # A tibble: 13,816 × 2
##            word     n
##           <chr> <int>
## 1           _a_     2
## 2    _accepted_     1
## 3    _accident_     1
## 4       _adair_     1
## 5    _addition_     1
## 6  _advantages_     1
## 7      _affect_     1
## 8     _against_     1
## 9   _agreeable_     1
## 10        _air_     1
## # ... with 13,806 more rows
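
The first rows of this output are words wrapped in underscores, which mark emphasis in the source texts, so `_air_` and `air` are counted as different words. The slides leave them as they are. If you would rather merge them, one possible sketch (using a made-up stand-in for long_austen, here called toy_words) is to strip the underscores before counting:

```r
library(dplyr)
library(stringr)

# Toy stand-in for long_austen (made-up rows).
toy_words <- tibble(word = c("_air_", "air", "_a_", "time", "time"))

toy_counts <- toy_words %>%
  # Drop underscores so "_air_" and "air" are counted as the same word.
  mutate(word = str_remove_all(word, "_")) %>%
  # sort = TRUE puts the most frequent words first.
  count(word, sort = TRUE)

toy_counts
```

Note that stripping underscores can turn an emphasized stop word such as `_a_` back into `a`, so this step would need to run before the stop-word anti_join.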

What can you do with frequencies?

- Summarize your data
- Estimate relationships between single words and other variables

Wordcloud

austen_freq <- long_austen %>%
  count(word)

wordcloud(words = austen_freq$word, freq = austen_freq$n, # necessary arguments
          min.freq = 200, # fun arguments
          random.order = FALSE,
          random.color = FALSE,
          colors = brewer.pal(6, "Dark2"))

Wordcloud

[Figure: word cloud of the words that occur at least 200 times in the novels. Words shown include miss, time, fanny, dear, lady, sir, day, emma, sister, house, elizabeth, elinor, hope, friend, family, mind, and father.]

Estimate relationships

short_freq <- long_austen %>%
  group_by(book, chapter) %>%
  count(word) %>%
  spread(key = word, value = n)

short_freq

## Source: local data frame [275 x 13,818]
## Groups: book, chapter [275]
##
##                   book chapter `_a_` `_accepted_` `_accident_` `_adair_`
## *               <fctr>   <int> <int>        <int>        <int>     <int>
## 1  Sense & Sensibility       0    NA           NA           NA        NA
## 2  Sense & Sensibility       1    NA           NA           NA        NA
## 3  Sense & Sensibility       2    NA           NA           NA        NA
## 4  Sense & Sensibility       3    NA           NA           NA        NA
## 5  Sense & Sensibility       4    NA           NA           NA        NA
## 6  Sense & Sensibility       5    NA           NA           NA        NA
## 7  Sense & Sensibility       6    NA           NA           NA        NA
## 8  Sense & Sensibility       7    NA           NA           NA        NA
## 9  Sense & Sensibility       8    NA           NA           NA        NA
## 10 Sense & Sensibility       9    NA           NA           NA        NA
## # ... with 265 more rows, and 13812 more variables: `_addition_` <int>,
## #   `_advantages_` <int>, `_affect_` <int>, `_against_` <int>,
## #   `_agreeable_` <int>, `_air_` <int>, `_all_` <int>, `_allow_` <int>,
## #   `_almost_` <int>, `_alone_` <int>, `_am_` <int>, `_amor_` <int>,
## #   `_amore_` <int>, `_and_` <int>, `_another` <int>, `_answer_` <int>,
## #   `_any_` <int>, `_anybody's_` <int>, `_appear_` <int>,
## #   `_appearance_` <int>, `_appearing_` <int>, `_appropriation_` <int>,
## #   `_are_` <int>, `_as` <int>, `_as_` <int>, `_assistance_` <int>,
## #   `_at_` <int>, `_be_` <int>, `_be'd_` <int>, `_been_` <int>,
## #   `_before_` <int>, `_begin_` <int>, `_being` <int>, `_being_` <int>,
## #   `_believe_` <int>, `_blunder_` <int>, `_boiled_` <int>, `_bon_` <int>,
## #   `_both_` <int>, `_boulanger_` <int>, `_bride_` <int>, `_broke_` <int>,
## #   `_brother_` <int>, `_brotherly_` <int>, `_can_` <int>,
## #   `_cannot_` <int>, `_caro_` <int>, `_cause_` <int>, `_chaperon_` <int>,
## #   `_choice_` <int>, `_coming` <int>, `_coming_` <int>,
## #   `_compassion_` <int>, `_compliments_` <int>, `_con_` <int>,
## #   `_conditionally_` <int>, `_conduct_` <int>, `_corps_` <int>,
## #   `_could_` <int>, `_count_` <int>, `_court_` <int>,
## #   `_courtship_` <int>, `_daughters_` <int>, `_deepest_` <int>,
## #   `_delightful_` <int>, `_did_` <int>, `_dined_` <int>,
## #   `_dislike_` <int>, `_dissolved_` <int>, `_dixon_` <int>,
## #   `_dixons_` <int>, `_do_` <int>, `_does_` <int>, `_double_` <int>,
## #   `_doubts_` <int>, `_du_` <int>, `_each_` <int>, `_early_` <int>,
## #   `_eclaircissement_` <int>, `_edmund_` <int>, `_eighteen_` <int>,
## #   `_eldest_` <int>, `_elegant_` <int>, `_elton_` <int>,
## #   `_endeavour_` <int>, `_engaged_` <int>, `_engagement_` <int>,
## #   `_ensemble_` <int>, `_esprit_` <int>, `_etourderie_` <int>,
## #   `_evening_` <int>, `_every_` <int>, `_exigeant_` <int>,
## #   `_expression_` <int>, `_family_` <int>, `_father's_` <int>,
## #   `_feel_` <int>, `_feelings_` <int>, `_felt_` <int>, `_few_` <int>, ...

Estimate relationships

ggplot(short_freq, aes(x = chapter, y = family, fill = book)) +
  geom_bar(stat = "identity") +
  geom_smooth(se = FALSE) +
  guides(fill = "none") + # "none" hides the legend; newer ggplot2 versions deprecate FALSE here
  facet_wrap(~book, scales = "free_x") +
  theme_bw()

Estimate relationships

[Figure: per-chapter word counts with a smoothed trend, one panel per novel (Sense & Sensibility, Pride & Prejudice, Mansfield Park, Emma, Northanger Abbey, Persuasion); x-axis: chapter, y-axis: marriage, ranging from 0 to 10.]

Sentiment

Words have sentimental value. There are three ways you can operationalize the sentimental value of a word.

- Positive or negative
- Numeric (-3 to 3)
- Emotion (joy, fear, trust, disgust, etc.)

Use the get_sentiments function to get the operationalization you want.

Sentiment

get_sentiments("bing")

## # A tibble: 6,788 × 2
##           word sentiment
##          <chr>     <chr>
## 1      2-faced  negative
## 2      2-faces  negative
## 3           a+  positive
## 4     abnormal  negative
## 5      abolish  negative
## 6   abominable  negative
## 7   abominably  negative
## 8    abominate  negative
## 9  abomination  negative
## 10       abort  negative
## # ... with 6,778 more rows

Sentiment

get_sentiments("afinn")

## # A tibble: 2,476 × 2
##          word score
##         <chr> <int>
## 1     abandon    -2
## 2   abandoned    -2
## 3    abandons    -2
## 4    abducted    -2
## 5   abduction    -2
## 6  abductions    -2
## 7       abhor    -3
## 8    abhorred    -3
## 9   abhorrent    -3
## 10     abhors    -3
## # ... with 2,466 more rows

Sentiment

get_sentiments("nrc")

## # A tibble: 13,901 × 2
##           word sentiment
##          <chr>     <chr>
## 1       abacus     trust
## 2      abandon      fear
## 3      abandon  negative
## 4      abandon   sadness
## 5    abandoned     anger
## 6    abandoned      fear
## 7    abandoned  negative
## 8    abandoned   sadness
## 9  abandonment     anger
## 10 abandonment      fear
## # ... with 13,891 more rows

Attaching sentiments

To use these, you can join the sentiments data frame with your long-form word data frame.

long_austen <- long_austen %>%
  inner_join(get_sentiments("afinn"))
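
What the join does is easier to see on a toy example. The sketch below uses made-up words and a made-up three-word lexicon (toy_long and toy_lexicon are illustrative names, and the scores are not the real AFINN values). One caveat, stated as an assumption to check against your installed version: more recent tidytext releases appear to name the AFINN column value rather than score, so inspect names(get_sentiments("afinn")) before plotting.

```r
library(dplyr)

# Made-up words and a made-up three-word lexicon (not the real AFINN scores).
toy_long <- tibble(
  chapter = c(1, 1, 1, 2, 2),
  word    = c("happy", "walk", "poor", "love", "carriage")
)
toy_lexicon <- tibble(
  word  = c("happy", "poor", "love"),
  score = c(3, -2, 3)
)

# inner_join keeps only the words that appear in the lexicon.
toy_scored <- toy_long %>%
  inner_join(toy_lexicon, by = "word")

# Mean sentiment per chapter.
toy_means <- toy_scored %>%
  group_by(chapter) %>%
  summarize(mean_score = mean(score))

toy_means
```

Words without a lexicon entry ("walk", "carriage") drop out entirely, so any per-chapter average is based only on the scored words.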

Use sentiments as a new variable

ggplot(long_austen, aes(x = chapter, y = score, color = book)) +
  geom_smooth(se = FALSE) +
  guides(color = "none") + # "none" hides the legend; newer ggplot2 versions deprecate FALSE here
  facet_wrap(~book, scales = "free_x") +
  theme_bw()

Use sentiments as a new variable

[Figure: smoothed sentiment score by chapter, one panel per novel (Sense & Sensibility, Pride & Prejudice, Mansfield Park, Emma, Northanger Abbey, Persuasion); x-axis: chapter, y-axis: score, ranging from about 0.00 to 0.75.]

Topics

You can try to infer what topics are coming up in your text data.

- Does require you to make some guesses.
- Probably more useful for describing data and synthesizing comments and pilot data than for inferential stats.
- Use the short-form data set.

Topics

Latent Dirichlet allocation

books_lda <- LDA(short_austen, k = 6,
                 control = list(seed = 1234))

Topics

- Extract from that the beta matrix.
- In this, each word gets one row for each topic.
- Beta is the probability of that term being generated from that topic.

book_topics <- tidy(books_lda, matrix = "beta")
head(book_topics)

## # A tibble: 6 × 3
##   topic   term          beta
##   <int>  <chr>         <dbl>
## 1     1 austen  1.185131e-04
## 2     2 austen 5.053303e-320
## 3     3 austen  3.720076e-44
## 4     4 austen  1.742293e-95
## 5     5 austen  3.114776e-05
## 6     6 austen  3.114320e-37
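
Because beta is a probability of a term given a topic, the betas within one topic should sum to 1. A quick way to convince yourself is to fit a tiny model and check. The sketch below uses made-up counts (toy_counts and the other toy_ names are illustrative) and the same LDA, cast_dtm, and tidy calls as the slides; with so little data the topics themselves mean nothing.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Made-up word counts for four tiny "documents".
toy_counts <- tibble(
  document = c("a", "a", "b", "b", "c", "c", "d", "d"),
  term     = c("emma", "harriet", "emma", "harriet",
               "darcy", "bennet", "darcy", "bennet"),
  n        = c(5, 3, 4, 4, 6, 2, 3, 5)
)

toy_dtm <- toy_counts %>%
  cast_dtm(document = document, term = term, value = n)

toy_lda <- LDA(toy_dtm, k = 2, control = list(seed = 1234))

# One row per term per topic, as in book_topics.
toy_beta <- tidy(toy_lda, matrix = "beta")

# Within each topic, the term probabilities add up to 1.
beta_sums <- toy_beta %>%
  group_by(topic) %>%
  summarize(total = sum(beta))

beta_sums
```

This is also why very small betas, like those for "austen" above, simply mean the term is almost never generated by that topic.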

Find the top terms in each topic

The best way to work with these data is to find the "top terms" in each topic, to try and figure out what the topic might be.

top_terms <- book_topics %>%
  group_by(topic) %>%
  top_n(10, beta) %>%
  ungroup() %>%
  arrange(topic, -beta)

head(top_terms)

## # A tibble: 6 × 3
##   topic      term        beta
##   <int>     <chr>       <dbl>
## 1     1      emma 0.017916669
## 2     1      miss 0.013341364
## 3     1   harriet 0.009551112
## 4     1    weston 0.008867266
## 5     1 knightley 0.008115030
## 6     1     elton 0.007271614

Plot top terms

ggplot(top_terms, aes(term, beta, fill = factor(topic))) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()

Plot top terms

[Figure: top 10 terms by beta in each of the six topics, one panel per topic (1 to 6); x-axis: beta, y-axis: term. Each panel is dominated by names from a single novel: dashwood, edward, elinor, jennings, marianne, willoughby; bertram, crawford, edmund, fanny, norris, rushworth, thomas; anne, captain, charles, elliot, walter, wentworth; allen, catherine, isabella, morland, thorpe, tilney; elton, emma, harriet, knightley, weston, woodhouse; bennet, bingley, darcy, elizabeth, jane. The terms miss and time appear in every panel.]

Use in Psychology Research

- Data was generously provided by Alexa Lord.
- Study on self-affirmation - "the act of affirming an important, typically non-threatened, aspect of the self"
- Does self-affirmation reduce rejection sensitivity?

Experimental manipulation

- Self-affirmation condition: Rank values or traits on importance. Write for five minutes about the trait or value you listed as most important.
- Control condition: Think of a T.V. character and rank values or traits on importance to that character. Write for five minutes about why that T.V. character values the number one character or trait.

Load in data

# typical data
saf <- read.csv("saf.csv")
# text data
docs <- VCorpus(DirSource("tm data/Text Files"))
docs.tidy <- tidy(docs)
docs.tidy$ID <- gsub("\\.txt", "", docs.tidy$id)

Tokenize text

docs.tidy2 <- docs.tidy %>%
  unnest_tokens(output = word,
                input = text,
                token = "words") %>%
  anti_join(stop_words) %>%
  select(ID, word)

saf <- merge(saf, docs.tidy2, by = "ID")

Frequencies

Which words are used more frequently in the SAF condition than in the control condition?

saf.freq <- saf %>%
  group_by(SAF) %>%
  count(word) %>%
  ungroup()

saf.freq <- saf %>%
  group_by(SAF) %>%
  summarize(n.words = n()) %>%
  ungroup() %>%
  inner_join(saf.freq) %>%
  mutate(n.adj = n/n.words)
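
The adjustment divides each word's count by the total number of words written in its condition, so conditions with more text are not favored. Here is a small worked sketch on made-up tokens (toy and toy_freq are illustrative names, not the study data). It computes the same quantity in a single pipeline rather than with a join:

```r
library(dplyr)

# Made-up tokens: one row per word, tagged with a hypothetical condition code.
toy <- tibble(
  SAF  = c(0, 0, 0, 0, 1, 1),
  word = c("family", "family", "humor", "life", "family", "respect")
)

toy_freq <- toy %>%
  count(SAF, word) %>%            # occurrences of each word per condition
  group_by(SAF) %>%
  mutate(n.words = sum(n),        # total words written in that condition
         n.adj   = n / n.words) %>%
  ungroup()

toy_freq
```

In this toy data "family" appears twice among four words in condition 0 and once among two words in condition 1, so its adjusted frequency is 0.5 in both, even though the raw counts differ.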

Plot word frequencies

[Figure: adjusted word frequency for the most frequent words in each condition, one panel per SAF value (0 and 1); x-axis: n.adj, y-axis: word. Words shown include family, feel, friendly, friends, life, loving, people, person, relationships, respect, character, humor, knowledge, makes, and sense.]

Sentiment

Which sentiments are found in each condition?

saf.freq %>%
  inner_join(get_sentiments("nrc")) %>%
  group_by(SAF, sentiment) %>%
  summarize(m.sent = mean(n.adj, na.rm = TRUE))

## Joining, by = "word"

## Source: local data frame [20 x 3]
## Groups: SAF [?]
##
##      SAF    sentiment       m.sent
##    <int>        <chr>        <dbl>
## 1      0        anger 0.0003602342
## 2      0 anticipation 0.0006829609
## 3      0      disgust 0.0004092731
## 4      0         fear 0.0004422031
## 5      0          joy 0.0007660650
## 6      0     negative 0.0003877042
## 7      0     positive 0.0007319573
## 8      0      sadness 0.0004619886
## 9      0     surprise 0.0004357961
## 10     0        trust 0.0007097314
## 11     1        anger 0.0004942628
## 12     1 anticipation 0.0013592024
## 13     1      disgust 0.0004737870
## 14     1         fear 0.0007217309
## 15     1          joy 0.0014757498
## 16     1     negative 0.0004641179
## 17     1     positive 0.0009689168
## 18     1      sadness 0.0005495285
## 19     1     surprise 0.0006640335
## 20     1        trust 0.0011505550

Plot sentiments

[Figure: mean adjusted frequency for each NRC sentiment category (anger, anticipation, disgust, fear, joy, negative, positive, sadness, surprise, trust); x-axis: m.sent, ranging from 0.0000 to 0.0015, y-axis: sentiment.]

Topics

What are people talking about?

short_saf <- saf.freq %>%
  filter(!grepl("[0-9]", word)) %>%
  select(SAF, word, n) %>%
  cast_dtm(document = SAF, term = word, value = n)

saf_lda <- LDA(short_saf, k = 2,
               control = list(seed = 1234))

Extract topics

saf_topics <- tidy(saf_lda, matrix = "beta")
head(saf_topics)

## # A tibble: 6 × 3
##   topic      term         beta
##   <int>     <chr>        <dbl>
## 1     1 abilities 4.014336e-04
## 2     2 abilities 7.568858e-04
## 3     1   ability 2.452691e-03
## 4     2   ability 1.609864e-03
## 5     1     abuse 2.138209e-04
## 6     2     abuse 7.667266e-05

Plot topics

[Figure: top terms by beta in each of the two topics, one panel per topic (1 and 2); x-axis: beta, y-axis: term. Terms shown include family, feel, friends, life, makes, people, person, relationships, respect, sense, care, friendly, humor, knowledge, and loving.]