New Predictive Analysis of Text · 2020. 4. 20.
-
Predictive Analysis of Text: Concepts, Features, and Instances
Sandeep [email protected]
-
Bias vs. Variance
-
Bias vs. Variance
Fig source: http://scott.fortmann-roe.com/docs/BiasVariance.html
-
Overfitting
How would you fit a curve/line through these points?
-
Overfitting
Fig source: https://algotrading101.com/blog/1543426/what-is-curve-fitting-overfitting-in-trading-optimization
-
Curse of Dimensionality
Fig source: https://medium.freecodecamp.org/the-curse-of-dimensionality-how-we-can-save-big-data-from-itself-d9fa0f872335
-
Optimization
Fig source: http://vrand.com/education.html
-
8
• Objective: developing and evaluating computer programs that automatically detect a particular concept in natural language text
Predictive Analysis of Text
-
9
Predictive Analysis: basic ingredients
1. Training data: a set of positive and negative examples of the concept we want to automatically recognize
2. Representation: a set of features that we believe are useful in recognizing the desired concept
3. Learning algorithm: a computer program that uses the training data to learn a predictive model of the concept
-
10
Predictive Analysis: basic ingredients
4. Model: a function that describes a predictive relationship between feature values and the presence of the concept
5. Test data: a set of previously unseen examples used to estimate the model’s effectiveness
6. Performance metrics: a set of statistics used to measure the predictive effectiveness of the model
-
11
Predictive Analysis: training and testing
[Diagram: training: labeled examples are fed to a machine learning algorithm, which produces a model; testing: the model is applied to new, unlabeled examples to produce predictions]
-
12
color   size   #sides   equalsides   ...   label
red     big    3        no           ...   yes
green   big    3        yes          ...   yes
blue    small  inf      yes          ...   no
blue    small  4        yes          ...   no
...
red     big    3        yes          ...   yes

Predictive Analysis: concept, instances, and features
(rows are instances, the columns are features, and the label column is the concept)
-
13
Predictive Analysis: training and testing
[Diagram: training: a table of labeled examples (color, size, sides, equalsides, ..., label = yes/no) is fed to a machine learning algorithm, which produces a model; testing: the model is applied to a table of new, unlabeled examples (label = ???) to produce predictions]
-
14
• Is a particular concept appropriate for predictive analysis?
• What should the unit of analysis be?
• How should I divide the data into training and test sets?
• What is a good feature representation for a task?
• What type of learning algorithm should I use?
• How should I evaluate my model’s performance?
Predictive Analysis: questions
-
15
• Learning algorithms can recognize some concepts better than others
• What are some properties of concepts that are easier to recognize?
Predictive Analysis: concepts
-
16
• Option 1: can a human recognize the concept?
Predictive Analysis: concepts
-
17
• Option 1: can a human recognize the concept?
• Option 2: can two or more humans recognize the concept independently and do they agree?
Predictive Analysis: concepts
-
18
• Option 1: can a human recognize the concept?
• Option 2: can two or more humans recognize the concept independently and do they agree?
• Option 2 is better.
• In fact, models are sometimes evaluated as if they were an independent assessor
• How does the model’s performance compare to the performance of one assessor with respect to another?
‣ One assessor produces the “ground truth” and the other produces the “predictions”
Predictive Analysis: concepts
-
19
Predictive Analysis: measures of agreement (percent agreement)
• Percent agreement: percentage of instances for which both assessors agree that the concept occurs or does not occur

         yes    no
  yes     A      B
  no      C      D

% agreement = (? + ?) / (? + ? + ? + ?)
-
20
Predictive Analysis: measures of agreement (percent agreement)
• Percent agreement: percentage of instances for which both assessors agree that the concept occurs or does not occur

         yes    no
  yes     A      B
  no      C      D

% agreement = (A + D) / (A + B + C + D)
-
21
Predictive Analysis: measures of agreement (percent agreement)
• Percent agreement: percentage of instances for which both assessors agree that the concept occurs or does not occur

         yes    no
  yes      5     5    | 10
  no      15    75    | 90
          20    80

% agreement = ???
-
22
Predictive Analysis: measures of agreement (percent agreement)
• Percent agreement: percentage of instances for which both assessors agree that the concept occurs or does not occur

         yes    no
  yes      5     5    | 10
  no      15    75    | 90
          20    80

% agreement = (5 + 75) / 100 = 80%
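The percent-agreement computation above can be sketched in a few lines of Python; the 2x2 table layout follows the slide, and the function name is my own.

```python
# Percent agreement from a 2x2 contingency table of two assessors'
# yes/no judgments. table[i][j] counts instances where assessor 1
# gave label i and assessor 2 gave label j (index 0 = yes, 1 = no).

def percent_agreement(table):
    """(A + D) / (A + B + C + D): the fraction of instances on which
    both assessors gave the same label."""
    agree = table[0][0] + table[1][1]          # A + D
    total = sum(sum(row) for row in table)     # A + B + C + D
    return agree / total

# The table from the slide: A=5, B=5, C=15, D=75.
table = [[5, 5],
         [15, 75]]
print(percent_agreement(table))  # 0.8, i.e., 80%
```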
-
23
• Problem: percent agreement does not account for agreement due to random chance.
• How can we compute the expected agreement due to random chance?
• Option 1: assume unbiased assessors
• Option 2: assume biased assessors
Predictive Analysis: measures of agreement (percent agreement)
-
24
• Option 1: unbiased assessors
Predictive Analysis: kappa agreement (chance-corrected % agreement)

         yes    no
  yes     ??    ??    | 50
  no      ??    ??    | 50
          50    50
-
25
• Option 1: unbiased assessors
Predictive Analysis: kappa agreement (chance-corrected % agreement)

         yes    no
  yes     25    25    | 50
  no      25    25    | 50
          50    50
-
26
• Option 1: unbiased assessors
Predictive Analysis: kappa agreement (chance-corrected % agreement)

         yes    no
  yes     25    25    | 50
  no      25    25    | 50
          50    50

random chance % agreement = ???
-
27
• Option 1: unbiased assessors
Predictive Analysis: kappa agreement (chance-corrected % agreement)

         yes    no
  yes     25    25    | 50
  no      25    25    | 50
          50    50

random chance % agreement = (25 + 25) / 100 = 50%
-
28
Predictive Analysis: kappa agreement (chance-corrected % agreement)
• Kappa agreement: percent agreement after correcting for the expected agreement due to random chance

  kappa = (P(a) - P(e)) / (1 - P(e))

• P(a) = percent of observed agreement
• P(e) = percent of agreement due to random chance
-
29
Predictive Analysis: kappa agreement (chance-corrected % agreement)
• Kappa agreement: percent agreement after correcting for the expected agreement due to unbiased chance

observed:
         yes    no
  yes      5     5    | 10
  no      15    75    | 90
          20    80

unbiased chance:
         yes    no
  yes     25    25    | 50
  no      25    25    | 50
          50    50
-
30
• Option 2: biased assessors
Predictive Analysis: kappa agreement (chance-corrected % agreement)

         yes    no
  yes      5     5    | 10
  no      15    75    | 90
          20    80

biased chance % agreement = ???
-
Predictive Analysis: probability calculation
• When throwing a die twice, what is the probability that the first is an even number and the second is a multiple of 3?
1st: Even numbers: 2, 4, 6
2nd: Multiples of 3: 3, 6
31 (Slide borrowed from Heejun Kim)
-
Predictive Analysis: probability calculation
• When throwing a die twice, what is the probability that the first is an even number and the second is a multiple of 3?
1st: Even numbers: 2, 4, 6 → 3/6 = 1/2
2nd: Multiples of 3: 3, 6 → 2/6 = 1/3
1/2 × 1/3 = 1/6
32 (Slide borrowed from Heejun Kim)
-
33
         yes    no
  yes      5     5    | 10
  no      15    75    | 90
          20    80

Predictive Analysis: kappa agreement (chance-corrected % agreement)
• Kappa agreement: percent agreement after correcting for the expected agreement due to biased chance
-
34
         yes    no
  yes      5     5    | 10
  no      15    75    | 90
          20    80

Predictive Analysis: kappa agreement (chance-corrected % agreement)
• Kappa agreement: percent agreement after correcting for the expected agreement due to biased chance
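The biased-chance value left as an exercise above can be sketched in Python: each assessor's "yes" rate comes from their observed marginal, and the product rule from the dice example gives the expected chance agreement. The helper names are my own.

```python
# Expected agreement for biased assessors: each assessor says "yes"
# at the rate of their own observed marginal, so chance agreement is
# P(both yes) + P(both no), by the product rule from the dice example.

def chance_agreement(table):
    total = sum(sum(row) for row in table)
    p1_yes = sum(table[0]) / total                  # assessor 1: 10/100
    p2_yes = (table[0][0] + table[1][0]) / total    # assessor 2: 20/100
    return p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)

def kappa(table):
    """kappa = (P(a) - P(e)) / (1 - P(e))."""
    p_a = (table[0][0] + table[1][1]) / sum(sum(r) for r in table)
    p_e = chance_agreement(table)
    return (p_a - p_e) / (1 - p_e)

table = [[5, 5],    # rows: assessor 1 (yes, no)
         [15, 75]]  # columns: assessor 2 (yes, no)
print(chance_agreement(table))  # 0.1*0.2 + 0.9*0.8 = 0.74
print(kappa(table))             # (0.80 - 0.74) / (1 - 0.74) ≈ 0.23
```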
-
35
• INPUT: unlabeled data, annotators, coding manual
• OUTPUT: labeled data
1. using the latest coding manual, have all annotators label some previously unseen portion of the data (~10%)
2. measure inter-annotator agreement (Kappa)
3. IF agreement < X, THEN:
‣ refine coding manual using disagreements to resolve inconsistencies and clarify definitions
‣ return to 1
• ELSE
‣ have annotators label the remainder of the data independently and EXIT
Predictive Analysis: data annotation process
-
36
• What is good (Kappa) agreement?
• It depends on who you ask
• According to Landis and Koch, 1977:
‣ 0.81 - 1.00: almost perfect
‣ 0.61 - 0.80: substantial
‣ 0.41 - 0.60: moderate
‣ 0.21 - 0.40: fair
‣ 0.00 - 0.20: slight
‣ < 0.00: no agreement
Predictive Analysis: data annotation process
-
Predictive Analysis: kappa agreement (chance-corrected % agreement)
Case P(a) P(e) Kappa
1 0.5 0.1
2 0.5 0.2
3 0.5 0.3
4 0.5 0.4
5 0.5 0.5
• Kappa agreement simulation
37 (Slide borrowed from Heejun Kim)
-
Predictive Analysis: kappa agreement (chance-corrected % agreement)
Case P(a) P(e) Kappa
1 0.5 0.1 0.44
2 0.5 0.2 0.375
3 0.5 0.3 0.29
4 0.5 0.4 0.17
5 0.5 0.5 0
• Kappa agreement simulation
38 (Slide borrowed from Heejun Kim)
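The simulation table above can be reproduced directly from the kappa formula; a minimal sketch:

```python
def kappa(p_a, p_e):
    """Chance-corrected agreement: kappa = (P(a) - P(e)) / (1 - P(e))."""
    return (p_a - p_e) / (1 - p_e)

# Reproduce the five cases with P(a) fixed at 0.5:
for p_e in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(round(kappa(0.5, p_e), 3))
# 0.444, 0.375, 0.286, 0.167, 0.0 -- matching the table
# (0.44, 0.375, 0.29, 0.17, 0 after the table's rounding)
```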
-
39
• Question: requests information about the course content
• Answer: contributes information in response to a question
• Issue: expresses a problem with the course management
• Issue Resolution: attempts to resolve a previously raised issue
• Positive Ack: positive sentiment about a previous post
• Negative Ack: negative sentiment about a previous post
• Other: serves a different purpose
Predictive Analysis: data annotation process
-
40
Predictive Analysis: data annotation process
-
41
• Is a particular concept appropriate for predictive analysis?
• What should the unit of analysis be?
• What is a good feature representation for this task?
• How should I divide the data into training and test sets?
• What type of learning algorithm should I use?
• How should I evaluate my model’s performance?
Predictive Analysis: questions
-
42
• For many text-mining applications, turning the data into instances for training and testing is fairly straightforward
• Easy case: instances are self-contained, independent units of analysis
‣ topic categorization: instances = documents
‣ opinion mining: instances = product reviews
‣ bias detection: instances = political blog posts
‣ emotion detection: instances = support group posts
Predictive Analysis: turning data into (training and test) instances
-
43
w_1   w_2   w_3   ...   w_n   label
 1     1     0    ...    0    health
 0     0     0    ...    0    other
 0     0     0    ...    0    other
 0     1     0    ...    1    other
...
 1     0     0    ...    1    health

Topic Categorization: predicting health-related documents
(rows are instances, the w columns are features, and the label column is the concept)
-
44
w_1   w_2   w_3   ...   w_n   label
 1     1     0    ...    0    positive
 0     0     0    ...    0    negative
 0     0     0    ...    0    negative
 0     1     0    ...    1    negative
...
 1     0     0    ...    1    positive

Opinion Mining: predicting positive/negative movie reviews
(rows are instances, the w columns are features, and the label column is the concept)
-
45
w_1   w_2   w_3   ...   w_n   label
 1     1     0    ...    0    liberal
 0     0     0    ...    0    conservative
 0     0     0    ...    0    conservative
 0     1     0    ...    1    conservative
...
 1     0     0    ...    1    liberal

Bias Detection: predicting liberal/conservative blog posts
(rows are instances, the w columns are features, and the label column is the concept)
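Turning self-contained documents into instances like the w_1..w_n tables above can be sketched with binary word-presence features; the tiny corpus and labels here are made up for illustration.

```python
# Turn documents into instances with binary word-presence features.
# The corpus and labels below are invented for illustration only.

docs = [
    ("the flu vaccine reduces risk", "health"),
    ("the match ended in a draw", "other"),
    ("new diet linked to heart health", "health"),
]

# Vocabulary = all words seen in the corpus, in sorted order.
vocab = sorted({w for text, _ in docs for w in text.split()})

def to_instance(text):
    """Binary feature vector: 1 if the vocabulary word occurs."""
    words = set(text.split())
    return [1 if w in words else 0 for w in vocab]

instances = [(to_instance(text), label) for text, label in docs]
```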
-
46
• A not-so-easy case: relational data
• The concept to be learned is a relation between pairs of objects
Predictive Analysis: turning data into (training and test) instances
-
47
Predictive Analysis: example of relational data: Brother(X,Y)
(example borrowed and modified from Witten et al. textbook)
-
48
name_1   gender_1   mother_1   father_1   name_2   gender_2   mother_2   father_2   brother
steven   male       peggy      peter      graham   male       peggy      peter      yes
ian      male       grace      ray        brian    male       grace      ray        yes
anna     female     pam        ian        nikki    female     pam        ian        no
pippa    female     grace      ray        brian    male       grace      ray        no
steven   male       peggy      peter      brian    male       grace      ray        no
...
anna     female     pam        ian        brian    male       grace      ray        no

Predictive Analysis: example of relational data: Brother(X,Y)
(rows are instances, the columns are features, and the brother column is the concept)
-
49
• A not-so-easy case: relational data
• Each instance should correspond to an object pair (which may or may not share the relation of interest)
• May require features that characterize properties of the pair
Predictive Analysis: turning data into (training and test) instances
-
50
Predictive Analysis: example of relational data: Brother(X,Y)
(can we think of a better feature representation?)

name_1   gender_1   mother_1   father_1   name_2   gender_2   mother_2   father_2   brother
steven   male       peggy      peter      graham   male       peggy      peter      yes
ian      male       grace      ray        brian    male       grace      ray        yes
anna     female     pam        ian        nikki    female     pam        ian        no
pippa    female     grace      ray        brian    male       grace      ray        no
steven   male       peggy      peter      brian    male       grace      ray        no
...
anna     female     pam        ian        brian    male       grace      ray        no
-
51
gender_1   gender_2   sameparents   brother
male       male       yes           yes
male       male       yes           yes
female     female     yes           no
female     male       yes           no
male       male       no            no
...
female     male       no            no

Predictive Analysis: example of relational data: Brother(X,Y)
(rows are instances, the columns are features, and the brother column is the concept)
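Deriving pair-level features like sameparents from the per-person attributes can be sketched as follows; the dict-based representation and function name are my own.

```python
# Derive pair-level features for the relational concept Brother(X, Y):
# each instance is a pair of people, and the useful features describe
# the pair (do they share parents?) rather than either person alone.

def pair_features(p1, p2):
    """p1, p2 are dicts with gender, mother, and father fields."""
    return {
        "gender_1": p1["gender"],
        "gender_2": p2["gender"],
        "sameparents": (p1["mother"] == p2["mother"]
                        and p1["father"] == p2["father"]),
    }

steven = {"gender": "male", "mother": "peggy", "father": "peter"}
graham = {"gender": "male", "mother": "peggy", "father": "peter"}
brian  = {"gender": "male", "mother": "grace", "father": "ray"}

print(pair_features(steven, graham))  # sameparents: True  -> brother = yes
print(pair_features(steven, brian))   # sameparents: False -> brother = no
```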
-
52
• A not-so-easy case: relational data
• There is still an issue that we’re not capturing! Any ideas?
• Hint: In this case, should the predicted labels really be independent?
Predictive Analysis: turning data into (training and test) instances
-
53
Brother(A,B)=yes
Brother(B,C)=yes
Brother(A,C)=no
Predictive Analysis: turning data into (training and test) instances
-
54
• In this case, what we would really want is:
‣ a method that does joint prediction on the test set
‣ a method whose joint predictions satisfy a set of known properties about the data as a whole (e.g., transitivity)
Predictive Analysis: turning data into (training and test) instances
-
55
• There are learning algorithms that incorporate relational constraints between predictions
• However, they are beyond the scope of this class
• We’ll be covering algorithms that make independent predictions on instances
• That said, many algorithms output prediction confidence values
• Heuristics can be used to disfavor inconsistencies
Predictive Analysis: turning data into (training and test) instances
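One such heuristic can be sketched as a post-hoc consistency check: flag triples of predictions that violate transitivity (as in the Brother(A,B)/Brother(B,C)/Brother(A,C) example), so the least-confident prediction could then be flipped. The function name and data layout are my own.

```python
from itertools import permutations

# Flag triples where the model predicts Brother(a, b) = yes and
# Brother(b, c) = yes but Brother(a, c) = no, violating transitivity.

def inconsistent_triples(pred):
    """pred maps (x, y) pairs to 'yes'/'no' predictions."""
    people = sorted({p for pair in pred for p in pair})
    bad = []
    for a, b, c in permutations(people, 3):
        if (pred.get((a, b)) == "yes" and pred.get((b, c)) == "yes"
                and pred.get((a, c)) == "no"):
            bad.append((a, b, c))
    return bad

pred = {("A", "B"): "yes", ("B", "C"): "yes", ("A", "C"): "no"}
print(inconsistent_triples(pred))  # [('A', 'B', 'C')]
```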
-
56
• Examples of relational data in text-mining:
‣ information extraction: predicting that a word-sequence belongs to a particular class (e.g., person, location)
‣ topic segmentation: segmenting discourse into topically coherent chunks
Predictive Analysis: turning data into (training and test) instances
-
57
• Examples of relational data in text-mining:
‣ information extraction: predicting that a word-sequence belongs to a particular class (e.g., person, location)
‣ topic segmentation: segmenting discourse into topically coherent chunks
Predictive Analysis: turning data into (training and test) instances
e.g., President Barack Obama gives farewell speech in Chicago – as it happened
-
58
• Is a particular concept appropriate for predictive analysis?
• What should the unit of analysis be?
• How should I divide the data into training and test sets?
• What is a good feature representation for this task?
• What type of learning algorithm should I use?
• How should I evaluate my model’s performance?
Predictive Analysis: questions
-
59
• We want our model to “learn” to recognize a concept
• So, what does it mean to learn?
Predictive Analysis: training and test data
-
60
• The machine learning definition of learning:
A machine learns with respect to a particular task T, performance metric P, and experience E, if the system improves its performance P at task T following experience E. -- Tom Mitchell
Predictive Analysis: training and test data
-
61
• We want our model to improve its generalization performance!
• That is, its performance on previously unseen data!
• Generalize: to derive or induce a general conception or principle from particulars. -- Merriam-Webster
• In order to test generalization performance, the training and test data cannot be the same.
• Why?
Predictive Analysis: training and test data
-
62
Training data + Representation: what could possibly go wrong?
-
63
• While we don’t want to test on training data, models usually perform the best when the training and test set are derived from the same “probability distribution”.
• What does that mean?
Predictive Analysis: training and test data
-
64
Predictive Analysis: training and test data
[Diagram: a dataset of positive and negative instances is to be split into training data and test data; how?]
-
65
Predictive Analysis: training and test data
[Diagram: one possible split of the positive and negative instances into training data and test data]
• Is this a good partitioning? Why or why not?
-
66
Predictive Analysis: training and test data
[Diagram: random samples of the positive and negative instances form the training data and the test data]
-
67
Predictive Analysis: training and test data
[Diagram: positive and negative instances randomly sampled into training data and test data]
• On average, random sampling should produce comparable data for training and testing
-
68
Predictive Analysis: training and test data
• If you want to predict stock price by analyzing tweets, how should the training and test data be separated?
[Timeline: t_0, t_1, t_2, t_3, t_4]
(Slide borrowed from Heejun Kim)
-
69
Predictive Analysis: training and test data
• If you want to predict stock price by analyzing tweets, how should the training and test data be separated?
[Timeline: t_0, t_1, t_2, t_3, t_4, with the training data drawn from earlier time points and the test data from later ones]
(Slide borrowed from Heejun Kim)
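A chronological split like the one above can be sketched as follows; the tweets, labels, and function name are invented for illustration.

```python
# For time-dependent tasks like predicting stock moves from tweets,
# split chronologically: train on the past, test on the future.
# A random split would leak future information into training.

tweets = [  # (time, text, label) -- invented examples
    ("t0", "earnings beat expectations", "up"),
    ("t1", "ceo resigns unexpectedly", "down"),
    ("t2", "new product announced", "up"),
    ("t3", "guidance cut for q3", "down"),
    ("t4", "buyback program expanded", "up"),
]

def temporal_split(examples, train_fraction=0.8):
    """Assumes examples are already sorted by time."""
    cut = int(len(examples) * train_fraction)
    return examples[:cut], examples[cut:]

train, test = temporal_split(tweets)
print([t[0] for t in train])  # ['t0', 't1', 't2', 't3']
print([t[0] for t in test])   # ['t4']
```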
-
70
• Models usually perform the best when the training and test set have:
‣ a similar proportion of positive and negative examples
‣ a similar co-occurrence of feature-values and each target class value
Predictive Analysis: training and test data
-
71
Predictive Analysis: training and test data
• Caution: in some situations, partitioning the data randomly might inflate performance in an unrealistic way!
• How the data is split into training and test sets determines what we can claim about generalization performance
• The appropriate split between training and test sets is usually determined on a case-by-case basis
-
72
• Spam detection: should the training and test sets contain email messages from the same sender, same recipient, and/or same timeframe?
• Topic segmentation: should the training and test sets contain potential boundaries from the same discourse?
• Opinion mining for movie reviews: should the training and test sets contain reviews for the same movie?
• Sentiment analysis: should the training and test sets contain blog posts from the same discussion thread?
Predictive Analysis: discussion
-
73
• Is a particular concept appropriate for predictive analysis?
• What should the unit of analysis be?
• How should I divide the data into training and test sets?
• What type of learning algorithm should I use?
• What is a good feature representation for this task?
• How should I evaluate my model’s performance?
Predictive Analysis: questions
-
74
• Linear classifiers
• Decision tree classifiers
• Instance-based classifiers
Predictive Analysis: three types of classifiers
-
75
• All types of classifiers learn to make predictions based on the input feature values
• However, different types of classifiers combine the input feature values in different ways
• Chapter 3 in the book refers to a trained model as a knowledge representation
Predictive Analysis: three types of classifiers
-
76
Predictive Analysis: linear classifiers (perceptron algorithm)
-
77
Predictive Analysis: linear classifiers (perceptron algorithm)
[Diagram: perceptron; the parameters learned by the model combine the feature values into a predicted value (e.g., 1 = positive, 0 = negative)]
-
78
Predictive Analysis: linear classifiers (perceptron algorithm)

test instance:   f_1 = 0.5,  f_2 = 1,  f_3 = 0.2
model weights:   w_0 = 2,  w_1 = -5,  w_2 = 2,  w_3 = 1

What is the output? (predicted value, e.g., 1 = positive, 0 = negative)
-
79
Predictive Analysis: linear classifiers (perceptron algorithm)

test instance:   f_1 = 0.5,  f_2 = 1,  f_3 = 0.2
model weights:   w_0 = 2,  w_1 = -5,  w_2 = 2,  w_3 = 1

output = 2.0 + (0.5 × -5.0) + (1.0 × 2.0) + (0.2 × 1.0)
output = 1.7
output > 0 → prediction = positive
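The computation above can be sketched as a small function; thresholding at zero is the standard perceptron decision rule, and the function name is my own.

```python
# The perceptron computation from the slide: a weighted sum of the
# feature values plus a bias term w_0, thresholded at zero.

def perceptron_output(weights, features):
    """weights = [w_0, w_1, ..., w_n]; features = [f_1, ..., f_n]."""
    total = weights[0]  # bias term w_0
    for w, f in zip(weights[1:], features):
        total += w * f
    return total

weights = [2.0, -5.0, 2.0, 1.0]   # w_0, w_1, w_2, w_3
features = [0.5, 1.0, 0.2]        # f_1, f_2, f_3

out = perceptron_output(weights, features)
label = "positive" if out > 0 else "negative"
print(out, label)  # 1.7 positive
```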
-
80
Predictive Analysis: linear classifiers (perceptron algorithm)
(two-feature example borrowed from Witten et al. textbook)
-
81
Predictive Analysis: linear classifiers (perceptron algorithm)
(source:http://en.wikipedia.org/wiki/File:Svm_separating_hyperplanes.png)
-
82
Predictive Analysis: linear classifiers (logistic regression)
(source: https://en.wikipedia.org/wiki/Logistic_regression#/media/File:Logistic-curve.svg)
(Slide borrowed from Heejun Kim)
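Logistic regression passes a weighted sum through the logistic (sigmoid) curve shown on the slide, turning the score into a value between 0 and 1; a minimal sketch:

```python
import math

# The logistic (sigmoid) function maps any real-valued score to (0, 1),
# so it can be read as the probability of the positive class.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))   # 0.5: a score of 0 means maximal uncertainty
print(sigmoid(1.7))   # ~0.85: the perceptron-style score 1.7 as a probability
print(sigmoid(-5.0))  # ~0.007: confidently negative
```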
-
83
Predictive Analysis: linear classifiers (perceptron algorithm)
• Would a linear classifier do well on positive (black) and negative (white) data that looks like this?
[Scatter plot: black and white points on x1/x2 axes from 0 to 1.0, arranged in opposite quadrants so that no single line separates them]
-
84
• Linear classifiers
• Decision tree classifiers
• Instance-based classifiers
Predictive Analysis: three types of classifiers
-
85
Predictive Analysis: example of decision tree classifier: Brother(X,Y)

sameparents = no → no
sameparents = yes:
  gender_1 = female → no
  gender_1 = male:
    gender_2 = female → no
    gender_2 = male → yes
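The tree on this slide can be written as nested conditionals; a minimal sketch, with argument names taken from the feature table:

```python
# The Brother(X, Y) decision tree as nested conditionals: check
# sameparents first, then the genders of both members of the pair.

def brother(gender_1, gender_2, sameparents):
    if not sameparents:
        return "no"
    if gender_1 != "male":
        return "no"
    if gender_2 != "male":
        return "no"
    return "yes"

# Rows from the earlier Brother(X, Y) table:
print(brother("male", "male", True))     # steven/graham -> yes
print(brother("female", "male", True))   # pippa/brian   -> no
print(brother("male", "male", False))    # steven/brian  -> no
```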
-
86
Predictive Analysis: decision tree classifiers
• Draw a decision tree that would perform perfectly on this training data!
[Scatter plot: the same quadrant-patterned black and white points on x1/x2 axes from 0 to 1.0]
-
87
Predictive Analysis: example of decision tree classifier
[Decision tree: the root splits on X1 > 0.5 and each branch then splits on X2 > 0.5; the four leaves predict black or white so that the quadrants of the x1/x2 plot are classified perfectly]
(Slide borrowed from Heejun Kim)
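A tree of that shape can be sketched as nested conditionals. Exactly which quadrants hold the black points is not recoverable here, so the leaf labels below are an assumed assignment (black in the off-diagonal quadrants):

```python
# Decision tree for the quadrant-patterned data: split on x1 > 0.5,
# then on x2 > 0.5 in each branch. The black/white leaf assignment
# below is an assumption for illustration (black off the diagonal).

def classify(x1, x2):
    if x1 > 0.5:
        return "white" if x2 > 0.5 else "black"
    else:
        return "black" if x2 > 0.5 else "white"

print(classify(0.9, 0.1))  # black
print(classify(0.9, 0.9))  # white
print(classify(0.1, 0.9))  # black
print(classify(0.1, 0.1))  # white
```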
-
88
• Linear classifiers
• Decision tree classifiers
• Instance-based classifiers
Predictive Analysis: three types of classifiers
-
89
Predictive Analysis: instance-based classifiers
• predict the class associated with the most similar training examples
[Scatter plot: black and white training points on x1/x2 axes from 0 to 1.0, with a new unlabeled point marked "?"]
-
90
Predictive Analysis: instance-based classifiers
• predict the class associated with the most similar training examples
[Scatter plot: the same data; the nearest training points to the "?" point determine its predicted class]
-
91
• Assumption: instances with similar feature values should have a similar label
• Given a test instance, predict the label associated with its nearest neighbors
• There are many different similarity metrics for computing distance between training/test instances
• There are many ways of combining labels from multiple training instances
Predictive Analysis: instance-based classifiers
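A minimal instance-based classifier can be sketched as follows; it uses the single nearest neighbor under Euclidean distance (one of the many possible similarity metrics), and the toy training points are invented to loosely match the quadrant plots.

```python
import math

# Predict the label of the single closest training instance under
# Euclidean distance (1-nearest-neighbor).

def nearest_neighbor(train, query):
    """train: list of ((x1, x2), label) pairs; query: (x1, x2)."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return min(train, key=lambda ex: dist(ex[0], query))[1]

# Invented training data, loosely matching the quadrant plots above.
train = [((0.2, 0.2), "white"), ((0.2, 0.8), "black"),
         ((0.8, 0.2), "black"), ((0.8, 0.8), "white")]

print(nearest_neighbor(train, (0.75, 0.25)))  # black
print(nearest_neighbor(train, (0.1, 0.1)))    # white
```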
-
92
• Is a particular concept appropriate for predictive analysis?
• What should the unit of analysis be?
• How should I divide the data into training and test sets?
• What is a good feature representation for this task?
• What type of learning algorithm should I use?
• How should I evaluate my model’s performance?
Predictive Analysis: questions
-
Topics posted
• Fake News detection
• Gender bias on Twitter
• News to predict stock price
• Categorize posts on reddit forums
• Topic categorization for emails
• Chapel Hill restaurant recommender
• Detect cyber-bullying on social media
• Tweet sentiment
• Personality trait recognition