Explaining search metrics marked
-
Upload
james-keuning -
Category
Documents
-
view
214 -
download
1
description
Transcript of Explaining search metrics marked
![Page 1: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/1.jpg)
Explaining search metrics using fish.
Categores of search results are hard to explain. They are not hard to understand, just hard to explain. Anyone who has used the internet to search for anything realizes that of the millions of search results, some meet the needs of the searcher, while others do not. Furthermore, of the billions of items which were searched, some were delivered in the results, some were not. Taken even further yet, of the items which were not delivered, some meet the needs of the searcher, while others do not.
Results that meet the needs of the searcher are Responsive. Results that do not meet the needs are Nonresponsive.
This marks the difficult (yet, not difficult) point: the search results, those things that are presented to the searcher, indicate the way that the search protocol responded to the query. It could be said that those things (the search restults) are responsive to the search. In other words, after running a search, we might ask the question:
"How many responsive items are there?"
Do not fall into the trap of providing the number of items in the search results as the answer; some of the results may not meet the needs of the searcher and are thus nonresponsive.
Items are responsive irrespective of how the search handles them. In fact, we measure the effectiveness of the search based on how responsive documents are handled. And right now we are going to use fish to explain this.
"could" be said,
not "should"
draft1© fishandink.com
![Page 2: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/2.jpg)
Total fish: 10
5 Responsive fish5 Nonresponsive fish
("TR")("TN")
TR and TN: True Responsive and True Nonresponsive, respectively. These are counts before any type of query is run. This "true" number is a bit of a fiction because the responsiveness of a document is subjective. These numbers represent the the result of a 100% perfect search.
The bowl will contain the documents that the search calls responsive.
Now we run a search. The fish which the search deems responsive are put into the bowl. Keep in mind that the search will not be 100% accurate so some nonresponsive fish might end up in the bowl and some responsive fish might get left behind.
Search Results
We will call these"search responsive"
Example Contains:
draft2© fishandink.com
![Page 3: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/3.jpg)
This is what perfection would look like.
Search Responsive
Search Nonresponsive
Perfection rarely happens.These results are more realistic and will make sample calculations more useful:
4 ("SR")
6 ("SN")
draft3© fishandink.com
![Page 4: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/4.jpg)
Search Responsive
Search Nonresponsive
True ResponsiveTrue Nonesponsive
TRTN
55
These numbers did not change
Search Responsive 4Search Nonresponsive 6
Drop these numbers around the Contingency Table, aka
Confusion Matrix*
TR
TN
SR SN
5
5
4 6
3 2
41
SRSN
And take a look at the interior numbers: 3, 2, 1, and 4 correspond to a category of fish
Dichotomous binary classification – yes or no.
*No ranking*
fish in the water
fish in the bowl
Notice about this table:The ROWS represent TRUE valuesThe COLUMNS are SEARCH values
draft4© fishandink.com
![Page 5: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/5.jpg)
TR
TN
SR SN
5
5
4 6
3 2
41
Overlay these categories onto the table:
FN
FP CN
CP
The post-search fish categories have names:correctly marked responsive
incorrectly marked responsive
incorrectly marked nonresponsive
correctly marked nonresponsive
Correct Positive (CP)False Positive (FP)False Negative (FN)Correct Negative (CN)
black fis
h in the
bowl
white fish in the b
owl
black fish in the waterwhite fish in the water
draft5© fishandink.com
![Page 6: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/6.jpg)
Understand what is happening here:
Three fish were correctly indentified as responsive CP
One fish was in incorrectly indentified as responsive FP
Four fish were correctly indentified as nonresponsive CN
Two fish incorrectly nonresponsive
FN
Overlay some codes to facilitate formulas
TR
TN
SR SN
5
5
4 6
3 2
41
FN
FP CN
CP A B C
D E F
G H I
(Don't be scared)
10
(A, B, D, and E.)
draft6© fishandink.com
![Page 7: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/7.jpg)
TR
TN
SR SNC
F
G H
Realize that TR=C, TN=F, SR=G, and SN=H
TR
TN
SR SNC
F
G H
A B
D E
I
Each of nine values has a letter code, A-I
TR
TN
SR SN
5
5
4 6
3 2
41
FN
FP CN
CP A B C
D E F
G H I10draft7© fishandink.com
![Page 8: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/8.jpg)
A
D
BE
Apply the interior codes to the fish chart. Pretty simple.
A
D
BE
Now add the perimeter codes.
G
H
C
F
(A, B, D, and E.)
draft8© fishandink.com
![Page 9: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/9.jpg)
TR
TN
SR SN
5
5
4 6
3 2
41
A B C
D E F
G H I10
Contingency Table filled out
A
D
BE
G
H
C
F
For the sake of exhausting repetition, here is the table and the diagram. Get used to them. draft
9© fishandink.com
![Page 10: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/10.jpg)
The most basic and argubly most importantfor legal information seekers is recall.
RECALL is the percentage of responsivedocuments that the search found.
Our document set has five responsive documents. Our search found three of them.
RECALL = A/CA
B
C
RECALL = 3/5Recall is the number of responsive documents in thesearch results divided by the total number of responsivedocuments in the complete document set
.6
Now we will put together our first metric.
Recall is also know as:True Positive Rate SensitivityHit Rate
draft10© fishandink.com
![Page 11: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/11.jpg)
Another basic and accessible formula is Precision.Precision is important for retrieval tasks
PRECISION in the percentage of retrieveddocuments that are responsive.
A
DG
Our search retrieved four documents. Three of them are responsive.
PRECISION = A/G
PRECISION = 3/4Precision is the number of responsive documents inthe search results divided by the total number ofdocuments in the search results.
AKA Positive Predictive Value
such as internet searching.draft11© fishandink.com
![Page 12: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/12.jpg)
Another basic and accessible formula is Precision.Precision is important for retrieval tasks
PRECISION in the percentage of retrieveddocuments that are responsive.
A
DG
Our search retrieved four documents. Three of them are responsive.
PRECISION = A/G
PRECISION = 3/4Precision is the number of responsive documents inthe search results divided by the total number ofdocuments in the search results.
AKA Positive Predictive Value
such as internet searching.draft12© fishandink.com
![Page 13: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/13.jpg)
ELUSION is the percentage of nonretrienved documentswhich are responsive and should have been retrieved.
Our document set has six nonretrieved documents. Two of them are responsive and should have been retrieved.
BE H
ELUSION = B/H
ELUSION = 2/6Elusion allows us to assess whether our entire process has succeeded to the required level.
Baron, J. R & Thompson, P., Proceedings of the11th international conference onArtificial intelligence and law. 2007
False negatives
.33
Proportion of predicted negatives that are incorrect.
H. L. Roitblat, Measurement in eDiscovery2013 OrcaTec LLC
False negatives
Instead of counting the responsive documents that we found, we count the ones that we left behind.
Search nonresponsives that are responsive
draft13© fishandink.com
![Page 14: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/14.jpg)
FALLOUT measures how quickly PRECISION drops asRECALL increases.
If I want to increase my recall, I need to get more blackfish in the bowl. So I adjust my search. I get maximumrecall by leaving no black fish behind. So maybe I setmy search so that EVERY fish is caught.100% RECALL. But what about PRECISION?
Remember that PRECISION in the percentage ofretrieved documents that are responsive. So in thisexample, PRECISION dropped to 50%.
Time in.
Time out. What?draft14© fishandink.com
![Page 15: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/15.jpg)
FALLOUT measures how quickly PRECISION drops asRECALL increases.
D
E
F
FALLOUT is the percentage of all nonresponsivedocuments which were incorrectly retrieved.
Our document set has five nonresponsive documents. Our search incorrectly found one of them.
FALLOUT = D/F
FALLOUT = 1/5
That's a falsepositive
white fish
white fish in the bowl
.2Be careful with fallout because you can easily get a fallout of zero by marking zero documents responsive.
draft15© fishandink.com
![Page 16: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/16.jpg)
Before we look at another formula, let's look back at precision and recall.
TR
TN
SR SN
5
5
4 6
3 2
41
FN
FP CN
CP A B C
D E F
G H I10
Start with the full loaded contingency table1
2 Remember the recall and precision formulas:
RECALL = A/C
PRECISION = A/G
3 Notice the rows and columns that the formulasdraw from:
TR
TN
SR SNC
F
G H
A B
D E
I
4 Do the same thing for Elusion and Fallout.(for kicks)If that was
obvious to you and you did not need the diagram, this is for you:
BEHD
EF
Hin
t
draft16© fishandink.com
![Page 17: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/17.jpg)
NEGATIVE PREDICTIVE VALUE reflects thepercentage of non-retrieved documents that arein fact not responsive.
Our search yielded six non-retrieved documents. Of these, four were not respsonsive.
NPV = E/H
BE H
SN
CN
NPV = 4/6
A
DG
PRECISION = A/G
Note that NPVlogically complementsprecision.
.67
NPV is also 100% - ELUSIONminus
all fish in the water
white fish in the water
draft17© fishandink.com
![Page 18: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/18.jpg)
PREVALENCE is the percentage of all documents which are true responsive.
Our document set has ten documents. Five are true respsonsive.
all fish
black fish
Prevalence = C\I
Prevalence = 5/10
5/10
AKA yield
AKA richness
NOTICEThis metric doesnot care about search results
C
draft18© fishandink.com
![Page 19: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/19.jpg)
SPECIFICITY is the percentage of true nonresponsivedocuments that are corrently identified as nonresponsive
Our document set has five nonresponsive documents. Four were correctly identified. white fish in the water
white fish
Specificity = E/F
Specificity = 4/5
D
E
F
compare this to fallout (same denominator, switch the numerator)
"the bottom number" "the top number"credit: Black's Math Dictionary
AKA Correct Rejection Rate
AKA: True Negative RateAKA: Inverse Recall
draft19© fishandink.com
![Page 20: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/20.jpg)
FALSE NEGATIVE RATE
100% ‒ Recall
FNR=B/CA
B
C2/5
The percentage of True Responsivedocuments that are missed
.4
Note that FNRplus Recalleqauls 100%
AKA Miss Rate
True Positive Ratedraft20© fishandink.com
![Page 21: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/21.jpg)
AccuracyThe percentage of documents that are correctly coded
(A+E)/IA
E
I
(3+4)/1070%
Accuracy is 100% - Errorminus
In highly prevalent or rich data sets (Or sets with extremely low prevalence or richness), Accuracy is a poor measure. Consider a set with 95 percent nonresponsive documents - 95 percent accuracy can be achieved by marking everything nonresponsive.
draft21© fishandink.com
![Page 22: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/22.jpg)
ErrorThe percentage of documents that are incorrectly coded
D
B
I
(B+D)/I (2+1)/1030%
Error can also be calculated: 100% – Accuracyminus
The warning regarding extremes of prevalence or richness applies to Error as well. The utility of Error as a search metric goes down as richness gets extremely high or low.
draft22© fishandink.com
![Page 23: Explaining search metrics marked](https://reader033.fdocuments.us/reader033/viewer/2022042823/568bda7d1a28ab2034ab03b2/html5/thumbnails/23.jpg)
A
DG
Flase Alarm RateThe percentage of Search Responsive documents that aretruly nonresponsive.
D/G
False Positive Rate
This metric does not care about the null set.draft
23© fishandink.com