Explaining search metrics marked


Description

A new way to explain and understand search metrics. Hopefully. Money-back guarantee though, so there is that! Recall, precision, elusion, fallout, negative predictive value, prevalence, specificity, false negative rate, accuracy, error, false alarm rate.

Transcript of Explaining search metrics marked

Page 1

Explaining search metrics using fish.

Categories of search results are hard to explain. They are not hard to understand, just hard to explain. Anyone who has used the internet to search for anything realizes that of the millions of search results, some meet the needs of the searcher, while others do not. Furthermore, of the billions of items which were searched, some were delivered in the results, some were not. Taken even further, of the items which were not delivered, some meet the needs of the searcher, while others do not.

Results that meet the needs of the searcher are Responsive. Results that do not meet the needs are Nonresponsive.

This marks the difficult (yet, not difficult) point: the search results, those things that are presented to the searcher, indicate the way that the search protocol responded to the query. It could be said that those things (the search results) are responsive to the search. In other words, after running a search, we might ask the question:

"How many responsive items are there?"

Do not fall into the trap of providing the number of items in the search results as the answer; some of the results may not meet the needs of the searcher and are thus nonresponsive.

Items are responsive irrespective of how the search handles them. In fact, we measure the effectiveness of the search based on how responsive documents are handled. And right now we are going to use fish to explain this.

"could" be said,

not "should"


Page 2

Total fish: 10

5 Responsive fish ("TR"), 5 Nonresponsive fish ("TN")

TR and TN: True Responsive and True Nonresponsive, respectively. These are the counts before any type of query is run. This "true" number is a bit of a fiction because the responsiveness of a document is subjective. These numbers represent the result of a 100% perfect search.

The bowl will contain the documents that the search calls responsive.

Now we run a search. The fish which the search deems responsive are put into the bowl. Keep in mind that the search will not be 100% accurate, so some nonresponsive fish might end up in the bowl and some responsive fish might get left behind.

[Diagram: a fish bowl labeled "Search Results." We will call these "search responsive."]


Page 3

This is what perfection would look like. [Diagram: all five responsive fish in the bowl (Search Responsive); all five nonresponsive fish in the water (Search Nonresponsive).]

Perfection rarely happens. These results are more realistic and will make sample calculations more useful: 4 Search Responsive ("SR") and 6 Search Nonresponsive ("SN").


Page 4

True Responsive (TR): 5. True Nonresponsive (TN): 5. These numbers did not change.

Search Responsive (SR): 4. Search Nonresponsive (SN): 6.

Drop these numbers around the Contingency Table, aka Confusion Matrix* (*dichotomous binary classification – yes or no; no ranking):

            SR   SN
    TR       3    2  |  5
    TN       1    4  |  5
            -------------
             4    6

And take a look at the interior numbers: 3, 2, 1, and 4 each correspond to a category of fish. The SR column is fish in the bowl; the SN column is fish in the water.

Notice about this table: the ROWS represent TRUE values; the COLUMNS are SEARCH values.


Page 5

Overlay these categories onto the table:

            SR       SN
    TR      3 (CP)   2 (FN)  |  5
    TN      1 (FP)   4 (CN)  |  5
            -----------------
            4        6

The post-search fish categories have names:

    correctly marked responsive       Correct Positive (CP)   black fish in the bowl
    incorrectly marked responsive     False Positive (FP)     white fish in the bowl
    incorrectly marked nonresponsive  False Negative (FN)     black fish in the water
    correctly marked nonresponsive    Correct Negative (CN)   white fish in the water


Page 6

Understand what is happening here:

Three fish were correctly identified as responsive (CP).

One fish was incorrectly identified as responsive (FP).

Four fish were correctly identified as nonresponsive (CN).

Two fish were incorrectly identified as nonresponsive (FN).

Overlay some codes to facilitate formulas. (Don't be scared.) The interior cells are A, B, D, and E:

            SR      SN
    TR      3 (A)   2 (B)   |  5 (C)
    TN      1 (D)   4 (E)   |  5 (F)
            ------------------------
            4 (G)   6 (H)   | 10 (I)
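If it helps to see these codes in action, here is a minimal Python sketch of the table. The variable names follow the letter codes above; everything else is just illustration.

    # Interior cells of the fish contingency table
    A = 3   # Correct Positives: black fish in the bowl
    B = 2   # False Negatives: black fish in the water
    D = 1   # False Positives: white fish in the bowl
    E = 4   # Correct Negatives: white fish in the water

    # Perimeter codes are sums of the interior cells
    C = A + B   # True Responsive (5)
    F = D + E   # True Nonresponsive (5)
    G = A + D   # Search Responsive (4)
    H = B + E   # Search Nonresponsive (6)
    I = C + F   # All documents (10)

    assert I == G + H == 10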


Page 7

Realize that TR = C, TN = F, SR = G, and SN = H.

Each of the nine values has a letter code, A-I:

            SR      SN
    TR      3 (A)   2 (B)   |  5 (C)
    TN      1 (D)   4 (E)   |  5 (F)
            ------------------------
            4 (G)   6 (H)   | 10 (I)

Page 8

Apply the interior codes (A, B, D, and E) to the fish chart. Pretty simple.

Now add the perimeter codes.

[Diagram: the bowl holds A (three black fish) and D (one white fish); the water holds B (two black fish) and E (four white fish). G marks the bowl total, H the water total, C the black fish total, and F the white fish total.]


Page 9

For the sake of exhausting repetition, here is the table and the diagram. Get used to them.

Contingency Table filled out:

            SR      SN
    TR      3 (A)   2 (B)   |  5 (C)
    TN      1 (D)   4 (E)   |  5 (F)
            ------------------------
            4 (G)   6 (H)   | 10 (I)

[Diagram: the fish chart with the same interior and perimeter codes applied.]

Page 10

Now we will put together our first metric. The most basic, and arguably the most important for legal information seekers, is recall.

RECALL is the percentage of responsive documents that the search found.

Our document set has five responsive documents. Our search found three of them.

RECALL = A/C

RECALL = 3/5 = 0.6

Recall is the number of responsive documents in the search results divided by the total number of responsive documents in the complete document set.

Recall is also known as: True Positive Rate, Sensitivity, Hit Rate.
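For the code-inclined, a quick Python check using the counts from the table:

    A, C = 3, 5                       # responsive docs found, all responsive docs
    recall = A / C
    print(f"Recall = {recall:.0%}")   # Recall = 60%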


Page 11

Another basic and accessible formula is Precision. Precision is important for retrieval tasks such as internet searching.

PRECISION is the percentage of retrieved documents that are responsive.

Our search retrieved four documents. Three of them are responsive.

PRECISION = A/G

PRECISION = 3/4 = 0.75

Precision is the number of responsive documents in the search results divided by the total number of documents in the search results.

AKA Positive Predictive Value.
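The same quick check in Python:

    A, G = 3, 4                             # responsive docs retrieved, all docs retrieved
    precision = A / G
    print(f"Precision = {precision:.0%}")   # Precision = 75%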


Page 13

ELUSION is the percentage of non-retrieved documents which are responsive and should have been retrieved. Instead of counting the responsive documents that we found, we count the ones that we left behind: the search nonresponsives that are actually responsive (the false negatives).

Our document set has six non-retrieved documents. Two of them are responsive and should have been retrieved.

ELUSION = B/H

ELUSION = 2/6 ≈ 0.33

"Elusion allows us to assess whether our entire process has succeeded to the required level." Baron, J. R. & Thompson, P., Proceedings of the 11th International Conference on Artificial Intelligence and Law, 2007.

"Proportion of predicted negatives that are incorrect." H. L. Roitblat, Measurement in eDiscovery, OrcaTec LLC, 2013.
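In Python, with the counts from the table:

    B, H = 2, 6                         # responsive docs missed, all non-retrieved docs
    elusion = B / H
    print(f"Elusion = {elusion:.1%}")   # Elusion = 33.3%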


Page 14

FALLOUT measures how quickly PRECISION drops as RECALL increases.

Time out. What?

If I want to increase my recall, I need to get more black fish in the bowl. So I adjust my search. I get maximum recall by leaving no black fish behind. So maybe I set my search so that EVERY fish is caught. 100% RECALL. But what about PRECISION?

Remember that PRECISION is the percentage of retrieved documents that are responsive. All ten fish are now in the bowl, and only five of them are responsive, so in this example PRECISION dropped to 50%.

Time in.
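A quick Python sketch of that trade-off, using our ten fish:

    # Catch every fish: recall hits 100%, precision falls to prevalence
    A, C, G = 5, 5, 10   # all five responsive found, but ten docs retrieved
    print(f"Recall = {A/C:.0%}, Precision = {A/G:.0%}")   # Recall = 100%, Precision = 50%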

Page 15

FALLOUT is the percentage of all nonresponsive documents which were incorrectly retrieved.

Our document set has five nonresponsive documents. Our search incorrectly found one of them: a false positive, the white fish in the bowl.

FALLOUT = D/F

FALLOUT = 1/5 = 0.2

Be careful with fallout, because you can easily get a fallout of zero by marking zero documents responsive.
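In Python:

    D, F = 1, 5                         # nonresponsive docs retrieved, all nonresponsive docs
    fallout = D / F
    print(f"Fallout = {fallout:.0%}")   # Fallout = 20%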


Page 16

Before we look at another formula, let's look back at precision and recall.

1. Start with the fully loaded contingency table:

            SR      SN
    TR      3 (A)   2 (B)   |  5 (C)
    TN      1 (D)   4 (E)   |  5 (F)
            ------------------------
            4 (G)   6 (H)   | 10 (I)

2. Remember the recall and precision formulas:

RECALL = A/C

PRECISION = A/G

3. Notice the rows and columns that the formulas draw from: recall draws from the TR row (A and C); precision draws from the SR column (A and G).

4. Do the same thing for Elusion and Fallout (for kicks). If that was obvious to you and you did not need the diagram, this one is for you. Hint: B, E, H and D, E, F.


Page 17

NEGATIVE PREDICTIVE VALUE reflects the percentage of non-retrieved documents that are in fact not responsive.

Our search yielded six non-retrieved documents. Of these, four were not responsive.

NPV = E/H (white fish in the water divided by all fish in the water)

NPV = 4/6 ≈ 0.67

Note that NPV logically complements precision (PRECISION = A/G).

NPV is also 100% minus ELUSION.
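In Python, including the elusion cross-check:

    E, H = 4, 6                                 # correctly non-retrieved, all non-retrieved docs
    npv = E / H
    elusion = 2 / 6
    print(f"NPV = {npv:.1%}")                   # NPV = 66.7%
    print(f"1 - Elusion = {1 - elusion:.1%}")   # same number: NPV = 100% - Elusion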


Page 18

PREVALENCE is the percentage of all documents which are true responsive.

Our document set has ten documents. Five are true responsive.

Prevalence = C/I (black fish divided by all fish)

Prevalence = 5/10 = 0.5

AKA yield. AKA richness.

NOTICE: this metric does not care about search results.
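In Python:

    C, I = 5, 10   # true responsive docs, all docs
    prevalence = C / I
    print(f"Prevalence = {prevalence:.0%}")   # Prevalence = 50%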


Page 19

SPECIFICITY is the percentage of true nonresponsive documents that are correctly identified as nonresponsive.

Our document set has five nonresponsive documents. Four were correctly identified.

Specificity = E/F (white fish in the water divided by all white fish)

Specificity = 4/5 = 0.8

Compare this to fallout: same denominator, switch the numerator. ("Denominator": the bottom number. "Numerator": the top number. Credit: Black's Math Dictionary.)

AKA Correct Rejection Rate. AKA True Negative Rate. AKA Inverse Recall.
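In Python:

    E, F = 4, 5   # correctly identified nonresponsive, all nonresponsive docs
    specificity = E / F
    print(f"Specificity = {specificity:.0%}")   # Specificity = 80%
    # Fallout shares the denominator: D/F = 1/5 = 20%, and 20% + 80% = 100%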


Page 20

FALSE NEGATIVE RATE is the percentage of True Responsive documents that are missed.

FNR = B/C

FNR = 2/5 = 0.4

FNR = 100% ‒ Recall. Note that FNR plus Recall (the True Positive Rate) equals 100%.

AKA Miss Rate.
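In Python:

    B, C = 2, 5   # responsive docs missed, all responsive docs
    fnr = B / C
    recall = 3 / 5
    print(f"FNR = {fnr:.0%}")                     # FNR = 40%
    print(f"Recall + FNR = {recall + fnr:.0%}")   # sums to 100%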

Page 21

ACCURACY is the percentage of documents that are correctly coded.

Accuracy = (A+E)/I

Accuracy = (3+4)/10 = 70%

Accuracy is 100% minus Error.

In highly prevalent or rich data sets (or sets with extremely low prevalence or richness), Accuracy is a poor measure. Consider a set with 95 percent nonresponsive documents: 95 percent accuracy can be achieved by marking everything nonresponsive.
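In Python, including the degenerate case from the warning above:

    A, E, I = 3, 4, 10   # correct positives, correct negatives, all docs
    print(f"Accuracy = {(A + E) / I:.0%}")   # Accuracy = 70%

    # The warning in action: 95% nonresponsive, mark everything nonresponsive
    A, E, I = 0, 95, 100
    print(f"Accuracy = {(A + E) / I:.0%}")   # Accuracy = 95%, yet recall is 0%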


Page 22

ERROR is the percentage of documents that are incorrectly coded.

Error = (B+D)/I

Error = (2+1)/10 = 30%

Error can also be calculated as 100% minus Accuracy.

The warning regarding extremes of prevalence or richness applies to Error as well: the utility of Error as a search metric goes down as richness gets extremely high or low.
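In Python:

    B, D, I = 2, 1, 10   # false negatives, false positives, all docs
    error = (B + D) / I
    print(f"Error = {error:.0%}")   # Error = 30%, i.e., 100% - Accuracy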


Page 23

FALSE ALARM RATE is the percentage of Search Responsive documents that are truly nonresponsive.

False Alarm Rate = D/G

False Alarm Rate = 1/4 = 0.25

AKA False Positive Rate.

This metric does not care about the null set.
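And one last Python check:

    D, G = 1, 4   # nonresponsive docs retrieved, all docs retrieved
    far = D / G
    print(f"False Alarm Rate = {far:.0%}")   # False Alarm Rate = 25%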