Demystifying Predictive Coding Technology


Transcript of Demystifying Predictive Coding Technology

Page 1: Demystifying Predictive Coding Technology

Demystifying Predictive Coding Technology

Date: Wednesday, August 13, 2014

Time: 1 p.m. ET / Noon CT / 11 a.m. MT / 10 a.m. PT

Anita Engles, VP Products and Marketing Daegis

Doug Stewart, VP Sales Support Daegis

Page 2: Demystifying Predictive Coding Technology

TAR Defined

A process for prioritizing or coding a collection of electronic documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on a smaller set of documents and then extrapolates those judgments to the remaining Document Population.*

* Grossman & Cormack 2012

Page 3: Demystifying Predictive Coding Technology

The TAR Frontlines

• Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery (2014)

• Maura R. Grossman and Gordon V. Cormack

• http://cormack.uwaterloo.ca/cormack/calstudy/

Page 4: Demystifying Predictive Coding Technology

Key Findings

• Non-Random Selection Methods Work Best for Seed Set

• Active Learning Better than Passive Learning

• Senior Level Subject Matter Experts are NOT Required to Train System

Page 5: Demystifying Predictive Coding Technology

TAR Steps

Process Overview

• Identifying the Population

• Relatedness Scoring

• Keyword Searching

• Creating the Seed Set

• Assessing Results

• Training

• Producing

Page 6: Demystifying Predictive Coding Technology

Relatedness Scoring

Building the Map

• Step: Build the Map

• Purpose: Measure Relationships

• Variations: Algorithms

• Why It Matters: Core to Predictive Functionality
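The slide does not name a relatedness algorithm. As a rough illustration only, the sketch below assumes bag-of-words cosine similarity, one common way to measure how related two documents are. The function names (term_vector, cosine_similarity, relatedness_map) are hypothetical and do not come from any particular TAR product.

```python
import math
from collections import Counter

def term_vector(text):
    """Lowercased bag-of-words counts for one document."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def relatedness_map(docs):
    """Pairwise similarity for every pair of document ids (assumed measure)."""
    vectors = {doc_id: term_vector(text) for doc_id, text in docs.items()}
    ids = sorted(vectors)
    return {(i, j): cosine_similarity(vectors[i], vectors[j])
            for n, i in enumerate(ids) for j in ids[n + 1:]}
```

Documents that share many terms score close to 1.0 and unrelated documents score 0.0, which is the kind of "map" a prediction engine could draw on later.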

Page 7: Demystifying Predictive Coding Technology

Keyword Searching

Tried and True

• Step: Validated & Iterative Keyword Searching

• Purpose: Inexpensive Training

• Variations: Not used in All Approaches

• Why It Matters: Drives Efficiency

motorcycle or bike AND ((throttle or accel*) w/10 stick)
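To make the example query concrete, here is a minimal sketch of how it might be evaluated against one document. It rests on several assumptions: the query is read as (motorcycle OR bike) AND ((throttle OR accel*) within 10 words of stick), "w/10" means within 10 tokens in either direction, and a trailing * is a prefix wildcard. Real search tools differ on operator precedence and proximity rules, and every function name here is hypothetical.

```python
import re

def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(tokens, pattern):
    """Indexes of tokens matching a term; a trailing * is a prefix wildcard."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return [i for i, t in enumerate(tokens) if t.startswith(prefix)]
    return [i for i, t in enumerate(tokens) if t == pattern]

def within(tokens, left_terms, right_term, distance):
    """True if any left term occurs within `distance` tokens of right_term."""
    left = [i for term in left_terms for i in positions(tokens, term)]
    right = positions(tokens, right_term)
    return any(abs(i - j) <= distance for i in left for j in right)

def matches_example_query(text):
    """Assumed reading: (motorcycle OR bike) AND ((throttle OR accel*) w/10 stick)."""
    tokens = tokenize(text)
    has_vehicle = any(positions(tokens, t) for t in ("motorcycle", "bike"))
    return has_vehicle and within(tokens, ("throttle", "accel*"), "stick", 10)
```

Note that this sketch matches "stick" exactly, so "sticks" or "sticking" would need a wildcard of their own, which is the kind of gap iterative validation of search terms is meant to catch.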

Page 8: Demystifying Predictive Coding Technology

Seed Set

Building the Seed Set

• Step: Review Strategically Sampled Docs

• Purpose: Generates High-level Relevancy “Heat Map”

• Variations: Random, Strategic, Judgmental Samples

• Why It Matters: Drives Efficiency
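One way to picture the sampling variations is a seed set that mixes a judgmental portion (documents already surfaced by keyword searching) with a random portion drawn from everything else. The sketch below is an illustration under that assumption; the function name, the 50/50 default split, and the fixed random seed are all hypothetical choices rather than a recommended protocol.

```python
import random

def build_seed_set(doc_ids, keyword_hits, size, judgmental_share=0.5, seed=0):
    """Pick `size` documents: part from keyword hits, the rest at random."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    hits = sorted(set(keyword_hits) & set(doc_ids))
    n_judgmental = min(len(hits), int(size * judgmental_share))
    judgmental = rng.sample(hits, n_judgmental)
    # Draw the random portion from documents not already chosen.
    remaining = sorted(set(doc_ids) - set(judgmental))
    n_random = min(len(remaining), size - n_judgmental)
    return judgmental + rng.sample(remaining, n_random)
```

Recording the seed and the share used for each draw is one simple way to keep the selection step documented and repeatable.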

Page 9: Demystifying Predictive Coding Technology

Predicting Responsiveness

The Prediction Engine

Inputs to the Prediction Engine: Relatedness Map, Seed Set / Search, Training

Output: Predictive Calls, ranging from Definitely Responsive to Definitely Not Responsive

The three categories of known information (the relatedness map, the seed set and search results, and the training decisions) are fed into the system’s algorithm, which evaluates the data to score the likelihood that each document is responsive.
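The slides do not say which algorithm the prediction engine uses. Purely as a stand-in, the sketch below uses a small naive Bayes text classifier to show what "scoring the likelihood of responsiveness" can look like. It learns only from reviewer-coded documents and ignores the relatedness map, and the names train and score are hypothetical.

```python
import math
from collections import Counter

def train(labeled_docs):
    """labeled_docs: list of (text, is_responsive). Returns the model counts."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for text, label in labeled_docs:
        counts[label].update(text.lower().split())
        n_docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab

def score(model, text):
    """Estimated probability (0 to 1) that the document is responsive."""
    counts, n_docs, vocab = model
    total_docs = n_docs[True] + n_docs[False]
    log_prob = {}
    for label in (True, False):
        total_terms = sum(counts[label].values())
        # Add-one smoothing on both the class prior and the term counts.
        logp = math.log((n_docs[label] + 1) / (total_docs + 2))
        for term in text.lower().split():
            if term in vocab:  # terms never seen in training are skipped
                logp += math.log((counts[label][term] + 1)
                                 / (total_terms + len(vocab)))
        log_prob[label] = logp
    # Convert the two log scores into a single probability.
    return 1.0 / (1.0 + math.exp(log_prob[False] - log_prob[True]))
```

Scores near 1.0 correspond to the "Definitely" end of the scale, scores near 0.0 to "Definitely Not", and scores near 0.5 to the uncertain middle.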

Page 10: Demystifying Predictive Coding Technology

Assessing the Results

Building the Answer Key

• Step: Assess Accuracy Based on Industry Standard Metrics

• Purpose: Informs Decision to Stop TAR

• Variations: Simple and Stratified Sampling; Sample Once or Multiple Times

• Why It Matters: Defensibility

[Figure: Predictive Calls scale, from Definitely Responsive to Definitely Not Responsive]
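As an illustration of how a sampled "answer key" can feed the stop decision, the sketch below estimates recall from a simple random sample in which each document carries both the reviewer's call and the system's call. It assumes simple (not stratified) sampling and a normal-approximation margin of error; the function names are hypothetical.

```python
import math

def proportion_with_margin(successes, n, z=1.96):
    """Point estimate and normal-approximation margin (z=1.96 is about 95%)."""
    if n == 0:
        raise ValueError("sample is empty")
    p = successes / n
    return p, z * math.sqrt(p * (1 - p) / n)

def estimate_recall(sample):
    """sample: list of (human_says_responsive, system_says_responsive) pairs."""
    # Keep the system's calls only for documents the reviewer marked responsive.
    responsive = [predicted for truth, predicted in sample if truth]
    found = sum(1 for predicted in responsive if predicted)
    return proportion_with_margin(found, len(responsive))
```

The margin shrinks as the number of responsive documents in the sample grows, which is one reason a team might sample more than once.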

Page 11: Demystifying Predictive Coding Technology

Training / Learning

Continual Refinement

[Figure: Predictive Calls scale, from Definitely Responsive to Definitely Not Responsive]

Refining keyword searches and manually reviewing documents with highest levels of uncertainty moves docs from the middle toward the endpoints.

• Step: Reviewers Train and System Learns

• Purpose: Transfer Subject Matter Expertise to TAR System

• Variations: Active Learning; Passive Learning

• Why It Matters: Dramatic Cost Savings
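Reviewing the documents with the highest uncertainty is often called uncertainty sampling, one form of active learning. A minimal sketch, assuming each document already has a predicted probability of responsiveness between 0 and 1 (the function name is hypothetical):

```python
def most_uncertain(scores, batch_size):
    """Return the doc ids whose scores sit closest to 0.5, most uncertain first.

    scores: dict of doc_id -> predicted probability of responsiveness.
    """
    ranked = sorted(scores, key=lambda doc_id: (abs(scores[doc_id] - 0.5), doc_id))
    return ranked[:batch_size]
```

For example, with scores of 0.95, 0.52, 0.10, and 0.45, the two documents scored 0.52 and 0.45 would be sent to reviewers first. A passive approach would instead pick the next batch without consulting the scores, for instance at random.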

Page 12: Demystifying Predictive Coding Technology

Post-TAR

Producing the Responsive Documents

• Terminate TAR Review

  • Decision based on Accuracy and Cost Metrics

  • “Stabilization”

• Harvest Predicted Calls

• Review Responsive Docs

• Sample Non-Responsive Docs

• Document Entire Process

Page 13: Demystifying Predictive Coding Technology

Accuracy Metrics

How Accuracy is Measured

TAR improves the F1 score by moving documents from false (incorrect) bins to the true bins where they belong.
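For reference, F1 is the harmonic mean of precision and recall, both of which come from the counts in the true and false bins. A short sketch of the standard formulas (the function name is illustrative):

```python
def f1_metrics(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    predicted = true_pos + false_pos   # documents the system called responsive
    actual = true_pos + false_neg      # documents that really are responsive
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

With 80 true positives, 20 false positives, and 20 false negatives, precision, recall, and F1 all equal 0.8. Moving 10 documents out of each false bin into the true-positive bin raises all three to 0.9.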

Page 14: Demystifying Predictive Coding Technology

Selected TAR Bibliography

TAR Resources

1. Search, Forward: Will Manual Document Review and Keyword Searches be Replaced by Computer-assisted Coding? (2011)

• Judge Andrew Peck

• http://www.law.com/jsp/lawtechnologynews/PubArticleLTN.jsp?id=1202516530534

2. Technology-Assisted Review in E-Discovery can be More Effective and More Efficient than Exhaustive Manual Review (2011)

• Maura R. Grossman and Gordon V. Cormack

• http://jolt.richmond.edu/v17i3/article11.pdf

3. Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (2012)

• RAND Institute for Civil Justice: Nicholas M. Pace, Laura Zakaras

• http://www.rand.org/pubs/monographs/MG1208.html#abstract

Page 15: Demystifying Predictive Coding Technology


Thank You!

Q&A