Scientists See Promise in Deep-Learning Programs
Microsoft Seeks an Edge in Analyzing Big Data
Jeff Hawkins Develops a Brainy Big Data Company
Google Offers Big-Data Analytics
The Age of Big Data
How Big Data Became So Big
Why Hire a Lawyer? Computers Are Cheaper
Armies of Expensive Lawyers, Replaced by Cheaper Software
The total amount of digital data in the world is estimated to exceed 1.8 zettabytes (1.8 trillion gigabytes)
The digital universe is doubling every 2 years
85% of that data is owned or controlled by corporations at some point in its lifecycle
Source: International Data Corporation (IDC) Study, 2012
Big Data is Here
And it’s coming soon to a litigation near you…
What’s changed?
The Great Commingling
Redefining scalability in eDiscovery.
1
1000
1 × 10^12
Predictive Coding is a Form of Machine Learning
What is Machine Learning?
voice recognition software, e.g., calling your bank or credit card company
handwriting, facial or fingerprint recognition
analyzing market trends and guiding investment decisions
making decisions on applications for credit or loans
modeling and predicting severe weather patterns
filtering spam in your email inbox
targeted marketing on the internet
robotics
It’s already a part of our lives. . .
KEY POINT: Predictive coding is just a part of a continuum of technology assisted review (TAR) methods that we are already very familiar with in searching and analyzing data.
Key Words
Concept Clustering
Concept Search
Predictive Coding
Three supporting propositions:
1. Each successive approach incorporates the preceding approaches.
2. Each successive approach contains more supporting criteria.
3. All are ultimately based on the concept of pattern matching.
Key Words = Simple pattern matching
External input: “wild,” “wolf,” “pet”
[Diagram terms: dog, cat, rhino, ferret, goldfish, cow, wolf, domestic, wild, pet]
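A minimal sketch of key word search as exact pattern matching, assuming the terms in the diagram stand in for a document collection. The name `keyword_hits` is hypothetical and only for illustration.

```python
# Hypothetical illustration: key word search as exact pattern matching.
TERMS = ["dog", "cat", "rhino", "ferret", "goldfish",
         "cow", "wolf", "domestic", "wild", "pet"]

def keyword_hits(terms, keywords):
    """Return the terms that exactly match one of the key words."""
    wanted = {k.lower() for k in keywords}
    return [t for t in terms if t.lower() in wanted]

hits = keyword_hits(TERMS, ["wild", "wolf", "pet"])
print(hits)
```

Only literal matches come back; a term such as "tiger" is invisible to the search unless someone thinks to type it.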
Concept Clustering = Organization based on internal relationships
[Diagram: dog, cat, rhino, ferret, goldfish, cow, wolf, tiger, organized around the labels domesticated, wild, and pet]
01110111011010010110110001100100 (wild)
011001000110111101100111 (dog)
011100000110010101110100 (pet)
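One rough way to picture "organization based on internal relationships" is to link terms that appear in the same document and read off the resulting groups. The sketch below assumes a made-up five-document collection and uses plain connected components; commercial clustering tools rely on more elaborate statistical methods, and nothing here is drawn from a specific product.

```python
from itertools import combinations

# Hypothetical mini-collection; each document is a set of terms.
DOCS = [
    {"dog", "cat", "pet", "domesticated"},
    {"ferret", "goldfish", "pet"},
    {"cow", "domesticated"},
    {"wolf", "tiger", "wild"},
    {"rhino", "wild"},
]

def cluster_terms(docs):
    """Group terms into clusters: terms sharing a document are linked,
    and each cluster is a connected component of that link graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for doc in docs:
        terms = sorted(doc)
        find(terms[0])  # register single-term documents too
        for a, b in combinations(terms, 2):
            parent[find(a)] = find(b)

    clusters = {}
    for term in parent:
        clusters.setdefault(find(term), set()).add(term)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])

clusters = cluster_terms(DOCS)
for c in clusters:
    print(c)
```

No external key words are supplied; the two groups (domesticated animals and wild animals) emerge from the documents themselves.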
Concept Searching = Key words + Concept organization
External input: “zoo,” “wild,” “domesticated”
[Diagram: dog, cat, rhino, ferret, goldfish, cow, wolf, tiger, organized around the labels domesticated, wild, and pet, with farm and zoo added]
011110100110111101101111 (zoo)
01110111011010010110110001100100 (wild)
011001000110111101101101011001010111001101110100011010010110001101100001011101000110010101100100 (domesticated)
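A minimal sketch of concept searching as key words plus concept organization: each key word is expanded to the other terms grouped under the same concept before matching. The concept groups, the placement of "zoo" and "farm," and the function name `concept_search` are all assumptions for illustration.

```python
# Hypothetical concept groups, of the kind a clustering step might produce.
CONCEPTS = {
    "domesticated": {"dog", "cat", "ferret", "goldfish", "cow", "pet", "farm"},
    "wild": {"wolf", "tiger", "rhino", "zoo"},
}

def concept_search(documents, keywords, concepts):
    """Return indexes of documents containing a key word or any term
    grouped under the same concept as a key word."""
    expanded = set()
    for kw in keywords:
        expanded.add(kw)
        for label, members in concepts.items():
            if kw == label or kw in members:
                expanded |= members | {label}
    return [i for i, doc in enumerate(documents)
            if expanded & set(doc.lower().split())]

DOCUMENTS = [
    "the tiger paced in its enclosure",
    "our goldfish lives in a bowl",
    "quarterly sales report",
]
wild_hits = concept_search(DOCUMENTS, ["zoo", "wild"], CONCEPTS)
print(wild_hits)
```

The first document is returned even though it contains neither "zoo" nor "wild," because "tiger" sits in the same concept group.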
Predictive Coding = document-level input + probabilistic modeling
External input: human-coded documents
Output: doc-level probability rankings
[Diagram: dog, cat, rhino, ferret, goldfish, cow, wolf, tiger, organized around the labels domesticated, wild, and pet, with farm and zoo added]
011110100110111101101111 (zoo)
01110111011010010110110001100100 (wild)
011001000110111101101101011001010111001101110100011010010110001101100001011101000110010101100100 (domesticated)
Step 1: Sample documents from the entire set.
Step 2: Attorney review of sample documents to create training and control sets.
In the European mind, wolves long stood as a symbol of baneful, uncontrollable nature. As far back as the time of Aesop in 500 BCE (Before the Common Era), wolves in literature are portrayed as wicked villains and long-fanged, terrible beasts. Before the Middle Ages, wolves were nearly always the greedy thief, criminal trickster, or cruel remorseless murderer. The wolf does not fare well in the European imagination.
Can the wolf be domesticated?
The domesticated dog is descended from the wolf found in the wild.
While some people have occasionally attempted to raise wolves as pets, their 2½-inch fangs and tendency to eat nearby small animals such as cats can create socially awkward situations with neighbors.
Responsive
Not Responsive
Step 3: Create model from human-coded training set (responsive and not responsive).
[Diagram labels: wolves, wolf, pet, costner, dances, raise, werewolf]

Word     Pos.   Neg.
wolf     .98    .08
dog      .56    .43
pet      .42    .28
raise    .61    .09

Word pair      Assoc %
wolf, pet      .73
dog, wolf      .43
pet, raise     .88
raise, wolf    .61

011001000110111101100111 (dog)
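A minimal sketch of how a per-word table like the Word / Pos. / Neg. table might be derived from attorney-coded documents. This is an assumption about the general idea, not the method of any particular predictive coding product; the training texts, their labels, and the names `tokenize` and `train` are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase a document and split it into a set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def train(coded_docs):
    """Build a per-word table from human-coded documents.

    coded_docs: list of (text, is_responsive) pairs.
    Returns {word: (pos_rate, neg_rate)}, the share of responsive and
    of non-responsive documents that contain the word.
    """
    pos = [tokenize(t) for t, label in coded_docs if label]
    neg = [tokenize(t) for t, label in coded_docs if not label]
    vocab = set().union(*pos, *neg)
    table = {}
    for word in vocab:
        pos_rate = sum(word in d for d in pos) / len(pos) if pos else 0.0
        neg_rate = sum(word in d for d in neg) / len(neg) if neg else 0.0
        table[word] = (pos_rate, neg_rate)
    return table

# Hypothetical training set with made-up labels, loosely based on the samples.
TRAINING = [
    ("Can the wolf be domesticated?", True),
    ("Some people raise wolves as a pet.", True),
    ("Wolves in literature are wicked villains.", False),
]
model = train(TRAINING)
print(model["wolf"], model["wolves"])
```

A word that is common in responsive documents and rare in non-responsive ones (here "wolf") carries more signal than a word that appears in both (here "wolves").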
Step 4: Test model against sample (human-coded) set.
"Dances With Wolves" has the makings of a great work, one that recalls a variety of literary antecedents, everything from "Robinson Crusoe" and "Walden" to "Tarzan of the Apes." Michael Blake's screenplay touches both on man alone in nature and on the 19th-century white man's assuming his burden among the less privileged.
Wolves are sometimes kept as exotic pets, and in some rarer occasions, as working animals. Although closely related to dogs (which are believed to have split from wolves between 10,000 and 100,000 years ago), wolves do not show the same tractability as dogs in living alongside humans. Wolves also need much more space than dogs, about 10-15 sq. miles.
Yes
No
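Testing against the human-coded set amounts to comparing the model's calls with the attorney's calls. The sketch below assumes the comparison is summarized as precision and recall; the five-document control set and the name `evaluate` are hypothetical.

```python
def evaluate(predicted, actual):
    """Compare model calls with attorney calls on a coded control set.

    predicted, actual: equal-length lists of booleans (True = responsive).
    Returns (precision, recall).
    """
    tp = sum(p and a for p, a in zip(predicted, actual))      # both say responsive
    fp = sum(p and not a for p, a in zip(predicted, actual))  # model only
    fn = sum(a and not p for p, a in zip(predicted, actual))  # attorney only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical control set of five documents.
model_calls    = [True, True, False, True, False]
attorney_calls = [True, False, False, True, True]
precision, recall = evaluate(model_calls, attorney_calls)
print(precision, recall)
```

If the figures are too low, the usual response is to code more training documents and rebuild the model before applying it to the unreviewed remainder.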
Apply model to remainder of documents that have not been reviewed
Responsive
Non-responsive
Step 5: Apply model to entire set and rank documents.
[Scale: 100% at the top down to 0%, in steps of 10%]
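A minimal sketch of ranking, assuming each document is scored by summing the log-odds of the words it contains and squashing the total to a 0-1 value. The word weights are the Pos./Neg. figures from the Step 3 table; the scoring rule, the sample documents, and the names `score` and `rank` are assumptions, not a description of any vendor's algorithm.

```python
import math
import re

# Word weights as (pos, neg), taken from the Word / Pos. / Neg. table in Step 3.
WEIGHTS = {"wolf": (0.98, 0.08), "dog": (0.56, 0.43),
           "pet": (0.42, 0.28), "raise": (0.61, 0.09)}

def score(text, weights):
    """Return a 0-1 responsiveness score from summed log-odds of known words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    log_odds = sum(math.log(weights[w][0] / weights[w][1])
                   for w in words if w in weights)
    return 1.0 / (1.0 + math.exp(-log_odds))

def rank(documents, weights):
    """Sort documents from most to least likely responsive."""
    scored = [(score(d, weights), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical unreviewed documents.
DOCS = ["Can you raise a wolf as a pet?",
        "The dog sat by the fire.",
        "Quarterly sales report."]
ranking = rank(DOCS, WEIGHTS)
for s, d in ranking:
    print(f"{s:.0%}  {d}")
```

A document with no known words lands at 50%, which is why the middle of a ranking is typically where human review adds the most.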
PREDICTIVE CODING AND BIG DATA
NYLJ/Pangea3 Webinar
April 15, 2013
OUTLINE
1. Mitigating Big Data in E-Discovery
2. Stakeholder Analysis
3. The New Reality of Predictive Coding
4. Long-Term Trends
MITIGATING BIG DATA IN E-DISCOVERY
Predictive Coding and Big Data
BIG DATA IN E-DISCOVERY
• Bigger haystack: more documents in general
• Corporate data culture: more relevant documents
• More sources: poses collection/preservation challenges
MITIGATING BIG DATA IN E-DISCOVERY
• Some mitigating factors:
• Principles of proportionality and cooperation
• Information governance tools and document management
• Technology-assisted review and predictive coding
STAKEHOLDER ANALYSIS
Predictive Coding and Big Data
PREDICTIVE CODING STAKEHOLDER ANALYSIS
• Judges: generally receptive
• Clients: cost efficiencies vs. risk management
• Lawyers: new model, building expertise
THE NEW REALITY OF PREDICTIVE CODING
Predictive Coding and Big Data
NEW REALITY OF PREDICTIVE CODING
Reduced Data Volumes
Increased Complexity and Density
Focused, High-Stakes Human Review
Battle of Expertise
Predictive Coding
LONG-TERM TRENDS
Predictive Coding and Big Data
LONG-TERM TRENDS
• Over time, Big Data growth > predictive coding benefits
• Some document-by-document human review necessary
• Strategic nuances in a new discovery battleground
NEW YORK
Pangea3 LLC
530 5th Avenue, 7th FL
New York, NY 10036
Tel. (US Main): +1-212-689-3819
Fax: +1-212-820-9784
MUMBAI
Pangea3 Legal Database Systems Pvt. Ltd.
102-B, Ground Floor, Leela Business Park
Andheri-Kurla Road
Andheri East, Mumbai 400 059, India
U.S. Line: +1-877-311-8528
Tel.: +91-22-6191-7500
Fax: +91-22-6191-7600
DALLAS
Pangea3 LLC
2395 Midway Road
Carrollton, TX 75006
Tel. (US Main): +1-212-689-3819
Fax: +1-212-820-9784
DELHI
Pangea3 Legal Database Systems Pvt. Ltd.
B-23, Sector 58
Noida UP 20 301, India
U.S. Line: +1-877-311-8528
Tel: +91-120-425-5210/14/16
Fax: +212-820-9783
CONTACT PANGEA3
SEARCH (1)
How do we search for discoverable ESI?
• Manually?
• With automated assistance?
• Which is “better” and why?
– M.R. Grossman & G.V. Cormack, “The Grossman-Cormack Glossary of Technology-Assisted Review,” 7 Fed. Cts. Law R. 1 (2013)
– Maura R. Grossman & Gordon V. Cormack, “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review,” XVII Rich. J.L. & Tech. 11 (2011) (available at http://jolt.richmond.edu/v17i3/article11.pdf)
– For a “shorter” discussion, see Efficient E-Discovery, ABA Journal 31 (Apr. 2012)
SEARCH (2)
• Using search terms? How accurate are these? See In re National Ass’n of Music Merchants, Musical Instruments and Equipment Antitrust Litig., 2011 WL 6372826 (S.D. Cal. Dec. 19, 2011)
SEARCH (3)
Automated review or “predictive coding” as an alternative to the use of search terms. For decisions which address automated review, see:
• EORHB, Inc. v. HOA Holdings LLC, C.A. No. 7409 (Del. Ct. Ch. Oct. 15, 2012)
• In re Actos (Pioglitazone) Prod. Liability Litig., MDL No. 6:11-md-2299 (W.D. La. July 27, 2012)
• Da Silva Moore v. Publicis Groupe SA, 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24), aff’d, 11 Civ. 1279 (ALC) (AJP) (S.D.N.Y. Apr. 26, 2012)
• Global Aerospace Inc. v. Landow Aviation, L.P., Consol. Case No. CL 61040 (VA Cir. Ct. Apr. 23, 2012)
SEARCH (4)
WHAT LESSONS CAN BE DRAWN FROM THE DECISIONS?
• Judge approved automated search at a “threshold” level. “Results” may be subject to challenge and later rulings.
• Threshold superiority of automated vs. manual review recognized given volume of ESI and attorney review costs.
• Large volumes of ESI in issue.
• Party seeking to do automated review must offer “transparency of process” or something close to it.
• “Reasonableness” of methodology is key.
• Speculation by the opposing party is insufficient to defeat threshold approval.
SEARCH (5)
LET’S TAKE A DEEP BREATH AND RECAP WHERE WE ARE TODAY, VENDOR HYPE NOTWITHSTANDING:
• We have yet to see a judicial analysis of process and results in a contested matter.
• Safe to assume that the proponent of a process will bear the burden of proof (whatever that burden might be).
• Safe to assume at least some transparency of process may/will be expected.
• If “reasonableness” is standard, how reasonable must the results be? Is “precision” of 80% enough? 90%? Remember, there are no agreed-on standards.
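For reference, a short worked illustration of what a precision figure does and does not say, using the standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN). The document counts are hypothetical.

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
% Hypothetical production: 1{,}000 documents produced, 800 of them responsive,
% drawn from a collection that holds 2{,}000 responsive documents in total.
\[
\text{precision} = \frac{800}{800 + 200} = 0.80,
\qquad
\text{recall} = \frac{800}{800 + 1200} = 0.40
\]
```

In this hypothetical, a production that meets an 80% precision target still leaves 60% of the responsive documents unproduced, so any "reasonableness" standard would presumably need to address recall as well.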
INTERLUDE
Assume a party makes production of ESI based on search terms proposed by an adversary. Assume further that the adversary suspects “something” is missing.
Is suspicion enough to warrant direct access to the party’s databases by a consultant retained by the adversary?
If not, what proofs should be required?
• Will an attorney’s certification or affidavit suffice?
• Will/should the attorney become a witness?
• Will experts be needed?
Note, with regard to proofs, S2 Automation LLC v. Micron Technology, Inc., No. 11-0884 (D.N.M. Aug. 9, 2012), where the court, relying on Rule 26(g)(1), required a party to disclose its search methodology.
INTERLUDE
A collision between search and ethics?
• Assume a party’s attorney knows that search terms proposed by adversary counsel, if applied to the party’s ESI, will not lead to the production of relevant (perhaps highly relevant) ESI.
• Absent a lack of candor to adversary counsel or the court under RPC 3.4 (which implies, if not requires, some affirmative statement), does not RPC 1.6 require the party’s attorney to remain silent?
• What if the “nonproduction” is learned later? If nothing else, will the party’s attorney suffer bad “PR”?
• If the party’s attorney wants to advise the adversary, should the attorney secure her client’s informed consent? What if the client says “no”?
(with thanks to the Hon. John M. Facciola)
INTERLUDE
AS WE THINK ABOUT SEARCH, THINK ABOUT THE ETHICS ISSUES THAT USE OF A NONPARTY VENDOR MAY LEAD TO!