Lexbe eDiscovery Webinar - Best Practices: NearDup

Best Practices: NearDup

Gene Albert, Principal, Lexbe LC

Using Near Duplicate ID to Detect Key Docs, Protect Privilege & Speed Reviews

July 17, 2014

eDiscovery Webinar Series

○ Takes Place Monthly

○ Covers a Variety of Relevant eDiscovery Topics

○ Presentations Available for Download by Registrants

Best Practices: ‘NearDup’ Identification | eDiscovery Webinar Series | July 17, 2014

About Lexbe

Lexbe is an Austin, TX-based eDiscovery software and services provider.

○ Lexbe eDiscovery Platform: Lexbe eDiscovery Platform is a hosted eDiscovery processing and review tool. Users can load a variety of file types, process them for review, OCR them for search, conduct document reviews and productions, prepare for depositions and analyze transcripts, conduct case analytics, prepare dispositive motions, and provide litigation support during trial.

○ Lexbe eDiscovery Services: Lexbe performs large-volume document culling, processing from native to PDF or TIFF, load file creation, high-volume OCR of image files, Rule 26 and project management consulting, and related eDiscovery services.

Lexbe Sales: [email protected]

(800) 401-7809 x22

Questions & Technical Issues

If you have any questions or technical issues, please e-mail them to:

[email protected]

Questions will be forwarded to Gene and answered during the webinar, or via e-mail if we run out of time.



Gene Albert Bio

○ Principal of Lexbe LC, a provider of cloud-based litigation review and document management software and eDiscovery services.

○ Prior business experience in software, medical services, and internet-based businesses. Prior legal experience as in-house counsel and in private practice.

○ Frequent speaker and author on eDiscovery and legal technology issues.

○ Education: MBA, University of Texas (2005); JD, Southern Methodist University (1983); BA, University of Texas (1979)

○ Contact Gene: [email protected]

Agenda: Near Duplicate Detection

○ What is Near Duplicate Identification?

○ When is ‘NearDup’ Needed?

○ Inadvertent Privilege Release Example

○ Using ‘NearDup’ to:
■ Group Similar Documents
■ Find More Key Documents
■ Enable Email Threading
■ Prevent the Inadvertent Release of Privileged Information

○ NearDup Groupings+ Service Options from Lexbe

What Is It?

○ NearDup technology automatically recognizes similar documents within an eDiscovery document collection.

○ The algorithm analyzes, evaluates, and compares the actual text content of the documents to one another.
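The slides do not say which similarity algorithm NearDup uses. As a minimal sketch, assuming a common textbook technique (word shingling plus Jaccard similarity) rather than Lexbe's actual method, comparing the text content of two documents might look like this. The function names and the shingle size `k=3` are illustrative choices, not part of any Lexbe API.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Split text into lowercase words and return the set of overlapping k-word windows."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts: treat the whole word sequence as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both sets are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def text_similarity(doc1: str, doc2: str, k: int = 3) -> float:
    """Score from 0.0 (no shared shingles) to 1.0 (identical shingle sets)."""
    return jaccard(shingles(doc1, k), shingles(doc2, k))


original = "The parties agree to settle the dispute for the sum of ten thousand dollars."
edited = "The parties agree to settle the dispute for the sum of twelve thousand dollars."
print(round(text_similarity(original, edited), 2))  # 0.6
```

A one-word edit changes only the three shingles that contain it, so the two versions still share 9 of their 15 distinct shingles, while an unrelated document scores 0.0.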



[Diagram: unstructured documents sorted into NearDup groupings]

What Does It Do?



NearDup technology groups similar documents even when they are not exactly the same. Examples include:

○ Separately scanned copies of the same document.

○ Multiple versions of a Word document that differ slightly due to minor edits, reformatting, etc.

○ An original document and a copy with handwritten notes on it.

○ Emails and responses that continue a conversational ‘chain’ or ‘thread’.
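Grouping can be sketched by linking every pair of documents whose similarity clears a threshold and collecting the connected documents into one group. This is a minimal illustration under stated assumptions: the shingle/Jaccard scoring and the 0.5 threshold are hypothetical stand-ins, since the slides do not give NearDup's scoring or cut-off. Comparing every pair is quadratic; large-scale systems commonly use MinHash with locality-sensitive hashing to avoid that.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word windows from the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def near_dup_groups(docs: dict[str, str], threshold: float = 0.5, k: int = 3) -> list[list[str]]:
    """Group document IDs whose pairwise similarity meets the threshold.

    Uses union-find, so a chain of similar versions (A~B, B~C) ends up in one group.
    """
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    sigs = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    for a, b in combinations(docs, 2):  # every pair: acceptable for a sketch only
        if jaccard(sigs[a], sigs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for doc_id in docs:
        groups.setdefault(find(doc_id), []).append(doc_id)
    # Largest groups first, then alphabetically by first member.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g[0]))


docs = {
    "v1": "The parties agree to settle the dispute for the sum of ten thousand dollars.",
    "v2": "The parties agree to settle the dispute for the sum of twelve thousand dollars.",
    "v3": "The parties agree to settle the dispute for the sum of ten thousand dollars today.",
    "memo": "Quarterly sales figures attached for review.",
}
print(near_dup_groups(docs))  # [['v1', 'v2', 'v3'], ['memo']]
```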

Data Types and Volume Keep Growing

[Chart: Digital Information Created, Captured, Replicated Worldwide, in zettabytes*, 2005 to 2015. Source: IDC Digital Universe Study (2012). *1 zettabyte = 1 trillion gigabytes]

2.8 zettabytes of information were created and replicated during 2012, a 56% increase from 2011 (IDC)

Sources include: VoIP, email, iPhones, peer-to-peer, online storage, digital cameras, Facebook, LinkedIn, Dropbox, backup devices, elastic storage, SaaS, Google Streets, personal blogs, Skype, world satellite images, personal scanners, customer service recordings, public webcams, Google Goggles, netbooks, cloud instance servers, PaaS

Need for Near Duplicate Detection


Main Applications of NearDup

There are four main applications of NearDup analysis:

1) Grouping similar documents:
○ Bunch highly similar documents together for more efficient coding and review

2) Finding hidden ‘key’ or ‘hot’ docs:
○ Retrieve and mark unseen documents whose content is highly related to existing ‘hot’ or ‘key’ documents

3) Preventing the inadvertent release of privileged information:
○ Be alerted automatically to files whose content is similar to documents already coded as privileged

4) Enabling email threading:
○ Maintain the relationships among messages in email conversations

Do I Need Near Duplicate Detection?


Applying Near Duplicate Detection: Large Groupings Accelerate Review

Feature Description: A report identifies NearDup groups in a case based on extracted or OCRed text.

Benefits:
○ Accelerate document review by batch coding larger groups (using multidoc edit)

○ Increase coding consistency of batched documents

○ Reduce privilege errors
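The time savings from batch coding come from making one coding decision per group instead of one per document. A minimal sketch, assuming groups have already been assigned: the document IDs, group numbers, tag names, and both helper functions are hypothetical illustrations of the idea, not Lexbe's multidoc edit feature.

```python
from collections import Counter


def review_plan(group_of: dict[str, int]) -> list[tuple[int, int]]:
    """Return (group_id, size) pairs, largest group first, so batch review covers the most docs earliest."""
    sizes = Counter(group_of.values())
    return sorted(sizes.items(), key=lambda item: (-item[1], item[0]))


def batch_code(group_of: dict[str, int], group_id: int, tag: str, coding: dict[str, str]) -> int:
    """Apply one coding decision to every document in a group; return how many were coded."""
    members = [doc for doc, g in group_of.items() if g == group_id]
    for doc in members:
        coding[doc] = tag
    return len(members)


group_of = {"DOC-001": 1, "DOC-002": 1, "DOC-003": 1, "DOC-004": 2, "DOC-005": 2, "DOC-006": 3}
coding: dict[str, str] = {}
print(review_plan(group_of))  # [(1, 3), (2, 2), (3, 1)]
print(batch_code(group_of, 1, "Responsive", coding))  # 3
```

Applying the same tag to the whole group at once is also what drives the consistency benefit: every near duplicate receives an identical coding.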


Applying Near Duplicate Detection: Find Similar Versions of Key Documents

Example: Similar versions of a key document are shown in the Document Viewer.

Benefits:
○ Follow the trail from one key document to others.

○ Find key documents that would otherwise be missed.
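Following the trail from a key document amounts to scoring every other document against the documents already marked as key and surfacing those above a threshold. A minimal sketch under the same assumptions as any illustration here: the shingle/Jaccard scoring, the 0.5 threshold, and the function `similar_to_key_docs` are hypothetical, not NearDup's actual implementation.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word windows from the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def similar_to_key_docs(
    docs: dict[str, str], key_ids: set[str], threshold: float = 0.5, k: int = 3
) -> list[tuple[str, str, float]]:
    """Return (doc_id, closest_key_id, score) for non-key docs at or above threshold, best first."""
    key_sigs = {kid: shingles(docs[kid], k) for kid in key_ids}
    if not key_sigs:
        return []
    hits = []
    for doc_id, text in docs.items():
        if doc_id in key_ids:
            continue
        sig = shingles(text, k)
        # Compare against every key doc and keep the closest match.
        best_id, best = max(((kid, jaccard(sig, ks)) for kid, ks in key_sigs.items()), key=lambda p: p[1])
        if best >= threshold:
            hits.append((doc_id, best_id, round(best, 3)))
    return sorted(hits, key=lambda h: -h[2])


docs = {
    "v1": "The parties agree to settle the dispute for the sum of ten thousand dollars.",
    "v2": "The parties agree to settle the dispute for the sum of twelve thousand dollars.",
    "v3": "The parties agree to settle the dispute for the sum of ten thousand dollars today.",
    "memo": "Quarterly sales figures attached for review.",
}
print(similar_to_key_docs(docs, {"v1"}))  # [('v3', 'v1', 0.923), ('v2', 'v1', 0.6)]
```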


Prevent Inadvertent Privilege Release

[Workflow diagram: Setup & Planning, Collection, Culling & Analysis, Processing, Review & Production, Depos & Motions]

Beware of Inadvertent Privilege Release

○ Larger cases have put a strain on accurate privilege review.

○ Finding 9 versions of a privileged document doesn’t help if you release version 10.

○ Nothing is more costly than compromising or losing a case because of privilege disclosure.

○ Claw-back agreements are a good idea, but no panacea: “You can’t unring a bell.”

Applying Near Duplicate Analysis


Prevent Inadvertent Privilege Release

Example Case: Thorncreek Apartments III, LLC v. Village of Park Forest (N.D. Ill. 2011)

○ At issue were six documents that Defendants produced to Plaintiffs and over which Defendants claimed attorney-client privilege.

○ The court determined that Defendants were negligent in failing to check the production database created by a third-party eDiscovery vendor before it became available to opposing counsel.

○ The court found waiver, relying in part on the long delay between production and Defendants' attempt to claw back the documents, and on their failure to timely prepare a privilege log.

○ Even had the court allowed clawback, the sensitive information would already have been disseminated.




Minimizing Risk of Privilege Release

○ Understand in detail the privilege review process being undertaken.

○ Build a dictionary of privileged sources and issues early in document review.

○ Check for: untrained or sloppy review; unsearchable documents; incomplete search indices; poor redaction procedures; searches not run across both metadata and full text; privileged text retained in natives, text files, load files, and text-based PDFs.

○ Use specialized automated privilege checks for container (email family) consistency, exact-duplicate identification, and near-duplicate identification.
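A container (email family) consistency check can be sketched as: for every family that contains at least one privileged item, list the other members still slated for production so a reviewer can look at them. This is a hypothetical illustration; the family IDs, document IDs, and `family_conflicts` function are assumptions, and the check only flags items because whether each family member is itself privileged remains a reviewer's call.

```python
from collections import defaultdict


def family_conflicts(
    family_of: dict[str, str], privileged: set[str], to_produce: set[str]
) -> dict[str, list[str]]:
    """For each email family containing a privileged item, list members still slated for production.

    Flags are for reviewer follow-up only; nothing is withheld automatically.
    """
    members: dict[str, list[str]] = defaultdict(list)
    for doc, family in family_of.items():
        members[family].append(doc)

    conflicts = {}
    for family, docs in members.items():
        if any(doc in privileged for doc in docs):
            slated = sorted(doc for doc in docs if doc in to_produce and doc not in privileged)
            if slated:
                conflicts[family] = slated
    return conflicts


family_of = {"EMAIL-1": "F1", "EMAIL-1.ATT1": "F1", "EMAIL-2": "F2", "EMAIL-2.ATT1": "F2"}
privileged = {"EMAIL-1"}
to_produce = {"EMAIL-1.ATT1", "EMAIL-2", "EMAIL-2.ATT1"}
print(family_conflicts(family_of, privileged, to_produce))  # {'F1': ['EMAIL-1.ATT1']}
```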

Applying Near Duplicate Analysis



Example:
○ Privileged documents found 9 out of 10 times, but one missed

Benefit:
○ Find privileged documents with similar text that could otherwise easily be missed
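The "version 10" problem can be sketched as a pre-production check: score every document not coded privileged against the documents that are, and flag close matches. A minimal sketch, assuming shingle/Jaccard scoring and a hypothetical 0.5 threshold; the document IDs and the `unprotected_near_dups` function are illustrative only, not Lexbe's feature.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word windows from the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def unprotected_near_dups(
    docs: dict[str, str], privileged_ids: set[str], threshold: float = 0.5, k: int = 3
) -> list[tuple[str, str, float]]:
    """Flag docs NOT coded privileged that closely match a privileged doc: (doc_id, match_id, score)."""
    priv_sigs = {pid: shingles(docs[pid], k) for pid in privileged_ids}
    if not priv_sigs:
        return []
    flags = []
    for doc_id, text in sorted(docs.items()):
        if doc_id in privileged_ids:
            continue
        sig = shingles(text, k)
        match_id, score = max(((pid, jaccard(sig, ps)) for pid, ps in priv_sigs.items()), key=lambda p: p[1])
        if score >= threshold:
            flags.append((doc_id, match_id, round(score, 3)))
    return flags


base = "Counsel advises that the proposed merger terms expose the company to antitrust liability."
docs = {
    "PRIV-1": base,
    "PRIV-2": base + " Draft.",
    "PRIV-3": "Final: " + base,  # same memo, but never coded privileged
    "LUNCH": "Lunch order for Friday's team meeting.",
}
print(unprotected_near_dups(docs, {"PRIV-1", "PRIV-2"}))  # [('PRIV-3', 'PRIV-1', 0.917)]
```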



Applying Near Duplicate Detection: Catch Privilege Inconsistencies

Feature Description: A report identifies inconsistent privilege and work product codings.

Benefits:
○ Reduce privilege errors

○ Avoid sole reliance on human coding consistency

○ Establish safeguards to help maintain privilege
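An inconsistency report can be sketched by bucketing the documents in each NearDup group by their privilege coding and keeping only the groups with more than one distinct coding. The group numbers, coding labels, and `inconsistent_groups` function are hypothetical; treating a missing coding as "Uncoded" is an assumption made so that unreviewed near duplicates also surface.

```python
from collections import defaultdict


def inconsistent_groups(group_of: dict[str, int], codes: dict[str, str]) -> dict[int, dict[str, list[str]]]:
    """Return only the NearDup groups whose members carry more than one privilege coding."""
    by_group: dict[int, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for doc, group in group_of.items():
        by_group[group][codes.get(doc, "Uncoded")].append(doc)
    return {
        group: {code: sorted(docs) for code, docs in codings.items()}
        for group, codings in by_group.items()
        if len(codings) > 1
    }


group_of = {"A1": 1, "A2": 1, "A3": 1, "B1": 2, "B2": 2}
codes = {"A1": "Privileged", "A2": "Privileged", "A3": "Not Privileged", "B1": "Work Product", "B2": "Work Product"}
print(inconsistent_groups(group_of, codes))
# {1: {'Privileged': ['A1', 'A2'], 'Not Privileged': ['A3']}}
```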


Applying Near Duplicate Detection: Email Threading

Feature Description: Groups email messages with similar text that represent a conversation thread.

Benefits:
○ View email chains with similar text in date and time order

○ Avoid confusion with emails that are only tangentially related (less than 50% text overlap)

○ Consistently code email chains for responsiveness, privilege, attorneys' eyes only, etc.
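Text-based threading can be sketched by linking messages whose text overlap reaches 50% and then listing each thread in sent order. The slides do not define "text overlap"; this sketch assumes containment (the share of the shorter message's shingles found in the other), because a reply usually quotes the message it answers. The message IDs, dates, and `thread_emails` function are hypothetical. Many threading tools also use message headers such as In-Reply-To; this sketch uses text only.

```python
import re
from collections import defaultdict
from datetime import datetime
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word windows from the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(a: set, b: set) -> float:
    """Fraction of the shorter message's shingles also found in the other message."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def thread_emails(emails: list[tuple[str, datetime, str]], min_overlap: float = 0.5, k: int = 3) -> list[list[str]]:
    """Group (msg_id, sent, body) tuples into threads; each thread is listed in sent order."""
    parent = list(range(len(emails)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sigs = [shingles(body, k) for _, _, body in emails]
    for i, j in combinations(range(len(emails)), 2):
        if overlap(sigs[i], sigs[j]) >= min_overlap:
            parent[find(i)] = find(j)

    threads: dict[int, list[tuple[str, datetime, str]]] = defaultdict(list)
    for i, email in enumerate(emails):
        threads[find(i)].append(email)
    return [[msg_id for msg_id, _, _ in sorted(t, key=lambda e: e[1])] for t in threads.values()]


emails = [
    ("m2", datetime(2014, 7, 1, 10, 0), "Thursday works for me. Can we move the budget review meeting to Thursday afternoon?"),
    ("m3", datetime(2014, 7, 2, 8, 0), "Please send the signed lease agreement to the landlord by Friday."),
    ("m1", datetime(2014, 7, 1, 9, 0), "Can we move the budget review meeting to Thursday afternoon?"),
]
print(thread_emails(emails))  # [['m1', 'm2'], ['m3']]
```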


Applying Near Duplicate Analysis: Included with Lexbe eDiscovery Platform

○ Near Duplicate Identification is included at no additional cost in Lexbe eDiscovery Platform.

○ You can automatically apply ‘NearDup’ to documents you self-upload into the platform to group similar documents and review privilege coding for consistency.

Applying ‘NearDup’ in the Cloud: Lexbe eDiscovery Platform

Cloud-based litigation document management software

FEATURES
● Self-administration
● Native (Office, etc.) processing
● Automatic OCR
● Early case analysis
● Dual-index search
● Exact & near-dup ID
● Doc review & issue tagging
● Blended productions
● Transcript management
● Timelining, depo prep
● Dispositive motions
● Trial document management


Applying Near Duplicate Analysis: Included in Processing Services

We apply NearDup Groupings+ to the following processing services at no additional charge:

○ Native Processing+ (TIFF): Convert Outlook, Microsoft Office, and other native file types for review in in-house TIFF-based systems

○ Native Processing+ (PDF): Convert Outlook, Microsoft Office, and other native file types into searchable PDFs for review

○ Native Extraction+: Prepare case data for native or near-native review

Security & Data Ownership

What to look for in litigation cloud service offerings:

○ Encryption: Data encrypted (256-bit or above) at rest and in transit.

○ Data Center Certifications: Data centers should be certified, follow industry best practices, etc.

○ Clear Ownership Rights: Service agreements should clearly acknowledge client data ownership.

○ Redundant Back-Ups; Recovery: The service provider should have robust, redundant backup and recovery protocols.



Summary: Use ‘NearDup’ to Improve Doc Reviews

○ Faster Review: Group incoming documents by similarity for faster, more efficient coding.

○ Find Hot Docs: Find hidden ‘hot’ documents whose content is similar to files you’ve already marked as particularly important to a case.

○ Prevent Privilege Release: Before producing to opposing counsel, identify documents containing privileged information that haven’t been consistently tagged.

○ Better Email Review: Easily and coherently review email conversation threads drawn from different custodian sources.

Thank You

Contact Info

Gene Albert, Lexbe Principal: [email protected], (512) 686-3382

Stu Van Dusen, Marketing Manager: [email protected], (512) 843-7672

Lexbe Sales: [email protected], (800) 401-7809 x22

Webinar Questions: [email protected]









