Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Best Practices: eDiscovery Search Improve Speed and Accuracy of Reviews & Productions with the Latest Tools February 27, 2014 Karsten Weber Principal, Lexbe LC eDiscovery FAST

Transcript of Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Page 1: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Best Practices: eDiscovery Search

Improve Speed and Accuracy of Reviews & Productions with the Latest Tools February 27, 2014

Karsten Weber Principal, Lexbe LC

eDiscovery FAST

Page 2: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

eDiscovery Webinar Series: Info & Future

○ Takes Place Monthly

○ Covers a Variety of Relevant eDiscovery Topics

○ Next Month: Legal Timelines and Early Case Assessment

○ Presentations Available for Download by Registrants

Best Practices for Keyword Search, Lexbe eDiscovery Webinar Series February 27, 2014

Page 3: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

eDiscovery Webinar Series: Questions & Technical Issues

If you have any questions or technical issues, please e-mail them to:

[email protected]

Questions will be forwarded to Karsten and answered during the webinar or via e-mail if we run out of time.

Page 4: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

eDiscovery Webinar Series: Karsten Weber Bio

○ Current
- Principal of Lexbe LC
- Principal Architect of Lexbe eDiscovery Suites and Lexbe eDiscovery Services

○ Prior Experience
- Consulting Expert, Lumin Expert Group
- Director of Software, nLine Corporation
- Software Engineering Manager, KLA-Tencor

○ Education
- MBA, University of Texas
- M.S. Engineering, Danish Technical University

Contact Karsten [email protected]

Page 5: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Best Practices for Keyword Search

Use of Keyword Search in Discovery

○ Early Stage Culling - Reduce the amount of ESI to be reviewed by using keywords to cull document collections.

○ Keyword-Based Responsive & Privilege Review - Construct search queries to return documents that are likely to be responsive, privileged, or confidential. Search by name and email of counsel, and by privilege, work-product, confidential, and related keywords.

○ ID Documents for Depo Prep - Find and assign key documents related to specific case participants to prepare for depositions. Search by email addresses used, names and nicknames used, and important issues associated with the deponent.

○ ID of Key Docs for Trial - Find and mark key case documents. Code documents that will be needed for trial.

Page 6: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Pros of Keyword Searching

○ Fast - Keyword search is very fast compared with other document search methodologies.

○ Inexpensive - Good results can be obtained at little cost compared with manual review or other computer-assisted methodologies.

○ Quality - Search can deliver high quality results, particularly if keyword terms are carefully developed and tested.

○ Avoids Manual Review Errors/Inconsistencies - Search results are computer generated, and so avoid known human review errors that can result from fatigue, inadequate training, lack of focus, etc.

Page 7: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Cons of Keyword Searching

○ Search Can Be Over- or Under-Inclusive - Search terms can bring back too many junk results or miss good results. These are known as ‘false positives’ and ‘false negatives’, respectively.

○ Difficulty of Creating Good Search Terms - Constructing good search terms takes design time, testing, iterations, and analysis.

○ Non-Searchable Text - Search results can only be as good as the underlying searchable text. ESI collections and review tools can miss text that a human reviewer might catch for a variety of reasons.

○ Some file types can’t be indexed - There is little consistency in what files can be indexed across litigation databases.
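The trade-off between false positives and false negatives can be made concrete by comparing what a query returned against what a reviewer judged responsive. The following is a minimal sketch, not part of the original slides: the function name and the document IDs are hypothetical, and it assumes a reviewed set of responsive documents is available for comparison.

```python
def precision_recall(returned, relevant):
    """Compare one query's hits against reviewer judgments.

    returned: document IDs the query brought back
    relevant: document IDs a reviewer judged responsive
    """
    returned, relevant = set(returned), set(relevant)
    true_positives = returned & relevant
    false_positives = returned - relevant   # junk results (over-inclusive)
    false_negatives = relevant - returned   # missed good results (under-inclusive)
    precision = len(true_positives) / len(returned) if returned else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall, false_positives, false_negatives


# Hypothetical document IDs, for illustration only.
p, r, fp, fn = precision_recall({"D1", "D2", "D3", "D4"}, {"D2", "D3", "D5"})
```

Low precision signals an over-inclusive query, and low recall signals an under-inclusive one.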

Page 8: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Construct Quality Searches

○ Start with Request for Production - Translate the demands of the RFP into a keyword search strategy.

○ Interview Custodians - Ask key case participants / data custodians about their ESI. Use their insights and their terminology to find obscure key documents.

○ Include Jargon - Seek out terms specific to the industry, the company, or a company sub-culture that you may not be familiar with.

○ Include Misspellings - Include misspelled versions of keywords in your search string (or use ‘fuzzy search’ settings or Boolean limiters) to account for emails and other documents with typos.

Page 9: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Use Search Expanders

Search Expanders Enable Easy Expansion to Reduce False Negatives

○ Concept - Thesaurus lookup and synonym search. Conceptually expands search query.

○ Stemming - Expands query to include derivative terms associated with the search keywords.

○ Fuzzy - Allows the insertion, deletion, or substitution of a character in the search query to account for errors in the search term, spelling errors within the document, and potential OCR errors.

○ Phonetic - Returns results that sound similar to the search query.
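As an illustration of the phonetic expander, the sketch below uses a simplified Soundex code, a well-known phonetic algorithm. The slides do not say which phonetic method any given review tool uses, so treat the choice of Soundex and the function name as assumptions.

```python
# Letter groups that share a Soundex digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character Soundex code (simplified sketch)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Skip vowels and repeats of the previous consonant code.
        if code and code != prev:
            encoded += code
        # 'h' and 'w' do not separate two letters with the same code.
        if ch not in "hw":
            prev = code
    return (encoded + "000")[:4]


def sounds_like(a, b):
    """True when two words share the same Soundex code."""
    return soundex(a) == soundex(b)
```

With this sketch, `sounds_like("Fastow", "Fasto")` holds because both names reduce to the same code.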

Page 10: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Use Search Expanders: Concept Search Example

‘Trade’ = ‘Swap’ = ‘quid pro quo’
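A concept expansion like the one above can be sketched as a thesaurus lookup that rewrites a single keyword into an OR query. The thesaurus contents and the function name here are hypothetical; a real tool would rely on a far larger synonym source.

```python
# Hypothetical, hand-built thesaurus for illustration only.
THESAURUS = {"trade": ["swap", "quid pro quo"]}


def expand_concepts(term, thesaurus=THESAURUS):
    """Rewrite one keyword as an OR query over its synonyms."""
    key = term.lower()
    variants = [key] + thesaurus.get(key, [])
    # Quote multi-word phrases so they are searched as a unit.
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    return " OR ".join(quoted)


query = expand_concepts("Trade")
```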

Page 11: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Use Search Expanders: Stemming Search Example

‘Trade’ = ‘Trading’ = ‘Trades’
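The stemming example above can be approximated with a toy suffix stripper. This is only a sketch under simple assumptions: the suffix list is hypothetical and much cruder than the stemmers production search engines use.

```python
# Hypothetical suffix list, checked in order; not a full stemming algorithm.
SUFFIXES = ("ing", "es", "ed", "s", "e")


def toy_stem(word):
    """Strip one common English suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = {toy_stem(w) for w in ("Trade", "Trading", "Trades")}
```

All three words reduce to the same stem, so a query on any one of them would match documents containing the others.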

Page 12: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Use Search Expanders: Fuzzy Search Example - Misspelling

‘Fastow’ = ‘Fastaw’ = ‘Fasto’
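One common way to model fuzzy matching is edit distance, which counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below assumes a tolerance of one edit; the function names and the default threshold are illustrative, not settings taken from any particular tool.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_match(term, candidate, max_edits=1):
    """True when candidate is within max_edits of the search term."""
    return edit_distance(term.lower(), candidate.lower()) <= max_edits
```

Under this sketch, ‘Fastaw’ (one substitution) and ‘Fasto’ (one deletion) both match ‘Fastow’.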

Page 13: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Use Search Expanders: Boolean Search

○ Basic Boolean Operators:
- AND: returns results that include both terms
- OR: returns results that include at least one of a list of terms
- NOT: excludes terms you don’t want
- ( ): can be used to separate OR statements from the rest of the Boolean string
- PRE/n: the first search term must precede the second term by no more than n words
- Wildcard Characters: ‘*’ replaces a letter in your search term; ‘!’ allows for stemming search within a Boolean query
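The PRE/n operator can be sketched as a check on token positions. The sketch assumes that "within n words" means the second term's position is at most n tokens after the first; individual search engines may count differently, and the function names and sample sentence are made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def pre_n(text, first, second, n):
    """True if `first` appears before `second`, at most n tokens apart."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(0 < j - i <= n
               for i in first_positions
               for j in second_positions)


# Made-up sentence for illustration only.
sample = "Fastow approved the Chewco partnership"
```

Here `pre_n(sample, "fastow", "chewco", 3)` is true, while reversing the two terms is false because order matters for PRE/n.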

Page 14: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Use Search Limiters

Search Limiters Reduce False Positives (Noise)

○ Filter Out Unneeded File Types - Some file types are unlikely to lead to useful information and can be excluded.

○ Use Boolean Modifiers to Limit Overly Expansive Searches - Boolean modifiers can reduce the number of documents returned from a query while increasing the relevance of those files. Exclude certain words or combinations, and specify word order.

Page 15: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Use Search Limiters: Boolean Search Example

‘Lay’! w/25 ‘Chewco’
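The query above can be read as "any word starting with ‘Lay’, within 25 words of ‘Chewco’". The sketch below rests on two assumptions the slides do not spell out: that `w/25` means within 25 tokens in either order, and that `!` matches any token beginning with the root. The function name is hypothetical.

```python
import re


def root_within_n(text, root, other, n=25):
    """True if a token starting with `root` is within n tokens of `other`."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    root_positions = [i for i, t in enumerate(tokens)
                      if t.startswith(root.lower())]
    other_positions = [i for i, t in enumerate(tokens) if t == other.lower()]
    return any(abs(i - j) <= n
               for i in root_positions
               for j in other_positions)


# Made-up text: 'lay' and 'chewco' separated by 30 filler words.
far_apart = "lay " + "filler " * 30 + "chewco"
```

Note that a root expansion on ‘Lay’ also matches unrelated words such as "layoffs", which is why sampling the results for false positives still matters.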

Page 16: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Test Keyword Searching Results

○ Look at Results Returned - Searching without review and testing may result in low quality results.

○ Sample & Look for Ways to Limit Search - Create new queries that reduce false positives.

○ Find New Keywords - Viewing search results may prompt the discovery of additional keywords that could be used to expand or reduce search queries.

○ Fuzzy and Concept Search - Find new keywords by running searches that return synonyms and near-identical words.

Keyword searching becomes an iterative process.
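One iteration of this loop can be sketched as drawing a random sample of hits for manual review and estimating how many are responsive. The function names, the seed, and the document IDs below are hypothetical, and the sample size is a choice left to the review team.

```python
import random


def draw_review_sample(hit_ids, sample_size, seed=0):
    """Pick a reproducible random sample of search hits for manual review."""
    ordered = sorted(hit_ids)
    if sample_size >= len(ordered):
        return ordered
    return random.Random(seed).sample(ordered, sample_size)


def estimated_precision(reviewed):
    """reviewed maps doc ID -> True if the reviewer found it responsive."""
    if not reviewed:
        return 0.0
    return sum(reviewed.values()) / len(reviewed)


# Hypothetical hit list for illustration only.
hits = [f"DOC{i:04d}" for i in range(500)]
sample = draw_review_sample(hits, 50, seed=42)
```

If the estimated precision of the sample is low, the query is a candidate for tighter limiters before the next iteration.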

Page 17: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Common Indexing Methods

There Are Traditionally Two Types of Search Indices:

○ Imaged and OCRed - The search text comes from the files after they have been converted to TIFF / PDF.

○ Extracted Text - The search text comes from text extracted from the original file.

Both approaches have significant limitations.

Page 18: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Search Index Based on OCR of Imaged Files

○ Description - Native files (email, attachments, spreadsheets, etc.) are converted to a paginated image file and then OCR is applied to make the text searchable (e.g., TIFF production with no extracted text).

○ How? - Conversion software uses a ‘print-driver approach’ to virtually image what would have been physically printed.

○ Data Not Indexed - Headers/footers/notes, comments and revisions, highlighted text, hidden sheets or text, print selections, applied filters,

Page 19: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Search Index Based on OCR of Imaged Files

[Figure: side-by-side example with panels "How Doc Appears Natively" and "OCR Based Index Will Include". Text visible in the figure: ‘Chewco 2000 Pro Forma Sheet’, ‘Body Text’]

Page 20: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Search Index Based on Native Extraction

○ Description - Available text from native files (email, attachments, spreadsheets, etc.) is extracted and indexed by the search engine using text parsing (e.g., pure native review).

○ How? - Only available text is used. There is no OCR applied.

○ Data Not Indexed - Non-text files (e.g., scanned documents) and embedded text, objects, or visuals will not be indexed. Different native extraction methods can also vary in their ability to recognize certain types of text.

Page 21: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Search Index Based on Native Extraction

[Figure: side-by-side example with panels "How Doc Appears Natively" and "Native Extraction Index Will Include". Text visible in the figure: "Page 1/12", "Chewco 2000 Pro Forma Balance Statement Sheet [S1: CRITICAL ENRON EVIDENCE]", "Page 1/12"]

Page 22: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Dual Index: Benefits of Dual Index Approach

○ The Lexbe search engine indexes both the text extracted from native files (email, attachments, spreadsheets, etc.) and the OCRed text of a paginated PDF or TIFF version converted from those native files.

○ This most comprehensive approach minimizes the potential for lost and unsearchable data.

Index Method        Captures Embedded Text   Captures Text Excluded From Print   Captures Hidden Text

Imaged/OCR          Yes                      No                                  No

Native Extraction   No                       Yes                                 Yes

Lexbe Dual Index    Yes                      Yes                                 Yes
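The idea behind a dual index can be sketched as two inverted indices, one per text source, whose hits are combined at query time. This is a conceptual illustration only, not Lexbe's implementation; the function names and the sample texts are hypothetical.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9']+", text.lower()):
            index[token].add(doc_id)
    return index


def dual_search(term, ocr_index, native_index):
    """A document is a hit if the term appears in either index."""
    key = term.lower()
    return ocr_index.get(key, set()) | native_index.get(key, set())


# Hypothetical texts for one document: OCR sees text embedded in an image,
# native extraction sees a hidden comment that never prints.
ocr_index = build_index({"DOC1": "scanned signature page Fastow"})
native_index = build_index({"DOC1": "pro forma sheet hidden comment Chewco"})
```

In this sketch, a search for either "Fastow" or "Chewco" finds DOC1, whereas each index alone would miss one of the two terms.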

Page 23: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Dual Index

Page 24: Lexbe eDiscovery Webinar- Best Practices: Advanced eDiscovery Search

Thank You for Attending: About Lexbe and Contact Information

Lexbe is an eDiscovery software and services provider based in Austin, TX.

Phone (Toll Free): (800) 401-7809

Webinar Questions: [email protected]

Next Month’s Webinar: Legal Timelines and Early Case Assessment