Still on Stage: Boolean Search. Your Speakers Speaker: Richard Cheng Richard Cheng, CISSP, CISA,...

Post on 28-Mar-2015

Still on Stage: Boolean Search

Your Speakers

Speaker: Richard Cheng

Richard Cheng, CISSP, CISA, directs digital forensics and e-discovery cases and consults on IT audits, governance and compliance. His experience includes the collection and processing of unique and/or proprietary ESI (Apple devices, mobile devices, collaboration sites, and the cloud). Richard has provided testimony as a neutral expert and technology authority. He has two M.S. degrees from the University of New Haven and a B.S. from MIT.

Speaker: Megan Bell

Megan Bell directs data analysis projects. She is experienced in the analysis of complex data sets, search and reporting technology and the automation of workflows that increase efficiency and deliver better outcomes. Her case experience includes data/security breach, IP theft, insurance, and employment matters. She also has extensive experience in the development and launch of new product technologies. She has a degree in Chemical Engineering from WPI.

Speaker: Shawnna Childress, P.I.

Overview: Boolean Search

Early eDiscovery famous moments:

Martha Stewart voicemail

Lehman Brothers’ bankruptcy

Merrill Lynch analyst emails on “junk” investments

It’s not just e-discovery.

Universe of Search

Types of Data

Sources: Databases, Email, Files, SharePoint

Locations: Local computer, server, backup, mobile device

Search Technologies: dtSearch, Lucene, Grep, SQL, automated “predictive” methods/neural nets

Why Boolean?

Boolean search: Character-based searching. Toolbox of relationship connectors and limiters to broaden or narrow search.

Benefits:

Identify important words/phrases and how they are used

Research “written” language context and relationship

Easily vary breadth and scope of search

Customizable search

Overview of Boolean Search Construction

Boolean connectors: AND, OR, NOT
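One way to picture the three connectors is as set operations over the documents each term hits. The sketch below is only an illustration: the three-document corpus, the `build_index` helper and the `hits` helper are hypothetical, and real engines such as dtSearch or Lucene build far richer indexes.

```python
# Toy corpus: document id -> text (hypothetical example data).
docs = {
    1: "product release scheduled for China",
    2: "product rollout delayed in Japan",
    3: "quarterly budget review",
}

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

index = build_index(docs)

def hits(term):
    """Return the set of document ids that contain the term."""
    return index.get(term.lower(), set())

and_hits = hits("product") & hits("china")    # product AND china
or_hits = hits("release") | hits("rollout")   # release OR rollout
not_hits = hits("product") - hits("japan")    # product AND NOT japan
```

AND narrows the result to the overlap, OR broadens it to the union, and NOT removes every document that hits the excluded term.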

Overview of Boolean Search Construction

Other Boolean elements:

Proximity, Stemming, Fuzzy Searching

Parentheses

Wildcards

Numeric terms and ranges

Fields (e.g., email address)

Differences in Boolean connectors:

AND versus Proximity

Stemming versus Wildcard use
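The difference between AND and a proximity connector can be sketched as follows. This is a simplified illustration that assumes `w/n` means "within n words, in either order"; the tokenizer and both helper functions are hypothetical and are not how any particular search engine is implemented.

```python
import re

def tokens(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_and(text, a, b):
    """True when both words occur anywhere in the text (a AND b)."""
    words = tokens(text)
    return a in words and b in words

def matches_within(text, a, b, n):
    """True when a and b occur within n words of each other (a w/n b)."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

text = ("The confidential memo arrived. Weeks later, "
        "a separate notice about parking was posted.")

and_result = matches_and(text, "confidential", "notice")
proximity_result = matches_within(text, "confidential", "notice", 3)
```

Here AND hits because both words appear somewhere in the document, while the proximity search does not, because the words are seven positions apart.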

Overview of Boolean Search Construction for Foreign Languages

Foreign Languages

How will you handle the multiple foreign languages?

Example: Chinese Dialects

Gan - 赣语 / 贛語 - 31 million
Guan (Mandarin) - 官话 / 官話 - 836 million
Hui - 徽語 - 3.2 million
Jin - 晋语 / 晉語 - 45 million
Kejia (Hakka) - 客家話 - 34 million
Min - 閩語 / 闽语 - 60 million
Wu - 吴语 / 吳語 - 77 million
Xiang - 湘语 / 湘語 / 湖南话 / 湖南話 - 36 million
Yue - 粵語 / 粤语 - 71 million
Unclassified - not determined

Optimizing Boolean Search Statement Construction

1. Invest time in identifying relevant search terms and phrases.

2. Determine which search terms to search in combination.

3. Use the most appropriate Boolean logic.

4. Adjust Boolean search statements to account for variations in search term wording, spellings and abbreviations.

5. Modify Boolean search statements when special characters are present.

Examples

1. Capturing the Variation for a Word

Example: eDiscovery

Boolean: “e-Discovery” OR eDiscovery OR “electronic discovery” OR electronic w/1 discovery
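For comparison, the same variations can be approximated with a single regular expression when searching raw text (for example with a grep-style tool). The pattern below is an assumption made for illustration; unlike `electronic w/1 discovery`, it only matches the two words in that order.

```python
import re

# Hypothetical pattern: eDiscovery, e-Discovery, "e discovery",
# and "electronic discovery", case-insensitive.
VARIANTS = re.compile(
    r"\b(?:e[-\s]?discovery|electronic\s+discovery)\b",
    re.IGNORECASE,
)

samples = [
    "Our eDiscovery vendor sent the load file.",
    "E-Discovery costs rose last year.",
    "The electronic discovery order was signed.",
    "The rediscovery of the file surprised everyone.",
]

results = [bool(VARIANTS.search(s)) for s in samples]
```

The last sample is not a hit: the word boundary keeps "rediscovery" from matching as "eDiscovery".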

2. Searching for Unique Phrases

Example: Search for the ratio 1:1

Boolean: 1?1 AND (NOT (101 OR 111 OR 121 OR 131 OR 141 OR 151 OR 161 OR 171 OR 181 OR 191))
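The reason for the exclusion list is that `?` matches any single character, so `1?1` also hits 101, 111 and so on. The sketch below imitates that logic on whitespace-separated tokens; the `doc_matches` helper is hypothetical and assumes the statement is evaluated per document.

```python
from fnmatch import fnmatchcase

# 101, 111, 121, ..., 191: the three-digit numbers that 1?1 also matches.
EXCLUDED = {f"1{d}1" for d in "0123456789"}

def doc_matches(text):
    """Apply 1?1 AND NOT (101 OR 111 OR ... OR 191) to one document."""
    words = [w.strip(".,;()") for w in text.split()]
    has_pattern = any(fnmatchcase(w, "1?1") for w in words)
    has_excluded = any(w in EXCLUDED for w in words)
    return has_pattern and not has_excluded

ratio_only = doc_matches("Mix the two at a 1:1 ratio.")
number_only = doc_matches("Room 101 is booked.")
both = doc_matches("Use a 1:1 ratio in room 101.")
```

Under this per-document reading, a document that contains both the ratio and one of the excluded numbers is dropped as well, which is a trade-off to check when reviewing results.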

3. Simplifying Complex Compound Phrases

Example: (“product rollout” OR “product release”) AND (China OR Japan OR Korea OR Asia OR ASEAN OR Taiwan OR Hong Kong)

Boolean:

(“product release”) AND (China OR Japan OR Korea OR Asia OR ASEAN OR Taiwan OR Hong Kong)

(“product rollout”) AND (China OR Japan OR Korea OR Asia OR ASEAN OR Taiwan OR Hong Kong)
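Splitting the compound statement does not change the overall result, because AND distributes over OR. The sketch below checks this with sets of hypothetical document ids; the hit sets are made up for illustration.

```python
# Hypothetical hit sets (document ids) for each part of the statement.
rollout = {1, 4}        # "product rollout"
release = {2, 4, 5}     # "product release"
region = {1, 2, 3}      # China OR Japan OR Korea OR ... OR Hong Kong

# Original compound statement: (rollout OR release) AND region
compound = (rollout | release) & region

# Two simpler statements, run separately and combined afterwards.
rollout_hits = rollout & region
release_hits = release & region
split = rollout_hits | release_hits

same_result = compound == split
```

The two simpler statements return the same documents in total, and each one also reports its own hit count, which makes it easier to see which phrase is driving the results.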

4. When Dates are Search Terms

Example: 1/6/11

Boolean: “1?6?11” OR “!1?6?2011”

Others?
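One way to answer "Others?" is to enumerate the numeric forms a date can take. The sketch below assumes U.S. month/day/year order and a small set of separators; the `date_variants` helper is hypothetical, and written-out forms such as "January 6, 2011" would still need their own terms.

```python
from datetime import date

def date_variants(d, separators=("/", "-", ".")):
    """List month/day/year spellings of a date (U.S. order assumed)."""
    month_forms = {str(d.month), f"{d.month:02d}"}
    day_forms = {str(d.day), f"{d.day:02d}"}
    year_forms = {str(d.year), f"{d.year % 100:02d}"}
    variants = set()
    for sep in separators:
        for m in month_forms:
            for dd in day_forms:
                for y in year_forms:
                    variants.add(sep.join((m, dd, y)))
    return sorted(variants)

variants = date_variants(date(2011, 1, 6))
```

Each variant can then be added to the search statement, or the list can be used to check which forms a wildcard statement actually covers.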

5. Compound Words

Example: Watch-out

Boolean: Watchout OR Watch?out

“watch out”?

6. Noise Filter Issues

Example: The The

Boolean: “The The”
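The issue can be illustrated with a toy indexer that drops noise words before indexing. The noise list and the `index_terms` helper are hypothetical; actual noise word lists depend on the search tool and its configuration.

```python
# Hypothetical noise word list; real lists are tool-specific.
NOISE_WORDS = {"the", "a", "an", "of", "and"}

def index_terms(text, noise_words=NOISE_WORDS):
    """Return the words that would be indexed after noise filtering."""
    return [w for w in text.lower().split() if w not in noise_words]

filtered = index_terms("The The released an album")
unfiltered = index_terms("The The released an album", noise_words=set())
```

With the filter on, nothing remains of "The The" in the index, so the phrase cannot be found until the noise list is adjusted and the data is re-indexed.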

7. Improving Search Results for an Overused and Important Word

Example: When “confidential” is important as a search term and overused

Boolean:

confidential AND NOT (“communication is confidential” OR “confidentiality notice” OR “confidential personal”)

confidential AND NOT (confidential w/3 communication)

confidential AND NOT (confidential w/3 notice)

confidential AND NOT (confidential w/3 personal)
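The proximity-based exclusions can be sketched as a single filter. This is an illustration only: it folds the three `w/3` statements into one check, assumes `w/3` means "within three words, in either order", and uses hypothetical helper names.

```python
import re

BOILERPLATE = ("communication", "notice", "personal")

def tokens(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within(words, a, b, n):
    """True when a and b occur within n positions of each other."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

def substantive_confidential(text, n=3):
    """confidential AND NOT (confidential w/n any boilerplate word)."""
    words = tokens(text)
    if "confidential" not in words:
        return False
    return not any(
        within(words, "confidential", other, n) for other in BOILERPLATE
    )

footer = substantive_confidential(
    "This communication is confidential and intended for the addressee."
)
substantive = substantive_confidential(
    "The pricing model is strictly confidential."
)
```

The email footer is filtered out while the substantive use of the word is kept. A document containing both would also be excluded under this per-document reading.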

Statistical Sampling

Recent court opinions suggest that sampling, as used in Assisted Review, is not only useful but may be required in certain cases. Several decisions in the past few years have penalized lawyers for not sampling documents before they were produced (waiver of privilege) and for not sampling the documents that were not produced (omission of responsive data). In two landmark decisions, U.S. Magistrate Judges John M. Facciola and Paul W. Grimm issued key rulings discussing sampling. Specifically, they criticized counsel who hoped to be excused for inadvertent waiver of privilege because they did not sample the documents produced after key-word searches.

United States v. O’Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008) (Judge Facciola)

Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008) (Judge Grimm)

Smoking Gun

Even more recently, another court found waiver of privilege in a “smoking gun” attorney-client communication because counsel failed to sample.

Mt. Hawley Ins. Co. v. Felman Prod., Inc., 2010 WL 1990555 (S.D. W. Va. May 18, 2010)
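A simple random sample of a document set, for example the documents a keyword search did not return, could be drawn along these lines. This is a minimal sketch, not a defensible sampling protocol: the sample size, the seed and the `is_responsive` reviewer callback are all assumptions made for illustration.

```python
import random

def draw_sample(doc_ids, sample_size, seed=0):
    """Draw a reproducible simple random sample of document ids."""
    rng = random.Random(seed)
    pool = list(doc_ids)
    return rng.sample(pool, min(sample_size, len(pool)))

def estimated_rate(sample, is_responsive):
    """Share of sampled documents a reviewer flags as responsive."""
    if not sample:
        return 0.0
    flagged = sum(1 for doc_id in sample if is_responsive(doc_id))
    return flagged / len(sample)

not_produced = range(1000)           # hypothetical document ids
sample = draw_sample(not_produced, 50)

# Stand-in for human review: here, no sampled document is flagged.
rate = estimated_rate(sample, lambda doc_id: False)
```

Fixing the seed makes the sample reproducible, so the same documents can be identified again if the sampling is later questioned.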

Q&A