The ever-increasing volume of ESI is a problem in a world of limited tools and resources…
Searching the Haystack…
to find all the relevant needles…
ends up like searching in a maze…
Email is still the 800 lb. gorilla of ediscovery
To do “search” right, one has to know where relevant ESI may be found in the many branches of an organization
Is this still ‘best practice’ or have we moved beyond Boolean?
Example of Boolean search string from U.S. v. Philip Morris, circa 2003:
(((master settlement agreement OR msa) AND NOT (medical savings account OR metropolitan standard area)) OR s. 1415 OR (ets AND NOT educational testing service) OR (liggett AND NOT sharon a. liggett) OR atco OR lorillard OR (pmi AND NOT presidential management intern) OR pm usa OR rjr OR (b&w AND NOT photo*) OR phillip morris OR batco OR ftc test method OR star scientific OR vector group OR joe camel OR (marlboro AND NOT upper marlboro)) AND NOT (tobacco* OR cigarette* OR smoking OR tar OR nicotine OR smokeless OR synar amendment OR philip morris OR r.j. reynolds OR ("brown and williamson") OR ("brown & williamson") OR bat industries OR liggett group)
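To make the role of the AND NOT clauses concrete, here is a minimal Python sketch of how a single "term AND NOT exclusion" pair from the string above might be evaluated against documents. The `matches` helper, the sample documents, and the use of plain substring matching are all assumptions for illustration; this is not how the search tool used in the case worked.

```python
def matches(text, term, unless=()):
    """True if `term` occurs in `text` and none of the `unless` phrases do.

    Hypothetical helper: plain case-insensitive substring matching,
    applied to the whole document (as a document-level AND NOT would be).
    """
    lowered = text.lower()
    return term in lowered and not any(phrase in lowered for phrase in unless)


# Invented sample documents, for illustration only.
docs = [
    "Draft of the MSA circulated to outside counsel",
    "Your medical savings account (MSA) statement is attached",
    "Directions to the Upper Marlboro courthouse",
    "Marlboro brand sales figures for Q3",
]

# (msa AND NOT medical savings account) OR (marlboro AND NOT upper marlboro)
hits = [
    d for d in docs
    if matches(d, "msa", unless=("medical savings account",))
    or matches(d, "marlboro", unless=("upper marlboro",))
]

for d in hits:
    print(d)
```

One consequence of this design is worth noting: because the exclusion applies to the whole document, a message that discusses both the settlement agreement and a medical savings account would be dropped along with the false hits.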
U.S. v. Philip Morris E-mail Winnowing Process & The Problem of Scale (It’s Only Getting Worse Out There)
20 million email records
200,000 hits based on keyword terms used (1%)
100,000 relevant emails
80,000 produced to opposing party
20,000 placed on privilege logs
A PROBLEM: only a handful entered as exhibits at trial
A BIGGER PROBLEM: the 1% figure does not scale
A Hypothetical
1 billion emails, 25% with attachments
Reviewed at 50 per hour
Would take 100 people, 10 hrs per day, 7 days a week, 52 weeks a year … 54 YEARS TO COMPLETE
At $100/hr, $2 billion in cost
Even 1% (10 million docs) … 28 weeks and $20 million in cost
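The arithmetic behind this hypothetical can be checked with a short Python sketch. The function name and parameter names are assumptions, and the calculation treats each email as one review item, ignoring the extra time attachments would add, which appears to be how the slide arrives at its figures.

```python
def review_estimate(docs, docs_per_hour=50, reviewers=100,
                    hours_per_day=10, days_per_week=7, rate_per_hour=100):
    """Return (reviewer_hours, calendar_weeks, cost_in_dollars) for a linear review."""
    hours = docs / docs_per_hour                      # total reviewer-hours
    team_hours_per_week = reviewers * hours_per_day * days_per_week
    weeks = hours / team_hours_per_week               # calendar weeks for the team
    cost = hours * rate_per_hour
    return hours, weeks, cost


# Full collection: 1 billion emails
full_hours, full_weeks, full_cost = review_estimate(1_000_000_000)
full_years = full_weeks / 52

# Even 1% of the collection: 10 million documents
pct_hours, pct_weeks, pct_cost = review_estimate(10_000_000)

print(f"Full review: about {full_years:.1f} years, ${full_cost:,.0f}")
print(f"1% review:   about {pct_weeks:.1f} weeks, ${pct_cost:,.0f}")
```

Changing any one input (say, doubling the review rate) only halves the totals, which is the underlying point: linear manual review does not scale to collections of this size.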
Judge Grimm writing for the U.S. District Court for the District of Maryland
“[W]hile it is universally acknowledged that keyword searches are useful tools for search and retrieval of ESI, all keyword searches are not created equal; and there is a growing body of literature that highlights the risks associated with conducting an unreliable or inadequate keyword search or relying on such searches for privilege review.”
Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008); see id., text accompanying nn. 9 & 10 (citing to Sedona Search Commentary & TREC Legal Track research project)
Judge Facciola writing for the U.S. District Court for the District of Columbia
“Whether search terms or ‘keywords’ will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics. See George L. Paul & Jason R. Baron, Information Inflation: Can the Legal System Adapt?, 13 RICH. J.L. & TECH. 10 (2007) * * * Given this complexity, for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.”
-- U.S. v. O'Keefe, 537 F. Supp. 2d 14, 24 (D.D.C. 2008).
Judge Peck writing for the U.S. District Court for the Southern District of New York
William A. Gross Construction Associates Inc. v. American Manufacturers Mutual Ins. Co., 2009 WL 724954 (S.D.N.Y. March 19, 2009) (in a multi-million dollar dispute involving production of third-party emails, where none of the three parties could agree on keyword search terms, the court fashioned a compromise; the court stated at the outset that “this Opinion should serve as a wake-up call to the Bar … about the need for careful thought, quality control, testing and cooperation with opposing counsel in designing search terms or ‘keywords’ to be used.”)
Judge Maas writing for the U.S. District Court for the Southern District of New York in Capitol Records, Inc. v. MP3Tunes, LLC, 2009 WL 2568431 (Aug. 13, 2009)
“. . . [R]ather than sitting down with the Plaintiffs’ counsel to agree on search parameters and terms, MP3tunes’ counsel directed its client to conduct a search of MP3tunes’ emails . . . using the word ‘design’ as the only search term. Remarkably, when I questioned the wisdom of that decision . . . MP3tunes’ attorney suggested that he actually considered this one-word search to be ‘overly broad.’ After I observed that MP3tunes’ unilateral decision regarding its search reflected a failure to heed Magistrate Judge Andrew Peck’s recent ‘wake-up-call’ regarding the need for cooperation concerning e-discovery . . . Counsel apologized for not having also used the word ‘development’ as a search term.”
Beyond Keywords: What Alternative Search Methods Are We Beginning to Encounter in Current Litigation?
Greater Use Made of Boolean Strings
Fuzzy Search Models
Probabilistic models (Bayesian)
Statistical methods (clustering)
Machine learning approaches to semantic representation
Categorization tools: taxonomies and ontologies
Social network analysis
Hybrid and fusion approaches
Reference: Appendix to The Sedona Conference® Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery (2007), available at http://www.thesedonaconference.org (link to publications)
Thesaurus (c/o Herb Roitblat)
SEARCH: inspect, forage, looking, look for, investigate, explore, lookup, seek, examine, hunt, hunting
Droid Taxonomy (c/o Herb Roitblat, http://starwars.wikia.com)
Droids: Humanoid droids, Nonhumanoid droids, R-Series droids
Examples: R5-D4 (Astromech), IG-88 (Assassin), R2-D2 (Astromech), C-3PO (Protocol), 2-1B (Surgical), T3-M4 (Utility)
 
What is TREC?
Conference series co-sponsored by the National Institute of Standards and Technology (NIST) and the Advanced Research and Development Activity (ARDA) of the Department of Defense
Designed to promote research into the science of information retrieval
First TREC conference was in 1992
15th conference held November 15-17, 2006, in Gaithersburg, Maryland (NIST headquarters)
TREC Legal Track
The TREC Legal Track was designed to evaluate the effectiveness of search technologies in a real-world legal context
First-of-a-kind study using nonproprietary data since Blair/Maron research in 1985
Hypothetical complaints and 100+ “requests to produce” drafted by members of The Sedona Conference®
“Boolean negotiations” conducted as a baseline for search efforts
New Interactive Task added in 2008 using Topic Authorities and a post-adjudication round
Documents to be searched were drawn from a publicly available 7 million document tobacco litigation Master Settlement Agreement database
In 2009, a second data set (Enron) was added as a separate task
Participating teams of information scientists from around the world contributing computer runs, joined in 2008 and 2009 by legal service providers (12 in 2009)
TREC Legal Track: Topics
RequestNumber: 52
RequestText: Please produce any and all documents that discuss the use or introduction of high-phosphate fertilizers (HPF) for the specific purpose of boosting crop yield in commercial agriculture.
Proposal: "high-phosphate fertilizer!" AND (boost! w/5 "crop yield") AND (commercial w/5 agricultur!)
Rejoinder: (phosphat! OR hpf OR phosphorus OR fertiliz!) AND (yield! OR output OR produc! OR crop OR crops)
FinalQuery: (("high-phosphat! fertiliz!" OR hpf) OR ((phosphat! OR phosphorus) w/15 (fertiliz! OR soil))) AND (boost! OR increas! OR rais! OR augment! OR affect! OR effect! OR multipl! OR doubl! OR tripl! OR high! OR greater) AND (yield! OR output OR produc! OR crop OR crops)
B: 3078
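The queries above rely on two operators beyond AND/OR: the `!` truncation wildcard and the `w/N` proximity operator. The Python sketch below shows one plausible reading of them, assuming `!` means "any token starting with this stem" and `w/N` means "within N tokens, in either order". The function names are hypothetical, and the exact operator semantics used in the TREC Legal Track are not confirmed by the slide.

```python
import re


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(term, token):
    """`fertiliz!` matches any token starting with `fertiliz`; other terms match exactly."""
    if term.endswith("!"):
        return token.startswith(term[:-1])
    return token == term


def positions(terms, toks):
    """Indexes of tokens that match any of the given terms."""
    return [i for i, tok in enumerate(toks)
            if any(term_matches(t, tok) for t in terms)]


def within(left_terms, right_terms, n, text):
    """True if some left term occurs within n tokens of some right term (either order)."""
    toks = tokens(text)
    left = positions(left_terms, toks)
    right = positions(right_terms, toks)
    return any(abs(i - j) <= n for i in left for j in right)


# (phosphat! OR phosphorus) w/15 (fertiliz! OR soil)
doc = "Phosphorus was added to the soil to raise yields."
print(within(["phosphat!", "phosphorus"], ["fertiliz!", "soil"], 15, doc))
```

Widening the window from `w/5` in the proposal to `w/15` in the final query trades precision for recall: more documents qualify, including some where the two terms are only loosely related.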
Beyond Boolean: getting at the “dark matter” (i.e., relevant documents not found by keyword searches alone)
Nobody Finds Everything Source: TREC 2006 Legal Track
“Boolean” Searches May Miss A Large Percentage of Relevant Documents
78% of relevant documents were only found by some other technique
Source: TREC 2007 Legal Track
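The 78% figure is a statement about recall: if 78% of the relevant documents were found only by other techniques, the Boolean run retrieved roughly 22% of them. A minimal Python sketch of the measure follows, using invented document IDs chosen only to reproduce that proportion; none of the counts come from the TREC data.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that the search retrieved."""
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical collection: 100 relevant documents, IDs 0-99.
relevant = set(range(100))
# Hypothetical Boolean run: finds 22 relevant documents plus 2 irrelevant ones.
boolean_hits = set(range(22)) | {200, 201}

print(f"recall:    {recall(boolean_hits, relevant):.2f}")
print(f"precision: {precision(boolean_hits, relevant):.2f}")
```

A search can look accurate (high precision) to the reviewer reading its hits while still missing most of what matters, because the missed documents never reach the reviewer.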
Boolean v. TREC Systems:  Results of Legal Track Years 1 and 2
Improving Search Effectiveness Through Relevance Feedback and Multiple Meet and Confers
[Figure labels: First Meet and Confer; Second Meet and Confer]
Source: F.C. Zhao, D. W. Oard, and J.R. Baron, “Improving Search Effectiveness in the Legal E-Discovery Process Using Relevance Feedback” (forthcoming 2009)
Interdisciplinary Approaches-- Three Languages: Legal, RM, and IT
Strategic challenges in our collective futures . . . . Convincing lawyers and judges that automated searches are not just desirable but necessary in response to large e-discovery demands.
Challenges (cont.) Designing an overall review process which maximizes the potential to find responsive documents in a large data collection (no matter which search tool is used), and using sampling and other analytic techniques to test hypotheses early on.
Challenges (cont.) Having all parties and adjudicators understand that the use of automated methods does not guarantee all responsive documents will be identified in a large data collection.
Challenges (cont.) Being open to using new and evolving search and information retrieval methods and tools.
Overarching Smart E-Discovery Strategy
ESPECIALLY FOR eDISCOVERY LITIGATOR-WARRIORS … EMBRACING COLLABORATION WITH ADVERSARIES (TRANSPARENCY)
See The Sedona Conference Cooperation Proclamation, www.thesedonaconference.org
The leading rule for the lawyer, as for the man, of every calling, is diligence.  -- Abraham Lincoln
Ongoing Research & Reference
TREC 2009 Legal Track / TREC 2010 Legal Track: http://trec-legal.umiacs.umd.edu/
ICAIL 2009 Barcelona DESI III Workshop -- June 8, 2009: http://www.law.pitt.edu/DESI3_Workshop/
Workshop-in-Planning -- October/November 2010, San Francisco
The Sedona Conference® Search & Retrieval Commentary (2007) & The Sedona Conference® Commentary on Achieving Quality in E-Discovery (2009) (both available at www.thesedonaconference.org)
Jason R. Baron
Director of Litigation
Office of General Counsel
National Archives and Records Administration
8601 Adelphi Road # 3110
College Park, MD 20740
(301) 837-1499
Email: jason.baron@nara.gov
Best Practices in Keyword Searching (Maura Grossman)
Start with the Complaint: who are the custodians? What is the applicable time frame? What terms-of-art are employed?
Translate the request into plain English
Involve multiple people to get differing interpretations of the requests and potential keywords from different vantage points
Best Practices in Keyword Searching (cont.)
Next: seek input from people who actually created, sent or received the documents
Look at responsive documents for unique words or phrases. In what context do those documents appear?
Incorporate common misspellings, errors, variants and synonyms (utilize tools on web for this task)
Determine irrelevant file types
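One way to act on the advice about misspellings and variants is to compare each search term against the vocabulary actually present in the collection. The sketch below uses Python's standard `difflib.get_close_matches`; the `variants` helper, the sample vocabulary, and the 0.8 similarity cutoff are assumptions for illustration, and a real project would tune the cutoff and review the suggestions by hand.

```python
import difflib


def variants(term, vocabulary, cutoff=0.8):
    """Suggest words from the collection's vocabulary that closely resemble `term`.

    Hypothetical helper built on difflib's similarity ratio; `cutoff` is an
    illustrative threshold, not a recommended value.
    """
    return difflib.get_close_matches(term, vocabulary, n=10, cutoff=cutoff)


# Invented vocabulary, standing in for the distinct words found in a collection.
vocabulary = ["phillip", "philip", "phillips", "marlboro", "malboro",
              "lorillard", "tobacco"]

for term in ("philip", "marlboro"):
    print(term, "->", variants(term, vocabulary))
```

Suggestions of this kind still need human judgment: a near-match may be a genuine misspelling worth adding to the query, or an unrelated name that would only add false hits.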
Best Practices in Keyword Searching (cont.)
Special search strategies for handwritten docs, drawings, facsimiles, password-protected or encrypted files
Learn capabilities and limitations of your search tool
Take a representative sample and test, test, test
  From both Hits and Misses pile
  Retest
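Sampling the "misses" pile can be sketched as follows: draw a simple random sample, have a reviewer judge each sampled document, and estimate what share of the pile is responsive. This Python sketch assumes a normal-approximation 95% interval; the function name is hypothetical, and `is_responsive` stands in for a human reviewer's judgment, which is simulated here with invented data.

```python
import math
import random


def sample_estimate(pile, is_responsive, sample_size, seed=0):
    """Estimate the responsive fraction of `pile` from a simple random sample.

    Returns (estimate, lower, upper) using a normal-approximation 95% interval.
    `is_responsive` is a stand-in for a reviewer's call on each document.
    """
    rng = random.Random(seed)          # fixed seed so the sample can be reproduced
    sample = rng.sample(pile, sample_size)
    found = sum(1 for doc in sample if is_responsive(doc))
    p = found / sample_size
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Invented "misses" pile: 10,000 documents, 500 of which are actually responsive.
misses = [{"id": i, "responsive": i < 500} for i in range(10_000)]

estimate, low, high = sample_estimate(misses, lambda d: d["responsive"], 400)
print(f"Estimated responsive share of misses: {estimate:.1%} ({low:.1%} to {high:.1%})")
```

If the estimated share is higher than the parties are willing to accept, that is the signal to revise the search terms and retest, and recording the seed and sample size helps document the process afterward.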
Best Practices in Keyword Searching (cont.)
Keep track of and document what you did to explain rationale for process or method applied
Collaborate and communicate in good faith (See The Sedona Cooperation Proclamation, available at www.thesedonaconference.org)

EDI 2009 Case Law Update
