Update on Spring 2010 EDRM Meeting
The EDRM 2010 Spring kick-off meeting this week in St. Paul, Minnesota, drew nearly 100 attendees, which may show that the electronically stored information (ESI) community is regaining some of its former optimism.
The work of EDRM (the Electronic Discovery Reference Model) is done by a series of project teams and committees. Each reported on its progress over the last twelve months and its goals for the next year, and I share some highlights here.
Data Set Project
One of the big challenges in e-discovery (EDD) is creating a large, freely available data set against which lawyers and vendors can test software or develop new semantic or text-handling algorithms. The Data Set project has soared past its initial goal of 100 GB of unencumbered data and currently offers three different sets of helpful test data.
As reported last year, they are reaching out to other organizations to work for the betterment of the ESI community. Their very productive interaction with NIST (National Institute of Standards and Technology) has even resulted in their receiving a copy of the source code NIST uses to compute hash values. (A hash is a mathematically generated ‘signature’ for any digital file. Each hash is a short, fixed-length string of characters, irrespective of file size. Hashes are used to identify and compare files.)
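For readers who want to see what that parenthetical means in practice, here is a minimal Python sketch using the standard library's hashlib module. SHA-256 is my choice purely for illustration (the post does not say which algorithm NIST's code uses), and file_hash is a hypothetical helper name.

```python
import hashlib

def file_hash(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

small = b"hello"
large = b"hello" * 1_000_000  # roughly 5 MB of content

# The digest has the same length no matter how large the input is.
print(len(file_hash(small)), len(file_hash(large)))  # 64 64

# Identical content always produces the same hash...
print(file_hash(small) == file_hash(b"hello"))  # True

# ...while different content produces a different one.
print(file_hash(small) == file_hash(large))  # False
```

In a real workflow the bytes would be read from files on disk, and two files would be treated as duplicates when their hashes match.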
Data Set is supporting TREC, the Text Retrieval Conference run by NIST. EDRM Data Set is helping to create an improved set of sample data for this year’s TREC legal track. The goal is to create an enhanced, canonical set of public Enron data that accurately represents Enron’s email environment. This year’s TREC sample will grow from 104 to 150 custodians.
The Data Set project group sees their work as very practical, even though it sounds very techie on the surface. They are dedicated to lowering the cost of ESI processing. Their current goal is to answer the question, “What if there was a way to probabilistically determine if any file was user-generated or not?” They believe that non-user-generated files could be automatically stripped out during pre-processing in most cases. With this in mind, they are working on creating the grandly named Probabilistic Hash Data Set (“PHDS”). Success would mean reducing the cost of processing and reviewing ESI by eliminating irrelevant files early in the process.
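One way to picture the idea, offered only as a rough sketch: the project has not described how PHDS would work, so the reference table of per-hash probabilities, the threshold, and every name below are my own hypothetical stand-ins, not the project's design.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hash file content; SHA-256 is an assumption made for illustration."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical reference table: hash -> estimated probability that a file
# with this hash was created by a user (rather than by an OS or application).
REFERENCE = {
    sha256_hex(b"operating system library"): 0.01,
    sha256_hex(b"application installer"): 0.05,
}

def keep_for_review(data: bytes, threshold: float = 0.5, default: float = 1.0) -> bool:
    """Keep a file when its estimated probability of being user-generated
    meets the threshold. Files with unknown hashes get the default
    probability, so they are kept rather than silently dropped."""
    probability = REFERENCE.get(sha256_hex(data), default)
    return probability >= threshold

# Toy collection standing in for files gathered from a custodian.
files = {
    "system.dll": b"operating system library",
    "setup.exe": b"application installer",
    "memo.doc": b"a memo written by a custodian",
}

kept = [name for name, data in files.items() if keep_for_review(data)]
print(kept)  # ['memo.doc']
```

Defaulting unknown files to "keep" is a deliberately conservative choice in this sketch: a file is stripped only when the reference data says it is very unlikely to be user-generated.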
Evergreen Project
The Evergreen Project is the group responsible for keeping EDRM’s original content, the information that corresponds to the Model (or Framework) itself, up to date. Evergreen is roaring out of the gate, having spent a year dedicated to completing and updating all EDRM Framework content.
Julie Brown must be lauded for her dedication to this work. She co-chaired the project and spent an enormous amount of time and energy shepherding the various nodes toward the goal of complete content. Julie is remaining with Evergreen, thank goodness, but is handing over her co-chair responsibilities to Therese Carey. I am staying on as the other co-chair, and Therese and I are very excited about the progress made last year and the plans for the next twelve months.
Evergreen plans to deliver the following enhancement to the website and its content by May 2010: a “Pack and Play” download for each phase of the EDRM framework, containing a standards document supported by tools such as checklists and templates, case studies, and an introductory, educational PowerPoint presentation. Evergreen is so fired up that you may see some of this material in podcasts, vcasts and live presentations.
Information Management Reference Model
The IMRM (Information Management Reference Model) project is still a new addition to EDRM and has already created useful content. They envision their work as an “entirely new reference model – separate counterpart to EDRM.” Look for their helpful graphic on their page: http://edrm.net/activities/projects/imrm
Model Code of Conduct
MCOC, led by Eric Mandel, is determined to have final content available for public comment by May 2011. It may seem like the code is taking a long time, but the project is tackling very complex issues and they don’t want to give short shrift to the various points of view and factors related to these thorny issues.
Public Relations
The PR Committee has broadened its mandate to include entertainment. They put together a sneak peek of a game show they are working on for LegalTech. They will be soliciting additional questions and answers from the ESI community.
---
Once again, St. Paul was lovely in the springtime, and the residents were welcoming and helpful. I managed to make it to the Science Museum of Minnesota for a very quick look at the Dead Sea Scrolls exhibit before the reception Tuesday evening. I doubt the work of EDRM will last 2,000 years, but at least we don’t have to produce TIFFs on papyrus.
