tag:blogger.com,1999:blog-83368292149703039182009-06-26T10:54:01.094-07:00E-Discovery In the TrenchesThis Blog is dedicated to the men & women working directly in the trenches on EDD projects - junior attorneys, paralegals, project managers, document reviewers, data processors, and staff consultants alike, who put in countless stressful (and often thankless) hours doing what seems to be the impossible.Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.comBlogger18125tag:blogger.com,1999:blog-8336829214970303918.post-55857914398948770432009-06-26T10:30:00.001-07:002009-06-26T10:53:52.157-07:00Estimating the Cost of Review (Simplified Version)<p class="zemanta-img" style="margin: 1em; float: right; display: block; width: 310px;"><a href="http://commons.wikipedia.org/wiki/Image:File-folders.jpg"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/3/34/File-folders.jpg/300px-File-folders.jpg" alt="Orange File Folders" style="border: medium none ; display: block;" width="300" height="300"></a><span class="zemanta-img-attribution">Image via <a href="http://commons.wikipedia.org/wiki/Image:File-folders.jpg">Wikipedia</a></span></p>Once you have processed your documents (extraction plus <a class="zem_slink" href="http://en.wikipedia.org/wiki/Data_deduplication" title="Data deduplication" rel="wikipedia">deduplication</a>) and search terms have been applied, you must take at least two things into account when estimating the cost of review:<br /><br /><span style="font-weight:bold;">(1) page count<br />(2) file size</span><br /><br />Most everyone asks for file count and I think this is a big mistake (unless you are performing a native file review where page counts are NOT available).<br /><br />Page counts will give you speed of review (i.e. number of contract reviewers multiplied by their collective review rate, say, 25000 pages per hour). 
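<br /><br />To put that arithmetic in concrete terms, here is a minimal sketch of turning a page count into review hours. The function name, the per-reviewer rate of 50 pages per hour, and the batch sizes are all made-up assumptions for illustration; plug in the numbers from your own project.

```python
def review_hours(total_pages, reviewers, pages_per_reviewer_hour):
    """Estimate wall-clock review hours from a page count.

    All inputs are hypothetical planning figures, not industry benchmarks.
    """
    # Collective review rate = number of reviewers x individual rate
    collective_rate = reviewers * pages_per_reviewer_hour
    return total_pages / collective_rate


# Two hypothetical batches for the same reviewer; the second has double the pages
batch_a = review_hours(total_pages=2000, reviewers=1, pages_per_reviewer_hour=50)
batch_b = review_hours(total_pages=4000, reviewers=1, pages_per_reviewer_hour=50)
print(batch_a, batch_b)
```

Doubling the page count doubles the estimate, and adding a second reviewer at the same rate halves it.<br /><br />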
File counts don't help here because assignments containing an equal number of files can have very different page counts. So 500 files in one assignment may go quickly for a reviewer, while another batch of 500 files for the SAME reviewer may take much longer if the page count is doubled.<br /><br />Page counts will also give you the cost of TIFF'ing and blowbacks (if this is the form of production or *gasp* review).<br /><br />File size will give you the cost of electronic production, since this is typically charged on a per-gigabyte basis.<br /><br />So when you are gathering statistics for estimating the cost of review, make sure you ask for (1) <span style="font-style:italic;">page count</span>, and (2) <span style="font-style:italic;">file size</span>.<br /><br />Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com0tag:blogger.com,1999:blog-8336829214970303918.post-11298648051549060302009-06-19T12:55:00.000-07:002009-06-19T12:56:08.512-07:00Damn, it feels good to be a Banker! [video]This is hilarious! FYI, I've been in consulting for 10 years. 
The banker-side of the rap is a bit hard to make out, so you'll have to listen to it twice (volume turned up even) to catch all the lyrics.<br /><br /><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/ROlDmux7Tk4&hl=en&fs=1&"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/ROlDmux7Tk4&hl=en&fs=1&" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8336829214970303918-1129864805154906030?l=www.jerrybui.com%2Fedd'/></div>Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com0tag:blogger.com,1999:blog-8336829214970303918.post-62225705706983035112009-06-13T13:17:00.000-07:002009-06-13T15:36:50.505-07:00E-Discovery is Low TechA lot of the work we do is low tech in nature. It's funny because the college graduates we hire have been raised in a Web 2.0 world and are shocked to find their time copying & counting files, tying out exceptions, recovering passwords, converting files from one format to another, etc. What's worse is that we'll hire experienced technologists who end up doing similar low grade work. The lucky ones get to run SQL queries. <span style="font-style:italic;">Whoopee! That's considered advanced.</span><br /><br />One soon realizes that work in our industry is low tech because it has to be--the volume of data that we process is gargantuan. We have to keep things simple so that we can easily detect mistakes. With all the progress that we've made in document analytics, these advanced techniques tend to fall by the wayside once something faulty is detected in the processing pipeline. Then everyone gets back to basics. 
Suddenly, our energy is focused on why a piece of metadata is missing; why our file counts are off; why multilingual characters aren't displaying. You only have to experience the tirade of a partner or senior attorney once to realize that data integrity is paramount. Everything else is fluff. Mistakes are bound to happen in our business; whether they're due to a software bug or human error doesn't matter. A single processing mistake can be reason enough to convince a review team to fall back to reviewing everything linearly, one document at a time.<br /><br />It makes me think of companies that make things like dandruff shampoo. There's very little room for innovation, but they're providing a product for the masses. I'm sure the chemical formula of dandruff shampoo can be pretty complex, but the expectations are simple. Try to invent a cherry flavored version of dandruff shampoo, for example, and you run the risk of introducing a widespread allergic reaction. Do that, and the trust is broken.<br /><br />Electronic Discovery is a shampoo industry. Meanwhile, the semantic web, social media, spatial GIS and other high tech trends are passing us by, all the while generating more and more data for us to handle and process.Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com0tag:blogger.com,1999:blog-8336829214970303918.post-84071977568026931612008-05-28T06:57:00.000-07:002008-06-08T11:08:52.044-07:00Project Managers, Practitioners, and ProfessionalsThere are three archetypes: project managers, practitioners, and professionals. A good project team will be staffed with all three. 
There's the gal who keeps the project on track, on budget, and within scope; the geek (with a faint, detectable glow of a halo around his head) who can deliver a soliloquy on the history of Bates numbers and can lecture at exhaustive length on recall and precision; and last (but not least) the partner who, between negotiating the big deals, instills ethical and professional behavior in the team. A bad team, mind you, can <em>still</em> be staffed with all three archetypes. The difference is that a good team knows that they need each and every one of these role players. They rely on a mixture of everyone's talents. A bad team will have individuals who have an overinflated view of their own contributions. They downplay the relative worth of everyone else's role on the project and have a heroic view of themselves. Whenever you're embarking on a new project, get to know the role players. If you notice bickering, infighting, or grandstanding, this is a huge red flag. This, even more so than the value of the technology that's being employed, will give you some indication of the project's ultimate chance for success.Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com1tag:blogger.com,1999:blog-8336829214970303918.post-64322033186981683122008-04-26T13:59:00.000-07:002008-04-26T17:16:59.817-07:00Recall and PrecisionThere's a great law.com <a href="http://www.law.com/jsp/legaltechnology/pubArticleLT.jsp?id=1208861019151">article</a> by H. Christopher Boehning and Daniel J. Toal that discusses traditional keyword and Boolean search methods versus new alternative methods. Though the authors don't mention it specifically, their article discusses the theory of "recall" and "precision". 
The ability to search a corpus of documents and bring back all of the relevant material in a result set is called "recall". The ability to reduce the number of false positives in a result set is called "precision". Therefore, if you craft an overly broad search you may increase your recall, but lower your precision. This scenario usually results in a larger number of false positive documents to sort through in your review. If you have very few false positives in your result set, it allows you to identify relevant documents one-after-another with fairly high frequency, but the snapshot of material may be a very thin slice of the overall relevant material (high precision, low recall). In other words, there may be a lot more juicy stuff out there to review. The trick is--and this is the holy grail of search--how do you corral all of the good stuff without having any bad stuff mixed in?<br /><br />It really depends on your review goals. The fallacy with most search efforts is a desire to only get low doc counts with the most relevant material possible. In this case, the emphasis for your review is on <em>precision</em> (maybe because cost is your primary driving constraint). If relevant material is rampant within the corpus, however, you will want to increase your <em>recall</em> in order to get at the full scope of your issue. You may tolerate a good number of false positives in order to be as thorough as possible (maybe completeness is your primary driving constraint). You'll want to decide quickly whether recall or precision is the ultimate goal of your review. Of course you'll want both, but after the review has started you'll want to shift your focus on one or the other depending on the incremental results of your review. You'll know quickly (after a day or two) if your review assignments are yielding the desired level of precision. 
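<br /><br />To make the two terms concrete, here is a minimal sketch of the standard ratios behind them. The hit counts are invented for illustration, and the total number of relevant documents in a corpus is an assumption here; on a real matter you rarely know it and have to estimate it.

```python
def precision(relevant_retrieved, total_retrieved):
    """Share of the result set that is actually relevant."""
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved, total_relevant):
    """Share of all relevant documents that the search brought back."""
    return relevant_retrieved / total_relevant


# Hypothetical corpus containing 1,000 relevant documents in total
TOTAL_RELEVANT = 1000

# Overly broad search: 10,000 hits, 800 of them relevant
broad = (precision(800, 10000), recall(800, TOTAL_RELEVANT))

# Narrow search: 500 hits, 400 of them relevant
narrow = (precision(400, 500), recall(400, TOTAL_RELEVANT))

print("broad  (precision, recall):", broad)
print("narrow (precision, recall):", narrow)
```

With these made-up numbers the broad search finds most of the relevant material but buries it in false positives, while the narrow search is mostly relevant hits but misses more than half of what's out there.<br /><br />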
In order to test your level of recall, you'll want to sample a population of the documents that were excluded from review (make sure it's statistically significant). Once you perform a QC review on this sample set, you'll know whether your search terms were sufficient in capturing enough relevant material. <br /><br />As you all know, the iterative nature of this work is commonplace in our business. Unless you have a real sense of the percentage of relevant material to begin with, there's absolutely no way of knowing whether your search results have achieved the highest level of recall and precision until you <em>roll up your sleeves and just dig into it.</em> If you're trusting the artificial intelligence of a system to do this "auto-magically" for you, either by concept grouping or "learning" or some other newfangled algorithm, then you are putting quite a bit of faith into the technology. Remember that most of this new technology is a carefully guarded trade secret belonging to the software vendor. In order to prove anything to the court, however, you have to be able to lift the hood and explain the goings-on underneath. The only defensible position that one can take these days, at least until there's a technology winner that is universally accepted by the court, is to present your search terms with hit counts and corresponding review calls. 
Keywords and Boolean searches are still the state of the art today.Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com1tag:blogger.com,1999:blog-8336829214970303918.post-37847651729826895592008-04-17T01:43:00.000-07:002008-04-17T10:00:35.869-07:00Only the Company Can Know ItselfIn the latest law.com article, <a href="http://www.law.com/jsp/legaltechnology/pubArticleLT.jsp?id=1208169988197">Keeping Your Firm's E-Discovery In-House</a>, Dale Buss recognizes that there's strong sentiment in the industry for "legal departments [to] establish as much as possible of the ESI-management function in-house as swiftly as they can [because] only the company over time truly can know itself". Robert Bjornsti, VP of AXA Equitable Life Insurance Co., echoed this sentiment earlier in the year at LegalTech NY when he delivered the day two keynote address on "Paradigm Shift -- Corporate Use of Legal Support Services". The argument here is that insourcing e-discovery work not only reduces cost, but is more effective. A corporation can fine-tune its response to a legal hold by tapping into the company's ERP system. Leveraging the HR metadata resident in enterprise databases gives you insight into a custodian's business function, the nature of the data that they keep, and the level of privileged and/or confidential information contained therein. "That way, when you get a discovery notice, the company can be very precise, not shotgun, about where the right data is." Performing this work behind the corporate firewall also enhances security and control. It allows corporations to reuse data for concurrent and pending matters within their litigation portfolio.<br /><br />This is no small undertaking. 
First of all, e-discovery software is mostly proprietary and is geared to reside at the technology vendor's hosting facility. A lot of these homegrown solutions were developed by the technology vendors themselves and were invented to serve as a secondary offering to their consulting services. The software platform was never designed for general, off-the-shelf deployment within a company's network. Secondly, IT departments aren't equipped to deal with the high stakes nature of e-discovery work; and the personnel aren't suited at all to deal with attorneys and attorney requests. I used to be an IT guy and I can tell you that we are bred with a troubleshooting mindset. Everything is up for experimentation and subject to trial and error (we deal primarily with Microsoft tools, after all). This approach simply doesn't work in litigation. If the pendulum truly is swinging back from outsourcing to insourcing, it could come crashing in through corporate walls creating more damage than originally anticipated. For the enterprise that is litigation savvy and has a penchant for detail, it may very well be worth the effort. The corporation must understand that the effort will require an entirely new business function -- not supplanting the IT department, but working hand-in-hand with it. New (and very large) budgets will need to be allocated for hardware and people. 
Planning for an in-house staff of e-discovery professionals and a handful of reliable, independent consultants will go a long way in easing the transition.Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com0tag:blogger.com,1999:blog-8336829214970303918.post-55926769746006298292008-04-15T06:57:00.000-07:002008-04-15T07:22:16.203-07:00Trend towards the ProactiveMany in our industry have predicted a trend towards more proactive e-discovery solutions, and I tend to agree. In its simplest form, this argument means reducing the volume of data and overall costs. Whether this is accomplished through "early case analysis" or better software, the distinguishing feature is where & when one decides to pare the corpus of data for a particular matter. If you identify the priority custodians and send all of their material <em>en masse</em> to a vendor, you are taking the traditional route and being <em>reactive</em>. If, however, you can pare the material by priority custodian, date range, and keywords onsite, behind the firewall at the corporation, you are definitely being more proactive than most. Now, we all know keywords have limited effectiveness for identifying relevant material, but that's a topic for a whole other discussion. 
The point is, keyword search terms are still very commonly utilized in litigation matters and if you can filter the data ahead of time and send only the resultant material to your vendor, it will reduce your overall cost significantly.<br /><br />Most attorneys will argue that it is within the client's interest to keep all the data in one location--typically at the technology vendor's data center; so in the event that keyword search terms change (which they will) or the priority custodian list changes (which it will), it will save time to make these changes on-the-fly in one unified location rather than in a piecemeal fashion, once at the corporation and once again at the vendor after more data has been shipped.<br /><br />For my next blog entry, I will talk about the latest school of thought: let's keep all the data at the corporation and NEVER send it to a technology vendor!!<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8336829214970303918-5592676974600629829?l=www.jerrybui.com%2Fedd'/></div>Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com1tag:blogger.com,1999:blog-8336829214970303918.post-12844477071305884572008-01-15T21:26:00.000-08:002008-01-16T09:39:14.265-08:00The Offline ReviewEvery so often, there's an unavoidable need to export documents out of your review platform for "offline review". This can mean something as simple as printing documents out for an attorney to provide handwritten comments; or it can mean something more complicated like exporting documents to an offline format because your system's native viewer can't render documents containing illegible text, password protection, or foreign language content. <br /><br />Make sure this is a necessity. Tracking these documents later can create a huge reconciliation headache.<br /><br />Ensure that everything has been tried <em>within</em> the system to fix your problematic docs. 
If TIFF-on-demand, or installing language packs, or password recovery measures don't fix your documents, then tread carefully with "offline review". Remember these challenges: <br /><br /><blockquote><em>How do you summarize the review markings and production status of these offline documents in your standard status reports? <br /><br />How do you maintain an audit trail for the way these documents change over time during the course of the offline review?<br /><br />If you or someone on your team backfills markings, annotations, and redactions into your online system on the reviewers' behalf, know that YOU will be recorded as the reviewer for that subset of documents. How does this affect the accuracy of your reviewer progress reports?</em></blockquote><br /><br />You'll discover that your pretty online reports are riddled with asterisks and footnotes, referencing ugly, confusing spreadsheets that contain specific stats for your "offline review". Also, unless you are extremely meticulous with offline tracking, your ability to confidently explain the status of your review quickly diminishes once you head down the path of "offline review".<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8336829214970303918-1284447707130588457?l=www.jerrybui.com%2Fedd'/></div>Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com0tag:blogger.com,1999:blog-8336829214970303918.post-68214340226526630802007-12-29T09:31:00.000-08:002008-01-13T23:17:19.677-08:00The Media LogIt's also referred to as a tracking spreadsheet or delivery manifest, but the "media log" is one of the most important pieces of paper in your Chain of Custody. If the sending party doesn't provide a media log to accompany a piece of delivered data, don't process the data! If they push back and say, "Can you guys just fill out the log based on what's on the DVD?", don't do it. You have no way of knowing what's on the disc. 
There have been many, many instances where the sending party forgot something that they "intended" to send. There's also the case where the sending party accidentally copied material to the media that was collected for another matter altogether. The log is a tool to confirm the nature and validity of the contents contained therein. Months can go by and a question could eventually arise, "Didn't you process Custodian ABC hard drive data in batch XYZ? It was supposed to be on the DVD that we sent you", or more egregiously, "Why am I seeing Custodian ABC data in the repository? He has absolutely nothing to do with this case!" The media log allows you to address discrepancies immediately. <br /><br />You can process the material that is sent "as-is" as long as you accept all assumptions that go along with it. You can analyze the contents beforehand and can report any anomalies, but there's no way of confirming for sure the accuracy or thoroughness of the delivery. In other words:<br /><br />Accuracy: Are all the custodian sources there?<br />Thoroughness: Was all the data copied? If you see an empty source directory for Custodian ABC's network share, should you be concerned?<br /><br />I know we've been in the business for a long time, so these risks are semi-obvious, but it doesn't hurt to reiterate at the outset of a new project. The workflow and standards that you enforce beforehand really set the stage for a successful engagement.Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com0tag:blogger.com,1999:blog-8336829214970303918.post-82409345776981067062007-12-11T01:06:00.000-08:002007-12-11T01:17:30.556-08:00Repopulating DupesAre you required to repopulate duplicate documents in your production? Be aware that you may be left with unmarked documents. 
Not every tool propagates reviewer markings to repopulated duplicate documents. Not only that, repopulation may not even work unless the "duplicate owner" custodian is included as part of the production.<br /><br />While deduplication may seem like a great way for you to reduce the number of documents that you ultimately have to review, understand how your decision could create downstream headaches on the production end of the pipeline. You <em>may</em> need to post-process the production before you ship it out the door.Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com0tag:blogger.com,1999:blog-8336829214970303918.post-79963742528187329492007-11-16T09:40:00.000-08:002007-11-16T10:18:00.732-08:00Database Mitosis"Uh, your database is too big. We need to split it."<br /><br />Your organization may have bought into an evidence and discovery management tool because it was advertised as the biggest and fastest database on Earth. Sure, the backend database to these applications (SQL Server, Oracle, etc) may have benchmark statistics <em>proving</em> such claims, but what are the practical limits of the software package itself? A severe limitation is the software code that's overlaid on top of the database technology. As a past programmer, I know full well that code can be written efficiently or inefficiently. You can have programming code that cuts to the chase (1+1=2), or code that takes unnecessary leaps of logic (((2 x -10) + 40) / 10 = 2). Strip away the pretty interface and expedience makes all the difference. Let's face it, there are basics that we all need in a tool. If an extra bell or an extra whistle slows things down for you and your review team, it may not be worth the purchase. 
<br /><br />Ask a potential vendor the following questions:<br /><br /><strong>1) How much data can we host in your software package? What's the practical upper limit given <u>x</u> number of users?<br />2) Should we consider multiple databases from the outset, based on the anticipated volume of data?<br />3) What effect do multiple databases have on deduplication? multi-user access? consolidated reporting?</strong><br /><br />Ask these questions from the get-go, and you won't be confronted with splitting a database, <em>after-the-fact</em>, once the performance of your nifty software package slows to an agonizing crawl. You may experience weeks of downtime before you're up and running again with multi-database constraints that you weren't even close to understanding beforehand.Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com2tag:blogger.com,1999:blog-8336829214970303918.post-89815237527700341802007-11-08T10:34:00.000-08:002007-11-12T09:48:17.006-08:00Waivering, To and FroCrafting a list of keywords that will retrieve a maximum number of responsive documents on your matter requires planning and knowledge. Skilled practitioners in our field understand that it requires interviews with relevant custodians (to understand organizational lingo), and a firm understanding of the specific search technology that's employed. We also know that this methodology shouldn't only apply to the keywords <em>within</em> a document, but also to the TO and FROM fields in email metadata. Almost everyone has, at minimum, two email accounts - one for work and one for personal communication. Some of us have more and I've seen as many as twelve corporate email addresses for the same person at an organization. 
For example, "customersupport@xyz.com", "marketing@xyz.com", "helpdesk@xyz.com", "accounting@xyz.com", etc. While e-discovery typically targets work and personal email, this will certainly grow once other types of "e-communication" accounts are brought into the fold, such as Instant Messaging and cellular text messaging accounts. <br /><br />If you are required to search email communication by one or more individuals and the available custodian information won't suffice, you will need to capture all variations in the TO and FROM fields (and possibly the CC and BCC fields). The format of these fields can vary widely by including just the email address (jbui@xyz.com), the display name (Jerry Bui), or some combination of the two. You might also observe some other formatting wildness, such as the following:<br /><br /><blockquote>CCMAIL: Jerry T Bui at XYZ_US<br />MS: XYZ/US/JTBUI<br />X400:c=US;a=CONCERT;p=XYZ;s=Bui;g=Jerry;i=T;</blockquote><br />If you're looking at personal email accounts, then all bets are off. These tend to look like any of the following:<br /><br /><blockquote>prettyflower_1963@yahoo.com<br />ifixmustangs@gmail.com<br />jb74_forensicexpert@msn.com</blockquote><br />In this scenario, searching the TO and FROM fields for elements of the person's name just won't work. Keep in mind, too, that individuals can change their DISPLAY NAME alias numerous times over the course of owning an email account. Realize that you will need to tease this information out during custodian interviews and you will also need to sample the material yourself; <strong>look at the email headers and note the variations</strong>. You will want to include all variations of a person's name, email address, and display name alias as part of your search term list. 
Otherwise, any misunderstanding of what's included in the TO and FROM fields could cause you to overlook relevant communication.Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com1tag:blogger.com,1999:blog-8336829214970303918.post-39374040703617816122007-05-11T23:08:00.000-07:002008-01-13T23:20:13.120-08:00Beware of Going NativeNo, going native doesn't mean stripping naked and running down the hall at your law office. ALM printed an article called <a href="http://www.law.com/jsp/legaltechnology/pubArticleLT.jsp?id=1178183080005">Discovery Savings: Going Native</a>. I honestly don't think Native will save you all that much time or money. In fact, reviewing in TIFF may be faster and is really the only choice if you're adding redactions and/or other endorsements (your vendor may need a couple of days lead time to start the TIFF conversion process before you start your review, however). How long do you have to wait for a 1,000 page Excel file to download using a native viewer compared to a TIFF rendition that loads quickly, one page at a time? What about corrupt files? Corrupt files can bring a native viewer crashing to its knees, whereas TIFF processing will often provide a placeholder for these types of files indicating that the original file was inaccessible by normal means. You will save significant time in troubleshooting hours alone if TIFF conversion can help separate the wheat from the chaff when it comes to the quality of your documents. If you are reviewing in Native to cull your documents, that may seem wise initially, but if your review tool can't convert that document on the fly to TIFF, that means a stoppage to your workflow so that you can export your work product to a system that does. 
You may ultimately need a tool that supports TIFF conversion for redactions, endorsements, and Bates stamping, or you may want to produce relevant documents in TIFF, per agreements that you and opposing counsel made during pre-trial conference (Rule 34 of the Revised FRCP). Either way, merging your issue codes and document tags to a disparate system just because your original tool didn't support a specific piece of functionality is fraught with inconsistency and risk. <br /><br />The real question is: can you do both? Are you utilizing a tool that will offer the time benefits of reviewing in Native and switching over to TIFF-on-demand when needed? No reputable vendor in this day and age is charging you for non-responsive TIFFs. TIFF-on-demand is the norm, and it's best to find a tool that has integrated native review (should you need it).Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com3tag:blogger.com,1999:blog-8336829214970303918.post-15613720801102313312007-04-23T09:18:00.000-07:002007-05-09T20:36:47.532-07:00Meta-FourFor all practical intents and purposes, there are four <em>major</em> types of Metadata that we're concerned with in the review, analysis & production phases of the E-Discovery lifecycle:<br /><br />(1) Document Metadata<br />(2) Container Metadata<br />(3) Tagging Metadata<br />(4) Workflow Metadata<br /><br /><strong>Document Metadata</strong> - This is the traditional stuff that you're accustomed to when trying to ascertain the Author, Create Date, Modified Date, Last Printed Date, etc. This is also referred to as <em>embedded metadata</em>.<br /><br /><strong>Container Metadata</strong> - Vendors should be populating this metadata type with custodian & source information, as well as culling parameters if applicable. 
Ideally, there should be <u>several</u> <u>fields</u> allocated to cover the breadth of container information so that a linking system can reflect how specific batches of data were extracted and processed. <em>Sidenote: The Socha-Gelbmann team has launched an industry-wide <a href="http://edrm.net/edrm_xml.php">XML initiative</a> to standardize all the data fields in party-to-party transmittals.</em><br /><br /><strong>Tagging Metadata</strong> - All the relevance calls, issue codes, redaction reasons, and privilege reasons comprise this category of metadata.<br /><br /><strong>Workflow Metadata</strong> - This oft-overlooked set of tags helps organize the workflow steps in your review platform. "First Tier Review Complete", "Second Tier Review Complete", "Needs Further Discussion" and other similar tags control how work is distributed for review amongst your team members. There should also be a tag that determines the ultimate Production status of a document after it has traversed all the various tiers of control. A lot of this gets lumped in as "Issue Codes", but it can be more accurately described as "workflow metadata".<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8336829214970303918-1561372080110231331?l=www.jerrybui.com%2Fedd'/></div>Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com3tag:blogger.com,1999:blog-8336829214970303918.post-64015982385075438602007-04-21T17:35:00.000-07:002007-04-23T07:56:41.176-07:00Chain of FoolsWith so many parties involved in the E-Discovery process these days, can anyone claim to have a clear & precise picture of your project's chain-of-custody? Your case is likely to have a different vendor for Evidence Collection, Processing (Culling and Deduping), Text & TIFF, and Review & Production. 
If pressed, can your vendor tell you where/how a specific produced file was collected and what treatment it received along the entire chain of evidence? I think they would be hard pressed to answer this without significant research. They would need to call every single vendor that was involved in the process, and it's likely that a painstaking analysis of all the various logs still won't yield a conclusive answer as to where, when, and how the <em>file in question</em> was derived. The truth of the matter is that this <strong>container metadata</strong> is often dropped on the floor as it is handed off between vendors.<br /><br />As the case manager, it is <u>your</u> <u>job</u> to enforce the integrity of the chain-of-custody. Ensure that all logs are transcribed accurately with source information and that all the culling parameters are captured (search expressions, date ranges, and deduplication fields). Ideally, the vendor will have the ability to store this in a field (as container metadata) in the appropriate <em>load file</em>. The recipient vendor should be made aware of these fields and should be instructed to preserve them in their subsequent output file. In the end, the review & production platform should be configured so that these fields are exposed to you and your end users.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8336829214970303918-6401598238507543860?l=www.jerrybui.com%2Fedd'/></div>Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com0tag:blogger.com,1999:blog-8336829214970303918.post-71582382425905464852007-04-10T11:39:00.000-07:002007-04-23T22:28:21.158-07:00I can review faster using PaperSmugly, the lead attorney on the matter walks in, adjusts the knot on a tie that costs more than my suit and proclaims, "Just print it out for me. I can review faster on Paper". <em>Translation: you silly kids use that computer thing.</em> <br /><br />Whoa! 
Where do we begin to correct this attitude? Well, I say stand your ground and drop a little knowledge. It's probably true that this type of attorney can review faster on paper, but what about the net effect on the review workflow as a whole? When you're working with an automated system and reviewers' issue codes, redactions, and notes are stored in a central database, how do you reflect an attorney's handwritten hardcopy notes <em>back into that system</em>? Someone has to enter everything in on his behalf, right--creating twice the work? If the offending attorney entered it in himself in the first place, wouldn't that be more efficient?<br /><br />Well, here are the common objections from Mr. Big:<br /><br />1) "My billing rate is higher than yours, so let's do what's more convenient for me and we'll have you re-enter it at your billing rate (slightly restrained giggle). It will be a net savings for the client because I won't have to waste a lot of time with that confusing software."<br />2) "With all the more important things I do during the day, I review this material in bed at night. A laptop is too cumbersome, even in a California King-Size. Plus, the laptop screen gets in the way when I want to watch VH-1's 'Flava of Love' on TiVo. <em>Flaaavah Flaaaav!</em> Ahem, but I digress."<br /><br />These are your rebuttals:<br /><br />1) Yes, there's a learning curve but please try. The discovery phase should take the next 6 months and the sooner you get the hang of the software, the more savings we can realize in money <em>and time</em>.<br />2) There isn't really a good comeback for this one, but just explain that working with hardcopy paper is prone to error: illegible handwriting, misinterpretation during the transcription process, etc. 
Let's work within the constraints of the system and limit our potential defects.<br /><br />Write me back and let me know if this works for you guys :)<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8336829214970303918-7158238242590546485?l=www.jerrybui.com%2Fedd'/></div>Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com3tag:blogger.com,1999:blog-8336829214970303918.post-55343855871780289212007-04-09T16:12:00.000-07:002007-04-21T19:47:25.334-07:00I hate my Project Manager!The legal game is time & quality driven. The stakes are high and mistakes can lead to exorbitant penalties and/or sanctions. At the end of the day, there are a handful of individuals directly responsible for the expediency and quality of the productions: the <em>project manager</em> on the vendor side, and the <em>review coordinator</em> at the law firm. While the buck ultimately stops at the Partner on both sides, it is your head and mine on the chopping block. We are the ones overseeing operations on a day-to-day basis and are the ones who have been hired to prevent defects during the review and production process. Let's face the facts. <em>Our seniors and staff associates hate us</em>. We ask them to work after hours, weekends, & holidays. During the most intense periods we deny them any semblance of a work/life balance. You can see the inexorable stamp of disappointment on their faces, "I didn't sign up for this".<br /><br />The only way I know to lessen the pain is to fill your seniors and staff in on the big picture. Let them know how their role facilitates the task at large and elicit their feedback on how things can be done better or more efficiently. The old "I did the <em>shit work</em>, too, and now it's your turn" pitch just doesn't work. 
It will only build up resentment and, if they don't quit, only perpetuate the same type of treatment with new seniors and associates as they move up the ladder.<br /><br />Buy them lunch & dinner and let them know their work is appreciated. Communicate the calibre of projects that they are working on and help them understand how getting their hands dirty gives them the first-hand knowledge necessary to move on to the next level in their career path.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8336829214970303918-5534385587178028921?l=www.jerrybui.com%2Fedd'/></div>Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com0tag:blogger.com,1999:blog-8336829214970303918.post-56784645476287674502007-04-08T19:47:00.000-07:002007-04-24T19:00:52.016-07:00This is just the beginning. We are going to be buried in data.Experts estimate that more than $2.4 billion will be spent by litigators in 2007 on electronic discovery services (reference <a href="http://www.law.com/jsp/legaltechnology/pubArticleLT.jsp?id=1174307786701">here</a>). While most of this business involves the indexing and presentation of email and electronic office files, we are going to be expected in the near future to work with software that handles foreign languages, audio, video, cell phone text messages, instant messages and, yes, even blog data.<br /><br />What does that mean for us in the trenches? Video files, for one, are exponentially larger than text-based documents like emails and email attachments. Processing, storage, and presentation requirements are going to need to grow accordingly. Vendors are already housing <a href="http://en.wikipedia.org/wiki/Terabyte">terabytes</a>' worth of data as it is. 
It means that as more and more information becomes discoverable--basically all Electronically Stored Information (ESI)--the more data we are expected to usher through the pipeline to bring our projects to successful completion. Be prepared and get a shovel. We are about to be buried.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8336829214970303918-5678464547628767450?l=www.jerrybui.com%2Fedd'/></div>Jerry Buihttp://www.blogger.com/profile/09416954444365885384noreply@blogger.com0