Why Does Production Have to be Such a Big Production?, Part Three
Editor’s Note: Tom O’Connor is a nationally known consultant, speaker, and writer in the field of computerized litigation support systems. He has also been a great addition to our webinar program, participating with me on several recent webinars. Tom has also written several terrific informational overview series for CloudNine, including his most recent one, Understanding Blockchain and its Impact on Legal Technology, which we covered as part of a webcast on March 27. Now, Tom has written another terrific overview regarding production challenges and what to do about them titled Why Does Production Have to be Such a Big Production? that we’re happy to share on the eDiscovery Daily blog. Enjoy! – Doug
Load File Failures
Problems with productions have plagued us for years, and none are more prevalent than load file errors. I recall a consultant in Seattle nearly 20 years ago who spent two-thirds of his time cleaning up Summation load files for clients. And the problems haven’t decreased as technology has improved.
Shawn Huston of LSP Data Solutions (www.lspdata.com) recently told me that two-thirds of the load files he sees in productions have errors. Why? Remember my previous comment about communication? Shawn says that:
One of the biggest issues I see is parties agreeing to production specifications without understanding what they are agreeing to. A classic example is the more technologically sophisticated party requesting tiff, text and load files as a production format and the other party agreeing without realizing what that means and the process necessary to do it correctly.
We also frequently see productions that don’t have the corresponding metadata fields to aid in filtering and searching the production sets, but then counsel becomes frustrated when they can’t accurately search for dates, recipients, file names or other useful metadata fields.
So, what seems to be the problem? Well, once again, let’s turn to eDiscovery Grand Master Craig Ball for an explanation. In his wonderful 2013 article, A Load File Off My Mind, which is as relevant today as it was then, Craig explains that:
More commonly, load files adhere to formats compatible with the Concordance and Summation review tools. Concordance load files typically use the file extension DAT and the þ¶þ characters as delimiters, e.g.:
[Figure: Concordance Load File]
Just as placing data in the wrong row or column of a table renders the table unreliable and potentially unusable, errors in load files render the load file unreliable, and any database it populates is potentially unusable. Just a single absent, misplaced or malformed delimiter can result in numerous data fields being incorrectly populated. Load files have always been an irritant and a hazard; but, the upside was they supplied a measure of searchability to unsearchable paper documents.
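To make the delimiter point concrete, here is a minimal Python sketch of how one missing delimiter shifts the fields in a record. It assumes the common Concordance defaults: þ (character 254) wrapped around every value and character 20, usually displayed as ¶, between fields. The function names, field names and sample values are hypothetical and are not taken from Craig's article or from any review tool.

```python
FIELD_SEP = "\x14"  # ASCII 20, usually displayed as a pilcrow
QUOTE = "\xfe"      # the thorn character that wraps every field value


def build_record(values):
    """Join values into one Concordance-style line (used here for sample data)."""
    return QUOTE + (QUOTE + FIELD_SEP + QUOTE).join(values) + QUOTE


def parse_dat_line(line):
    """Split one Concordance-style record into its field values."""
    line = line.rstrip("\r\n")
    if len(line) < 2 or not (line.startswith(QUOTE) and line.endswith(QUOTE)):
        raise ValueError("record is not wrapped in quote characters")
    # Fields are joined by quote + separator + quote.
    return line[1:-1].split(QUOTE + FIELD_SEP + QUOTE)


def check_records(lines):
    """Return (line_number, problem) for records that do not match the header."""
    header = parse_dat_line(lines[0])
    problems = []
    for number, line in enumerate(lines[1:], start=2):
        try:
            fields = parse_dat_line(line)
        except ValueError as exc:
            problems.append((number, str(exc)))
            continue
        if len(fields) != len(header):
            problems.append(
                (number, f"expected {len(header)} fields, found {len(fields)}")
            )
    return problems


# Hypothetical sample: the third line has lost one separator character.
header = build_record(["BEGDOC", "ENDDOC", "CUSTODIAN"])
good = build_record(["ABC0001", "ABC0002", "Jones"])
bad = good.replace(FIELD_SEP, "", 1)
print(check_records([header, good, bad]))
```

A check like this only counts fields. It will not notice a value that landed in the wrong column when the total still happens to match the header.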
What are some common load file errors?
Mismatched line numbers: Each line in a load file corresponds to a single document. Thus, the number of lines in a load file must match the number of documents being imported. If they do not match, a common cause is an extra line break in the load file.
Field formatting errors: Mismatched date formats (1/1/19 vs. Jan 1 2019) and field length problems, where a field in the database structure is, say, only 6 characters long but the data being loaded is longer than that.
Delimiter errors: Commas and semicolons are commonly used delimiters, but if a comma appears in the text being loaded, say “Apple, Inc”, it may be interpreted as a delimiter in the wrong place. Pipes (the vertical bar character, |) are an excellent example of a once-common delimiter that some SQL and .NET databases can read as an instruction rather than a separator.
Encoding: Some programs expect text in a particular character encoding. Many older databases, for example, expected either ASCII or one of the Unicode encodings (UTF-1, UTF-7, UTF-8, UTF-EBCDIC, UTF-16, UTF-32). Importing data whose encoding does not match the one your database expects may lead to problems.
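Several of the errors above can be caught before a load is attempted. The following is a rough Python sketch, assuming a comma-delimited load file with a header row; the field names, the 10-character limit and the list of accepted date formats are all hypothetical and would come from the agreed production specification in practice. It uses only the standard library.

```python
import csv
import io
from datetime import datetime

# Assumed set of date formats; extend it to whatever the parties agreed on.
DATE_FORMATS = ("%m/%d/%y", "%m/%d/%Y", "%b %d %Y", "%Y-%m-%d")


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def decode_strict(raw, encoding="utf-8"):
    """Decode bytes, returning None rather than silently garbling bad input."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


def validate(text, max_lengths, date_fields):
    """Return (record_number, problem) pairs for a comma-delimited load file."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for number, row in enumerate(reader, start=1):
        # DictReader files surplus values under the key None and fills
        # missing ones with None, so either one signals a delimiter problem.
        if None in row or None in row.values():
            problems.append((number, "wrong number of fields"))
            continue
        for name, limit in max_lengths.items():
            if len(row[name]) > limit:
                problems.append((number, f"{name} exceeds {limit} characters"))
        for name in date_fields:
            if normalize_date(row[name]) is None:
                problems.append((number, f"{name} is not a recognized date"))
    return problems


# Hypothetical sample: record 2 has an unquoted comma, record 3 has a
# value that is too long and a date that cannot be parsed.
SAMPLE = (
    "DOCID,COMPANY,DOCDATE\n"
    'A1,"Apple, Inc",1/1/19\n'
    "A2,Apple, Inc,Jan 1 2019\n"
    "A3,Microsoft Corporation,13/45/19\n"
)
for problem in validate(SAMPLE, {"COMPANY": 10}, ["DOCDATE"]):
    print(problem)
```

Note that the quoted “Apple, Inc” in the first record loads cleanly, while the unquoted one in the second record is flagged because its comma is read as an extra delimiter. The decode_strict helper is a separate check for the raw file bytes: if it returns None for the encoding you expected, the file was probably exported in a different one.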
Other load file problems that may occur include:
- Overlaps with document or Bates numbers: Documents that come from different sources in a case may have duplicate Bates numbers, or ranges in which some portion of the sequence overlaps.
- Page number difference: The number of pages in the load file may differ from the actual page count of the document images themselves, typically because of single-page vs. multi-page image discrepancies.
- Uploader at incorrect stage: An error message indicating that the loading process is out of sync, usually when the screen display shows one step of the upload but the uploader has actually moved on to the next stage.
- Timeouts on reading data error: The upload has stopped, either because of an internal issue or an interruption in internet connection.
- Encountered non-separator: Typically means that a typo in the load file has stopped the load.
- Multiple native files: More than one file in the native path has the same name as a document, often a native file and an image file with the same name.
- Conflicts with a previously loaded image: The load file is pointing to multiple images for the same document page and the conflict must be resolved.
- Error with image reader: Usually means that the uploader could not read the image file.
- Error finding load file or directory: Most often occurs when the user is trying to upload from a network but the upload tool is either defaulting to a local drive or the user doesn’t have rights to the network.
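Of the problems in this list, overlapping Bates numbers are the easiest to check for in advance. Here is a minimal Python sketch, assuming each Bates number is a prefix followed by a run of digits and that each production is described by a beginning and ending number; the production names, the numbering scheme and the function names are hypothetical.

```python
import re

# Shortest possible prefix followed by the trailing run of digits.
BATES_PATTERN = re.compile(r"^(.*?)(\d+)$")


def split_bates(value):
    """Split a Bates number such as ABC0000090 into ("ABC", 90)."""
    match = BATES_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"not a recognizable Bates number: {value!r}")
    return match.group(1), int(match.group(2))


def find_overlaps(ranges):
    """ranges holds (source, begin, end); return pairs of overlapping sources."""
    parsed = []
    for source, begin, end in ranges:
        prefix, first = split_bates(begin)
        end_prefix, last = split_bates(end)
        if prefix != end_prefix or last < first:
            raise ValueError(f"inconsistent range for {source}: {begin} to {end}")
        parsed.append((prefix, first, last, source))

    overlaps = []
    furthest = None  # (prefix, highest end number so far, source of that range)
    for prefix, first, last, source in sorted(parsed):
        # A range overlaps an earlier one if it starts at or before the
        # highest end number already seen for the same prefix.
        if furthest and furthest[0] == prefix and first <= furthest[1]:
            overlaps.append((furthest[2], source))
        if furthest is None or furthest[0] != prefix or last > furthest[1]:
            furthest = (prefix, last, source)
    return overlaps


# Hypothetical productions: A and B share the numbers 90 through 100.
productions = [
    ("Production A", "ABC0000001", "ABC0000100"),
    ("Production B", "ABC0000090", "ABC0000150"),
    ("Production C", "XYZ0000001", "XYZ0000010"),
]
print(find_overlaps(productions))
```

Each overlapping range is reported once, paired with the earlier range that reaches furthest, which is enough to show that a conflict exists and where to start looking.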
We’ll publish Part 4 – Recommendations for Minimizing Production Mistakes – next Monday.
So, what do you think? Have you experienced problems with document productions in eDiscovery? As always, please share any comments you might have or if you’d like to know more about a particular topic.
Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.
Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.