Collection is NOT Preservation
Even relatively experienced legal professionals tend to conflate preservation and collection. While collecting data as a way to preserve it would certainly satisfy the court's intention, it is a very costly and inefficient way to do so. Think of preservation in terms of ensuring potentially relevant data isn't deleted. Courts don't prescribe a particular method for preservation; they just require that it gets done. Collection, on the other hand, is the first tangible step toward producing documents to the other side. While certainly not all collected documents will ultimately get produced, the idea is that collection feeds into the review process, which in turn dictates the production set. Put simply, preservation is process-based, while collection is much more action-based (and usually far more technical).
Types of ESI That Must Be Collected
Virtually every form of electronic data is up for grabs in e-discovery. And while it's one thing to identify and preserve various forms of ESI, it's often quite another to actually go out and collect it all. Different data sources have different levels of accessibility and present different collection challenges. Here is a breakdown of five common categories of ESI that might need to be collected for e-discovery:
Active data is data that you interact with on a regular basis, such as email and other traditional files stored on a local hard drive or network drive. This ESI tends to be fairly easy to access and collect.
Cloud / Mobile
By far the fastest-growing category of ESI, this is data that is created and stored on cloud servers (e.g. cloud-based applications, cloud storage, social media, etc.) or mobile devices, outside the scope of corporate networks or formal IT oversight. Cloud providers have differing policies and processes with respect to accessing data, and it's helpful to familiarize yourself with those details before you actually need to collect the data. Meanwhile, collecting from mobile devices will usually require sophisticated tools and potentially outside experts. To gain a better understanding of the precipitous rise of mobile device ESI, check out our infographic, "The Value of Mobile Data in E-Discovery."
Offline data is data that is no longer in active use but is stored or archived. Even though it can't be accessed over a shared server, collecting it usually presents fairly minimal challenges as long as you know the physical location of the data and the system on which it's stored.
Traditional backup tapes or disaster recovery systems are designed to store data in the event that it must be restored. These systems compress files and are not easily searchable or accessible, so they tend to present significant collection hurdles.
Hidden files are previously deleted or fragmented files that exist on various systems and are usually not readily visible to regular system users. These files are highly inaccessible, and attempting to recover them requires specialized tools. More on hidden files below in our section on forensic imaging.
You can't address collection in e-discovery without talking metadata. You'll come across different definitions for metadata (like all things e-discovery, it seems one definitive explanation for a concept, process, or activity is never enough). Our favorite definition is that metadata is the data about the data. Let us explain…
When you look at a regular document on your computer, you see the words in the document, of course, along with the name of the file and where it's located. Behind the visible information is a whole host of other information about the document, such as when it was created, when it was last modified, who made edits to it, etc. In a regular business sense, this information is pretty useless. Why would you need to know that a colleague edited a document three months ago? In the land of e-discovery, this contextual information can be hugely important and has to be included when the document is ultimately collected.
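To make "data about the data" a little more concrete, here is a minimal Python sketch that reads the metadata a file system keeps for a single file. The function name `file_metadata` and the field names are our own illustrative choices, not part of any e-discovery tool. It only covers file-system metadata; information embedded inside a document (such as its author or edit history) lives in the file format itself and would need format-specific tooling.

```python
from datetime import datetime, timezone
from pathlib import Path


def file_metadata(path):
    """Return basic file-system metadata for one file as a plain dict.

    Hypothetical helper for illustration only.
    """
    p = Path(path)
    st = p.stat()
    return {
        "name": p.name,
        "location": str(p.resolve().parent),
        "size_bytes": st.st_size,
        # Last time the file's contents were changed.
        "modified_utc": datetime.fromtimestamp(
            st.st_mtime, tz=timezone.utc
        ).isoformat(),
        # Last time the file was read; some systems update this lazily
        # or not at all, so treat it with caution.
        "accessed_utc": datetime.fromtimestamp(
            st.st_atime, tz=timezone.utc
        ).isoformat(),
    }
```

Note that simply opening or copying a file carelessly can change some of these values, which is one reason collection tools are expected to capture metadata at the moment of collection.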
Forensic Image vs. Logical Copy
If you tell an IT professional that you need to collect data from a computer hard drive, chances are you are going to be presented with a forensic image of the drive (also known as a "bit by bit" or bit stream copy). At the most basic level, a forensic image is a complete copy of a drive, including the portions of the drive that aren't allocated to active files (known as unallocated space) and the leftover room inside storage blocks that active files only partly fill (known as slack space). It is what would normally be considered an exact duplicate. These types of images give you both the files you'd expect to see if you were browsing a file listing and data from previously deleted files. Forensic imaging requires specific tools and is usually administered by an expert.
Alternatively, a logical copy is simply a copy of the contents of the directories on a disk and does not include previously deleted data or other information that a forensic image would capture. Logical copies are also much less technically intensive and can be performed by just about anyone with a little training and the right software.
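As a rough illustration of what a logical copy involves, the Python sketch below walks a folder, copies each active file, and records a SHA-256 hash so the copy can be compared against the original. Hashing each file is a common integrity check, but it is our assumption here rather than something any court prescribes, and the names `logical_copy` and `sha256_of` are hypothetical. The sketch assumes the destination folder is not inside the source folder.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def logical_copy(source_dir, dest_dir):
    """Copy every file under source_dir and return {relative_path: sha256}."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    manifest = {}
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue  # directories are recreated as needed below
        rel = src.relative_to(source_dir)
        dst = dest_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        # copy2 copies the contents and also tries to preserve timestamps.
        shutil.copy2(src, dst)
        src_hash, dst_hash = sha256_of(src), sha256_of(dst)
        if src_hash != dst_hash:
            raise OSError(f"hash mismatch after copying {rel}")
        manifest[rel.as_posix()] = dst_hash
    return manifest
```

Because this only sees files that appear in the directory listing, it captures nothing from deleted files, which is exactly the gap a forensic image fills.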
So which is the best approach for e-discovery?
Most experts will tell you that in a great majority of civil matters, a logical copy will meet the court's expectation. There is certainly a place for forensic imaging, but it's usually only necessary when there is a suspicion of data tampering or in cases where previously deleted files are at the center of the controversy. You can learn much more about the difference between a forensic image and a logical copy by viewing our infographic, "E-Discovery Data Collection Considerations for Organizations."
There are a variety of ways that organizations approach the collection process. Questions that help dictate the collection methodology include:
- How much data is involved in the legal matter?
- How many sources of data are implicated, and how accessible are those data sources?
- Will the collection involve any specialized tools or expertise?
- Does the legal matter involve encrypted or sensitive data?
- Are there internal IT resources available to perform/assist with the data collection?
- What are the time constraints (production deadlines, retention schedules, etc.)?
- What type(s) of collection technologies are deployed to perform the collections?
- Is the matter civil or criminal (which bears on the forensic image vs. logical copy decision discussed above)?
Answers to these questions will help determine which collection approaches to employ, which include:
Developing a Collection Strategy
As opposed to the collection methodologies described above, which address the 'how' of collection, your collection strategy addresses the 'why.' Why are you collecting certain data? Why collect data one way instead of another?
Your collection strategy will change with every matter. In some cases, such as very high-stakes legal matters involving precarious data sources, it may be wise to collect data immediately. In other matters, immediate collections may not be necessary, especially if you have a strong preservation process in place. It's common for litigants to collect highly relevant data early, since they know it will need to be collected eventually, but collecting very broadly in the early days of a matter is usually not advisable, as this will typically just drive up your costs with very little associated benefit to your case.
It's also important to consider how your case strategy impacts your collection strategy. If your case is inevitably headed for an early settlement, it probably doesn't make a lot of sense to collect and process a bunch of data that ultimately won't be needed. Other considerations that should go into your collection strategy include whether outside experts should be involved, whether there is any sensitive data that warrants greater protection measures, and whether any employees (like a person named in an incriminating lawsuit) might have an incentive to alter or delete relevant data, in which case a more proactive collection might be warranted.
For a great discussion among e-discovery experts on considerations for creating an e-discovery collection strategy, watch the on-demand webcast, "Data Collection Considerations for Litigation, Criminal and Regulatory Matters."
Validating the Collection
Collection can lead to a lot of contentious disputes between parties when there is suspicion that not all relevant data was collected, or that the collection process itself altered the contents of the data. When such controversies surface, parties typically rely on a few mechanisms for proving that a collection was conducted in a defensible manner. These include:
Rather than cover data processing in its own section of the guide, we've decided to include it with collection, since the two are so closely intertwined. Processing can be simply defined as preparing collected data for attorney review. After data is collected, the resulting document set will include a rather messy mix of file types and formats, attachments, meaningless system files, and plenty of duplicates. Processing is all about cleaning up the mess and formatting the collected ESI so that it can be culled and searched by attorneys and review tools.
We won't get too far into the weeds on data processing, since it's a highly technical process that includes a lot of concepts and jargon that the average e-discovery practitioner doesn't need to know. What is worth discussing, however, is who actually does the processing. Traditionally, most organizations outsourced data processing to third-party vendors who would use specialized technologies to winnow data sets down and deliver them back to clients for next steps. Today, many companies still outsource processing, but there are a growing number of companies that have deployed processing software in-house. There is also an emerging class of collection technologies that essentially consolidates collection and processing into one step. More on these tools in the technology section below.
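For readers who do want a feel for the cleanup step described above, here is a small Python sketch of two typical culling passes: dropping meaningless system files and dropping exact duplicates. It is only a sketch under our own assumptions. The list of system file names is hypothetical and far from complete, duplicates are detected by a SHA-256 hash of the file contents, and the function name `cull` is ours. Real processing software does considerably more, such as unpacking attachments and normalizing formats.

```python
import hashlib
from pathlib import Path

# Hypothetical, deliberately short list of system files to discard.
SYSTEM_FILE_NAMES = {"thumbs.db", ".ds_store", "desktop.ini"}


def cull(paths):
    """Split paths into (kept, dropped), where dropped holds (path, reason)."""
    first_seen = {}  # content hash -> first path seen with that content
    kept, dropped = [], []
    for path in map(Path, paths):
        if path.name.lower() in SYSTEM_FILE_NAMES:
            dropped.append((path, "system file"))
            continue
        # Reads the whole file into memory; fine for a sketch.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in first_seen:
            dropped.append((path, f"duplicate of {first_seen[digest]}"))
            continue
        first_seen[digest] = path
        kept.append(path)
    return kept, dropped
```

Keeping the reason alongside each dropped file is a deliberate choice: it leaves a record of why a document never reached review.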
Data Collection Best Practices
Data collection is a dynamic and multi-faceted process that relies on sound e-discovery strategy, as well as solid technical resources and expertise. There are important best practices that fall under each of the various elements of the collection process, but here are four big ones you should know:
We know that it's easy to identify a relevant custodian and copy his or her entire hard drive or email folders. But easy doesn't equate to smart. More data collected means more data processed and ultimately reviewed. And that all adds up to more MONEY spent on e-discovery. Instead, develop strong preservation and early case assessment processes, and target your collections so that you are only collecting the potentially relevant ESI, nothing more and nothing less. Learn more about avoiding over-collection by reading Exterro's white paper, "Eliminating E-Discovery Over-Collection."
It's inevitable that some matters are going to present unique collection challenges. Maybe it's a case involving mobile data or one involving highly unorganized data on legacy systems. Whatever the case may be, do yourself a favor and recognize these challenges early on rather than right at the point where data needs to be collected. It's always better – and much cheaper – to assess your needs proactively to determine if outside resources will be needed and, if so, which vendors. Even if outside help isn't needed, it's important to give your internal IT team a heads up that a potentially big project may be coming their way soon, so they can plan accordingly.
Tier your collections
This relates to our point above about over-collection. It's always best to think of collection in terms of phases or tiers, rather than trying to do everything all at once. A tiered collection strategy involves prioritizing data so that only the most highly relevant data is collected immediately and less relevant data is collected only when absolutely needed. Just remember that the only way to execute a defensible tiered collection strategy is to have a very strong preservation process that gives you the ability to not collect everything immediately.
Create an ESI repository
Matters overlap. You may have dozens of lawsuits that revolve around the same or similar issues, people, and data. Rather than create multiple copies of the same data, create a centralized repository of collected ESI and develop a systematic process for reusing data across matters.
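One way to picture such a repository is as a store that keeps a single copy of each unique file and records which matters rely on it. The Python sketch below is a toy version under our own assumptions: files are identified by a SHA-256 hash of their contents, the class name `EsiRepository` is hypothetical, and the index of matters lives only in memory, whereas a real system would persist it and track far more detail.

```python
import hashlib
import shutil
from pathlib import Path


class EsiRepository:
    """Toy repository: one stored copy per unique file, shared across matters."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.matters = {}  # content hash -> set of matter identifiers

    def add(self, path, matter_id):
        """Store the file if it is new and link it to the given matter."""
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        stored = self.root / digest
        if not stored.exists():
            shutil.copy2(path, stored)  # first time this content is seen
        self.matters.setdefault(digest, set()).add(matter_id)
        return digest
```

Adding the same document for a second matter creates no new copy; it only adds the second matter to that document's entry.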
Data Collection Tools
Data collection is not a one-size-fits-all endeavor, nor is it a process that is supported by a single technology. There are a variety of tools that you can deploy depending on your specific collection needs and priorities. Here are some specific systems and capabilities you may want to consider:
As mentioned earlier, processing has traditionally taken place post-collection as a separate e-discovery process typically handled by service providers, who charge on a per-gigabyte basis. However, there are new search and collection technologies that process data at the point of collection, eliminating the need to send collected data to a third-party vendor.
Pre-collection analytics may not specifically be a collection technology, but they have a huge influence on the collection process. These are tools that crawl data sources and deliver basic insights like document volumes, and can also perform more advanced searching and filtering to really home in on relevant content. They equip you with the necessary intelligence and visibility to target collections and focus efforts on just the relevant content.
Data Source Integrations:
Integrating your collection software with your enterprise data sources (email servers, SharePoint servers, structured databases, etc.) can greatly streamline the collection process by eliminating the need for IT to conduct manual collections. Integrations allow for collection to be handled remotely, and they also minimize a lot of the technical complexities surrounding collections, allowing non-IT professionals to be more involved in the process.
Even when you have an integrated collection environment and can collect over the network, there are still instances when you may need to grab data off a system that isn't connected to the network, such as a field computer that is used by an employee who works remotely. Spot collector tools are portable USB devices that allow IT professionals or custodians to crawl and collect off non-network systems. The major benefit that these tools offer is that they can be pre-configured to collect only relevant files rather than complete copies of a computer's hard drive.
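To show what "pre-configured to collect only relevant files" might look like in practice, here is a short Python sketch that selects files by type and by last-modified date. The criteria, the function name `select_files`, and its parameters are our own hypothetical choices; actual spot collector products have their own configuration options, which this does not attempt to reproduce.

```python
from datetime import datetime, timezone
from pathlib import Path


def select_files(root, extensions, modified_after, modified_before):
    """Yield files under root that match the extension and date criteria.

    extensions: iterable such as [".docx", ".xlsx"] (case-insensitive).
    modified_after / modified_before: timezone-aware datetimes; the range
    includes modified_after and excludes modified_before.
    """
    wanted = {ext.lower() for ext in extensions}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        if modified_after <= modified < modified_before:
            yield path
```

For example, `select_files("D:/", [".docx", ".xlsx"], start, end)` would list only the word processing and spreadsheet files modified within the relevant period, instead of everything on the drive.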
Mobile Collection Tools:
We discussed the challenges of collecting mobile data earlier. Ideally, any data on a mobile device will be located somewhere else that is a little more accessible, such as an email server. But workers in some industries create content that never leaves their phones – like text messages – that may need to be collected. Fortunately, there are specific devices that are designed to extract data off of mobile devices and reformat it for the purposes of attorney review and legal production.
Up next in the Beginner's Guide to E-Discovery, we go from data collection to document review, the most costly of all e-discovery phases.