The corporate data footprint continues to expand exponentially in size, variety of sources and complexity, largely as a result of ongoing business adoption of cloud-based collaboration and productivity tools and a widespread shift to remote work. On average, organisations’ IT systems now include 400 data sources, with the top 20% juggling 1,000 or more, according to a recent survey from IDG Research. In the same survey, IT professionals stated that their data volumes are growing by 63% monthly, with one in 10 indicating their footprint is growing at 100% or more each month.
The prevalence and rapid growth of these emerging data sources are creating a number of challenges for organisations, across IT infrastructure, storage, compliance, records management, data protection and e-discovery. Increasingly, our experts at FTI Technology are engaged to help clients mitigate unfamiliar risks associated with emerging data sources and resolve the technical and practical issues that now arise when collecting them as part of a legal or regulatory investigation. This blog series will examine these issues closely, with a focus on the challenges and opportunities emerging data sources present in investigations and litigation.
In this first post, I’ll cover the implications emerging data sources have for traditional e-discovery scoping via custodians, and how taking a granular view that is aligned with the nature of the data can help reduce review volumes, control costs and deliver more dynamic e-discovery workflows.
The Limitations of Custodian-Based Collections
Most legal teams are familiar with the standard practice of scoping document collection and review for litigation and regulatory investigations by custodian, wherein in-house and external counsel work together to develop a defensible list of employees who are “custodians” of data relevant to the matter at hand. Traditionally, once the custodian list is identified, it is relatively easy to collect all relevant data sources associated with each custodian’s unique employee ID. However, emerging data sources often utilise a single-instance storage system, wherein only one copy of a file is stored and presented to users who have permission to access it, as opposed to traditional platforms that store many instances of a document. From a business perspective, this optimises efficiency. In legal and investigative matters, however, it creates numerous risks and challenges, which include:
- Documents and other records that may one day become important electronic evidence in a matter are now living artifacts with varied and fluid permissions. One individual creates a document, then passes ownership to another individual, who gives editing permissions to one group of colleagues and temporary viewing-only permissions to another, and so on. Physical custody of a document doesn’t by default mean that the specific custodian was the principal architect of the content, or even a knowing or active participant engaging with it.
- An author can be just the start of a document’s lifecycle, which makes it difficult to define and isolate what should be included in each custodian’s file. It’s becoming increasingly important to understand who interacted at various stages of that lifecycle and who was responsible when it became a finished product.
- Pinning down which version of a document is relevant to a case, determining what constitutes an attachment and deciphering which individuals interacted with a document, and in what way, are no longer straightforward tasks. Investigators and e-discovery professionals must now be especially careful not to assume that a custodian who has access to a document at the time of collection also had access to it at any other key point in time.
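The point-in-time caveat in the last bullet can be made concrete with a minimal sketch. It assumes the platform can export a permission history with grant and revocation timestamps; the `PermissionGrant` record, its field names, the role labels and the `role_at` helper are all hypothetical and do not correspond to any particular platform's API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PermissionGrant:
    """One hypothetical row from a platform's permission history export."""
    document_id: str
    user: str
    role: str                               # e.g. "author", "collaborator", "viewer"
    granted_at: datetime
    revoked_at: Optional[datetime] = None   # None means the grant is still in effect


def role_at(grants, document_id, user, when):
    """Return the role `user` held on `document_id` at `when`, or None.

    If several grants overlap, the first matching one in `grants` is returned.
    """
    for g in grants:
        if g.document_id != document_id or g.user != user:
            continue
        still_active = g.revoked_at is None or when < g.revoked_at
        if g.granted_at <= when and still_active:
            return g.role
    return None


# Illustrative data only.
grants = [
    PermissionGrant("doc-1", "alice", "author", datetime(2021, 1, 5)),
    PermissionGrant("doc-1", "bob", "viewer", datetime(2021, 3, 1), datetime(2021, 3, 15)),
    PermissionGrant("doc-1", "carol", "collaborator", datetime(2021, 6, 1)),
]

key_date = datetime(2021, 3, 10)        # a date that matters to the investigation
collection_date = datetime(2021, 9, 1)  # the date the data is collected

for user in ("alice", "bob", "carol"):
    print(user, role_at(grants, "doc-1", user, key_date),
          role_at(grants, "doc-1", user, collection_date))
```

In this illustrative data, carol has access at the time of collection but did not on the key date, while bob is the reverse. A collection that looks only at current permissions would misjudge both.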
A New Strategy for New Data Types
No matter the data source involved, the investigative questions remain the same: which content is responsive and who was involved in its construction and dissemination. And while the nuances of emerging data sources can create complexities in answering these questions, they also offer opportunities for teams to reduce costs and better understand a user’s or document’s journey. Rethinking the discovery strategy so that teams are leveraging the tooling to better capture and understand valuable forensic artifacts early on in the process will ultimately lead to downstream cost savings.
Specifically, a “single-source collection” approach allows teams to identify and collect one copy of a document to which multiple custodians held permissions, rather than following the legacy custodian-based approach, which results in a high volume of redundant copies. Combined with investigative tactics that examine who had varying levels of access to key documents, and when, this approach serves the dual purpose of narrowing the overall review set (which reduces discovery costs) and revealing key insights quickly.
For example, in a recent competition matter, our team was engaged to help the client produce a large volume of documents from a cloud-based office suite platform, for a specific set of custodians. The team crafted a single-source collection approach, enabling them to extract a single copy of documents that multiple “custodians” had access to. This was coupled with a decision to omit any documents wherein a custodian held only “viewer” permissions and focus solely on the documents for which custodians held “author” and “collaborator” permissions, which focused the collection further. Because viewer permissions are often applied quite broadly in many organisations, this strategy excluded a large pool of documents from the review—reducing an initial population of tens of terabytes of data by roughly 80% for review, and further down to less than a terabyte of data ultimately produced. This approach allowed the team to work more efficiently toward the tight production deadline and significantly reduce review costs at the same time.
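The scoping logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the workflow used in the matter: the `AccessRecord` structure, the role labels and the `single_source_collection` function are hypothetical, and a real platform would expose permissions through its own export or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    """Hypothetical permission entry: one user's role on one stored document."""
    document_id: str
    user: str
    role: str          # assumed labels: "author", "collaborator", "viewer"


# Roles treated as in scope; viewer-only access is deliberately excluded.
IN_SCOPE_ROLES = {"author", "collaborator"}


def single_source_collection(records, custodians, roles=IN_SCOPE_ROLES):
    """Map each in-scope document ID to the set of custodians tied to it.

    Each document appears once, however many custodians can reach it.
    """
    collection = {}
    for r in records:
        if r.user in custodians and r.role in roles:
            collection.setdefault(r.document_id, set()).add(r.user)
    return collection


# Illustrative data only.
records = [
    AccessRecord("doc-1", "alice", "author"),
    AccessRecord("doc-1", "bob", "collaborator"),
    AccessRecord("doc-2", "alice", "viewer"),
    AccessRecord("doc-2", "dave", "author"),
    AccessRecord("doc-3", "bob", "viewer"),
    AccessRecord("doc-3", "alice", "collaborator"),
]
custodians = {"alice", "bob"}

to_collect = single_source_collection(records, custodians)

# A custodian-based collection would pull one copy per custodian per document.
legacy_copies = sum(1 for r in records if r.user in custodians)

print(len(to_collect), "documents to collect versus", legacy_copies, "custodian copies")
```

With this sample data, the custodian-based approach would pull five copies, while the single-source approach collects two documents and still records which custodians are associated with each. The second document drops out entirely because the only custodian who can reach it holds viewer permissions.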
This is just one of many cases in which experts have leveraged the nuances in emerging data sources to narrow a dataset in the collection phase rather than post-collection, where processing, analysis and review of large datasets become expensive and time-intensive. In Part 2 of this series, I’ll expand on these concepts in a discussion of issues that can arise when simple search functions are applied to emerging data sources as a means of data collection, and the costly downstream effects that can result when certain limitations are overlooked.
The views expressed herein are those of the author(s) and not necessarily the views of FTI Consulting, its management, its subsidiaries, its affiliates, or its other professionals.