For digital forensics investigators, time travel is not a far-fetched sci-fi fantasy; it’s part of the job. Every day, forensic technology specialists use data to go back in time and piece together the facts and clues that tell the story behind a regulatory violation, legal dispute, or other high-stakes incident. This work requires technical and analytical skills across a wide range of tools and data sources, many of which are still emerging and constantly evolving. As corporate data footprints grow larger and more diverse, so does the range of places where investigators must look for evidence.
One particularly nuanced data type is archived and historic web content, which can be used to reconstruct an organization’s activities and messaging as they relate to a legal or regulatory matter. In recent years, a number of environmental, social and governance (ESG)-related litigation matters have required website reconstruction to determine whether and how an organization changed its external communications and messaging before, during and after major events or allegations. For example, in one ongoing client matter, our team has been tasked with collecting and tracking specific website data and content dating back to 2014 from 10 priority websites and more than 100 additional sites of interest. In effect, we’re looking back in time through the internet.
The Internet Archive, a non-profit that provides a free digital library of websites and other online content, created the Wayback Machine, which allows anyone to browse archived web pages and to submit pages for capture. The Wayback Machine contains more than 25 years of web history, more than 590 billion archived web pages and millions of additional digital media artifacts. Using it, investigators can collect information from legacy sites and document how messaging, website copy and other related files changed over time. Combined with other evidence, this material supports website reconstruction that reveals the big picture of an organization’s activities and positions as they relate to a litigation or regulatory matter.
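Documenting how website copy changed between two captures is, at its core, a text comparison. The following is a minimal sketch using only Python’s standard library, assuming the archived HTML of two captures of the same page has already been retrieved. The helper names `visible_text` and `copy_diff` are hypothetical, and the sample page text is invented for illustration. The sketch strips tags, skips `<script>` and `<style>` content and produces a unified diff of the remaining text, labeled with each capture’s timestamp.

```python
import difflib
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text nodes from HTML, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # >0 while inside <script> or <style>
        self.lines: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self._skip_depth:
            self.lines.append(text)


def visible_text(html: str) -> list[str]:
    """Return the visible text of an HTML document, one text node per line."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return parser.lines


def copy_diff(old_html: str, new_html: str, old_label: str, new_label: str) -> list[str]:
    """Unified diff of the visible text of two captures of the same page."""
    return list(
        difflib.unified_diff(
            visible_text(old_html),
            visible_text(new_html),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical page text; labels are 14-digit capture timestamps.
    before = "<html><body><h1>Our commitments</h1><p>Net zero by 2050.</p></body></html>"
    after = "<html><body><h1>Our commitments</h1><p>Net zero by 2040.</p></body></html>"
    for line in copy_diff(before, after, "20190101000000", "20210101000000"):
        print(line)
```

Labeling each side with its capture timestamp keeps every change traceable to a specific archived snapshot. Real pages would likely need more careful extraction (for example, ignoring navigation text that repeats on every page), so this is a starting point rather than a review-ready tool.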
Using APIs and other bespoke software, our team developed methods to extract archived web content from the Wayback Machine and prepare it for e-discovery analysis and review. However, this work is highly complex, requires extensive technical expertise and is laden with potential pitfalls.
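The Wayback Machine exposes a CDX index API that lists the captures recorded for a URL. The sketch below shows one way to query it; it is an illustration, not the team’s actual tooling. It assumes the public endpoint at `https://web.archive.org/cdx/search/cdx` with JSON output, keeps only HTTP 200 captures, collapses adjacent captures with an identical content digest so each remaining row marks a change in the archived content, and builds a replay URL for each capture. The `Capture` class and all function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass

# Public CDX index endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
FIELDS = ("timestamp", "original", "statuscode", "digest")


@dataclass(frozen=True)
class Capture:
    timestamp: str   # 14-digit UTC timestamp, YYYYMMDDhhmmss
    original: str    # URL as it was captured
    statuscode: str  # HTTP status recorded at capture time
    digest: str      # hash of the captured content

    def replay_url(self, raw: bool = True) -> str:
        # "id_" requests the archived bytes without the replay banner and
        # link rewriting; raw=False gives the normal browsable view.
        flag = "id_" if raw else ""
        return f"https://web.archive.org/web/{self.timestamp}{flag}/{self.original}"


def cdx_query_url(url: str, start_year: int, end_year: int) -> str:
    """Build a CDX query for successful captures of `url` in a year range."""
    params = {
        "url": url,
        "output": "json",
        "from": str(start_year),
        "to": str(end_year),
        "filter": "statuscode:200",
        # Collapse runs of adjacent captures with the same content digest, so
        # each remaining row differs in content from the previous row.
        "collapse": "digest",
        "fl": ",".join(FIELDS),
    }
    return f"{CDX_ENDPOINT}?{urllib.parse.urlencode(params)}"


def parse_cdx_json(payload: str) -> list[Capture]:
    """Parse CDX JSON output: a header row of field names, then one row per capture."""
    rows = json.loads(payload) if payload.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [Capture(**dict(zip(header, record))) for record in records]


def list_captures(url: str, start_year: int, end_year: int, timeout: float = 30.0) -> list[Capture]:
    """Fetch and parse the capture list (performs a network request)."""
    with urllib.request.urlopen(cdx_query_url(url, start_year, end_year), timeout=timeout) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))
```

Calling `list_captures("example.com", 2014, 2023)` would return one `Capture` per distinct version, and each `replay_url()` can then be fetched and loaded into a review workflow. The `id_` form is used by default because it returns the archived content without the Wayback Machine’s own page modifications. Bulk collection across many sites should be paced, since heavy automated use of the service may be throttled.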
While the Wayback Machine is a powerful and useful resource, not everything that was ever online will be available for recovery and reconstruction. Websites are not archived on a guaranteed periodic schedule; captures come from the Internet Archive’s crawls and from pages that individuals have chosen to save, so coverage is uneven and scattered across disparate dates and sites. Additionally, some sites use technical blocks that prevent archiving altogether, and archives may contain only partial captures, such as a site’s homepage rather than the full site.
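Because captures are scattered, it helps to quantify where the archive is silent before relying on it. The minimal sketch below assumes a list of 14-digit capture timestamps for one URL, for example from the Wayback Machine’s CDX index queried without collapsing by digest (collapsing drops unchanged captures and would inflate the apparent gaps). It reports every interval within a review period that is longer than a chosen threshold and contains no capture. The function names and the 90-day default are arbitrary, illustrative choices.

```python
from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    # Wayback timestamps are 14-digit UTC strings (YYYYMMDDhhmmss);
    # they are kept as naive datetimes and treated as UTC throughout.
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def coverage_gaps(timestamps, start: datetime, end: datetime,
                  max_gap: timedelta = timedelta(days=90)):
    """Return (gap_start, gap_end) intervals longer than max_gap with no capture.

    Captures outside [start, end] are ignored; the review period's own
    boundaries count as edges so that gaps at either end are reported.
    """
    points = sorted(p for p in map(parse_ts, timestamps) if start <= p <= end)
    edges = [start, *points, end]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b - a > max_gap]


if __name__ == "__main__":
    # Hypothetical capture timestamps for a single page.
    captures = ["20140115093000", "20140220101500", "20150302080000"]
    for gap_start, gap_end in coverage_gaps(captures, datetime(2014, 1, 1), datetime(2015, 12, 31)):
        print(f"No captures from {gap_start:%Y-%m-%d} to {gap_end:%Y-%m-%d}")
```

Each reported interval is a window in which the page’s content cannot be established from the archive alone, and it may need to be documented as a known gap or filled from other sources.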
Legal teams planning a website reconstruction therefore need to understand that the available record is limited to what was captured and what is publicly available or accessible through other specialized sources. In these matters, legal teams will benefit from first consulting digital forensics experts to clarify and document what is and isn’t accessible from the Wayback Machine and other sources. With this insight, they can develop an investigation strategy and workflow that accounts for the known gaps and limitations.
Digital forensics investigators are indeed a kind of time traveler, and the complex work we do can sometimes leave the impression that all digital evidence is recoverable. But even the most brilliant time travelers are bound by rules and limitations (here’s to you, Doc Brown). This is why investigations must be supported by a team with expertise in numerous investigative tools and methodologies. Only hands-on experience with the challenges that commonly arise when investigating uncommon data sources can ensure that a team follows the surest and most efficient strategy for its cases.
The views expressed herein are those of the author(s) and not necessarily the views of FTI Consulting, its management, its subsidiaries, its affiliates, or its other professionals.