The digital universe will explode to 40 trillion GB of data by 2020. The total Internet of Things market is forecast to hit $520 billion in the next two years, and IoT data will account for at least 10 percent of the total digital universe. We’re talking 4 trillion GB of IoT data in the foreseeable future. All this data, and the ways in which it is collected, stored and secured, are intersecting with the blending of corporate and personal worlds, raising important questions about data privilege, compliance and the future of digital forensics.

Connected devices are always listening and recording. And once data is on a device, it becomes ubiquitous, extending from the device to its related apps, other devices it interacts with and cloud databases. But what information lives within all that data? Where is it stored? Who has access to it? Can it be recovered? The range of implications and opportunities that arise from these questions is far-reaching and complicated. We’re already seeing a flurry of news reports about data privacy violations relating to IoT data, malicious actors leveraging these devices for their own gain and information from connected devices entering courtrooms as evidence.

Looking at this through the lens of digital forensics, and the emerging need to collect IoT data for investigations and litigation, we must begin to think about the corporate risks, and how we can tap into connected devices for practical needs and uses. In an effort to uncover some of the mystery around IoT-generated data, our team conducted forensic testing on several devices, including the Echo Dot and Google Home. Our research looked at where and how data resides on these devices, how they transfer it to the cloud and amongst other devices, and the most effective approaches for forensic collection. An overview of the findings is below.

  1. Data stored locally on the device: On the Echo Dot, Alexa is always listening so that she can be situationally aware of commands and questions from users. As a result, the 60 seconds of audio used for this pre-processing is always stored on the local device. The clip is continually overwritten with the next most recent 60 seconds. When a request is made of the device using a preset wake word, the command and the half second of audio preceding it are sent to the cloud and stored. While the local audio files would rarely be needed for a forensic investigation, they can be collected via a "chip-off" process, which requires physically removing the device’s chip and then analyzing it with specialized technology. Of note, when our team researched collection of data stored on the smartphone paired with the device, multiple findings confirmed that this data was encrypted. The results of our testing on Google Home were comparable.
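The rolling 60-second clip described above behaves like a fixed-size ring buffer. The sketch below is only an illustration of that idea, not the device's actual firmware: the sample rate, the class name `RollingAudioBuffer` and the helper `build_upload` are all assumptions introduced for this example.

```python
from collections import deque

SAMPLE_RATE = 16_000      # samples per second (assumed value, not from testing)
BUFFER_SECONDS = 60       # rolling window described in the findings
PRE_ROLL_SECONDS = 0.5    # audio retained from just before the wake word


class RollingAudioBuffer:
    """Hypothetical ring buffer that keeps only the most recent audio."""

    def __init__(self, sample_rate=SAMPLE_RATE, seconds=BUFFER_SECONDS):
        self.sample_rate = sample_rate
        # A deque with maxlen silently discards the oldest samples when full,
        # which mirrors the "continually overwritten" behaviour.
        self._samples = deque(maxlen=int(sample_rate * seconds))

    def write(self, samples):
        """Append new samples, overwriting the oldest ones if needed."""
        self._samples.extend(samples)

    def pre_roll(self, seconds=PRE_ROLL_SECONDS):
        """Return the last `seconds` of buffered audio as a list."""
        count = int(self.sample_rate * seconds)
        if count <= 0:
            return []
        return list(self._samples)[-count:]

    def __len__(self):
        return len(self._samples)


def build_upload(buffer, command_samples):
    """Combine the half-second pre-roll with the spoken command."""
    return buffer.pre_roll() + list(command_samples)
```

In this model, only the output of `build_upload` would leave the device; everything else in the buffer is overwritten within a minute, which is consistent with the local audio rarely being useful to an investigation.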

  2. Data stored in the cloud: With access to the user credentials, cloud data can be collected from the smartphone app/account paired with the devices. Information available in the cloud includes:
    • Device Information – including username/password, serial number and software version
    • Alexa/Google Enabled Devices – additional devices the user has associated with their account
    • Skills – such as third-party apps added to device (this allows investigators to understand what other apps the user is using); on Google Home, these include YouTube, Pandora and Google Nest – any data requested of these services and others through the Google Home will be recorded and stored
    • User Activity – any type of command a user gives to Alexa or Google is stored as a .wav file (audio file); individual recordings are stored indefinitely and can be pulled and replayed
    • Cards – on the Echo Dot, graphical cards enhance the voice interactions and card history can be accessed and collected from the cloud; for example, if a user asks Alexa for a recipe, she will save a card with each step of the recipe in the user’s activity, so it can be referenced as needed

    Alexa data is also available through an API using the user credentials. The network traffic is transferred over an encrypted connection, but the native artifacts can be returned in a readable format. The data listed above can be accessed in this way, as well as network configurations, all groups the user has associated with the account, the last 50 activities performed by Alexa on any device that has Alexa enabled and specific voice recordings.

    On Google Home, to further correlate commands recorded for third-party apps such as Nest, investigators can identify and pull related Nest sqlite databases and plist files that can be found in a user’s iPhone backup. These artifacts can reveal when someone left or arrived home, or manually changed the temperature, which can be helpful data points during an investigation.
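As a rough starting point for examining such artifacts, the sketch below uses only the Python standard library to list the tables in a SQLite database and to load a plist file. The function names `list_tables` and `load_plist` are hypothetical, and no assumption is made about the actual Nest file names or schema, which would need to be confirmed against the backup being examined.

```python
import plistlib
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file, opened read-only."""
    # mode=ro prevents accidental writes to the working copy of the evidence.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [row[0] for row in rows]


def load_plist(plist_path):
    """Parse an XML or binary plist into ordinary Python objects."""
    with open(plist_path, "rb") as handle:
        return plistlib.load(handle)
```

Listing the tables first is a cautious way to see what a database holds before querying it. In practice this would be run against a working copy of the extracted backup rather than the original.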

  3. Deleted data: Our team took the following steps to understand how/if deleted data could be recovered.
    • Using Oxygen Cloud Extractor, collected an Alexa user account and preserved the collection.
    • After preserving the collection, randomly deleted a variety of commands from the app’s activity and history.
    • Re-collected the Alexa account to understand if the deleted entries could be recovered.
    • Upon comparing and forensically analyzing the two collections, we found that once an entry is deleted, it is non-recoverable.
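The comparison in the steps above can be pictured as a set difference between the two collections. The following sketch assumes each collected entry is a dictionary with a unique `id` field; that structure, along with the names `fingerprint` and `missing_entries`, is hypothetical and does not reflect the export format of any particular tool.

```python
import hashlib
import json


def fingerprint(records):
    """SHA-256 over a canonical JSON form, to show a collection is unchanged."""
    canonical = json.dumps(
        sorted(records, key=lambda r: r["id"]), sort_keys=True
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def missing_entries(before, after):
    """Return entries present in the first collection but absent in the second."""
    after_ids = {record["id"] for record in after}
    return [record for record in before if record["id"] not in after_ids]
```

Entries returned by `missing_entries` are those that disappeared between collections. Recording a `fingerprint` of the first collection at preservation time gives a simple way to demonstrate later that it was not altered.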

  4. Additional discoveries: The Alexa app includes an activity section as well as a history section. The difference between the two is that the activity section is on the home screen, where it is easy to navigate and easy for the user to delete activity. The activity section contains the cards that were generated via commands. Conversely, the history section contains every command ever spoken to Alexa. History is accessible to the user but lives deep within the app, making it difficult to find for users who are not technologically savvy. This detailed history helps Alexa continue to learn and work smarter for the user. From a forensic standpoint, it could be beneficial for finding critical information – in a sensitive case, a user may think they have deleted certain evidence, but it could still reside in the history.

    Alexa also has a configurable feature called "drop-in," which acts like an intercom between different Alexa-enabled devices. Two devices can be connected if they have been allowed access to one another, and once connected, a user can drop in and listen without the person on the other end being able to accept or deny the connection. Alexa also actively records conversations that are preceded by what the device interprets as a wake word. Our testing found .wav files of the test subjects having a casual conversation after saying a word that sounded enough like a preset wake word. These types of recordings may be relevant during an investigation, but their storage also introduces privacy considerations that organizations should be aware of.

    The testing outlined above provides only a snapshot of the type of information that may be accessible from IoT devices in the event of an investigation or litigation that requires collection from them. It’s important to note that the information available through a user’s account is only a small fraction of the larger pool of data that resides with the companies that make these devices. With more and more people working remotely and using these devices in their home offices, corporations must be aware of the implications and of the ways in which IoT information can impact both their legal matters and compliance initiatives.