Kate Crawford, an artificial intelligence academic and researcher, recently described ChatGPT as “an exponential disruption.” Indeed, the technology is anticipated to bring significant change to many facets of business and society. While to date the most common uses of ChatGPT have been to help students write essays, marketers create copy and developers debug code, a vast landscape of uses lies ahead.
The legal industry is one area ripe for potential impact. Lawyers are beginning to consider whether ChatGPT can help write contracts, act as a legal assistant or support e-discovery. Indeed, one key area of our focus is data interrogation to aid e-discovery.
Our teams have engaged in rigorous R&D and testing activity with ChatGPT to understand how it may be used in (or create challenges for) e-discovery, and what some of the finer technical considerations will be as this technology seeps into the legal sphere.
At a high level, ChatGPT presents the opportunity in e-discovery to ask questions like...
- “What did [custodian] do in November?”
- “What is happening in this set of documents?”
- “What is the environmental, social and governance policy at [company name]?”
- “Summarize this document”
...and receive answers in natural language.
However, to get meaningful results, teams would need access to skilled development resources to communicate with the model and configure the data. There are also inherent limitations in the current technology, such as the volume of text (measured in tokens) it can process in a single request, and potential security concerns when loading sensitive information into the model, concerns that the introduction of Azure OpenAI Service is expected to mitigate in the near future. Throughout our testing, we are using sample data (not active or sensitive information) and will examine limitations and risk areas in detail.
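The token limit mentioned above is typically handled by splitting a long document into smaller pieces before summarization. A minimal sketch in Python, assuming a simple word-based chunking approach (the 1,500-word chunk size is an illustrative assumption, not an actual model limit, and the summarization function is supplied by the caller rather than being a real API call):

```python
# Sketch: split a long document into word-based chunks small enough for a
# model's context window, summarize each chunk, then combine the results.
# The 1,500-word default is illustrative only; real limits are in tokens.

def chunk_document(text: str, max_words: int = 1500) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def summarize_in_chunks(text: str, summarize) -> str:
    """Summarize each chunk with the supplied function, then join the parts."""
    partial = [summarize(chunk) for chunk in chunk_document(text)]
    return " ".join(partial)
```

In practice, chunking is done on token counts rather than words, and the partial summaries are often summarized again to produce a single consolidated result.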
In one of our first use cases, our testing covered document summarization, as the ability to quickly gain insight into key aspects of a document by reviewing summaries could be very useful in the e-discovery field.
We observed that responses to questions are significantly affected by the way the queries are phrased. The crafting of these prompts is known as prompt engineering.
Role-based prompt engineering
Prompt engineering is the process of designing and fine-tuning prompts for a language model to help it provide a more appropriate response, and therefore improve its performance. This is achieved by giving the model context and direction to generate the most accurate response.
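With a chat-style model, this context and direction is commonly supplied as a system message ahead of the user's request. A minimal sketch, assuming the common `{"role": ..., "content": ...}` chat message convention (the persona and audience wording here is illustrative, not the exact prompts from our testing, and the actual API call is omitted):

```python
# Sketch: build chat messages that assign the model a role and an audience
# before asking for a summary. Changing the persona or audience changes the
# context the model uses to shape its response.

def build_summary_prompt(document: str, persona: str, audience: str) -> list[dict]:
    system = (f"You are a {persona}. Summarize documents for an audience of "
              f"{audience}, using language and detail appropriate to them.")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Summarize this document:\n\n{document}"},
    ]

# The same document can then be summarized under different personas:
messages = build_summary_prompt("…document text…",
                                persona="forensic investigator",
                                audience="legal reviewers")
```

Swapping in a different persona (for example, a lawyer or an artist) while keeping the document constant is the basis of the role-based comparison described below.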
To explore this, we asked ChatGPT to act in a variety of roles when summarizing a sample document (as if in the course of e-discovery), and then compared the changes between each of the responses. We also asked ChatGPT to adapt its responses according to different hypothetical audiences.
What we found is that the level of detail and the type of language used varied significantly depending on the direction given in the prompts. When ChatGPT was acting as a forensic investigator, it provided a detailed summary of the document, whereas when it was acting as a 10-year-old child, it over-simplified the content. Acting as an artist, it summarized the document with a creative twist, and as a lawyer, it used language common in the legal field. Likewise, when the audience was changed, the output differed in both language and depth of detail.
Finally, we asked ChatGPT to summarize the document, providing very specific details about the role we wanted it to emulate, the point of view it should take, the context of the situation and the intended audience. The results of this highly directed prompt were starkly different from the results of the prompt that asked for a plain summary only.
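The contrast between a plain prompt and a highly directed one can be sketched as follows. The directed wording below is illustrative of the approach, not the exact prompt used in our testing:

```python
# Sketch: a plain summarization prompt vs. a highly directed one that
# specifies role, point of view, context and audience. The directed
# wording is illustrative only.

plain_prompt = "Summarize this document."

directed_prompt = (
    "Act as a forensic investigator reviewing documents in an e-discovery "
    "matter. From the point of view of counsel assessing employee conduct, "
    "summarize this document for a legal team, and note any indications of "
    "disgruntled employees or other conduct concerns."
)

def build_messages(prompt: str, document: str) -> list[dict]:
    """Combine a prompt and a document into a single chat user message."""
    return [{"role": "user", "content": f"{prompt}\n\n{document}"}]
```

Running both message sets against the same document is what produced the starkly different results described above.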
It’s clear in the response that when the forensic investigator role is asked specifically about disgruntled employees, it is able to extract information related to that; however, when asked only to summarize the document, there is no reference to potential disgruntlement. In an e-discovery matter, without the correct prompt, this important detail could easily be missed.
These tests provide insight into how prompts can be optimized for potential e-discovery use cases. Moreover, they underscore the fact that the way in which questions are asked is one of the most important aspects of using ChatGPT effectively. If prompts aren’t properly formed, they may result in incorrect answers, which, in e-discovery, could be especially problematic if the answer seems correct to the user, and/or if the user isn’t a subject matter expert. Such plausible-but-incorrect answers are commonly known as hallucinations, and we will discuss them in more detail in our next post.
Most experts agree that generative AI has reached an inflection point. In a legal context, there’s the potential for it to be a powerful tool, but like other advanced technologies, it will be important for legal teams to leverage it in a thoughtful, defensible manner, supported by expertise and proven tools, processes and best practices.
In the coming weeks and months, we’ll continue to share additional testing and findings, including how ChatGPT performs when asked to provide summarization and review of multiple documents. Our teams are also exploring security considerations, scalability, custom models, self-hosted GPT alternatives, API vs. web interfaces, output accuracy and functionality in Azure OpenAI Service, Amazon Titan and Google Bard.
The views expressed herein are those of the author(s) and not necessarily the views of FTI Consulting, its management, its subsidiaries, its affiliates, or its other professionals.