GPT-4: A Game-Changing Technology for the Legal Profession

Category Machine Learning

tldr #

Casetext recently deployed GPT-4, the latest generation of Large Language Model, to take—and pass—the Uniform Bar Exam. GPT-4's success was made possible by rapid advances in neural networks, AI systems loosely based on biological neurons that can interpret text and grow more capable the more data they are fed. This research points to AI's potential to assist the legal profession in the future.

content #

CodeX–The Stanford Center for Legal Informatics and the legal technology company Casetext recently announced what they called "a watershed moment." Research collaborators had deployed GPT-4, the latest generation Large Language Model (LLM), to take—and pass—the Uniform Bar Exam (UBE). GPT-4 didn't just squeak by. It passed the multiple-choice portion of the exam and both components of the written portion, exceeding not only all prior LLMs' scores, but also the average score of real-life bar exam takers, scoring in the 90th percentile.

GPT-4 is the first large language model smart enough to power professional-grade AI products.

Casetext's Chief Innovation Officer and co-founder Pablo Arredondo, JD '05, who is a CodeX fellow, collaborated with CodeX-affiliated faculty Daniel Katz and Michael Bommarito to study GPT-4's performance on the UBE. In earlier work, Katz and Bommarito found that an LLM released in late 2022 was unable to pass the multiple-choice portion of the UBE. "GPT-4 Passes the Bar Exam," their working paper recently released on the SSRN Electronic Journal, quickly caught national attention. Even The Late Show with Stephen Colbert had a bit of comedic fun with the notion of robo-lawyers running late-night TV ads looking for slip-and-fall clients.

GPT-4 scored in the 90th percentile in the Uniform Bar Exam.

However, for Arredondo and his collaborators, this is serious business. While GPT-4 alone isn't sufficient for professional use by lawyers, he says, it is the first large language model "smart enough" to power professional-grade AI products.

Here Arredondo discusses what this breakthrough in AI means for the legal profession and for the evolution of products like the ones Casetext is developing.

--- What technological strides account for the huge leap forward from GPT-3 to GPT-4 with regard to its ability to interpret text and its facility with the bar exam? ---

GPT-3.5 failed the bar, scoring roughly in the bottom 10th percentile.

If you take a broad view, the technological strides behind this new generation of AI began 80 years ago, when the first computational model of a neuron was created (the McCulloch-Pitts neuron). Recent advances—including GPT-4—have been powered by neural nets, a type of AI loosely based on biological neurons that underpins modern natural language processing. I would be remiss not to point you to the fantastic article by Stanford Professor Chris Manning, director of the Stanford Artificial Intelligence Laboratory. Its first few pages provide an excellent history leading up to the current models.
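The McCulloch-Pitts neuron mentioned above is simple enough to sketch in a few lines. This is an illustrative example, not code from the study: a unit fires when the weighted sum of its binary inputs reaches a fixed threshold, and different thresholds wire the same unit into different logic gates.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of binary inputs meets the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def AND(a, b):
    # With unit weights and threshold 2, the unit fires only when both inputs are 1.
    return mcculloch_pitts([a, b], [1, 1], threshold=2)

def OR(a, b):
    # Lowering the threshold to 1 turns the same wiring into logical OR.
    return mcculloch_pitts([a, b], [1, 1], threshold=1)
```

Modern neural networks replace the hard threshold with smooth activation functions and learn the weights from data, but the basic weighted-sum-and-fire idea is the same.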

GPT-4 is an AI program made by OpenAI, a for-profit research lab based in San Francisco.

--- You say that computational technologies have struggled with natural language processing and complex or domain-specific tasks like those in the law, but with advancing capabilities of large language models—and GPT-4—you sought to demonstrate the potential in law. Can you talk about language models and how they have improved, specifically for law? If it's a learning model, does that mean that the more this technology is used in the legal profession (or the more it takes the bar exam) the better it becomes/more useful it is to the legal profession? ---

GPT-4 is powered by a neural net, loosely based on neurons.

Large language models are advancing at a breathtaking rate. One vivid illustration is the result of the study I worked on with law professors and Stanford CodeX fellows Dan Katz and Michael Bommarito. We found that while GPT-3.5 failed the bar, scoring roughly in the bottom 10th percentile, GPT-4 not only passed but approached the 90th percentile. These gains are driven by the scale of the underlying models more than by any novel model architecture.

The more data these models are exposed to, the better they become.

At the heart of large language models are neural networks. The more data these models are exposed to—which, from a language perspective, means the more text they are exposed to—the better they become. In other words, AI is an example of a "learning model."

This phenomenon—feeding a model ever-larger amounts of data so that it makes better, more accurate predictions—is a core part of the machine learning ethos.

So, the more experience these models gather, the better they will become. We can think of this kind of repetitive experience—taking the bar over and over—as what these models use to “learn” how to answer problems posed on the bar.
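The more-data-means-better-predictions idea can be demonstrated with a toy estimation problem. This is a hypothetical sketch, not part of the study: we fit the slope of a noisy linear relationship by least squares and watch the estimate tighten as the sample grows.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_SLOPE = 2.0

def fit_slope(n):
    """Estimate the slope of y = 2x + noise from n noisy samples
    (least squares through the origin)."""
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [TRUE_SLOPE * x + random.gauss(0, 0.5) for x in xs]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# With only 10 samples the estimate is noisy; with 10,000 it hugs the true slope.
error_small_sample = abs(fit_slope(10) - TRUE_SLOPE)
error_large_sample = abs(fit_slope(10_000) - TRUE_SLOPE)
```

Large language models are vastly more complex than a one-parameter fit, but the statistical intuition carries over: more text means more evidence, which means more accurate predictions.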
