Like many law schools around the world, my parent institution of Suffolk University is grappling with how and whether to use the new generative AI tools. My lab is based in the law school clinic, so that's where I pay special attention. If you follow my work, it won't surprise you that I think law school clinics should be using generative AI, but that doing it well takes thought and planning. I have some advice and cautions below.
Clinical education is a classic form of praxis. Students are invited to learn by doing while also serving a client population that needs our help. So it's an open question whether generative AI is a good fit. The tools right now are deceptively simple to use, and also easy to use incorrectly. Many smart practitioners worry that giving these tools to students will deprive them of the well-understood benefits that traditional clinics provide. These concerns are justified, but in my opinion depriving students of learning how to use generative AI well is like depriving them of using the Internet. Bans won't work, and the technology will become ubiquitous. Stopping students from learning this now will put them at a disadvantage later, despite the messiness of early adoption and its attendant risks.
Generative AI isn't perfect, but there are safe uses whose benefits to students, clients, and the broader societal mission outweigh the risks
The easiest-to-understand ways to use ChatGPT or Bard are probably writing content from scratch (with very little initial input) or as a more focused search tool. Think of a classic homework dodge like "write me an essay on the theme of love in Romeo and Juliet." These aren't safe use cases for novices in any field, because new learners won't be able to critically evaluate the output. ChatGPT also isn't especially good at these uses; it tends to produce pretty generic and boring output when used this way. If this is as far as you get in exploring generative AI, it's understandable that you'll be concerned.
What are some safer uses?
- Solving the blank page problem for new tasks
- Extracting information
- As an editor, critic, or proofreader
- Prompting for deeper answers
- Translating, simplifying, summarizing, synthesizing
- Chat interface to vetted knowledge bases
I’ll talk about these use cases below. But let’s start by addressing the most concerning risks head-on.
What are the risks?
- Teaching time is precious. Is generative AI the right thing to teach?
- Generative AI makes stuff up. Can we really trust our students to use it safely?
- Generative AI has well-known biases. Will encouraging its use perpetuate an unfair system?
- Generative AI can fail in subtle ways. Will we end up with a generation of lawyers who are asleep behind the wheel, blindly following ChatGPT down wrong or dangerous avenues? (This risk is sometimes called automation bias)
Is generative AI the right thing to teach?
Teaching time certainly is precious. In my seminar this fall, it was hard work to remove two lessons to make room for a module on generative AI. I liked those old lessons, but I felt good about the tradeoffs I made to create the space. This semester, I've integrated generative AI into several weeks as a supplement rather than the whole lesson.
One special concern about generative AI is that it’s not yet a tool we can trust 100%. If we teach how to use it, we’re necessarily giving students less time to practice the very skills that they’ll need in that 20-30% of the time that AI gets things wrong. This can be a hard pill to swallow. Can students gain the expertise to fix a misleading memo section when they’re practicing the skill, say, 50% less often?
My recommendation: don't drop these lessons. Teach them as you normally would for now, but spend time helping your students discuss how to safely use ChatGPT to enhance the work. For example: if you do a writing exercise, have the students ask ChatGPT for feedback, to simplify language, or to proofread their work. If you are doing a creative or storytelling exercise (such as helping draft a closing argument), have the student brainstorm with ChatGPT and then evaluate the best ideas. These can supplement teacher feedback without robbing the students of the experience of doing the work.
Should we focus on teaching very specific skills, like prompt engineering? Probably not as standalone skills, but it's a good idea to assist students with them when they help with a more specific goal. Helping students understand where the tools work well and where they fail will translate across different tools. Prompt engineering in particular may not be around in its current form in a year, but it certainly gives insight behind the magic curtain of the large language model.
How to think about those cutely named “hallucinations”
Generative AI can invent facts. It is, after all, a fancy randomized autocompletion tool: a "stochastic parrot," as Emily Bender et al. put it. As far as we understand, there is nothing inherent in generating lots of probable sentences that also makes them true about the world. We certainly don't think that LLMs have a model of truth like our human one.
We should almost never rely on a large language model as a database of facts. For example, it doesn’t have a factual list of all of the holdings of thousands of federal court cases. This isn’t a barrier to many use cases. We don’t always need databases of facts to do our daily work. For example: we spend more time editing our words than writing them, in most cases. We can always ask ChatGPT to reason over or transform information that we supply without needing that database of facts.
Note, though: the fact that large language models are biased toward more frequent sentences means that some sentences that are very probable also turn out to be true.
This turns out to be pretty useful: if we want to make statements that are true about the world, an LLM can do a nice job with sentences that are frequently represented in the training data. It is unlikely to produce true sentences when they sit further out on the "long tail" of probability. Piek Vossen, who founded the Global Wordnet Association, spoke about this at the 2023 Jurix conference on Legal Knowledge and Information Systems. Wordnet uses a symbolic logic approach: only perfectly curated, true facts are added to a "semantic network." Piek described how LLMs have immediately transformed that work. Despite the risk of inaccuracy, ChatGPT can be more useful than the symbolic logic approach because it represents so many more facts. The important factor is awareness of the long tail, where it's most dangerous. Using the two in combination can improve the usefulness of both.
Even in these “safe” tasks, there’s always a risk that ChatGPT’s transformation, summary, or analysis of our text will be wrong. But we should try to choose tasks where we can evaluate the truthfulness of the work that ChatGPT does.
Will using ChatGPT perpetuate bias?
There’s no denying that ChatGPT can reproduce biases. So can humans. It is important when choosing an AI tool to make decisions that we are aware of the problems of biased training data. We don’t want to use machine learning models to set higher bail for poor, Black people just because that’s historically what happens, or to make medical decisions that replicate debunked medical racism.
Understanding bias, and its presence in ChatGPT, can sometimes help us work around it. Sometimes we can fix bias piecemeal by hand, just like we can fix factual inaccuracies. We can also eliminate the riskiest use cases. Just as with factual accuracy, the presence of bias doesn't eliminate the areas where ChatGPT can be used safely.
We are also not looking at the final version of these tools. We should put pressure on the vendors to improve the training data and perhaps to improve moderation (although this can be its own problem as I recently discussed on LinkedIn).
Will we end up with incapable lawyers?
There are probably two concerns here. Automation bias is a fancy term for being asleep at the wheel. Famously, it has been blamed for causing several Tesla crashes (although the most recent jury decision on this held Tesla not liable). Logically, it makes sense that if lawyers use ChatGPT for their work and it works almost all of the time, they might not catch the times that it fails.
But using ChatGPT is not the same as driving a car. We can slow down and critically evaluate the output each time we use ChatGPT. We can run thousands of simulations for use cases that run without supervision. Lawyers will likely use ChatGPT if it's useful and saves them time; training them to do this kind of critical evaluation of different use cases will help cut down on the erroneous ones.
How ChatGPT can serve law students in clinics
Get beyond the blank page
It turns out a lot of problems can be represented as text. Computer code, for example, is just text, and LLMs can generate it quickly. Often, reacting to a first draft of writing, or to a prototype of a problem that you solve with computer code, can help you gain important insights. We just need to remember that the first draft can contain mistakes, and read it with a critical eye.
Yes, there is a danger that we will be anchored to that first draft. We can use strategies like varied prompting and directly asking the tool for counterexamples, but ultimately we’ll need to use careful judgment here.
A recent paper from the Wharton School described a fascinating experiment demonstrating that ChatGPT could perform almost as well as human teams at brainstorming tasks. Getting a hundred ideas to evaluate can be critical to starting a new project off on the right foot.
Extracting and classifying information
We have built many machine learning models around the idea of classifying or extracting information. The field of natural language processing, for example, often focuses on tasks like pulling names, addresses, and facts like dollar amounts out of unstructured text. These tools work well on well-described tasks but can require expensive training. ChatGPT has been shown to work well out of the box at a wide range of these tasks. As always, it can take evaluation to know whether it works well for your particular task.
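To make this concrete, here is a minimal sketch of how a clinic project might prompt an LLM for structured extraction. The field names, prompt wording, and sample reply are my own illustrative assumptions, not output from any real model:

```python
import json

def extraction_prompt(text: str, fields: list[str]) -> str:
    """Build a prompt asking an LLM to pull named fields out of unstructured text."""
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the text below and reply with JSON "
        f"containing exactly these keys: {field_list}. Use null for anything missing.\n\n"
        f"Text:\n{text}"
    )

# A model reply would be parsed like this (sample reply invented for illustration):
sample_reply = '{"name": "Jane Doe", "address": "12 Oak St.", "amount": "1500.00"}'
record = json.loads(sample_reply)
print(record["name"])
```

In a real pipeline you would send the prompt to a model API and validate the JSON that comes back before trusting any field.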
Similarly, a classic litigation task is to go through documents to decide what is responsive to discovery, or to provide an appropriate response to an improper objection to a discovery request. This kind of task can be done by classical machine learning models, but out of the box ChatGPT does it well across a wide range of fields.
You can use the LLM for any task that involves a rubric applied to text. This can be useful for students to grade their own practice work using criteria that you supply.
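A rubric-grading prompt can be assembled mechanically from criteria the instructor supplies. This is a hypothetical sketch; the criteria and wording are invented for illustration:

```python
def rubric_feedback_prompt(rubric: dict[str, str], draft: str) -> str:
    """Ask an LLM to grade a practice draft against instructor-supplied criteria."""
    criteria = "\n".join(f"- {name}: {desc}" for name, desc in rubric.items())
    return (
        "You are a clinical instructor. Evaluate the student draft against each "
        "criterion below. For each, say whether it is met and quote the relevant "
        f"passage.\n\nRubric:\n{criteria}\n\nDraft:\n{draft}"
    )

# Example criteria an instructor might supply for a client letter exercise:
rubric = {
    "Plain language": "Avoids jargon; readable by a non-lawyer.",
    "Complete relief": "States every remedy the client is asking for.",
}
print(rubric_feedback_prompt(rubric, "Dear Ms. Smith, ..."))
```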
On a smaller scale, at Suffolk we have been actively experimenting with using LLMs to give users a more natural interface to a form. If the user wants to start out by telling their whole story, we can use the LLM to read their response and map their answers onto the questions on the form, then let the user review its selections. This also works for turning unstructured notes into entries in case management systems. It can be a huge time saver for tedious data entry tasks, both internal and client-facing.
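Here is one way the story-to-form idea could be sketched in code. The form fields and the sample model reply are made up for illustration; the key design choice is that the user always reviews the mapped answers before anything is saved:

```python
import json

# Hypothetical form questions for an eviction-defense intake.
form_fields = {
    "tenant_name": "Full name of the tenant",
    "monthly_rent": "Monthly rent amount in dollars",
    "notice_date": "Date the eviction notice was received",
}

def fill_form_prompt(story: str) -> str:
    """Ask the model to map a free-form narrative onto specific form questions."""
    schema = "\n".join(f'- "{k}": {v}' for k, v in form_fields.items())
    return (
        "Read the client's story and answer each form question. Reply with JSON "
        f"using exactly these keys (null if unknown):\n{schema}\n\nStory:\n{story}"
    )

# Illustrative model reply; in practice always show this to the user for review.
reply = '{"tenant_name": "J. Rivera", "monthly_rent": 1200, "notice_date": null}'
answers = json.loads(reply)
for field, value in answers.items():
    print(f"{field}: {value if value is not None else '(please fill in)'}")
```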
Editor, critic, or proofreader
ChatGPT never gets tired of reading our writing over and over again and offering suggestions for improvement. A few favorite prompts of mine after feeding it a draft of a paper:
- What questions would someone have after reading this?
- What’s missing?
- What’s unclear?
- How could this be improved?
- How could the order be strengthened?
When the student starts out by supplying the draft, the editor role can provide constant feedback that peers or supervisors may not be able to supply on demand.
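The critic prompts above are easy to script so that a student runs every one of them against a single draft. A minimal sketch:

```python
# The stock critic questions from the list above.
CRITIC_PROMPTS = [
    "What questions would someone have after reading this?",
    "What's missing?",
    "What's unclear?",
    "How could this be improved?",
    "How could the order be strengthened?",
]

def critic_round(draft: str) -> list[str]:
    """Pair every stock critic question with the draft, ready to send one by one."""
    return [f"{question}\n\nDraft:\n{draft}" for question in CRITIC_PROMPTS]

for message in critic_round("Your honor, my client..."):
    print(message.splitlines()[0])  # show which question is being asked
```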
Prompting for deeper answers
This year in my seminar, we've been using this simple Docassemble interview to get student reflections. Students write a response to the week's readings; the interview checks their response against a rubric and asks as many follow-up questions as are required to meet it. In past years, at best I was able to add a small number of longer comments in reply to student reflections. This year, each student gets the chance to reflect at least a bit deeper. I got the idea from a pretty interesting Slate article on using ChatGPT to help with course design; the article suggests simply asking students to run their reflections through ChatGPT, while my version wraps the same idea up a little more neatly to integrate with our course.
Translating, simplifying, summarizing, synthesizing
Have a bunch of text and something you want to do to it? ChatGPT is quite safe for this task, although there’s still a risk of hallucination. (These tend to be errors in the summary, rather than outright fabrications.)
- ChatGPT is generally recognized to outperform earlier transformer-based language translation tools like Google Translate. (Note: the underlying architecture isn't an OpenAI invention; the transformer behind GPT, the generative pre-trained transformer, was first described at Google.) This makes it a realistic candidate for drafting machine translations. Importantly, you can give it extra instructions, like telling it to leave HTML tags and the like alone.
- I've tested both ChatGPT and Google Bard's new Gemini model for simplifying language. A sample prompt: "Rewrite the following letter at a 6th grade reading level. Replace passive voice with active voice." This can be a very helpful tool for law students facing this daunting task in client communications.
- Summarizing: have a lot of text and want to provide a TL;DR? This is a great task for an LLM. For example: you might have a decision from a judge or ALJ that you want to explain to your client.
- Synthesizing: have a bunch of notes from different sources? Want to create a cohesive narrative? With the Pro version of ChatGPT, you can feed it a knowledge base and ask it questions. Prepping for a trial? Asking ChatGPT to come up with the most cohesive story, especially if you give it the text of the relevant statute and your case materials, can be a useful way to get started. Think of this as helping you find the needle in a haystack. You may need to try with different queries to gain confidence you’re not missing something, but this can greatly speed up your work.
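All four of these transformations share the same shape: an instruction plus the text to transform. A small helper can standardize them. The exact instruction wording here is my own suggestion, not a tested recipe:

```python
def transform_prompt(task: str, text: str, grade_level: int = 6) -> str:
    """Wrap a chunk of text in one of the 'safe' transformation requests."""
    instructions = {
        "simplify": f"Rewrite the following at a grade {grade_level} reading level. "
                    "Replace passive voice with active voice.",
        "summarize": "Write a short plain-language summary a client could understand.",
        "translate": "Translate the following into Spanish. Leave any HTML tags unchanged.",
    }
    return f"{instructions[task]}\n\n{text}"

print(transform_prompt("simplify", "The aforementioned premises shall be vacated..."))
```

Keeping the instructions in one place makes it easy to refine the wording as you learn which phrasings the model follows reliably.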
Chat interface to vetted knowledge bases
Retrieval augmented generation (RAG) is one of the current best approaches to solving the problem of hallucinations, and there are dozens of ways to do it. Cornell Law students, along with the teams at Josef Q and SixFifty, recently had a very interesting project that involved creating a knowledge base of housing questions and answers. The Josef Q tool allows you to blend retrieval augmented generation with traditional moderation, giving the best of both worlds.
If you don’t want to use a tool like Josef Q, OpenAI Pro’s new Custom GPT feature provides some of the same benefits, without the integrated moderation. There are also several open source projects that help you quickly create a tool built around RAG.
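For intuition, here is the core control flow of retrieval augmented generation in miniature. Real systems retrieve with embeddings and a vector store; this sketch substitutes a crude word-overlap score, and the knowledge base entries are invented examples, not legal advice:

```python
def score(question: str, doc: str) -> int:
    """Crude relevance score: count words shared between question and document."""
    return len(set(question.lower().split()) & set(doc.lower().split()))

# A tiny stand-in for a vetted, human-curated knowledge base.
knowledge_base = [
    "Tenants must receive 14 days written notice before an eviction for nonpayment.",
    "Security deposits must be returned within 30 days of moving out.",
]

def rag_prompt(question: str) -> str:
    """Retrieve the best-matching vetted answer, then ground the prompt in it."""
    best = max(knowledge_base, key=lambda doc: score(question, doc))
    return (
        "Answer using ONLY the vetted source below. If the source does not "
        f"answer the question, say so.\n\nSource: {best}\n\nQuestion: {question}"
    )

print(rag_prompt("How many days notice before an eviction?"))
```

The instruction to answer only from the retrieved source, and to admit when the source doesn't cover the question, is what makes this approach safer than asking the model to answer from memory.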
Don’t fear the GPT-er
There's a current hype cycle around LLMs, but don't let that fool you. Cory Doctorow's piece on this is great. In "What kind of bubble is AI?" Doctorow explores parallels to the dot-com boom of the late 1990s and early 2000s. Many companies scammed and hyped their way to big dollars and then crashed, yet the useful companies survived and are now the backbone of the global stock market. It's undeniable that the Internet has totally transformed the way we work today. Doctorow hints at this, but I'll come out and say it: there are scammers and hype artists around AI right now. But generative AI has real, practical uses. It's getting cheaper and better. It's not going anywhere, and ignoring it in clinical practice will harm your law students.
Your commitment doesn't have to be big. Even a few lessons tacked onto what you already teach, exploring where the tools work and where they fail, will serve your students well, and keep them from sneaking the tools in behind your back in unsafe ways!