This article analyzes the creative imagination capacities of ChatGPT and two software settings to control it: Temperature and Nucleus Sampling (aka Top-P). It is easy to change these parameters, as will be shown by multiple examples. Knowing how to use these settings will make your work with Ai more enjoyable, productive and accurate.
According to Sam Altman, OpenAI’s CEO, when you reduce the Temperature and Top-P values, you not only reduce the creativity of the responses, you reduce the chances of error and hallucinations. This will be explained in the context of Sam Altman’s deep understanding of creativity and the role of mistakes in the creative process.
The insights and software control skills explained in this article can empower anyone to dial in the right balance of creativity and mistakes for a particular ChatGPT-4 assisted project. Lawyers, for instance, may, for most of their uses, want to lower the default settings, which are high on creativity. This should improve the probability of accurate, delusion free answers. That makes most lawyers and their bots very happy. Judges too. All of this, and more, will be explained.
Sam Altman on GPT Creativity
In the What is the Difference Between Human Intelligence and Machine Intelligence? blog I quoted portions of Sam Altman’s video interview at an event in India by the Economic Times to show his “tool not a creature” insight. There is another Q&A exchange in that same YouTube video starting at 1:09:05, that addresses creativity and mistakes.
Questioner (paraphrased): [I]t’s human to make mistakes. All people we love make mistakes. But an Ai can become error free. It will then have much better conservations with you than the humans you love. So, the AI will eventually replace the imperfect ones you love, the Ai will become the perfect lover.
Sam Altman: Do you want that? (laughter)
Sam Altman: (Sam explains AI is a tool not a creature, as I have quoted before, and before addressing the intimacy lover aspect, as I quoted in Code of Ethics for “Empathetic” Generative AI, Sam talks about creativity.) On the question of mistakes and errors, I believe that creativity and certainly the creation of new knowledge, is very difficult, maybe impossible, without the ability to make errors and come up with bad ideas. So, if you made a system that would never tell you anything that it was not absolutely sure was a fact, you would lose some creativity in that process.
One of the reasons people don’t like ChatGPT is because it hallucinates and makes stuff up, but one of the reasons they do like it is because it can be creative. What we want is a system than can be creative when you want, which means sometimes being wrong, or saying something you are not sure about, or experimenting with a new idea. Then when you want accuracy, you can get accuracy.Sam Altman, June 7, 2023, at an Economic Times event in India
Sam Altman’s answer here assumes you know about ChatGPT’s creativity volume controls, where you can, if you want, turn the creativity volume down to zero. In so doing, you will improve accuracy, but the response will often be boring. Boring, but accurate, may be just what you want sometimes, but that is not the default setting for ChatGPT, as will be explained and demonstrated.
ChatGPT Creativity Settings
This section provides a technical explanation of these two settings. Much of this is difficult to understand, but worry not, and plough through it, because after this comes an easy to follow demonstration of what it all means. Multiple examples will be provided to allow you to see for yourself how the GPT controls work in practice. That is the hacker “hands on” e-Discovery Team way.
First, the technical explanation of the two volume controls for GPT creativity: Temperature and Nucleus Sampling (aka Top-P). Both typically have settings of between zero and one, 0.0 and 1.0.
TEMPERATURE: Technically temperature affects the probability distribution over the possible tokens at each step of the generation process. A temperature of 0 would make the model completely deterministic, always choosing the most likely token. The “temperature” setting in GPT and similar language models, such as ChatGPT, controls the randomness of the model’s responses. A higher temperature value makes the model’s responses more random, while a lower, cooler value, makes the responses more deterministic and focused. See eg. Cheat Sheet: Mastering Temperature and Top_p in ChatGPT API (OpenAI Forum). Temperature values are said to produce a more focused, consistent, and deterministic output. It is like going from water, the higher 1.0 value, to ice, the colder, more probable value of 0.0.
Typically, OpenAI experts say a higher temperature (e.g., 0.8) may be suitable when you want a range of ideas, brainstorming suggestions, or creative writing prompts. A lower temperature (e.g., 0.2) is more appropriate when you’re looking for a precise answer, a more formal response, or when the context demands consistency. The default setting for ChatGPT 3.5 and 4.0 is 0.7. That’s pretty hot, especially for most legal work. No doubt OpenAI have put a lot of research into that default setting, but I could not find it. Id., Also see: Prakash Selvakumar, Text Generation with Temperature and Top-p Sampling in GPT Models: An In-Depth Guide (4/29/23).
Open AI says that finding the right temperature setting may require experimentation to strike a balance between creativity and consistency that suits your specific needs. Id. This sounds like a good lawyer answer of “it depends.” There are a tremendous number of variables, different questions and needs, different circumstances. That is the same situation lawyers are in with many legal questions. Plus the OpenAI software itself is constantly being updated, even though the version number of 4.0 has not been changed since March 2023.
The Bottom line for lawyers is that the default setting of 0.7 is pretty high in the random predictions scale. Unless you are looking for clever, very creative language or legal imagination – off the wall ideas – lawyers and judges should use a lower setting. Maybe dial down the random creativeness to 0.2, or even zero – 0.0 – for maximum route parroting of the most probable information. You just want the cold truth.
As Sam Altman explained, lowering the temperature setting also makes it more likely that your answers will not have as many mistakes or hallucinations. Note that I did not say no mistakes, the software is too new, and life is too complex to say that. Human lawyers are still needed to verify the Ai. Just because it appears much smarter than you, it can still be wrong, no matter how conservative the temperature setting. Think of the brilliant, very creative, higher IQ than you, conservatively dressed, young associate with little or no actual legal experience.
NUCLEUS SAMPLING (aka TOP-P): Top-P sampling is an alternative to temperature sampling. Technically, this means that instead of considering all probable tokens that are likely to come next, the Top-P parameter directs GPT to consider only a subset of all probable tokens (the nucleus) whose cumulative probability mass adds up to a certain threshold, the (Top-Probablity). For example, if Top_P is set to 0.1, GPT will consider only the tokens that make up the top 10% of the probability mass for the next token. This allows for dynamic vocabulary selection based on context. The setting values for Top-P are, like temperature, between 0.0 and 1.0.
Put another way, the Top-P sampling parameter maintains a balance between diversity and high-probability words by selecting tokens from the Top-P most probable tokens. They are the tokens whose collective probability is greater than or equal to a specified threshold p. The Top-P parameter helps ensure that the chatbot response is both diverse and relevant to the given context. Text Generation with Temperature and Top-p Sampling in GPT Models: An In-Depth Guide
For greater technical detail, see the scientific paper: The Curious Case of Neural Text Degeneration (2019). The paper abstract explains:
The counter-intuitive empirical observation is that even though the use of likelihood as training objective leads to high quality models for a broad range of language understanding tasks, using likelihood as a decoding objective leads to text that is bland and strangely repetitive. . . . Our findings motivate Nucleus Sampling, a simple but effective method to draw the best out of neural generation. By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better demonstrates the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi
From my study, the Top-P prompt parameter is seldom employed in ChatGPT. Most users prefer to just stick with the Temperature control variation. But still, try it a few times and see what you think, especially if temperature control variations are not working well for some reason on a particular challenge. Software customization is always a good thing. I have not used it enough myself to have any more specific conclusions or recommendations.
As Prakash Selvakumar says in his good article, Text Generation with Temperature and Top-p Sampling in GPT Models: An In-Depth Guide:
When and How to Tweak These Parameters As a business user, you might need to tweak these parameters to get the desired output quality, depending on the specific use case. Temperature:
- If the generated text is too random and lacks coherence, consider lowering the temperature value.
- If the generated text is too focused and repetitive, consider increasing the temperature value.
- If the generated text is too narrow in scope and lacks diversity, consider increasing the probability threshold (p).
- If the generated text is too diverse and includes irrelevant words, consider decreasing the probability threshold (p).
Here is the guide found in the OpenAI Forum on how these parameters can be used in different scenarios, with example values:
|Code Generation||0.2||0.1||Generates code that adheres to established patterns and conventions. Output is more deterministic and focused. Useful for generating syntactically correct code.|
|Creative Writing||0.7||0.8||Generates creative and diverse text for storytelling. Output is more exploratory and less constrained by patterns.|
|Chatbot Responses||0.5||0.5||Generates conversational responses that balance coherence and diversity. Output is more natural and engaging.|
|Code Comment Generation||0.3||0.2||Generates code comments that are more likely to be concise and relevant. Output is more deterministic and adheres to conventions.|
|Data Analysis Scripting||0.2||0.1||Generates data analysis scripts that are more likely to be correct and efficient. Output is more deterministic and focused.|
|Exploratory Code Writing||0.6||0.7||Generates code that explores alternative solutions and creative approaches. Output is less constrained by established patterns.|