Key concepts
Understand the basics of GPT models.


The prompt is how you “program” the model to produce the response you’d like. These models can do everything from writing original stories to generating code. Because of this wide array of capabilities, you have to be explicit in showing the model what you want. Telling and showing is the secret to a good prompt.
These models try to guess what you want from the prompt. If you write the prompt “Give me a list of fiction books”, the model may not automatically assume you’re asking for a list of books. Instead, it may treat the prompt as the start of a conversation and go on to say “and I’ll tell you my favorite.”
There are three basic tips to creating prompts:
1. Show and tell
Make it clear what you want through a combination of instructions and examples. Returning to the previous example, instead of:
“Give me a list of fiction books”
try:
“Give me a list of fiction books. Here’s an example list: Harry Potter, Game of Thrones, Lord of the Rings.”
2. Check your settings
The temperature and top_p parameters are the ones you will typically configure for each task. They control how deterministic the model is in generating a response. A common mistake is to think of them purely as “creativity” controls; what they actually change is how much randomness goes into choosing each token. If you're looking for a response that's not obvious, you might want to set them higher. If you're asking for a response where there's only one right answer, you'd want to set them lower. GPT-J parameters are covered in more detail later in this document.
3. Provide quality data
If you’re trying to classify text or get the model to follow a pattern, make sure that there are enough examples. Quantity alone is not sufficient: the examples should also be proofread for spelling and grammatical errors. While the model is usually capable of seeing through simple errors, it may believe they are intentional, which can affect the response.
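The three tips above can be combined in a small helper that assembles an instruction, a handful of proofread examples, and the new input into one prompt. This is only a sketch: the function name `build_few_shot_prompt` and the `Text:` / `Sentiment:` labels are illustrative assumptions, not a format that GPT-J requires.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Combine an instruction ("tell") and worked examples ("show") into one prompt.

    The "Text:" / "Sentiment:" labels are hypothetical; any consistent
    pattern works, as long as every example follows it exactly.
    """
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"Text: {text.strip()}")
        lines.append(f"Sentiment: {label.strip()}")
        lines.append("")
    lines.append(f"Text: {query.strip()}")
    lines.append("Sentiment:")  # ends without trailing whitespace
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each text as Positive or Negative.",
    [("I loved this book.", "Positive"), ("The plot was dull.", "Negative")],
    "A wonderful read from start to finish.",
)
print(prompt)
```

Keeping every example in the same shape makes the pattern easier for the model to pick up, and stripping each field avoids stray whitespace inside the prompt.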


Whitespace, the characters produced by keys such as the spacebar, can be one token or several depending on how it is used. Never leave trailing whitespace at the end of your prompt, as it can have unintended effects on the model’s response.
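A simple guard against trailing whitespace is to strip it before the prompt is sent. The helper name `clean_prompt` is an assumption made for illustration.

```python
def clean_prompt(prompt: str) -> str:
    """Remove trailing spaces, tabs, and newlines from a prompt."""
    return prompt.rstrip()


raw = "Q: What is the capital of France?\nA: "
print(repr(clean_prompt(raw)))
```

Leading and interior whitespace is left untouched, since it may be part of the pattern the examples establish.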


GPT-J understands and processes text by breaking it down into tokens. As a rough rule of thumb, 1 token is approximately 4 characters. For example, the word “television” gets broken up into the tokens “tele”, “vis” and “ion”, while a short and common word like “dog” is a single token. Tokens are important to understand because GPT-J, like other language models, has a maximum context length, in this case 2048 tokens, or roughly 1500 words. The context length includes both the text prompt and the generated response. We recommend using OpenAI's tokenizer tool to see how your text will be tokenized.
Some other helpful rules of thumb for understanding tokens, in terms of lengths:
  • 1 token ~= ¾ of a word, i.e. 100 tokens ~= 75 words
  • 1-2 sentences ~= 30 tokens
  • 1 paragraph ~= 100 tokens
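The rules of thumb above can be turned into a quick estimator for checking prompt size before sending a request. This is a rough sketch only: the function names are assumptions, and a real tokenizer will give different counts for unusual text.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb: 1 token is about 4 characters
WORDS_PER_TOKEN = 0.75   # rule of thumb: 1 token is about 3/4 of a word


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate the token count from the number of characters."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Estimate the token count from the number of whitespace-separated words."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


sample = "GPT-J understands and processes text by breaking it down into tokens."
print(estimate_tokens_from_chars(sample), estimate_tokens_from_words(sample))
```

The two estimates will usually disagree slightly; taking the larger one is the more conservative choice when checking against the context limit.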


Parameters are different settings that control the way in which GPT-J responds. Becoming familiar with the following parameters will allow you to apply GPT-J to a number of different tasks.

Response length

Response length is the length of the desired completion, in tokens. A token is roughly 4 characters including alphanumerics and special characters.
Note that the maximum number of tokens (prompt + completion) that can be processed in a single request is 2048.
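Because the prompt and the completion share the same 2048-token budget, the largest usable response length depends on how long the prompt is. A minimal sketch, assuming the prompt's token count is already known (the function name is hypothetical):

```python
MAX_CONTEXT_TOKENS = 2048  # prompt + completion limit stated above


def max_response_length(prompt_tokens: int, requested: int) -> int:
    """Clamp a requested response length to what still fits in the context."""
    available = MAX_CONTEXT_TOKENS - prompt_tokens
    if available <= 0:
        raise ValueError("The prompt alone fills the context; shorten it.")
    return min(requested, available)


print(max_response_length(prompt_tokens=2000, requested=100))
print(max_response_length(prompt_tokens=500, requested=100))
```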


Temperature controls the randomness of the generated text. A value of 0 makes the model deterministic, which means that it will always generate the same output for a given input text. A value of 1 substantially increases the randomness of the generated tokens.
As a frame of reference, it is common for story completion or idea generation to use temperature values between 0.7 and 0.9. On the other end of the spectrum, classification and named entity recognition tasks typically use temperature values between 0 and 0.2.
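To see why lower temperatures make output more predictable, it helps to look at how temperature is conventionally applied: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The sketch below uses made-up logits for three candidate tokens and treats a temperature of 0 as always picking the top token; it illustrates the general technique and is not GPT-J's actual implementation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Deterministic limit: all probability goes to the highest logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
for t in (0, 0.2, 0.8, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At low temperatures almost all of the probability sits on the top-scoring token, while at 1.0 the lower-scoring tokens keep a meaningful chance of being sampled.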


Top-P is an alternative way of controlling the randomness of the generated text. We recommend that only one of Temperature and Top-P is used, so when using one of them, make sure that the other is set to 1.
A rough rule of thumb is that Top-P provides better control for applications in which GPT-J is expected to generate text with accuracy and correctness, while Temperature works best for those applications in which original, creative or even amusing responses are sought.
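Top-P is commonly implemented as nucleus sampling: keep the smallest set of most likely tokens whose probabilities add up to at least P, then renormalize and sample from that set. The sketch below shows the filtering step on a made-up distribution; the function name is an assumption, and the exact behavior of a given GPT-J deployment may differ.

```python
def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p.

    Returns a dict mapping token index to renormalized probability.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token probabilities
print(top_p_filter(probs, 0.75))  # only the two most likely tokens survive
print(top_p_filter(probs, 1.0))   # top_p = 1 keeps every token
```

This also shows why setting Top-P to 1 effectively switches it off: no token is removed, so only Temperature shapes the distribution.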


Top-K refers to the number of highest-probability tokens that are kept as candidates for sampling. Top-K sampling means sorting the tokens by probability and zeroing out the probabilities of everything below the k'th token. A lower value improves quality by removing the tail of unlikely tokens, making the model less likely to go off topic.
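The sort-and-zero-out step described above can be sketched in a few lines. The probabilities are made up for illustration, and the function name is an assumption.

```python
def top_k_filter(probs, k):
    """Zero out every probability outside the k most likely tokens, then renormalize."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]


probs = [0.1, 0.4, 0.2, 0.3]  # hypothetical next-token probabilities
print([round(p, 3) for p in top_k_filter(probs, 2)])
```

After filtering, only the two most likely tokens can be sampled, and their probabilities are rescaled to sum to 1.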

Repetition penalty

Repetition penalty works by lowering the chances of a word being selected again the more times that word has already been used. In other words, it works to prevent repetitive word usage. Our recommended range for most use cases is 1 to 1.5, where 1 applies no penalty.
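One widely used formulation is the CTRL-style penalty, in which the logit of every token that has already been generated is made less favorable by the penalty factor. The sketch below shows that formulation as an illustration only: it penalizes a token once it has appeared, regardless of how many times, and whether a given GPT-J deployment uses exactly this formula is an assumption.

```python
def apply_repetition_penalty(logits, generated_token_ids, penalty):
    """Lower the logits of tokens that already appear in the generated text.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so both move toward "less likely". A penalty of 1
    leaves the logits unchanged.
    """
    adjusted = list(logits)
    for token_id in set(generated_token_ids):
        if adjusted[token_id] > 0:
            adjusted[token_id] /= penalty
        else:
            adjusted[token_id] *= penalty
    return adjusted


logits = [2.0, -1.0, 0.5]   # hypothetical logits for a 3-token vocabulary
already_generated = [0, 1, 1]
print(apply_repetition_penalty(logits, already_generated, 1.5))
```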

Stop sequences

Stop sequences allow you to define one or more sequences that, when generated, force the model to stop.
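The effect of stop sequences can be reproduced on already generated text by cutting it at the earliest stop sequence found. This sketch assumes the stop sequence itself is excluded from the result, which may not match every API; the function name is hypothetical.

```python
def truncate_at_stop(text, stop_sequences):
    """Return the text up to (not including) the earliest stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty strings, which would match at position 0
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


generated = "A: 42\nQ: What comes next?\nA: 43"
print(repr(truncate_at_stop(generated, ["\nQ:", "###"])))
```

In a question-and-answer prompt, using the label of the next question as a stop sequence keeps the model from inventing further exchanges after it has answered.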


Achieving the performance you're looking for can be simple or take some time depending on the complexity of your task. If you're having trouble getting the API to perform as expected, follow this checklist:
  1. Is it clear what the intended generation should be? (Could a 5th grader understand your prompt?)
  2. Are there enough examples? (In your prompt or your fine-tuning dataset)
  3. Did you check your examples for mistakes? (Consistent formatting and proper grammar and spelling are important)
  4. Are you using temperature, repetition penalty, top_p, and other parameters appropriately?