Key concepts

Understand the basics of large language models.

Large language models

Large, pre-trained, general-purpose language models can be thought of as advanced autocomplete engines, similar to the one on your smartphone but much more capable.
The basic idea of a pre-trained language model is as follows. During pre-training, the model's parameters are trained on a very large amount of text to instill a general understanding of natural language. A “parameter” refers to a value the model can adjust independently as it learns. A larger parameter count increases the size of the model and generally improves its performance on language tasks.
These pre-trained models can be used in "zero-shot" or "few-shot" settings where little domain-specific data is available. Alternatively, you can run another round of training, called fine-tuning, where you apply the pre-trained model to a specific task and further adjust its parameters using a relatively small amount of labeled data.

Prompts and completions

The prompt is text that you input to the model, and the model will respond with a text completion that attempts to match whatever context or pattern you give it.
A simple prompt would be:
Write a list of science fiction books:
Once you submit the prompt, the completion would look like:
1. The Stars My Destination by Alfred Bester
2. The Left Hand of Darkness by Ursula K. Le Guin
3. The Robots of Dawn by Isaac Asimov
The above example would be referred to as a "zero-shot" prompt because the model was given no prior examples of the task in the prompt.
The actual completion you see may differ because the models are stochastic by default. This means that you might get a slightly different completion every time you call it, even if your prompt stays the same.
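This stochastic behavior can be illustrated with a toy sampling sketch. The tokens and probabilities below are made up for illustration; real probabilities come from the model itself.

```python
import random

def sample_next_token(token_probs, seed=None):
    """Sample one token from a probability distribution over candidates."""
    rng = random.Random(seed)
    tokens = list(token_probs)
    weights = [token_probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up distribution over possible next tokens after "The cat sat on the".
probs = {" mat": 0.6, " sofa": 0.25, " roof": 0.15}

# Without a fixed seed, repeated calls may return different tokens.
print({sample_next_token(probs) for _ in range(20)})

# With a fixed seed, sampling is reproducible.
assert sample_next_token(probs, seed=0) == sample_next_token(probs, seed=0)
```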
This text-in, text-out interface means you can "program" the model by providing instructions and just a few examples of what you'd like it to do. A model's performance will generally depend on the complexity of the task and the quality of your prompt. A well-written prompt provides enough information for the model to know what you want and a few examples of how it should respond.
An example of a well-written prompt would look like:
Write an icebreaker for a cold email based on the following company information. The icebreaker should be authentic and human-like, while being polite, positive, and friendly.
Company name: Perpay
Industry: Retail
Company description: Purchase the products and brands you love by making small, easy payments straight from your paycheck. No credit check, no interest.
Icebreaker: So much has been said about how difficult it is for people to build credit, and I think that your team is doing a great job of helping people get access to things they want and need without having to wait years to build up a history.
Company name: Bear Mattress
Industry: Consumer Products & Services
Company description: Bear is a leading sleep company with a mission to improve the health, wellness and overall sleep quality of every customer, every night. 120-night risk-free trial. Greenguard gold certified. Made in the usa. Lifetime warranty.
Icebreaker: I have never been a good sleeper, so I can't tell you how much I appreciate what Bear Mattress is doing to help people sleep better. Plus, I love that your company is committed to quality and making things right -- it's such an important philosophy!
Company name: Cience Technologies
Industry: Advertising & Marketing
Company description: Voted best B2B lead generation services company, Cience produces qualified sales leads engaging prospects on phone, email, social, chat, web and ads.
Icebreaker: Funny story -- I was just talking to a friend who is a marketing manager for a big company, and he was telling me how hard it is to find good lead gen companies. Will have to send him your way, I think he'll really like your platform.
Company name: Onfleet
Industry: Logistics & Transportation
Company description: Onfleet makes it easy to manage last mile deliveries. Intuitive routing, dispatching, real-time tracking, analytics, and more.
Notice that we give the model clear instructions for the task and provide a few examples. For the last example, we provide the input data (Company name, Industry, and Company description), but leave the Icebreaker value empty to prompt the model to provide the desired completion.
We also separate each example with two newlines ("\n\n"), which can be used as a stop sequence later. We recommend using few-shot prompts like this or fine-tuning to get optimal performance on any task.
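A prompt like the one above can be assembled programmatically. The helper below is an illustrative sketch, not part of any API: the field names and the hardcoded trailing "Icebreaker:" label are specific to this example's format.

```python
def build_few_shot_prompt(instructions, examples, query):
    """Join instructions, solved examples, and an unsolved query with blank lines.

    Each example is a dict of field -> value. The query omits the final
    field so the model completes it.
    """
    def render(fields):
        return "\n".join(f"{name}: {value}" for name, value in fields.items())

    blocks = [instructions] + [render(e) for e in examples] + [render(query)]
    # "\n\n" separates examples and can double as a stop sequence later.
    # Note: no trailing whitespace after the final label.
    return "\n\n".join(blocks) + "\nIcebreaker:"

prompt = build_few_shot_prompt(
    "Write an icebreaker for a cold email based on the following company information.",
    [{
        "Company name": "Perpay",
        "Industry": "Retail",
        "Company description": "Purchase the products and brands you love.",
        "Icebreaker": "So much has been said about how difficult it is to build credit...",
    }],
    {
        "Company name": "Onfleet",
        "Industry": "Logistics & Transportation",
        "Company description": "Onfleet makes it easy to manage last mile deliveries.",
    },
)
print(prompt)
```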
There are three basic principles when working with large language models:
1. Tell the model what you want it to do. Make it clear what you want through clear instructions and expected behavior.
2. Show the model what you want it to do. Follow clear instructions with a few examples of ideal task performance. This will allow the model to better mimic the expected performance on your task.
3. Check the parameters. Parameters play a key role in model performance. Parameters are different settings that can be tweaked to control model behavior. We'll discuss each parameter you can use next.
Never end your prompt with trailing whitespace (" "); it can have unintended effects on model performance.
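A small helper can guard against this. This is an illustrative sketch: it trims trailing spaces and tabs while preserving any intentional trailing newline.

```python
def clean_prompt(prompt: str) -> str:
    """Remove trailing spaces/tabs, which can degrade completion quality."""
    return prompt.rstrip(" \t")

# Trailing space removed; a deliberate trailing newline is kept.
assert clean_prompt("Write a tagline: ") == "Write a tagline:"
assert clean_prompt("Tagline:\n") == "Tagline:\n"
```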


Parameters

Parameters are different settings that control the way your model provides completions. Becoming familiar with the following parameters will allow you to apply these models to virtually any natural language task.
Prompt text
The prompt to generate a completion for, as a string.
When using base models, we recommend you include instructions and a few examples of the task in your prompt.
When using fine-tuned models, we recommend you format the prompt the same way as in the dataset.
Max tokens length
Defaults to 64. Accepts an integer between 0 and 2,048.
The maximum number of tokens that will be generated in a completion.
We recommend setting length to be slightly greater than the longest generation you expect to receive. The number of tokens in your prompt + length cannot exceed 2048 tokens for most models.
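You can sanity-check this budget with a rough estimate. The sketch below assumes the approximate 4-characters-per-token rule of thumb; real tokenizers vary, so treat the result as an estimate only.

```python
def check_budget(prompt, max_tokens, context_limit=2048, chars_per_token=4):
    """Estimate prompt tokens (~4 chars each) and check prompt + completion fit."""
    est_prompt_tokens = max(1, len(prompt) // chars_per_token)
    fits = est_prompt_tokens + max_tokens <= context_limit
    return fits, est_prompt_tokens

fits, used = check_budget("Write a list of science fiction books:", max_tokens=64)
print(fits, used)  # True 9
```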
Temperature temperature
Defaults to 0.5. Accepts a number between 0 and 1. We recommend altering this or top_p but not both.
The sampling temperature to use. A value of 0 makes the model deterministic, always outputting the same completion given the same prompt. A higher temperature will make lower probability tokens more likely to be generated, resulting in more random or creative outputs.
We recommend using lower temperature values for tasks like classification, entity extraction, or question answering, and higher temperature values for tasks like content or idea generation.
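The effect of temperature can be sketched with a simple temperature-scaled softmax. This is a conceptual illustration, not the model's actual implementation; the logits are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; low temperature sharpens, high flattens."""
    if temperature == 0:
        # Degenerate case: greedy decoding, always pick the most likely token.
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.0)
# Lower temperature concentrates probability on the top token.
assert low[0] > high[0]
```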
Top-P top_p
Defaults to 1. Accepts a number between 0 and 1.
top_p is an alternative way of controlling the randomness of the generated text. When using top_p, make sure that temperature is set to 1.
Generally, top_p provides better control for tasks where the model is expected to generate text with accuracy and correctness.
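Conceptually, top_p (nucleus) sampling keeps the smallest set of most likely tokens whose cumulative probability reaches top_p, then samples only from that set. A toy sketch, with made-up probabilities:

```python
def top_p_filter(token_probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1 before sampling.
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

probs = {" mat": 0.5, " sofa": 0.3, " roof": 0.15, " moon": 0.05}
print(top_p_filter(probs, 0.75))  # keeps only " mat" and " sofa"
```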
Top-K top_k
Defaults to 40. Accepts an integer between 1 and 50,400.
top_k limits sampling to the k most probable tokens; tokens ranked below the k-th are never sampled.
A lower value can improve quality by removing the long tail of less likely tokens and making it less likely to go off topic.
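Conceptually, top_k filtering truncates the distribution to the k most probable tokens before sampling. A toy sketch, again with made-up probabilities:

```python
def top_k_filter(token_probs, k):
    """Keep only the k most probable tokens and renormalize."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {t: p / total for t, p in ranked}

probs = {" mat": 0.5, " sofa": 0.3, " roof": 0.15, " moon": 0.05}
print(top_k_filter(probs, 2))  # keeps the top 2 tokens, renormalized
```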
Repetition penalty repetition_penalty
Defaults to 1. Accepts a number greater than 0.
repetition_penalty works by lowering the probability of a token being generated if it has previously been generated. In other words, it works to prevent repetitive word usage.
Setting repetition_penalty to a number greater than 1 will make the model less likely to repeat itself. It's rare to use a value less than 1, which would encourage repeating behavior.
The recommended value for most use cases is 1. If your model repeats words or phrases frequently, a value between 1 and 1.2 will most likely fix the repeating behavior.
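One common formulation of a repetition penalty (the divisor form popularized by the CTRL paper; the exact implementation behind this parameter may differ) can be sketched as follows, with made-up logits:

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Penalize the logit of each already-generated token."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        l = adjusted[token_id]
        # Shrink positive logits, push negative logits further down.
        adjusted[token_id] = l / penalty if l > 0 else l * penalty
    return adjusted

logits = [3.0, 1.5, -0.5]
# Token 0 was already generated; penalty > 1 makes it less likely to repeat.
print(apply_repetition_penalty(logits, [0], 1.2))
```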
Stop sequences stop_sequences
No default. Accepts an array of strings.
Stop sequences are sequences that, when generated, stop the model from generating further tokens. The completion will not contain the stop sequence.
We recommend using stop sequences for most tasks, but the stop sequences you'll want to use will depend on the format of your prompt. You can assess which stop sequences to use by using the model with no stop sequences and adding common sequences that are generated immediately after you want the model to stop generating.
Take the following prompt:
Write a tagline for the following company.
Company name:
Company description:
Without stop sequences, the model returns the following completion:
Simple. Innovative. Powerful.
Company name: The University of Arizona
Company description: One of the largest and oldest public universities in the United States. Founded in 1885.
Tagline: To educate, to discover, to create.
Company name: AOL
Company description: Online
The model started the completion with the tagline as intended, but continued to follow the pattern of the prompt by writing taglines for other companies. In this case, the model stopped generating because it hit the max tokens length of 64. Looking at the completion, there are a few stop sequences that could be used to consistently limit the model to generating a single tagline.
\n or \n\n: A newline (\n) character is a commonly used stop sequence when you only want the model to generate text on a single line. In the above example, we may want taglines to be able to contain newlines, in which case the double newline (\n\n) can be used.
Company name, Company description, and Tagline: Using newlines as a stop sequence should be sufficient to consistently stop generations at the correct point. However, you could also add other common sequences that were output, like Company name, Company description, and Tagline.
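The effect of stop sequences can be illustrated by truncating the raw completion above at the earliest stop match. This is a conceptual client-side sketch; in practice the API stops generation server-side and the completion never contains the stop sequence.

```python
def truncate_at_stop(completion, stop_sequences):
    """Cut the completion at the earliest occurrence of any stop sequence."""
    cut = len(completion)
    for stop in stop_sequences:
        idx = completion.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]

raw = "Simple. Innovative. Powerful.\n\nCompany name: The University of Arizona"
print(truncate_at_stop(raw, ["\n\n", "Company name:"]))  # Simple. Innovative. Powerful.
```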
Number of completions n
Defaults to 1. Accepts an integer that is 1 or greater.
The number of completions to return for a given prompt.
n > 1 is the equivalent of sending n API requests.
Log probabilities logprobs
Defaults to null. Accepts an integer between 1 and 5.
Include the log probabilities on the logprobs most likely tokens. For example, if logprobs is 3, the API will return a list of the 3 most likely tokens.
The logprob is the log of the probability that a token comes next. logprobs are often used for tasks like classification, question answering, or any other task where it's helpful to understand the probability of sampled tokens.
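A toy sketch of what logprobs represents conceptually. Real token probabilities come from the model; these are made up for illustration.

```python
import math

def top_logprobs(token_probs, n):
    """Return the n most likely tokens with their log probabilities."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return {token: math.log(p) for token, p in ranked}

probs = {" Yes": 0.7, " No": 0.2, " Maybe": 0.1}
result = top_logprobs(probs, 2)
# A logprob close to 0 means the token is nearly certain: log(0.7) ≈ -0.357.
print(result)
```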


Tokens

Large language models understand and process text by breaking it down into tokens. A token can be a word or just a few characters. As a rough rule of thumb, 1 token is about 4 characters.
For example, the word "television" gets broken into the tokens "tele", "vis", "ion", while a short and common word like "map" is a single token. Oftentimes, a token will start with a whitespace, for example " cat" or " dog".
The number of tokens processed in a single API request depends on the token length of your prompt and completion. Prompt tokens will affect response speed much less than completion tokens.
Here are some helpful ways to think about tokens:
  • 1 token is about 4 characters
  • 30 tokens is about 1-2 sentences
  • 100 tokens is about a paragraph
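The rule of thumb above can be turned into a quick estimator. This is approximate only; use a real tokenizer for exact counts.

```python
import math

def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return max(1, math.ceil(len(text) / chars_per_token))

print(estimate_tokens("television"))  # 3 (actual tokenization: "tele", "vis", "ion")
print(estimate_tokens("map"))         # 1 (a short, common word is one token)
```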
Check out OpenAI's tokenizer tool to learn more about how text translates to tokens.