Fine-tuning
How to train models for your use case.

Introduction

Fine-tuning is a powerful technique to create a new model that's specific to your use case. Fine-tuning lets you get more out of your models by providing:
  1. Higher quality results than prompt design
  2. Ability to train on many more examples than can fit in a prompt
  3. Token savings due to shorter prompts
  4. Lower latency requests
GPT models are pre-trained on a vast amount of text from the open internet. When given a prompt with a few examples, they can often understand what task you are trying to perform and generate a useful completion. This is called "few-shot learning".
Fine-tuning improves on few-shot learning by training on many more examples than can fit in a prompt, letting you achieve better results on a wide number of tasks. Once a model has been fine-tuned, you won't need to provide examples in the prompt anymore. This saves costs and enables lower-latency requests.
At a high level, fine-tuning involves the following steps:
  1. Prepare training data
  2. Train a new fine-tuned model
  3. Use your fine-tuned model
Whereas fine-tuned GPT-3 models increase costs by 4x, fine-tuned models on Forefront cost the same as the equivalent base model. Fine-tuning is currently free with the following limitations:
  1. 100MB dataset or smaller for GPT-J
  2. 10MB dataset or smaller for GPT-NeoX
Contact our team to fine-tune a model beyond these limits.

Prepare training data

Your dataset is how you teach your model what you'd like it to say.
Your data should be in JSONL format, saved as a UTF-8 encoded text file, where each line is a prompt-completion pair representing an ideal example of your task. We recommend fine-tuning each model for a single, specific task.
{"prompt": "<prompt text>", "completion": " <ideal generation><|endoftext|>"}
{"prompt": "<prompt text>", "completion": " <ideal generation><|endoftext|>"}
{"prompt": "<prompt text>", "completion": " <ideal generation><|endoftext|>"}
...
Each training example should consist of a single input example and its associated output.
{"prompt": "Topic: Why is Wellness Important for College Students?\nOutline:", "completion": " 1. There are many ways to answer this question: Why is wellbeing important in schools?\n2. Traditionally schools have focused on students\u2019 current needs or short-term victories.\n3. Fostering and teaching wellbeing is a way to show students and educators we care about them and want to support them by enabling them with short- and long-term benefits.\n4. But why does wellbeing need to be specifically prioritized in schools?\n5. Gratitude, hope, and emotional regulation, have been found to improve academic performance across several areas.\n6. Students are not the only ones who benefit from wellbeing on the agenda.\n7. We have the opportunity to formally teach our students to maximize their potential, and prioritizing their wellbeing is a good place to start.<|endoftext|>"}
{"prompt": "Topic: How Fast Can You Lose 20 Pounds\nOutline:", "completion": " 1. Losing 20 pounds can make a serious difference in your health.\n2. Safe weight loss comes at a rate of up to 1 to 2 pounds per week.\n3. Taking it slow will help you stay slim in the long run.\n4. Don't worry if your weight fluctuates a little.\n5. It's normal to go through a \"plateau\".\n6. The 10 to 40 weeks it will likely take to lose 20 pounds are just the beginning of your weight-loss journey.\n7. Staying active also keeps you slim, as well as boosts your mood, promotes good cardiovascular health and lowers your risk of osteoporosis.<|endoftext|>"}
{"prompt": "Topic: Why Modern Fashion is Bad\nOutline:", "completion": " 1. For several years the fashion industry has been under fire for the problems it creates.\n2. Production of our clothes happens most of the time in developing countries. Wages are low, material costs less and there are fewer regulations.\n3. After clothes are produced and shipped to stores and warehouses in developed countries they need to be sold.\n4. Before collections were created once a year, then 4 times a year and nowadays 7 times a year.\n5. These new collections mean we replace our clothes regularly for new ones that look more trendy.\n6. Reusing clothes is a great option to increase their lifespan\n7. The harsh reality is that very little of our clothes get recycled, mainly because they are all made from different materials.<|endoftext|>"}
...
Instructions can be included in the prompt for better performance on complex tasks or to enhance a small dataset.
{"prompt": "Write an outline for a blog post based on the following topic.\nTopic: Why is Wellness Important for College Students?\nOutline:", "completion": " 1. There are many ways to answer this question: Why is wellbeing important in schools?\n2. Traditionally schools have focused on students\u2019 current needs or short-term victories.\n3. Fostering and teaching wellbeing is a way to show students and educators we care about them and want to support them by enabling them with short- and long-term benefits.\n4. But why does wellbeing need to be specifically prioritized in schools?\n5. Gratitude, hope, and emotional regulation, have been found to improve academic performance across several areas.\n6. Students are not the only ones who benefit from wellbeing on the agenda.\n7. We have the opportunity to formally teach our students to maximize their potential, and prioritizing their wellbeing is a good place to start.<|endoftext|>"}
{"prompt": "Write an outline for a blog post based on the following topic.\nTopic: How Fast Can You Lose 20 Pounds\nOutline:", "completion": " 1. Losing 20 pounds can make a serious difference in your health.\n2. Safe weight loss comes at a rate of up to 1 to 2 pounds per week.\n3. Taking it slow will help you stay slim in the long run.\n4. Don't worry if your weight fluctuates a little.\n5. It's normal to go through a \"plateau\".\n6. The 10 to 40 weeks it will likely take to lose 20 pounds are just the beginning of your weight-loss journey.\n7. Staying active also keeps you slim, as well as boosts your mood, promotes good cardiovascular health and lowers your risk of osteoporosis.<|endoftext|>"}
{"prompt": "Write an outline for a blog post based on the following topic.\nTopic: Why Modern Fashion is Bad\nOutline:", "completion": " 1. For several years the fashion industry has been under fire for the problems it creates.\n2. Production of our clothes happens most of the time in developing countries. Wages are low, material costs less and there are fewer regulations.\n3. After clothes are produced and shipped to stores and warehouses in developed countries they need to be sold.\n4. Before collections were created once a year, then 4 times a year and nowadays 7 times a year.\n5. These new collections mean we replace our clothes regularly for new ones that look more trendy.\n6. Reusing clothes is a great option to increase their lifespan\n7. The harsh reality is that very little of our clothes get recycled, mainly because they are all made from different materials.<|endoftext|>"}
...
Make sure to:
  1. Never end prompts with a trailing whitespace (" ").
  2. Always start completions with a whitespace (" "). Starting completions with a whitespace produces better results due to the tokenization of the models.
  3. Always end completions with <|endoftext|>, which acts as a native stop sequence, allowing the model to learn when to stop completions.
  4. In almost all cases, it is recommended to end prompts with a fixed separator to inform the model when the prompt ends and the completion begins. The best separator typically depends on the use case and prompt structure. In the above example, the separator is \nOutline:. The separator should not appear elsewhere in any prompt.
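A quick way to enforce the first three rules is a small validation pass over each line of your dataset. This is a minimal sketch (the `check_example` helper is illustrative, not part of any Forefront tooling):

```python
import json

def check_example(line: str) -> list:
    """Return a list of formatting-rule violations for one JSONL line."""
    errors = []
    example = json.loads(line)
    prompt, completion = example["prompt"], example["completion"]
    if prompt != prompt.rstrip():
        errors.append("prompt ends with trailing whitespace")
    if not completion.startswith(" "):
        errors.append("completion does not start with a whitespace")
    if not completion.endswith("<|endoftext|>"):
        errors.append("completion does not end with <|endoftext|>")
    return errors
```

Run it over every line of your file and fix anything it flags before fine-tuning.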
Here are a few scripts that may be helpful as you start fine-tuning:
Script to validate a JSONL dataset, find errors, and split into training and test sets.
Script to convert CSV data to a JSONL dataset.
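If your data lives in a spreadsheet, the CSV-to-JSONL conversion can be sketched as follows. This assumes your CSV has "prompt" and "completion" columns (rename to match your file), and applies the whitespace and stop-sequence rules from above:

```python
import csv
import json

def csv_to_jsonl(csv_path: str, jsonl_path: str) -> None:
    """Convert a CSV with "prompt" and "completion" columns into a JSONL
    dataset, applying the formatting rules described above."""
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            example = {
                # Rule 1: no trailing whitespace in the prompt
                "prompt": row["prompt"].rstrip(),
                # Rules 2 and 3: leading whitespace and <|endoftext|> stop token
                "completion": " " + row["completion"].strip() + "<|endoftext|>",
            }
            dst.write(json.dumps(example) + "\n")
```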
Two things will affect fine-tuned model performance the most:
  1. Dataset quality
  2. Dataset size
First, and most importantly, your dataset should only consist of examples showing ideal task performance. Bad examples or noisy, inconsistent data will result in poor performance no matter how many examples you have. Assuming each training example is of desired quality, then the more data you have, the better.
Fine-tuning can solve a variety of problems, and the optimal way to approach it will depend on your specific use case. Below, we've listed the most common use cases for fine-tuning and corresponding guidelines and examples.

Text classification

In classification problems, each input in the prompt should be classified into one of the predefined classes. For this type of problem, we recommend:
  1. At least ~25 examples per class
  2. Use a separator at the end of the prompt. In the below cases, various separators are used depending on the task and prompt. Remember to include this separator in the prompt when you eventually make requests to your model.
  3. Always end completions with <|endoftext|>, which acts as a native stop sequence, allowing the model to learn when to stop completions.
  4. Ensure that the dataset used for fine-tuning is very similar in structure and type of task to what the model will be used for.
  5. Ensure that prompt-completion pairs don't exceed 2048 tokens.
  6. Choose classes that map to just a few tokens. At inference time, specify the length parameter to equal the max number of tokens required. We recommend using OpenAI's tokenizer tool to see how your classes will be tokenized.
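The ~25-examples-per-class guideline can be checked with a quick count over the completions. A minimal sketch, assuming the JSONL format shown earlier (the sample `lines` here are illustrative):

```python
import json
from collections import Counter

def class_counts(jsonl_lines):
    """Count examples per class, reading the class label out of each
    completion (minus the leading whitespace and the stop token)."""
    counts = Counter()
    for line in jsonl_lines:
        completion = json.loads(line)["completion"]
        counts[completion.replace("<|endoftext|>", "").strip()] += 1
    return counts

lines = [
    '{"prompt": "...\\nApprove:", "completion": " Yes<|endoftext|>"}',
    '{"prompt": "...\\nApprove:", "completion": " No<|endoftext|>"}',
    '{"prompt": "...\\nApprove:", "completion": " Yes<|endoftext|>"}',
]
counts = class_counts(lines)
# Flag classes that fall below the ~25-example guideline
too_small = [label for label, n in counts.items() if n < 25]
```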

Binary classification

Binary classification is the task of classifying elements into one of two classes. Since GPT models can generate incorrect or false statements, a useful application of this task is to train a model to verify that another model isn't making things up. Specifically, we'll prepare a dataset to train a model that checks blog ideas generated by another model to ensure they align with the given company and product description.
The dataset would look like the following:
{"prompt": "Company Name: Nike\nProduct description: Nike Air Jordan is an American brand of basketball shoes, athletic, casual, and style clothing produced by Nike. Founded in Chicago, Air Jordan was created for Hall of Fame former basketball player Michael Jordan during his time with the Chicago Bulls.\nBlog idea: Jordan Brand History - A blog about the history of the brand.\nApprove:", "completion": " Yes<|endoftext|>"}
{"prompt": "Company Name: Italic\nProduct description: Luxury goods straight from the same manufacturers as your favorite brands. No logos, no markups.\nBlog idea: Are You running an iPhone 12? Here\u2019s How to Back Up Your Photos - A blog about backing up photos from your phone.\nApprove:", "completion": " No<|endoftext|>"}
...
In the dataset above, we used a structured input containing the name of the company, the product description, and the associated blog idea. As a separator we used \nApprove: which clearly separates the prompt from the completion.
For this use case we fine-tuned a GPT-J model since it is faster and cheaper, and for a classification task its performance is comparable to larger models.
After fine-tuning, we can query our model by making a Completion request:
import requests

headers = {"Authorization": "Bearer <INSERT_YOUR_TOKEN>"}

body = {
    "text": "Company Name: Italic\nProduct description: Luxury goods straight from the same manufacturers as your favorite brands. No logos, no markups.\nBlog idea: Are You running an iPhone 12? Here\u2019s How to Back Up Your Photos - A blog about backing up photos from your phone.\nApprove:",
    "top_p": 1,
    "top_k": 40,
    # Decrease randomness in completion to be deterministic
    "temperature": 0.01,
    # Set to the token length of the longest class
    "length": 1,
    # Stop before newline is generated
    "stop_sequences": ["\n"]
}

res = requests.post(
    "INSERT_MODEL_URL",
    json=body,
    headers=headers
)
Which will return:
{
    "result": [
        {
            "completion": "No"
        }
    ],
    "timestamp": 1633379060,
    "model": "gpt-j"
}
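Given a response with the shape shown above, extracting and interpreting the predicted class is a one-liner (field names follow the example response):

```python
# Pull the predicted class out of a Completion response with the shape
# shown above; the response dict here is the example from the docs.
response = {
    "result": [{"completion": "No"}],
    "timestamp": 1633379060,
    "model": "gpt-j",
}

predicted = response["result"][0]["completion"].strip()
approved = predicted == "Yes"  # interpret the binary label
```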

Single-label classification

Single-label classification is the task of classifying elements into one of many classes. An example of this would be classifying articles into a single category.
The dataset would look like the following:
{"prompt": "Classify the category of the following article.\nArticle: Beaten by his archrival, felled by a dagger of a 3-pointer in the final minute, Mike Krzyzewski unfolded his arms, got up off his courtside stool and strode calmly into retirement.\n\nHe displayed no emotion when he shook the hand of North Carolina head coach Hubert Davis. Nor when Duke fans in the stands applauded reverentially and shouted “We love you, coach!” as he exited the floor.\n\nWhen Krzyzewski found his wife Mickie waiting for him in front of the tunnel that led to Duke’s locker room, he wrapped her in a hug and told her, “It’s OK.” He then walked hand-in-hand with her down the tunnel, detouring only to console freshman guard Trevor Keels, who was leaning against a wall with tears rolling down his cheeks.\n\nFor months, Duke has strived to give Krzyzewski the ultimate retirement gift, to let him leave men's college basketball with a sixth national title. “That’s been the motivation since the beginning,” freshman Paolo Banchero explained last week. “To send him out on top.”\n\nOn Saturday night, when an 81-77 loss to North Carolina brought Krzyzewski’s farewell tour to a thrilling and sudden close, the Blue Devils were heartbroken they had fallen two wins shy of that goal. That it came against the hated Tar Heels, the same team that stunned Duke in Krzyzewski’s final home game last month, only made the outcome more excruciating.\nCategory:", "completion": " Sports<|endoftext|>"}
{"prompt": "Classify the category of the following article.\nArticle: In kindergarten, I bet your teacher used some combination of blocks, M&M's and maybe blueberry-scented stickers to drill the importance of arithmetic into your nascent brain. The goal was to pave a future in which you could add, subtract and multiply at will -- ready for an adulthood filled with Excel spreadsheets and taxes, of course.\n\nWell, scientists just put a few underwater creatures -- cichlids and stingrays -- through a very similar experience, and to their surprise, the fish passed with flying colors. \n\nIn a paper published Thursday in the journal Scientific Reports, the research team says these animals exhibited the ability to learn basic math, and even use this knowledge to solve little logic puzzles they were given.\n\n\"We trained the animals to perform simple additions and subtractions,\" said Vera Schluessel, of the Institute of Zoology at the University of Bonn and lead author of the study. \"In doing so, they had to increase or decrease an initial value by one.\"\n\nWhat could stingrays and cichlids possibly need to do with these numerical skills? On the one hand, the team's bony cichlid subjects are vibrant aquarium favorites, mostly worried about their seaborne nests to the point where they can get a little aggressive and territorial. On the other, cartilaginous stingrays calmly rest on the ocean floor, living a rather solitary lifestyle and threatening prey from afar.\n\nThough the researchers aren't quite sure of the answer to this, they emphasize that these findings nonetheless add to a compounding pool of evidence that fish are much smarter than we think, and deserve much more respect than we offer.\nCategory:", "completion": " Science<|endoftext|>"}
...
In the dataset above, we used a structured input containing instructions and the article with a separator of \nCategory: which clearly separates the prompt from the completion.
Another example would be to classify messages by emotion. The dataset would look like the following:
{"prompt": "Message: I wanted to downvote this, but it's not your fault homie.\nEmotion:", "completion": " Disappointed<|endoftext|>"}
{"prompt": "Message: BUT IT'S HER TURN! /s\nEmotion:", "completion": " Excited<|endoftext|>"}
{"prompt": "Message: That is odd.\nEmotion:", "completion": " Concerned<|endoftext|>"}
...
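At inference time, prompts must mirror the training structure exactly, including the separator and no trailing whitespace. A sketch of prompt construction for the emotion dataset above (the helper name is illustrative):

```python
# Rebuild the training-time prompt structure for inference, including
# the "\nEmotion:" separator and no trailing whitespace.
def build_emotion_prompt(message: str) -> str:
    return f"Message: {message}\nEmotion:"

prompt = build_emotion_prompt("That is odd.")
```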

Multi-label classification

Multi-label classification is the task of classifying elements into as many applicable classes as necessary. An example of this would be labelling emails based on the content.
The dataset would look like the following:
{"prompt": "The following email is between a salesperson and their client. Your task is to classify subjects mentioned in the email.\nEmail: Hi there, I hope you're doing well! I'm looking for a new job and I came across your company. It looks like a great opportunity and I was wondering if you were hiring. Let me know, thanks!\nSubjects:", "completion": " Job search, Company<|endoftext|>"}
{"prompt": "The following email is between a salesperson and their client. Your task is to classify subjects mentioned in the email.\nEmail: Hi, I hope you're doing well. Wanted to let you know that I'm going to be out of town this weekend and won't be able to make it to the meeting on Monday. Let me know if there's anything I can do to help remotely. All the best, Jane\nSubjects:", "completion": " Out of town, Meeting<|endoftext|>"}
...
In the dataset above, we used a structured input containing instructions and the email with a separator of \nSubjects: which clearly separates the prompt from the completion.
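Because multi-label completions hold comma-separated labels, the generated text needs to be split back into a list at inference time. A minimal sketch based on the format above:

```python
# Split a multi-label completion of the form " Label A, Label B<|endoftext|>"
# back into a list of labels.
def parse_labels(completion: str) -> list:
    text = completion.replace("<|endoftext|>", "")
    return [label.strip() for label in text.split(",") if label.strip()]

labels = parse_labels(" Out of town, Meeting<|endoftext|>")
```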

Conditional text generation

Conditional generation is a task where content needs to be generated given some kind of input. This includes paraphrasing, summarizing, entity extraction, content generation, question answering and many others. For this type of problem we recommend:
  1. At least 200 examples.
  2. Use a separator at the end of the prompt. In the below cases, various separators are used depending on the task and prompt. Remember to include this separator at the end of prompts when you eventually make requests to your model.
  3. Always end completions with <|endoftext|>, which acts as a native stop sequence, allowing the model to learn when to stop completions.
  4. Ensure that prompt-completion pairs don't exceed 2048 tokens.
  5. Ensure that the dataset used for fine-tuning is very similar in structure and type of task to what the model will be used for.
  6. At inference time, specify the length parameter to slightly exceed your longest generation in the dataset. Read more about tokens.
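A rough way to pick that length parameter is to measure the longest completion in the dataset. Note that whitespace-split word count is only a proxy for tokens; real token counts depend on the model's tokenizer, so leave extra headroom:

```python
import json

# Find the longest completion in the dataset as a rough guide for the
# length parameter. Word count only approximates token count.
def longest_completion_words(jsonl_lines) -> int:
    return max(
        len(json.loads(line)["completion"].replace("<|endoftext|>", "").split())
        for line in jsonl_lines
    )

lines = [
    '{"prompt": "Q\\nAnswer:", "completion": " Alciphron<|endoftext|>"}',
    '{"prompt": "Q\\nAnswer:", "completion": " the sun<|endoftext|>"}',
]
# Slightly exceed the longest generation, per the guideline above
length_param = longest_completion_words(lines) + 2
```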

Question answering

Question answering is conditional generation in its simplest sense: generate an answer based on the provided question.
The dataset would look like the following:
{"prompt": "Answer the following question based on the context.\nContext: A generation later, the Irish Anglican bishop, George Berkeley (1685\u20131753), determined that Locke's view immediately opened a door that would lead to eventual atheism. In response to Locke, he put forth in his Treatise Concerning the Principles of Human Knowledge (1710) an important challenge to empiricism in which things only exist either as a result of their being perceived, or by virtue of the fact that they are an entity doing the perceiving. (For Berkeley, God fills in for humans by doing the perceiving whenever humans are not around to do it.) In his text Alciphron, Berkeley maintained that any order humans may see in nature is the language or handwriting of God. Berkeley's approach to empiricism would later come to be called subjective idealism.\nQuestion: What work of writing is mentioned last?\nAnswer:", "completion": " Alciphron<|endoftext|>"}
{"prompt": "Answer the following question based on the context.\nContext: Hume maintained that all knowledge, even the most basic beliefs about the natural world, cannot be conclusively established by reason. Rather, he maintained, our beliefs are more a result of accumulated habits, developed in response to accumulated sense experiences. Among his many arguments Hume also added another important slant to the debate about scientific method \u2014 that of the problem of induction. Hume argued that it requires inductive reasoning to arrive at the premises for the principle of inductive reasoning, and therefore the justification for inductive reasoning is a circular argument. Among Hume's conclusions regarding the problem of induction is that there is no certainty that the future will resemble the past. Thus, as a simple instance posed by Hume, we cannot know with certainty by inductive reasoning that the sun will continue to rise in the East, but instead come to expect it to do so because it has repeatedly done so in the past.\nQuestion: What is the most important source of energy for life on Earth?\nAnswer:", "completion": " the sun<|endoftext|>"}
{"prompt": "Answer the following question based on the context.\nContext: Philosophical empiricists hold no knowledge to be properly inferred or deduced unless it is derived from one's sense-based experience. This view is commonly contrasted with rationalism, which states that knowledge may be derived from reason independently of the senses. For example, John Locke held that some knowledge (e.g. knowledge of God's existence) could be arrived at through intuition and reasoning alone. Similarly Robert Boyle, a prominent advocate of the experimental method, held that we have innate ideas. The main continental rationalists (Descartes, Spinoza, and Leibniz) were also advocates of the empirical \"scientific method\".\nQuestion: What would one not be using when holding a belief in the afterlife, for example?\nAnswer:", "completion": " senses<|endoftext|>"}
...
To best generalize question answering models, we recommend including relevant information with the query in the prompt so the model learns to answer questions based on provided information, instead of implicitly knowing the answer. We've found that the model is more accurate and truthful when provided relevant information to help answer the question. See our question answering guide to read more about retrieving relevant information based on the query.
To take the previous dataset further, you could include multi-turn examples with multiple question-answer exchanges in the completion.
{"prompt": "Answer questions based on the following story.\nStory: The Vatican Apostolic Library (), more commonly called the Vatican Library or simply the Vat, is the library of the Holy See, located in Vatican City. Formally established in 1475, although it is much older, it is one of the oldest libraries in the world and contains one of the most significant collections of historical texts. It has 75,000 codices from throughout history, as well as 1.1 million printed books, which include some 8,500 incunabula. \n\nThe Vatican Library is a research library for history, law, philosophy, science and theology. The Vatican Library is open to anyone who can document their qualifications and research needs. Photocopies for private study of pages from books published between 1801 and 1990 can be requested in person or by mail. \n\nIn March 2014, the Vatican Library began an initial four-year project of digitising its collection of manuscripts, to be made available online. \n\nThe Vatican Secret Archives were separated from the library at the beginning of the 17th century; they contain another 150,000 items. \n\nScholars have traditionally divided the history of the library into five periods, Pre-Lateran, Lateran, Avignon, Pre-Vatican and Vatican. \n\nThe Pre-Lateran period, comprising the initial days of the library, dated from the earliest days of the Church. Only a handful of volumes survive from this period, though some are very significant.\nQuestion: When was the Vat formally opened?\nAnswer:", "completion": " It was formally established in 1475\nQuestion: what is the library for?\nAnswer: research\nQuestion: for what subjects?\nAnswer: history, and law\nQuestion: and?\nAnswer: philosophy, science and theology\nQuestion: what was started in 2014?\nAnswer: a project\nQuestion: how do scholars divide the library?\nAnswer: into periods\nQuestion: how many?\nAnswer: five\nQuestion: what is the official name of the Vat?\nAnswer: The Vatican Apostolic Library\nQuestion: where is it?\nAnswer: in Vatican City\nQuestion: how many printed books does it contain?\nAnswer: 1.1 million\nQuestion: when were the Secret Archives moved from the rest of the library?\nAnswer: at the beginning of the 17th century;\nQuestion: how many items are in this secret collection?\nAnswer: 150,000\nQuestion: Can anyone use this library?\nAnswer: anyone who can document their qualifications and research needs.\nQuestion: what must be requested to view?\nAnswer: unknown\nQuestion: what must be requested in person or by mail?\nAnswer: Photocopies\nQuestion: of what books?\nAnswer: only books published between 1801 and 1990\nQuestion: What is the Vat the library of?\nAnswer: the Holy See\nQuestion: How many books survived the Pre Lateran period?\nAnswer: a handful of volumes\nQuestion: what is the point of the project started in 2014?\nAnswer: digitising manuscripts\nQuestion:what will this allow?\nAnswer:\nthem to be viewed online.<|endoftext|>"}
{"prompt": "Answer questions based on the following story.\nStory: New York (CNN) -- More than 80 Michael Jackson collectibles -- including the late pop star's famous rhinestone-studded glove from a 1983 performance -- were auctioned off Saturday, reaping a total $2 million. \n\nProfits from the auction at the Hard Rock Cafe in New York's Times Square crushed pre-sale expectations of only $120,000 in sales. \n\nThe highly prized memorabilia, which included items spanning the many stages of Jackson's career, came from more than 30 fans, associates and family members, who contacted Julien's Auctions to sell their gifts and mementos of the singer. \n\nJackson's flashy glove was the big-ticket item of the night, fetching $420,000 from a buyer in Hong Kong, China. Jackson wore the glove at a 1983 performance during \"Motown 25,\" an NBC special where he debuted his revolutionary moonwalk. \n\nFellow Motown star Walter \"Clyde\" Orange of the Commodores, who also performed in the special 26 years ago, said he asked for Jackson's autograph at the time, but Jackson gave him the glove instead. \n\n\"The legacy that [Jackson] left behind is bigger than life for me,\" Orange said. \"I hope that through that glove people can see what he was trying to say in his music and what he said in his music.\" \n\nOrange said he plans to give a portion of the proceeds to charity. \n\nHoffman Ma, who bought the glove on behalf of Ponte 16 Resort in Macau, paid a 25 percent buyer's premium, which was tacked onto all final sales over $50,000. Winners of items less than $50,000 paid a 20 percent premium. \nQuestion: Where was the Auction held?\nAnswer:", "completion": " Hard Rock Cafe\nQuestion: How much did they make?\nAnswer: $2 million.\nQuestion: How much did they expected?\nAnswer: $120,000\nQuestion: WHo buy the Jackson Glove\nAnswer: Hoffman Ma\nQuestion:Where was the buyer of the glove from?\nAnswer:\nMacau<|endoftext|>"}
{"prompt": "Answer questions based on the following story.\nStory: CHAPTER VII. THE DAUGHTER OF WITHERSTEEN \n\n\"Lassiter, will you be my rider?\" Jane had asked him. \n\n\"I reckon so,\" he had replied. \n\nFew as the words were, Jane knew how infinitely much they implied. She wanted him to take charge of her cattle and horse and ranges, and save them if that were possible. Yet, though she could not have spoken aloud all she meant, she was perfectly honest with herself. Whatever the price to be paid, she must keep Lassiter close to her; she must shield from him the man who had led Milly Erne to Cottonwoods. In her fear she so controlled her mind that she did not whisper this Mormon's name to her own soul, she did not even think it. Besides, beyond this thing she regarded as a sacred obligation thrust upon her, was the need of a helper, of a friend, of a champion in this critical time. If she could rule this gun-man, as Venters had called him, if she could even keep him from shedding blood, what strategy to play his flame and his presence against the game of oppression her churchmen were waging against her? Never would she forget the effect on Tull and his men when Venters shouted Lassiter's name. If she could not wholly control Lassiter, then what she could do might put off the fatal day. \n\nOne of her safe racers was a dark bay, and she called him Bells because of the way he struck his iron shoes on the stones. When Jerd led out this slender, beautifully built horse Lassiter suddenly became all eyes. A rider's love of a thoroughbred shone in them. Round and round Bells he walked, plainly weakening all the time in his determination not to take one of Jane's favorite racers. \nQuestion: What did Venters call Lassiter?\nAnswer:", "completion": " gun-man\nQuestion: Who asked Lassiter to be their rider?\nAnswer: Jane\nQuestion: Did he agree?\nAnswer: Yes\nQuestion: Why did she ask him?\nAnswer: to take charge of her cattle and horse and ranges, and save them\nQuestion: Did she tell him as much?\nAnswer: No\nQuestion: What was she willing to give up?\nAnswer: Whatever the price to be paid\nQuestion: Where was Milly led to?\nAnswer: Cottonwoods\nQuestion: Who took her there?\nAnswer: A man\nQuestion: Whose name would Jane not speak?\nAnswer: this Mormon's name\nQuestion: Did she allow herself to even think it?\nAnswer: No\nQuestion: What was Jane hoping Lassiter would become to her?\nAnswer: a helper, of a friend, of a champion\nQuestion: Who was oppressing her?\nAnswer: her churchmen\nQuestion: What was she hoping she could keep from happening to him?\nAnswer: shedding blood\nQuestion: Who had shouted Lassiter's name?\nAnswer: Venters\nQuestion: Who did that affect?\nAnswer: Tull and his men\nQuestion: Did Jane think she could control Lassiter?\nAnswer: No\nQuestion: Who is Bells?\nAnswer: One of her safe racers\nQuestion: How did he get his name?\nAnswer: because of the way he struck his iron shoes on the stones.\nQuestion: Was Lassiter impressed with the horse?\nAnswer: Yes\nQuestion:Did he want to take him for himself?\nAnswer:\nYes<|endoftext|>"}
...
Additional context can also be added. In the following example, a snippet of relevant rules are included with a scenario so the model can learn to logically answer the question based on both pieces of information.
{"prompt": "Answer questions based on the following snippet and scenario.\nSnippet: # Bankruptcy\n\nOther types of bankruptcy include Chapters 9, 12 and 15. Cases under these chapters of the bankruptcy code involve municipalities, family farmers and fisherman, and international cases.\nScenario: I am a doctor. It does not involve a municipality. It involves an international case.\nQuestion: Can I file bankruptcy under Chapter 9, 12 or 15?\nAnswer:", "completion": " Yes<|endoftext|>"}
{"prompt": "Answer questions based on the following snippet and scenario.\nSnippet: ### Parenting Payment\n\n* have income under the limits\n* meet principal carer rules and care for a child under 8 if you\u2019re single or under 6 if you\u2019re partnered\n* meet residence rules\n* can\u2019t claim before the birth of a child\nScenario: \nQuestion: Am I eligible for the Parenting Payment?\nAnswer:", "completion": " No<|endoftext|>"}
{"prompt": "Answer questions based on the following snippet and scenario.\nSnippet: # 2. Eligibility\n\nYou must have:\n\n* an unconditional offer of a place on a course with a licensed Tier 4 sponsor\n* enough money to support yourself and pay for your course - the amount will vary depending on your circumstances\nScenario: I have an unconditional offer for a course with a licensed tier 4 sponsor who is not a panel of experts\nQuestion: Am I eligible for a Tier 4 (General) student visa?\nAnswer:", "completion": " No<|endoftext|>"}
...

Generate blog ideas

This is a generative use case, so you want to ensure that the samples you provide are high quality, as the fine-tuned model will try to imitate the style (or mistakes) of the given examples. The dataset would look like this:
{"prompt": "Company Name: Nike\nProduct description: Nike Air Jordan is an American brand of basketball shoes, athletic, casual, and style clothing produced by Nike. Founded in Chicago, Air Jordan was created for Hall of Fame former basketball player Michael Jordan during his time with the Chicago Bulls.\nBlog idea:", "completion": " Jordan Brand History - A blog about the history of the brand.<|endoftext|>"}
{"prompt": "Company Name: Apple\nProduct description: The iPhone is a smartphone made by Apple that combines a computer, iPod, digital camera and cellular phone into one device with a touchscreen interface. The iPhone runs the iOS operating system, and in 2020 when the iPhone 12 was introduced, it offered up to 256 GB of storage and a 12-megapixel camera.\nBlog idea:", "completion": " Are You running an iPhone 12? Here\u2019s How to Back Up Your Photos - A blog about backing up photos from your phone.<|endoftext|>"}
{"prompt": "Company Name: Zevo Insect\nProduct description: Nobody likes having insects around the house, and the old fly swatter doesn\u2019t really make much of difference with these persistent pests. Zevo is a plug and decimate solution. The multi-spectrum light technology attracts the critters and traps them on a sticky film backing. No chemicals. No gimmicks. And no need to go anywhere near the insects.\nBlog idea:", "completion": " How to Make Fruit Flies Disappear: A blog all about the Zevo and how it can make your life better.<|endoftext|>"}
...
Here we label the relevant input data, separate each prompt from its completion with \nBlog idea:, and end each completion with <|endoftext|> to ensure that the model knows when the completion should finish.
After fine-tuning, we can query our model by making a Completion request:
import requests

headers = {"Authorization": "Bearer <INSERT_YOUR_TOKEN>"}

body = {
    "text": "Company Name: Nike\nProduct description: Nike Air Jordan is an American brand of basketball shoes, athletic, casual, and style clothing produced by Nike. Founded in Chicago, Air Jordan was created for Hall of Fame former basketball player Michael Jordan during his time with the Chicago Bulls.\nBlog idea:",
    "top_p": 1,
    "top_k": 40,
    # Increase randomness for more "creative" generations
    "temperature": 0.7,
    # Set to the token length of the longest blog idea (~2 sentences)
    "length": 32,
    # Stop before a newline is generated
    "stop_sequences": ["\n"]
}

res = requests.post(
    "INSERT_MODEL_URL",
    json=body,
    headers=headers
)
Which will return:
{
    "result": [
        {
            "completion": " Jordan Brand History - A blog about the history of the brand."
        }
    ],
    "timestamp": 1633379060,
    "model": "gpt-j"
}
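The completion text can be pulled out of the response JSON in a couple of lines of Python. A minimal sketch, assuming the response shape shown above (the `first_completion` helper name is our own):

```python
# Extract the completion text from a response payload shaped like the
# example above. "response" is the already-parsed JSON, e.g. res.json().

def first_completion(response: dict) -> str:
    """Return the first completion string, stripped of the leading space."""
    return response["result"][0]["completion"].strip()

example = {
    "result": [
        {"completion": " Jordan Brand History - A blog about the history of the brand."}
    ],
    "timestamp": 1633379060,
    "model": "gpt-j",
}

print(first_completion(example))
```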

Chatbot

The simplest approach to implementing a chatbot with GPT models is to fine-tune a model on existing chat threads. For the following examples, we will fine-tune a model on a generic conversation. This example could easily be substituted with customer support threads from any company, either to automate responses or to assist customer support personnel by suggesting replies. For this use case, a single chat thread can generate multiple examples in the dataset: start with the first user utterance, and use every response to the user as a completion, with the conversation so far as the prompt.
{"prompt": "User: Hello Mrs. Parker, how have you been?\nBot:", "completion": " Hello Dr. Peters. Just fine thank you. Ricky and I are here for his vaccines.<|endoftext|>"}
{"prompt": "User: Hello Mrs. Parker, how have you been?\nBot: Hello Dr. Peters. Just fine thank you. Ricky and I are here for his vaccines.\nUser: Very well. Let's see, according to his vaccination record, Ricky has received his Polio, Tetanus and Hepatitis B shots. He is 14 months old, so he is due for Hepatitis A, Chickenpox and Measles shots.\nBot:", "completion": " What about Rubella and Mumps?<|endoftext|>"}
{"prompt": "User: Hello Mrs. Parker, how have you been?\nBot: Hello Dr. Peters. Just fine thank you. Ricky and I are here for his vaccines.\nUser: Very well. Let's see, according to his vaccination record, Ricky has received his Polio, Tetanus and Hepatitis B shots. He is 14 months old, so he is due for Hepatitis A, Chickenpox and Measles shots.\nBot: What about Rubella and Mumps?\nUser: Well, I can only give him these for now, and after a couple of weeks I can administer the rest.\nBot:", "completion": " OK, great. Doctor, I think I also may need a Tetanus booster. Last time I got it was maybe fifteen years ago!<|endoftext|>"}
...
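Generating these examples from raw chat threads is easy to script. A sketch, assuming each thread is available as a list of (speaker, text) pairs; the helper name and turn structure are illustrative:

```python
import json

def thread_to_examples(turns):
    """Convert one chat thread into multiple prompt-completion examples.

    turns: list of (speaker, text) tuples, with speakers 'User' and 'Bot'.
    Each Bot turn becomes a completion, with the conversation so far as
    the prompt, matching the dataset format shown above.
    """
    examples = []
    history = ""
    for speaker, text in turns:
        if speaker == "Bot":
            examples.append({
                "prompt": history + "Bot:",
                "completion": f" {text}<|endoftext|>",
            })
        history += f"{speaker}: {text}\n"
    return examples

turns = [
    ("User", "Hello Mrs. Parker, how have you been?"),
    ("Bot", "Hello Dr. Peters. Just fine thank you."),
    ("User", "Very well. Ricky is due for his shots."),
    ("Bot", "What about Rubella and Mumps?"),
]
for ex in thread_to_examples(turns):
    print(json.dumps(ex))
```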
This could be improved by fine-tuning a model for each support agent, solely on their chat threads. This way each model would be much better at mimicking each agent's unique voice and personality.
A chatbot use case may also require relevant context about the conversation (order details, customer information, etc.), a summary of the conversation so far, and the most recent messages. Adding complexity to any use case requires significantly more training examples, since the model will likely need to handle more varied requests, and in this case, customer issues. To keep performance up to par, we recommend vetting chat examples to ensure the quality of message exchanges. The summary can be generated with a separate fine-tuned model (discussed more in depth here). The dataset could look as follows:
{"prompt": "Summary: <summary of the chat so far>\nCustomer plan: Premium\nUsed products: Email lookup, Pipeline\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent:", "completion":" <response2><|endoftext|>"}
{"prompt": "Summary: <summary of the chat so far>\nCustomer plan: Basic\nUsed products: CRM, LinkedIn plug-in\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent:", "completion":" <response2><|endoftext|>"}
...
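A prompt in this format can be assembled programmatically at inference time. A sketch, assuming the context fields shown above; the helper and field names are illustrative:

```python
def build_support_prompt(summary, plan, products, messages):
    """Assemble a support-chat prompt ending in 'Agent:'.

    messages: list of (role, text) pairs in chronological order.
    The '###' separator matches the dataset format above.
    """
    header = (
        f"Summary: {summary}\n"
        f"Customer plan: {plan}\n"
        f"Used products: {', '.join(products)}\n\n###\n\n"
    )
    thread = "".join(f"{role}: {text}\n" for role, text in messages)
    return header + thread + "Agent:"

prompt = build_support_prompt(
    "Customer cannot log in after a password reset.",
    "Premium",
    ["Email lookup", "Pipeline"],
    [
        ("Customer", "I can't log in."),
        ("Agent", "Have you reset your password?"),
        ("Customer", "Yes, it still fails."),
    ],
)
print(prompt)
```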

Chat summarization

In the previous example, we mentioned that a chat summary could be generated with a separate fine-tuned model to enrich the chatbot.
The dataset for a model capable of summarizing chats would look like the following:
{"prompt": "Generate a brief summary of the following chat thread.\nPerson1: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today?\nPerson2: I found it would be a good idea to get a check-up.\nPerson1: Yes, well, you haven't had one for 5 years. You should have one every year.\nPerson2: I know. I figure as long as there is nothing wrong, why go see the doctor?\nPerson1: Well, the best way to avoid serious illnesses is to find out about them early. So try to come at least once a year for your own good.\nPerson2: Ok.\nPerson1: Let me see here. Your eyes and ears look fine. Take a deep breath, please. Do you smoke, Mr. Smith?\nPerson2: Yes.\nPerson1: Smoking is the leading cause of lung cancer and heart disease, you know. You really should quit.\nPerson2: I've tried hundreds of times, but I just can't seem to kick the habit.\nPerson1: Well, we have classes and some medications that might help. I'll give you more information before you leave.\nPerson2: Ok, thanks doctor.\nSummary:", "completion": " Mr. Smith's getting a check-up, and Doctor Hawkins advises him to have one every year. Hawkins'll give some information about their classes and medications to help Mr. Smith quit smoking.<|endoftext|>"}
{"prompt": "Generate a brief summary of the following chat thread.\nPerson1: Hello Mrs. Parker, how have you been?\nPerson2: Hello Dr. Peters. Just fine thank you. Ricky and I are here for his vaccines.\nPerson1: Very well. Let's see, according to his vaccination record, Ricky has received his Polio, Tetanus and Hepatitis B shots. He is 14 months old, so he is due for Hepatitis A, Chickenpox and Measles shots.\nPerson2: What about Rubella and Mumps?\nPerson1: Well, I can only give him these for now, and after a couple of weeks I can administer the rest.\nPerson2: OK, great. Doctor, I think I also may need a Tetanus booster. Last time I got it was maybe fifteen years ago!\nPerson1: We will check our records and I'll have the nurse administer and the booster as well. Now, please hold Ricky's arm tight, this may sting a little.\nSummary:", "completion": " Mrs Parker takes Ricky for his vaccines. Dr. Peters checks the record and then gives Ricky a vaccine.<|endoftext|>"}
{"prompt": "Generate a brief summary of the following chat thread.\nPerson1: Excuse me, did you see a set of keys?\nPerson2: What kind of keys?\nPerson1: Five keys and a small foot ornament.\nPerson2: What a shame! I didn't see them.\nPerson1: Well, can you help me look for it? That's my first time here.\nPerson2: Sure. It's my pleasure. I'd like to help you look for the missing keys.\nPerson1: It's very kind of you.\nPerson2: It's not a big deal.Hey, I found them.\nPerson1: Oh, thank God! I don't know how to thank you, guys.\nPerson2: You're welcome.\nSummary:", "completion": " Person1's looking for a set of keys and asks for Person2's help to find them.<|endoftext|>"}
...
Ensure that the chat threads used for fine-tuning are very similar in structure to what the model will be used for.

Named entity recognition

Named entity recognition (NER), or entity extraction, is the task of identifying named entities in text and classifying them into predefined categories like names, locations, dates, etc. To improve performance, it is best to sort extracted entities in the same order as they appear in the text. This helps the model keep track of all the entities that need to be generated, in order.
A dataset to extract locations from text would look like the following:
{"prompt": "List the locations mentioned in the following text.\nText: Thousands of demonstrators have marched through London to protest the war in Iraq and demand the withdrawal of British troops from that country.\nLocations:", "completion": " London, Iraq<|endoftext|>"}
{"prompt": "List the locations mentioned in the following text.\nText: Families of soldiers killed in the conflict joined the protesters who carried banners with such slogans as \" Bush Number One Terrorist \" and \" Stop the Bombings.\nLocations:", "completion": " No locations mentioned<|endoftext|>"}
{"prompt": "List the locations mentioned in the following text.\nText: \" They marched from the Houses of Parliament to a rally in Hyde Park.\nLocations:", "completion": " Hyde Park<|endoftext|>"}
...
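At inference time, the model's completion can be parsed back into a Python list, including the "No locations mentioned" case used above. A minimal sketch (the helper name is our own):

```python
def parse_locations(completion: str):
    """Parse a location-extraction completion into a list of strings."""
    text = completion.replace("<|endoftext|>", "").strip()
    if text == "No locations mentioned":
        return []
    return [loc.strip() for loc in text.split(",")]

print(parse_locations(" London, Iraq<|endoftext|>"))
print(parse_locations(" No locations mentioned<|endoftext|>"))
```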
Many use cases require extracting more than one type of entity (e.g. name, address, order ID, etc.). In such cases, it is recommended to fine-tune a model for each entity (single-entity extraction) for optimal performance. Alternative approaches include fine-tuning a single model to extract all entities (multi-entity extraction) or using different few-shot prompts for each entity with a base model.

Open-ended text generation

The goal of open-ended generation is to create coherent text that continues from a given input. This includes open-ended chat, story generation, domain experts, and content generation, among many others. For this type of problem we recommend a different dataset format than previously suggested:
  1. Use at least 500 examples.
  2. Leave the prompt empty.
  3. With no prompt, there's no need for separators.
  4. Always end completions with <|endoftext|>.
  5. Ensure that no completion exceeds 2048 tokens.
  6. Ensure the examples cover the intended task, domain, and tone of voice.
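The guidelines above can be checked mechanically before uploading. A sketch; the 2048-token limit is approximated here with a rough 4-characters-per-token heuristic, since exact counts depend on the tokenizer:

```python
import json

def check_open_ended_dataset(lines):
    """Return a list of problems found in a JSONL dataset (list of lines)."""
    problems = []
    for i, line in enumerate(lines, start=1):
        ex = json.loads(line)
        if ex.get("prompt") != "":
            problems.append(f"line {i}: prompt should be empty")
        if not ex.get("completion", "").endswith("<|endoftext|>"):
            problems.append(f"line {i}: completion missing <|endoftext|>")
        # Rough token estimate: ~4 characters per token.
        if len(ex.get("completion", "")) / 4 > 2048:
            problems.append(f"line {i}: completion likely over 2048 tokens")
    if len(lines) < 500:
        problems.append(f"only {len(lines)} examples; at least 500 recommended")
    return problems

sample = [
    '{"prompt": "", "completion": " Some passage of text.<|endoftext|>"}',
    '{"prompt": "oops", "completion": " Missing the end token."}',
]
for p in check_open_ended_dataset(sample):
    print(p)
```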

Expert on a specific domain

As previously noted, GPT models are pre-trained on vast amounts of text like Wikipedia, books, papers, etc. However, this doesn't guarantee a given model will understand your domain well, and this is especially true for technical fields with specialized jargon like law, medicine, and engineering, among others.
In this case you may want to specialize a model in your domain. To do this for medicine, you would format a large amount of relevant, high-quality text data, such as clinical data and medical papers, into a dataset. The result is a model that is much better at performing tasks in the medical domain.
The dataset would look like the following:
{"prompt": "", "completion": " Pectus excavatum is the most common chest wall deformity. There is still controversy about cardiopulmonary limitations of this disease and benefits of surgical repair. This study evaluates the impact of pectus excavatum on the cardiopulmonary function of adult patients before and after a modified minimally invasive repair.\nIn this retrospective cohort study, an electronic database was used to identify consecutive adult (aged ≥18 years) patients who underwent cardiopulmonary exercise testing before and after primary pectus excavatum repair at Mayo Clinic Arizona from 2011 to 2020. In total, 392 patients underwent preoperative cardiopulmonary exercise testing; abnormal oxygen consumption results were present in 68% of patients. Among them, 130 patients (68% men, mean age, 32.4±10.0 years) had post‐repair evaluations. Post‐repair tests were performed immediately before bar removal with a mean time between repair and post‐repair testing of 3.4±0.7 years (range, 2.5–7.0). A significant improvement in cardiopulmonary outcomes (P<0.001 for all the comparisons) was seen in the post‐repair evaluations, including an increase in maximum, and predicted rate of oxygen consumption, oxygen pulse, oxygen consumption at anaerobic threshold, and maximal ventilation. In a subanalysis of 39 patients who also underwent intraoperative transesophageal echocardiography at repair and at bar removal, a significant increase in right ventricle stroke volume was found (P<0.001).\nConsistent improvements in cardiopulmonary function were seen for pectus excavatum adult patients undergoing surgery. These results strongly support the existence of adverse cardiopulmonary consequences from this disease as well as the benefits of surgical repair.<|endoftext|>"}
{"prompt": "", "completion": " Stent thrombosis (ST) remains a significant medical issue. In particular, longer‐term mortality and clinical predictors after ST occurrence have yet to be elucidated.\nThis was a multicenter, retrospective, observational study. A total of 187 definite ST cases from January 2008 to December 2017 were enrolled, and the long‐term clinical outcomes were investigated. The primary outcome measure was the cumulative mortality after ST occurrence. In addition, independent predictors of mortality were assessed. Among the stent types causing ST, bare‐metal stent, first‐generation drug‐eluting stent, second‐generation drug‐eluting stent, and third‐generation drug‐eluting stent comprised 31.0%, 19.3%, 36.9%, and 6.4% of cases, respectively. Median duration from stent implantation to ST was 680.5 (interquartile range, 33.8–2450.5) days. Cumulative mortality was 14.6%, 17.4%, 21.2%, 24.4%, and 33.8% at 1, 2, 3, 5 and 10 years, respectively. The cumulative mortality did not significantly differ by type of stent, and mortality of late ST was higher than that of early ST and very late ST; however, it did not reach statistical significance after the multivariate analysis. Independent predictors of mortality were hemodialysis (hazard ratio [HR], 7.80; 95% CI, 3.07–19.81; P<0.001), culprit lesions in the left main trunk (HR, 8.14; 95% CI, 1.71–38.75; P=0.008), culprit lesions in the left coronary artery (HR, 2.77; 95% CI, 1.10–6.96; P=0.030), and peak creatine kinase (HR, 1.017; 95% CI, 1.011–1.022; P<0.001).\nThe 10‐year cumulative mortality after ST reached 33.8%. Close follow‐up is thus mandatory for patients with ST, especially with hemodialysis, culprit lesions in the left main trunk and left coronary artery, and high peak creatine kinase.<|endoftext|>"}
...
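A dataset like this can be assembled from any collection of domain texts with a short script. A sketch, assuming the texts are already loaded as strings:

```python
import json

def texts_to_jsonl(texts):
    """Turn a list of domain texts (e.g. paper abstracts) into
    empty-prompt JSONL lines in the format shown above."""
    lines = []
    for text in texts:
        lines.append(json.dumps({
            "prompt": "",
            # Leading space and <|endoftext|> suffix match the dataset format.
            "completion": f" {text.strip()}<|endoftext|>",
        }))
    return "\n".join(lines)

abstracts = [
    "Pectus excavatum is the most common chest wall deformity.",
    "Stent thrombosis remains a significant medical issue.",
]
print(texts_to_jsonl(abstracts))
```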
After you fine-tune a model in this way, you can still use prompt design and parameters as you would with the base model to perform various tasks. However, it often works best to use this expert model as a base model to further fine-tune on any classification or conditional generation task. It can also be useful to use the embeddings from such a model for better semantic understanding of domain-specific text.
Generative tasks have the potential to leak training data when requesting completions from the model, so take additional care to address this appropriately. For example, personal information should be anonymized, or better, not included in fine-tuning in the first place.

Write jokes

Let's say you'd like to fine-tune a model capable of generating random jokes with a simple set-up and punchline, and you have scraped a large number of similar jokes from the Internet. You'll want to ensure that all the jokes are formatted correctly and consistently, and use <|endoftext|> to help the model know when to stop generating during inference. Then you can leave the prompt empty and put the entire joke in the completion:
{"prompt": "", "completion": " Set up: What’s the best thing about Switzerland?\nPunchline: I don’t know, but the flag is a big plus.<|endoftext|>"}
{"prompt": "", "completion": " Set up: I invented a new word!\nPunchline: Plagiarism!<|endoftext|>"}
{"prompt": "", "completion": " Set up: Did you hear about the mathematician who’s afraid of negative numbers?\nPunchline: He’ll stop at nothing to avoid them.<|endoftext|>"}
...
To use such a model you can write a few starting words of the joke and let the model complete it. You could also let the model generate entirely new jokes by increasing the temperature and sampling from the model with an empty prompt.
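In practice that means either seeding the "text" field with the start of a set-up, or leaving it empty with a higher temperature. A sketch of the two request bodies, reusing the parameter names from the earlier Completion example:

```python
# Seed the set-up with a few words and let the model finish the joke.
seeded = {
    "text": "Set up: Did you hear about the",
    "temperature": 0.8,
    "length": 48,
    "stop_sequences": ["<|endoftext|>"],
}

# Empty prompt with higher temperature for fully random jokes.
random_joke = {
    "text": "",
    "temperature": 1.0,
    "length": 48,
    "stop_sequences": ["<|endoftext|>"],
}

print(seeded["text"])
```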

Maintain company voice

Many companies use models for various content generation use cases, and many also have a large amount of well-written content in a distinctive voice. Here we can follow a similar format to the domain expert model above, feeding in all the content that exemplifies the company voice.
A dataset for Apple would look like the following:
{"prompt": "", "completion": " Apple today announced it will host its annual Worldwide Developers Conference (WWDC) in an online format from June 6 through 10, free for all developers to attend. Building on the success of the past two years of virtual events, WWDC22 will showcase the latest innovations in iOS, iPadOS, macOS, watchOS, and tvOS, while giving developers access to Apple engineers and technologies to learn how to create groundbreaking apps and interactive experiences.\n\n“At its heart, WWDC has always been a forum to create connection and build community,” said Susan Prescott, Apple’s vice president of Worldwide Developer Relations and Enterprise and Education Marketing. “In that spirit, WWDC22 invites developers from around the world to come together to explore how to bring their best ideas to life and push the envelope of what’s possible. We love connecting with our developers, and we hope all of our participants come away feeling energized by their experience.”\nNow more than ever, Apple products are part of users’ daily lives, helping them to learn, create, and connect. At WWDC22, Apple’s growing global community of more than 30 million developers will gain insight and access to the technologies and tools to bring their visions into reality. In addition to the announcements shared from the keynote and State of the Union presentations, this year’s program will also include more information sessions, more cutting-edge learning labs, more digital lounges to engage with attendees, and more localized content to make WWDC22 a truly global event.\nIn addition to the online conference, Apple will host a special day for developers and students at Apple Park on June 6 to watch the keynote and State of the Union videos together, along with the online community. 
Space will be limited, and details about how to apply to attend will be provided on the Apple Developer site and app soon.\nFor the third year, Apple will also support students who love to code with the Swift Student Challenge. Swift Playgrounds is a revolutionary app for iPad and Mac that makes learning the Swift programming language interactive and fun. For this year’s challenge, students from around the world are invited to create a Swift Playgrounds app project on a topic of their choice, and they can submit their work through April 25. For more information, visit the Swift Student Challenge website.\nApple is proud to support and nurture the next generation of developers through its annual WWDC student program. Over the past three decades, many thousands of students have sharpened their skills and connected with other coders of all ages. They have gone on to nurture long and successful careers in technology, found venture-backed startups, and create nonprofits focused on using technology to build a better future.<|endoftext|>"}
{"prompt": "", "completion": " Apple today announced that Apple Business Essentials is now available to all small businesses in the US. The new service brings together device management, 24/7 Apple support, and iCloud storage into flexible subscription plans. Apple also unveiled new AppleCare+ for Business Essentials options that can be added to any plan. Additionally, a two-month free trial will be available to all customers, including those who have been using Apple Business Essentials in beta.\n“Apple has a deep and decades-long commitment to helping small businesses thrive. From dedicated business teams in our stores to the App Store Small Business Program, our goal is to help each company grow, compete, and succeed,” said Susan Prescott, Apple’s vice president of Enterprise and Education Marketing. “We look forward to bringing Apple Business Essentials to even more small businesses to simplify device management, storage, support, and repairs. Using this new service leads to invaluable time savings for customers — including those without dedicated IT staff — that they can invest back into their business.”\nApple Business Essentials supports small businesses throughout the entire device management life cycle — from device setup, to device upgrades — while providing strong security, prioritized support, and data storage and backup. The complete solution begins with simple employee onboarding, allowing a small business to easily configure, deploy, and manage Apple products from anywhere. With the Collections feature, groups of apps can be delivered to employees or teams, and settings can be automatically pushed to devices, such as VPN configurations, Wi-Fi passwords, and more.<|endoftext|>"}
{"prompt": "", "completion": " Apple today announced a $50 million Supplier Employee Development Fund that will expand access to learning opportunities and skills development. The fund also includes new and expanded partnerships with leading rights advocates, universities, and nonprofits to drive Apple’s ongoing work to empower supplier employees and drive improvements in knowledge of and respect for workplace rights across industries.\nAs part of the new $50 million commitment, and building on Apple’s long history in the space, the company is working alongside supply chain partners to amplify worker voice. That includes supporting the worker rights programs created by the International Labour Organization (ILO) for people in the electronics sector, and the work the International Organization for Migration (IOM) is doing to expand rights trainings and scale its industry-leading responsible labor recruitment tools.\n“We put people first in everything that we do, and we’re proud to announce a new commitment to accelerate our progress and provide even more opportunities for people across our supply chain,” said Sarah Chandler, Apple’s senior director of Environment and Supply Chain Innovation. “Together with rights advocates and education leaders, we are continuing to drive new innovation to support people and the planet.”\n“At IOM, we understand that supply chain responsibility is complex, and must be driven by innovation, results, and partnerships,” said Amy Pope, IOM’s deputy director general for Management and Reform. “Improving people’s lives must be at its core. The IOM and Apple partnership has proven results in Apple’s own supply chain and paves the way for others in the industry to follow. To effect real change, we need global collaboration that engages workers, NGOs, government, and industry. 
Apple’s new commitments will have tangible, meaningful benefits for workers around the world.”\nApple is also today releasing its 16th annual People and Environment in Our Supply Chain Report, which provides a comprehensive account of how Apple and its suppliers are supporting people across the company’s supply chain, transitioning to clean energy, and investing in cutting-edge technologies.<|endoftext|>"}
...
Then this fine-tuned model can be further fine-tuned to solve a number of different use cases or used as a base model that follows the company voice much more closely than before.
A similar technique could be used for creating a virtual character with a particular personality, style of speech and topics the character talks about. You could also map generations to programmatic actions by inserting tags in the following format: <|action|>, <|make_bed|>, <|cook_breakfast|>.

Train a fine-tuned model

Once you have sufficient training data formatted in a single text file, training can easily be done in your dashboard.

Create a new project

Press "New project" to get started.

Select the model you'd like to fine-tune

Toggle fine-tuning

Upload dataset

Upload your JSONL formatted text file. Free fine-tuning is applied for datasets less than 100MB for GPT-J and less than 10MB for GPT-NeoX.
If your dataset is larger than these limits, get in touch with our team for custom support.

Set number of epochs

Epochs are the number of times your model will "see" each example in your dataset.
We recommend setting 1-4 epochs based on the size of your dataset.
Dataset size         Epochs
Less than 1MB        4 epochs
Less than 10MB       3-4 epochs
Less than 100MB      2-3 epochs
Less than 500MB      1-2 epochs
Greater than 500MB   1 epoch
We typically recommend slightly overtraining on your dataset since you can set checkpoints to find an optimally trained version of the model.
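If you're scripting fine-tuning jobs, the recommendations above can be expressed as a small helper. A sketch (the function name is our own):

```python
def recommended_epochs(size_mb: float):
    """Return the (min, max) recommended epoch range for a dataset
    of the given size in megabytes, per the table above."""
    if size_mb < 1:
        return (4, 4)
    if size_mb < 10:
        return (3, 4)
    if size_mb < 100:
        return (2, 3)
    if size_mb < 500:
        return (1, 2)
    return (1, 1)

print(recommended_epochs(5))  # (3, 4)
```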

Set number of checkpoints

A checkpoint is a model version saved throughout training. The amount of time a model is trained is critical to performance, and by saving many checkpoints throughout training, you can easily find the optimally trained model.
In general, you'll want to set at least 5 checkpoints with the maximum allowed being 15.

Add test prompts

Each checkpoint will provide a completion for each test prompt, which makes it easy to compare the performance of your checkpoints after training. Test prompts should not be examples from your dataset, but they should follow the exact same format as the prompts in your dataset.
It's recommended to set at least 3 test prompts to compare performance. The maximum number of test prompts is 15.
It's also recommended to set parameters suited to your task. Using default or incorrect parameters can lead to poor performance.
Test prompts don't support escaped characters like "\n" or "\t". Instead, insert an actual newline or tab in the input.
Once you've entered test prompts, you can press "Fine-tune" and your model will begin training. Checkpoints will begin to populate in the Project view as training progresses, and you can press "View test prompts" to compare checkpoint performance.

Use a fine-tuned model

When you've found a model that performs well on your task, you can press "Deploy" and begin getting completions or embeddings via API, or use your model in the Playground.
Models typically take 1-3 minutes to deploy before being able to process requests.
For inference, you should format your prompts in the same way as you did when creating the training dataset, including the same separator. For example, take the following dataset:
{"prompt": "Company Name: Nike\nProduct description: Nike Air Jordan is an American brand of basketball shoes, athletic, casual, and style clothing produced by Nike. Founded in Chicago, Air Jordan was created for Hall of Fame former basketball player Michael Jordan during his time with the Chicago Bulls.\nBlog idea:", "completion": " Jordan Brand History - A blog about the history of the brand.<|endoftext|>"}
{"prompt": "Company Name: Apple\nProduct description: The iPhone is a smartphone made by Apple that combines a computer, iPod, digital camera and cellular phone into one device with a touchscreen interface. The iPhone runs the iOS operating system, and in 2020 when the iPhone 12 was introduced, it offered up to 256 GB of storage and a 12-megapixel camera.\nBlog idea:", "completion": " Are You running an iPhone 12? Here\u2019s How to Back Up Your Photos - A blog about backing up photos from your phone.<|endoftext|>"}
{"prompt": "Company Name: Zevo Insect\nProduct description: Nobody likes having insects around the house, and the old fly swatter doesn\u2019t really make much of difference with these persistent pests. Zevo is a plug and decimate solution. The multi-spectrum light technology attracts the critters and traps them on a sticky film backing. No chemicals. No gimmicks. And no need to go anywhere near the insects.\nBlog idea:", "completion": " How to Make Fruit Flies Disappear: A blog all about the Zevo and how it can make your life better.<|endoftext|>"}
...
When making a Completion request to a model fine-tuned on the above dataset, the prompt should look like:
body = {
    "text": f'Company Name: {company_name}\nProduct description: {product_description}\nBlog idea:',
    "top_p": 1,
    "top_k": 40,
    "temperature": 0.8,
    "stop_sequences": ["\n"],
    ...
}
Since each completion in the training set ends with <|endoftext|>, the model should be proficient at stopping generations where you'd want, but it's often helpful to add other stop sequences to consistently truncate the completion how you'd like. A commonly used stop sequence that applies in many use cases is "\n", which stops generation before a newline is output.

Compose models

An advanced use of fine-tuning is to use multiple models to solve complex tasks. Read more

Weights & Biases

You can sync your fine-tuning jobs with Weights & Biases to track training progress.
To get started, copy your Weights & Biases API Key and paste it in Settings -> General. Once saved, all future fine-tuning jobs can be seen from your Weights & Biases account.

Errors

Fine-tuning errors can happen for a number of reasons, from problems with your dataset to incorrect parameters for test prompts. The three most common errors experienced while fine-tuning are:
  1. Incorrect text file format
  2. Fine-tune button is grayed out
  3. Checkpoints haven't appeared within estimated time

Incorrect text file format

If you use any format other than a UTF-8 encoded text file (Western Mac OS Roman, Windows Latin 1, etc.), the fine-tuning job will not work. To ensure your text file is saved with UTF-8 encoding, click "Save As" in your text editor and make sure you select the UTF-8 option.
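You can also verify the encoding programmatically before uploading. A sketch; the latin-1 fallback here is an assumption, so substitute whatever encoding your editor actually used:

```python
def ensure_utf8(path, fallback="latin-1"):
    """Return True if the file is already valid UTF-8; otherwise
    re-encode it from the fallback encoding and return False."""
    raw = open(path, "rb").read()
    try:
        raw.decode("utf-8")
        return True  # already valid UTF-8
    except UnicodeDecodeError:
        text = raw.decode(fallback)  # assumption: file was saved as latin-1
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        return False  # file was re-encoded in place

# Example: a file mistakenly saved as latin-1 gets fixed.
with open("dataset.jsonl", "w", encoding="latin-1") as f:
    f.write('{"prompt": "café", "completion": " ok<|endoftext|>"}\n')
print(ensure_utf8("dataset.jsonl"))
```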

Fine-tune button is grayed out

If you're unable to click the "Fine-tune" button, make sure the following are true:
  1. A valid title is entered
  2. A dataset is uploaded (100MB or less for GPT-J, or 10MB or less for GPT-NeoX)
  3. A valid number of epochs is entered
  4. A valid number of checkpoints is entered
  5. At least one test prompt is entered with valid parameters (see below)
The following parameters will cause errors only with test prompts:
  1. Temperature = 0 (use 0.01 or Top K = 1)
  2. Length > 500

Checkpoints haven't appeared within estimated time

This most likely means there was an error that resulted in a failed fine-tuning job. Please contact our team and we'll fix the bug and restart the job. There's rarely anything you'll need to do besides notify us.