Fine-tuning

Introduction

Fine-tuning is a great way to optimize model cost, quality, and latency. It enables higher quality results than prompting alone, and faster, cheaper requests thanks to shorter prompts.

Fine-tuning a smaller model on the outputs of a larger model can lead to even more dramatic improvements in costs and speed, while ensuring high quality outputs.

Getting started

You can fine-tune your first model on Forefront in ~30 seconds. At a high level, the fine-tuning steps are:

  1. Gather a dataset

  2. Fine-tune a model

  3. Measure performance

  4. Deploy the model

  5. Iterate

First, you'll need to gather a dataset and format it correctly for fine-tuning. For more info on how to do this, visit the datasets page.
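As a rough illustration, a chat-formatted dataset is usually a JSONL file with one training example per line. The "messages" schema below is an assumption for illustration only; the exact fields Forefront expects are described on the datasets page.

```python
# Minimal sketch of a chat-formatted training file: one JSON object per line.
# The "messages" schema here is an illustrative assumption; the datasets page
# documents the exact fields Forefront expects.
import json

examples = [
    {"messages": [
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security and choose 'Reset password'."},
    ]},
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```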

Next, go to the Create Fine-tune page, select a model, upload your dataset, optionally set additional parameters, and click Fine-tune to kick off your fine-tuning job.

Forefront provides a way of measuring performance quantitatively and qualitatively through loss charts, validation datasets and evals.

Once training is complete, you can try out the model immediately through the playground or API. You can also download the model weights and self-host the model if you like.
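Here is a minimal sketch of calling a fine-tuned model from code, assuming an OpenAI-compatible chat completions endpoint. The base URL, API key, and model identifier are placeholders; use the values shown for your model in the dashboard and API reference.

```python
# Minimal sketch: query a fine-tuned model over an OpenAI-compatible chat
# completions endpoint. Base URL, API key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_FOREFRONT_API_KEY",        # placeholder
    base_url="https://api.forefront.ai/v1",  # assumed endpoint; check the API reference
)

response = client.chat.completions.create(
    model="your-team/my-fine-tuned-model",   # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```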

If you decide your model has room for improvement, simply collect more data and repeat the fine-tuning process.

When to use fine-tuning

There are a few common scenarios when fine-tuning is especially useful:

Fine-tuning a smaller model on the outputs of a larger model

This strategy is used to enable self-hosting (since larger models require more expensive hardware) and as a cost/latency optimization. An example of this is collecting example outputs from Mixtral 8x7B and fine-tuning a Mistral 7B model. This enables faster outputs, lower costs, a lower barrier to self-hosting, and higher quality responses than using Mistral 7B without fine-tuning.
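A rough sketch of this workflow: prompt the larger model, save its responses as training examples, and fine-tune the smaller model on the resulting file. The endpoint, model identifier, and dataset schema below are illustrative assumptions.

```python
# Sketch of building a distillation dataset: collect outputs from a larger model
# (e.g. Mixtral 8x7B) and save them as training examples for a smaller model
# (e.g. Mistral 7B). Endpoint, model identifier, and schema are illustrative.
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.forefront.ai/v1")  # assumed endpoint

prompts = [
    "Classify the sentiment of: 'The checkout flow keeps timing out.'",
    "Classify the sentiment of: 'Support resolved my issue in minutes.'",
]

with open("distillation_dataset.jsonl", "w") as f:
    for prompt in prompts:
        completion = client.chat.completions.create(
            model="mixtral-8x7b-instruct",   # placeholder identifier for the larger model
            messages=[{"role": "user", "content": prompt}],
        )
        answer = completion.choices[0].message.content
        f.write(json.dumps({"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ]}) + "\n")
```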

Fine-tuning an open source model on the outputs of a closed source model

Closed-source models can be expensive, and model providers can update them on a whim, leading to unexpected changes in model outputs. Fine-tuning an open source model on the outputs of a closed-source model can lead to higher quality outputs, with the cost and consistency benefits of owning your model.

Improving performance on complex tasks

In general we recommend starting with the best model for the job. If the best model still isn't good enough, then fine-tuning can be helpful to improve performance, especially on complex tasks.

Fine-tuning parameters

There are a few options you can configure when fine-tuning.

Models

You must select a base model to fine-tune on. You can fine-tune a foundation model or a model that was already fine-tuned.

In general, we recommend selecting the base model that performs best for your task. If you want to train a domain-specific chat model, for example, you should choose a model that is already trained to understand chat.

Alternatively, you can fine-tune multiple models and see which one performs best. Forefront makes it easy to measure performance qualitatively and quantitatively through evals, validations, and fast experimentation in the playground.


You will notice three categories of models to select from in the fine-tuning model drop-down:

Foundation models are generally knowledgeable, but not trained to perform any task. Choose a foundation model if you want to start with a "clean slate", your dataset uses a unique syntax, or you want the guarantee of using a "trusted" model (foundation models are created by organizations that emphasize safety and alignment during their training processes).

Community models are fine-tuned models created by members of the open-source community. These models can exceed the performance of foundation models depending on the use case, which makes them a great starting point.

My models are models that you have previously fine-tuned.

Epochs

You can select the number of epochs to fine-tune your model, where an epoch is defined as a complete pass through your training dataset. Setting a higher number of epochs can help the model learn complex tasks, but will increase the duration of training. Training for too many epochs, however, can lead to overfitting.

We recommend starting with a lower number of epochs and monitoring performance to see if you should continue training. If it looks like there is room for improvement, you can start a new fine-tuning job with more epochs.

Tip: If you decide to train for more epochs, you can save time and cost by fine-tuning your previously fine-tuned model for only the additional epochs. For example, say you fine-tuned a model called "my-fine-tuned-model" for two epochs and decide you should train for two more. Instead of creating a new fine-tuning job for four epochs, you could instead choose "my-fine-tuned-model" as your base model and train for two epochs.

Training dataset

This is the dataset your model will be trained on. Your dataset must be formatted properly using chat-ml or prompt-completion format. For more information on this, see the datasets page.
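For illustration, a prompt-completion dataset is typically a JSONL file where each line pairs a prompt with its target completion. The field names below are assumptions; the datasets page documents the exact schema.

```python
# Illustrative prompt-completion example: one JSON object per line in a .jsonl file.
# Field names are an assumption; check the datasets page for the exact schema.
import json

example = {
    "prompt": "Extract the order ID from: 'My order #48213 never arrived.'",
    "completion": "48213",
}
with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```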

Validation dataset

Validation datasets are used to qualitatively evaluate the performance of your model. Once the model has completed fine-tuning, Forefront will run inference with your fine-tuned model on the samples in your validation dataset. You can view the results in the fine-tuning UI. For best results, make sure your validation dataset does not overlap with the training dataset. Validations will incur inference costs based on the model type.
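As a sketch of keeping the two datasets separate, you can deduplicate and split a single example file before uploading. The 90/10 split below is just a common starting point, not a Forefront requirement.

```python
# Sketch: deduplicate, then split one example file into non-overlapping
# training and validation sets (90/10 is just a common starting point).
import random

with open("all_examples.jsonl") as f:
    examples = [line.strip() for line in f if line.strip()]

examples = list(dict.fromkeys(examples))  # drop exact duplicates so the sets can't share a line
random.seed(0)
random.shuffle(examples)

split = int(len(examples) * 0.9)
with open("train.jsonl", "w") as f:
    f.write("\n".join(examples[:split]) + "\n")
with open("validation.jsonl", "w") as f:
    f.write("\n".join(examples[split:]) + "\n")
```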

Evals

Forefront allows you to optionally run automatic evaluations on your fine-tuned models. Once the model has completed fine-tuning, Forefront will start the evals for your model. You can select more than one eval. The results will be shown in the UI once completed. Evals incur inference costs based on the model type.

Data collection strategies

To get the best fine-tuning results, you should optimize for high-quality training data. Remember, your dataset examples should be representative of how you plan to use your model in production.

A few heuristics for achieving this are:

  • Include good examples representative of your real-world use case; manually craft them if needed

  • Discard low-quality examples, even if that means having a smaller dataset

  • Remove duplicates (see the cleanup sketch after this list)

  • Generate high-quality outputs using better models, and fine-tune with them

  • Don't be afraid to iterate. Fine-tuning with a small dataset and using your fine-tuned model to generate new examples for further fine-tuning is a perfectly reasonable strategy.
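The sketch below applies two of the heuristics above, removing duplicates and discarding low-quality examples, to a chat-formatted JSONL file. "Very short assistant reply" is just one illustrative notion of low quality; substitute whatever signal fits your task.

```python
# Sketch of the cleanup heuristics above: drop exact duplicates and discard
# obviously low-quality examples. "Very short assistant reply" is just one
# illustrative quality signal; assumes the chat-style schema sketched earlier.
import json

seen, kept = set(), []
with open("raw_examples.jsonl") as f:
    for line in f:
        line = line.strip()
        if not line or line in seen:
            continue                      # remove duplicates
        seen.add(line)
        example = json.loads(line)
        reply = example["messages"][-1]["content"]
        if len(reply.split()) < 3:        # discard low-quality examples
            continue
        kept.append(line)

with open("clean_examples.jsonl", "w") as f:
    f.write("\n".join(kept) + "\n")
```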

For simpler tasks, the easiest way to collect high-quality data is to pull from real-world examples, or to train your model on the outputs of a better model.

For complex tasks, you might need to manually create your dataset or use clever prompting techniques like chain-of-thought to generate an initial dataset that you can fine-tune and iterate on.
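As a sketch of the chain-of-thought approach, you can ask a stronger model to reason step by step and keep its reasoned answer as the training target, then review the examples by hand before fine-tuning. The endpoint and model name below are placeholders.

```python
# Sketch: use chain-of-thought prompting against a stronger model to bootstrap
# an initial dataset for a complex task. Endpoint and model name are placeholders,
# and the generated examples should be reviewed by hand before fine-tuning.
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.forefront.ai/v1")  # assumed endpoint

task = "A customer was charged twice and also wants to change plans. What should support do first?"
cot_prompt = (
    f"{task}\n\nThink through the problem step by step, "
    "then give your final recommendation on the last line."
)

completion = client.chat.completions.create(
    model="your-strongest-available-model",  # placeholder
    messages=[{"role": "user", "content": cot_prompt}],
)

with open("cot_seed_dataset.jsonl", "a") as f:
    f.write(json.dumps({"messages": [
        {"role": "user", "content": task},
        {"role": "assistant", "content": completion.choices[0].message.content},
    ]}) + "\n")
```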