Introduction
The Forefront platform helps you start, or transition to, fine-tuning and running inference on open-source models.
We offer a variety of models at different sizes and price points, and the ability to import, fine-tune, and export custom models.
Resources
Experiment in the Playground
Read the API reference
Fine-tune custom models
Store your model outputs in datasets
Export your models to self-host
Import models from HuggingFace
Protecting your data and model rights is fundamental to our mission. We do not log API requests or train models on your data. You retain all rights to your fine-tuned models and can export them at any time.
Key concepts
Text generation models
Forefront enables you to fine-tune and run inference on open-source text generation models (often referred to as generative pre-trained transformers, or "GPT" models for short).
These models have been trained to understand natural language, and will generate text outputs in response to their inputs. The inputs to these models are referred to as "prompts".
Writing a prompt is how you "program" a model, usually by providing instructions or some examples of how to successfully complete a task. Text generation models can be used for a variety of tasks including content or code generation, summarization, conversation, creative writing, and more.
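As a sketch of what "programming with a prompt" looks like, the snippet below assembles an instruction, a few worked examples, and a new input into a single few-shot prompt. The task, examples, and formatting are illustrative only; they are not tied to any Forefront-specific API.

```python
# A minimal few-shot prompt builder: the prompt "programs" the model by
# pairing an instruction with example input-output pairs before the query.

def build_prompt(instruction, examples, query):
    """Assemble an instruction, few-shot examples, and the new input."""
    lines = [instruction, ""]
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_prompt(
    "Classify the sentiment of each sentence as positive or negative.",
    [("I loved this movie.", "positive"),
     ("The service was terrible.", "negative")],
    "The food was delicious.",
)
print(prompt)
```

The trailing "Output:" cue invites the model to continue the established pattern, which is what makes few-shot prompting work without any training.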
Read more in our text generation guide.
Tokens
Text generation models process text in chunks called tokens. Tokens represent commonly occurring sequences of characters. As a rough rule of thumb, 1 token ≈ 4 characters.
For example, the word "reptile" is split into tokens " re", "pt", and "ile"—while a short and common word like " animal" is represented as a single token. Note that in a sentence, the first token of each word typically starts with a space character.
One limitation of text generation models is that the prompt tokens and generated output tokens combined must not exceed the model's maximum context length. The maximum context length for each text generation model can be found here.
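The rule of thumb above can be turned into a quick budget check: estimate the prompt's token count and verify that it plus the requested output fits in the context window. Real tokenizers vary by model, so this is only a rough sketch; the model name and context length below are placeholders.

```python
# Rough token accounting using the 1 token ~ 4 characters rule of thumb.
# Actual counts depend on the model's tokenizer; use this only as an
# estimate when sizing prompts against a context limit.

def estimate_tokens(text: str) -> int:
    """Approximate token count: about one token per 4 characters."""
    return max(1, round(len(text) / 4))

def fits_in_context(prompt: str, max_output_tokens: int, context_length: int) -> bool:
    """True if the prompt plus the requested output fits the context window."""
    return estimate_tokens(prompt) + max_output_tokens <= context_length

prompt = "Summarize the following article in three sentences: ..."
print(estimate_tokens(prompt))
print(fits_in_context(prompt, max_output_tokens=256, context_length=2048))
```

If the check fails, you can shorten the prompt, request fewer output tokens, or move to a model with a longer context length.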
Fine-tuning
Fine-tuning enables you to customize a model for specific use cases. It lets you get more out of models by providing:
Higher quality results than prompting
Ability to train on more examples than can fit in a prompt
Token savings due to shorter prompts
Lower latency requests
Read more in our fine-tuning guide.
Datasets
Datasets are JSONL files used during fine-tuning that contain example input-output pairs that you want your fine-tuned model to learn. The examples in your dataset should mimic how you expect to use the model in production.
Datasets support two formats: prompt completions and chat completions.
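The snippet below serializes one example in each format as JSONL (one JSON object per line). The field names shown ("prompt"/"completion" and OpenAI-style "messages" with "role" and "content") are common conventions, not confirmed Forefront schema; check the datasets guide for the authoritative shapes.

```python
import json

# One illustrative example per dataset format. Field names are assumptions
# based on common fine-tuning conventions; verify against the datasets guide.

prompt_example = {
    "prompt": "Translate to French: Hello, world!",
    "completion": "Bonjour, le monde !",
}

chat_example = {
    "messages": [
        {"role": "system", "content": "You are a helpful translator."},
        {"role": "user", "content": "Translate to French: Hello, world!"},
        {"role": "assistant", "content": "Bonjour, le monde !"},
    ]
}

# A JSONL dataset is simply one such object serialized per line.
lines = [json.dumps(prompt_example), json.dumps(chat_example)]
print("\n".join(lines))
```

Each line should mimic a real production request and the response you want the fine-tuned model to produce.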
Read more in our datasets guide.
Pipelines
Pipelines are a tool for storing LLM outputs to use for fine-tuning.
A common approach to building applications with text generation models is to start by using the most powerful model for your use case, like OpenAI's GPT-4.
This makes it easy to build an initial proof of concept, but it comes with tradeoffs: costly inference, high latency, and no model ownership, which leaves you subject to usage policies and changes in model performance that may not align with your use case.
Pipelines make it simple to store your current model usage as ready-to-fine-tune datasets so you can transition to smaller, open-source models. This lets you own your models and gives you lower latency, cheaper inference, more consistent performance, and the option to self-host.
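The underlying pattern can be sketched by hand: record each production prompt/response pair as a JSONL row so the accumulated log doubles as a fine-tuning dataset. This is not the Forefront Pipelines API, just a minimal illustration of the idea; the class and file name are hypothetical.

```python
import json

# A hand-rolled sketch of what pipelines automate: append every production
# input-output pair to a JSONL file that can later be used for fine-tuning.

class CompletionLog:
    def __init__(self, path: str):
        self.path = path

    def record(self, prompt: str, completion: str) -> None:
        """Append one input-output pair as a JSONL dataset row."""
        row = {"prompt": prompt, "completion": completion}
        with open(self.path, "a") as f:
            f.write(json.dumps(row) + "\n")

log = CompletionLog("pipeline_dataset.jsonl")
log.record("Classify: great product!", "positive")
log.record("Classify: broke after a day.", "negative")
```

Once enough pairs from your current (larger) model have accumulated, the same file can serve as training data for a smaller open-source replacement.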
Read more in our pipelines guide.