Text generation


A large language model (LLM) is a type of machine learning model that has been trained to understand natural language text inputs. LLMs generate text in response to these inputs.

LLMs have been trained on vast amounts of data and excel at many tasks that involve text, such as creating marketing blogs, translating text, and writing computer code.

A typical LLM input, also known as a "prompt", could include a task, some background information, and instructions on how the model should respond.

For example, you might ask an LLM to write a marketing blog about a new product (the task) based on some product data (the background information) in a way that caters to 20- to 30-year-olds (the instructions). The instructions are sometimes referred to as "system instructions."
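The three parts of a prompt can also be assembled in code. The following is a minimal Python sketch: the function name `build_prompt`, the labels it prints, and the sample product data are all illustrative assumptions, not a layout that any model requires.

```python
def build_prompt(task: str, background: str, instructions: str) -> str:
    """Join the three prompt parts into one text input (hypothetical layout)."""
    parts = [
        f"Task: {task.strip()}",
        f"Background information: {background.strip()}",
        f"Instructions: {instructions.strip()}",
    ]
    # Separate the parts with blank lines so each one is easy to tell apart.
    return "\n\n".join(parts)


prompt = build_prompt(
    task="Write a marketing blog about a new product.",
    # Placeholder product data, invented for this example.
    background="Product data: a reusable water bottle that keeps drinks cold.",
    instructions="Write in a way that caters to 20- to 30-year-olds.",
)
print(prompt)
```

Keeping the parts separate makes it easy to reuse the same instructions with different tasks or background data.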

Getting started

You can interact with LLMs through the Playground or API. You can also download models and run them from your own machine or in your cloud.


To access the Playground, log in to https://platform.forefront.ai and click on "Playground" in the sidebar. You can select a model in the top-right corner of the screen, type a message into the main input, and click submit to generate a text response.

In the model drop-down you'll be able to select from three types of models:

Foundation models are "base layer" models that are trained on the majority of data on the public internet. They contain an incredible amount of knowledge, but are not trained to perform any specific task.

Community models are models made by members of the open source community that have been trained, or fine-tuned, to perform a specific task, such as conversational chat.

My models are models that you have fine-tuned on your own datasets.

Completions API

You can interact with models programmatically through our API. Forefront supports two popular formats for sending text inputs to LLMs: chat style and completion style. If you're not sure which format to use, start with chat style, as it generally produces good results.

Chat style

Chat style is best for conversational use cases as it allows you to send a string of messages to the LLM, where each message has a role of "user" or "assistant". You can also add system instructions by providing a message with the "system" role.

Note: Not all models know how to use chat style out of the box as this ability is learned through specialized training. We sometimes perform this training on models to make them compatible with chat style inputs. We add the label chat-ml to these models when we do so.

Below is an example of sending a chat style input to an LLM through the API. Notice that the text input is passed in the messages parameter as a list of messages.

# The authorization header uses double quotes so the shell expands $FOREFRONT_API_KEY.
curl https://api.forefront.ai/v1/chat/completions \
  --header 'content-type: application/json' \
  --header "authorization: Bearer $FOREFRONT_API_KEY" \
  --data '{
        "model": "mistralai/Mistral-7B-v0.1",
        "messages": [
            {
                "role": "system",
                "content": "Respond to the user with beginner-friendly recipes using seasonal ingredients that are commonly found in most grocery stores."
            },
            {
                "role": "user",
                "content": "What is a good chicken recipe?"
            }
        ],
        "max_tokens": 64,
        "temperature": 0.5
    }'
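The same chat style request can be prepared from Python. This is a sketch using only the standard library: the helper name `build_chat_request` and its role check are assumptions made for illustration, the endpoint and parameter names are copied from the curl example, and the request is built but not sent, since the response format is not described here.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.forefront.ai/v1/chat/completions"


def build_chat_request(messages, model="mistralai/Mistral-7B-v0.1",
                       max_tokens=64, temperature=0.5):
    """Return a urllib Request carrying a chat style payload (not yet sent)."""
    allowed_roles = {"system", "user", "assistant"}
    for message in messages:
        if message["role"] not in allowed_roles:
            raise ValueError(f"unknown role: {message['role']}")
    payload = {
        "model": model,
        "messages": messages,  # a list of {"role": ..., "content": ...} objects
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # Read the key from the environment; falls back to an empty string if unset.
    api_key = os.environ.get("FOREFRONT_API_KEY", "")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request([
    {"role": "system", "content": "Respond to the user with beginner-friendly recipes."},
    {"role": "user", "content": "What is a good chicken recipe?"},
])
# To send it, pass the object to urllib.request.urlopen(request).
```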

Completion style

Completion style is used for models that have not been trained to understand chat style inputs and for use cases that require specialized non-conversational syntax.

Below is an example of sending a completion style API request. In this case, the text input is passed in the prompt parameter:

# The authorization header uses double quotes so the shell expands $FOREFRONT_API_KEY.
curl https://api.forefront.ai/v1/chat/completions \
  --header 'content-type: application/json' \
  --header "authorization: Bearer $FOREFRONT_API_KEY" \
  --data '{
        "model": "mistralai/Mistral-7B-v0.1",
        "prompt": "Write a script for the opening scene of a movie where Mickey Mouse goes to the beach",
        "max_tokens": 64,
        "temperature": 0.5
    }'
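For comparison with chat style, here is a small Python sketch that builds a completion style request body. The helper name `build_completion_payload` and its input check are assumptions for illustration; the parameter names come from the curl example. The key difference is that the text input is one string in prompt rather than a list of messages.

```python
import json


def build_completion_payload(prompt, model="mistralai/Mistral-7B-v0.1",
                             max_tokens=64, temperature=0.5):
    """Return the JSON body for a completion style request as a string."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    body = {
        "model": model,
        "prompt": prompt,  # a single string, not a list of messages
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return json.dumps(body, indent=2)


completion_body = build_completion_payload(
    "Write a script for the opening scene of a movie where Mickey Mouse goes to the beach"
)
print(completion_body)
```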


In addition to the text input, you'll need to pass a few additional parameters to the API, some of which are optional. These parameters are covered in detail in the API reference and are briefly explained below.

model

The name of the model that you want to interact with.

max_tokens

The maximum number of tokens the model should output. How many tokens the model can generate depends on the model and use case. Tokens are explained below.

temperature

This is a number between 0 and 1 that represents the level of randomness, or creativity, the model should use when generating text. For use cases that require high accuracy, such as writing code, set temperature between 0 and 0.2. For use cases that benefit from some creativity, such as writing a marketing blog, set temperature between 0.3 and 1.

Stop sequences

This is an array of words or characters that are used to tell the model when to stop generating text.
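The temperature range and the effect of stop sequences described above can be illustrated with a short Python sketch. Both function names are hypothetical, and `truncate_at_stop` only imitates on the client side what happens when a model stops at a stop sequence. The API applies stop sequences itself, so you would not normally need this code.

```python
def check_temperature(temperature: float) -> float:
    """Reject values outside the 0 to 1 range described above."""
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    return temperature


def truncate_at_stop(text: str, stop: list[str]) -> str:
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for sequence in stop:
        # Ignore empty strings, which would otherwise match at position 0.
        index = text.find(sequence) if sequence else -1
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


generated = "Step 1: boil water.\nStep 2: add pasta.\nUser: thanks"
clipped = truncate_at_stop(generated, ["\nUser:", "###"])
print(clipped)
```

In this example the text is cut just before "\nUser:", so only the two steps remain.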


Tokens

It may seem that LLMs generate text one character at a time, but they actually generate text in chunks. These chunks are commonly referred to as "tokens". A token can be a single character, a few characters, or in some cases a full word.

In the phrase "I am a friendly AI assistant!", the corresponding tokens are: "I, am, a, friendly, AI, assistant, !"
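To make the idea concrete, the following toy Python splitter reproduces the token list for the phrase above. It is only an illustration: `toy_tokenize` is a hypothetical helper that splits on words and punctuation, whereas real models use learned tokenizers that often split words into smaller pieces.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Split text into words and single punctuation marks (not a real tokenizer)."""
    # \w+ matches a run of letters or digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = toy_tokenize("I am a friendly AI assistant!")
print(tokens)  # ['I', 'am', 'a', 'friendly', 'AI', 'assistant', '!']
print(len(tokens))  # 7
```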

Context length

When a model receives a text input, it converts the text to tokens before generating new text. Every model has a maximum number of tokens that it can process in a single interaction. This maximum is called the context length. For example, the context length for the Mistral model on Forefront is 4,096. That means that the input and output tokens for a single request to this model cannot exceed 4,096 tokens combined.
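The context length rule is simple arithmetic, sketched below in Python. The function names are hypothetical, the 4,096 figure is the one quoted above, and the input token counts are made-up numbers. In practice you would get the count from the model's tokenizer.

```python
CONTEXT_LENGTH = 4096  # figure quoted above for the Mistral model on Forefront


def fits_in_context(input_tokens: int, max_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """True if the input plus the requested output stays within the limit."""
    return input_tokens + max_tokens <= context_length


def max_output_budget(input_tokens: int,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Largest output size that still fits after the input is counted."""
    return max(0, context_length - input_tokens)


print(fits_in_context(4000, 64))  # True: 4000 + 64 = 4064
print(fits_in_context(4050, 64))  # False: 4050 + 64 = 4114
print(max_output_budget(4050))    # 46
```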

Tokens as billable units

Tokens are the billable unit when sending requests or fine-tuning with LLMs. It is common for LLM providers (including Forefront) to bill in units of one thousand tokens. Visit our pricing page to see the current rates.
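Billing per thousand tokens can be estimated with a few lines of Python. This is a sketch only: `estimate_cost` is a hypothetical helper and the rate used below is an invented placeholder, not a Forefront price. Check the pricing page for real rates.

```python
from decimal import Decimal


def estimate_cost(total_tokens: int, price_per_thousand: Decimal) -> Decimal:
    """Cost of a request when the price is quoted per one thousand tokens."""
    return (Decimal(total_tokens) / Decimal(1000)) * price_per_thousand


# 2,500 tokens (input plus output) at a placeholder rate of 0.002 per 1,000 tokens.
cost = estimate_cost(2500, Decimal("0.002"))
print(cost)
```

Decimal is used instead of float so that small per-token amounts do not pick up rounding errors.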
