Understand how pricing works on Forefront.
You can start using Forefront on the Free plan and grow with simple, flat-rate pricing. As you scale, switch to dedicated resources for the best cost efficiency.

Pricing plans

| | Starter ($29/mo) | Growth ($99/mo) | Team ($299/mo) | Enterprise (custom pricing) |
|---|---|---|---|---|
| Serverless tokens | 5M | 20M | 50M | Unlimited |
| Fine-tuned models | 5 | 10 | 20 | Unlimited |
| Users | 1 | 2 | 10 | Unlimited |
| Support | Discord | Discord | Standard | Priority |
| Dedicated resources | No | No | Included | Discounted |
| Export fine-tuned models | No | No | Yes | Yes |
Frequently asked questions
Both prompt and completion tokens count toward overages.


Resources are dedicated GPUs that host your models. Multiple models of the same type can be hosted and served on a single GPU with minimal effect on latency or throughput. View resource rates
Resources differ from pay-per-token in a few ways, making them more cost efficient as you scale:
  1. You pay for the time that your GPUs are live (like Amazon EC2).
  2. Dedicated GPUs give you more stable throughput and latency.
  3. You can control scaling settings to process any volume of requests efficiently.
GPUs can be turned on / off through the dashboard and set to autoscale. Usage costs are prorated to the minute.
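Since usage is prorated to the minute, the cost of a dedicated GPU is just its hourly rate scaled by uptime. A minimal sketch (the 45-minute figure here is a hypothetical example, not a rate from this page):

```python
def prorated_cost(hourly_rate: float, minutes_live: int) -> float:
    """Cost of a dedicated GPU billed per minute of uptime."""
    return round(hourly_rate * minutes_live / 60, 4)

# Hypothetical example: a $2.78/hour GPU left on for 45 minutes
print(prorated_cost(2.78, 45))  # 2.085
```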
Example resource usage
Building on the previous pay-per-token example, the cost per request with resources would be the following:
Model: GPT-J
Resource: GPT-J Performance ($2.78 per hour)
Using the same prompt and completion as in the previous example (20 tokens in, 102 tokens out), a single Performance GPU for GPT-J can process 50 requests per minute (3,000 requests per hour).

Cost per request

cost_per_hour / requests_per_hour = request_cost
$2.78 / 3000 = $0.00093
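The per-request arithmetic above can be sketched as a small helper, using the rate and throughput figures from this example:

```python
def cost_per_request(cost_per_hour: float, requests_per_minute: int) -> float:
    """Divide the hourly GPU rate by hourly throughput."""
    requests_per_hour = requests_per_minute * 60
    return cost_per_hour / requests_per_hour

# GPT-J Performance resource: $2.78/hour, 50 requests/minute
print(f"${cost_per_request(2.78, 50):.5f}")  # $0.00093
```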


Fine-tuning trains a model on a custom dataset to specialize it for a specific task. View fine-tuning rates
Example fine-tuning usage
An example cost for fine-tuning would look like the following:
Model: GPT-J ($0.0002 per training example per epoch)
Dataset: 1000 training examples
Epochs: 4

Fine-tuning cost

model_rate * training_examples * epochs = fine_tuning_cost
$0.0002 * 1000 * 4 = $0.80
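The fine-tuning formula is a straight product, since each epoch makes one pass over every training example. A sketch using the rates from this example:

```python
def fine_tuning_cost(model_rate: float, training_examples: int, epochs: int) -> float:
    """model_rate is the per-example, per-epoch rate (e.g. $0.0002 for GPT-J)."""
    return model_rate * training_examples * epochs

# 1000 training examples for 4 epochs at GPT-J's rate
print(f"${fine_tuning_cost(0.0002, 1000, 4):.2f}")  # $0.80
```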