Understand how pricing works on Forefront.
You can use the Forefront platform for free every month until you exceed $10 of usage. After exceeding the free tier limit, you can use models on a pay-per-token basis or host models on resources.


Pay-per-token ("PPT") charges you based on the sum of the tokens in your prompt and completion, multiplied by the model's pay-per-token rate. View pay-per-token rates
PPT is the default way to use the API. API URLs with PPT pricing start with shared-api.
Example pay-per-token usage
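As a rough illustration of the formula described above, here is a minimal Python sketch that adds the prompt and completion tokens and multiplies the total by a per-token rate. The function name `ppt_cost` and the rate used in the usage line are hypothetical placeholders, not published Forefront rates or part of any Forefront SDK.

```python
from decimal import Decimal


def ppt_cost(prompt_tokens: int, completion_tokens: int, rate_per_token: Decimal) -> Decimal:
    """Estimate the pay-per-token cost of a single request.

    rate_per_token is the model's price for one token, in dollars.
    The value passed in is an assumption supplied by the caller.
    """
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Cost = (prompt tokens + completion tokens) * per-token rate
    total_tokens = prompt_tokens + completion_tokens
    return Decimal(total_tokens) * rate_per_token


# Hypothetical rate of $0.000006 per token (placeholder, not a real price).
example_cost = ppt_cost(500, 250, Decimal("0.000006"))
print(f"Estimated cost: ${example_cost}")
```

`Decimal` is used instead of `float` so that very small per-token prices do not pick up binary rounding error when summed over many requests.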


Resources represent dedicated GPUs to host your models. Multiple models of the same type can be hosted and used on a single GPU with minimal effect on latency or throughput. View resource rates
Resources differ from pay-per-token in a few ways, making them more cost-efficient as you scale:
  1. You pay for the time that your GPUs are live (like Amazon EC2).
  2. Dedicated GPUs give you more stable throughput and latency.
  3. You can control scaling settings to process any volume of requests efficiently.
GPUs can be turned on or off through the dashboard. Usage costs are prorated to the minute.
Example resource usage
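To make per-minute proration concrete, the Python sketch below estimates the cost of keeping one GPU live for a given number of minutes. It assumes the resource rate is quoted per hour and that usage is counted in whole minutes; both are assumptions, and the function name `resource_cost` and the rate in the usage line are hypothetical placeholders rather than published Forefront values.

```python
from decimal import Decimal


def resource_cost(minutes_live: int, hourly_rate: Decimal) -> Decimal:
    """Estimate the cost of a dedicated GPU that was live for minutes_live minutes.

    Assumes hourly_rate is the price in dollars for one GPU-hour and that
    billing is prorated linearly per whole minute.
    """
    if minutes_live < 0:
        raise ValueError("minutes_live must be non-negative")
    # Prorate to the minute: each minute costs 1/60 of the hourly rate.
    return hourly_rate * Decimal(minutes_live) / Decimal(60)


# Hypothetical rate of $2.00 per GPU-hour (placeholder), live for 90 minutes.
example_cost = resource_cost(90, Decimal("2.00"))
print(f"Estimated cost: ${example_cost}")
```

Because the cost depends only on the time the GPU is live, the estimate stays the same no matter how many requests are processed in that window, which is where the cost advantage over pay-per-token comes from at higher volumes.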


Fine-tuning trains a model on a dataset to specialize it for a specific task. View fine-tuning rates
Example fine-tuning usage