Pricing
Understand how pricing works on Forefront.
You can start using Forefront on the Free plan and grow with our flat-rate, simple pricing. As you scale, switch to dedicated resources for the best cost efficiency.
| Starter ($29/mo) | Growth ($99/mo) | Team ($299/mo) | Enterprise (custom pricing) |
| --- | --- | --- | --- |
| 5M serverless tokens | 20M serverless tokens | 50M serverless tokens | Unlimited serverless tokens |
| 5 fine-tuned models | 10 fine-tuned models | 20 fine-tuned models | Unlimited fine-tuned models |
| 1 user | 2 users | 10 users | Unlimited users |
| Discord support | Discord support | Standard support | Priority support |
| | | Dedicated resources | Dedicated resource discounts |
| | | Export fine-tuned models | Export fine-tuned models |
Resources are dedicated GPUs that host your models. Multiple models of the same type can be hosted and used on a single GPU with minimal effect on latency or throughput. View resource rates
Resources differ from pay-per-token in a few ways, making them more cost efficient as you scale:
1. You pay for the time that your GPUs are live (like Amazon EC2).
2. Dedicated GPUs give you more stable throughput and latency.
3. You can control scaling settings to process any volume of requests efficiently.
GPUs can be turned on or off through the dashboard and set to autoscale. Usage costs are prorated to the minute.
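As a rough illustration of per-minute proration, the sketch below estimates the cost of a GPU that was live for part of an hour. It assumes only what is stated above (hourly rates, billing prorated to the minute); the helper function is hypothetical and not part of any Forefront API.

```python
# Sketch: estimate the prorated cost of a dedicated resource.
# Assumes billing is prorated to the minute, as described above.
# The rate used in the example is the GPT-J Performance rate ($2.78/hr).

def prorated_cost(hourly_rate: float, minutes_live: int) -> float:
    """Cost for a GPU that was live for `minutes_live` minutes."""
    return hourly_rate * minutes_live / 60

# e.g. a $2.78/hr GPU that was live for 95 minutes
print(f"${prorated_cost(2.78, 95):.2f}")  # -> $4.40
```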
Using the earlier pay-per-token example, the cost per request with resources works out as follows:
Model: GPT-J
Resource: GPT-J Performance ($2.78 per hour)
Using the same prompt and completion as in the previous example (20 tokens in, 102 tokens out), a single Performance GPU for GPT-J can process 50 requests per minute, or 3,000 requests per hour.
cost_per_hour / requests_per_hour = request_cost
$2.78 / 3000 = $0.00093
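The same arithmetic in code, as a minimal sketch; the figures are taken from the GPT-J Performance example above, and the variable names are illustrative only.

```python
# Sketch: cost per request on a dedicated resource,
# using the GPT-J Performance figures from the example above.

cost_per_hour = 2.78          # $ per hour for the GPT-J Performance resource
requests_per_minute = 50      # throughput for this prompt/completion size
requests_per_hour = requests_per_minute * 60  # 3,000

request_cost = cost_per_hour / requests_per_hour
print(f"${request_cost:.5f}")  # -> $0.00093
```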
Fine-tuning trains a model on a dataset to specialize it for a specific task. View fine-tuning rates