API performance
Understand what affects latency and throughput.
The latency (response speed) and throughput (requests processed per minute per GPU) you can achieve on Forefront depend on your chosen GPU and on the number of input and output tokens in your requests. Requests with fewer tokens get faster responses and higher throughput on a single GPU.
Input tokens affect response speed and throughput much less than output tokens do. Adding or removing 10 input tokens has roughly the same effect as adding or removing 1 output token (a 10:1 ratio).
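To compare request shapes, the 10:1 ratio can be folded into a single "output-equivalent" token count. The sketch below is only an illustration of that rule of thumb: the function name `output_equivalent_tokens` and its `ratio` parameter are hypothetical, not part of the Forefront API, and the result is a relative weight rather than a latency prediction.

```python
def output_equivalent_tokens(input_tokens: int, output_tokens: int,
                             ratio: float = 10.0) -> float:
    """Rough relative cost of a request, expressed in output tokens.

    Assumes the 10:1 rule of thumb: `ratio` input tokens affect speed
    about as much as 1 output token. Hypothetical helper, not an
    official formula.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Each input token counts as 1/ratio of an output token.
    return output_tokens + input_tokens / ratio


# A 300-in / 30-out request weighs about the same as a 200-in / 40-out one.
baseline = output_equivalent_tokens(300, 30)   # 60.0
alternate = output_equivalent_tokens(200, 40)  # 60.0
```

Under this assumption, trimming 100 input tokens from a prompt buys roughly the same speedup as trimming 10 output tokens, so limiting output length is usually the more effective lever.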
View the response speeds and throughput of requests with 300 input tokens and 30 output tokens for each of our GPU options.