Completions
Generate completions based on the provided prompt and parameters.
post
https://example.forefront.link
Create completion
Generate a new completion for the provided prompt and parameters. Asterisk (*) indicates a required field.
Parameters
Header
Authorization*
String
Bearer <YOUR_MODEL_KEY>
Model keys can be found in Settings -> API Keys
Body
prompt*
String
The prompt to generate a completion for.
max_tokens
Integer
The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed 2048 tokens.
temperature
Number
Temperature controls the randomness of the generated text. A value of 1 makes the model take the most risks and use a lot of creativity.
Between 0 and 1.
We generally recommend altering this or Top-P but not both.
top_p
Number
Controls diversity via nucleus sampling: 0.5 means half of all likelihood-weighted options are considered.
Between 0 and 1.
We generally recommend altering this or temperature but not both.
top_k
Integer
Sorts tokens by probability and zeroes out the probabilities for everything below the k-th token. A lower value improves quality by removing the tail and making it less likely that the model goes off topic.
Between 1 and 50,400. No default.
repetition_penalty
Number
How much to penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Any float greater than 1. No default. Values between 1 and 1.3 are recommended for most tasks.
stop
Array of strings
Sequences where the API will stop generating further tokens. The completion will not contain the stop sequence.
Array should contain strings.
n
Integer
How many completions to generate for the given prompt.
Because this parameter generates many completions, use it carefully and ensure that you have reasonable settings for max_tokens and stop_sequences.
Defaults to 1.
echo
Boolean
Set to True if you want the response to return your prompt with the completion.
Defaults to False.
bad_words
Array of strings
Provide an array of strings that the model will not generate. Remember to be explicit with the strings (capitalization, punctuation, etc.).
logit_bias
Map
Modify the likelihood of specified tokens appearing in the completion.
Defaults to null.
Accepts a json object that maps tokens (specified by the first word in a string or a token ID in the GPT tokenizer) to an associated bias value. Values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
As an example, you can pass {"50256": -100} to prevent the <|endoftext|> token from being generated, or pass {"dog": 1} to increase the likelihood of "dog" appearing in the completion. Use OpenAI's tokenizer tool to get token IDs.
tfs
Number
Similar to nucleus sampling, but it sets its cutoff point based on the cumulative sum of the accelerations (second derivatives) of the sorted token probabilities rather than the probabilities themselves. Learn more
Number between 0 and 1.0.
logprobs
Integer
Include the log probabilities of the logprobs most likely tokens. For example, if logprobs is 3, the API will return a list of the 3 most likely tokens.
Defaults to null. Accepts an integer between 1 and 5.
stream
Boolean
Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
Defaults to False.
Responses
200
Completion successfully provided.
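Judging from how the Python example below reads the response, a successful response body contains a result array whose entries each hold a completion string. The following sketch is inferred from that example and is not a complete schema:
// Inferred response shape (assumption based on the Python example below)
const data = {
  result: [
    { completion: " there was a kingdom by the sea." }
  ]
};
const completion = data.result[0].completion;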
Python
import requests

headers = {"Authorization": "Bearer <INSERT_YOUR_TOKEN>"}
body = {
    "prompt": "Once upon a time",
    "max_tokens": 64,
    "top_p": 1,
    "top_k": 40,
    "temperature": 0.8,
    "repetition_penalty": 1,
    "stop_sequences": ["bird", "cat", "dog"]
}
res = requests.post(
    "https://vanilla-gpt-j-forefront.forefront.link",
    json=body,
    headers=headers
)
data = res.json()
completion = data['result'][0]['completion']
Javascript
const fetch = require('node-fetch');

const url = 'https://shared-api.forefront.link/organization/G1Yk5qgNGeTb/gpt-j-6b-vanilla/completions/2JrDQ5BhJAm6';
const headers = {
  "Authorization": "Bearer 3ba7a48d87544fd4b54b13af",
  "Content-Type": "application/json"
};
const body = {
  prompt: "Once upon a time",
  max_tokens: 64,
  top_p: 1,
  top_k: 40,
  temperature: 0.8,
  repetition_penalty: 1,
  stop_sequences: ["bird", "cat", "dog"]
};
fetch(url, {
  method: 'POST',
  headers,
  body: JSON.stringify(body)
}).then(async (res) => {
  const data = await res.json()
  console.log(data)
})
cURL
curl https://DEPLOYMENT_NAME-TEAM_SLUG.forefront.link \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "prompt": "Once upon a time",
    "max_tokens": 64,
    "top_p": 1,
    "top_k": 40,
    "temperature": 0.8,
    "repetition_penalty": 1,
    "stop_sequences": ["bird", "cat", "dog"]
  }'
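The optional parameters documented above can be combined in a single request body. The sketch below is hypothetical: it follows the field naming used in the code samples above, and individual deployments may not support every parameter:
// Hypothetical request body combining several optional parameters
const body = {
  prompt: "Once upon a time",
  max_tokens: 32,
  n: 2,                          // generate two completions for this prompt
  echo: true,                    // return the prompt along with the completion
  logprobs: 3,                   // include log probabilities for the 3 most likely tokens
  bad_words: ["Dragon"],         // exact strings the model must not generate
  logit_bias: { "50256": -100 }, // ban the <|endoftext|> token
  stop_sequences: ["\n"]
};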
get
https://example.forefront.link
Stream completion
Streams a new completion for the provided prompt and parameters via server-sent events.
Parameters
Query
prompt*
String
The prompt to generate a completion for.
max_tokens
Integer
The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed 2048 tokens.
temperature
Number
Temperature controls the randomness of the generated text. A value of 1 makes the model take the most risks and use a lot of creativity.
Between 0 and 1.
We generally recommend altering this or Top-P but not both.
top_p
Number
Controls diversity via nucleus sampling: 0.5 means half of all likelihood-weighted options are considered.
Between 0 and 1.
We generally recommend altering this or temperature but not both.
top_k
Integer
Sorts tokens by probability and zeroes out the probabilities for everything below the k-th token. A lower value improves quality by removing the tail and making it less likely that the model goes off topic.
Between 1 and 50,400. No default.
repetition_penalty
Number
How much to penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Any float greater than 1. No default.
stop
Array
Sequences where the API will stop generating further tokens. The completion will not contain the stop sequence.
Array should contain strings.
Header
Authorization*
String
Bearer token available in Settings > API Key
Responses
200: OK
401: Unauthorized
500: Internal Server Error
Our streaming API uses the built-in Javascript EventSource object, streaming tokens via Server-Sent Events (SSE). If you're using another language, you'll need to find an equivalent API to listen for these events. Because the EventSource API doesn't support POST requests, you'll have to put all of your params into a query string and URI-encode it appropriately. Here's some sample front-end JS code showing how you can do this, which you can use as a starting point for other languages.
const url = '' // for example https://example.forefront.link
const api_key = '' // navigate to Settings => API Key

// helper function to URI-encode stop sequences as a JSON array
const processStopSequences = (stop) => (
  encodeURIComponent(JSON.stringify(stop.map(s => s.replaceAll('\\n', '\n'))))
)
// example sampling parameters (mirroring the POST example above)
const length = 64
const rep = 1
const temperature = 0.8
const topP = 1
const topK = 40
const stop = ['bird', 'cat', 'dog']

// your handler for each streamed token
const handleStream = (token) => console.log(token)

// helper function to construct the EventSource URL
const makeUrlForEventSource = (input) => (
  `${url}/?text=${encodeURIComponent(input)
  }&length=${length
  }&repetition_penalty=${rep
  }&temperature=${temperature
  }&top_p=${topP
  }&top_k=${topK
  }&stop_sequences=${processStopSequences(stop)
  }&authorization=Bearer ${api_key}`
);

// helper function to submit the request and listen for events
const handleEventSourceSubmit = (input) => {
  const requestUrl = makeUrlForEventSource(input)
  const source = new EventSource(requestUrl, { withCredentials: false });
  // each 'update' event carries newly generated tokens
  source.addEventListener('update', (e) => {
    handleStream(e.data)
  })
  // the 'end' event signals that generation is finished
  source.addEventListener('end', () => {
    source.close()
  })
}
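Once url and api_key are filled in, starting a stream is a single call. A minimal usage sketch, using the handleStream defined above:
// Request a streamed completion; tokens arrive via 'update' events until 'end' fires
handleEventSourceSubmit('Once upon a time')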