Explore large language models offered on the Forefront platform.
- OPT (coming soon)
- BLOOM (coming soon), from the BigScience Research Workshop
These large language models are pre-trained on vast amounts of text from the Internet. Pre-training takes a model with randomly initialized parameters (weights) and iteratively adjusts those weights based on the difference between the model's output and a reference showing the expected output. For large language models, the most common training objective is next-word prediction over enormous text corpora, such as the Pile.
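The next-word prediction objective described above can be illustrated with a minimal sketch. This is not how any particular model is trained; it only shows how raw text is turned into (context, next word) supervision pairs, with `next_word_examples` and its `context_size` parameter being hypothetical names chosen for illustration:

```python
def next_word_examples(text, context_size=3):
    """Split text into (context, next_word) training pairs,
    the supervision signal used in next-word prediction."""
    tokens = text.split()
    examples = []
    for i in range(1, len(tokens)):
        # The model sees up to `context_size` preceding words
        # and is trained to predict the word that follows.
        context = tokens[max(0, i - context_size):i]
        examples.append((" ".join(context), tokens[i]))
    return examples

pairs = next_word_examples("the cat sat on the mat")
# pairs[0] == ("the", "cat"); pairs[3] == ("cat sat on", "the")
```

During pre-training, the model's predicted next word is compared against the true next word from each pair, and the weights are nudged to reduce the difference.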
In general, the more parameters a model has, the better it performs on language tasks. Models can be further improved through fine-tuning, in which a base model is trained on a set of prompt-completion pairs demonstrating ideal task performance.
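A fine-tuning dataset of prompt-completion pairs is commonly serialized as JSON Lines (one JSON object per line). The sketch below assumes `prompt`/`completion` field names, which follow a widespread convention rather than a schema confirmed by this document:

```python
import json

# Hypothetical examples; the "prompt"/"completion" field names follow
# the common fine-tuning convention, not a confirmed platform schema.
examples = [
    {"prompt": "Translate to French: Hello", "completion": " Bonjour"},
    {"prompt": "Translate to French: Thank you", "completion": " Merci"},
]

# Serialize one JSON object per line (the JSONL format often used
# for fine-tuning datasets).
jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(jsonl)
```

Each line pairs an input the model will see with the ideal output it should learn to produce.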
Most models on the Forefront platform function identically. However, there are some exceptions depending on the model you're using.
T5-20B is a 20 billion parameter encoder-decoder model. Best practices for this model differ slightly from those for decoder-only models:
1. Always start prompts with `[S2S] `. Note the whitespace after the text.
2. Always end prompts with ` <extra_id_0>`. Note the whitespace before the text.
An example prompt would look like:

```json
"prompt": "[S2S] <prompt text> <extra_id_0>",
```
Newlines ("\n") are not part of the UL2 token vocabulary, which means the model can neither understand newlines nor generate them.
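The formatting rules above can be applied with a small helper. This is an illustrative sketch, not an official utility; `format_ul2_prompt` is a hypothetical name:

```python
def format_ul2_prompt(text: str) -> str:
    """Wrap text per the T5-20B best practices: a '[S2S] ' prefix and
    a ' <extra_id_0>' suffix, with newlines collapsed to spaces since
    they are not in the UL2 token vocabulary."""
    # Collapse newlines and repeated whitespace into single spaces.
    text = " ".join(text.split())
    return f"[S2S] {text} <extra_id_0>"

format_ul2_prompt("Summarize:\nThe meeting notes")
# → "[S2S] Summarize: The meeting notes <extra_id_0>"
```

Stripping newlines before sending the prompt avoids feeding the model tokens it cannot represent.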