Store LLM outputs in ready-to-fine-tune datasets.
Pipelines are a tool for storing LLM outputs for fine-tuning.
Fine-tuning a smaller model on the outputs of a larger model is a common strategy for optimizing cost and performance while ensuring consistent, high-quality responses.
Using smaller, open-source models can also enable you to self-host due to less-restrictive hardware requirements, giving you ownership of your models and keeping data within your network.
What is a pipeline?
A pipeline is a collection of LLM outputs that you can easily create, filter, and fine-tune on later.
There are a few steps in the pipeline lifecycle:
Create the pipeline
Add LLM outputs to the pipeline
Filter the pipeline to create a dataset
Fine-tune a model on the dataset
Collect more LLM outputs, add to the pipeline, and repeat
Samples added to a pipeline can be tagged with a user_id, group_id, or custom metadata. You can filter pipelines to view segments of your data and create training datasets from those segments.
Getting started
Currently, pipelines are only supported through the Forefront Python and TypeScript SDKs. Below is a walkthrough of how to get started:
Install the package
The TypeScript SDK can be used in Node.js and serverless environments (including Cloudflare Workers).
pip install forefront
npm i forefront
Initialize the Forefront client
from forefront import ForefrontClient

ff = ForefrontClient(api_key="<YOUR_API_KEY>")
import Forefront from "forefront"

const client = new Forefront("<YOUR_API_KEY>");
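Create a pipeline
The steps below reference an existing pipeline by its ID ("pipe_123"). If you need to create a pipeline first, here is a minimal Python sketch; the pipelines.create method name and its name parameter are assumptions, not confirmed by this walkthrough, so check the SDK reference for the exact call.
# Hypothetical: assumes the SDK exposes a pipelines.create method
# that takes a name and returns a pipeline object with an ID
pipe = ff.pipelines.create(name="rust-examples")
print(pipe.id)  # e.g. "pipe_123"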
Add LLM outputs to the pipeline
# Assume the messages are the output of an LLM
messages = [
    {"role": "user", "content": "Write a hello world in rust."},
    {
        "role": "assistant",
        "content": '```rust\nfn main() {\nprintln!("Hello, World!");\n}\n```',
    },
]

# Get the pipeline object if you haven't already
pipe = ff.pipelines.get_by_id("pipe_123")

# Add the data to your pipeline
# Optionally add a user_id, group_id, or key-value metadata to filter by later
pipe.add(
    messages=messages,
    user_id="user_123",
    group_id="group_a",
    metadata={"lang": "rust"},
)
// Assume the messages are the output of an LLM
const messages = [
  { role: "user", content: "Write a hello world for me in rust" },
  {
    role: "assistant",
    content: '```rust\nfn main() {\nprintln!("Hello, World!");\n}\n```',
  },
]

// Get the pipeline object if you haven't already
const pipe = await client.pipelines.getById("pipe_123");

// Add the data to your pipeline
// Optionally add a user_id, group_id, or key-value metadata to filter by later
await pipe.add({
  messages: messages,
  userId: "user_123",
  groupId: "group_a",
  metadata: {
    lang: "rust",
  },
});
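In practice, you would typically capture outputs at inference time and add them to the pipeline as they are produced. Below is a minimal Python sketch that reuses the chat completions call shown in the fine-tuning step at the end of this walkthrough; the teacher model string is a placeholder, and extracting the reply via completion.choices[0].message.content follows the common chat-completions convention and is an assumption here.
# Call a larger "teacher" model, then store its output for fine-tuning
prompt = {"role": "user", "content": "Write a hello world in rust."}

completion = ff.chat.completions.create(
    messages=[prompt],
    model="teacher-model",  # hypothetical model string
)

# Assumes an OpenAI-style response shape; verify against the SDK reference
answer = {
    "role": "assistant",
    "content": completion.choices[0].message.content,
}

pipe.add(
    messages=[prompt, answer],
    metadata={"lang": "rust"},
)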
Filter pipeline data
# Get a pipeline of samples created by "user1"
user_1_examples = pipe.filter_by_user_id("user1")

# Get a pipeline of samples created by "group1"
group_1_examples = pipe.filter_by_group_id("group1")

# Get a pipeline of samples tagged with specific metadata
rust_examples = pipe.filter_by_metadata({"lang": "rust"})
// Get a pipeline of samples created by "user1"
let userSamples = pipe.filterByUserId("user1")

// Get a pipeline of samples created by "group1"
let groupSamples = pipe.filterByGroupId("group1")

// Get a pipeline of samples tagged with specific metadata
let rustSamples = pipe.filterByMetadata({ lang: "rust" })
Inspect pipeline data
'''
Returns an array of dataset samples that meet the
filter criteria from the previous step
'''
data = rust_examples.get_samples()
/*
Returns an array of dataset samples that meet the
filter criteria from the previous step
*/
let rustSampleData = await rustSamples.getSamples()
Create dataset from pipeline
To fine-tune a model from a pipeline, you will first need to convert it to a dataset.
'''
Create a dataset called "my-rust-dataset" using
the filtered pipeline object from the previous step
'''
rust_dataset = rust_examples.create_dataset_from_pipeline("my-rust-dataset")
/*
Create a dataset called "my-rust-dataset" using
the filtered pipeline object from the previous step
*/
const myRustDataset = await rustSamples.createDatasetFromPipeline("my-rust-dataset");
Create a fine-tuned model and inference it
For completeness, here is an example of creating a fine-tuning job from a dataset and then running inference with the model once training is complete.
# Create fine-tuned model
my_rust_llm = ff.fine_tunes.create(
    name="my-rust-llm",
    base_model="mistralai/mistral-7b",
    training_dataset=rust_dataset.dataset_string,
    epochs=1,
    public=False,
)

# Get the model string > "team-name/my-rust-llm"
model_string = my_rust_llm.model_string

# Inference the model
completion = ff.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant"},
        {"role": "user", "content": "Write the Fibonacci sequence"},
    ],
    model=model_string,
)
// Create fine-tuned model
const myRustLlm = await client.fineTunes.create({
  name: "my-rust-llm",
  baseModel: "mistralai/mistral-7b",
  trainingDataset: myRustDataset.datasetString,
  epochs: 1,
  isPublic: false,
});

// Get the model string > "team-name/my-rust-llm"
const modelString = myRustLlm.modelString

// Inference the model
const completion = await client.chat.completions.create({
  model: modelString,
  messages: [
    { role: "system", content: "You are a helpful coding assistant" },
    { role: "user", content: "Write the Fibonacci sequence" },
  ],
});
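This closes the loop in the pipeline lifecycle: outputs from the fine-tuned model can be added back to the pipeline to grow the next training dataset. Here is a minimal Python sketch reusing pipe.add from earlier; as above, reading the reply via completion.choices[0].message.content assumes an OpenAI-style response shape and should be verified against the SDK reference.
# Feed the fine-tuned model's output back into the pipeline
# so it can be filtered into a future dataset (lifecycle step 5)
reply = completion.choices[0].message.content  # assumed response shape

pipe.add(
    messages=[
        {"role": "user", "content": "Write the Fibonacci sequence"},
        {"role": "assistant", "content": reply},
    ],
)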