Pipelines

Store LLM outputs into ready to fine-tune datasets.

Pipelines are a tool for storing LLM outputs for fine-tuning.

Fine-tuning a smaller model on the outputs of a larger model is a common strategy to optimize cost and performance while ensuring consistent and quality responses.

Using smaller, open-source models can also enable you to self-host due to less-restrictive hardware requirements, giving you ownership of your models and keeping data within your network.

What is a pipeline?

A pipeline is a collection of LLM outputs that you can easily create, filter, and fine-tune on later.

There are a few steps in the pipeline lifecycle:

  1. Create the pipeline

  2. Add LLM outputs to the pipeline

  3. Filter the pipeline to create a dataset

  4. Fine-tune a model on the dataset

  5. Collect more LLM outputs, add to the pipeline, and repeat

Samples added to a pipeline can be tagged with a user_id, group_id, or custom metadata. You can filter pipelines to view segments of your data and create training datasets from those segments.

Getting started

Currently pipelines are only supported through the Forefront Python and Typescript SDK. Below is a walkthrough of how to get started:

Install the package

The Typescript SDK can be used in Node.js and serverless environments (including Cloudflare workers).

Initialize the Forefront client

Create a pipeline

Get pipelines

Get pipeline by ID

Add data to a pipeline

Filter pipeline data

Inspect pipeline data

Create dataset from pipeline

To fine-tune a model from a pipeline, you will first need to convert it to a dataset.

Create a fine-tuned model and inference it

For completeness, here is an example of creating a fine-tuning job from a dataset and then inferencing the model once training is completed.

Last updated