# Pipelines

Pipelines are a tool for storing LLM outputs for fine-tuning.

Fine-tuning a smaller model on the outputs of a larger model is a common strategy for reducing cost and improving performance while keeping responses consistent and high-quality.

Smaller open-source models also have lower hardware requirements, which makes self-hosting practical. Self-hosting gives you ownership of your models and keeps data within your network.

## What is a pipeline?

A pipeline is a collection of LLM outputs that you can easily create, filter, and fine-tune on later.

There are a few steps in the pipeline lifecycle:

1. Create the pipeline
2. Add LLM outputs to the pipeline
3. Filter the pipeline to create a dataset
4. Fine-tune a model on the dataset
5. Collect more LLM outputs, add to the pipeline, and repeat

Samples added to a pipeline can be tagged with a `user_id`, a `group_id`, or custom metadata. You can filter a pipeline by these tags to view segments of your data and create training datasets from those segments.
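The tagging and filtering model described above can be pictured with a small in-memory sketch. This is not the Forefront SDK: the `Sample` class and the `filter_samples` helper are hypothetical names, used only to illustrate how tags select a segment of the data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sample:
    """Hypothetical stand-in for one LLM output stored in a pipeline."""
    messages: list
    user_id: Optional[str] = None
    group_id: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def filter_samples(samples, user_id=None, group_id=None, metadata=None):
    """Return the samples that match every criterion that was provided."""
    metadata = metadata or {}
    return [
        s for s in samples
        if (user_id is None or s.user_id == user_id)
        and (group_id is None or s.group_id == group_id)
        and all(s.metadata.get(k) == v for k, v in metadata.items())
    ]


# Two illustrative samples tagged the way the walkthrough below tags them
samples = [
    Sample(messages=[], user_id="user_123", group_id="group_a", metadata={"lang": "rust"}),
    Sample(messages=[], user_id="user_456", group_id="group_a", metadata={"lang": "go"}),
]

rust_segment = filter_samples(samples, metadata={"lang": "rust"})
print(len(rust_segment))  # 1
```

In this sketch, criteria that are left unset are ignored, so a filter narrows the data only by the tags you pass.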

## Getting started

Pipelines are currently supported only through the Forefront Python and TypeScript SDKs. The walkthrough below covers each step of getting started.

### Install the package

The TypeScript SDK can be used in Node.js and serverless environments (including Cloudflare Workers).

{% tabs %}
{% tab title="Python" %}

```
pip install forefront
```

{% endtab %}

{% tab title="Node.js" %}

```
npm i forefront
```

{% endtab %}
{% endtabs %}

### Initialize the Forefront client

{% tabs %}
{% tab title="Python" %}

```python
from forefront import ForefrontClient

# Named `ff` to match the Python examples in the rest of this walkthrough
ff = ForefrontClient(api_key="<YOUR_API_KEY>")
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
import Forefront from "forefront";

const client = new Forefront("<YOUR_API_KEY>");
```

{% endtab %}
{% endtabs %}

### Create a pipeline

{% tabs %}
{% tab title="Python" %}

```python
pipeline = ff.pipelines.create("my-first-pipeline")
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
const pipeline = await client.pipelines.create("my-first-pipeline");
```

{% endtab %}
{% endtabs %}

### Get pipelines

{% tabs %}
{% tab title="Python" %}

```python
pipelines = ff.pipelines.list()

print(pipelines[0].id)
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
const pipelines = await client.pipelines.list();

console.log(pipelines[0].id)
```

{% endtab %}
{% endtabs %}

### Get pipeline by ID

{% tabs %}
{% tab title="Python" %}

```python
pipe = ff.pipelines.get_by_id("pipe_123")
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
const pipe = await client.pipelines.getById("pipe_123");
```

{% endtab %}
{% endtabs %}

### Add data to a pipeline

{% tabs %}
{% tab title="Python" %}

````python
# Assume the messages are the output of an LLM
messages = [
    {
        "role": "user", 
        "content": "Write a hello world in rust."
    },
    {
        "role": "assistant",
        "content": '```rust\nfn main() {\nprintln!("Hello, World!");\n}\n```',
    },
]

# Get the pipeline object if you haven't already
pipe = ff.pipelines.get_by_id("pipe_123")

# Add the data to your pipeline
# Optionally add a user_id, group_id, or key-value metadata to filter by later
pipe.add(
    messages=messages,
    user_id='user_123',
    group_id='group_a',
    metadata={
      "lang": "rust"
      }
)
````

{% endtab %}

{% tab title="Node.js" %}

````typescript
// Assume the messages are the output of an LLM
const messages = [
    {
        role: "user", 
        content: "Write a hello world for me in rust"
    },
    {
        role: "assistant",
        content: '```rust\nfn main() {\nprintln!("Hello, World!");\n}\n```',
    },
]

// Get the pipeline object if you haven't already
const pipe = await client.pipelines.getById("pipe_123");

// Add the data to your pipeline
// Optionally add a user_id, group_id, or key-value metadata to filter by later
await pipe.add({
  messages: messages,
  userId: "user_123",
  groupId: "group_a",
  metadata: {
    lang: "rust",
  },
});
````

{% endtab %}
{% endtabs %}
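In practice, the message list usually has to be assembled from a prompt and the model's reply. The sketch below shows one way to do that in plain Python. `build_messages` is a hypothetical helper, not part of the Forefront SDK; it only produces the role/content structure used in the examples above.

```python
def build_messages(prompt, completion, system=None):
    """Build a chat-style message list from a prompt and the model's reply."""
    if not prompt or not completion:
        raise ValueError("prompt and completion must be non-empty")
    messages = []
    if system:
        # The system message is optional and always goes first
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    messages.append({"role": "assistant", "content": completion})
    return messages


messages = build_messages(
    prompt="Write a hello world in Rust.",
    completion='fn main() {\n    println!("Hello, World!");\n}',
)
print([m["role"] for m in messages])  # ['user', 'assistant']
```

The resulting list can then be passed as the `messages` argument when adding data to a pipeline.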

### Filter pipeline data

{% tabs %}
{% tab title="Python" %}

```python
# Get a pipeline of samples created by "user1"
user_1_examples = pipe.filter_by_user_id("user1")

# Get a pipeline of samples created by "group1"
group_1_examples = pipe.filter_by_group_id("group1")

# Get a pipeline of samples tagged with specific metadata 
rust_examples = pipe.filter_by_metadata({"lang": "rust"})
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
// Get a pipeline of samples created by "user1"
let userSamples = pipe.filterByUserId("user1")

// Get a pipeline of samples created by "group1"
let groupSamples = pipe.filterByGroupId("group1")

// Get a pipeline of samples tagged with specific metadata 
let rustSamples = pipe.filterByMetadata({lang: "rust"})
```

{% endtab %}
{% endtabs %}

### Inspect pipeline data

{% tabs %}
{% tab title="Python" %}

<pre class="language-python"><code class="lang-python"># Returns an array of dataset samples that
# meet the filter criteria from the previous step
data = rust_examples.get_samples()
</code></pre>

{% endtab %}

{% tab title="Node.js" %}

```typescript
/*
Returns an array of dataset samples that
meet the filter criteria from the previous step
*/
const rustSampleData = await rustSamples.getSamples();
```

{% endtab %}
{% endtabs %}
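Once samples are retrieved, a quick local check can catch malformed conversations before they reach a dataset. The sketch below assumes each sample is a dictionary with a `messages` list shaped like the one passed to `pipe.add`; the structure actually returned by `get_samples()` is not specified on this page and may differ. `is_trainable` is a hypothetical helper, not an SDK function.

```python
def is_trainable(sample):
    """Check that a sample is a non-empty conversation ending with an assistant reply."""
    messages = sample.get("messages") or []
    if not messages:
        return False
    # Reject conversations that contain a blank message
    if any(not str(m.get("content", "")).strip() for m in messages):
        return False
    return messages[-1].get("role") == "assistant"


# Illustrative data standing in for retrieved samples (assumed shape)
data = [
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello!"}]},
    {"messages": [{"role": "user", "content": "Hi"}]},
    {"messages": []},
]

trainable = [s for s in data if is_trainable(s)]
print(f"{len(trainable)} of {len(data)} samples look trainable")  # 1 of 3 samples look trainable
```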

### Create dataset from pipeline

To fine-tune a model from a pipeline, you will first need to convert it to a dataset.

{% tabs %}
{% tab title="Python" %}

```python
'''
Create a dataset called "my-rust-dataset" using
the filtered pipeline object from the previous step
'''
rust_dataset = rust_examples.create_dataset_from_pipeline("my-rust-dataset")
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
/*
Create a dataset called "my-rust-dataset" using
the filtered pipeline object from the previous step
*/
const myRustDataset = await rustSamples.createDatasetFromPipeline("my-rust-dataset");
```

{% endtab %}
{% endtabs %}

### Create a fine-tuned model and run inference

For completeness, here is an example of creating a fine-tuning job from a dataset and then running inference on the model once training is complete.

{% tabs %}
{% tab title="Python" %}

```python
# Create fine-tuned model
my_rust_llm = ff.fine_tunes.create(
    name="my-rust-llm",
    base_model="mistralai/mistral-7b",
    training_dataset=rust_dataset.dataset_string,
    epochs=1,
    public=False,
)

# Get the model string, e.g. "team-name/my-rust-llm"
model_string = my_rust_llm.model_string

# Run inference on the model
completion = ff.chat.completions.create(
    messages=[
        {
            "role":"system", 
            "content":"You are a helpful coding assistant"
        },
        {
            "role": "user", 
            "content": "Write the Fibonacci sequence"
        },
    ],
    model=model_string,
)
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
// Create fine-tuned model
const myRustLlm = await client.fineTunes.create({
  name: "my-rust-llm",
  baseModel: "mistralai/mistral-7b",
  trainingDataset: myRustDataset.datasetString,
  epochs: 1,
  // Kept private to match the Python example
  isPublic: false,
});
 
// Get the model string, e.g. "team-name/my-rust-llm"
const modelString = myRustLlm.modelString;

// Run inference on the model
const completion = await client.chat.completions.create({
  model: modelString,
  messages: [
    {
      role: "system",
      content: "You are a helpful coding assistant",
    },
    {
      role: "user",
      content: "Write the Fibonacci sequence",
    },
  ],
});
```

{% endtab %}
{% endtabs %}

