# Pipelines

Pipelines are a tool for storing LLM outputs for fine-tuning.

Fine-tuning a smaller model on the outputs of a larger model is a common strategy to optimize cost and performance while ensuring consistent and quality responses.

Smaller, open-source models also have less restrictive hardware requirements, which can enable you to self-host, giving you ownership of your models and keeping data within your network.

## What is a pipeline?

A pipeline is a collection of LLM outputs that you can easily create, filter, and fine-tune on later.

There are a few steps in the pipeline lifecycle:

1. Create the pipeline
2. Add LLM outputs to the pipeline
3. Filter the pipeline to create a dataset
4. Fine-tune a model on the dataset
5. Collect more LLM outputs, add to the pipeline, and repeat

Samples added to a pipeline can be tagged with a `user_id`, `group_id`, or custom metadata. You can filter pipelines to view segments of your data and create training datasets from those segments.
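To make the tagging model concrete, here is a minimal in-memory sketch in plain Python. This is not the Forefront SDK, just an illustration of how samples carrying `user_id`, `group_id`, and `metadata` tags can be filtered into segments; the `filter_by_metadata` helper is hypothetical.

```python
# Illustration only -- NOT the Forefront SDK. Each sample carries its
# messages plus optional user_id, group_id, and metadata tags.
samples = [
    {
        "messages": [{"role": "user", "content": "Write a hello world in rust."}],
        "user_id": "user_123",
        "group_id": "group_a",
        "metadata": {"lang": "rust"},
    },
    {
        "messages": [{"role": "user", "content": "Write a hello world in go."}],
        "user_id": "user_456",
        "group_id": "group_a",
        "metadata": {"lang": "go"},
    },
]

def filter_by_metadata(samples, wanted):
    """Keep samples whose metadata contains every requested key/value pair."""
    return [
        s for s in samples
        if all(s.get("metadata", {}).get(k) == v for k, v in wanted.items())
    ]

# A segment of the data: only the Rust examples
rust_segment = filter_by_metadata(samples, {"lang": "rust"})
```

A segment produced this way is what you would later turn into a training dataset.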

## Getting started

Currently, pipelines are only supported through the Forefront Python and TypeScript SDKs. Below is a walkthrough of how to get started:

### Install the package

The TypeScript SDK can be used in Node.js and serverless environments (including Cloudflare Workers).

{% tabs %}
{% tab title="Python" %}

```bash
pip install forefront
```

{% endtab %}

{% tab title="Node.js" %}

```bash
npm i forefront
```

{% endtab %}
{% endtabs %}

### Initialize the Forefront client

{% tabs %}
{% tab title="Python" %}

```python
from forefront import ForefrontClient

client = ForefrontClient(api_key="<YOUR_API_KEY>")
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
import Forefront from "forefront"

const client = new Forefront("<YOUR_API_KEY>");
```

{% endtab %}
{% endtabs %}

### Create a pipeline

{% tabs %}
{% tab title="Python" %}

```python
pipeline = client.pipelines.create("my-first-pipeline")
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
const pipeline = await client.pipelines.create("my-first-pipeline");
```

{% endtab %}
{% endtabs %}

### Get pipelines

{% tabs %}
{% tab title="Python" %}

```python
pipelines = client.pipelines.list()

print(pipelines[0].id)
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
const pipelines = await client.pipelines.list();

console.log(pipelines[0].id)
```

{% endtab %}
{% endtabs %}

### Get pipeline by ID

{% tabs %}
{% tab title="Python" %}

```python
pipe = client.pipelines.get_by_id("pipe_123")
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
const pipe = await client.pipelines.getById("pipe_123");
```

{% endtab %}
{% endtabs %}

### Add data to a pipeline

{% tabs %}
{% tab title="Python" %}

````python
# Assume the messages are the output of an LLM
messages = [
    {
        "role": "user", 
        "content": "Write a hello world in rust."
    },
    {
        "role": "assistant",
        "content": '```rust\nfn main() {\nprintln!("Hello, World!");\n}\n```',
    },
]

# Get the pipeline object if you haven't already
pipe = client.pipelines.get_by_id("pipe_123")

# Add the data to your pipeline
# Optionally add a user_id, group_id, or key-value metadata to filter by later
pipe.add(
    messages=messages,
    user_id='user_123',
    group_id='group_a',
    metadata={"lang": "rust"}
)
````

{% endtab %}

{% tab title="Node.js" %}

````typescript
// Assume the messages are the output of an LLM
const messages = [
    {
        role: "user", 
        content: "Write a hello world for me in rust"
    },
    {
        role: "assistant",
        content: '```rust\nfn main() {\nprintln!("Hello, World!");\n}\n```',
    },
]

// Get the pipeline object if you haven't already
const pipe = await client.pipelines.getById("pipe_123");

// Add the data to your pipeline
// Optionally add a user_id, group_id, or key-value metadata to filter by later
await pipe.add({
  messages: messages,
  userId: "user_123",
  groupId: "group_a",
  metadata: {
    lang: "rust",
  },
});
````

{% endtab %}
{% endtabs %}
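Chat samples in the role/content format above are commonly serialized as JSONL (one JSON object per line) when assembling fine-tuning data. The sketch below shows that round trip in plain Python; it is a general convention, not necessarily the exact file format Forefront uses internally.

```python
import json

# A chat sample in the same role/content shape used in the examples above
sample = {
    "messages": [
        {"role": "user", "content": "Write a hello world in rust."},
        {"role": "assistant", "content": 'fn main() { println!("Hello, World!"); }'},
    ]
}

# JSONL: one JSON object per line; a dataset file is just many such lines
with open("dataset.jsonl", "w") as f:
    f.write(json.dumps(sample) + "\n")

# Reading the file back yields the original structure, one sample per line
with open("dataset.jsonl") as f:
    restored = [json.loads(line) for line in f]
```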

### Filter pipeline data

{% tabs %}
{% tab title="Python" %}

```python
# Get a pipeline of samples created by "user1"
user_1_examples = pipe.filter_by_user_id("user1")

# Get a pipeline of samples created by "group1"
group_1_examples = pipe.filter_by_group_id("group1")

# Get a pipeline of samples tagged with specific metadata 
rust_examples = pipe.filter_by_metadata({"lang": "rust"})
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
// Get a pipeline of samples created by "user1"
const userSamples = pipe.filterByUserId("user1")

// Get a pipeline of samples created by "group1"
const groupSamples = pipe.filterByGroupId("group1")

// Get a pipeline of samples tagged with specific metadata
const rustSamples = pipe.filterByMetadata({ lang: "rust" })
```

{% endtab %}
{% endtabs %}

### Inspect pipeline data

{% tabs %}
{% tab title="Python" %}

```python
'''
Returns an array of dataset samples that
meet the filter criteria from the previous step
'''
data = rust_examples.get_samples()
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
/*
Returns an array of dataset samples that
meet the filter criteria from the previous step
*/
const rustSampleData = await rustSamples.getSamples()
```

{% endtab %}
{% endtabs %}

### Create dataset from pipeline

To fine-tune a model from a pipeline, you will first need to convert it to a dataset.

{% tabs %}
{% tab title="Python" %}

```python
'''
Create a dataset called "my-rust-dataset" using
the filtered pipeline object from the previous step
'''
rust_dataset = rust_examples.create_dataset_from_pipeline("my-rust-dataset")
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
/*
Create a dataset called "my-rust-dataset" using
the filtered pipeline object from the previous step
*/
const myRustDataset = await rustSamples.createDatasetFromPipeline("my-rust-dataset");
```

{% endtab %}
{% endtabs %}

### Create a fine-tuned model and inference it

For completeness, here is an example of creating a fine-tuning job from a dataset and then running inference on the model once training completes.

{% tabs %}
{% tab title="Python" %}

```python
# Create fine-tuned model
my_rust_llm = client.fine_tunes.create(
    name="my-rust-llm",
    base_model="mistralai/mistral-7b",
    training_dataset=rust_dataset.dataset_string,
    epochs=1,
    public=False
)

# Get the model string > "team-name/my-rust-llm"
model_string = my_rust_llm.model_string

# Inference the model
completion = client.chat.completions.create(
    messages=[
        {
            "role":"system", 
            "content":"You are a helpful coding assistant"
        },
        {
            "role": "user", 
            "content": "Write the Fibonacci sequence"
        },
    ],
    model=model_string,
)
```

{% endtab %}

{% tab title="Node.js" %}

```typescript
// Create fine-tuned model
const myRustLlm = await client.fineTunes.create({
  name: "my-rust-llm",
  baseModel: "mistralai/mistral-7b",
  trainingDataset: myRustDataset.datasetString,
  epochs: 1,
  isPublic: false,
});
 
// Get the model string > "team-name/my-rust-llm"
const modelString = myRustLlm.modelString

// Inference the model
const completion = await client.chat.completions.create({
    model: modelString,
    messages: [
        {
             role:"system", 
             content:"You are a helpful coding assistant"
        },
        {
              role: "user",
              content: "Write the Fibonacci sequence"
        },
    ],
});
```

{% endtab %}
{% endtabs %}
