Running AI Models

Run AI models in Edge Functions using the built-in Supabase AI API.


Edge Functions have a built-in API for running AI models. You can use this API to generate embeddings, build conversational workflows, and perform other AI-related tasks in your Edge Functions.

This allows you to:

  • Generate text embeddings without external dependencies
  • Run Large Language Models via Ollama or Llamafile
  • Build conversational AI workflows

Setup

There are no external dependencies or packages to install to enable the API.

Create a new inference session (in the examples below, the session is instantiated once at module scope so it can be reused across requests):

const model = new Supabase.ai.Session('model-name')

Running a model inference

Once the session is instantiated, you can call it with inputs to perform inferences:

// For embeddings (gte-small model)
const embeddings = await model.run('Hello world', {
  mean_pool: true,
  normalize: true,
})

// For text generation (non-streaming)
const response = await model.run('Write a haiku about coding', {
  stream: false,
  timeout: 30,
})

// For streaming responses
const stream = await model.run('Tell me a story', {
  stream: true,
  mode: 'ollama',
})
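
When stream is true, the returned value can be iterated with for await. A minimal sketch of consuming it, assuming the chunks expose the same response field as the Ollama output handled later in this guide:

// Consume the streamed chunks as they arrive (chunk shape assumed to match
// the Ollama-style { response } objects used later in this guide)
for await (const chunk of stream) {
  console.log(chunk.response ?? '')
}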

Generate text embeddings

Generate text embeddings using the built-in gte-small model:

const model = new Supabase.ai.Session('gte-small')

Deno.serve(async (req: Request) => {
  const params = new URL(req.url).searchParams
  const input = params.get('input')
  const output = await model.run(input, { mean_pool: true, normalize: true })
  return new Response(JSON.stringify(output), {
    headers: {
      'Content-Type': 'application/json',
      Connection: 'keep-alive',
    },
  })
})
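
Once the function is being served (for example with supabase functions serve), it can be called over HTTP. A minimal sketch of an invocation, assuming the function was created with the hypothetical name embed and is running on the CLI's default local port 54321:

// Hypothetical client call: the function name (embed), local port, and anon key
// placeholder are assumptions, not part of the example above.
const anonKey = '<your-anon-key>'

const res = await fetch(
  'http://localhost:54321/functions/v1/embed?' + new URLSearchParams({ input: 'hello world' }),
  { headers: { Authorization: `Bearer ${anonKey}` } }
)

// gte-small produces a 384-dimensional embedding vector
const embedding: number[] = await res.json()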

Using Large Language Models (LLMs)

Inference with larger models is supported through Ollama and Mozilla Llamafile. In the first iteration, you can use it with a self-managed Ollama or Llamafile server.


Running locally

1. Install Ollama

Install Ollama and pull the Mistral model:

ollama pull mistral
2. Run the Ollama server

ollama serve
3. Set the function secret

Set a function secret called AI_INFERENCE_API_HOST to point to the Ollama server (host.docker.internal lets the locally running Edge Runtime container reach the Ollama server on your host machine; 11434 is Ollama's default port):

echo "AI_INFERENCE_API_HOST=http://host.docker.internal:11434" >> supabase/functions/.env
4. Create a new function

supabase functions new ollama-test
import 'jsr:@supabase/functions-js/edge-runtime.d.ts'

const session = new Supabase.ai.Session('mistral')

Deno.serve(async (req: Request) => {
  const params = new URL(req.url).searchParams
  const prompt = params.get('prompt') ?? ''

  // Get the output as a stream
  const output = await session.run(prompt, { stream: true })

  const headers = new Headers({
    'Content-Type': 'text/event-stream',
    Connection: 'keep-alive',
  })

  // Create a stream
  const stream = new ReadableStream({
    async start(controller) {
      const encoder = new TextEncoder()

      try {
        for await (const chunk of output) {
          controller.enqueue(encoder.encode(chunk.response ?? ''))
        }
      } catch (err) {
        console.error('Stream error:', err)
      } finally {
        controller.close()
      }
    },
  })

  // Return the stream to the user
  return new Response(stream, {
    headers,
  })
})
5. Serve the function

supabase functions serve --env-file supabase/functions/.env
6. Execute the function

curl --get "http://localhost:54321/functions/v1/ollama-test" \
  --data-urlencode "prompt=write a short rap song about Supabase, the Postgres Developer platform, as sung by Nicki Minaj" \
  -H "Authorization: $ANON_KEY"
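
Because the function returns a text/event-stream response, a client can read the body incrementally instead of waiting for the full completion. A minimal sketch of a streaming client, assuming the same local URL as the curl call above and a placeholder anon key:

const anonKey = '<your-anon-key>' // placeholder: your project's anon key

const res = await fetch(
  'http://localhost:54321/functions/v1/ollama-test?' +
    new URLSearchParams({ prompt: 'write a short rap song about Supabase' }),
  { headers: { Authorization: anonKey } }
)

// Read and print each chunk as the model generates it
const reader = res.body!.getReader()
const decoder = new TextDecoder()
while (true) {
  const { done, value } = await reader.read()
  if (done) break
  console.log(decoder.decode(value, { stream: true }))
}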

Deploying to production

Once the function is working locally, it's time to deploy it to production.

1. Deploy an Ollama or Llamafile server

Deploy an Ollama or Llamafile server and set a function secret called AI_INFERENCE_API_HOST to point to the deployed server:

supabase secrets set AI_INFERENCE_API_HOST=https://path-to-your-llm-server/
2. Deploy the function

supabase functions deploy
3. Execute the function

curl --get "https://project-ref.supabase.co/functions/v1/ollama-test" \
  --data-urlencode "prompt=write a short rap song about Supabase, the Postgres Developer platform, as sung by Nicki Minaj" \
  -H "Authorization: $ANON_KEY"