The product we are trying to enable
00

A user should install an agent from a marketplace and run it locally.

That agent may bundle a Pydantic AI script, local artifacts, tools, and one or more neural networks: LLM, STT, TTS, embeddings, vision, or something new.

Desired user story

agent-first
Marketplace

Install Agent A

App fetches package, metadata, scripts, and artifacts.

Launcher

Launch agent

User thinks in terms of tasks: code review, podcast production, voice assistant.

Runtime

Apollo satisfies resources

Loads required models locally or maps to remote provider routes.

Interface

Chat / AG-UI

The most likely interaction surface is chat, but it is not the whole runtime.

Transparency

Resource view

User can always see what is loaded, why, and what can be unloaded.

Users launch agents. Agents declare resources. Apollo manages those resources.
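That contract can be sketched with plain dataclasses. The field names, `kind` values, and the `voice-podcast` example are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field


@dataclass
class NeuralRequirement:
    """One named neural resource the agent asks Apollo to satisfy."""
    name: str   # e.g. "primary", "stt", "tts"
    kind: str   # e.g. "llm", "stt", "tts", "embedding", "vision"


@dataclass
class AgentManifest:
    """What an installed marketplace package declares to the runtime."""
    agent_id: str
    entrypoint: str  # the Pydantic AI script the runtime imports
    requirements: list[NeuralRequirement] = field(default_factory=list)


manifest = AgentManifest(
    agent_id="voice-podcast",
    entrypoint="agent.py",
    requirements=[
        NeuralRequirement(name="primary", kind="llm"),
        NeuralRequirement(name="stt", kind="stt"),
        NeuralRequirement(name="tts", kind="tts"),
    ],
)
```

The agent never names a port, provider, or key; it names requirements, and Apollo decides how each one is satisfied.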

Where we are now
01

Today, Ravnar sits behind Apollo’s /api and connects chat to agents.

This is a useful bridge: React keeps speaking a stable Apollo-owned API, while Ravnar provides thread storage, run endpoints, and AG-UI streaming.

Current Ravnar-centered path

works today
UI

React chat

Calls Apollo /api for config, threads, messages, and runs.

Go backend

Reverse proxy

Apollo starts Ravnar as a child process and forwards /api traffic to it.

Ravnar

Threads + run

Ravnar stores chat state, exposes the run endpoint, and streams AG-UI events back to the UI.

Generated config

Agents from models

Apollo generates Ravnar agent entries from known/running model endpoints and can hot-reload that config.

Agent wrapper

Pydantic AI in Ravnar

Each generated agent is usually a generic Pydantic AI wrapper inside Ravnar’s Python environment.

Model endpoint

Direct provider URL

The wrapper calls an OpenAI-compatible base URL directly, such as local llama.cpp or a hub/provider endpoint.

What works well

The UI can chat through a stable Apollo /api, while Ravnar handles thread storage and AG-UI streaming.

How models connect

Models are translated into Ravnar agent config: model ID/name, base URL, provider, and sometimes API key are passed into the agent wrapper.

Why this is limiting

Ravnar sees runtime/provider details, and a model becoming available often means regenerating Ravnar agents rather than updating an Apollo-owned route.

Work already underway

model-manager MR
Models page

Download curated GGUF

Embedded catalog points at Hugging Face files.

Pixi

Provision llama.cpp

Stages a Pixi project with llama-server.

Process

One server per model

Starts llama-server --model file.gguf --port N.
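The launch step can be sketched as follows; `--model` and `--port` are real llama-server flags, while the path, port, and helper name are illustrative:

```python
from pathlib import Path


def llama_server_cmd(gguf: Path, port: int) -> list[str]:
    """Build the one-server-per-model command line.

    The real model-manager would spawn this inside the staged Pixi env,
    e.g. subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT).
    """
    return ["llama-server", "--model", str(gguf), "--port", str(port)]


cmd = llama_server_cmd(Path("models/qwen2.5-7b-q4_k_m.gguf"), 8081)
```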

Ravnar reload

Model becomes agent

Running model is registered as a generic chat agent.

Direct call

Ravnar → model port

No Apollo model route sits in between yet.

Proposed boundary for marketplace agents

agent-first
Agent package

App-managed artifacts

Manifest declares scripts, tools, and neural requirements.

Pixi

Isolated capability env

Runs agent with its own dependencies.

apollo-agent

Shared AG-UI host

Author writes agent logic, not an HTTP server.

Agent Gateway

Stable Ravnar target

Ravnar calls Apollo, not random Pixi ports.

Model Gateway

Stable model routes

Agents call route IDs, not raw ports or provider keys.

Runtime manager

Load/unload + resources

Tracks leases, sharing, RAM/VRAM, logs, and policies.
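A minimal lease ledger, assuming refcount-style sharing (load on first lease, unload on last release); budgets, logs, and policy would layer on top:

```python
from collections import defaultdict


class RuntimeManager:
    """Track which agents hold leases on which model routes.

    A backend stays loaded while any agent holds a lease; the print
    calls stand in for real load/unload of the backend process.
    """

    def __init__(self) -> None:
        self._leases: dict[str, set[str]] = defaultdict(set)

    def acquire(self, route_id: str, agent_id: str) -> None:
        if not self._leases[route_id]:
            print(f"load backend for {route_id}")
        self._leases[route_id].add(agent_id)

    def release(self, route_id: str, agent_id: str) -> None:
        self._leases[route_id].discard(agent_id)
        if not self._leases[route_id]:
            print(f"unload backend for {route_id}")

    def holders(self, route_id: str) -> set[str]:
        return set(self._leases[route_id])


rm = RuntimeManager()
rm.acquire("llm/primary", "code-reviewer")
rm.acquire("llm/primary", "voice-podcast")  # shared: no second load
rm.release("llm/primary", "code-reviewer")  # still held, stays loaded
```

This is also what makes the transparency story honest: the resource view can show exactly which agents are keeping each backend alive.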

Implicit assumptions
02

Our current chat bridge assumes relatively static agents.

These assumptions were fine for generic chat, but each gets stressed by marketplace agents that are installed, launched, updated, and stopped dynamically.

Assumption: agents are configured ahead of time

The chat backend starts from generated agent config. Hot reload helps, but dynamic install/launch/upgrade/remove still feels like external orchestration.

Assumption: agent code can live near chat

Built-in agents can share a known runtime environment. Marketplace agents should not, because dependency conflicts and isolation matter.

Assumption: model endpoints are stable URLs

Local desktop models may be unloaded, loading, failed, restarted on another port, or shared by another agent.

Assumption: model and chat agent are often 1:1

Marketplace agents may use one LLM plus STT, TTS, embeddings, tools, and custom artifacts.

The current bridge is strongest when the world looks like “chat thread → configured agent → model URL”. Marketplace agents look more like “package → isolated runtime → many resources → chat surface”.

Emerging design pressures
03

Marketplace agents make runtime management part of the product.

A good marketplace experience needs to be easy for authors and trustworthy for users.

Authors ship logic

They should write Pydantic AI or workflow code, not an AG-UI server.

Agents declare resources

Named requirements like primary, stt, tts.

Apollo satisfies them

Download, route, load, unload, share, and monitor local/remote backends.

Users stay in control

Opaque loading is acceptable only if resource usage and unload controls are always visible.

The new control problem

not just chat
Nebi

Install package

Versioned scripts/artifacts.

Pixi

Run env

Dependency-isolated execution.

Apollo

Control plane

Agents, routes, leases, runtime policy.

Ravnar

Chat adapter

Handles threads and AG-UI stream.

Local machine

Resources

RAM, VRAM, CPU/GPU, disk, ports, logs.

Boundary and options
04

Keep the chat bridge thin while Apollo owns marketplace runtime.

The key choice is whether chat stays separate from the control plane for agents, models, credentials, and local resources.

Keep in the chat bridge
  • Thread/message persistence
  • Chat run endpoint
  • AG-UI stream to frontend
  • Dispatch to Agent Gateway
Move into Apollo
  • Marketplace package execution
  • Pixi environments
  • Model/runtime load and unload
  • Provider credentials and resource graph
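Under this split, the chat bridge's dispatch reduces to building one forwarded request; the gateway URL and payload shape below are assumptions, not the current API:

```python
def dispatch_run(agent_id: str, thread_id: str,
                 messages: list[dict]) -> tuple[str, dict]:
    """Build the request Ravnar forwards to the Agent Gateway.

    Ravnar keeps threads and the AG-UI stream; agent execution happens
    behind this single stable URL (path and payload are illustrative).
    """
    url = f"http://127.0.0.1:8080/api/agents/{agent_id}/run"
    payload = {"thread_id": thread_id, "messages": messages}
    # Real code would POST and relay the AG-UI event stream to the UI.
    return url, payload


url, payload = dispatch_run(
    "code-reviewer", "t-1", [{"role": "user", "content": "review this"}]
)
```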

Thin-adapter shape

separate control plane
UI

React

Stable Apollo /api.

Chat

Ravnar

Threads and AG-UI stream.

Apollo

Agent Gateway

Stable endpoint for capability agents.

Pixi

Agent runtime

Runs script in isolated env.

Apollo

Model Gateway

Stable resource routes.

Backends

Local/remote models

Loaded and shared by runtime manager.

Concrete example
05

Shipping Agent A and Agent B should not require two custom servers.

Both agents can be Pydantic AI-based, but they may have different dependencies and different neural resources. Apollo should host the transport; the author should provide the logic.

Agent A: Code Reviewer
  • Pydantic AI script
  • Repo/file tools
  • Primary LLM
  • Optional embedding model
Agent B: Voice Podcast Agent
  • Pydantic AI workflow
  • LLM
  • Speech-to-text
  • Text-to-speech
  • Audio tools / ffmpeg
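Assuming both agents name a shared primary LLM, the resource math is a set union, not a sum; the resource names below are illustrative:

```python
AGENT_REQUIREMENTS = {
    # illustrative named resources per agent
    "code-reviewer": {"llm/primary", "embed/default"},
    "voice-podcast": {"llm/primary", "stt/default", "tts/default"},
}


def resources_needed(running: list[str]) -> set[str]:
    """Union of resources across running agents; shared entries load once."""
    needed: set[str] = set()
    for agent in running:
        needed |= AGENT_REQUIREMENTS[agent]
    return needed


both = resources_needed(["code-reviewer", "voice-podcast"])
# llm/primary appears once: four backends, not five.
```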

Runtime consequence

Live counters track RAM (GB), VRAM (GB), and running process count as each agent's resources load.
Proposed design
06

The target is a layered control plane, not one giant runtime.

Each layer has a narrow job. The boundaries are what let Ravnar handle chat while Apollo manages marketplace agents, runtimes, and resources.

Nebi

Installs marketplace packages, artifacts, versions, signatures, metadata.

Pixi

Runs each capability/agent with isolated dependencies.

apollo-agent

Shared runtime that exposes health + AG-UI and imports the Pydantic AI script.
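A sketch of the import side of that host, assuming the convention that each script exposes a module-level `agent`; the health and AG-UI endpoints would wrap whatever this returns:

```python
import importlib.util
import tempfile
from pathlib import Path


def load_agent(script: Path):
    """Import the author's Pydantic AI script and return its `agent` object.

    The module-level `agent` convention is an assumption; the shared host
    owns the HTTP surface, so authors never write a server.
    """
    spec = importlib.util.spec_from_file_location("marketplace_agent", script)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.agent


# Stand-in for an installed package's entrypoint script.
with tempfile.TemporaryDirectory() as d:
    script = Path(d) / "agent.py"
    script.write_text("agent = 'stub-agent'\n")
    loaded = load_agent(script)
```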

Agent Gateway

Stable target for Ravnar. Resolves agent ID to current runtime endpoint.

Model Gateway

Stable neural route layer. Owns credentials, route IDs, rewriting, access.
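A sketch of that route layer with an illustrative route table: agents address route IDs, and only the gateway knows base URLs and credentials:

```python
# route ID -> current backend; contents are illustrative
ROUTES = {
    "llm/primary": {"base_url": "http://127.0.0.1:8081/v1", "api_key": None},
    "llm/cloud": {"base_url": "https://provider.example/v1", "api_key": "key"},
}


def rewrite(route_id: str, path: str) -> tuple[str, dict[str, str]]:
    """Map a stable route ID to the backend URL plus auth headers.

    Credentials and ports live in this table, not in agent or
    Ravnar config; swapping a backend only updates the table.
    """
    backend = ROUTES[route_id]
    headers: dict[str, str] = {}
    if backend["api_key"]:
        headers["Authorization"] = f"Bearer {backend['api_key']}"
    return backend["base_url"] + path, headers


url, headers = rewrite("llm/primary", "/chat/completions")
```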

Runtime manager

Loads/unloads local models, tracks leases, logs, resource budgets.

Ravnar

Chat adapter for threads/messages/AG-UI, behind Apollo /api.

Resource UI

Shows what is loaded, why, how much it costs, and what can be stopped.

Future work
07

We can evolve without a big-bang rewrite.

The safest path keeps existing chat working while adding clear Apollo-owned boundaries for agent execution, model routing, and runtime resources.

Done

Ravnar hot reload

The main branch can already regenerate Ravnar agents and update the frontend without restarting Desktop.

Next

Model Gateway

Put an Apollo-owned route layer between Ravnar/agents and model providers. Stop passing raw provider URLs/keys where possible.

Then

Runtime manager behind routes

Adapt model-manager pieces so capability model requirements create routes and runtime leases, not generic chat agents.

Then

Agent Gateway + apollo-agent

Run marketplace Pydantic AI scripts in their Pixi environments while Ravnar talks to one stable Agent Gateway endpoint.