AI Broker

A lightweight LLM request broker with an OpenAI-compatible API that intelligently routes requests to multiple LLM providers with cost-aware strategies.

Features

  • OpenAI-Compatible API - Drop-in replacement for OpenAI clients
  • Multi-Provider Support - OpenAI, OpenRouter, Groq, SambaNova
  • Smart Routing - Automatic model selection based on cost or quality
  • Cost Tracking - Per-request cost calculation and tracking
  • Request Tracking - Detailed per-IP request tracking with timestamps and durations
  • Streaming Support - Real-time streaming responses via SSE
  • MCP Broker - Aggregate tools from multiple MCP (Model Context Protocol) servers
  • Rate Limiting - Per-IP rate limiting with configurable limits
  • Audio APIs - Text-to-speech and speech-to-text support (Groq, SambaNova, OpenAI)
  • Config-Based Audio Models - STT/TTS models defined in modelsconfig.yml with automatic fallback
  • Embeddings - Vector embedding generation
  • 37 Chat Models - Latest Claude 4.x, Gemini 3, GPT-5.2, o3-mini, Grok 4.1, Kimi K2.5
  • 2 Audio Models - Whisper STT (Groq/SambaNova/OpenAI), OpenAI TTS
  • Persistent Billing - SQLite-based request logging for billing and analytics
  • API Key Support - Optional API key authentication system

Project Structure (HERO Architecture)

aibroker/
├── crates/
│   ├── hero_aibroker_sdk/      # SDK library - shared types, protocols, RPC client
│   ├── llmbroker/              # Server library & binary (hero_aibroker_server)
│   ├── llmbroker_cli/          # CLI client (hero_aibroker binary)
│   ├── hero_aibroker_ui/       # Admin UI - Axum web dashboard
│   ├── hero_aibroker_rhai/     # Rhai scripting bindings
│   ├── mcp-common/             # Shared MCP utilities
│   ├── mcp-ping/               # MCP ping test server
│   ├── mcp-serpapi/            # SerpAPI search MCP server
│   ├── mcp-serper/             # Serper search MCP server
│   ├── mcp-exa/                # Exa search MCP server
│   ├── mcp-scraperapi/         # ScraperAPI MCP server
│   └── mcp-scrapfly/           # Scrapfly MCP server
├── modelsconfig.yml            # Model definitions and pricing
└── mcp_servers.json            # MCP server configuration

Dependency Graph

hero_aibroker_sdk (no internal dependencies)
    ↑         ↑          ↑           ↑
    |         |          |           |
 server     CLI         UI        rhai

Architecture Follows HERO Crate Standards:

  • SDK: pure library with types, protocols, and client
  • Server: binary exposing JSON-RPC/OpenAI API on port 8080
  • CLI: command-line client for interactive use
  • UI: admin dashboard (separate binary)
  • Rhai: scripting integration for automation

Quick Start

Prerequisites

  • Rust 1.70 or later
  • At least one LLM provider API key

Environment Variables

This project requires API keys to be set as environment variables. Source your env file before running:

source ~/.config/env.sh   # or wherever you keep your secrets

Required variables (at least one provider):

  • GROQ_API_KEY — Groq API key
  • OPENROUTER_API_KEY — OpenRouter API key

Optional variables:

  • SAMBANOVA_API_KEY — SambaNova API key
  • OPENAI_API_KEY — OpenAI API key

See .env.example for the full list of supported variables.

Run

source ~/.config/env.sh
make run

The server will start on http://127.0.0.1:3385 by default.

Configuration

All configuration is via environment variables (no .env files loaded by the application):

Variable           Default     Description
HOST               127.0.0.1   Server bind address
PORT               3385        Server port
ROUTING_STRATEGY   cheapest    cheapest or best
MCP_CONFIG_PATH    (none)      Path to MCP server config JSON
ADMIN_TOKEN        (none)      Simple admin auth token
HERO_SECRET        (none)      Hero Auth JWT secret

Both singular (GROQ_API_KEY) and plural (GROQ_API_KEYS) env var names are accepted. Use comma-separated values for multiple keys per provider.

Multiple API Keys

The broker supports multiple API keys per provider for load distribution, higher rate limits, and automatic failover. When multiple keys are configured, the broker creates separate provider instances (e.g., openai-0, openai-1) and distributes requests across them.
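As a sketch of how this multi-key fan-out can work (the function names and parsing details here are illustrative, not the broker's actual Rust internals):

```python
import os

def provider_keys(provider: str) -> list[str]:
    """Collect keys for a provider, accepting both the singular
    (GROQ_API_KEY) and plural (GROQ_API_KEYS) variable names,
    with comma-separated values for multiple keys."""
    raw = os.environ.get(f"{provider}_API_KEYS") or os.environ.get(f"{provider}_API_KEY", "")
    return [k.strip() for k in raw.split(",") if k.strip()]

def provider_instances(provider: str) -> dict[str, str]:
    """Fan each key out into a named provider instance (openai-0, openai-1, ...)."""
    return {f"{provider}-{i}": key
            for i, key in enumerate(provider_keys(provider.upper()))}
```

Requests can then be distributed round-robin across the resulting instances.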

API Reference

Chat Completions

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
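With stream set to true, the endpoint emits OpenAI-style SSE events: each event is a line beginning with "data: " followed by a JSON chunk, and the stream ends with the sentinel "data: [DONE]". A minimal Python sketch of consuming that framing (the sample payloads are illustrative):

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from OpenAI-style SSE lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(sample)))  # prints "Hello"
```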

Text-to-Speech

Generate speech from text using OpenAI TTS models:

curl http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, world!",
    "voice": "alloy"
  }' \
  --output speech.mp3

Available TTS Models:

  • tts-1 - Standard quality (OpenAI only)
  • tts-1-hd - High definition (OpenAI only)

Note: TTS requires OPENAI_API_KEY to be set in your environment.

Speech-to-Text

Transcribe audio using Whisper models with automatic provider fallback:

curl http://localhost:8080/v1/audio/transcriptions \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

Available STT Models:

  • whisper-1 - Standard Whisper model with multi-provider support
    • Priority 1: Groq (whisper-large-v3) - $0.111/hr
    • Priority 2: SambaNova (whisper-large-v3) - FREE
    • Priority 3: OpenAI (whisper-1) - $0.006/min
  • whisper-large-v3 - Direct access to Whisper Large v3
    • Priority 1: Groq - $0.111/hr
    • Priority 2: SambaNova - FREE

The system automatically tries providers in priority order (cheapest first) with fallback support.
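The priority lists above amount to an ordered fallback loop. A hypothetical Python sketch of that logic (the real broker implements this in Rust, and these names are illustrative):

```python
BACKENDS = [
    {"provider": "groq", "model_id": "whisper-large-v3", "priority": 1},
    {"provider": "sambanova", "model_id": "whisper-large-v3", "priority": 2},
    {"provider": "openai", "model_id": "whisper-1", "priority": 3},
]

def transcribe_with_fallback(audio, backends, call):
    """Try backends in ascending priority order; `call(backend, audio)`
    should raise on failure so the next provider is attempted."""
    errors = []
    for backend in sorted(backends, key=lambda b: b["priority"]):
        try:
            return call(backend, audio)
        except Exception as exc:
            errors.append((backend["provider"], str(exc)))
    raise RuntimeError(f"all providers failed: {errors}")
```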

Embeddings

curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Hello, world!"
  }'

List Models

curl http://localhost:8080/v1/models

Metrics

# Basic metrics
curl http://localhost:8080/metrics

# Detailed metrics with per-IP tracking
curl http://localhost:8080/metrics/detailed

The detailed metrics endpoint provides:

  • Total requests and errors per IP
  • First and last request timestamps
  • Currently active (in-flight) requests
  • Recent request history (last 10 per IP) including:
    • Request start and finish timestamps
    • Model used
    • Request duration
    • Success/error status

Billing & Usage

# View all IP usage and costs
curl http://localhost:8080/billing/usage

# View specific IP usage
curl http://localhost:8080/billing/usage/127.0.0.1

All requests are persisted to SQLite (requests.db) with:

  • IP address and model used
  • Token usage (input/output)
  • Costs in USD (calculated per-request)
  • Timestamps and duration
  • Success/error status
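The per-request cost follows directly from the per-million-token prices defined in modelsconfig.yml. A small worked sketch of the arithmetic (the function name is hypothetical):

```python
def request_cost_usd(input_tokens, output_tokens, input_cost, output_cost):
    """Cost in USD, given per-million-token prices as in modelsconfig.yml."""
    return (input_tokens * input_cost + output_tokens * output_cost) / 1_000_000

# e.g. gpt4o at $2.5 input / $10.0 output per million tokens:
cost = request_cost_usd(1200, 350, 2.5, 10.0)
print(f"${cost:.6f}")  # $0.006500
```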

Export billing data:

# Export to CSV
sqlite3 -header -csv requests.db "SELECT * FROM request_logs;" > billing.csv

# Query specific IP
sqlite3 requests.db "SELECT * FROM request_logs WHERE ip='X.X.X.X';"
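The same data can be aggregated programmatically. This sketch runs against an in-memory copy; the column names (ip, model, cost_usd) are assumptions about the request_logs schema rather than its documented layout:

```python
import sqlite3

# Point connect() at requests.db in practice; in-memory here for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request_logs (ip TEXT, model TEXT, cost_usd REAL)")
conn.executemany(
    "INSERT INTO request_logs VALUES (?, ?, ?)",
    [("127.0.0.1", "gpt4o", 0.0065), ("127.0.0.1", "gpt4o", 0.0031),
     ("10.0.0.2", "deepseek-chat", 0.0004)],
)
# Total requests and spend per IP.
rows = conn.execute(
    "SELECT ip, COUNT(*), ROUND(SUM(cost_usd), 4) "
    "FROM request_logs GROUP BY ip ORDER BY ip"
).fetchall()
for ip, n, total in rows:
    print(ip, n, total)
```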

MCP Tools

# List all available tools
curl http://localhost:8080/mcp/tools

# Call a specific tool
curl http://localhost:8080/mcp/tools/search \
  -H "Content-Type: application/json" \
  -d '{"query": "rust programming"}'

Client Examples

Python

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # Key is configured on the server
)

response = client.chat.completions.create(
    model="gpt4o",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)

Streaming

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)

stream = client.chat.completions.create(
    model="gpt4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Running Services

Start the Server

# Release build (production) - TCP mode (default)
make run

# Debug build with logging
make rundev

# Or manually on TCP
cargo run --release --bin hero_aibroker_server -- --port 8080

# Or with Unix socket (creates ~/hero/var/sockets/hero_aibroker_server.sock)
SOCKET_PATH=~/hero/var/sockets/hero_aibroker_server.sock \
  cargo run --release --bin hero_aibroker_server

Server Binding Modes:

  • TCP (default): http://127.0.0.1:8080 - use when you want HTTP access
  • Unix Socket: ~/hero/var/sockets/hero_aibroker_server.sock - use for local-only access via socket

Set the SOCKET_PATH environment variable to enable Unix socket mode.

Exposed Endpoints:

  • OpenAI-compatible API endpoints (/v1/*)
  • JSON-RPC admin interface (/rpc)
  • Health check (/health)
  • Metrics (/metrics)
  • OpenRPC specification (/openrpc.json)

Start the Admin UI

# Terminal 1: Start the server
cargo run --release --bin hero_aibroker_server

# Terminal 2: Start the UI (connects to server via HTTP)
BROKER_URL=http://localhost:8080 cargo run --release --bin hero_aibroker_ui

The admin UI will be available at http://127.0.0.1:3000 and provides:

  • Chat interface
  • Model management
  • MCP tool integration
  • Request/usage metrics
  • Real-time logs

CLI Usage

# Interactive chat
cargo run --bin hero_aibroker -- chat --model gpt4o

# With global model option
cargo run --bin hero_aibroker -- --model deepseek-chat chat

# With custom retry attempts
cargo run --bin hero_aibroker -- --max-retries 5 chat

# List available models
cargo run --bin hero_aibroker -- models

# List MCP tools
cargo run --bin hero_aibroker -- tools

# Check server health
cargo run --bin hero_aibroker -- health

# Or use the make target
make cli

CLI Options

Global Options:

  • -u, --url <URL> - LLM Broker server URL (default: http://localhost:8080)
  • -m, --model <MODEL> - Model to use (can be overridden per command)
  • --max-retries <MAX_RETRIES> - Maximum number of retries on failure (default: 3)

Chat Command:

  • -m, --model <MODEL> - Model to use (overrides global --model)
  • -s, --stream - Enable streaming (default: true)

The CLI automatically retries failed requests with exponential backoff, making it resilient to temporary network issues or server errors.
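Exponential backoff of this kind can be sketched as follows (the parameter names and delays are illustrative, not the CLI's exact values):

```python
import time

def with_retries(call, max_retries=3, base_delay=0.5):
    """Retry `call` with exponential backoff: 0.5s, 1s, 2s, ...
    Re-raises the last error once max_retries is exhausted."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt)
```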

Model Configuration

Models are configured in modelsconfig.yml:

models:
  gpt4o:
    display_name: "GPT-4o"
    tier: premium
    capabilities:
      - tool_calling
      - vision
    context_window: 128000
    backends:
      - provider: openai
        model_id: gpt-4o
        priority: 1
        input_cost: 2.5   # per million tokens
        output_cost: 10.0

Auto Model Selection

Use special model names for automatic selection:

Model Name     Description
auto           Use the configured routing strategy
autocheapest   Select the cheapest available model
autobest       Select the best premium model
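Conceptually, the auto names resolve against the model catalog before routing. A hypothetical sketch using the summed per-million-token prices as the cost metric (the broker's actual scoring may differ):

```python
# Tiny stand-in for the modelsconfig.yml catalog.
MODELS = {
    "gpt4o": {"tier": "premium",
              "backends": [{"input_cost": 2.5, "output_cost": 10.0}]},
    "deepseek-chat": {"tier": "standard",
                      "backends": [{"input_cost": 0.14, "output_cost": 0.28}]},
}

def resolve(name: str) -> str:
    """Map auto* names to a concrete model; pass real names through."""
    def price(m):
        return min(b["input_cost"] + b["output_cost"] for b in MODELS[m]["backends"])
    if name == "autocheapest":
        return min(MODELS, key=price)
    if name == "autobest":
        premium = [m for m in MODELS if MODELS[m]["tier"] == "premium"]
        return max(premium, key=price)
    return name

print(resolve("autocheapest"))  # deepseek-chat
```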

MCP Integration

The broker can aggregate tools from multiple MCP (Model Context Protocol) servers. Configure servers in mcp_servers.json:

{
  "mcpServers": [
    {
      "name": "search",
      "command": "cargo",
      "args": ["run", "--bin", "mcp-serper"],
      "transport": "stdio"
    },
    {
      "name": "scraper",
      "url": "http://localhost:3001/sse",
      "transport": "sse"
    }
  ]
}
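A sketch of how such a config can be validated on load: stdio entries need a command, sse entries need a url. The validation rules here are assumptions for illustration, not the broker's actual checks:

```python
import json

def validate_mcp_config(text: str) -> list[str]:
    """Parse mcp_servers.json text and return the configured server names."""
    cfg = json.loads(text)
    names = []
    for server in cfg["mcpServers"]:
        transport = server.get("transport")
        if transport == "stdio":
            assert "command" in server, f"{server['name']}: stdio needs a command"
        elif transport == "sse":
            assert "url" in server, f"{server['name']}: sse needs a url"
        else:
            raise ValueError(f"{server['name']}: unknown transport {transport!r}")
        names.append(server["name"])
    return names

example = '''{"mcpServers": [
  {"name": "search", "command": "cargo", "args": ["run", "--bin", "mcp-serper"], "transport": "stdio"},
  {"name": "scraper", "url": "http://localhost:3001/sse", "transport": "sse"}
]}'''
print(validate_mcp_config(example))  # ['search', 'scraper']
```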

MCP Endpoints

Endpoint                Description
GET /mcp/tools          List all aggregated tools
POST /mcp/tools/:name   Call a specific tool
GET /mcp/sse            SSE endpoint for MCP clients

Included MCP Servers

  • mcp-serper - Web search via Serper API
  • mcp-serpapi - Web search via SerpAPI
  • mcp-exa - Semantic search via Exa
  • mcp-scraperapi - Web scraping via ScraperAPI
  • mcp-scrapfly - Web scraping via Scrapfly
  • mcp-ping - Simple ping server for testing

Architecture

┌─────────────────────────────────────────────────────┐
│                    API Layer                         │
│  (OpenAI-compatible endpoints: chat, tts, stt, etc) │
└─────────────────────────────────────────────────────┘
                          │
┌─────────────────────────────────────────────────────┐
│                  Service Layer                       │
│  (Routing logic, model selection, cost calculation) │
└─────────────────────────────────────────────────────┘
                          │
┌─────────────────────────────────────────────────────┐
│                 Registry Layer                       │
│  (Model catalog, backend resolution, pricing)       │
└─────────────────────────────────────────────────────┘
                          │
┌─────────────────────────────────────────────────────┐
│                 Provider Layer                       │
│  (OpenAI, Groq, SambaNova, OpenRouter adapters)     │
└─────────────────────────────────────────────────────┘

Development

Building

# Debug build
cargo build

# Release build
cargo build --release

# Build specific crate
cargo build -p llmbroker
cargo build -p llmbroker_cli

Running Tests

# Run all tests
cargo test

# Run tests for specific crate
cargo test -p llmbroker

Running Individual MCP Servers

# Run the Serper search server
SERPER_API_KEY=your-key cargo run --bin mcp-serper

# Run the ping test server
cargo run --bin mcp-ping

Documentation

Comprehensive documentation is available in the docs/ directory:

Document           Description
Architecture       System architecture and design principles
Technical Specs    Requirements and specifications
Component Design   Detailed component documentation
API Reference      Complete API documentation
MCP Integration    MCP tool integration guide
Data Flow          Request/response data flows
Deployment Guide   Production deployment guide

License

MIT License