AI Broker

A lightweight LLM request broker with an OpenAI-compatible API that intelligently routes requests to multiple LLM providers with cost-aware strategies.

Features

  • OpenAI-Compatible API - Drop-in replacement for OpenAI clients
  • Multi-Provider Support - OpenAI, OpenRouter, Groq, SambaNova
  • Smart Routing - Automatic model selection based on cost or quality
  • Cost Tracking - Per-request cost calculation and tracking
  • Request Tracking - Detailed per-IP request tracking with timestamps and durations
  • Streaming Support - Real-time streaming responses via SSE
  • MCP Broker - Aggregate tools from multiple MCP (Model Context Protocol) servers
  • Rate Limiting - Per-IP rate limiting with configurable limits
  • Audio APIs - Text-to-speech and speech-to-text support (Groq, SambaNova, OpenAI)
  • Config-Based Audio Models - STT/TTS models defined in modelsconfig.yml with automatic fallback
  • Embeddings - Vector embedding generation
  • 37 Chat Models - Latest Claude 4.x, Gemini 3, GPT-5.2, o3-mini, Grok 4.1, Kimi K2.5
  • 2 Audio Models - Whisper STT (Groq/SambaNova/OpenAI), OpenAI TTS
  • Persistent Billing - SQLite-based request logging for billing and analytics
  • API Key Support - Optional API key authentication system

Project Structure (HERO Architecture)

aibroker/
├── crates/
│   ├── hero_aibroker_sdk/      # SDK library - shared types, protocols, RPC client
│   ├── llmbroker/              # Server library & binary (hero_aibroker_server)
│   ├── llmbroker_cli/          # CLI client (hero_aibroker binary)
│   ├── hero_aibroker_ui/       # Admin UI - Axum web dashboard
│   ├── hero_aibroker_rhai/     # Rhai scripting bindings
│   ├── mcp-common/             # Shared MCP utilities
│   ├── mcp-ping/               # MCP ping test server
│   ├── mcp-serpapi/            # SerpAPI search MCP server
│   ├── mcp-serper/             # Serper search MCP server
│   ├── mcp-exa/                # Exa search MCP server
│   ├── mcp-scraperapi/         # ScraperAPI MCP server
│   └── mcp-scrapfly/           # Scrapfly MCP server
├── modelsconfig.yml            # Model definitions and pricing
└── mcp_servers.json            # MCP server configuration

Dependency Graph

hero_aibroker_sdk (no internal dependencies)
    ↑         ↑          ↑           ↑
    |         |          |           |
 server     CLI         UI        rhai

Architecture Follows HERO Crate Standards:

  • SDK: pure library with types, protocols, and client
  • Server: binary exposing JSON-RPC/OpenAI API on port 8080
  • CLI: command-line client for interactive use
  • UI: admin dashboard (separate binary)
  • Rhai: scripting integration for automation

Quick Start

Prerequisites

  • Rust 1.70 or later
  • At least one LLM provider API key

Environment Variables

This project requires API keys to be set as environment variables. Source your env file before running:

source ~/.config/env.sh   # or wherever you keep your secrets

Required variables (at least one provider):

  • GROQ_API_KEY — Groq API key
  • OPENROUTER_API_KEY — OpenRouter API key

Optional variables:

  • SAMBANOVA_API_KEY — SambaNova API key
  • OPENAI_API_KEY — OpenAI API key

See .env.example for the full list of supported variables.

Run

source ~/.config/env.sh
make run

The server will start on http://127.0.0.1:3385 by default.

Configuration

All configuration is via environment variables (no .env files loaded by the application):

Variable           Default     Description
HOST               127.0.0.1   Server bind address
PORT               3385        Server port
ROUTING_STRATEGY   cheapest    cheapest or best
MCP_CONFIG_PATH    (none)      Path to MCP server config JSON
ADMIN_TOKEN        (none)      Simple admin auth token
HERO_SECRET        (none)      Hero Auth JWT secret

Both singular (GROQ_API_KEY) and plural (GROQ_API_KEYS) env var names are accepted. Use comma-separated values for multiple keys per provider.

Multiple API Keys

The broker supports multiple API keys per provider for load distribution, higher rate limits, and automatic failover. When multiple keys are configured, the broker creates separate provider instances (e.g., openai-0, openai-1) and distributes requests across them.
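As a sketch of how this multi-key fan-out can work (the function names and parsing details here are illustrative, not the broker's actual Rust internals):

```python
import os

def provider_keys(provider: str) -> list[str]:
    """Collect keys for a provider, accepting both the singular
    (GROQ_API_KEY) and plural (GROQ_API_KEYS) variable names,
    with comma-separated values for multiple keys."""
    raw = os.environ.get(f"{provider}_API_KEYS") or os.environ.get(f"{provider}_API_KEY", "")
    return [k.strip() for k in raw.split(",") if k.strip()]

def provider_instances(provider: str) -> dict[str, str]:
    """Fan each key out into a named provider instance (openai-0, openai-1, ...)."""
    return {f"{provider}-{i}": key
            for i, key in enumerate(provider_keys(provider.upper()))}
```

Requests can then be distributed round-robin across the resulting instances.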

API Reference

Chat Completions

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
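With stream set to true, the endpoint emits OpenAI-style SSE events: each event is a line beginning with "data: " followed by a JSON chunk, and the stream ends with the sentinel "data: [DONE]". A minimal Python sketch of consuming that framing (the sample payloads are illustrative):

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from OpenAI-style SSE lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(sample)))  # prints "Hello"
```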

Text-to-Speech

Generate speech from text using OpenAI TTS models:

curl http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, world!",
    "voice": "alloy"
  }' \
  --output speech.mp3

Available TTS Models:

  • tts-1 - Standard quality (OpenAI only)
  • tts-1-hd - High definition (OpenAI only)

Note: TTS requires OPENAI_API_KEY to be set in your environment.

Speech-to-Text

Transcribe audio using Whisper models with automatic provider fallback:

curl http://localhost:8080/v1/audio/transcriptions \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

Available STT Models:

  • whisper-1 - Standard Whisper model with multi-provider support
    • Priority 1: Groq (whisper-large-v3) - $0.111/hr
    • Priority 2: SambaNova (whisper-large-v3) - FREE
    • Priority 3: OpenAI (whisper-1) - $0.006/min
  • whisper-large-v3 - Direct access to Whisper Large v3
    • Priority 1: Groq - $0.111/hr
    • Priority 2: SambaNova - FREE

The system automatically tries providers in priority order (cheapest first) with fallback support.
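The priority lists above amount to an ordered fallback loop. A hypothetical Python sketch of that logic (the real broker implements this in Rust, and these names are illustrative):

```python
BACKENDS = [
    {"provider": "groq", "model_id": "whisper-large-v3", "priority": 1},
    {"provider": "sambanova", "model_id": "whisper-large-v3", "priority": 2},
    {"provider": "openai", "model_id": "whisper-1", "priority": 3},
]

def transcribe_with_fallback(audio, backends, call):
    """Try backends in ascending priority order; `call(backend, audio)`
    should raise on failure so the next provider is attempted."""
    errors = []
    for backend in sorted(backends, key=lambda b: b["priority"]):
        try:
            return call(backend, audio)
        except Exception as exc:
            errors.append((backend["provider"], str(exc)))
    raise RuntimeError(f"all providers failed: {errors}")
```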

Embeddings

curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Hello, world!"
  }'

List Models

curl http://localhost:8080/v1/models

Metrics

# Basic metrics
curl http://localhost:8080/metrics

# Detailed metrics with per-IP tracking
curl http://localhost:8080/metrics/detailed

The detailed metrics endpoint provides:

  • Total requests and errors per IP
  • First and last request timestamps
  • Currently active (in-flight) requests
  • Recent request history (last 10 per IP) including:
    • Request start and finish timestamps
    • Model used
    • Request duration
    • Success/error status

Billing & Usage

# View all IP usage and costs
curl http://localhost:8080/billing/usage

# View specific IP usage
curl http://localhost:8080/billing/usage/127.0.0.1

All requests are persisted to SQLite (requests.db) with:

  • IP address and model used
  • Token usage (input/output)
  • Costs in USD (calculated per-request)
  • Timestamps and duration
  • Success/error status
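The per-request cost follows directly from the per-million-token prices defined in modelsconfig.yml. A small worked sketch of the arithmetic (the function name is hypothetical):

```python
def request_cost_usd(input_tokens, output_tokens, input_cost, output_cost):
    """Cost in USD, given per-million-token prices as in modelsconfig.yml."""
    return (input_tokens * input_cost + output_tokens * output_cost) / 1_000_000

# e.g. gpt4o at $2.5 input / $10.0 output per million tokens:
cost = request_cost_usd(1200, 350, 2.5, 10.0)
print(f"${cost:.6f}")  # $0.006500
```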

Export billing data:

# Export to CSV
sqlite3 -header -csv requests.db "SELECT * FROM request_logs;" > billing.csv

# Query specific IP
sqlite3 requests.db "SELECT * FROM request_logs WHERE ip='X.X.X.X';"
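The same data can be aggregated programmatically. This sketch runs against an in-memory copy; the column names (ip, model, cost_usd) are assumptions about the request_logs schema rather than its documented layout:

```python
import sqlite3

# Point connect() at requests.db in practice; in-memory here for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request_logs (ip TEXT, model TEXT, cost_usd REAL)")
conn.executemany(
    "INSERT INTO request_logs VALUES (?, ?, ?)",
    [("127.0.0.1", "gpt4o", 0.0065), ("127.0.0.1", "gpt4o", 0.0031),
     ("10.0.0.2", "deepseek-chat", 0.0004)],
)
# Total requests and spend per IP.
rows = conn.execute(
    "SELECT ip, COUNT(*), ROUND(SUM(cost_usd), 4) "
    "FROM request_logs GROUP BY ip ORDER BY ip"
).fetchall()
for ip, n, total in rows:
    print(ip, n, total)
```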

MCP Tools

# List all available tools
curl http://localhost:8080/mcp/tools

# Call a specific tool
curl http://localhost:8080/mcp/tools/search \
  -H "Content-Type: application/json" \
  -d '{"query": "rust programming"}'

Client Examples

Python

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # Key is configured on the server
)

response = client.chat.completions.create(
    model="gpt4o",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)

Streaming

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)

stream = client.chat.completions.create(
    model="gpt4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Running Services

Start the Server

# Release build (production) - TCP mode (default)
make run

# Debug build with logging
make rundev

# Or manually on TCP
cargo run --release --bin hero_aibroker_server -- --port 8080

# Or with Unix socket (creates ~/hero/var/sockets/hero_aibroker_server.sock)
SOCKET_PATH=~/hero/var/sockets/hero_aibroker_server.sock \
  cargo run --release --bin hero_aibroker_server

Server Binding Modes:

  • TCP (default): http://127.0.0.1:8080 - use when you want HTTP access
  • Unix Socket: ~/hero/var/sockets/hero_aibroker_server.sock - use for local-only access via socket

Set the SOCKET_PATH environment variable to enable Unix socket mode.

Exposed Endpoints:

  • OpenAI-compatible API endpoints (/v1/*)
  • JSON-RPC admin interface (/rpc)
  • Health check (/health)
  • Metrics (/metrics)
  • OpenRPC specification (/openrpc.json)

Start the Admin UI

# Terminal 1: Start the server
cargo run --release --bin hero_aibroker_server

# Terminal 2: Start the UI (connects to server via HTTP)
BROKER_URL=http://localhost:8080 cargo run --release --bin hero_aibroker_ui

The admin UI will be available at http://127.0.0.1:3000 and provides:

  • Chat interface
  • Model management
  • MCP tool integration
  • Request/usage metrics
  • Real-time logs

CLI Usage

# Interactive chat
cargo run --bin hero_aibroker -- chat --model gpt4o

# With global model option
cargo run --bin hero_aibroker -- --model deepseek-chat chat

# With custom retry attempts
cargo run --bin hero_aibroker -- --max-retries 5 chat

# List available models
cargo run --bin hero_aibroker -- models

# List MCP tools
cargo run --bin hero_aibroker -- tools

# Check server health
cargo run --bin hero_aibroker -- health

# Or use the make target
make cli

CLI Options

Global Options:

  • -u, --url <URL> - LLM Broker server URL (default: http://localhost:8080)
  • -m, --model <MODEL> - Model to use (can be overridden per command)
  • --max-retries <MAX_RETRIES> - Maximum number of retries on failure (default: 3)

Chat Command:

  • -m, --model <MODEL> - Model to use (overrides global --model)
  • -s, --stream - Enable streaming (default: true)

The CLI automatically retries failed requests with exponential backoff, making it resilient to temporary network issues or server errors.
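Exponential backoff of this kind can be sketched as follows (the parameter names and delays are illustrative, not the CLI's exact values):

```python
import time

def with_retries(call, max_retries=3, base_delay=0.5):
    """Retry `call` with exponential backoff: 0.5s, 1s, 2s, ...
    Re-raises the last error once max_retries is exhausted."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt)
```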

Model Configuration

Models are configured in modelsconfig.yml:

models:
  gpt4o:
    display_name: "GPT-4o"
    tier: premium
    capabilities:
      - tool_calling
      - vision
    context_window: 128000
    backends:
      - provider: openai
        model_id: gpt-4o
        priority: 1
        input_cost: 2.5   # per million tokens
        output_cost: 10.0

Auto Model Selection

Use special model names for automatic selection:

Model Name     Description
auto           Use the configured routing strategy
autocheapest   Select the cheapest available model
autobest       Select the best premium model
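Conceptually, the auto names resolve against the model catalog before routing. A hypothetical sketch using the summed per-million-token prices as the cost metric (the broker's actual scoring may differ):

```python
# Tiny stand-in for the modelsconfig.yml catalog.
MODELS = {
    "gpt4o": {"tier": "premium",
              "backends": [{"input_cost": 2.5, "output_cost": 10.0}]},
    "deepseek-chat": {"tier": "standard",
                      "backends": [{"input_cost": 0.14, "output_cost": 0.28}]},
}

def resolve(name: str) -> str:
    """Map auto* names to a concrete model; pass real names through."""
    def price(m):
        return min(b["input_cost"] + b["output_cost"] for b in MODELS[m]["backends"])
    if name == "autocheapest":
        return min(MODELS, key=price)
    if name == "autobest":
        premium = [m for m in MODELS if MODELS[m]["tier"] == "premium"]
        return max(premium, key=price)
    return name

print(resolve("autocheapest"))  # deepseek-chat
```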

MCP Integration

The broker can aggregate tools from multiple MCP (Model Context Protocol) servers. Configure servers in mcp_servers.json:

{
  "mcpServers": [
    {
      "name": "search",
      "command": "cargo",
      "args": ["run", "--bin", "mcp-serper"],
      "transport": "stdio"
    },
    {
      "name": "scraper",
      "url": "http://localhost:3001/sse",
      "transport": "sse"
    }
  ]
}
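A sketch of how such a config can be validated on load: stdio entries need a command, sse entries need a url. The validation rules here are assumptions for illustration, not the broker's actual checks:

```python
import json

def validate_mcp_config(text: str) -> list[str]:
    """Parse mcp_servers.json text and return the configured server names."""
    cfg = json.loads(text)
    names = []
    for server in cfg["mcpServers"]:
        transport = server.get("transport")
        if transport == "stdio":
            assert "command" in server, f"{server['name']}: stdio needs a command"
        elif transport == "sse":
            assert "url" in server, f"{server['name']}: sse needs a url"
        else:
            raise ValueError(f"{server['name']}: unknown transport {transport!r}")
        names.append(server["name"])
    return names

example = '''{"mcpServers": [
  {"name": "search", "command": "cargo", "args": ["run", "--bin", "mcp-serper"], "transport": "stdio"},
  {"name": "scraper", "url": "http://localhost:3001/sse", "transport": "sse"}
]}'''
print(validate_mcp_config(example))  # ['search', 'scraper']
```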

MCP Endpoints

Endpoint                Description
GET /mcp/tools          List all aggregated tools
POST /mcp/tools/:name   Call a specific tool
GET /mcp/sse            SSE endpoint for MCP clients

Included MCP Servers

  • mcp-serper - Web search via Serper API
  • mcp-serpapi - Web search via SerpAPI
  • mcp-exa - Semantic search via Exa
  • mcp-scraperapi - Web scraping via ScraperAPI
  • mcp-scrapfly - Web scraping via Scrapfly
  • mcp-ping - Simple ping server for testing

Architecture

┌─────────────────────────────────────────────────────┐
│                    API Layer                         │
│  (OpenAI-compatible endpoints: chat, tts, stt, etc) │
└─────────────────────────────────────────────────────┘
                          │
┌─────────────────────────────────────────────────────┐
│                  Service Layer                       │
│  (Routing logic, model selection, cost calculation) │
└─────────────────────────────────────────────────────┘
                          │
┌─────────────────────────────────────────────────────┐
│                 Registry Layer                       │
│  (Model catalog, backend resolution, pricing)       │
└─────────────────────────────────────────────────────┘
                          │
┌─────────────────────────────────────────────────────┐
│                 Provider Layer                       │
│  (OpenAI, Groq, SambaNova, OpenRouter adapters)     │
└─────────────────────────────────────────────────────┘

Development

Building

# Debug build
cargo build

# Release build
cargo build --release

# Build specific crate
cargo build -p llmbroker
cargo build -p llmbroker_cli

Running Tests

# Run all tests
cargo test

# Run tests for specific crate
cargo test -p llmbroker

Running Individual MCP Servers

# Run the Serper search server
SERPER_API_KEY=your-key cargo run --bin mcp-serper

# Run the ping test server
cargo run --bin mcp-ping

Documentation

Comprehensive documentation is available in the docs/ directory:

Document           Description
Architecture       System architecture and design principles
Technical Specs    Requirements and specifications
Component Design   Detailed component documentation
API Reference      Complete API documentation
MCP Integration    MCP tool integration guide
Data Flow          Request/response data flows
Deployment Guide   Production deployment guide

License

MIT License