Voice-to-text widgets backed by local STT daemon with Groq cloud fallback.

Hero Voice

Drop-in voice-to-text widgets for any host UI in the Hero ecosystem, backed by a local STT daemon with a cloud Groq fallback and an AI text-transform pipeline.

Features

Drop-in browser widgets — <hero-voice-input>, <hero-voice-floating>, <hero-voice-button>, and a data-hero-voice boost for any text input. See Browser widgets below.
Click-bounded one-shot capture — MediaRecorder posts an Opus blob to /hero_voice/rest/transcribe; the server transcodes to 16 kHz mono WAV.
Local-first STT — hero_voiced runs sherpa-onnx Parakeet locally; Groq WhisperLargeV3Turbo takes over on failure.
Text transformations - 14 built-in AI transformations (11 writing styles plus 3 translation targets):
spellcheck - Grammar and spelling correction
specs - Technical specifications
code - Software architecture documentation
docs - User-friendly documentation
legal - Legal document formatting
story - Creative narrative
summary - Bullet-point summary
technical - Technical documentation
business - Business analysis
meeting - Meeting minutes
email - Professional email
Language translations: Dutch, French, Arabic
Topic organization - Hierarchical folder structure for transcriptions
Audio archival - Saves recordings as WAV and compressed OGG

Browser widgets

The widgets live at crates/hero_voice_admin/static/voice-widget/, embedded into hero_voice_admin and served under /hero_voice/admin/voice-widget/. Each one is a vanilla custom element with no framework dependency.

<!-- 1. Mic button bound to a target field -->
<hero-voice-input target="#desc"></hero-voice-input>
<textarea id="desc"></textarea>

<!-- 2. Boost any input — mic appears on hover/focus -->
<input data-hero-voice />

<!-- 3. Fixed corner mic — fills the last-focused input -->
<hero-voice-floating position="bottom-right"></hero-voice-floating>

<!-- 4. Event-only mic — emits `hero:voice-text`, calls window.fn -->
<hero-voice-button on-text="window.onVoiceText"></hero-voice-button>

Load the stylesheet and scripts (paths are relative to the socket the page is served from):

<link rel="stylesheet" href="/hero_voice/admin/voice-widget/voice-widget.css" />
<script src="/hero_voice/admin/voice-widget/components.js"></script>
<script src="/hero_voice/admin/voice-widget/floating.js"></script>
<script src="/hero_voice/admin/voice-widget/boost.js"></script>

A standalone demo lives at /hero_voice/admin/voice-widget/test.html.
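
The event-only button can be wired to a handler roughly as follows. This is a minimal sketch, not the widget's documented contract: the source names only the `hero:voice-text` event and the `on-text` attribute, so the `event.detail.text` payload shape, the assumption that the event bubbles up to `document`, and the helper `appendTranscript` are all hypothetical.

```javascript
// Hypothetical helper: join a new transcript onto the text already in a field.
function appendTranscript(current, text) {
  const addition = String(text ?? "").trim();
  if (!addition) return current;          // nothing recognised, keep field as-is
  if (!current) return addition;          // empty field, no separator needed
  // Insert a single space unless the field already ends in whitespace.
  return /\s$/.test(current) ? current + addition : current + " " + addition;
}

// Browser-only wiring; skipped when no DOM is present (for example under Node).
if (typeof document !== "undefined") {
  // ASSUMPTION: the transcript arrives as event.detail.text and the event bubbles.
  document.addEventListener("hero:voice-text", (event) => {
    const field = document.querySelector("#desc");
    if (field) field.value = appendTranscript(field.value, event.detail?.text);
  });
}
```

Keeping the string handling in a pure function makes it testable without a microphone or a running service.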

Requirements

Rust 1.92+
A running hero_proc (provides the secret store; AI keys are read from it)
GROQ_API_KEY in hero_proc — required for the cloud STT/TTS fallback
OPENROUTER_API_KEY in hero_proc — required for the transform_content RPC
Modern browser with MediaRecorder + microphone support (see Browser Support)

Configuration

hero_voice reads AI provider keys directly from the hero_proc secret store via herolib_ai_direct. There is no AI broker daemon in this path.

hero_proc secret set GROQ_API_KEY gsk_...
hero_proc secret set OPENROUTER_API_KEY sk-or-...

Optional environment variables:

Var	Default	Purpose
`RUST_LOG`	(unset)	Tracing filter, e.g. `hero_voice=info`
`HERO_VOICED_PORT`	`8094`	Local `hero_voiced` HTTP port — STT/TTS is tried here first
`HERO_VOICE_LOCAL_DISABLE`	(unset)	Set to `1` to skip the local fast path and go straight to cloud
`HERO_VOICE_SHERPA_DIR`	`~/hero/share/hero_voice/voice-widget/sherpa`	Browser-side sherpa WASM/data dir (parked wake-word bundle)

Usage

lab service voice --start

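The hero_voice services listen on Unix sockets only (no TCP); the one exception is hero_voiced, which binds a loopback TCP port (described below). Use hero_proxy for external access.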

Sockets

Socket	Mount via hero_router	Purpose
`~/hero/var/sockets/hero_voice/rpc.sock`	`/hero_voice/rpc/`	JSON-RPC 2.0 (domain methods)
`~/hero/var/sockets/hero_voice/admin.sock`	`/hero_voice/admin/`	Admin UI, widget bundle, file downloads, MCP
`~/hero/var/sockets/hero_voice/rest.sock`	`/hero_voice/rest/`	Transcribe (one-shot + streaming WebSocket; optional topic-scoped archival), TTS

hero_voiced — local OpenAI-compatible STT/TTS daemon

hero_voiced is a stateless TCP daemon that loads sherpa-onnx Parakeet (STT) and Kokoro (TTS) once and exposes them over an OpenAI-compatible API. hero_voice_admin calls it directly via herolib_ai_direct — overriding Provider::Groq's base URL to http://127.0.0.1:${HERO_VOICED_PORT:-8094}/v1 — and falls back to cloud Groq Whisper / Orpheus on error. Set HERO_VOICE_LOCAL_DISABLE=1 to skip the local fast path entirely.
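
The local-first ordering described above can be sketched as follows. The real logic lives in the Rust crates via herolib_ai_direct; this JavaScript version only illustrates the documented behaviour. The function names `resolveBackends` and `firstSuccessful` are hypothetical, and the cloud entry deliberately carries no URL because the source does not state one.

```javascript
// Build the ordered list of STT/TTS backends from environment variables.
function resolveBackends(env = {}) {
  const port = env.HERO_VOICED_PORT || "8094"; // documented default
  const backends = [];
  if (env.HERO_VOICE_LOCAL_DISABLE !== "1") {
    // Local fast path: hero_voiced on loopback, tried first.
    backends.push({ name: "hero_voiced", baseUrl: `http://127.0.0.1:${port}/v1` });
  }
  // Cloud Groq is always last; its base URL is left to the client library.
  backends.push({ name: "groq", baseUrl: null });
  return backends;
}

// Try each backend in order; return the first success, rethrow the last error.
async function firstSuccessful(backends, call) {
  let lastError = new Error("no backends configured");
  for (const backend of backends) {
    try {
      return await call(backend);
    } catch (err) {
      lastError = err; // remember the failure and fall through to the next one
    }
  }
  throw lastError;
}
```

A caller would pass a function that performs the actual transcription or synthesis request against `backend.baseUrl`.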

Endpoints:

POST /v1/audio/transcriptions — multipart form (file, model, language, prompt, response_format). Default response {"text": "..."}.
POST /v1/audio/speech — JSON {model, input, voice, response_format, speed}. Supports response_format of wav (default) and pcm.
GET /v1/models — local engine identifiers.
GET /health — {status, service, version, models_ready}.
GET /.well-known/heroservice.json — discovery manifest.
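
As an illustration of the speech endpoint, the sketch below builds a request for `POST /v1/audio/speech` using the JSON fields listed above. The helper name `buildSpeechRequest` and the placeholder `model` and `voice` values are hypothetical; real identifiers would come from `GET /v1/models` and the daemon's voice list, which the source does not enumerate.

```javascript
// Build a fetch() request for POST /v1/audio/speech on the local daemon.
function buildSpeechRequest(input, options = {}) {
  const {
    port = 8094,                // documented default for HERO_VOICED_PORT
    model = "example-model",    // hypothetical placeholder
    voice = "example-voice",    // hypothetical placeholder
    responseFormat = "wav",     // documented default
    speed = 1.0,
  } = options;
  // Only wav and pcm are documented as supported response formats.
  if (!["wav", "pcm"].includes(responseFormat)) {
    throw new RangeError(`unsupported response_format: ${responseFormat}`);
  }
  return {
    url: `http://127.0.0.1:${port}/v1/audio/speech`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, input, voice, response_format: responseFormat, speed }),
    },
  };
}

// Usage (requires a running hero_voiced):
//   const { url, init } = buildSpeechRequest("Hello from Hero Voice");
//   const audioBytes = await (await fetch(url, init)).arrayBuffer();
```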

Environment:

Var	Default	Purpose
`HERO_VOICED_PORT`	`8094`	Loopback TCP port
`HERO_VOICED_ADDRESS`	(unset)	Optional second bind (e.g. mycelium IPv6)
`HERO_VOICE_STT_SHERPA_DIR`	`~/hero/share/hero_voice/stt/parakeet`	Parakeet bundle dir
`HERO_VOICE_TTS_KOKORO_DIR`	`~/hero/share/hero_voice/kokoro-en-v0_19`	Kokoro bundle dir

Both bundle dirs auto-populate on first hero_voiced start (~770 MB combined download from the sherpa-onnx GitHub releases).

Run standalone:

lab service voice --start

Architecture

Hero Voice follows the standard Hero three-crate model:

hero_voice/
├── crates/
│ ├── hero_voice/ # Core library (types, domain logic, audio, transcription)
│ ├── hero_voice_server/ # JSON-RPC 2.0 server over Unix socket (rpc.sock)
│ ├── hero_voice_admin/ # Admin UI on admin.sock + REST (transcribe/tts/uploads) on rest.sock (Axum HTTP)
│ ├── hero_voiced/ # Local OpenAI-compatible STT/TTS daemon (TCP)
│ ├── hero_voice_sdk/ # Generated client SDK
│ └── hero_voice_examples/ # Example programs using the SDK
├── schemas/voice/voice.oschema # Domain schema (source of truth)
├── Cargo.toml
└── wasm/ # Browser-side WASM build for the parked wake-word bundle (KWS/VAD)

Data flow

Browser widget (MediaRecorder → Opus blob)
 │
 ▼ POST /hero_voice/rest/transcribe
hero_voice_admin → rest.sock
 ├── /transcribe[?topic_sid=...] → multipart Opus → 16 kHz WAV → STT
 │ ├── hero_voiced (local, priority 0)
 │ ├── Groq Whisper (cloud fallback)
 │ └── (optional) archive original under data/audio/{topic_sid}/
 │ + voiceservice.register_audio bookkeeping
 ├── /transcribe/ws/{sid} → WebSocket: stream Opus/PCM up, transcript segments down
 └── /tts, /tts/voices → speech synthesis

hero_voice_admin → admin.sock (UI)
 ├── /voice-widget/* → widget bundle (components.js, floating.js,
 │ boost.js, bar.js, test.html, parked wake-word/)
 ├── /files/audio/*, /files/transforms/* → data downloads
 ├── /mcp → MCP-to-OpenRPC translation
 └── /* → embedded admin UI assets

hero_voice_server → rpc.sock (reached at /hero_voice/rpc/ via hero_router)
 ├── rpc.health → {"status":"ok"}
 ├── rpc.discover → OpenRPC spec
 └── domain methods (folder.*, topic.*, voiceservice.*)

API

JSON-RPC Endpoint

All data operations use JSON-RPC 2.0 at /hero_voice/rpc/rpc — served by hero_router directly from rpc.sock (no admin-side proxy).

Auto-generated CRUD (Topic and Folder root objects):

topic.new, topic.get, topic.set, topic.delete, topic.list
folder.new, folder.get, folder.set, folder.delete, folder.list

Custom service methods (VoiceService):

voiceservice.create_topic / voiceservice.create_folder
voiceservice.rename_topic / voiceservice.rename_folder
voiceservice.move_topic / voiceservice.move_folder
voiceservice.delete_topic / voiceservice.delete_folder
voiceservice.save_content / voiceservice.transform_content
voiceservice.register_audio / voiceservice.delete_audio
voiceservice.reset_topic / voiceservice.get_audio_path
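
A call to the endpoint follows the usual JSON-RPC 2.0 envelope. The sketch below assumes a browser or Node 18+ `fetch`; the helper names are hypothetical, and only `rpc.health` is shown because the source does not document the parameter shapes of the domain methods.

```javascript
let nextId = 1;

// Build a JSON-RPC 2.0 request object; params is omitted when not supplied.
function rpcRequest(method, params) {
  const request = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) request.params = params;
  return request;
}

// Return the result of a JSON-RPC response, or throw if it carries an error.
function unwrapRpcResponse(response) {
  if (response.error) {
    const err = new Error(response.error.message);
    err.code = response.error.code;
    throw err;
  }
  return response.result;
}

// POST the request to the router-mounted endpoint and unwrap the reply.
async function callRpc(method, params, endpoint = "/hero_voice/rpc/rpc") {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(rpcRequest(method, params)),
  });
  return unwrapRpcResponse(await res.json());
}

// Usage (requires the service to be reachable through hero_router):
//   const health = await callRpc("rpc.health");
```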

Transcribe

POST /hero_voice/rest/transcribe[?topic_sid=...] — multipart/form-data with an audio field (Opus in Ogg or WebM, or WAV). Server transcodes to 16 kHz mono WAV before handing to STT. When topic_sid is set, the original (un-transcoded) bytes are archived under data/audio/{topic_sid}/{timestamp}.{ext} and registered via voiceservice.register_audio as a best-effort side effect — archival errors log a warning but don't fail the transcription.

Response: {text, model_id, latency_ms, archived?: {filename, format, size}}.
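
A one-shot upload can be sketched as below, using the `audio` form field and the optional `topic_sid` query parameter described above. The helper names, the `recording.webm` filename, and the assumption that failures surface as non-2xx HTTP statuses are illustrative and not confirmed by the source.

```javascript
// Build the transcribe URL, adding topic_sid only when archival is wanted.
function transcribeUrl(topicSid, base = "/hero_voice/rest/transcribe") {
  return topicSid ? `${base}?topic_sid=${encodeURIComponent(topicSid)}` : base;
}

// Upload one recording and resolve to the parsed JSON response.
async function transcribe(blob, { topicSid, filename = "recording.webm" } = {}) {
  const form = new FormData();
  form.append("audio", blob, filename); // field name taken from the API description
  const res = await fetch(transcribeUrl(topicSid), { method: "POST", body: form });
  // ASSUMPTION: errors are reported through the HTTP status code.
  if (!res.ok) throw new Error(`transcribe failed: HTTP ${res.status}`);
  return res.json(); // {text, model_id, latency_ms, archived?}
}

// Usage in a browser, given a Blob from MediaRecorder:
//   const { text } = await transcribe(blob, { topicSid: "..." });
```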

Static Files

GET /hero_voice/admin/files/audio/{filename} - Audio file downloads
GET /hero_voice/admin/files/transforms/{filename} - Transform file downloads

Audio Processing

Capture: Browser MediaRecorder, Opus in Ogg (Firefox) or WebM (Chromium / Safari 18.4+) container, click-bounded one-shot per recording.
Server-side transcode: Opus → 16 kHz mono WAV before STT.
Archival: Saved recordings stored as WAV plus compressed OGG Vorbis (~10% of WAV size).

Browser Support

Chrome 120+
Firefox 120+
Safari 18.4+ (Opus in MediaRecorder; older Safari falls back to MP4, which the server does not currently decode)
Edge 120+

Requires microphone permission.
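
Container selection on the client can be sketched with the standard `MediaRecorder.isTypeSupported` check. The preference order and the helper name `pickRecorderMimeType` are assumptions for illustration; the bundled widgets may choose differently.

```javascript
// Opus containers the server is documented to decode, in assumed preference order.
const PREFERRED_TYPES = ["audio/ogg;codecs=opus", "audio/webm;codecs=opus"];

// Return the first supported Opus MIME type, or null if none is available.
// The support check is injected so the function can run outside a browser.
function pickRecorderMimeType(isTypeSupported) {
  return PREFERRED_TYPES.find((type) => isTypeSupported(type)) ?? null;
}

// Browser usage:
//   const mimeType = pickRecorderMimeType((t) => MediaRecorder.isTypeSupported(t));
//   if (mimeType === null) {
//     // Likely an older Safari that only records MP4, which the server cannot decode.
//   } else {
//     const recorder = new MediaRecorder(stream, { mimeType });
//   }
```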

Embedding & CORS

Hero Voice allows iframe embedding (no X-Frame-Options restrictions), cross-origin API calls, and WebSocket connections from any origin.

License

Apache-2.0