HeroVoice

Voice-to-markdown transcription server with real-time speech recognition, AI-powered text transformation, and live preview.

Features

  • Real-time voice transcription - Stream audio from browser to server via WebSocket
  • Voice Activity Detection - Silero VAD V5 neural network detects speech/silence transitions
  • Automatic segmentation - Transcribes on natural pauses (350ms silence threshold)
  • AI transcription - Uses Groq WhisperLargeV3Turbo, with automatic failover to OpenRouter and then SambaNova
  • Live markdown preview - Split-screen editor with real-time rendered HTML
  • Text transformations - 14 built-in AI transformation styles:
    • spellcheck - Grammar and spelling correction
    • specs - Technical specifications
    • code - Software architecture documentation
    • docs - User-friendly documentation
    • legal - Legal document formatting
    • story - Creative narrative
    • summary - Bullet-point summary
    • technical - Technical documentation
    • business - Business analysis
    • meeting - Meeting minutes
    • email - Professional email
    • Language translations: Dutch, French, Arabic
  • Topic organization - Hierarchical folder structure for transcriptions
  • Audio archival - Saves recordings as WAV and compressed OGG

Requirements

  • Rust 1.92+
  • Groq API key (required for transcription)
  • Modern browser with Web Audio API and microphone support

Configuration

# Required
export GROQ_API_KEY=your-groq-api-key

# Optional fallback providers
export OPENROUTER_API_KEY=your-openrouter-key
export SAMBANOVA_API_KEY=your-sambanova-key

# Server configuration (optional)
export HOST=0.0.0.0          # Bind address (default: 0.0.0.0)
export PORT=2756             # Listen port (default: 2756)
export RUST_LOG=hero_voice=info  # Log level
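
As a rough illustration of how the optional HOST and PORT settings above could be resolved, here is a minimal Rust sketch. The function names `bind_addr` and `bind_addr_from_env` are hypothetical and not taken from the HeroVoice source, and the sketch assumes HOST is an IP address rather than a hostname.

```rust
use std::net::{IpAddr, SocketAddr};

/// Resolve the bind address from optional HOST and PORT values,
/// falling back to the documented defaults (0.0.0.0 and 2756).
/// Assumption: HOST is an IP literal; a hostname would be rejected here.
fn bind_addr(host: Option<&str>, port: Option<&str>) -> Result<SocketAddr, String> {
    let host = host.unwrap_or("0.0.0.0");
    let port = port.unwrap_or("2756");
    let ip: IpAddr = host
        .parse()
        .map_err(|e| format!("invalid HOST {host:?}: {e}"))?;
    let port: u16 = port
        .parse()
        .map_err(|e| format!("invalid PORT {port:?}: {e}"))?;
    Ok(SocketAddr::new(ip, port))
}

/// Read HOST and PORT from the process environment (hypothetical helper).
fn bind_addr_from_env() -> Result<SocketAddr, String> {
    let host = std::env::var("HOST").ok();
    let port = std::env::var("PORT").ok();
    bind_addr(host.as_deref(), port.as_deref())
}
```

With neither variable set, `bind_addr(None, None)` yields `0.0.0.0:2756`, matching the defaults listed above.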

Usage

make run

Open http://localhost:2756 in your browser.

Recording

  1. Click the microphone button or press Ctrl+Shift+R to start recording
  2. Speak naturally - transcription happens automatically on pauses
  3. Click stop or press the shortcut again to end recording
  4. Use Ctrl+Shift+C to copy the markdown content

Topics

  • Create folders and topics in the sidebar tree
  • Each topic stores its content, audio recordings, and transforms
  • Right-click for rename, move, and delete options

Transformations

  1. Select a transformation style from the dropdown
  2. Click "Transform" to apply AI formatting
  3. Transforms are saved per-topic

Architecture

HeroVoice uses an OSchema-generated OpenRPC server with a custom WebSocket route for real-time audio streaming.

Browser (WebSocket)
    │
    ▼
Axum Server (AxumRpcServer)
    ├── JSON-RPC endpoint (/api/root/voice/rpc)
    │   └── OSchema-generated CRUD + VoiceService methods
    │
    ├── WebSocket Handler (/ws)
    │   ├── AudioRecorder (WAV file saving)
    │   └── AudioProcessor (VAD-based segmentation)
    │
    ├── Transcriber (herolib-ai)
    │   └── Groq → OpenRouter → SambaNova (failover)
    │
    └── TextTransformer (LLM transformations)
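
The provider chain in the diagram (Groq → OpenRouter → SambaNova) can be pictured as an ordered list that is tried until one call succeeds. The sketch below is only an illustration of that idea: the `Provider` type, `transcribe_with_failover`, and the two stub functions are hypothetical and do not reflect the herolib-ai API.

```rust
/// A provider call: takes raw audio bytes, returns a transcript or an error message.
/// Hypothetical signature, chosen for illustration only.
type Provider = fn(&[u8]) -> Result<String, String>;

/// Try each provider in order and return the first successful transcript
/// together with the name of the provider that produced it.
/// If every provider fails, return the collected error messages.
fn transcribe_with_failover(
    providers: &[(&str, Provider)],
    audio: &[u8],
) -> Result<(String, String), Vec<String>> {
    let mut errors = Vec::new();
    for (name, call) in providers {
        match call(audio) {
            Ok(text) => return Ok((name.to_string(), text)),
            Err(e) => errors.push(format!("{name}: {e}")),
        }
    }
    Err(errors)
}

// Stand-ins for real provider clients (assumptions, not real API calls).
fn groq_stub(_audio: &[u8]) -> Result<String, String> {
    Err("rate limited".to_string())
}

fn openrouter_stub(_audio: &[u8]) -> Result<String, String> {
    Ok("hello world".to_string())
}
```

Keeping the chain as plain data makes the failover order easy to change and lets the caller see which provider answered.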

API

JSON-RPC Endpoint

All data operations use JSON-RPC 2.0 at POST /api/root/voice/rpc.

Auto-generated CRUD methods for the Topic and Folder root objects:

  • topic.new, topic.get, topic.set, topic.delete, topic.list
  • folder.new, folder.get, folder.set, folder.delete, folder.list

Custom service methods (VoiceService):

  • voiceservice.create_topic / voiceservice.create_folder
  • voiceservice.rename_topic / voiceservice.rename_folder
  • voiceservice.move_topic / voiceservice.move_folder
  • voiceservice.delete_topic / voiceservice.delete_folder
  • voiceservice.save_content / voiceservice.transform_content
  • voiceservice.register_audio / voiceservice.delete_audio
  • voiceservice.reset_topic / voiceservice.get_audio_path

Inspector: GET /api/root/voice/inspector
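
For orientation, a JSON-RPC 2.0 request body for one of the methods above could be assembled as in the sketch below. The helper `rpc_request` is hypothetical, and the parameter shapes each method expects are not documented here, so the empty object used in the usage note is an assumption.

```rust
/// Build a JSON-RPC 2.0 request body.
/// `params_json` must already be valid JSON (an object or array).
/// `method` is assumed to be a plain identifier such as "topic.list",
/// so it is inserted without escaping.
fn rpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{params_json}}}"#
    )
}
```

For example, `rpc_request(1, "topic.list", "{}")` produces a body that can be sent with POST to /api/root/voice/rpc using any HTTP client.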

WebSocket

GET /ws - Audio streaming endpoint

Client to Server:

{ "type": "start", "topic": "optional-topic-sid", "audio_dir": "optional-dir" }
{ "type": "stop" }

Audio is sent as binary frames: 16-bit PCM, 16 kHz, mono, little-endian.
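
A client that captures floating-point samples has to convert them to this binary format before sending. The following sketch shows one way to do that and to decode the bytes again. Both function names are hypothetical, and the sketch assumes the 16-bit samples are signed, which is the usual convention but is not stated above.

```rust
/// Convert floating-point samples in [-1.0, 1.0] to 16-bit signed PCM,
/// little-endian, ready to send as a binary WebSocket frame.
/// Out-of-range input is clamped rather than wrapped.
fn f32_to_pcm16_le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        let clamped = s.clamp(-1.0, 1.0);
        let value = (clamped * i16::MAX as f32).round() as i16;
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}

/// Decode little-endian 16-bit PCM bytes back into samples.
/// A trailing odd byte, if any, is ignored.
fn pcm16_le_to_i16(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]))
        .collect()
}
```

At 16 kHz mono, each second of audio is 32,000 bytes in this encoding.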

Server to Client:

{ "type": "transcription", "text": "...", "is_final": true }
{ "type": "status", "message": "..." }
{ "type": "error", "message": "..." }
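
The three server messages map naturally onto a tagged enum. The sketch below models them and serializes each variant by hand using only the standard library. The type `ServerMessage` and the helper `escape_json` are hypothetical, and the actual server may well use a serialization crate instead.

```rust
/// Messages the server sends over /ws, mirroring the three shapes listed above.
enum ServerMessage {
    Transcription { text: String, is_final: bool },
    Status { message: String },
    Error { message: String },
}

/// Escape the characters that must not appear raw inside a JSON string.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl ServerMessage {
    /// Serialize the message as a single-line JSON object.
    fn to_json(&self) -> String {
        match self {
            ServerMessage::Transcription { text, is_final } => format!(
                r#"{{"type":"transcription","text":"{}","is_final":{}}}"#,
                escape_json(text),
                is_final
            ),
            ServerMessage::Status { message } => format!(
                r#"{{"type":"status","message":"{}"}}"#,
                escape_json(message)
            ),
            ServerMessage::Error { message } => format!(
                r#"{{"type":"error","message":"{}"}}"#,
                escape_json(message)
            ),
        }
    }
}
```

A client can switch on the `type` field to tell transcriptions, status updates, and errors apart.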

Static Files

  • GET /files/audio/{filename} - Audio file downloads
  • GET /files/transforms/{filename} - Transform file downloads

Project Structure

hero_voice/
├── schemas/voice/voice.oschema   # Domain schema (source of truth)
├── build.rs                      # OSchema code generation
├── src/
│   ├── main.rs                   # Server setup (AxumRpcServer + WebSocket)
│   ├── lib.rs                    # Library root
│   ├── ws.rs                     # WebSocket audio streaming handler
│   ├── audio.rs                  # Voice Activity Detection (Silero VAD V5)
│   ├── convert.rs                # WAV to OGG conversion
│   ├── transcriber.rs            # AI transcription + text transformation
│   └── voice/                    # OSchema-generated domain
│       ├── core/types_generated.rs   # Generated types (DO NOT EDIT)
│       └── server/
│           ├── osis_server_generated.rs  # Generated server (DO NOT EDIT)
│           ├── rpc_generated.rs          # Generated trait (DO NOT EDIT)
│           └── rpc.rs                    # Business logic implementation
├── static/
│   ├── index.html                # Single-page application
│   └── app.js                    # Frontend (JSON-RPC client)
├── data/                         # Runtime data (OTOML storage, audio, transforms)
├── Cargo.toml
├── Makefile
└── buildenv.sh

Audio Processing

  • Sample rate: 16kHz (required for Silero VAD)
  • Chunk size: 512 samples (32 ms at 16 kHz) for VAD analysis
  • Silence threshold: 350ms triggers transcription
  • Speech threshold: 0.20 probability
  • Maximum buffer: 30 seconds before forced transcription
  • Compression: OGG Vorbis at quality 0.4 (~10% of WAV size)
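
To show how these parameters could fit together, here is a small segmentation sketch that consumes one VAD probability per 512-sample chunk and decides when to flush the buffer. It is not the logic from audio.rs: the `Segmenter` type, the choice to discard leading silence, and the use of `>=` for the speech threshold are all assumptions.

```rust
const SAMPLE_RATE: usize = 16_000;
const CHUNK_SAMPLES: usize = 512; // 32 ms at 16 kHz
const SILENCE_MS: usize = 350;
const SPEECH_THRESHOLD: f32 = 0.20;
const MAX_BUFFER_SAMPLES: usize = 30 * SAMPLE_RATE; // 30 seconds

/// Consecutive silent chunks needed to cover the silence threshold,
/// rounded up: ceil(350 ms / 32 ms) = 11.
const SILENCE_CHUNKS: usize =
    (SILENCE_MS * SAMPLE_RATE + 1000 * CHUNK_SAMPLES - 1) / (1000 * CHUNK_SAMPLES);

/// Tracks buffered audio and trailing silence for one recording.
#[derive(Default)]
struct Segmenter {
    buffered_samples: usize,
    silent_chunks: usize,
    heard_speech: bool,
}

impl Segmenter {
    /// Feed the VAD speech probability for one 512-sample chunk.
    /// Returns true when the buffered audio should be sent for transcription,
    /// either after a long enough pause or when the buffer reaches 30 seconds.
    fn push_chunk(&mut self, speech_prob: f32) -> bool {
        let is_speech = speech_prob >= SPEECH_THRESHOLD;
        if !is_speech && !self.heard_speech {
            return false; // leading silence: nothing worth buffering yet
        }
        self.buffered_samples += CHUNK_SAMPLES;
        if is_speech {
            self.heard_speech = true;
            self.silent_chunks = 0;
        } else {
            self.silent_chunks += 1;
        }
        let pause = self.silent_chunks >= SILENCE_CHUNKS;
        let full = self.buffered_samples >= MAX_BUFFER_SAMPLES;
        if pause || full {
            *self = Segmenter::default(); // start a fresh segment
            return true;
        }
        false
    }
}
```

Because chunks are 32 ms long, the 350 ms threshold is reached after 11 consecutive silent chunks (352 ms), and uninterrupted speech forces a flush at the 938th chunk, the first to reach 480,000 buffered samples.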

Browser Support

  • Chrome 120+
  • Firefox 120+
  • Safari 17+
  • Edge 120+

Requires microphone permission and WebSocket support.

Embedding & CORS

HeroVoice allows iframe embedding (no X-Frame-Options restrictions), cross-origin API calls, and WebSocket connections from any origin.

License

Apache-2.0