Voice: hybrid streaming TTS with trackbar player #89

Closed
opened 2026-03-25 04:06:44 +00:00 by mik-tf · 2 comments
Owner

Context

v0.7.1-dev has TTS working (Kokoro local + Groq fallback), pause/play/stop controls, and progress tracking infrastructure in voice.rs. Currently TTS generates audio AFTER the full AI response completes — user waits for entire response before hearing anything.

Goal

Hybrid streaming TTS: hear audio sentence-by-sentence while AI is still responding, then full trackbar with seek/replay after response completes.

Implementation

Phase 1: Sentence-level SSE audio streaming

  • hero_agent: split response text into sentences as tokens stream in
  • hero_agent: send TTS for each sentence immediately via SSE event: audio (multiple events per response)
  • hero_archipelagos: queue and play audio chunks sequentially
  • hero_archipelagos: keep all chunks in a combined AudioBuffer

Phase 2: Full trackbar after response

  • Once all chunks received, stitch into single AudioBuffer
  • Render progress bar: [⏸] [━━━━●━━━━] 0:12/0:35
  • Click-to-seek on trackbar (create new BufferSource at offset)
  • Replay button (seek to 0:00)
  • Time display (elapsed / total)

Phase 3: Polish

  • Smooth progress animation (requestAnimationFrame)
  • Mini player bar below toolbar (collapses when not playing)
  • Persist play speed from Settings (0.5x-2.0x)
  • Keyboard shortcuts (space = pause, left/right = seek)

Technical notes

  • AudioBuffer.duration gives total seconds
  • AudioContext.currentTime - startTime gives elapsed
  • source.start(0, offsetSeconds) for seeking
  • AudioContext.suspend/resume for pause/play
  • Sentence splitting: split on . ! ? \n with min 20 chars
  • #78 Voice AI pipeline
  • #88 SPA/WASM migration
## Context v0.7.1-dev has TTS working (Kokoro local + Groq fallback), pause/play/stop controls, and progress tracking infrastructure in voice.rs. Currently TTS generates audio AFTER the full AI response completes — user waits for entire response before hearing anything. ## Goal Hybrid streaming TTS: hear audio sentence-by-sentence while AI is still responding, then full trackbar with seek/replay after response completes. ## Implementation ### Phase 1: Sentence-level SSE audio streaming - [ ] hero_agent: split response text into sentences as tokens stream in - [ ] hero_agent: send TTS for each sentence immediately via SSE `event: audio` (multiple events per response) - [ ] hero_archipelagos: queue and play audio chunks sequentially - [ ] hero_archipelagos: keep all chunks in a combined AudioBuffer ### Phase 2: Full trackbar after response - [ ] Once all chunks received, stitch into single AudioBuffer - [ ] Render progress bar: `[⏸] [━━━━●━━━━] 0:12/0:35` - [ ] Click-to-seek on trackbar (create new BufferSource at offset) - [ ] Replay button (seek to 0:00) - [ ] Time display (elapsed / total) ### Phase 3: Polish - [ ] Smooth progress animation (requestAnimationFrame) - [ ] Mini player bar below toolbar (collapses when not playing) - [ ] Persist play speed from Settings (0.5x-2.0x) - [ ] Keyboard shortcuts (space = pause, left/right = seek) ## Technical notes - AudioBuffer.duration gives total seconds - AudioContext.currentTime - startTime gives elapsed - source.start(0, offsetSeconds) for seeking - AudioContext.suspend/resume for pause/play - Sentence splitting: split on `. ` `! ` `? ` `\n` with min 20 chars ## Related - #78 Voice AI pipeline - #88 SPA/WASM migration
Author
Owner

Status Assessment

Phase 1: Sentence-level SSE audio streaming — DONE

  • hero_agent: split response into sentences (min 20 chars)
  • hero_agent: send TTS per sentence via SSE event: audio with chunk/total metadata
  • hero_archipelagos: queue and play audio chunks sequentially
  • hero_archipelagos: track duration across all chunks

Phase 2: Full trackbar after response — PARTIAL

  • Progress bar with percentage fill
  • Time display (elapsed / total)
  • Play/Pause/Stop controls
  • Click-to-seek on trackbar
  • Replay button (seek to 0:00)

Phase 3: Polish — TODO

  • Smooth progress animation (requestAnimationFrame instead of CSS transition)
  • Mini player bar below toolbar
  • Persist play speed from Settings (0.5x-2.0x)
  • Keyboard shortcuts (space=pause, left/right=seek)

Implementing remaining features now.

Signed-off-by: mik-tf

## Status Assessment ### Phase 1: Sentence-level SSE audio streaming — DONE - [x] hero_agent: split response into sentences (min 20 chars) - [x] hero_agent: send TTS per sentence via SSE `event: audio` with chunk/total metadata - [x] hero_archipelagos: queue and play audio chunks sequentially - [x] hero_archipelagos: track duration across all chunks ### Phase 2: Full trackbar after response — PARTIAL - [x] Progress bar with percentage fill - [x] Time display (elapsed / total) - [x] Play/Pause/Stop controls - [ ] Click-to-seek on trackbar - [ ] Replay button (seek to 0:00) ### Phase 3: Polish — TODO - [ ] Smooth progress animation (requestAnimationFrame instead of CSS transition) - [ ] Mini player bar below toolbar - [ ] Persist play speed from Settings (0.5x-2.0x) - [ ] Keyboard shortcuts (space=pause, left/right=seek) Implementing remaining features now. Signed-off-by: mik-tf
Author
Owner

Complete

All 3 phases implemented:

Phase 1 (pre-existing)

  • Sentence-level SSE audio streaming
  • Queue-based sequential chunk playback

Phase 2

  • Progress bar with time display
  • Play/Pause/Stop controls
  • Click-to-seek on progress bar
  • Replay button

Phase 3

  • Playback speed control (0.5x/1.0x/1.5x/2.0x) with localStorage persistence
  • Keyboard shortcuts (Space=pause, Left/Right=seek ±5s)
  • Audio chunks stored for seek/replay via stitched AudioBuffer

Deployed as v0.7.6-dev.

Signed-off-by: mik-tf

## Complete All 3 phases implemented: ### Phase 1 (pre-existing) - [x] Sentence-level SSE audio streaming - [x] Queue-based sequential chunk playback ### Phase 2 - [x] Progress bar with time display - [x] Play/Pause/Stop controls - [x] Click-to-seek on progress bar - [x] Replay button ### Phase 3 - [x] Playback speed control (0.5x/1.0x/1.5x/2.0x) with localStorage persistence - [x] Keyboard shortcuts (Space=pause, Left/Right=seek ±5s) - [x] Audio chunks stored for seek/replay via stitched AudioBuffer Deployed as v0.7.6-dev. Signed-off-by: mik-tf
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/home#89
No description provided.