PDF generation from markdown URLs with content-hash caching #33

Closed
opened 2026-02-09 14:53:07 +00:00 by mik-tf (Owner) · 1 comment

Feature Request

Generate PDF documents from markdown content URLs. Cache generated PDFs by content hash so that the same content always maps to the same PDF, and serve each one via a unique, persistent link.

Proposed Flow

  1. Accept a markdown URL (or use the already-fetched page content)
  2. Convert markdown to PDF
  3. Hash the content to produce a deterministic cache key
  4. Store the PDF in hero_embedder KVS (key-value store) by hash
  5. Return a unique, persistent download link: /pdf/{hash}
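A minimal sketch of steps 2 to 5, assuming an in-memory `HashMap` in place of the hero_embedder KVS (whose API is not described here) and a caller-supplied `render` function in place of the markdown-to-PDF converter. `content_hash`, `PdfStore`, and `get_or_create` are hypothetical names. The hash is 64-bit FNV-1a only to keep the example dependency-free; a real cache key would more likely use a cryptographic hash such as SHA-256 (for example via the `sha2` crate).

```rust
use std::collections::HashMap;

/// FNV-1a (64-bit): a tiny, deterministic, dependency-free hash.
/// Stand-in only; a production cache key would more likely use SHA-256 or BLAKE3.
fn content_hash(content: &str) -> String {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for byte in content.as_bytes() {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    format!("{hash:016x}")
}

/// Hypothetical stand-in for the hero_embedder KVS: content hash -> PDF bytes.
struct PdfStore {
    entries: HashMap<String, Vec<u8>>,
    renders: usize, // how many times the converter actually ran
}

impl PdfStore {
    fn new() -> Self {
        Self { entries: HashMap::new(), renders: 0 }
    }

    /// Steps 2-5: hash the content, reuse or render the PDF, store it, return the link.
    fn get_or_create(&mut self, markdown: &str, render: impl Fn(&str) -> Vec<u8>) -> String {
        let key = content_hash(markdown);
        if !self.entries.contains_key(&key) {
            // Cache miss: convert once and keep the result under its content hash.
            let pdf = render(markdown);
            self.renders += 1;
            self.entries.insert(key.clone(), pdf);
        }
        format!("/pdf/{key}")
    }
}
```

Because the key depends only on the content, a second request for identical markdown skips `render`, while any edit produces a different `/pdf/{hash}` link.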

Scope

  • Per-library and per-book PDF generation
  • Cached PDFs are served directly on subsequent requests (no re-generation)
  • Keying by content hash invalidates the cache implicitly: when the source changes, its hash changes, so a stale PDF is never served for the new content
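For per-book (or per-library) PDFs, one option is to derive a single key from every page in reading order, so that editing, adding, or reordering any page yields a new key. This sketch assumes that scheme (the issue does not specify one) and again uses FNV-1a as a dependency-free stand-in for a stronger hash; `Fnv1a` and `book_hash` are hypothetical names. Each page is prefixed with its byte length so that page boundaries are part of the key.

```rust
/// Streaming FNV-1a (64-bit); dependency-free stand-in for a cryptographic hash.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV offset basis
    }

    fn write(&mut self, bytes: &[u8]) {
        for b in bytes {
            self.0 ^= u64::from(*b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn hex(&self) -> String {
        format!("{:016x}", self.0)
    }
}

/// Hypothetical book-level key: hashes pages in reading order, each preceded by its
/// byte length, so ["ab", "c"] and ["a", "bc"] produce different byte streams.
fn book_hash(pages: &[&str]) -> String {
    let mut h = Fnv1a::new();
    for page in pages {
        h.write(&(page.len() as u64).to_le_bytes());
        h.write(page.as_bytes());
    }
    h.hex()
}
```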

Notes

  • Related to closed issue #14 (caching PDF)
  • PDF conversion could use a Rust library or external tool
  • hero_embedder already has a KVS for persistent storage
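If the external-tool route is taken, headless Chrome can print a page to PDF from the command line. The sketch below only builds the invocation, assuming the binary is on `PATH` as `google-chrome` and that the markdown has already been rendered to a local HTML file; `chrome_pdf_command` is a hypothetical helper. Chrome refuses to start as root with its sandbox enabled, which is why `--no-sandbox` is tied to a `running_as_root` flag.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not run) a headless Chrome invocation that prints an HTML file to PDF.
/// Assumptions: the binary is named `google-chrome`, and `html` is already-rendered markdown.
fn chrome_pdf_command(html: &Path, pdf_out: &Path, running_as_root: bool) -> Command {
    let mut cmd = Command::new("google-chrome");
    cmd.arg("--headless").arg("--disable-gpu");
    if running_as_root {
        // Chrome will not launch as root unless its sandbox is disabled.
        cmd.arg("--no-sandbox");
    }
    cmd.arg(format!("--print-to-pdf={}", pdf_out.display()));
    // The page to print goes last, as a file:// URL.
    cmd.arg(format!("file://{}", html.display()));
    cmd
}
```

Running it would then be `chrome_pdf_command(html, out, as_root).status()`, which waits for Chrome to exit and reports its exit status.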
Comment by mik-tf (issue author, repository owner)

Implemented in commit 51d5d3a on development.

What was done

  • Per-page PDF: /book/{name}/page/{page}/pdf generates a PDF of a single book page, using the same Chrome pipeline as book PDFs
  • Persistent cache: the PDF cache moved from /tmp/ to books_dir/.pdf_cache/, so it survives reboots
  • Content-hash permanent URLs: /pdf/{hash} serves any cached PDF by its content hash. Same content always produces the same hash.
  • KVS markdown snapshots: when a PDF is generated, its source markdown is stored in hero_embedder KVS. If the cached PDF file is later cleaned up, it can be regenerated from the stored markdown, so /pdf/{hash} links do not break.
  • UI: the page view now shows both "Book PDF" and "Page PDF" buttons; the book view keeps its existing buttons.
  • Chrome sandbox fix: headless Chrome is launched with sandbox(false) when running as root (as on TFGrid VMs)
  • Deploy: setup.sh now installs Google Chrome for PDF generation
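A sketch of how `/pdf/{hash}` could combine the persistent file cache with the KVS markdown snapshots described above: serve the file if it exists, otherwise regenerate it from the stored markdown and write it back. The `{hash}.pdf` file naming, the `MarkdownSnapshots` map (standing in for hero_embedder KVS), and the `serve_pdf` / `cache_path` helpers are assumptions for illustration, not the code from commit 51d5d3a. The hash is checked to be hex before being used as a file name, since it comes straight from the URL.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the hero_embedder KVS: content hash -> markdown snapshot.
type MarkdownSnapshots = HashMap<String, String>;

/// books_dir/.pdf_cache/{hash}.pdf (the per-file naming is an assumption).
fn cache_path(books_dir: &Path, hash: &str) -> PathBuf {
    books_dir.join(".pdf_cache").join(format!("{hash}.pdf"))
}

/// Serves /pdf/{hash}: read the cached file, or rebuild it from the stored markdown.
fn serve_pdf(
    books_dir: &Path,
    hash: &str,
    snapshots: &MarkdownSnapshots,
    render: impl Fn(&str) -> Vec<u8>,
) -> io::Result<Vec<u8>> {
    // Reject anything that is not a hex digest (blocks "../" path traversal).
    if hash.is_empty() || !hash.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "hash must be hex"));
    }
    let path = cache_path(books_dir, hash);
    match fs::read(&path) {
        Ok(bytes) => Ok(bytes), // cache hit: no regeneration
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            // File was cleaned up: fall back to the markdown snapshot kept in the KVS.
            let markdown = snapshots.get(hash).ok_or_else(|| {
                io::Error::new(io::ErrorKind::NotFound, format!("unknown hash {hash}"))
            })?;
            let pdf = render(markdown.as_str());
            fs::create_dir_all(path.parent().expect("cache path has a parent"))?;
            fs::write(&path, &pdf)?;
            Ok(pdf)
        }
        Err(e) => Err(e),
    }
}
```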

URLs

| Route | Purpose |
|---|---|
| /book/{name}/pdf | Download book PDF |
| /book/{name}/pdf/view | View book PDF in browser |
| /book/{name}/page/{page}/pdf | Download page PDF |
| /book/{name}/page/{page}/pdf/view | View page PDF in browser |
| /pdf/{hash} | Permanent link: serves a PDF by its content hash |
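The routes above can be matched with slice patterns over the path segments. This is a framework-free sketch: `PdfRoute` and `match_pdf_route` are hypothetical names, and modelling the `/view` variants as an `inline` flag (for example, answering with `Content-Disposition: inline` instead of `attachment`) is an assumption about how viewing and downloading differ.

```rust
/// What a request path maps to, following the route table.
#[derive(Debug, PartialEq)]
enum PdfRoute<'a> {
    Book { name: &'a str, inline: bool },
    Page { name: &'a str, page: &'a str, inline: bool },
    ByHash { hash: &'a str },
}

/// Matches a path against the PDF routes; `inline` = view in browser, else download.
fn match_pdf_route(path: &str) -> Option<PdfRoute<'_>> {
    let segments: Vec<&str> = path.trim_start_matches('/').split('/').collect();
    // Slice patterns bind references into `segments`; `*name` copies the &str out.
    match segments.as_slice() {
        ["book", name, "pdf"] => Some(PdfRoute::Book { name: *name, inline: false }),
        ["book", name, "pdf", "view"] => Some(PdfRoute::Book { name: *name, inline: true }),
        ["book", name, "page", page, "pdf"] => {
            Some(PdfRoute::Page { name: *name, page: *page, inline: false })
        }
        ["book", name, "page", page, "pdf", "view"] => {
            Some(PdfRoute::Page { name: *name, page: *page, inline: true })
        }
        ["pdf", hash] if !hash.is_empty() => Some(PdfRoute::ByHash { hash: *hash }),
        _ => None,
    }
}
```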
Reference: lhumina_code/hero_books#33