SQLite pluggable storage backend (scalability path beyond 10K users) #17

Open
opened 2026-03-27 00:54:01 +00:00 by mik-tf · 0 comments
Owner

Context

OSIS currently stores each object as an individual OTOML file on disk with Tantivy full-text search indexing. This works well for small-to-medium deployments (1K-10K entities) and has excellent properties:

  • Human-readable files, zero dependencies, easy debugging
  • Works on edge nodes without a database server

Problem at Scale

At 50K+ users with millions of transactions, file-per-object has limitations:

  • File system overhead (inode limits, directory listing performance)
  • Tantivy index rebuild on startup gets slow
  • Race conditions with multi-process writes (we hit duplicate records with 2 replicas sharing the same GlusterFS volume — see workaround below)

Request

Add SQLite as a pluggable storage backend option alongside the existing file backend. The API stays the same (`_new`, `_get`, `_set`, `_delete`, `_list`, `_find`); only the storage layer changes.
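A minimal sketch of what "pluggable" could look like: a backend contract mirroring a subset of the calls above, with a SQLite implementation behind it. The class shapes and table schema here are hypothetical, not OSIS's actual internals:

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional

class StorageBackend(ABC):
    """Hypothetical backend contract; callers never see which storage layer is used."""
    @abstractmethod
    def _set(self, kind: str, oid: int, data: str) -> None: ...
    @abstractmethod
    def _get(self, kind: str, oid: int) -> Optional[str]: ...
    @abstractmethod
    def _delete(self, kind: str, oid: int) -> None: ...
    @abstractmethod
    def _list(self, kind: str) -> list: ...

class SqliteBackend(StorageBackend):
    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS objects ("
            " kind TEXT NOT NULL, oid INTEGER NOT NULL, data TEXT NOT NULL,"
            " PRIMARY KEY (kind, oid))"
        )

    def _set(self, kind, oid, data):
        with self.db:  # one ACID transaction per write
            self.db.execute(
                "INSERT INTO objects (kind, oid, data) VALUES (?, ?, ?) "
                "ON CONFLICT(kind, oid) DO UPDATE SET data = excluded.data",
                (kind, oid, data),
            )

    def _get(self, kind, oid):
        row = self.db.execute(
            "SELECT data FROM objects WHERE kind = ? AND oid = ?", (kind, oid)
        ).fetchone()
        return row[0] if row else None

    def _delete(self, kind, oid):
        with self.db:
            self.db.execute(
                "DELETE FROM objects WHERE kind = ? AND oid = ?", (kind, oid)
            )

    def _list(self, kind):
        return [r[0] for r in self.db.execute(
            "SELECT oid FROM objects WHERE kind = ? ORDER BY oid", (kind,))]
```

The `ON CONFLICT ... DO UPDATE` upsert (SQLite 3.24+) is what makes concurrent seeding idempotent: two replicas writing the same `(kind, oid)` converge instead of duplicating.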

Benefits:

  • ACID transactions (no race conditions on multi-replica startup)
  • Single file instead of thousands (better for containers, backups)
  • Mature WAL mode handles concurrent readers/writers
  • Still zero-dependency (SQLite is embedded)

Current Workaround

The freezone project (https://forge.ourworld.tf/znzfreezone_code) runs 2 backend replicas sharing a GlusterFS RWX volume. We hit a seed race condition where both replicas created duplicate records because Tantivy writes weren't visible across processes quickly enough. Fixed with a dedup pass after seeding plus a rolling-update deploy procedure.
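The dedup pass amounts to collapsing records by a natural key and keeping one survivor per key. A rough sketch of the idea follows; the `id`/`email` field names and the keep-lowest-id policy are hypothetical, and the actual freezone fix may differ:

```python
def dedup_records(records, key_fields=("email",)):
    """Keep the first record per natural key; return (survivors, duplicates).

    After a seed race, two replicas may each have created a record for the
    same logical entity under different ids; collapse them by natural key,
    keeping the record with the lowest id.
    """
    seen, survivors, dupes = set(), [], []
    for rec in sorted(records, key=lambda r: r["id"]):
        key = tuple(rec[f] for f in key_fields)
        (dupes if key in seen else survivors).append(rec)
        seen.add(key)
    return survivors, dupes
```

With an SQLite backend and a uniqueness constraint on the natural key, this cleanup step (and the careful rolling-update choreography around it) becomes unnecessary.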

See: znzfreezone_code/home#51

Priority

Not urgent. Current deployment (~100 users) works fine with file backend. This is for future scaling.
