SQLite pluggable storage backend (scalability path beyond 10K users) #17

Open
opened 2026-03-27 00:54:01 +00:00 by mik-tf · 0 comments
Owner

Context

OSIS currently stores each object as an individual OTOML file on disk with Tantivy full-text search indexing. This works well for small-to-medium deployments (1K-10K entities) and has excellent properties:

  • Human-readable files, zero dependencies, easy debugging
  • Works on edge nodes without a database server

Problem at Scale

At 50K+ users with millions of transactions, file-per-object has limitations:

  • File system overhead (inode limits, directory listing performance)
  • Tantivy index rebuild on startup gets slow
  • Race conditions with multi-process writes (we hit duplicate records with 2 replicas sharing the same GlusterFS volume — see workaround below)

Request

Add SQLite as a pluggable storage backend option alongside the existing file backend. The API stays the same (`_new`, `_get`, `_set`, `_delete`, `_list`, `_find`); only the storage layer changes.
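A minimal sketch of what "pluggable" could look like: a backend contract mirroring a subset of the calls above, with a SQLite implementation behind it. The class shapes and table schema here are hypothetical, not OSIS's actual internals:

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional

class StorageBackend(ABC):
    """Hypothetical backend contract; callers never see which storage layer is used."""
    @abstractmethod
    def _set(self, kind: str, oid: int, data: str) -> None: ...
    @abstractmethod
    def _get(self, kind: str, oid: int) -> Optional[str]: ...
    @abstractmethod
    def _delete(self, kind: str, oid: int) -> None: ...
    @abstractmethod
    def _list(self, kind: str) -> list: ...

class SqliteBackend(StorageBackend):
    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS objects ("
            " kind TEXT NOT NULL, oid INTEGER NOT NULL, data TEXT NOT NULL,"
            " PRIMARY KEY (kind, oid))"
        )

    def _set(self, kind, oid, data):
        with self.db:  # one ACID transaction per write
            self.db.execute(
                "INSERT INTO objects (kind, oid, data) VALUES (?, ?, ?) "
                "ON CONFLICT(kind, oid) DO UPDATE SET data = excluded.data",
                (kind, oid, data),
            )

    def _get(self, kind, oid):
        row = self.db.execute(
            "SELECT data FROM objects WHERE kind = ? AND oid = ?", (kind, oid)
        ).fetchone()
        return row[0] if row else None

    def _delete(self, kind, oid):
        with self.db:
            self.db.execute(
                "DELETE FROM objects WHERE kind = ? AND oid = ?", (kind, oid)
            )

    def _list(self, kind):
        return [r[0] for r in self.db.execute(
            "SELECT oid FROM objects WHERE kind = ? ORDER BY oid", (kind,))]
```

The `ON CONFLICT ... DO UPDATE` upsert (SQLite 3.24+) is what makes concurrent seeding idempotent: two replicas writing the same `(kind, oid)` converge instead of duplicating.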

Benefits:

  • ACID transactions (no race conditions on multi-replica startup)
  • Single file instead of thousands (better for containers, backups)
  • Mature WAL mode handles concurrent readers/writers
  • Still zero-dependency (SQLite is embedded)

Current Workaround

The freezone project (https://forge.ourworld.tf/znzfreezone_code) runs 2 backend replicas sharing a GlusterFS RWX volume. We hit a seed race condition where both replicas created duplicate records because Tantivy writes weren't visible across processes quickly enough. Fixed with a dedup pass after seeding plus a rolling-update deploy procedure.
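The dedup pass amounts to collapsing records by a natural key and keeping one survivor per key. A rough sketch of the idea follows; the `id`/`email` field names and the keep-lowest-id policy are hypothetical, and the actual freezone fix may differ:

```python
def dedup_records(records, key_fields=("email",)):
    """Keep the first record per natural key; return (survivors, duplicates).

    After a seed race, two replicas may each have created a record for the
    same logical entity under different ids; collapse them by natural key,
    keeping the record with the lowest id.
    """
    seen, survivors, dupes = set(), [], []
    for rec in sorted(records, key=lambda r: r["id"]):
        key = tuple(rec[f] for f in key_fields)
        (dupes if key in seen else survivors).append(rec)
        seen.add(key)
    return survivors, dupes
```

With an SQLite backend and a uniqueness constraint on the natural key, this cleanup step (and the careful rolling-update choreography around it) becomes unnecessary.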

See: znzfreezone_code/home#51

Priority

Not urgent. Current deployment (~100 users) works fine with file backend. This is for future scaling.
