Timur Gordon 097360ad12 first commit
2025-10-20 22:24:25 +02:00

OSIRIS MVP — Minimal Semantic Store over HeroDB

0) Purpose

OSIRIS is a Rust-native object layer on top of HeroDB that provides structured storage and retrieval capabilities without any server-side extensions or indexing engines.

It provides:

  • Object CRUD operations
  • Namespace management
  • Simple local field indexing (field:*)
  • Basic keyword scan (substring matching)
  • CLI interface
  • Future: 9P filesystem interface

It does not depend on HeroDB's Tantivy full-text search (FTS), vector search, or relations.


1) Architecture

HeroDB (unmodified)
│
├── KV store + encryption
└── RESP protocol
    ↑
    │
    └── OSIRIS
        ├── store/          object schema + persistence
        ├── index/          field index & keyword scanning
        ├── retrieve/       query planner + filtering
        ├── interfaces/     CLI, 9P (future)
        └── config/         namespaces + settings

2) Data Model

use std::collections::BTreeMap;

use serde::{Deserialize, Serialize};
use time::OffsetDateTime;

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct OsirisObject {
    pub id: String,
    pub ns: String,             // namespace the object belongs to
    pub meta: Metadata,
    pub text: Option<String>,   // optional plain text
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Metadata {
    pub title: Option<String>,
    pub mime: Option<String>,
    pub tags: BTreeMap<String, String>,
    pub created: OffsetDateTime,  // (de)serializable via the `serde` feature of `time`
    pub updated: OffsetDateTime,
    pub size: Option<u64>,
}

3) Keyspace Design

meta:<id>             → serialized OsirisObject (JSON)
field:tag:<key>=<val> → Set of IDs (for tag filtering)
field:mime:<type>     → Set of IDs (for MIME type filtering)
field:title:<title>   → Set of IDs (for title filtering)
scan:index            → Set of all IDs (for full scan)

Example:

field:tag:project=osiris  → {note_1, note_2}
field:mime:text/markdown  → {note_1, note_3}
scan:index                → {note_1, note_2, note_3, ...}
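The key layout above can be centralized in a few helpers so that indexing and deletion always build identical key names. This is a minimal sketch; the function names and the `valid_tag_key` guard are hypothetical and do not appear in the spec. The guard covers an ambiguity in `field:tag:<key>=<val>`: a tag key containing `=` produces the same Redis key as a different key/value split.

```rust
/// Hypothetical helpers that build the section 3 key names in one place.
const SCAN_INDEX: &str = "scan:index";

fn meta_key(id: &str) -> String {
    format!("meta:{}", id)
}

fn tag_key(key: &str, val: &str) -> String {
    format!("field:tag:{}={}", key, val)
}

fn mime_key(mime: &str) -> String {
    format!("field:mime:{}", mime)
}

fn title_key(title: &str) -> String {
    format!("field:title:{}", title)
}

/// Suggested guard (an assumption, not part of the spec): reject tag keys that
/// contain '=' so "field:tag:<key>=<val>" always splits back at the first '='.
fn valid_tag_key(key: &str) -> bool {
    !key.is_empty() && !key.contains('=')
}
```

With this guard, `("a=b", "c")` and `("a", "b=c")` can no longer both map to `field:tag:a=b=c`.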

4) Index Maintenance

Insert / Update

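use redis::Commands;

/// Store `obj` and add its ID to every field index (keyspace from section 3).
/// `meta:<id>` needs no namespace prefix: each namespace uses its own HeroDB database.
fn put_object(con: &mut redis::Connection, obj: &OsirisObject) -> anyhow::Result<()> {
    // On update, first remove the previous version's index entries (see Delete);
    // otherwise a changed tag, MIME type, or title leaves the ID in the old sets.

    // Store object
    let _: () = con.set(format!("meta:{}", obj.id), serde_json::to_string(obj)?)?;

    // Index tags
    for (k, v) in &obj.meta.tags {
        let _: () = con.sadd(format!("field:tag:{}={}", k, v), &obj.id)?;
    }

    // Index MIME type
    if let Some(mime) = &obj.meta.mime {
        let _: () = con.sadd(format!("field:mime:{}", mime), &obj.id)?;
    }

    // Index title
    if let Some(title) = &obj.meta.title {
        let _: () = con.sadd(format!("field:title:{}", title), &obj.id)?;
    }

    // Add to scan index
    let _: () = con.sadd("scan:index", &obj.id)?;
    Ok(())
}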

Delete

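use redis::Commands;

/// Remove `obj` and all of its index entries. `obj` must be the stored version
/// (loaded from meta:<id>), because its fields name the sets to clean up.
fn delete_object(con: &mut redis::Connection, obj: &OsirisObject) -> anyhow::Result<()> {
    // Remove object
    let _: () = con.del(format!("meta:{}", obj.id))?;

    // Deindex tags
    for (k, v) in &obj.meta.tags {
        let _: () = con.srem(format!("field:tag:{}={}", k, v), &obj.id)?;
    }

    // Deindex MIME type
    if let Some(mime) = &obj.meta.mime {
        let _: () = con.srem(format!("field:mime:{}", mime), &obj.id)?;
    }

    // Deindex title
    if let Some(title) = &obj.meta.title {
        let _: () = con.srem(format!("field:title:{}", title), &obj.id)?;
    }

    // Remove from scan index
    let _: () = con.srem("scan:index", &obj.id)?;
    Ok(())
}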
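The Insert / Update code above only adds index entries. If an object is re-put with a changed tag, MIME type, or title, its ID would therefore stay in the old sets. The sketch below is a std-only, in-memory model of one namespace that shows a put which deindexes the previous version first. `Obj`, `MemStore`, and `index_keys` are hypothetical names, and plain `HashMap`/`BTreeSet` values stand in for HeroDB strings and sets.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Simplified object holding only the indexed fields (hypothetical).
#[derive(Clone, Debug, PartialEq)]
struct Obj {
    id: String,
    title: Option<String>,
    mime: Option<String>,
    tags: BTreeMap<String, String>,
}

/// In-memory stand-in for one namespace: `meta` plays meta:<id>, `sets` plays field:* and scan:index.
#[derive(Default)]
struct MemStore {
    meta: HashMap<String, Obj>,
    sets: HashMap<String, BTreeSet<String>>,
}

/// Every set key an object belongs to, mirroring the SADD/SREM calls in section 4.
fn index_keys(obj: &Obj) -> Vec<String> {
    let mut keys: Vec<String> = obj
        .tags
        .iter()
        .map(|(k, v)| format!("field:tag:{}={}", k, v))
        .collect();
    if let Some(mime) = &obj.mime {
        keys.push(format!("field:mime:{}", mime));
    }
    if let Some(title) = &obj.title {
        keys.push(format!("field:title:{}", title));
    }
    keys.push("scan:index".to_string());
    keys
}

impl MemStore {
    /// Insert or update: drop the previous version's entries before indexing the new one.
    fn put(&mut self, obj: Obj) {
        if let Some(old) = self.meta.get(&obj.id).cloned() {
            self.deindex(&old);
        }
        for key in index_keys(&obj) {
            self.sets.entry(key).or_default().insert(obj.id.clone());
        }
        self.meta.insert(obj.id.clone(), obj);
    }

    /// Delete by ID: load the stored object first, since its fields name the sets to clean.
    fn delete(&mut self, id: &str) -> bool {
        match self.meta.remove(id) {
            Some(old) => {
                self.deindex(&old);
                true
            }
            None => false,
        }
    }

    fn deindex(&mut self, obj: &Obj) {
        for key in index_keys(obj) {
            let now_empty = match self.sets.get_mut(&key) {
                Some(set) => {
                    set.remove(&obj.id);
                    set.is_empty()
                }
                None => false,
            };
            // Mirror Redis, which deletes a set key once its last member is removed.
            if now_empty {
                self.sets.remove(&key);
            }
        }
    }

    fn members(&self, key: &str) -> Vec<&str> {
        self.sets
            .get(key)
            .map(|s| s.iter().map(String::as_str).collect())
            .unwrap_or_default()
    }
}
```

Against HeroDB itself, the same idea could read `meta:<id>` before writing and run the Delete steps on the old version.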

5) Retrieval

Query Structure

pub struct RetrievalQuery {
    pub text: Option<String>,                 // keyword substring
    pub ns: String,
    pub filters: Vec<(String, String)>,       // field=value
    pub top_k: usize,
}

Execution Steps

  1. Collect candidate IDs: fetch each filter's field:* set (SMEMBERS) and intersect them; with no filters, start from every ID in scan:index
  2. If a text query is provided, iterate over the candidates:
    • Fetch meta:<id>
    • Test for a case-insensitive substring match in meta.title, text, or tags
    • Compute a simple relevance score
  3. Sort by score (descending) and keep the top_k results

The text scan is O(N) in the number of candidates (one GET per candidate). That is acceptable for the MVP and for small datasets (<10k objects).
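The execution steps can be sketched in memory as follows. `Namespace`, `filter_key`, `candidates`, and `retrieve` are hypothetical names. The mapping of filters to keys (`mime` and `title` go to their own sets, any other field is treated as a tag) is also an assumption, since the spec does not define it. The relevance score here is a stand-in (occurrence count) rather than the section's `compute_text_score`, which would slot into the same place.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory namespace: `sets` mirrors field:* and scan:index, `docs` maps ID -> text.
struct Namespace {
    sets: HashMap<String, BTreeSet<String>>,
    docs: HashMap<String, String>,
}

/// Assumed mapping from a (field, value) filter to its index key.
fn filter_key(field: &str, value: &str) -> String {
    match field {
        "mime" => format!("field:mime:{}", value),
        "title" => format!("field:title:{}", value),
        _ => format!("field:tag:{}={}", field, value), // any other field is a tag
    }
}

/// Step 1: intersect the filter sets; with no filters, every ID in scan:index is a candidate.
fn candidates(ns: &Namespace, filters: &[(String, String)]) -> BTreeSet<String> {
    let keys: Vec<String> = if filters.is_empty() {
        vec!["scan:index".to_string()]
    } else {
        filters.iter().map(|(f, v)| filter_key(f, v)).collect()
    };
    let empty = BTreeSet::new();
    // A missing key acts as an empty set, as SMEMBERS returns for a nonexistent key.
    let mut sets = keys.iter().map(|k| ns.sets.get(k).unwrap_or(&empty));
    let first = sets.next().cloned().unwrap_or_default();
    sets.fold(first, |acc, s| acc.intersection(s).cloned().collect())
}

/// Steps 2-3: keep candidates whose text contains the query (case-insensitive),
/// score them, sort by score descending (ties broken by ID), and keep top_k.
fn retrieve(ns: &Namespace, filters: &[(String, String)], text: Option<&str>, top_k: usize) -> Vec<(String, f32)> {
    let query = text.map(str::to_lowercase);
    let mut hits: Vec<(String, f32)> = candidates(ns, filters)
        .into_iter()
        .filter_map(|id| {
            let score = match &query {
                None => 1.0, // filter-only query: all candidates score the same
                Some(q) => {
                    // Stand-in score: number of occurrences in the text.
                    let n = ns.docs.get(&id)?.to_lowercase().matches(q.as_str()).count();
                    if n == 0 {
                        return None;
                    }
                    n as f32
                }
            };
            Some((id, score))
        })
        .collect();
    hits.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    hits.truncate(top_k);
    hits
}
```

A filter that names a nonexistent set empties the candidate list, so the text scan never runs.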

Scoring Algorithm

fn compute_text_score(obj: &OsirisObject, query: &str) -> f32 {
    // Case-insensitive matching: lowercase the query once, like every field below.
    let query = query.to_lowercase();
    if query.is_empty() {
        return 0.0; // an empty query would otherwise "match" everything
    }
    let mut score: f32 = 0.0;

    // Title match
    if let Some(title) = &obj.meta.title {
        if title.to_lowercase().contains(&query) {
            score += 0.5;
        }
    }

    // Text content match
    if let Some(text) = &obj.text {
        let count = text.to_lowercase().matches(query.as_str()).count();
        if count > 0 {
            score += 0.5;
            // Bonus of 0.1 for each occurrence beyond the first
            score += (count - 1) as f32 * 0.1;
        }
    }

    // Tag match (key or value)
    for (key, value) in &obj.meta.tags {
        if key.to_lowercase().contains(&query) || value.to_lowercase().contains(&query) {
            score += 0.2;
        }
    }

    // Cap the score at 1.0
    score.min(1.0)
}
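The weights in compute_text_score combine into a single capped sum. Here t and x are 1 when the title or text matches (else 0), c is the number of occurrences in the text, and g is the number of tags whose key or value contains the query:

```latex
\text{score} = \min\!\Big(1,\; 0.5\,t \;+\; x\,\big(0.5 + 0.1\,(c-1)\big) \;+\; 0.2\,g\Big)
```

Worked example: a title match, three occurrences in the text, and one matching tag give 0.5 + (0.5 + 0.2) + 0.2 = 1.4, which is capped to 1.0. A text-only match reaches the cap at c = 6, since 0.5 + 0.1 × 5 = 1.0. Any object that matches in both title and text therefore already scores 1.0, so such objects tie and cannot be ranked against each other.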

6) CLI

Commands

# Initialize and create namespace
osiris init --herodb redis://localhost:6379
osiris ns create notes

# Add and read objects
osiris put notes/my-note.md ./my-note.md --tags topic=rust,project=osiris
osiris get notes/my-note.md
osiris get notes/my-note.md --raw --output /tmp/note.md
osiris del notes/my-note.md

# Search
osiris find --ns notes --filter topic=rust
osiris find "retrieval" --ns notes
osiris find "rust" --ns notes --filter project=osiris --topk 20

# Namespace management
osiris ns list
osiris ns delete notes

# Statistics
osiris stats
osiris stats --ns notes

Examples

# Store a note from stdin
echo "This is a note about Rust programming" | \
  osiris put notes/rust-intro - \
  --title "Rust Introduction" \
  --tags topic=rust,level=beginner \
  --mime text/plain

# Search for notes about Rust
osiris find "rust" --ns notes

# Filter by tag
osiris find --ns notes --filter topic=rust

# Get note as JSON
osiris get notes/rust-intro

# Get raw content
osiris get notes/rust-intro --raw
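The commands take an object path such as notes/my-note.md and a --tags value such as topic=rust,project=osiris. Below is a minimal std-only sketch of parsing both. `split_path` and `parse_tags` are hypothetical helpers. Treating the first path segment as the namespace is an assumption based on the examples, because name-to-ID resolution is still listed as pending.

```rust
use std::collections::BTreeMap;

/// Split "notes/my-note.md" into ("notes", "my-note.md"); None if either part is missing.
fn split_path(path: &str) -> Option<(&str, &str)> {
    let (ns, name) = path.split_once('/')?;
    if ns.is_empty() || name.is_empty() {
        None
    } else {
        Some((ns, name))
    }
}

/// Parse "topic=rust,project=osiris" into a tag map (splitting each pair at the first '=').
fn parse_tags(s: &str) -> Result<BTreeMap<String, String>, String> {
    let mut tags = BTreeMap::new();
    for pair in s.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let (k, v) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected key=value, got {:?}", pair))?;
        if k.is_empty() {
            return Err(format!("empty tag key in {:?}", pair));
        }
        tags.insert(k.to_string(), v.to_string());
    }
    Ok(tags)
}
```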

7) Configuration

File Location

~/.config/osiris/config.toml

Example

[herodb]
url = "redis://localhost:6379"

[namespaces.notes]
db_id = 1

[namespaces.calendar]
db_id = 2

Structure

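use std::collections::HashMap;

use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
pub struct Config {
    pub herodb: HeroDbConfig,
    pub namespaces: HashMap<String, NamespaceConfig>,  // one [namespaces.<name>] table each
}

#[derive(Debug, Serialize, Deserialize)]
pub struct HeroDbConfig {
    pub url: String,
}

#[derive(Debug, Serialize, Deserialize)]
pub struct NamespaceConfig {
    pub db_id: u16,
}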

8) Database Allocation

DB 0  → HeroDB Admin (managed by HeroDB)
DB 1  → osiris:notes (namespace "notes")
DB 2  → osiris:calendar (namespace "calendar")
DB 3+ → Additional namespaces...

Each namespace gets its own isolated HeroDB database.
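One way to turn the [herodb] url and a namespace's db_id into a connection target is the redis:// URL form with a trailing database number (redis://host:port/<db>), which the redis crate accepts. This std-only sketch assumes that approach. `namespace_url` and `next_db_id` are hypothetical helpers, and both enforce the allocation rule above that DB 0 belongs to HeroDB.

```rust
use std::collections::HashMap;

/// Build a per-namespace URL such as "redis://localhost:6379/1" (assumes the base URL has no DB path).
fn namespace_url(base_url: &str, namespaces: &HashMap<String, u16>, ns: &str) -> Result<String, String> {
    let db_id = *namespaces
        .get(ns)
        .ok_or_else(|| format!("unknown namespace {:?}", ns))?;
    if db_id == 0 {
        return Err("DB 0 is reserved for HeroDB admin".to_string());
    }
    Ok(format!("{}/{}", base_url.trim_end_matches('/'), db_id))
}

/// Smallest unused DB id starting at 1, e.g. for a new `osiris ns create`.
fn next_db_id(namespaces: &HashMap<String, u16>) -> Option<u16> {
    (1..=u16::MAX).find(|id| !namespaces.values().any(|used| used == id))
}
```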


9) Dependencies

[dependencies]
anyhow = "1.0"
redis = { version = "0.24", features = ["aio", "tokio-comp"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
time = { version = "0.3", features = ["serde", "formatting", "parsing", "macros"] }
tokio = { version = "1.23", features = ["full"] }
clap = { version = "4.5", features = ["derive"] }
toml = "0.8"
uuid = { version = "1.6", features = ["v4", "serde"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

10) Future Enhancements

Feature             When Added          Lives In
Dedup / blobs       HeroDB extension    HeroDB
Vector search       HeroDB extension    HeroDB
Full-text search    HeroDB (Tantivy)    HeroDB
Relations / graph   OSIRIS later        OSIRIS
9P filesystem       OSIRIS later        OSIRIS

This MVP maintains clean interface boundaries:

  • HeroDB remains a plain KV substrate
  • OSIRIS builds higher-order meaning on top

11) Implementation Status

Completed

  • Project structure and Cargo.toml
  • Core data models (OsirisObject, Metadata)
  • HeroDB client wrapper (RESP protocol)
  • Field indexing (tags, MIME, title)
  • Search engine (substring matching + scoring)
  • Configuration management
  • CLI interface (init, ns, put, get, del, find, stats)
  • Error handling
  • Documentation (README, specs)

🚧 Pending

  • 9P filesystem interface
  • Integration tests
  • Performance benchmarks
  • Name resolution (namespace/name → ID mapping)

12) Quick Start

Prerequisites

Start HeroDB:

cd /path/to/herodb
cargo run --release -- --dir ./data --admin-secret mysecret --port 6379

Build OSIRIS

cd /path/to/osiris
cargo build --release

Initialize

# Create configuration
./target/release/osiris init --herodb redis://localhost:6379

# Create a namespace
./target/release/osiris ns create notes

Usage

# Add a note
echo "OSIRIS is a minimal object store" | \
  ./target/release/osiris put notes/intro - \
  --title "Introduction" \
  --tags topic=osiris,type=doc

# Search
./target/release/osiris find "object store" --ns notes

# Get the note
./target/release/osiris get notes/intro

# Show stats
./target/release/osiris stats --ns notes

13) Testing

Unit Tests

cargo test

Integration Tests (requires HeroDB)

# Start HeroDB
cd /path/to/herodb
cargo run -- --dir /tmp/herodb-test --admin-secret test --port 6379

# Run tests
cd /path/to/osiris
cargo test -- --ignored
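Running integration tests with `cargo test -- --ignored` implies that tests needing HeroDB are marked `#[ignore]`, so a plain `cargo test` skips them. Here is a minimal sketch of that split. `encode_command` is a hypothetical helper that writes a command as a RESP array of bulk strings, and its unit test needs no server. The ignored test assumes HeroDB listens on 127.0.0.1:6379 and answers PING with the standard `+PONG` reply.

```rust
/// Encode a command as a RESP array of bulk strings (lengths are in bytes).
fn encode_command(args: &[&str]) -> Vec<u8> {
    let mut out = format!("*{}\r\n", args.len()).into_bytes();
    for arg in args {
        out.extend_from_slice(format!("${}\r\n", arg.len()).as_bytes());
        out.extend_from_slice(arg.as_bytes());
        out.extend_from_slice(b"\r\n");
    }
    out
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::io::{Read, Write};
    use std::net::TcpStream;

    // Unit test: runs with plain `cargo test`.
    #[test]
    fn encodes_ping() {
        assert_eq!(encode_command(&["PING"]), b"*1\r\n$4\r\nPING\r\n".to_vec());
    }

    // Integration test: only runs with `cargo test -- --ignored`.
    #[test]
    #[ignore = "requires a running HeroDB on localhost:6379"]
    fn herodb_answers_ping() {
        let mut stream = TcpStream::connect("127.0.0.1:6379").expect("HeroDB not reachable");
        stream.write_all(&encode_command(&["PING"])).unwrap();
        let mut buf = [0u8; 64];
        // Assumes the short reply arrives in a single read.
        let n = stream.read(&mut buf).unwrap();
        assert_eq!(&buf[..n], b"+PONG\r\n");
    }
}
```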

14) Performance Characteristics

Write Performance

  • Object storage: O(1) - single SET operation
  • Indexing: O(T) where T = number of tags/fields
  • Total: O(T) per object

Read Performance

  • Get by ID: O(1) - single GET operation
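  • Filter by tags: one SMEMBERS per filter (F = number of filters) plus a client-side intersection; cost grows with the total size of the fetched sets
  • Text search: O(N) where N = number of candidates (one GET and substring scan per candidate)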

Storage Overhead

  • Object: ~1KB per object (JSON serialized; grows with the size of the text)
  • Indexes: ~50 bytes per tag/field entry
  • Total: ~1.5KB per object with 10 tags

Scalability

  • Optimal: <10,000 objects per namespace
  • Acceptable: <100,000 objects per namespace
  • Beyond: Consider migrating to Tantivy FTS
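Applying the per-object storage estimates above to the scalability tiers gives a rough size per namespace (decimal units, 10 tags per object). It also gives the volume an unfiltered text query must fetch, since that query reads meta:<id> for every object:

```latex
\begin{aligned}
\text{per object} &\approx 1\,000\ \text{B} + 10 \times 50\ \text{B} = 1\,500\ \text{B} \approx 1.5\ \text{KB}\\
10\,000\ \text{objects} &\approx 10\,000 \times 1.5\ \text{KB} = 15\ \text{MB}\\
100\,000\ \text{objects} &\approx 100\,000 \times 1.5\ \text{KB} = 150\ \text{MB}\\
\text{unfiltered text query at } 100\,000 &\approx 100\,000 \times 1\ \text{KB} = 100\ \text{MB fetched}
\end{aligned}
```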

15) Design Decisions

Why No Tantivy in MVP?

  • Simplicity: Avoid HeroDB server-side dependencies
  • Portability: Works with any Redis-compatible backend
  • Flexibility: Easy to migrate to Tantivy later

Why Substring Matching?

  • Good enough: For small datasets (<10k objects)
  • Simple: No tokenization, stemming, or complex scoring
  • Fast enough: A linear O(N) scan is acceptable at MVP scale

Why Separate Databases per Namespace?

  • Isolation: Clear separation of concerns
  • Performance: Smaller keyspaces mean faster scans
  • Security: Can apply different encryption keys per namespace

16) Migration Path

When ready to scale beyond the MVP:

  1. Add Tantivy FTS (HeroDB extension)

    • Create FT.* commands in HeroDB
    • Update OSIRIS to use FT.SEARCH instead of substring scan
    • Keep field indexes for filtering
  2. Add Vector Search (HeroDB extension)

    • Store embeddings in HeroDB
    • Implement ANN search (HNSW/IVF)
    • Add hybrid retrieval (BM25 + vector)
  3. Add Relations (OSIRIS feature)

    • Store relation graphs in HeroDB
    • Implement graph traversal
    • Add relation-based ranking
  4. Add Deduplication (HeroDB extension)

    • Content-addressable storage (BLAKE3)
    • Reference counting
    • Garbage collection

Summary

OSIRIS MVP is a minimal object store that:

  • Works with unmodified HeroDB
  • Provides structured storage with metadata
  • Supports field-based filtering
  • Includes basic text search
  • Exposes a clean CLI interface
  • Maintains clear upgrade paths

Well suited for:

  • Personal knowledge management
  • Small-scale document storage
  • Prototyping semantic applications
  • Learning Rust + Redis patterns

Next steps:

  • Build and test the MVP
  • Gather usage feedback
  • Plan Tantivy/vector integration
  • Design 9P filesystem interface