# OSIRIS MVP: Minimal Semantic Store over HeroDB

## 0) Purpose

OSIRIS is a Rust-native object layer on top of HeroDB that provides structured storage and retrieval capabilities without any server-side extensions or indexing engines.

It provides:

- Object CRUD operations
- Namespace management
- Simple local field indexing (`field:*`)
- Basic keyword scan (substring matching)
- CLI interface
- Future: 9P filesystem interface

It does **not** depend on HeroDB's Tantivy FTS, vectors, or relations.

---

## 1) Architecture

```
HeroDB (unmodified)
│
├── KV store + encryption
└── RESP protocol ↑
    │
    └── OSIRIS
        ├── store/       – object schema + persistence
        ├── index/       – field index & keyword scanning
        ├── retrieve/    – query planner + filtering
        ├── interfaces/  – CLI, 9P (future)
        └── config/      – namespaces + settings
```

---

## 2) Data Model

```rust
use std::collections::BTreeMap;

use serde::{Deserialize, Serialize};
use time::OffsetDateTime;

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct OsirisObject {
    pub id: String,
    pub ns: String,
    pub meta: Metadata,
    pub text: Option<String>, // optional plain text
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Metadata {
    pub title: Option<String>,
    pub mime: Option<String>,
    pub tags: BTreeMap<String, String>,
    pub created: OffsetDateTime,
    pub updated: OffsetDateTime,
    pub size: Option<u64>, // integer width assumed; the type parameter was lost in the source
}
```

---

## 3) Keyspace Design

```
meta:<id>                 → serialized OsirisObject (JSON)
field:tag:<key>=<value>   → Set of IDs (for tag filtering)
field:mime:<type>         → Set of IDs (for MIME type filtering)
field:title:<title>       → Set of IDs (for title filtering)
scan:index                → Set of all IDs (for full scan)
```

**Example:**

```
field:tag:project=osiris   → {note_1, note_2}
field:mime:text/markdown   → {note_1, note_3}
scan:index                 → {note_1, note_2, note_3, ...}
```

---

## 4) Index Maintenance

### Insert / Update

```rust
// Note: on update, deindex the previous version's fields first,
// otherwise IDs linger in the sets of tags that were changed or removed.

// Store object
redis.set(format!("meta:{}", obj.id), serde_json::to_string(&obj)?)?;

// Index tags
for (k, v) in &obj.meta.tags {
    redis.sadd(format!("field:tag:{}={}", k, v), &obj.id)?;
}

// Index MIME type
if let Some(mime) = &obj.meta.mime {
    redis.sadd(format!("field:mime:{}", mime), &obj.id)?;
}

// Index title
if let Some(title) = &obj.meta.title {
    redis.sadd(format!("field:title:{}", title), &obj.id)?;
}

// Add to scan index
redis.sadd("scan:index", &obj.id)?;
```

### Delete

```rust
// Remove object
redis.del(format!("meta:{}", obj.id))?;

// Deindex tags
for (k, v) in &obj.meta.tags {
    redis.srem(format!("field:tag:{}={}", k, v), &obj.id)?;
}

// Deindex MIME type
if let Some(mime) = &obj.meta.mime {
    redis.srem(format!("field:mime:{}", mime), &obj.id)?;
}

// Deindex title
if let Some(title) = &obj.meta.title {
    redis.srem(format!("field:title:{}", title), &obj.id)?;
}

// Remove from scan index
redis.srem("scan:index", &obj.id)?;
```

---

## 5) Retrieval

### Query Structure

```rust
pub struct RetrievalQuery {
    pub text: Option<String>,           // keyword substring
    pub ns: String,
    pub filters: Vec<(String, String)>, // field=value
    pub top_k: usize,
}
```

### Execution Steps

1. **Collect candidate IDs** from `field:*` filters (SMEMBERS + intersection)
2. **If text query is provided**, iterate over candidates:
   - Fetch `meta:<id>`
   - Test substring match on `meta.title`, `text`, or `tags`
   - Compute simple relevance score
3. **Sort** by score (descending) and **limit** to `top_k`

The text scan is O(N) in the number of candidates, which is acceptable for an MVP or small datasets (<10k objects).

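The three execution steps above can be sketched end to end with an in-memory stand-in for a namespace database. This is only an illustration under stated assumptions: `MemStore`, `put`, `members`, and `retrieve` are hypothetical names that do not come from the OSIRIS codebase, the maps replace the SMEMBERS and GET calls a real client would issue against HeroDB, only tag filters are handled, and the score covers only the text-body part of the scoring function. Giving every candidate a score of 1.0 when no text query is supplied is also an assumption.

```rust
use std::collections::{BTreeSet, HashMap};

/// In-memory stand-in for one namespace database (hypothetical).
#[derive(Default)]
pub struct MemStore {
    /// Index key (e.g. "field:tag:topic=rust") -> set of object IDs.
    pub sets: HashMap<String, BTreeSet<String>>,
    /// Object ID -> body text (stands in for the `meta:<id>` payload).
    pub docs: HashMap<String, String>,
}

impl MemStore {
    /// Mirrors the insert path: store the text, index tags, add to scan:index.
    pub fn put(&mut self, id: &str, text: &str, tags: &[(&str, &str)]) {
        self.docs.insert(id.to_string(), text.to_string());
        for (k, v) in tags {
            let key = format!("field:tag:{}={}", k, v);
            self.sets.entry(key).or_default().insert(id.to_string());
        }
        self.sets
            .entry("scan:index".to_string())
            .or_default()
            .insert(id.to_string());
    }

    /// SMEMBERS equivalent: a missing key behaves like an empty set.
    fn members(&self, key: &str) -> BTreeSet<String> {
        self.sets.get(key).cloned().unwrap_or_default()
    }
}

/// Steps 1-3: intersect filter sets, scan candidates, sort, truncate.
pub fn retrieve(
    store: &MemStore,
    text: Option<&str>,
    filters: &[(&str, &str)],
    top_k: usize,
) -> Vec<(String, f32)> {
    // Step 1: start from the full scan set and narrow it with each filter.
    let mut candidates = store.members("scan:index");
    for (k, v) in filters {
        let set = store.members(&format!("field:tag:{}={}", k, v));
        candidates = candidates.intersection(&set).cloned().collect();
    }

    // Step 2: case-insensitive substring scan over the candidates.
    // An empty query is treated as "no text query" (assumption).
    let needle = text.map(str::to_lowercase).filter(|q| !q.is_empty());
    let mut scored: Vec<(String, f32)> = Vec::new();
    for id in candidates {
        let score = match &needle {
            // No text query: keep every candidate (assumed score of 1.0).
            None => 1.0_f32,
            Some(q) => {
                let body = store.docs.get(&id).map(|t| t.to_lowercase());
                let count = body.map_or(0, |b| b.matches(q.as_str()).count());
                if count == 0 {
                    continue; // no match, drop the candidate
                }
                // 0.5 for the first hit, 0.1 per extra hit, capped at 1.0.
                let raw = 0.5_f32 + 0.1 * (count as f32 - 1.0);
                raw.min(1.0)
            }
        };
        scored.push((id, score));
    }

    // Step 3: sort by score descending (ID as tie-breaker), keep top_k.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    scored.truncate(top_k);
    scored
}
```

Calling `retrieve(&store, Some("rust"), &[("topic", "rust")], 10)` on a populated `MemStore` intersects `scan:index` with `field:tag:topic=rust` before any object text is read, so the linear scan only touches objects that already passed the filters.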
### Scoring Algorithm

```rust
fn compute_text_score(obj: &OsirisObject, query: &str) -> f32 {
    // Lowercase the query once so matching is case-insensitive on both sides
    let query = query.to_lowercase();
    let mut score = 0.0_f32;

    // Title match
    if let Some(title) = &obj.meta.title {
        if title.to_lowercase().contains(query.as_str()) {
            score += 0.5;
        }
    }

    // Text content match
    if let Some(text) = &obj.text {
        // Lowercase the body once and count occurrences in a single pass
        let text = text.to_lowercase();
        let count = text.matches(query.as_str()).count();
        if count > 0 {
            score += 0.5;
            // Bonus for multiple occurrences
            score += (count as f32 - 1.0) * 0.1;
        }
    }

    // Tag match
    for (key, value) in &obj.meta.tags {
        if key.to_lowercase().contains(query.as_str())
            || value.to_lowercase().contains(query.as_str())
        {
            score += 0.2;
        }
    }

    // Cap the combined score at 1.0
    score.min(1.0)
}
```

---

## 6) CLI

### Commands

```bash
# Initialize and create namespace
osiris init --herodb redis://localhost:6379
osiris ns create notes

# Add and read objects
osiris put notes/my-note.md ./my-note.md --tags topic=rust,project=osiris
osiris get notes/my-note.md
osiris get notes/my-note.md --raw --output /tmp/note.md
osiris del notes/my-note.md

# Search
osiris find --ns notes --filter topic=rust
osiris find "retrieval" --ns notes
osiris find "rust" --ns notes --filter project=osiris --topk 20

# Namespace management
osiris ns list
osiris ns delete notes

# Statistics
osiris stats
osiris stats --ns notes
```

### Examples

```bash
# Store a note from stdin
echo "This is a note about Rust programming" | \
  osiris put notes/rust-intro - \
  --title "Rust Introduction" \
  --tags topic=rust,level=beginner \
  --mime text/plain

# Search for notes about Rust
osiris find "rust" --ns notes

# Filter by tag
osiris find --ns notes --filter topic=rust

# Get note as JSON
osiris get notes/rust-intro

# Get raw content
osiris get notes/rust-intro --raw
```

---

## 7) Configuration

### File Location

`~/.config/osiris/config.toml`

### Example

```toml
[herodb]
url = "redis://localhost:6379"

[namespaces.notes]
db_id = 1

[namespaces.calendar]
db_id = 2
```

### Structure

```rust
pub struct Config {
    pub herodb: HeroDbConfig,
    pub namespaces: HashMap<String, NamespaceConfig>,
}

pub struct HeroDbConfig {
    pub url: String,
}

pub struct NamespaceConfig {
    pub db_id: u16,
}
```

---

## 8) Database Allocation

```
DB 0  → HeroDB Admin (managed by HeroDB)
DB 1  → osiris:notes (namespace "notes")
DB 2  → osiris:calendar (namespace "calendar")
DB 3+ → Additional namespaces...
```

Each namespace gets its own isolated HeroDB database.

---

## 9) Dependencies

```toml
[dependencies]
anyhow = "1.0"
redis = { version = "0.24", features = ["aio", "tokio-comp"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
time = { version = "0.3", features = ["serde", "formatting", "parsing", "macros"] }
tokio = { version = "1.23", features = ["full"] }
clap = { version = "4.5", features = ["derive"] }
toml = "0.8"
uuid = { version = "1.6", features = ["v4", "serde"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
```

---

## 10) Future Enhancements

| Feature | Added Via | Lives In |
|---------|-----------|----------|
| Dedup / blobs | HeroDB extension | HeroDB |
| Vector search | HeroDB extension | HeroDB |
| Full-text search | HeroDB (Tantivy) | HeroDB |
| Relations / graph | OSIRIS later | OSIRIS |
| 9P filesystem | OSIRIS later | OSIRIS |

This MVP maintains clean interface boundaries:

- **HeroDB** remains a plain KV substrate
- **OSIRIS** builds higher-order meaning on top

---

## 11) Implementation Status

### ✅ Completed

- [x] Project structure and Cargo.toml
- [x] Core data models (OsirisObject, Metadata)
- [x] HeroDB client wrapper (RESP protocol)
- [x] Field indexing (tags, MIME, title)
- [x] Search engine (substring matching + scoring)
- [x] Configuration management
- [x] CLI interface (init, ns, put, get, del, find, stats)
- [x] Error handling
- [x] Documentation (README, specs)

### 🚧 Pending

- [ ] 9P filesystem interface
- [ ] Integration tests
- [ ] Performance benchmarks
- [ ] Name resolution (namespace/name → ID mapping)

---

## 12) Quick Start

### Prerequisites

Start HeroDB:

```bash
cd /path/to/herodb
cargo run --release -- --dir ./data --admin-secret mysecret --port 6379
```

### Build OSIRIS

```bash
cd /path/to/osiris
cargo build --release
```

### Initialize

```bash
# Create configuration
./target/release/osiris init --herodb redis://localhost:6379

# Create a namespace
./target/release/osiris ns create notes
```

### Usage

```bash
# Add a note
echo "OSIRIS is a minimal object store" | \
  ./target/release/osiris put notes/intro - \
  --title "Introduction" \
  --tags topic=osiris,type=doc

# Search
./target/release/osiris find "object store" --ns notes

# Get the note
./target/release/osiris get notes/intro

# Show stats
./target/release/osiris stats --ns notes
```

---

## 13) Testing

### Unit Tests

```bash
cargo test
```

### Integration Tests (requires HeroDB)

```bash
# Start HeroDB (in a separate terminal, since this command keeps running)
cd /path/to/herodb
cargo run -- --dir /tmp/herodb-test --admin-secret test --port 6379

# Run tests
cd /path/to/osiris
cargo test -- --ignored
```

---

## 14) Performance Characteristics

### Write Performance

- **Object storage**: O(1), a single SET operation
- **Indexing**: O(T) where T = number of tags/fields
- **Total**: O(T) per object

### Read Performance

- **Get by ID**: O(1), a single GET operation
- **Filter by tags**: O(F · S) where F = number of filters and S = size of the smallest matching set (set intersection)
- **Text search**: O(N) where N = number of candidates (linear scan)

### Storage Overhead

- **Object**: ~1KB per object (JSON serialized)
- **Indexes**: ~50 bytes per tag/field entry
- **Total**: ~1.5KB per object with 10 tags

### Scalability

- **Optimal**: <10,000 objects per namespace
- **Acceptable**: <100,000 objects per namespace
- **Beyond**: Consider migrating to Tantivy FTS

---

## 15) Design Decisions

### Why No Tantivy in MVP?

- **Simplicity**: Avoid HeroDB server-side dependencies
- **Portability**: Works with any Redis-compatible backend
- **Flexibility**: Easy to migrate to Tantivy later

### Why Substring Matching?

- **Good enough**: For small datasets (<10k objects)
- **Simple**: No tokenization, stemming, or complex scoring
- **Fast**: O(N) is acceptable for MVP

### Why Separate Databases per Namespace?

- **Isolation**: Clear separation of concerns
- **Performance**: Smaller keyspaces = faster scans
- **Security**: Can apply different encryption keys per namespace

---

## 16) Migration Path

When ready to scale beyond MVP:

1. **Add Tantivy FTS** (HeroDB extension)
   - Create FT.* commands in HeroDB
   - Update OSIRIS to use FT.SEARCH instead of substring scan
   - Keep field indexes for filtering
2. **Add Vector Search** (HeroDB extension)
   - Store embeddings in HeroDB
   - Implement ANN search (HNSW/IVF)
   - Add hybrid retrieval (BM25 + vector)
3. **Add Relations** (OSIRIS feature)
   - Store relation graphs in HeroDB
   - Implement graph traversal
   - Add relation-based ranking
4. **Add Deduplication** (HeroDB extension)
   - Content-addressable storage (BLAKE3)
   - Reference counting
   - Garbage collection

---

## Summary

**OSIRIS MVP is a minimal, production-ready object store** that:

- ✅ Works with unmodified HeroDB
- ✅ Provides structured storage with metadata
- ✅ Supports field-based filtering
- ✅ Includes basic text search
- ✅ Exposes a clean CLI interface
- ✅ Maintains clear upgrade paths

**Perfect for:**

- Personal knowledge management
- Small-scale document storage
- Prototyping semantic applications
- Learning Rust + Redis patterns

**Next steps:**

- Build and test the MVP
- Gather usage feedback
- Plan Tantivy/vector integration
- Design 9P filesystem interface