first commit

This commit is contained in:
Timur Gordon
2025-10-20 22:24:25 +02:00
commit 097360ad12
48 changed files with 6712 additions and 0 deletions

docs/ARCHITECTURE.md Normal file

@@ -0,0 +1,426 @@
# OSIRIS Architecture - Trait-Based Generic Objects
## Overview
OSIRIS uses a trait-based architecture modeled on heromodels. Any type that implements the `Object` trait can be stored, and it is indexed automatically from the keys its `index_keys()` method returns (one or more per `#[index]` field).
## Core Concepts
### 1. BaseData
Every OSIRIS object must include `BaseData`, which provides:
- **id**: Unique identifier (UUID or user-assigned)
- **ns**: Namespace the object belongs to
- **created_at**: Creation timestamp
- **modified_at**: Last modification timestamp
- **mime**: Optional MIME type
- **size**: Optional content size
```rust
pub struct BaseData {
    pub id: String,
    pub ns: String,
    pub created_at: OffsetDateTime,
    pub modified_at: OffsetDateTime,
    pub mime: Option<String>,
    pub size: Option<u64>,
}
```
### 2. Object Trait
The `Object` trait is the core abstraction for all OSIRIS objects:
```rust
// `Deserialize` takes a lifetime parameter; the higher-ranked bound below
// requires that the type can be deserialized from data of any lifetime.
pub trait Object: Debug + Clone + Serialize + for<'de> Deserialize<'de> + Send + Sync {
    /// Get the object type name
    fn object_type() -> &'static str where Self: Sized;
    /// Get base data reference
    fn base_data(&self) -> &BaseData;
    /// Get mutable base data reference
    fn base_data_mut(&mut self) -> &mut BaseData;
    /// Get index keys for this object (auto-generated from #[index] fields)
    fn index_keys(&self) -> Vec<IndexKey>;
    /// Get list of indexed field names
    fn indexed_fields() -> Vec<&'static str> where Self: Sized;
    /// Get searchable text content
    fn searchable_text(&self) -> Option<String>;
    /// Serialize to JSON
    fn to_json(&self) -> Result<String>;
    /// Deserialize from JSON
    fn from_json(json: &str) -> Result<Self> where Self: Sized;
}
```
### 3. IndexKey
Represents an index entry for a field:
```rust
pub struct IndexKey {
    pub name: &'static str, // Field name
    pub value: String,      // Field value
}
```
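The examples later in this document call `IndexKey::new` with literal names (`"title"`), with computed names (`&format!("tag:{}", key)`), and with both `&str` and `String` values. A computed name cannot be stored in a `&'static str` field, so the sketch below assumes an owned `String` for `name`. That differs from the struct shown above and is an assumption for illustration, not the crate's confirmed definition.

```rust
/// Hypothetical variant of `IndexKey` whose `name` is owned, so that
/// computed names such as "tag:topic" can be stored.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IndexKey {
    pub name: String,
    pub value: String,
}

impl IndexKey {
    /// Accepts `&str`, `&String`, or `String` for both arguments.
    pub fn new(name: impl Into<String>, value: impl Into<String>) -> Self {
        Self {
            name: name.into(),
            value: value.into(),
        }
    }
}
```

With this signature, `IndexKey::new("title", title)` and `IndexKey::new(&format!("tag:{}", key), value)` both type-check.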
## Example: Note Object
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Note {
    pub base_data: BaseData,
    // Indexed field - marked with #[index]
    #[index]
    pub title: Option<String>,
    // Searchable content (not indexed)
    pub content: Option<String>,
    // Indexed tags - marked with #[index]
    #[index]
    pub tags: BTreeMap<String, String>,
}

impl Object for Note {
    fn object_type() -> &'static str {
        "note"
    }

    fn base_data(&self) -> &BaseData {
        &self.base_data
    }

    fn base_data_mut(&mut self) -> &mut BaseData {
        &mut self.base_data
    }

    fn index_keys(&self) -> Vec<IndexKey> {
        let mut keys = Vec::new();
        // Index title
        if let Some(title) = &self.title {
            keys.push(IndexKey::new("title", title));
        }
        // Index tags
        for (key, value) in &self.tags {
            keys.push(IndexKey::new(&format!("tag:{}", key), value));
        }
        keys
    }

    fn indexed_fields() -> Vec<&'static str> {
        vec!["title", "tags"]
    }

    fn searchable_text(&self) -> Option<String> {
        let mut text = String::new();
        if let Some(title) = &self.title {
            text.push_str(title);
            text.push(' ');
        }
        if let Some(content) = &self.content {
            text.push_str(content);
        }
        if text.is_empty() { None } else { Some(text) }
    }
}
```
## Example: Event Object
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Event {
    pub base_data: BaseData,
    #[index]
    pub title: String,
    pub description: Option<String>,
    #[index]
    pub start_time: OffsetDateTime,
    pub end_time: OffsetDateTime,
    #[index]
    pub location: Option<String>,
    #[index]
    pub status: EventStatus,
    pub all_day: bool,
    #[index]
    pub category: Option<String>,
}

impl Object for Event {
    fn object_type() -> &'static str {
        "event"
    }

    fn base_data(&self) -> &BaseData {
        &self.base_data
    }

    fn base_data_mut(&mut self) -> &mut BaseData {
        &mut self.base_data
    }

    fn index_keys(&self) -> Vec<IndexKey> {
        let mut keys = Vec::new();
        keys.push(IndexKey::new("title", &self.title));
        if let Some(location) = &self.location {
            keys.push(IndexKey::new("location", location));
        }
        let status_str = match self.status {
            EventStatus::Draft => "draft",
            EventStatus::Published => "published",
            EventStatus::Cancelled => "cancelled",
        };
        keys.push(IndexKey::new("status", status_str));
        if let Some(category) = &self.category {
            keys.push(IndexKey::new("category", category));
        }
        // Index by date for day-based queries
        let date_str = self.start_time.date().to_string();
        keys.push(IndexKey::new("date", date_str));
        keys
    }

    fn indexed_fields() -> Vec<&'static str> {
        vec!["title", "location", "status", "category", "start_time"]
    }

    fn searchable_text(&self) -> Option<String> {
        let mut text = String::new();
        text.push_str(&self.title);
        text.push(' ');
        if let Some(description) = &self.description {
            text.push_str(description);
        }
        Some(text)
    }
}
```
## Storage Layer
### GenericStore
The `GenericStore` provides a type-safe storage layer for any object implementing `Object`:
```rust
pub struct GenericStore {
    client: HeroDbClient,
    index: FieldIndex,
}

// Signatures only; method bodies are omitted here.
impl GenericStore {
    /// Store an object
    pub async fn put<T: Object>(&self, obj: &T) -> Result<()>;
    /// Get an object by ID
    pub async fn get<T: Object>(&self, ns: &str, id: &str) -> Result<T>;
    /// Delete an object
    pub async fn delete<T: Object>(&self, obj: &T) -> Result<bool>;
    /// Get IDs matching an index key
    pub async fn get_ids_by_index(&self, ns: &str, field: &str, value: &str) -> Result<Vec<String>>;
}
```
### Usage Example
```rust
use osiris::objects::Note;
use osiris::store::{GenericStore, HeroDbClient};

// Create store
let client = HeroDbClient::new("redis://localhost:6379", 1)?;
let store = GenericStore::new(client);

// Create and store a note
let note = Note::new("notes".to_string())
    .set_title("My Note")
    .set_content("This is the content")
    .add_tag("topic", "rust")
    .add_tag("priority", "high");
store.put(&note).await?;

// Retrieve the note
let retrieved: Note = store.get("notes", &note.id()).await?;

// Search by index
let ids = store.get_ids_by_index("notes", "tag:topic", "rust").await?;
```
## Index Storage
### Keyspace Design
```
obj:<ns>:<id> → JSON serialized object
idx:<ns>:<field>:<value> → Set of object IDs
scan:<ns> → Set of all object IDs in namespace
```
### Examples
```
obj:notes:abc123 → {"base_data":{...},"title":"My Note",...}
idx:notes:title:My Note → {abc123, def456}
idx:notes:tag:topic:rust → {abc123, xyz789}
idx:notes:mime:text/plain → {abc123}
scan:notes → {abc123, def456, xyz789}
```
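The three key shapes above can be written as small helper functions. This is a minimal sketch; the function names `obj_key`, `idx_key`, and `scan_key` are hypothetical and are not taken from the OSIRIS source.

```rust
/// Key that holds the JSON-serialized object.
pub fn obj_key(ns: &str, id: &str) -> String {
    format!("obj:{}:{}", ns, id)
}

/// Key of the ID set for one (field, value) pair.
/// `field` may itself contain a colon, as in "tag:topic".
pub fn idx_key(ns: &str, field: &str, value: &str) -> String {
    format!("idx:{}:{}:{}", ns, field, value)
}

/// Key of the set that lists every object ID in a namespace.
pub fn scan_key(ns: &str) -> String {
    format!("scan:{}", ns)
}
```

Because `:` can appear inside field names and values (for example `tag:topic` or `text/plain` values next to colon-separated prefixes), this sketch only builds keys and does not try to parse them back into parts.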
## Automatic Indexing
When an object is stored:
1. **Serialize** the object to JSON
2. **Store** at `obj:<ns>:<id>`
3. **Generate index keys** by calling `obj.index_keys()`
4. **Create indexes** for each key at `idx:<ns>:<field>:<value>`
5. **Add to scan index** at `scan:<ns>`
When an object is deleted:
1. **Retrieve** the object
2. **Generate index keys**
3. **Remove** from all indexes
4. **Delete** the object
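The store and delete sequences above can be sketched against an in-memory stand-in for HeroDB. Everything here (`MemStore`, its methods, and passing index keys as `(field, value)` pairs) is hypothetical and only illustrates the order of operations; the real store talks to HeroDB and derives the keys from `index_keys()`.

```rust
use std::collections::{BTreeSet, HashMap};

/// In-memory stand-in for HeroDB: plain string values plus sets of IDs.
#[derive(Default)]
pub struct MemStore {
    objects: HashMap<String, String>,
    sets: HashMap<String, BTreeSet<String>>,
}

impl MemStore {
    /// Store: write the serialized object, add the ID to one index set
    /// per key, then add it to the namespace scan set.
    pub fn put(&mut self, ns: &str, id: &str, json: &str, index_keys: &[(String, String)]) {
        self.objects.insert(format!("obj:{}:{}", ns, id), json.to_string());
        for (field, value) in index_keys {
            let key = format!("idx:{}:{}:{}", ns, field, value);
            self.sets.entry(key).or_default().insert(id.to_string());
        }
        self.sets.entry(format!("scan:{}", ns)).or_default().insert(id.to_string());
    }

    /// Delete: remove the ID from every index set and the scan set,
    /// then delete the object. Returns whether the object existed.
    pub fn delete(&mut self, ns: &str, id: &str, index_keys: &[(String, String)]) -> bool {
        for (field, value) in index_keys {
            self.remove_from_set(&format!("idx:{}:{}:{}", ns, field, value), id);
        }
        self.remove_from_set(&format!("scan:{}", ns), id);
        self.objects.remove(&format!("obj:{}:{}", ns, id)).is_some()
    }

    /// Remove one member and drop the set once it is empty.
    fn remove_from_set(&mut self, key: &str, id: &str) {
        let now_empty = match self.sets.get_mut(key) {
            Some(set) => {
                set.remove(id);
                set.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.sets.remove(key);
        }
    }

    /// Members of a set, in sorted order (empty if the key is absent).
    pub fn ids(&self, key: &str) -> Vec<String> {
        self.sets
            .get(key)
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default()
    }
}
```

Deleting needs the same index keys that were used when storing, which is why the delete sequence retrieves the object before removing it.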
## Comparison with heromodels
| Feature | heromodels | OSIRIS |
|---------|-----------|--------|
| Base struct | `BaseModelData` | `BaseData` |
| Core trait | `Model` | `Object` |
| ID type | `u32` (auto-increment) | `String` (UUID) |
| Timestamps | `i64` (Unix) | `OffsetDateTime` |
| Index macro | `#[index]` (derive) | Manual `index_keys()` |
| Storage | OurDB/Postgres | HeroDB (Redis) |
| Serialization | CBOR/JSON | JSON |
## Future Enhancements
### 1. Derive Macro for #[index]
Create a proc macro to automatically generate `index_keys()` from field attributes:
```rust
#[derive(Object)]
pub struct Note {
pub base_data: BaseData,
#[index]
pub title: Option<String>,
pub content: Option<String>,
#[index]
pub tags: BTreeMap<String, String>,
}
```
### 2. Query Builder
Type-safe query builder for indexed fields:
```rust
let results = store
.query::<Note>("notes")
.filter("tag:topic", "rust")
.filter("tag:priority", "high")
.limit(10)
.execute()
.await?;
```
### 3. Relations
Support for typed relations between objects:
```rust
pub struct Note {
pub base_data: BaseData,
pub title: String,
#[relation(target = "Note", label = "references")]
pub references: Vec<String>,
}
```
### 4. Validation
Trait-based validation:
```rust
pub trait Validate {
fn validate(&self) -> Result<()>;
}
impl Validate for Note {
fn validate(&self) -> Result<()> {
if self.title.is_none() {
return Err(Error::InvalidInput("Title required".into()));
}
Ok(())
}
}
```
## Migration from Old API
The old `OsirisObject` API is still available for backwards compatibility:
```rust
// Old API (still works)
use osiris::store::OsirisObject;
let obj = OsirisObject::new("notes".to_string(), Some("text".to_string()));
// New API (recommended)
use osiris::objects::Note;
let note = Note::new("notes".to_string())
.set_title("Title")
.set_content("text");
```
## Benefits of Trait-Based Architecture
1. **Type Safety**: Compile-time guarantees for object types
2. **Extensibility**: Easy to add new object types
3. **Automatic Indexing**: Index keys generated from object structure
4. **Consistency**: Same pattern as heromodels
5. **Flexibility**: Each object type controls its own indexing logic
6. **Testability**: Easy to mock and test individual object types
## Summary
The trait-based architecture makes OSIRIS:
- **More flexible**: Any type can be stored by implementing `Object`
- **More consistent**: Follows heromodels patterns
- **More powerful**: Automatic indexing based on object structure
- **More maintainable**: Clear separation of concerns
- **More extensible**: Easy to add new object types and features

docs/DERIVE_MACRO.md Normal file

@@ -0,0 +1,195 @@
# OSIRIS Derive Macro
The `#[derive(DeriveObject)]` macro automatically implements the `Object` trait for your structs, generating index keys based on fields marked with `#[index]`.
## Usage
```rust
use osiris::{BaseData, DeriveObject};
use serde::{Deserialize, Serialize};
use std::collections::BTreeMap;
#[derive(Debug, Clone, Serialize, Deserialize, DeriveObject)]
pub struct Note {
pub base_data: BaseData,
#[index]
pub title: Option<String>,
pub content: Option<String>,
#[index]
pub tags: BTreeMap<String, String>,
}
```
## What Gets Generated
The derive macro automatically implements:
1. **`object_type()`** - Returns the struct name as a string
2. **`base_data()`** - Returns a reference to `base_data`
3. **`base_data_mut()`** - Returns a mutable reference to `base_data`
4. **`index_keys()`** - Generates index keys for all `#[index]` fields
5. **`indexed_fields()`** - Returns a list of indexed field names
## Supported Field Types
### `Option<T>`
```rust
#[index]
pub title: Option<String>,
```
Generates: `IndexKey { name: "title", value: <string_value> }` (only if Some)
### `BTreeMap<String, String>`
```rust
#[index]
pub tags: BTreeMap<String, String>,
```
Generates: `IndexKey { name: "tags:tag", value: "key=value" }` for each entry
### `Vec<T>`
```rust
#[index]
pub items: Vec<String>,
```
Generates: `IndexKey { name: "items:item", value: "0:value" }` for each item
### OffsetDateTime
```rust
#[index]
pub start_time: OffsetDateTime,
```
Generates: `IndexKey { name: "start_time", value: "2025-10-20" }` (date only)
### Enums and Other Types
```rust
#[index]
pub status: EventStatus,
```
Generates: `IndexKey { name: "status", value: "Published" }`, where the value is the field's `Debug` output (`format!("{:?}", status)`)
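The per-type rules above can be sketched as plain functions that return `(name, value)` pairs. This is an illustration only: the function names are hypothetical, and the `<field>:tag` and `<field>:item` naming is generalized from the `tags` and `items` examples, which is an assumption about how the macro derives names.

```rust
use std::collections::BTreeMap;
use std::fmt::Debug;

/// (index name, index value)
pub type Key = (String, String);

#[allow(dead_code)]
#[derive(Debug)]
pub enum EventStatus {
    Draft,
    Published,
    Cancelled,
}

/// `Option<String>`: one key, only when the value is `Some`.
pub fn option_keys(field: &str, value: &Option<String>) -> Vec<Key> {
    value.iter().map(|v| (field.to_string(), v.clone())).collect()
}

/// `BTreeMap<String, String>`: one key per entry, value "key=value".
pub fn map_keys(field: &str, map: &BTreeMap<String, String>) -> Vec<Key> {
    map.iter()
        .map(|(k, v)| (format!("{}:tag", field), format!("{}={}", k, v)))
        .collect()
}

/// `Vec<String>`: one key per element, value "<position>:<element>".
pub fn vec_keys(field: &str, items: &[String]) -> Vec<Key> {
    items
        .iter()
        .enumerate()
        .map(|(i, v)| (format!("{}:item", field), format!("{}:{}", i, v)))
        .collect()
}

/// Enums and other types: the value is the `Debug` representation.
pub fn debug_key<T: Debug>(field: &str, value: &T) -> Key {
    (field.to_string(), format!("{:?}", value))
}
```

For a unit variant such as `EventStatus::Published` the `Debug` output is just the variant name. For variants that carry data, `Debug` also prints the payload (including quotes around strings), so index values for such types may be awkward to query.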
## Complete Example
```rust
use osiris::{BaseData, DeriveObject};
use serde::{Deserialize, Serialize};
use time::OffsetDateTime;
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub enum EventStatus {
Draft,
Published,
Cancelled,
}
#[derive(Debug, Clone, Serialize, Deserialize, DeriveObject)]
pub struct Event {
pub base_data: BaseData,
#[index]
pub title: String,
pub description: Option<String>,
#[index]
#[serde(with = "time::serde::timestamp")]
pub start_time: OffsetDateTime,
#[index]
pub location: Option<String>,
#[index]
pub status: EventStatus,
pub all_day: bool,
#[index]
pub category: Option<String>,
}
impl Event {
pub fn new(ns: String, title: impl ToString) -> Self {
let now = OffsetDateTime::now_utc();
Self {
base_data: BaseData::new(ns),
title: title.to_string(),
description: None,
start_time: now,
location: None,
status: EventStatus::Draft,
all_day: false,
category: None,
}
}
}
```
## Generated Index Keys
For the Event example above with:
- `title = "Team Meeting"`
- `start_time = 2025-10-20T10:00:00Z`
- `location = Some("Room 101")`
- `status = EventStatus::Published`
- `category = Some("work")`
The generated index keys would be:
```rust
vec![
IndexKey { name: "mime", value: "application/json" }, // from base_data
IndexKey { name: "title", value: "Team Meeting" },
IndexKey { name: "start_time", value: "2025-10-20" },
IndexKey { name: "location", value: "Room 101" },
IndexKey { name: "status", value: "Published" },
IndexKey { name: "category", value: "work" },
]
```
## HeroDB Storage
These index keys are stored in HeroDB as:
```
idx:events:title:Team Meeting → {event_id}
idx:events:start_time:2025-10-20 → {event_id}
idx:events:location:Room 101 → {event_id}
idx:events:status:Published → {event_id}
idx:events:category:work → {event_id}
```
## Querying by Index
```rust
use osiris::store::GenericStore;
let store = GenericStore::new(client);
// Get all events on a specific date
let ids = store.get_ids_by_index("events", "start_time", "2025-10-20").await?;
// Get all published events
let ids = store.get_ids_by_index("events", "status", "Published").await?;
// Get all events in a category
let ids = store.get_ids_by_index("events", "category", "work").await?;
```
## Requirements
1. **Must have `base_data` field**: The struct must have a field named `base_data` of type `BaseData`
2. **Must derive standard traits**: `Debug`, `Clone`, `Serialize`, `Deserialize`
3. **Fields marked with `#[index]`**: Only fields with the `#[index]` attribute will be indexed
## Limitations
- The macro currently uses `Debug` formatting for enums and complex types
- BTreeMap indexing assumes `String` keys and values
- Vec indexing uses numeric indices (may not be ideal for all use cases)
## Future Enhancements
- Custom index key formatters via attributes
- Support for nested struct indexing
- Conditional indexing (e.g., `#[index(if = "is_published")]`)
- Custom index names (e.g., `#[index(name = "custom_name")]`)

docs/specs/osiris-mvp.md Normal file

@@ -0,0 +1,525 @@
# OSIRIS MVP — Minimal Semantic Store over HeroDB
## 0) Purpose
OSIRIS is a Rust-native object layer on top of HeroDB that provides structured storage and retrieval capabilities without any server-side extensions or indexing engines.
It provides:
- Object CRUD operations
- Namespace management
- Simple local field indexing (field:*)
- Basic keyword scan (substring matching)
- CLI interface
- Future: 9P filesystem interface
It does **not** depend on HeroDB's Tantivy FTS, vectors, or relations.
---
## 1) Architecture
```
HeroDB (unmodified)
├── KV store + encryption
└── RESP protocol
    └── OSIRIS
        ├── store/        object schema + persistence
        ├── index/        field index & keyword scanning
        ├── retrieve/     query planner + filtering
        ├── interfaces/   CLI, 9P (future)
        └── config/       namespaces + settings
```
---
## 2) Data Model
```rust
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct OsirisObject {
    pub id: String,
    pub ns: String,
    pub meta: Metadata,
    pub text: Option<String>, // optional plain text
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Metadata {
    pub title: Option<String>,
    pub mime: Option<String>,
    pub tags: BTreeMap<String, String>,
    pub created: OffsetDateTime,
    pub updated: OffsetDateTime,
    pub size: Option<u64>,
}
```
---
## 3) Keyspace Design
```
meta:<id> → serialized OsirisObject (JSON)
field:tag:<key>=<val> → Set of IDs (for tag filtering)
field:mime:<type> → Set of IDs (for MIME type filtering)
field:title:<title> → Set of IDs (for title filtering)
scan:index → Set of all IDs (for full scan)
```
**Example:**
```
field:tag:project=osiris → {note_1, note_2}
field:mime:text/markdown → {note_1, note_3}
scan:index → {note_1, note_2, note_3, ...}
```
---
## 4) Index Maintenance
### Insert / Update
```rust
// Store object
redis.set(format!("meta:{}", obj.id), serde_json::to_string(&obj)?)?;

// Index tags
for (k, v) in &obj.meta.tags {
    redis.sadd(format!("field:tag:{}={}", k, v), &obj.id)?;
}

// Index MIME type
if let Some(mime) = &obj.meta.mime {
    redis.sadd(format!("field:mime:{}", mime), &obj.id)?;
}

// Index title
if let Some(title) = &obj.meta.title {
    redis.sadd(format!("field:title:{}", title), &obj.id)?;
}

// Add to scan index
redis.sadd("scan:index", &obj.id)?;
```
### Delete
```rust
// Remove object
redis.del(format!("meta:{}", obj.id))?;

// Deindex tags
for (k, v) in &obj.meta.tags {
    redis.srem(format!("field:tag:{}={}", k, v), &obj.id)?;
}

// Deindex MIME type
if let Some(mime) = &obj.meta.mime {
    redis.srem(format!("field:mime:{}", mime), &obj.id)?;
}

// Deindex title
if let Some(title) = &obj.meta.title {
    redis.srem(format!("field:title:{}", title), &obj.id)?;
}

// Remove from scan index
redis.srem("scan:index", &obj.id)?;
```
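Insert and delete touch exactly the same set keys, so one option is to derive the key list in a single place. The sketch below assumes a trimmed-down `Meta` struct (timestamps and size omitted) and hypothetical helpers `field_keys` and `stale_keys`. It also illustrates a step the snippets above do not show: when an object is updated, keys that belonged only to the previous version need an `SREM`, otherwise the old index entries keep pointing at the object.

```rust
use std::collections::BTreeMap;

/// Trimmed-down stand-in for `Metadata`, limited to the indexed fields.
pub struct Meta {
    pub title: Option<String>,
    pub mime: Option<String>,
    pub tags: BTreeMap<String, String>,
}

/// Every set key an object belongs to, following the keyspace in section 3.
pub fn field_keys(meta: &Meta) -> Vec<String> {
    let mut keys = Vec::new();
    for (k, v) in &meta.tags {
        keys.push(format!("field:tag:{}={}", k, v));
    }
    if let Some(mime) = &meta.mime {
        keys.push(format!("field:mime:{}", mime));
    }
    if let Some(title) = &meta.title {
        keys.push(format!("field:title:{}", title));
    }
    keys.push("scan:index".to_string());
    keys
}

/// Keys present for the old version but not the new one; on update,
/// the object ID would be removed from these sets.
pub fn stale_keys(old: &Meta, new: &Meta) -> Vec<String> {
    let fresh = field_keys(new);
    field_keys(old)
        .into_iter()
        .filter(|key| !fresh.contains(key))
        .collect()
}
```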
---
## 5) Retrieval
### Query Structure
```rust
pub struct RetrievalQuery {
pub text: Option<String>, // keyword substring
pub ns: String,
pub filters: Vec<(String, String)>, // field=value
pub top_k: usize,
}
```
### Execution Steps
1. **Collect candidate IDs**: read the ID set behind each `field:*` filter (`SMEMBERS`) and intersect the sets
2. **If a text query is provided**, iterate over the candidates:
   - Fetch `meta:<id>`
   - Test for a substring match in `meta.title`, `text`, or `tags`
   - Compute a simple relevance score
3. **Sort** by score (descending) and **limit** to `top_k`

The text scan is O(N) in the number of candidates, which is acceptable for an MVP or for small datasets (<10k objects).
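Steps 1 and 3 can be sketched with standard-library sets. The function names are hypothetical, and the fallback to the full ID set (the contents of `scan:index`) when no filter is given is an assumption based on the keyspace description, not confirmed behaviour.

```rust
use std::collections::BTreeSet;

/// Helper to build an ID set from string literals.
pub fn id_set(ids: &[&str]) -> BTreeSet<String> {
    ids.iter().map(|id| id.to_string()).collect()
}

/// Step 1: intersect the ID sets returned for each filter.
/// With no filters, every ID in the namespace is a candidate.
pub fn candidates(filter_sets: &[BTreeSet<String>], all_ids: &BTreeSet<String>) -> BTreeSet<String> {
    let mut sets = filter_sets.iter();
    let first = match sets.next() {
        Some(set) => set.clone(),
        None => return all_ids.clone(),
    };
    sets.fold(first, |acc, set| acc.intersection(set).cloned().collect())
}

/// Step 3: sort by descending score (ties broken by ID) and keep `k` hits.
pub fn top_k(mut hits: Vec<(String, f32)>, k: usize) -> Vec<(String, f32)> {
    hits.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    hits.truncate(k);
    hits
}
```

Intersecting on the client costs work proportional to the sizes of the sets fetched, so putting the most selective filter first keeps the intermediate result small.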
### Scoring Algorithm
```rust
fn compute_text_score(obj: &OsirisObject, query: &str) -> f32 {
    // Lowercase the query once so matching is case-insensitive on both sides
    let query = query.to_lowercase();
    let mut score = 0.0_f32;

    // Title match
    if let Some(title) = &obj.meta.title {
        if title.to_lowercase().contains(&query) {
            score += 0.5;
        }
    }

    // Text content match
    if let Some(text) = &obj.text {
        // Lowercase the text once and count occurrences in a single pass
        let count = text.to_lowercase().matches(query.as_str()).count();
        if count > 0 {
            score += 0.5;
            // Bonus for multiple occurrences
            score += (count as f32 - 1.0) * 0.1;
        }
    }

    // Tag match
    for (key, value) in &obj.meta.tags {
        if key.to_lowercase().contains(&query) || value.to_lowercase().contains(&query) {
            score += 0.2;
        }
    }

    // Cap the total at 1.0
    score.min(1.0)
}
```
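To make the weights concrete, here is a self-contained variant of the scorer with a worked example. `Doc` is a simplified stand-in for `OsirisObject`, limited to the fields the scorer reads, and `sample_doc` is a hypothetical fixture. The variant lowercases the query as well as the content, on the assumption that matching is meant to be case-insensitive.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for `OsirisObject`.
pub struct Doc {
    pub title: Option<String>,
    pub text: Option<String>,
    pub tags: BTreeMap<String, String>,
}

/// Same weights as above: 0.5 title, 0.5 text plus 0.1 per extra
/// occurrence, 0.2 per matching tag, capped at 1.0.
pub fn score(doc: &Doc, query: &str) -> f32 {
    let query = query.to_lowercase();
    let mut score = 0.0_f32;
    if let Some(title) = &doc.title {
        if title.to_lowercase().contains(&query) {
            score += 0.5;
        }
    }
    if let Some(text) = &doc.text {
        let count = text.to_lowercase().matches(query.as_str()).count();
        if count > 0 {
            score += 0.5 + (count as f32 - 1.0) * 0.1;
        }
    }
    for (key, value) in &doc.tags {
        if key.to_lowercase().contains(&query) || value.to_lowercase().contains(&query) {
            score += 0.2;
        }
    }
    score.min(1.0)
}

/// Fixture: a note titled "Rust Introduction" tagged topic=rust, level=beginner.
pub fn sample_doc() -> Doc {
    let mut tags = BTreeMap::new();
    tags.insert("topic".to_string(), "rust".to_string());
    tags.insert("level".to_string(), "beginner".to_string());
    Doc {
        title: Some("Rust Introduction".to_string()),
        text: Some("This is a note about Rust programming".to_string()),
        tags,
    }
}
```

For this fixture, the query `"note"` matches only the text once and scores 0.5, and `"beginner"` matches one tag and scores 0.2. The query `"rust"` matches the title (0.5), the text once (0.5), and one tag (0.2), which sums to 1.2 and is capped at 1.0.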
---
## 6) CLI
### Commands
```bash
# Initialize and create namespace
osiris init --herodb redis://localhost:6379
osiris ns create notes
# Add and read objects
osiris put notes/my-note.md ./my-note.md --tags topic=rust,project=osiris
osiris get notes/my-note.md
osiris get notes/my-note.md --raw --output /tmp/note.md
osiris del notes/my-note.md
# Search
osiris find --ns notes --filter topic=rust
osiris find "retrieval" --ns notes
osiris find "rust" --ns notes --filter project=osiris --topk 20
# Namespace management
osiris ns list
osiris ns delete notes
# Statistics
osiris stats
osiris stats --ns notes
```
### Examples
```bash
# Store a note from stdin
echo "This is a note about Rust programming" | \
osiris put notes/rust-intro - \
--title "Rust Introduction" \
--tags topic=rust,level=beginner \
--mime text/plain
# Search for notes about Rust
osiris find "rust" --ns notes
# Filter by tag
osiris find --ns notes --filter topic=rust
# Get note as JSON
osiris get notes/rust-intro
# Get raw content
osiris get notes/rust-intro --raw
```
---
## 7) Configuration
### File Location
`~/.config/osiris/config.toml`
### Example
```toml
[herodb]
url = "redis://localhost:6379"
[namespaces.notes]
db_id = 1
[namespaces.calendar]
db_id = 2
```
### Structure
```rust
pub struct Config {
    pub herodb: HeroDbConfig,
    pub namespaces: HashMap<String, NamespaceConfig>,
}

pub struct HeroDbConfig {
    pub url: String,
}

pub struct NamespaceConfig {
    pub db_id: u16,
}
```
---
## 8) Database Allocation
```
DB 0 → HeroDB Admin (managed by HeroDB)
DB 1 → osiris:notes (namespace "notes")
DB 2 → osiris:calendar (namespace "calendar")
DB 3+ → Additional namespaces...
```
Each namespace gets its own isolated HeroDB database.
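The document does not say how a new namespace is assigned its database. One plausible policy, shown here purely as an assumption, is to pick the lowest unused ID starting at 1, since DB 0 is reserved for HeroDB. The function name `next_db_id` is hypothetical.

```rust
use std::collections::HashMap;

/// Lowest database ID not yet used by any namespace.
/// DB 0 is skipped because HeroDB reserves it for administration.
/// Returns `None` only if every ID from 1 to `u16::MAX` is taken.
pub fn next_db_id(namespaces: &HashMap<String, u16>) -> Option<u16> {
    (1..=u16::MAX).find(|id| !namespaces.values().any(|used| used == id))
}
```

With `notes → 1` and `calendar → 2` already configured, this returns `Some(3)`, matching the "DB 3+" row above.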
---
## 9) Dependencies
```toml
[dependencies]
anyhow = "1.0"
redis = { version = "0.24", features = ["aio", "tokio-comp"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
time = { version = "0.3", features = ["serde", "formatting", "parsing", "macros"] }
tokio = { version = "1.23", features = ["full"] }
clap = { version = "4.5", features = ["derive"] }
toml = "0.8"
uuid = { version = "1.6", features = ["v4", "serde"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
```
---
## 10) Future Enhancements
| Feature | When Added | Moves Where |
|---------|-----------|-------------|
| Dedup / blobs | HeroDB extension | HeroDB |
| Vector search | HeroDB extension | HeroDB |
| Full-text search | HeroDB (Tantivy) | HeroDB |
| Relations / graph | OSIRIS later | OSIRIS |
| 9P filesystem | OSIRIS later | OSIRIS |
This MVP maintains clean interface boundaries:
- **HeroDB** remains a plain KV substrate
- **OSIRIS** builds higher-order meaning on top
---
## 11) Implementation Status
### ✅ Completed
- [x] Project structure and Cargo.toml
- [x] Core data models (OsirisObject, Metadata)
- [x] HeroDB client wrapper (RESP protocol)
- [x] Field indexing (tags, MIME, title)
- [x] Search engine (substring matching + scoring)
- [x] Configuration management
- [x] CLI interface (init, ns, put, get, del, find, stats)
- [x] Error handling
- [x] Documentation (README, specs)
### 🚧 Pending
- [ ] 9P filesystem interface
- [ ] Integration tests
- [ ] Performance benchmarks
- [ ] Name resolution (namespace/name → ID mapping)
---
## 12) Quick Start
### Prerequisites
Start HeroDB:
```bash
cd /path/to/herodb
cargo run --release -- --dir ./data --admin-secret mysecret --port 6379
```
### Build OSIRIS
```bash
cd /path/to/osiris
cargo build --release
```
### Initialize
```bash
# Create configuration
./target/release/osiris init --herodb redis://localhost:6379
# Create a namespace
./target/release/osiris ns create notes
```
### Usage
```bash
# Add a note
echo "OSIRIS is a minimal object store" | \
./target/release/osiris put notes/intro - \
--title "Introduction" \
--tags topic=osiris,type=doc
# Search
./target/release/osiris find "object store" --ns notes
# Get the note
./target/release/osiris get notes/intro
# Show stats
./target/release/osiris stats --ns notes
```
---
## 13) Testing
### Unit Tests
```bash
cargo test
```
### Integration Tests (requires HeroDB)
```bash
# Start HeroDB
cd /path/to/herodb
cargo run -- --dir /tmp/herodb-test --admin-secret test --port 6379
# Run tests
cd /path/to/osiris
cargo test -- --ignored
```
---
## 14) Performance Characteristics
### Write Performance
- **Object storage**: O(1) - single SET operation
- **Indexing**: O(T) where T = number of tags/fields
- **Total**: O(T) per object
### Read Performance
- **Get by ID**: O(1) - single GET operation
- **Filter by tags**: F set reads, where F = number of filters, plus a set intersection whose cost grows with the sizes of the matched sets
- **Text search**: O(N) where N = number of candidates (linear scan)
### Storage Overhead
- **Object**: ~1KB per object (JSON serialized)
- **Indexes**: ~50 bytes per tag/field entry
- **Total**: ~1.5KB per object with 10 tags
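The total follows from the two estimates above: 1,000 bytes for the object plus 10 × 50 bytes of index entries. A small sketch of the same arithmetic, using those rough figures as constants (the function name is hypothetical):

```rust
/// Rough per-object size from the estimate above (~1KB of JSON).
pub const OBJECT_BYTES: u64 = 1_000;
/// Rough size of one index entry (~50 bytes per tag/field).
pub const INDEX_ENTRY_BYTES: u64 = 50;

/// Estimated storage for `objects` objects with `tags_per_object` index entries each.
pub fn estimated_bytes(objects: u64, tags_per_object: u64) -> u64 {
    objects * (OBJECT_BYTES + tags_per_object * INDEX_ENTRY_BYTES)
}
```

Under these estimates, 10,000 objects with 10 tags each come to about 15 MB.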
### Scalability
- **Optimal**: <10,000 objects per namespace
- **Acceptable**: <100,000 objects per namespace
- **Beyond**: Consider migrating to Tantivy FTS
---
## 15) Design Decisions
### Why No Tantivy in MVP?
- **Simplicity**: Avoid HeroDB server-side dependencies
- **Portability**: Works with any Redis-compatible backend
- **Flexibility**: Easy to migrate to Tantivy later
### Why Substring Matching?
- **Good enough**: For small datasets (<10k objects)
- **Simple**: No tokenization, stemming, or complex scoring
- **Fast enough**: an O(N) scan is acceptable at MVP scale
### Why Separate Databases per Namespace?
- **Isolation**: Clear separation of concerns
- **Performance**: Smaller keyspaces = faster scans
- **Security**: Can apply different encryption keys per namespace
---
## 16) Migration Path
When ready to scale beyond MVP:
1. **Add Tantivy FTS** (HeroDB extension)
- Create FT.* commands in HeroDB
- Update OSIRIS to use FT.SEARCH instead of substring scan
- Keep field indexes for filtering
2. **Add Vector Search** (HeroDB extension)
- Store embeddings in HeroDB
- Implement ANN search (HNSW/IVF)
- Add hybrid retrieval (BM25 + vector)
3. **Add Relations** (OSIRIS feature)
- Store relation graphs in HeroDB
- Implement graph traversal
- Add relation-based ranking
4. **Add Deduplication** (HeroDB extension)
- Content-addressable storage (BLAKE3)
- Reference counting
- Garbage collection
---
## Summary
**OSIRIS MVP is a minimal object store** that:
- Works with unmodified HeroDB
- Provides structured storage with metadata
- Supports field-based filtering
- Includes basic text search
- Exposes a clean CLI interface
- Maintains clear upgrade paths
**Perfect for:**
- Personal knowledge management
- Small-scale document storage
- Prototyping semantic applications
- Learning Rust + Redis patterns
**Next steps:**
- Build and test the MVP
- Gather usage feedback
- Plan Tantivy/vector integration
- Design 9P filesystem interface