This commit is contained in:
2025-08-23 04:58:41 +02:00
parent b68325016d
commit d4d3660bac
48 changed files with 29 additions and 38 deletions

docs/age.md
# HeroDB AGE usage: Stateless vs Key-Managed
This document explains how to use the AGE cryptography commands exposed by HeroDB over the Redis protocol in two modes:
- Stateless (ephemeral keys; nothing stored on the server)
- Key-managed (server-persisted, named keys)
If you are new to the codebase, the exact tests that exercise these behaviors are:
- [rust.test_07_age_stateless_suite()](herodb/tests/usage_suite.rs:495)
- [rust.test_08_age_persistent_named_suite()](herodb/tests/usage_suite.rs:555)
Implementation entry points:
- [herodb/src/age.rs](herodb/src/age.rs)
- Dispatch from [herodb/src/cmd.rs](herodb/src/cmd.rs)
Note: Database-at-rest encryption flags in the test harness are unrelated to AGE commands; those flags control storage-level encryption of DB files. See the harness near [rust.start_test_server()](herodb/tests/usage_suite.rs:10).
## Quick start
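Build the binary and start the server. The paths below come from one local checkout, so adjust them to yours. The `--encrypt` and `--encryption-key` flags turn on database-at-rest encryption, which is independent of the AGE commands:
```bash
~/code/git.ourworld.tf/herocode/herodb/herodb/build.sh
~/code/git.ourworld.tf/herocode/herodb/target/release/herodb --dir /tmp/data --debug --port 6381 --encryption-key 1234 --encrypt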
```
```bash
export PORT=6381
# Generate an ephemeral keypair and encrypt/decrypt a message (stateless mode)
redis-cli -p $PORT AGE GENENC
# → returns an array: [recipient, identity]
redis-cli -p $PORT AGE ENCRYPT <recipient> "hello world"
# → returns ciphertext (base64 in a bulk string)
redis-cli -p $PORT AGE DECRYPT <identity> <ciphertext_b64>
# → returns "hello world"
```
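Replies such as the `[recipient, identity]` pair from AGE GENENC come back as a RESP array of bulk strings. The std-only Rust sketch below shows one way a client could decode that reply. `parse_bulk_array` is a hypothetical helper, not a HeroDB API, and the key values are shortened placeholders. The sketch only handles flat arrays of non-nil bulk strings.

```rust
/// Decode a RESP array of bulk strings, such as the `[recipient, identity]`
/// reply of AGE GENENC. Hypothetical helper (not a HeroDB API): it handles
/// only flat arrays of non-nil bulk strings, not errors or nested replies.
fn parse_bulk_array(buf: &[u8]) -> Option<Vec<String>> {
    let text = std::str::from_utf8(buf).ok()?;
    let (count, mut rest) = text.strip_prefix('*')?.split_once("\r\n")?;
    let count: usize = count.parse().ok()?;
    let mut items = Vec::new();
    for _ in 0..count {
        // Each element is "$<byte length>\r\n<bytes>\r\n".
        let (len, tail) = rest.strip_prefix('$')?.split_once("\r\n")?;
        let len: usize = len.parse().ok()?;
        items.push(tail.get(..len)?.to_string());
        rest = tail.get(len..)?.strip_prefix("\r\n")?;
    }
    Some(items)
}

fn main() {
    // Placeholder reply; real keys are much longer.
    let reply = b"*2\r\n$8\r\nage1qz00\r\n$20\r\nAGE-SECRET-KEY-1ABCD\r\n";
    let keys = parse_bulk_array(reply).expect("well-formed reply");
    let (recipient, identity) = (&keys[0], &keys[1]);
    assert!(recipient.starts_with("age1"));
    assert!(identity.starts_with("AGE-SECRET-KEY-1"));
    // Print only the public half; never log the identity.
    println!("recipient: {recipient}");
}
```

Because every bulk string is length-prefixed, a message such as `"hello world"` stays a single argument, which is also why it must be quoted on the `redis-cli` command line.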
For key-managed mode, generate a named key once and reference it by name afterwards:
```bash
redis-cli -p $PORT AGE KEYGEN app1
# → persists encryption keypair under name "app1"
redis-cli -p $PORT AGE ENCRYPTNAME app1 "hello"
redis-cli -p $PORT AGE DECRYPTNAME app1 <ciphertext_b64>
```
## Stateless AGE (ephemeral)
Characteristics
- No server-side storage of keys.
- You pass the actual key material with every call.
- Not listable via AGE LIST.
Commands and examples
1) Ephemeral encryption keys
```bash
# Generate an ephemeral encryption keypair
redis-cli -p $PORT AGE GENENC
# Example output (abridged):
# 1) "age1qz..."            # recipient (public key): share it; others use it to encrypt messages to you
# 2) "AGE-SECRET-KEY-1..."  # identity (secret key): keep it private; without it you cannot decrypt
# Encrypt with the recipient public key
redis-cli -p $PORT AGE ENCRYPT "age1qz..." "hello world"
# → returns bulk string payload: base64 ciphertext (encrypted content)
# Decrypt with the identity (secret), i.e. your private key
redis-cli -p $PORT AGE DECRYPT "AGE-SECRET-KEY-1..." "<ciphertext_b64>"
# → "hello world"
```
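2) Ephemeral signing keys
> Note: the signing secret is a different key from the encryption identity. GENSIGN creates its own keypair, returned as base64 strings rather than in the `age1...` / `AGE-SECRET-KEY-1...` format that GENENC uses.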
```bash
# Generate an ephemeral signing keypair
redis-cli -p $PORT AGE GENSIGN
# Example output:
# 1) "<verify_pub_b64>"
# 2) "<sign_secret_b64>"
# Sign a message with the secret
redis-cli -p $PORT AGE SIGN "<sign_secret_b64>" "msg"
# → returns "<signature_b64>"
# Verify with the public key
redis-cli -p $PORT AGE VERIFY "<verify_pub_b64>" "msg" "<signature_b64>"
# → 1 (valid) or 0 (invalid)
```
When to use
- You do not want the server to store private keys.
- You already manage key material on the client side.
- You need ad-hoc operations without persistence.
Reference test: [rust.test_07_age_stateless_suite()](herodb/tests/usage_suite.rs:495)
## Key-managed AGE (persistent, named)
Characteristics
- Server generates and persists keypairs under a chosen name.
- Clients refer to keys by name; raw secrets are not supplied on each call.
- Keys are discoverable via AGE LIST.
Commands and examples
1) Named encryption keys
```bash
# Create/persist a named encryption keypair
redis-cli -p $PORT AGE KEYGEN app1
# → returns [recipient, identity] but also stores them under name "app1"
# TODO: KEYGEN should not return the identity (security risk); a separate export command, e.g. AGE EXPORTKEY app1, could provide it instead
# Encrypt using the stored public key
redis-cli -p $PORT AGE ENCRYPTNAME app1 "hello"
# → returns bulk string payload: base64 ciphertext
# Decrypt using the stored secret
redis-cli -p $PORT AGE DECRYPTNAME app1 "<ciphertext_b64>"
# → "hello"
```
2) Named signing keys
```bash
# Create/persist a named signing keypair
redis-cli -p $PORT AGE SIGNKEYGEN app1
# → returns [verify_pub_b64, sign_secret_b64] and stores under name "app1"
# TODO: SIGNKEYGEN should not return sign_secret_b64 (security risk); a separate export command, e.g. AGE EXPORTSIGNKEY app1, could provide it instead
# Sign using the stored secret
redis-cli -p $PORT AGE SIGNNAME app1 "msg"
# → returns "<signature_b64>"
# Verify using the stored public key
redis-cli -p $PORT AGE VERIFYNAME app1 "msg" "<signature_b64>"
# → 1 (valid) or 0 (invalid)
```
3) List stored AGE keys
```bash
redis-cli -p $PORT AGE LIST
# Example output includes labels such as "encpub" and your key names (e.g., "app1")
```
When to use
- You want centralized key storage/rotation and fewer secrets on the client.
- You need names/labels for workflows and can trust the server with secrets.
- You want discoverability (AGE LIST) and simpler client commands.
Reference test: [rust.test_08_age_persistent_named_suite()](herodb/tests/usage_suite.rs:555)
## Choosing a mode
- Prefer Stateless when:
  - Minimizing server trust for secret material is the priority.
  - Clients already have a secure mechanism to store/distribute keys.
- Prefer Key-managed when:
  - Centralized lifecycle, naming, and discoverability are beneficial.
  - You plan to integrate rotation, ACLs, or auditability on the server side.
## Security notes
- Treat identities and signing secrets as sensitive; avoid logging them.
- For key-managed mode, ensure server storage (and backups) are protected.
- AGE operations here are application-level crypto and are distinct from database-at-rest encryption configured in the test harness.
## Repository pointers
- Stateless examples in tests: [rust.test_07_age_stateless_suite()](herodb/tests/usage_suite.rs:495)
- Key-managed examples in tests: [rust.test_08_age_persistent_named_suite()](herodb/tests/usage_suite.rs:555)
- AGE implementation: [herodb/src/age.rs](herodb/src/age.rs)
- Command dispatch: [herodb/src/cmd.rs](herodb/src/cmd.rs)
- Bash demo: [herodb/examples/age_bash_demo.sh](herodb/examples/age_bash_demo.sh)
- Rust persistent demo: [herodb/examples/age_persist_demo.rs](herodb/examples/age_persist_demo.rs)
- Additional notes: [herodb/instructions/encrypt.md](herodb/instructions/encrypt.md)

docs/basics.md
# HeroDB Commands
HeroDB implements a subset of Redis commands over the Redis protocol. This document describes the available commands and their usage.
## String Commands
### PING
Ping the server to test connectivity.
```bash
redis-cli -p $PORT PING
# → PONG
```
### ECHO
Echo the given message.
```bash
redis-cli -p $PORT ECHO "hello"
# → hello
```
### SET
Set a key to hold a string value.
```bash
redis-cli -p $PORT SET key value
# → OK
```
Options:
- EX seconds: Set expiration in seconds
- PX milliseconds: Set expiration in milliseconds
- NX: Only set if key doesn't exist
- XX: Only set if key exists
- GET: Return old value
Examples:
```bash
redis-cli -p $PORT SET key value EX 60
redis-cli -p $PORT SET key value PX 1000
redis-cli -p $PORT SET key value NX
redis-cli -p $PORT SET key value XX
redis-cli -p $PORT SET key value GET
```
### GET
Get the value of a key.
```bash
redis-cli -p $PORT GET key
# → value
```
### MGET
Get values of multiple keys.
```bash
redis-cli -p $PORT MGET key1 key2 key3
# → 1) "value1"
# 2) "value2"
# 3) (nil)
```
### MSET
Set multiple key-value pairs.
```bash
redis-cli -p $PORT MSET key1 value1 key2 value2
# → OK
```
### INCR
Increment the integer value of a key by 1.
```bash
redis-cli -p $PORT SET counter 10
redis-cli -p $PORT INCR counter
# → 11
```
### DEL
Delete a key.
```bash
redis-cli -p $PORT DEL key
# → 1
```
For multiple keys:
```bash
redis-cli -p $PORT DEL key1 key2 key3
# → number of keys deleted
```
### TYPE
Determine the type of a key.
```bash
redis-cli -p $PORT TYPE key
# → string
```
### EXISTS
Check if a key exists.
```bash
redis-cli -p $PORT EXISTS key
# → 1 (exists) or 0 (doesn't exist)
```
For multiple keys:
```bash
redis-cli -p $PORT EXISTS key1 key2 key3
# → count of existing keys
```
### EXPIRE / PEXPIRE
Set expiration time for a key.
```bash
redis-cli -p $PORT EXPIRE key 60
# → 1 (timeout set) or 0 (timeout not set)
redis-cli -p $PORT PEXPIRE key 1000
# → 1 (timeout set) or 0 (timeout not set)
```
### EXPIREAT / PEXPIREAT
Set expiration timestamp for a key.
```bash
redis-cli -p $PORT EXPIREAT key 1672531200
# → 1 (timeout set) or 0 (timeout not set)
redis-cli -p $PORT PEXPIREAT key 1672531200000
# → 1 (timeout set) or 0 (timeout not set)
```
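The example timestamps are already in the past (1672531200 is 2023-01-01 00:00:00 UTC). In Redis, EXPIREAT with a past timestamp deletes the key immediately, and HeroDB may behave the same way. To set a real future deadline, compute "now + TTL" on the client. The Rust sketch below uses only `std::time`; `expire_at` is a hypothetical helper that returns the seconds value for EXPIREAT and the milliseconds value for PEXPIREAT.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Absolute deadlines for EXPIREAT (Unix seconds) and PEXPIREAT (Unix
/// milliseconds), computed as "now + ttl" from the local clock.
fn expire_at(ttl: Duration) -> (u64, u128) {
    let deadline = SystemTime::now() + ttl;
    let since_epoch = deadline
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970");
    (since_epoch.as_secs(), since_epoch.as_millis())
}

fn main() {
    // One hour from now.
    let (secs, millis) = expire_at(Duration::from_secs(3600));
    println!("EXPIREAT key {secs}");
    println!("PEXPIREAT key {millis}");
}
```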
### TTL
Get the time to live for a key.
```bash
redis-cli -p $PORT TTL key
# → remaining time in seconds
```
### PERSIST
Remove expiration from a key.
```bash
redis-cli -p $PORT PERSIST key
# → 1 (timeout removed) or 0 (key has no timeout)
```
## Hash Commands
### HSET
Set field-value pairs in a hash.
```bash
redis-cli -p $PORT HSET hashkey field1 value1 field2 value2
# → number of fields added
```
### HGET
Get value of a field in a hash.
```bash
redis-cli -p $PORT HGET hashkey field1
# → value1
```
### HGETALL
Get all field-value pairs in a hash.
```bash
redis-cli -p $PORT HGETALL hashkey
# → 1) "field1"
# 2) "value1"
# 3) "field2"
# 4) "value2"
```
### HDEL
Delete fields from a hash.
```bash
redis-cli -p $PORT HDEL hashkey field1 field2
# → number of fields deleted
```
### HEXISTS
Check if a field exists in a hash.
```bash
redis-cli -p $PORT HEXISTS hashkey field1
# → 1 (exists) or 0 (doesn't exist)
```
### HKEYS
Get all field names in a hash.
```bash
redis-cli -p $PORT HKEYS hashkey
# → 1) "field1"
# 2) "field2"
```
### HVALS
Get all values in a hash.
```bash
redis-cli -p $PORT HVALS hashkey
# → 1) "value1"
# 2) "value2"
```
### HLEN
Get number of fields in a hash.
```bash
redis-cli -p $PORT HLEN hashkey
# → number of fields
```
### HMGET
Get values of multiple fields in a hash.
```bash
redis-cli -p $PORT HMGET hashkey field1 field2 field3
# → 1) "value1"
# 2) "value2"
# 3) (nil)
```
### HSETNX
Set field-value pair in hash only if field doesn't exist.
```bash
redis-cli -p $PORT HSETNX hashkey field1 value1
# → 1 (field set) or 0 (field not set)
```
### HINCRBY
Increment integer value of a field in a hash.
```bash
redis-cli -p $PORT HINCRBY hashkey field1 5
# → new value
```
### HINCRBYFLOAT
Increment float value of a field in a hash.
```bash
redis-cli -p $PORT HINCRBYFLOAT hashkey field1 3.14
# → new value
```
### HSCAN
Incrementally iterate over fields in a hash.
```bash
redis-cli -p $PORT HSCAN hashkey 0
# → 1) "next_cursor"
# 2) 1) "field1"
# 2) "value1"
# 3) "field2"
# 4) "value2"
```
Options:
- MATCH pattern: Filter fields by pattern
- COUNT number: Suggest number of fields to return
Examples:
```bash
# Quote patterns so the shell does not expand them against local files
redis-cli -p $PORT HSCAN hashkey 0 MATCH 'f*'
redis-cli -p $PORT HSCAN hashkey 0 COUNT 10
redis-cli -p $PORT HSCAN hashkey 0 MATCH 'f*' COUNT 10
```
## List Commands
### LPUSH
Insert elements at the head of a list.
```bash
redis-cli -p $PORT LPUSH listkey element1 element2 element3
# → number of elements in the list
```
### RPUSH
Insert elements at the tail of a list.
```bash
redis-cli -p $PORT RPUSH listkey element1 element2 element3
# → number of elements in the list
```
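The return value alone does not show the resulting order. If HeroDB follows Redis semantics here, LPUSH inserts its arguments at the head one after another, so the last argument ends up first, while RPUSH keeps the argument order. Below is a small in-memory Rust model; `lpush` and `rpush` are hypothetical helpers, not HeroDB code.

```rust
use std::collections::VecDeque;

/// LPUSH model: insert each element at the head, one after another.
fn lpush(list: &mut VecDeque<String>, elems: &[&str]) -> usize {
    for e in elems {
        list.push_front(e.to_string());
    }
    list.len() // the reply: new length of the list
}

/// RPUSH model: append each element at the tail, preserving argument order.
fn rpush(list: &mut VecDeque<String>, elems: &[&str]) -> usize {
    for e in elems {
        list.push_back(e.to_string());
    }
    list.len()
}

fn main() {
    let mut left = VecDeque::new();
    lpush(&mut left, &["element1", "element2", "element3"]);
    println!("{left:?}"); // ["element3", "element2", "element1"]

    let mut right = VecDeque::new();
    rpush(&mut right, &["element1", "element2", "element3"]);
    println!("{right:?}"); // ["element1", "element2", "element3"]
}
```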
### LPOP
Remove and return elements from the head of a list.
```bash
redis-cli -p $PORT LPOP listkey
# → element1
```
With count:
```bash
redis-cli -p $PORT LPOP listkey 2
# → 1) "element1"
# 2) "element2"
```
### RPOP
Remove and return elements from the tail of a list.
```bash
redis-cli -p $PORT RPOP listkey
# → element3
```
With count:
```bash
redis-cli -p $PORT RPOP listkey 2
# → 1) "element3"
# 2) "element2"
```
### LLEN
Get the length of a list.
```bash
redis-cli -p $PORT LLEN listkey
# → number of elements in the list
```
### LINDEX
Get element at index in a list.
```bash
redis-cli -p $PORT LINDEX listkey 0
# → first element
```
Negative indices count from the end:
```bash
redis-cli -p $PORT LINDEX listkey -1
# → last element
```
### LRANGE
Get a range of elements from a list.
```bash
redis-cli -p $PORT LRANGE listkey 0 -1
# → 1) "element1"
# 2) "element2"
# 3) "element3"
```
### LTRIM
Trim a list to specified range.
```bash
redis-cli -p $PORT LTRIM listkey 0 1
# → OK (list now contains only first 2 elements)
```
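LRANGE and LTRIM share the same index rules. Both ends are inclusive, negative indices count from the tail, and out-of-range indices are clamped rather than rejected. These are Redis rules and are assumed here for HeroDB. The Rust sketch below encodes them; `resolve_range`, `lrange`, and `ltrim` are hypothetical helpers that operate on in-memory lists.

```rust
use std::ops::Range;

/// Turn inclusive LRANGE/LTRIM indices (negative = counted from the tail)
/// into a half-open range over a list of `len` elements; None = empty result.
fn resolve_range(start: i64, stop: i64, len: usize) -> Option<Range<usize>> {
    let len = len as i64;
    let start = if start < 0 { (len + start).max(0) } else { start };
    let stop = if stop < 0 { len + stop } else { stop.min(len - 1) };
    if start > stop || start >= len {
        return None;
    }
    Some((start as usize)..(stop as usize + 1))
}

fn lrange<'a>(list: &[&'a str], start: i64, stop: i64) -> Vec<&'a str> {
    resolve_range(start, stop, list.len()).map_or_else(Vec::new, |r| list[r].to_vec())
}

fn ltrim(list: &mut Vec<String>, start: i64, stop: i64) {
    match resolve_range(start, stop, list.len()) {
        Some(r) => {
            list.truncate(r.end); // drop everything after `stop`
            list.drain(..r.start); // drop everything before `start`
        }
        None => list.clear(), // an empty range empties the list
    }
}

fn main() {
    let list = ["element1", "element2", "element3"];
    println!("{:?}", lrange(&list, 0, -1)); // ["element1", "element2", "element3"]
    println!("{:?}", lrange(&list, -1, -1)); // ["element3"]
    let mut owned: Vec<String> = list.iter().map(|s| s.to_string()).collect();
    ltrim(&mut owned, 0, 1);
    println!("{owned:?}"); // ["element1", "element2"]
}
```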
### LREM
Remove elements from a list.
```bash
redis-cli -p $PORT LREM listkey 2 element1
# → number of elements removed
```
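The count argument controls which occurrences are removed:
- count > 0: remove up to count matching elements, scanning from head to tail
- count < 0: remove up to |count| matching elements, scanning from tail to head
- count = 0: remove all matching elements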
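The count rules fit in a few lines of Rust. This in-memory sketch assumes Redis-compatible LREM behavior and is not HeroDB's implementation. `lrem` is a hypothetical helper that returns the number of removed elements, matching the command's reply.

```rust
/// Remove occurrences of `value` following the LREM count rules and
/// return how many elements were removed.
fn lrem(list: &mut Vec<String>, count: i64, value: &str) -> usize {
    // count == 0 means "no limit"; otherwise the limit is |count|.
    let limit = if count == 0 { usize::MAX } else { count.unsigned_abs() as usize };
    let mut removed = 0;
    if count >= 0 {
        // Scan head to tail; `retain` visits elements in their original order.
        list.retain(|e| {
            if removed < limit && e == value {
                removed += 1;
                false
            } else {
                true
            }
        });
    } else {
        // count < 0: scan tail to head.
        let mut i = list.len();
        while i > 0 && removed < limit {
            i -= 1;
            if list[i] == value {
                list.remove(i);
                removed += 1;
            }
        }
    }
    removed
}

fn main() {
    let mut list: Vec<String> = ["a", "b", "a", "c", "a"].iter().map(|s| s.to_string()).collect();
    let n = lrem(&mut list, -2, "a");
    println!("{n} {list:?}"); // 2 ["a", "b", "c"]
}
```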
### LINSERT
Insert element before or after pivot element.
```bash
redis-cli -p $PORT LINSERT listkey BEFORE pivot newelement
# → number of elements in the list
```
### BLPOP
Blocking remove and return elements from the head of a list.
```bash
redis-cli -p $PORT BLPOP listkey1 listkey2 5
# → 1) "listkey1"
# 2) "element1"
```
If no elements are available, the command blocks until an element is pushed to one of the lists or the timeout (in seconds) expires.
### BRPOP
Blocking remove and return elements from the tail of a list.
```bash
redis-cli -p $PORT BRPOP listkey1 listkey2 5
# → 1) "listkey1"
# 2) "element1"
```
If no elements are available, the command blocks until an element is pushed to one of the lists or the timeout (in seconds) expires.
## Keyspace Commands
### KEYS
Get all keys matching pattern.
```bash
redis-cli -p $PORT KEYS '*'   # quote the pattern so the shell does not expand it
# → 1) "key1"
# 2) "key2"
```
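Patterns passed to KEYS, and to MATCH in SCAN and HSCAN, are glob-style and the server evaluates them. If HeroDB follows Redis glob rules for the common wildcards, the Rust sketch below shows how `*` and `?` behave. `glob_match` is a hypothetical helper and leaves out `[...]` character classes and `\` escapes.

```rust
/// Match `key` against a glob `pattern` where `*` matches any run of
/// characters and `?` matches exactly one. Redis-style patterns also support
/// `[...]` classes and `\` escapes, which this sketch does not handle.
fn glob_match(pattern: &str, key: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let k: Vec<char> = key.chars().collect();
    let (mut pi, mut ki) = (0, 0);
    let mut star: Option<usize> = None; // index of the most recent '*' in p
    let mut mark = 0; // position in k where that '*' started matching
    while ki < k.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some(pi);
            mark = ki;
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == k[ki]) {
            pi += 1;
            ki += 1;
        } else if let Some(s) = star {
            // Mismatch: let the last '*' swallow one more character and retry.
            pi = s + 1;
            mark += 1;
            ki = mark;
        } else {
            return false;
        }
    }
    // Whatever is left of the pattern must be '*'s, which match the empty string.
    p[pi..].iter().all(|&c| c == '*')
}

fn main() {
    let keys = ["key1", "key2", "user:1", "other"];
    let matched: Vec<&str> = keys.iter().copied().filter(|k| glob_match("k*", k)).collect();
    println!("{matched:?}"); // ["key1", "key2"]
}
```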
### SCAN
Incrementally iterate over keys.
```bash
redis-cli -p $PORT SCAN 0
# → 1) "next_cursor"
# 2) 1) "key1"
# 2) "key2"
```
Options:
- MATCH pattern: Filter keys by pattern
- COUNT number: Suggest number of keys to return
Examples:
```bash
redis-cli -p $PORT SCAN 0 MATCH 'k*'
redis-cli -p $PORT SCAN 0 COUNT 10
redis-cli -p $PORT SCAN 0 MATCH 'k*' COUNT 10
```
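A full SCAN walk starts at cursor 0 and feeds each returned cursor back in until the server returns 0 again. This is Redis cursor semantics, assumed here to apply to HeroDB as well. Redis also notes that a key may be returned more than once, so callers that need unique keys should deduplicate. In the Rust sketch below, `scan_all` is a hypothetical helper, and the closure stands in for a real `SCAN <cursor> COUNT 2` round trip against fake data.

```rust
/// Drive a SCAN-style iteration to completion: start at cursor 0 and keep
/// calling `scan` with the returned cursor until it comes back as 0.
fn scan_all<F>(mut scan: F) -> Vec<String>
where
    F: FnMut(u64) -> (u64, Vec<String>),
{
    let mut cursor = 0;
    let mut keys = Vec::new();
    loop {
        let (next, batch) = scan(cursor);
        keys.extend(batch);
        if next == 0 {
            break;
        }
        cursor = next;
    }
    keys
}

fn main() {
    // Stand-in for `SCAN <cursor> COUNT 2` against a 5-key database (fake data).
    let all: Vec<String> = (1..=5).map(|i| format!("key{i}")).collect();
    let keys = scan_all(|cursor| {
        let start = cursor as usize;
        let end = (start + 2).min(all.len());
        let next = if end == all.len() { 0 } else { end as u64 };
        (next, all[start..end].to_vec())
    });
    println!("{keys:?}"); // ["key1", "key2", "key3", "key4", "key5"]
}
```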
### DBSIZE
Get number of keys in current database.
```bash
redis-cli -p $PORT DBSIZE
# → number of keys
```
### FLUSHDB
Remove all keys from current database.
```bash
redis-cli -p $PORT FLUSHDB
# → OK
```
## Configuration Commands
### CONFIG GET
Get configuration parameter.
```bash
redis-cli -p $PORT CONFIG GET dir
# → 1) "dir"
# 2) "/path/to/db"
redis-cli -p $PORT CONFIG GET dbfilename
# → 1) "dbfilename"
# 2) "0.db"
```
## Client Commands
### CLIENT SETNAME
Set current connection name.
```bash
redis-cli -p $PORT CLIENT SETNAME myconnection
# → OK
```
### CLIENT GETNAME
Get current connection name.
```bash
redis-cli -p $PORT CLIENT GETNAME
# → myconnection
```
## Transaction Commands
### MULTI
Start a transaction block.
```bash
redis-cli -p $PORT MULTI
# → OK
```
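### EXEC
Execute all commands in the transaction block. A transaction is tied to one connection, but every `redis-cli -p $PORT <command>` call opens a new connection. MULTI, the queued commands, and EXEC must therefore be sent in a single session, for example by piping them into one redis-cli process:
```bash
printf 'MULTI\nSET key1 value1\nSET key2 value2\nEXEC\n' | redis-cli -p $PORT
# → a reply for MULTI and for each queued command, then the EXEC result:
#   1) OK
#   2) OK
```
### DISCARD
Discard all commands in the transaction block. Like EXEC, it must be sent on the connection that issued MULTI.
```bash
printf 'MULTI\nSET key1 value1\nDISCARD\n' | redis-cli -p $PORT
# → the last reply, for DISCARD, is OK
```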
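A transaction only takes effect when MULTI, the queued commands, and EXEC all go over one connection. The std-only Rust sketch below writes all of them to a single `Write` target, which could be a `TcpStream` connected to HeroDB. Here a `Vec<u8>` stands in for the connection so the bytes can be inspected. `write_command` and `write_transaction` are hypothetical helpers.

```rust
use std::io::{self, Write};

/// Write one command as a RESP array of bulk strings.
fn write_command<W: Write>(w: &mut W, args: &[&str]) -> io::Result<()> {
    write!(w, "*{}\r\n", args.len())?;
    for a in args {
        // Length is in bytes, as RESP requires.
        write!(w, "${}\r\n{}\r\n", a.len(), a)?;
    }
    Ok(())
}

/// Queue `commands` between MULTI and EXEC on the same writer, which is what
/// makes them one transaction.
fn write_transaction<W: Write>(w: &mut W, commands: &[&[&str]]) -> io::Result<()> {
    write_command(w, &["MULTI"])?;
    for cmd in commands {
        write_command(w, cmd)?;
    }
    write_command(w, &["EXEC"])
}

fn main() -> io::Result<()> {
    // With a live server this could be `TcpStream::connect(("127.0.0.1", 6381))?`.
    let mut buf: Vec<u8> = Vec::new();
    write_transaction(&mut buf, &[&["SET", "key1", "value1"], &["SET", "key2", "value2"]])?;
    print!("{}", String::from_utf8_lossy(&buf));
    Ok(())
}
```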
## AGE Commands
### AGE GENENC
Generate ephemeral encryption keypair.
```bash
redis-cli -p $PORT AGE GENENC
# → 1) "recipient_public_key"
# 2) "identity_secret_key"
```
### AGE ENCRYPT
Encrypt message with recipient public key.
```bash
redis-cli -p $PORT AGE ENCRYPT recipient_public_key "message"
# → base64_encoded_ciphertext
```
### AGE DECRYPT
Decrypt ciphertext with identity secret key.
```bash
redis-cli -p $PORT AGE DECRYPT identity_secret_key base64_encoded_ciphertext
# → decrypted_message
```
### AGE GENSIGN
Generate ephemeral signing keypair.
```bash
redis-cli -p $PORT AGE GENSIGN
# → 1) "verify_public_key"
# 2) "sign_secret_key"
```
### AGE SIGN
Sign message with signing secret key.
```bash
redis-cli -p $PORT AGE SIGN sign_secret_key "message"
# → base64_encoded_signature
```
### AGE VERIFY
Verify signature with verify public key.
```bash
redis-cli -p $PORT AGE VERIFY verify_public_key "message" base64_encoded_signature
# → 1 (valid) or 0 (invalid)
```
### AGE KEYGEN
Generate and persist named encryption keypair.
```bash
redis-cli -p $PORT AGE KEYGEN keyname
# → 1) "recipient_public_key"
# 2) "identity_secret_key"
```
### AGE SIGNKEYGEN
Generate and persist named signing keypair.
```bash
redis-cli -p $PORT AGE SIGNKEYGEN keyname
# → 1) "verify_public_key"
# 2) "sign_secret_key"
```
### AGE ENCRYPTNAME
Encrypt message with named key.
```bash
redis-cli -p $PORT AGE ENCRYPTNAME keyname "message"
# → base64_encoded_ciphertext
```
### AGE DECRYPTNAME
Decrypt ciphertext with named key.
```bash
redis-cli -p $PORT AGE DECRYPTNAME keyname base64_encoded_ciphertext
# → decrypted_message
```
### AGE SIGNNAME
Sign message with named signing key.
```bash
redis-cli -p $PORT AGE SIGNNAME keyname "message"
# → base64_encoded_signature
```
### AGE VERIFYNAME
Verify signature with named verify key.
```bash
redis-cli -p $PORT AGE VERIFYNAME keyname "message" base64_encoded_signature
# → 1 (valid) or 0 (invalid)
```
### AGE LIST
List all stored AGE keys.
```bash
redis-cli -p $PORT AGE LIST
# → 1) "keyname1"
# 2) "keyname2"
```
## Server Information Commands
### INFO
Get server information.
```bash
redis-cli -p $PORT INFO
# → Server information
```
With section:
```bash
redis-cli -p $PORT INFO replication
# → Replication information
```
### COMMAND
Get command information (stub implementation).
```bash
redis-cli -p $PORT COMMAND
# → Empty array (stub)
```
## Database Selection
### SELECT
Select database by index.
```bash
redis-cli -p $PORT SELECT 0
# → OK
```

docs/cmds.md
## Backend Support
HeroDB supports two storage backends, both with full encryption support:
- **redb** (default): Full-featured, optimized for production use
- **sled**: Alternative embedded database with encryption support
### Starting HeroDB with Different Backends
```bash
# Use default redb backend
./target/release/herodb --dir /tmp/herodb_redb --port 6379
# Use sled backend
./target/release/herodb --dir /tmp/herodb_sled --port 6379 --sled
# Use redb with encryption
./target/release/herodb --dir /tmp/herodb_encrypted --port 6379 --encrypt --key mysecretkey
# Use sled with encryption
./target/release/herodb --dir /tmp/herodb_sled_encrypted --port 6379 --sled --encrypt --key mysecretkey
```
### Command Support by Backend
Command Category | redb | sled | Notes |
|-----------------|------|------|-------|
**Strings** | | | |
SET | ✅ | ✅ | Full support |
GET | ✅ | ✅ | Full support |
DEL | ✅ | ✅ | Full support |
EXISTS | ✅ | ✅ | Full support |
INCR/DECR | ✅ | ✅ | Full support |
MGET/MSET | ✅ | ✅ | Full support |
**Hashes** | | | |
HSET | ✅ | ✅ | Full support |
HGET | ✅ | ✅ | Full support |
HGETALL | ✅ | ✅ | Full support |
HDEL | ✅ | ✅ | Full support |
HEXISTS | ✅ | ✅ | Full support |
HKEYS | ✅ | ✅ | Full support |
HVALS | ✅ | ✅ | Full support |
HLEN | ✅ | ✅ | Full support |
HMGET | ✅ | ✅ | Full support |
HSETNX | ✅ | ✅ | Full support |
HINCRBY/HINCRBYFLOAT | ✅ | ✅ | Full support |
HSCAN | ✅ | ✅ | Full support with pattern matching |
**Lists** | | | |
LPUSH/RPUSH | ✅ | ✅ | Full support |
LPOP/RPOP | ✅ | ✅ | Full support |
LLEN | ✅ | ✅ | Full support |
LRANGE | ✅ | ✅ | Full support |
LINDEX | ✅ | ✅ | Full support |
LTRIM | ✅ | ✅ | Full support |
LREM | ✅ | ✅ | Full support |
BLPOP/BRPOP | ✅ | ❌ | Blocking operations not in sled |
**Expiration** | | | |
EXPIRE | ✅ | ✅ | Full support in both |
TTL | ✅ | ✅ | Full support in both |
PERSIST | ✅ | ✅ | Full support in both |
SETEX/PSETEX | ✅ | ✅ | Full support in both |
EXPIREAT/PEXPIREAT | ✅ | ✅ | Full support in both |
**Scanning** | | | |
KEYS | ✅ | ✅ | Full support with patterns |
SCAN | ✅ | ✅ | Full cursor-based iteration |
HSCAN | ✅ | ✅ | Full cursor-based iteration |
**Transactions** | | | |
MULTI/EXEC/DISCARD | ✅ | ❌ | Only supported in redb |
**Encryption** | | | |
Data-at-rest encryption | ✅ | ✅ | Both support [age](age.tech) encryption |
AGE commands | ✅ | ✅ | Both support AGE crypto commands |
**Full-Text Search** | | | |
FT.CREATE | ✅ | ✅ | Create search index with schema |
FT.ADD | ✅ | ✅ | Add document to search index |
FT.SEARCH | ✅ | ✅ | Search documents with query |
FT.DEL | ✅ | ✅ | Delete document from index |
FT.INFO | ✅ | ✅ | Get index information |
FT.DROP | ✅ | ✅ | Drop search index |
FT.ALTER | ✅ | ✅ | Alter index schema |
FT.AGGREGATE | ✅ | ✅ | Aggregate search results |
### Performance Considerations
- **redb**: Optimized for concurrent access, better for high-throughput scenarios
- **sled**: Lock-free architecture, excellent for specific workloads
### Encryption Features
Both backends support:
- Transparent data-at-rest encryption using the `age` encryption library
- Per-database encryption (databases >= 10 are encrypted when `--encrypt` flag is used)
- Secure key derivation using the master key
### Backend Selection Examples
```bash
# Example: one server per backend, each with its own data directory and port
./target/release/herodb --dir /tmp/herodb_redb --port 6379 --encrypt --key secret123 &
./target/release/herodb --dir /tmp/herodb_sled --port 6381 --sled --encrypt --key secret123 &
# Both backends accept the same Redis commands
redis-cli -p 6379 SET mykey "redb value"
redis-cli -p 6381 SET mykey "sled value"
redis-cli -p 6379 HSET user:1 name "Alice" age "30"
redis-cli -p 6381 HSET user:1 name "Alice" age "30"
# Both support SCAN (quote the pattern so the shell does not expand it)
redis-cli -p 6379 SCAN 0 MATCH 'user:*' COUNT 10
redis-cli -p 6381 SCAN 0 MATCH 'user:*' COUNT 10
```
### Migration Between Backends
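The usual Redis shortcuts need care here. `redis-cli --rdb` fetches a snapshot through Redis replication (SYNC), which is not among HeroDB's documented commands. `redis-cli --pipe` expects Redis protocol (RESP) commands on stdin, not an RDB file. A workable approach is to read the data from the source server (for example with SCAN plus GET/HGETALL/LRANGE), write it out as RESP-encoded commands, and replay that file into the target:
```bash
# Import into sled: commands.resp holds RESP-encoded SET/HSET/RPUSH commands
# generated from the redb server's data
redis-cli -p 6381 --pipe < commands.resp
```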
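`redis-cli --pipe` reads RESP-encoded commands from stdin. Below is a minimal std-only Rust sketch of the import step that assumes string keys only. `write_set` is a hypothetical helper, and the key/value pairs are placeholders for data read from the source server. Hashes and lists would need HSET or RPUSH commands built the same way. Redirect the program's output to a file, for example `cargo run > commands.resp`, and pipe that file into the target server.

```rust
use std::io::{self, BufWriter, Write};

/// Write one `SET key value` command in RESP form, ready for `redis-cli --pipe`.
fn write_set<W: Write>(w: &mut W, key: &str, value: &str) -> io::Result<()> {
    let args = ["SET", key, value];
    write!(w, "*{}\r\n", args.len())?;
    for a in args {
        // Byte lengths keep values with spaces or newlines intact.
        write!(w, "${}\r\n{}\r\n", a.len(), a)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Placeholder data; in practice these pairs come from SCAN + GET on the source.
    let pairs = [("mykey", "redb value"), ("user:1:name", "Alice")];
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    for (k, v) in pairs {
        write_set(&mut out, k, v)?;
    }
    out.flush()
}
```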

docs/search.md
# Full-Text Search with Tantivy
HeroDB includes powerful full-text search capabilities powered by [Tantivy](https://github.com/quickwit-oss/tantivy), a fast full-text search engine library written in Rust. This provides Redis-compatible search commands similar to RediSearch.
## Overview
The search functionality allows you to:
- Create search indexes with custom schemas
- Index documents with multiple field types
- Perform complex queries with filters
- Support for text, numeric, date, and geographic data
- Real-time search with high performance
## Search Commands
### FT.CREATE - Create Search Index
Create a new search index with a defined schema.
```bash
FT.CREATE index_name SCHEMA field_name field_type [options] [field_name field_type [options] ...]
```
**Field Types:**
- `TEXT` - Full-text searchable text fields
- `NUMERIC` - Numeric fields (integers, floats)
- `TAG` - Tag fields for exact matching
- `GEO` - Geographic coordinates (lat,lon)
- `DATE` - Date/timestamp fields
**Field Options:**
- `STORED` - Store field value for retrieval
- `INDEXED` - Make field searchable
- `TOKENIZED` - Enable tokenization for text fields
- `FAST` - Enable fast access for numeric fields
**Example:**
```bash
# Create a product search index
FT.CREATE products SCHEMA title TEXT STORED INDEXED TOKENIZED description TEXT STORED INDEXED TOKENIZED price NUMERIC STORED INDEXED FAST category TAG STORED location GEO STORED created_date DATE STORED INDEXED
```
### FT.ADD - Add Document to Index
Add a document to a search index.
```bash
FT.ADD index_name doc_id [SCORE score] FIELDS field_name field_value [field_name field_value ...]
```
**Example:**
```bash
# Add a product document
FT.ADD products product:1 SCORE 1.0 FIELDS title "Wireless Headphones" description "High-quality wireless headphones with noise cancellation" price 199.99 category "electronics" location "37.7749,-122.4194" created_date 1640995200000
```
### FT.SEARCH - Search Documents
Search for documents in an index.
```bash
FT.SEARCH index_name query [LIMIT offset count] [FILTER field min max] [RETURN field [field ...]]
```
**Query Syntax:**
- Simple terms: `wireless headphones`
- Phrase queries: `"noise cancellation"`
- Field-specific: `title:wireless`
- Boolean operators: `wireless AND headphones`
- Wildcards: `head*`
**Examples:**
```bash
# Simple text search
FT.SEARCH products "wireless headphones"
# Search with filters
FT.SEARCH products "headphones" FILTER price 100 300 LIMIT 0 10
# Field-specific search
FT.SEARCH products "title:wireless AND category:electronics"
# Return specific fields only
FT.SEARCH products "*" RETURN title price
```
### FT.DEL - Delete Document
Remove a document from the search index.
```bash
FT.DEL index_name doc_id
```
**Example:**
```bash
FT.DEL products product:1
```
### FT.INFO - Get Index Information
Get information about a search index.
```bash
FT.INFO index_name
```
**Returns:**
- Index name and document count
- Field definitions and types
- Index configuration
**Example:**
```bash
FT.INFO products
```
### FT.DROP - Drop Index
Delete an entire search index.
```bash
FT.DROP index_name
```
**Example:**
```bash
FT.DROP products
```
### FT.ALTER - Alter Index Schema
Add new fields to an existing index.
```bash
FT.ALTER index_name SCHEMA ADD field_name field_type [options]
```
**Example:**
```bash
FT.ALTER products SCHEMA ADD brand TAG STORED
```
### FT.AGGREGATE - Aggregate Search Results
Perform aggregations on search results.
```bash
FT.AGGREGATE index_name query [GROUPBY field] [REDUCE function field AS alias]
```
**Example:**
```bash
# Group products by category and count
FT.AGGREGATE products "*" GROUPBY category REDUCE COUNT 0 AS count
```
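Conceptually, `GROUPBY category REDUCE COUNT 0 AS count` produces one row per distinct category, together with the number of matching documents. The Rust sketch below computes the same counts over in-memory `(doc_id, category)` pairs. It only illustrates the semantics and does not reproduce HeroDB's reply layout. `count_by` is a hypothetical helper.

```rust
use std::collections::BTreeMap;

/// Count documents per category, like GROUPBY category REDUCE COUNT 0.
fn count_by<'a>(docs: &[(&'a str, &'a str)]) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for &(_doc_id, category) in docs {
        *counts.entry(category).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Fake documents: (doc_id, category)
    let docs = [
        ("product:1", "electronics"),
        ("product:2", "electronics"),
        ("product:3", "books"),
    ];
    println!("{:?}", count_by(&docs)); // {"books": 1, "electronics": 2}
}
```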
## Field Types in Detail
### TEXT Fields
- **Purpose**: Full-text search on natural language content
- **Features**: Tokenization, stemming, stop-word removal
- **Options**: `STORED`, `INDEXED`, `TOKENIZED`
- **Example**: Product titles, descriptions, content
### NUMERIC Fields
- **Purpose**: Numeric data for range queries and sorting
- **Types**: I64, U64, F64
- **Options**: `STORED`, `INDEXED`, `FAST`
- **Example**: Prices, quantities, ratings
### TAG Fields
- **Purpose**: Exact-match categorical data
- **Features**: No tokenization, exact string matching
- **Options**: `STORED`, case sensitivity control
- **Example**: Categories, brands, status values
### GEO Fields
- **Purpose**: Geographic coordinates
- **Format**: "latitude,longitude" (e.g., "37.7749,-122.4194")
- **Features**: Geographic distance queries
- **Options**: `STORED`
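GEO values are plain strings with latitude first, so swapping the two numbers is an easy mistake. The Rust sketch below parses and range-checks a value before it is sent. Whether HeroDB validates ranges itself is not documented, so treat these checks as a client-side assumption. `parse_geo` is a hypothetical helper.

```rust
/// Parse a GEO field value in "latitude,longitude" form and check that
/// latitude is within [-90, 90] and longitude within [-180, 180].
fn parse_geo(value: &str) -> Result<(f64, f64), String> {
    let (lat, lon) = value.split_once(',').ok_or("expected \"latitude,longitude\"")?;
    let lat: f64 = lat.trim().parse().map_err(|e| format!("bad latitude: {e}"))?;
    let lon: f64 = lon.trim().parse().map_err(|e| format!("bad longitude: {e}"))?;
    if !(-90.0..=90.0).contains(&lat) {
        return Err(format!("latitude {lat} out of range"));
    }
    if !(-180.0..=180.0).contains(&lon) {
        return Err(format!("longitude {lon} out of range"));
    }
    Ok((lat, lon))
}

fn main() {
    println!("{:?}", parse_geo("37.7749,-122.4194")); // Ok((37.7749, -122.4194))
    println!("{:?}", parse_geo("-122.4194,37.7749").is_err()); // true: order swapped
}
```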
### DATE Fields
- **Purpose**: Timestamp and date data
- **Format**: Unix timestamp in milliseconds
- **Features**: Range queries, temporal filtering
- **Options**: `STORED`, `INDEXED`, `FAST`
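DATE values are Unix timestamps in milliseconds. For example, the 1640995200000 used in this document is 2022-01-01 00:00:00 UTC. The std-only Rust sketch below converts a calendar date to that format using Howard Hinnant's days-from-civil algorithm; a date/time crate would do the same job. `days_from_civil` and `unix_millis` are hypothetical helpers.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's days_from_civil algorithm).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year }; // years start in March
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (month + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + day - 1; // day of year, 0..=365
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Midnight UTC of the given date as Unix milliseconds (the DATE field format).
fn unix_millis(year: i64, month: i64, day: i64) -> i64 {
    days_from_civil(year, month, day) * 86_400_000
}

fn main() {
    println!("{}", unix_millis(2022, 1, 1)); // 1640995200000
    println!("{}", unix_millis(2023, 1, 1)); // 1672531200000
}
```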
## Search Query Syntax
### Basic Queries
```bash
# Single term
FT.SEARCH products "wireless"
# Multiple terms (AND by default)
FT.SEARCH products "wireless headphones"
# Phrase query
FT.SEARCH products "\"noise cancellation\""
```
### Field-Specific Queries
```bash
# Search in specific field
FT.SEARCH products "title:wireless"
# Multiple field queries
FT.SEARCH products "title:wireless AND description:bluetooth"
```
### Boolean Operators
```bash
# AND operator
FT.SEARCH products "wireless AND headphones"
# OR operator
FT.SEARCH products "wireless OR bluetooth"
# NOT operator
FT.SEARCH products "headphones NOT wired"
```
### Wildcards and Fuzzy Search
```bash
# Wildcard search
FT.SEARCH products "head*"
# Fuzzy search (approximate matching)
FT.SEARCH products "%headphone%"
```
### Range Queries
```bash
# Numeric range in query
FT.SEARCH products "@price:[100 300]"
# Date range
FT.SEARCH products "@created_date:[1640995200000 1672531200000]"
```
## Filtering and Sorting
### FILTER Clause
```bash
# Numeric filter
FT.SEARCH products "headphones" FILTER price 100 300
# Multiple filters (assumes the index also defines a numeric `rating` field)
FT.SEARCH products "*" FILTER price 100 500 FILTER rating 4 5
```
### LIMIT Clause
```bash
# Pagination
FT.SEARCH products "wireless" LIMIT 0 10 # First 10 results
FT.SEARCH products "wireless" LIMIT 10 10 # Next 10 results
```
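Paging with LIMIT is plain arithmetic: page `n` (0-based) with page size `s` starts at offset `n * s`. The Rust sketch below uses hypothetical helpers. The total hit count is an assumed input, since this document does not say how HeroDB reports totals, and `page_size` must be non-zero.

```rust
/// LIMIT arguments (offset, count) for a 0-based page of `page_size` results.
fn limit_for_page(page: u64, page_size: u64) -> (u64, u64) {
    (page * page_size, page_size)
}

/// Number of pages needed to show `total` results, rounding up.
/// Assumes page_size > 0.
fn page_count(total: u64, page_size: u64) -> u64 {
    (total + page_size - 1) / page_size
}

fn main() {
    let total = 25; // hypothetical total hit count
    for page in 0..page_count(total, 10) {
        let (offset, count) = limit_for_page(page, 10);
        println!("FT.SEARCH products \"wireless\" LIMIT {offset} {count}");
    }
}
```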
### RETURN Clause
```bash
# Return specific fields
FT.SEARCH products "*" RETURN title price
# Return all stored fields (default)
FT.SEARCH products "*"
```
## Performance Considerations
### Indexing Strategy
- Only index fields you need to search on
- Use `FAST` option for frequently filtered numeric fields
- Consider storage vs. search performance trade-offs
### Query Optimization
- Use specific field queries when possible
- Combine filters with text queries for better performance
- Use pagination with LIMIT for large result sets
### Memory Usage
- Tantivy indexes are memory-mapped for performance
- Index size depends on document count and field configuration
- Monitor disk space for index storage
## Integration with Redis Commands
Search indexes work alongside regular Redis data:
```bash
# Store product data in Redis hash
HSET product:1 title "Wireless Headphones" price "199.99"
# Index the same data for search
FT.ADD products product:1 FIELDS title "Wireless Headphones" price 199.99
# Search returns document IDs that can be used with Redis commands
FT.SEARCH products "wireless"
# Returns: product:1
# Retrieve full data using Redis
HGETALL product:1
```
## Example Use Cases
### E-commerce Product Search
```bash
# Create product catalog index
FT.CREATE catalog SCHEMA name TEXT STORED INDEXED TOKENIZED description TEXT INDEXED TOKENIZED price NUMERIC STORED INDEXED FAST category TAG STORED brand TAG STORED rating NUMERIC STORED FAST
# Add products
FT.ADD catalog prod:1 FIELDS name "iPhone 14" price 999 category "phones" brand "apple" rating 4.5
FT.ADD catalog prod:2 FIELDS name "Samsung Galaxy" price 899 category "phones" brand "samsung" rating 4.3
# Search queries
FT.SEARCH catalog "iPhone"
FT.SEARCH catalog "phones" FILTER price 800 1000
FT.SEARCH catalog "@brand:apple"
```
### Content Management
```bash
# Create content index
FT.CREATE content SCHEMA title TEXT STORED INDEXED TOKENIZED body TEXT INDEXED TOKENIZED author TAG STORED published DATE STORED INDEXED tags TAG STORED
# Search content
FT.SEARCH content "machine learning"
FT.SEARCH content "@author:john AND @tags:ai"
FT.SEARCH content "*" FILTER published 1640995200000 1672531200000
```
### Geographic Search
```bash
# Create location-based index
FT.CREATE places SCHEMA name TEXT STORED INDEXED TOKENIZED location GEO STORED type TAG STORED
# Add locations
FT.ADD places place:1 FIELDS name "Golden Gate Bridge" location "37.8199,-122.4783" type "landmark"
# Geographic queries (future feature)
FT.SEARCH places "@location:[37.7749 -122.4194 10 km]"
```
## Error Handling
Common error responses:
- `ERR index not found` - Index doesn't exist
- `ERR field not found` - Field not defined in schema
- `ERR invalid query syntax` - Malformed query
- `ERR document not found` - Document ID doesn't exist
## Best Practices
1. **Schema Design**: Plan your schema carefully - changes require reindexing
2. **Field Selection**: Only store and index fields you actually need
3. **Batch Operations**: Add multiple documents efficiently
4. **Query Testing**: Test queries for performance with realistic data
5. **Monitoring**: Monitor index size and query performance
6. **Backup**: Include search indexes in backup strategies
## Future Enhancements
Planned features:
- Geographic distance queries
- Advanced aggregations and faceting
- Highlighting of search results
- Synonyms and custom analyzers
- Real-time suggestions and autocomplete
- Index replication and sharding