Port HeroLib OurDB Package from V to Rust #1

Closed
opened 2025-04-09 08:26:18 +00:00 by timur · 2 comments
Owner

Port HeroLib OurDB Package from V to Rust

Overview

This issue involves porting the ourdb package from the HeroLib V codebase to a Rust implementation. OurDB is a lightweight, efficient key-value database implementation that provides data persistence with history tracking capabilities. The port should maintain all existing functionality while leveraging Rust's memory safety, performance, and ecosystem.

What is OurDB?

OurDB is a simple yet powerful key-value database designed for scenarios where you need fast key-value storage with the ability to track changes over time. It features a layered architecture with a frontend API, lookup table for efficient key-to-location mapping, and backend storage for actual data persistence.

Key characteristics:

  • Efficient key-value storage with history tracking
  • Data integrity verification using CRC32 checksums
  • Support for multiple backend files
  • Configurable record sizes and counts
  • Memory and disk-based lookup tables
  • Optional incremental ID mode

Current Implementation Details

Architecture

OurDB consists of three main components working together in a layered architecture:

  1. Frontend (db.v)

    • Provides the public API for database operations
    • Handles high-level operations (set, get, delete, history)
    • Coordinates between lookup and backend components
    • Supports both key-value and incremental ID modes
  2. Lookup Table (lookup.v)

    • Maps keys to physical locations in the backend storage
    • Supports both memory and disk-based lookup tables
    • Automatically optimizes key sizes based on database configuration
    • Handles sparse data efficiently
    • Provides next ID generation for incremental mode
  3. Backend Storage (backend.v)

    • Manages the actual data storage in files
    • Handles data integrity with CRC32 checksums
    • Supports multiple file backends for large datasets
    • Implements the low-level read/write operations

Core Data Structures

The V implementation consists of these primary structures:

  1. OurDB: The main database structure
@[heap]
pub struct OurDB {
mut:
    lookup &LookupTable
pub:
    path             string // directory for storage
    incremental_mode bool
    file_size        u32 = 500 * (1 << 20) // 500MB
pub mut:
    file              os.File
    file_nr           u16 // the file which is open
    last_used_file_nr u16
}
  2. LookupTable: Maps keys to physical storage locations
pub struct LookupTable {
    keysize    u8
    lookuppath string
mut:
    data        []u8
    incremental ?u32 // points to next empty slot if incremental mode is enabled
}
  3. Location: Represents a physical location in the storage
pub struct Location {
pub mut:
    file_nr  u16
    position u32
}

Key Operations

The current implementation provides these core operations:

  1. new(): Creates a new database with specified configuration
  2. set(): Stores data with a specific ID or auto-incremented ID
  3. get(): Retrieves data by ID
  4. get_history(): Retrieves historical values for a specific ID
  5. delete(): Removes data at a specific ID
  6. get_next_id(): Gets the next available ID in incremental mode
  7. save(): Persists the lookup table to disk
  8. load(): Loads the lookup table from disk
  9. close(): Closes the database
  10. destroy(): Deletes the database files

Storage Format

Record Format

Each record in the backend storage includes:

  • 2 bytes: Data size
  • 4 bytes: CRC32 checksum
  • 6 bytes: Previous record location (for history)
  • N bytes: Actual data
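
As a point of reference for the port, a minimal Rust sketch of this 12-byte header (2 + 4 + 6 bytes) follows. It assumes little-endian encoding and a 2-byte file number plus 4-byte offset for the previous-record location; both assumptions should be verified against the V implementation.

// Sketch of the record header layout described above; little-endian is an assumption.
struct Location {
    file_nr: u16,
    position: u32,
}

struct RecordHeader {
    size: u16,      // 2 bytes: length of the data that follows
    crc32: u32,     // 4 bytes: checksum of the data
    prev: Location, // 6 bytes: previous version of this record (for history)
}

impl RecordHeader {
    fn encode(&self) -> [u8; 12] {
        let mut buf = [0u8; 12];
        buf[0..2].copy_from_slice(&self.size.to_le_bytes());
        buf[2..6].copy_from_slice(&self.crc32.to_le_bytes());
        buf[6..8].copy_from_slice(&self.prev.file_nr.to_le_bytes());
        buf[8..12].copy_from_slice(&self.prev.position.to_le_bytes());
        buf
    }

    fn decode(buf: &[u8; 12]) -> Self {
        RecordHeader {
            size: u16::from_le_bytes([buf[0], buf[1]]),
            crc32: u32::from_le_bytes([buf[2], buf[3], buf[4], buf[5]]),
            prev: Location {
                file_nr: u16::from_le_bytes([buf[6], buf[7]]),
                position: u32::from_le_bytes([buf[8], buf[9], buf[10], buf[11]]),
            },
        }
    }
}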

Lookup Table Optimization

The lookup table automatically optimizes its key size based on the database configuration:

  • 2 bytes: For databases with < 65,536 records
  • 3 bytes: For databases with < 16,777,216 records
  • 4 bytes: For databases with < 4,294,967,296 records
  • 6 bytes: For large databases requiring multiple files
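
A rough sketch of how the port could pick the key size from these thresholds (record_nr_max and multiple_files are assumed configuration inputs, not the actual V parameter names):

// Illustrative key-size selection mirroring the thresholds above.
fn keysize_for(record_nr_max: u64, multiple_files: bool) -> u8 {
    if multiple_files {
        6 // location must also encode a file number
    } else if record_nr_max < 65_536 {
        2
    } else if record_nr_max < 16_777_216 {
        3
    } else {
        4
    }
}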

File Management

  • Supports splitting data across multiple files when needed
  • Each file is limited to 500MB by default (configurable)
  • Automatic file selection based on record location
  • Files are created as needed with format: ${path}/${file_nr}.db
  • Lookup table state is persisted in ${path}/lookup_dump.db
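
A minimal sketch of this naming convention in Rust, assuming std::path is used for path handling:

use std::path::{Path, PathBuf};

// Backend data files follow the ${path}/${file_nr}.db convention;
// the lookup table dump is stored alongside them.
fn data_file_path(base: &Path, file_nr: u16) -> PathBuf {
    base.join(format!("{}.db", file_nr))
}

fn lookup_dump_path(base: &Path) -> PathBuf {
    base.join("lookup_dump.db")
}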

Implementation Details

Data Storage and Retrieval

Storage Process (set)

  1. Determine the ID: auto-generate the next one in incremental mode, otherwise use the provided ID
  2. Get the storage location from the lookup table
  3. Calculate CRC32 checksum for data integrity
  4. Store previous location for history tracking
  5. Write the data to the backend file
  6. Update the lookup table with the new location
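
A hedged sketch of the write side of this flow, reduced to appending one record (header + data) to an already-open backend file. crc32fast is the checksum crate suggested under Resources; the function name and layout details are illustrative, not the actual implementation.

use std::fs::File;
use std::io::{Seek, SeekFrom, Write};

// Illustrative append of a single record following the steps above.
// `prev` is the (file_nr, position) of the previous version, kept for history.
fn append_record(file: &mut File, prev: (u16, u32), data: &[u8]) -> std::io::Result<u64> {
    let position = file.seek(SeekFrom::End(0))?;          // new record goes at the end
    let crc = crc32fast::hash(data);                       // step 3: CRC32 checksum
    file.write_all(&(data.len() as u16).to_le_bytes())?;   // 2-byte size
    file.write_all(&crc.to_le_bytes())?;                   // 4-byte checksum
    file.write_all(&prev.0.to_le_bytes())?;                // 2-byte previous file_nr
    file.write_all(&prev.1.to_le_bytes())?;                // 4-byte previous position
    file.write_all(data)?;                                 // N bytes of payload
    Ok(position)                                           // caller records this in the lookup table
}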

Retrieval Process (get)

  1. Get the location from the lookup table using the ID
  2. Read the record header from the backend file
  3. Extract size and CRC32 from the header
  4. Read the actual data
  5. Verify data integrity using the CRC32 checksum
  6. Return the data if valid
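
The read side, sketched under the same assumptions (little-endian header, single open file); it returns the payload plus the previous-record location needed for history walks:

use std::fs::File;
use std::io::{Error, ErrorKind, Read, Seek, SeekFrom};

// Illustrative read of one record at a known position, with CRC verification.
fn read_record(file: &mut File, position: u64) -> std::io::Result<(Vec<u8>, (u16, u32))> {
    let mut header = [0u8; 12];
    file.seek(SeekFrom::Start(position))?;
    file.read_exact(&mut header)?;

    let size = u16::from_le_bytes([header[0], header[1]]) as usize;
    let stored_crc = u32::from_le_bytes([header[2], header[3], header[4], header[5]]);
    let prev_file_nr = u16::from_le_bytes([header[6], header[7]]);
    let prev_position = u32::from_le_bytes([header[8], header[9], header[10], header[11]]);

    let mut data = vec![0u8; size];
    file.read_exact(&mut data)?;

    // Verify integrity before returning the data.
    if crc32fast::hash(&data) != stored_crc {
        return Err(Error::new(ErrorKind::InvalidData, "CRC32 mismatch"));
    }
    Ok((data, (prev_file_nr, prev_position)))
}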

History Tracking (get_history)

  1. Start with the current location
  2. Retrieve the data at that location
  3. Extract the previous location from the header
  4. Repeat until reaching the specified depth or no more history
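
Building on the read_record sketch above, a possible history walk looks like this; it simplifies to a single backend file and treats a zeroed previous position as the end of the chain, both assumptions to confirm against the V code.

// Illustrative history walk following the previous-location pointers.
fn get_history(
    file: &mut std::fs::File,
    mut position: u64,
    depth: u8,
) -> std::io::Result<Vec<Vec<u8>>> {
    let mut versions = Vec::new();
    for _ in 0..depth {
        let (data, (_prev_file_nr, prev_position)) = read_record(file, position)?;
        versions.push(data);
        if prev_position == 0 {
            break; // assumed sentinel: no older version recorded
        }
        position = prev_position as u64;
    }
    Ok(versions)
}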

Lookup Table Management

The lookup table maps IDs to physical locations in the storage files. It supports:

  1. Memory-based lookup: Stores the entire mapping in memory
  2. Disk-based lookup: Stores the mapping on disk for persistence
  3. Sparse export/import: Efficiently stores only non-empty entries
  4. Incremental ID generation: Automatically assigns sequential IDs
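
For the memory-based variant, the mapping can stay a flat byte array indexed by ID, as in the V data []u8 field. A sketch for the 6-byte entry case (2-byte file_nr + 4-byte position) follows; the 2/3/4-byte variants only change the slot width, and the type name here is illustrative.

// Illustrative memory-based lookup with fixed-size 6-byte slots per ID.
struct MemLookup {
    entry_size: usize, // keysize; 6 in this sketch
    data: Vec<u8>,
}

impl MemLookup {
    fn set(&mut self, id: u32, file_nr: u16, position: u32) {
        let s = id as usize * self.entry_size;
        self.data[s..s + 2].copy_from_slice(&file_nr.to_le_bytes());
        self.data[s + 2..s + 6].copy_from_slice(&position.to_le_bytes());
    }

    fn get(&self, id: u32) -> (u16, u32) {
        let s = id as usize * self.entry_size;
        let file_nr = u16::from_le_bytes([self.data[s], self.data[s + 1]]);
        let position = u32::from_le_bytes([
            self.data[s + 2], self.data[s + 3], self.data[s + 4], self.data[s + 5],
        ]);
        (file_nr, position)
    }
}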

File Management

OurDB supports multiple storage files:

  1. Automatically creates new files when current file reaches size limit
  2. Selects appropriate file for read/write operations
  3. Handles file opening/closing as needed
  4. Supports up to 65,536 files per database

Requirements for Rust Port

The Rust implementation should:

  1. Maintain API Compatibility: Provide equivalent functionality to the V implementation
  2. Preserve Storage Format: Maintain compatibility with existing OurDB files
  3. Performance: Maintain or improve the performance characteristics of the V implementation
  4. Memory Safety: Leverage Rust's ownership model for memory safety
  5. Error Handling: Use Rust's Result type for proper error handling
  6. Documentation: Include comprehensive documentation and examples
  7. Testing: Port existing tests and add new ones as needed

Proposed Rust Structure

// Core data structures
pub struct OurDB {
    path: String,
    incremental_mode: bool,
    file_size: u32,
    lookup: LookupTable,
    file: Option<std::fs::File>,
    file_nr: u16,
    last_used_file_nr: u16,
}

pub struct LookupTable {
    keysize: u8,
    lookuppath: String,
    data: Vec<u8>,
    incremental: Option<u32>,
}

pub struct Location {
    file_nr: u16,
    position: u32,
}

// Configuration
pub struct OurDBConfig {
    pub record_nr_max: u32,
    pub record_size_max: u32,
    pub file_size: u32,
    pub path: String,
    pub incremental_mode: bool,
    pub reset: bool,
}

// Public API
impl OurDB {
    pub fn new(config: OurDBConfig) -> Result<Self, Error> { ... }
    pub fn set(&mut self, id: Option<u32>, data: &[u8]) -> Result<u32, Error> { ... }
    pub fn get(&mut self, id: u32) -> Result<Vec<u8>, Error> { ... }
    pub fn get_history(&mut self, id: u32, depth: u8) -> Result<Vec<Vec<u8>>, Error> { ... }
    pub fn delete(&mut self, id: u32) -> Result<(), Error> { ... }
    pub fn get_next_id(&mut self) -> Result<u32, Error> { ... }
    pub fn close(&mut self) -> Result<(), Error> { ... }
    pub fn destroy(&mut self) -> Result<(), Error> { ... }
}
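
The Error type referenced in these signatures is left open; one possible shape using the thiserror crate listed under Resources (variants are illustrative):

use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),
    #[error("no record found for id {0}")]
    NotFound(u32),
    #[error("CRC32 mismatch for id {0}")]
    CorruptRecord(u32),
    #[error("invalid configuration: {0}")]
    Config(String),
}

And a hypothetical usage example of the proposed API, to illustrate the intended ergonomics rather than a finalized interface:

fn example() -> Result<(), Error> {
    let mut db = OurDB::new(OurDBConfig {
        record_nr_max: 1_000_000,
        record_size_max: 4096,
        file_size: 500 * (1 << 20),
        path: "/tmp/ourdb".to_string(),
        incremental_mode: true,
        reset: false,
    })?;

    let id = db.set(None, b"hello")?;     // auto-incremented ID
    let value = db.get(id)?;
    assert_eq!(value, b"hello".to_vec());

    let history = db.get_history(id, 5)?; // up to 5 previous versions
    println!("{} version(s) stored", history.len());
    db.close()?;
    Ok(())
}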

Implementation Considerations

  1. File I/O:

    • Use Rust's standard library for file operations
    • Consider using BufReader/BufWriter for improved performance
    • Implement proper error handling for all I/O operations
  2. Memory Management:

    • Use Rust's ownership model to ensure memory safety
    • Minimize unnecessary copying of data
    • Consider using Bytes or similar crates for efficient byte manipulation
  3. Error Handling:

    • Implement a custom Error enum for OurDB-specific errors
    • Use Result for all operations that can fail
    • Provide detailed error messages
  4. Concurrency:

    • Consider thread safety for concurrent access
    • Implement interior mutability patterns where appropriate
    • Potentially use RwLock for shared read access
  5. Performance Optimizations:

    • Consider memory-mapped files for lookup tables
    • Implement caching for frequently accessed records
    • Use zero-copy operations where possible
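
On the memory-mapped lookup idea in item 5, a minimal sketch with the memmap2 crate listed under Resources; the mapping is unsafe because the file must not be resized or modified externally while mapped.

use std::fs::OpenOptions;
use memmap2::MmapMut;

// Illustrative read/write mapping of the lookup dump file.
fn open_lookup_mmap(path: &str) -> std::io::Result<MmapMut> {
    let file = OpenOptions::new().read(true).write(true).open(path)?;
    let mmap = unsafe { MmapMut::map_mut(&file)? };
    Ok(mmap)
}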

Use Cases

OurDB is particularly useful for:

  • Applications requiring simple but efficient data persistence
  • Systems needing to track historical changes to data
  • Embedded applications with limited resources
  • Scenarios where data integrity is critical
  • Use cases requiring both memory and disk-based storage options

Acceptance Criteria

  1. All existing functionality is properly ported
  2. All tests pass with equivalent behavior to the V implementation
  3. API is idiomatic Rust while maintaining functional equivalence
  4. Performance is at least on par with the V implementation
  5. Documentation is comprehensive and includes examples
  6. Code follows Rust best practices and passes clippy lints
  7. Existing OurDB files can be read by the Rust implementation

Resources

  • Original V implementation: github/freeflowuniverse/herolib/lib/data/ourdb
  • Rust documentation: https://doc.rust-lang.org/book/
  • Crates to consider:
    • bytes for efficient byte manipulation
    • thiserror for error handling
    • memmap2 for memory-mapped files
    • crc32fast for CRC32 calculation
Author
Owner

architecture defined: https://git.ourworld.tf/herocode/db/commit/5c5225c8f77d5c435be7fa3463d2115be544255c
Author
Owner

ported with https://git.ourworld.tf/herocode/db/commit/0eedec9ed08be6600f320e7334fc15dc71804a8c
timur closed this issue 2025-04-09 09:40:34 +00:00