Port HeroLib OurDB Package from V to Rust #1

Closed
opened 2025-04-09 08:26:18 +00:00 by timur · 2 comments
Owner

Port HeroLib OurDB Package from V to Rust

Overview

This issue involves porting the ourdb package from the HeroLib V codebase to a Rust implementation. OurDB is a lightweight, efficient key-value database implementation that provides data persistence with history tracking capabilities. The port should maintain all existing functionality while leveraging Rust's memory safety, performance, and ecosystem.

What is OurDB?

OurDB is a simple yet powerful key-value database designed for scenarios where you need fast key-value storage with the ability to track changes over time. It features a layered architecture with a frontend API, lookup table for efficient key-to-location mapping, and backend storage for actual data persistence.

Key characteristics:

  • Efficient key-value storage with history tracking
  • Data integrity verification using CRC32 checksums
  • Support for multiple backend files
  • Configurable record sizes and counts
  • Memory and disk-based lookup tables
  • Optional incremental ID mode

Current Implementation Details

Architecture

OurDB consists of three main components working together in a layered architecture:

  1. Frontend (db.v)

    • Provides the public API for database operations
    • Handles high-level operations (set, get, delete, history)
    • Coordinates between lookup and backend components
    • Supports both key-value and incremental ID modes
  2. Lookup Table (lookup.v)

    • Maps keys to physical locations in the backend storage
    • Supports both memory and disk-based lookup tables
    • Automatically optimizes key sizes based on database configuration
    • Handles sparse data efficiently
    • Provides next ID generation for incremental mode
  3. Backend Storage (backend.v)

    • Manages the actual data storage in files
    • Handles data integrity with CRC32 checksums
    • Supports multiple file backends for large datasets
    • Implements the low-level read/write operations

Core Data Structures

The V implementation consists of these primary structures:

  1. OurDB: The main database structure
@[heap]
pub struct OurDB {
mut:
    lookup &LookupTable
pub:
    path             string // directory for storage
    incremental_mode bool
    file_size        u32 = 500 * (1 << 20) // 500MB
pub mut:
    file              os.File
    file_nr           u16 // the file which is open
    last_used_file_nr u16
}
  2. LookupTable: Maps keys to physical storage locations
pub struct LookupTable {
    keysize    u8
    lookuppath string
mut:
    data        []u8
    incremental ?u32 // points to next empty slot if incremental mode is enabled
}
  3. Location: Represents a physical location in the storage
pub struct Location {
pub mut:
    file_nr  u16
    position u32
}

Key Operations

The current implementation provides these core operations:

  1. new(): Creates a new database with specified configuration
  2. set(): Stores data with a specific ID or auto-incremented ID
  3. get(): Retrieves data by ID
  4. get_history(): Retrieves historical values for a specific ID
  5. delete(): Removes data at a specific ID
  6. get_next_id(): Gets the next available ID in incremental mode
  7. save(): Persists the lookup table to disk
  8. load(): Loads the lookup table from disk
  9. close(): Closes the database
  10. destroy(): Deletes the database files

Storage Format

Record Format

Each record in the backend storage includes:

  • 2 bytes: Data size
  • 4 bytes: CRC32 checksum
  • 6 bytes: Previous record location (for history)
  • N bytes: Actual data
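
As a point of reference for the port, a minimal Rust sketch of this 12-byte header (2 + 4 + 6 bytes) follows. It assumes little-endian encoding and a 2-byte file number plus 4-byte offset for the previous-record location; both assumptions should be verified against the V implementation.

// Sketch of the record header layout described above; little-endian is an assumption.
struct Location {
    file_nr: u16,
    position: u32,
}

struct RecordHeader {
    size: u16,      // 2 bytes: length of the data that follows
    crc32: u32,     // 4 bytes: checksum of the data
    prev: Location, // 6 bytes: previous version of this record (for history)
}

impl RecordHeader {
    fn encode(&self) -> [u8; 12] {
        let mut buf = [0u8; 12];
        buf[0..2].copy_from_slice(&self.size.to_le_bytes());
        buf[2..6].copy_from_slice(&self.crc32.to_le_bytes());
        buf[6..8].copy_from_slice(&self.prev.file_nr.to_le_bytes());
        buf[8..12].copy_from_slice(&self.prev.position.to_le_bytes());
        buf
    }

    fn decode(buf: &[u8; 12]) -> Self {
        RecordHeader {
            size: u16::from_le_bytes([buf[0], buf[1]]),
            crc32: u32::from_le_bytes([buf[2], buf[3], buf[4], buf[5]]),
            prev: Location {
                file_nr: u16::from_le_bytes([buf[6], buf[7]]),
                position: u32::from_le_bytes([buf[8], buf[9], buf[10], buf[11]]),
            },
        }
    }
}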

Lookup Table Optimization

The lookup table automatically optimizes its key size based on the database configuration:

  • 2 bytes: For databases with < 65,536 records
  • 3 bytes: For databases with < 16,777,216 records
  • 4 bytes: For databases with < 4,294,967,296 records
  • 6 bytes: For large databases requiring multiple files
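
A rough sketch of how the port could pick the key size from these thresholds (record_nr_max and multiple_files are assumed configuration inputs, not the actual V parameter names):

// Illustrative key-size selection mirroring the thresholds above.
fn keysize_for(record_nr_max: u64, multiple_files: bool) -> u8 {
    if multiple_files {
        6 // location must also encode a file number
    } else if record_nr_max < 65_536 {
        2
    } else if record_nr_max < 16_777_216 {
        3
    } else {
        4
    }
}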

File Management

  • Supports splitting data across multiple files when needed
  • Each file is limited to 500MB by default (configurable)
  • Automatic file selection based on record location
  • Files are created as needed with format: ${path}/${file_nr}.db
  • Lookup table state is persisted in ${path}/lookup_dump.db
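
A minimal sketch of this naming convention in Rust, assuming std::path is used for path handling:

use std::path::{Path, PathBuf};

// Backend data files follow the ${path}/${file_nr}.db convention;
// the lookup table dump is stored alongside them.
fn data_file_path(base: &Path, file_nr: u16) -> PathBuf {
    base.join(format!("{}.db", file_nr))
}

fn lookup_dump_path(base: &Path) -> PathBuf {
    base.join("lookup_dump.db")
}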

Implementation Details

Data Storage and Retrieval

Storage Process (set)

  1. Determine the ID: auto-generate the next one in incremental mode, otherwise use the provided ID
  2. Get the storage location from the lookup table
  3. Calculate CRC32 checksum for data integrity
  4. Store previous location for history tracking
  5. Write the data to the backend file
  6. Update the lookup table with the new location
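
A hedged sketch of the write side of this flow, reduced to appending one record (header + data) to an already-open backend file. crc32fast is the checksum crate suggested under Resources; the function name and layout details are illustrative, not the actual implementation.

use std::fs::File;
use std::io::{Seek, SeekFrom, Write};

// Illustrative append of a single record following the steps above.
// `prev` is the (file_nr, position) of the previous version, kept for history.
fn append_record(file: &mut File, prev: (u16, u32), data: &[u8]) -> std::io::Result<u64> {
    let position = file.seek(SeekFrom::End(0))?;          // new record goes at the end
    let crc = crc32fast::hash(data);                       // step 3: CRC32 checksum
    file.write_all(&(data.len() as u16).to_le_bytes())?;   // 2-byte size
    file.write_all(&crc.to_le_bytes())?;                   // 4-byte checksum
    file.write_all(&prev.0.to_le_bytes())?;                // 2-byte previous file_nr
    file.write_all(&prev.1.to_le_bytes())?;                // 4-byte previous position
    file.write_all(data)?;                                 // N bytes of payload
    Ok(position)                                           // caller records this in the lookup table
}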

Retrieval Process (get)

  1. Get the location from the lookup table using the ID
  2. Read the record header from the backend file
  3. Extract size and CRC32 from the header
  4. Read the actual data
  5. Verify data integrity using the CRC32 checksum
  6. Return the data if valid
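
The read side, sketched under the same assumptions (little-endian header, single open file); it returns the payload plus the previous-record location needed for history walks:

use std::fs::File;
use std::io::{Error, ErrorKind, Read, Seek, SeekFrom};

// Illustrative read of one record at a known position, with CRC verification.
fn read_record(file: &mut File, position: u64) -> std::io::Result<(Vec<u8>, (u16, u32))> {
    let mut header = [0u8; 12];
    file.seek(SeekFrom::Start(position))?;
    file.read_exact(&mut header)?;

    let size = u16::from_le_bytes([header[0], header[1]]) as usize;
    let stored_crc = u32::from_le_bytes([header[2], header[3], header[4], header[5]]);
    let prev_file_nr = u16::from_le_bytes([header[6], header[7]]);
    let prev_position = u32::from_le_bytes([header[8], header[9], header[10], header[11]]);

    let mut data = vec![0u8; size];
    file.read_exact(&mut data)?;

    // Verify integrity before returning the data.
    if crc32fast::hash(&data) != stored_crc {
        return Err(Error::new(ErrorKind::InvalidData, "CRC32 mismatch"));
    }
    Ok((data, (prev_file_nr, prev_position)))
}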

History Tracking (get_history)

  1. Start with the current location
  2. Retrieve the data at that location
  3. Extract the previous location from the header
  4. Repeat until reaching the specified depth or no more history
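
Building on the read_record sketch above, a possible history walk looks like this; it simplifies to a single backend file and treats a zeroed previous position as the end of the chain, both assumptions to confirm against the V code.

// Illustrative history walk following the previous-location pointers.
fn get_history(
    file: &mut std::fs::File,
    mut position: u64,
    depth: u8,
) -> std::io::Result<Vec<Vec<u8>>> {
    let mut versions = Vec::new();
    for _ in 0..depth {
        let (data, (_prev_file_nr, prev_position)) = read_record(file, position)?;
        versions.push(data);
        if prev_position == 0 {
            break; // assumed sentinel: no older version recorded
        }
        position = prev_position as u64;
    }
    Ok(versions)
}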

Lookup Table Management

The lookup table maps IDs to physical locations in the storage files. It supports:

  1. Memory-based lookup: Stores the entire mapping in memory
  2. Disk-based lookup: Stores the mapping on disk for persistence
  3. Sparse export/import: Efficiently stores only non-empty entries
  4. Incremental ID generation: Automatically assigns sequential IDs
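
For the memory-based variant, the mapping can stay a flat byte array indexed by ID, as in the V data []u8 field. A sketch for the 6-byte entry case (2-byte file_nr + 4-byte position) follows; the 2/3/4-byte variants only change the slot width, and the type name here is illustrative.

// Illustrative memory-based lookup with fixed-size 6-byte slots per ID.
struct MemLookup {
    entry_size: usize, // keysize; 6 in this sketch
    data: Vec<u8>,
}

impl MemLookup {
    fn set(&mut self, id: u32, file_nr: u16, position: u32) {
        let s = id as usize * self.entry_size;
        self.data[s..s + 2].copy_from_slice(&file_nr.to_le_bytes());
        self.data[s + 2..s + 6].copy_from_slice(&position.to_le_bytes());
    }

    fn get(&self, id: u32) -> (u16, u32) {
        let s = id as usize * self.entry_size;
        let file_nr = u16::from_le_bytes([self.data[s], self.data[s + 1]]);
        let position = u32::from_le_bytes([
            self.data[s + 2], self.data[s + 3], self.data[s + 4], self.data[s + 5],
        ]);
        (file_nr, position)
    }
}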

File Management

OurDB supports multiple storage files:

  1. Automatically creates new files when current file reaches size limit
  2. Selects appropriate file for read/write operations
  3. Handles file opening/closing as needed
  4. Supports up to 65,536 files per database

Requirements for Rust Port

The Rust implementation should:

  1. Maintain API Compatibility: Provide equivalent functionality to the V implementation
  2. Preserve Storage Format: Maintain compatibility with existing OurDB files
  3. Performance: Maintain or improve the performance characteristics of the V implementation
  4. Memory Safety: Leverage Rust's ownership model for memory safety
  5. Error Handling: Use Rust's Result type for proper error handling
  6. Documentation: Include comprehensive documentation and examples
  7. Testing: Port existing tests and add new ones as needed

Proposed Rust Structure

// Core data structures
pub struct OurDB {
    path: String,
    incremental_mode: bool,
    file_size: u32,
    lookup: LookupTable,
    file: Option<std::fs::File>,
    file_nr: u16,
    last_used_file_nr: u16,
}

pub struct LookupTable {
    keysize: u8,
    lookuppath: String,
    data: Vec<u8>,
    incremental: Option<u32>,
}

pub struct Location {
    file_nr: u16,
    position: u32,
}

// Configuration
pub struct OurDBConfig {
    pub record_nr_max: u32,
    pub record_size_max: u32,
    pub file_size: u32,
    pub path: String,
    pub incremental_mode: bool,
    pub reset: bool,
}

// Public API
impl OurDB {
    pub fn new(config: OurDBConfig) -> Result<Self, Error> { ... }
    pub fn set(&mut self, id: Option<u32>, data: &[u8]) -> Result<u32, Error> { ... }
    pub fn get(&mut self, id: u32) -> Result<Vec<u8>, Error> { ... }
    pub fn get_history(&mut self, id: u32, depth: u8) -> Result<Vec<Vec<u8>>, Error> { ... }
    pub fn delete(&mut self, id: u32) -> Result<(), Error> { ... }
    pub fn get_next_id(&mut self) -> Result<u32, Error> { ... }
    pub fn close(&mut self) -> Result<(), Error> { ... }
    pub fn destroy(&mut self) -> Result<(), Error> { ... }
}
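
The Error type referenced in these signatures is left open; one possible shape using the thiserror crate listed under Resources (variants are illustrative):

use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),
    #[error("no record found for id {0}")]
    NotFound(u32),
    #[error("CRC32 mismatch for id {0}")]
    CorruptRecord(u32),
    #[error("invalid configuration: {0}")]
    Config(String),
}

And a hypothetical usage example of the proposed API, to illustrate the intended ergonomics rather than a finalized interface:

fn example() -> Result<(), Error> {
    let mut db = OurDB::new(OurDBConfig {
        record_nr_max: 1_000_000,
        record_size_max: 4096,
        file_size: 500 * (1 << 20),
        path: "/tmp/ourdb".to_string(),
        incremental_mode: true,
        reset: false,
    })?;

    let id = db.set(None, b"hello")?;     // auto-incremented ID
    let value = db.get(id)?;
    assert_eq!(value, b"hello".to_vec());

    let history = db.get_history(id, 5)?; // up to 5 previous versions
    println!("{} version(s) stored", history.len());
    db.close()?;
    Ok(())
}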

Implementation Considerations

  1. File I/O:

    • Use Rust's standard library for file operations
    • Consider using BufReader/BufWriter for improved performance
    • Implement proper error handling for all I/O operations
  2. Memory Management:

    • Use Rust's ownership model to ensure memory safety
    • Minimize unnecessary copying of data
    • Consider using Bytes or similar crates for efficient byte manipulation
  3. Error Handling:

    • Implement a custom Error enum for OurDB-specific errors
    • Use Result for all operations that can fail
    • Provide detailed error messages
  4. Concurrency:

    • Consider thread safety for concurrent access
    • Implement interior mutability patterns where appropriate
    • Potentially use RwLock for shared read access
  5. Performance Optimizations:

    • Consider memory-mapped files for lookup tables
    • Implement caching for frequently accessed records
    • Use zero-copy operations where possible
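
On the memory-mapped lookup idea in item 5, a minimal sketch with the memmap2 crate listed under Resources; the mapping is unsafe because the file must not be resized or modified externally while mapped.

use std::fs::OpenOptions;
use memmap2::MmapMut;

// Illustrative read/write mapping of the lookup dump file.
fn open_lookup_mmap(path: &str) -> std::io::Result<MmapMut> {
    let file = OpenOptions::new().read(true).write(true).open(path)?;
    let mmap = unsafe { MmapMut::map_mut(&file)? };
    Ok(mmap)
}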

Use Cases

OurDB is particularly useful for:

  • Applications requiring simple but efficient data persistence
  • Systems needing to track historical changes to data
  • Embedded applications with limited resources
  • Scenarios where data integrity is critical
  • Use cases requiring both memory and disk-based storage options

Acceptance Criteria

  1. All existing functionality is properly ported
  2. All tests pass with equivalent behavior to the V implementation
  3. API is idiomatic Rust while maintaining functional equivalence
  4. Performance is at least on par with the V implementation
  5. Documentation is comprehensive and includes examples
  6. Code follows Rust best practices and passes clippy lints
  7. Existing OurDB files can be read by the Rust implementation

Resources

  • Original V implementation: github/freeflowuniverse/herolib/lib/data/ourdb
  • Rust documentation: https://doc.rust-lang.org/book/
  • Crates to consider:
    • bytes for efficient byte manipulation
    • thiserror for error handling
    • memmap2 for memory-mapped files
    • crc32fast for CRC32 calculation
Author
Owner

architecture defined: https://git.ourworld.tf/herocode/db/commit/5c5225c8f77d5c435be7fa3463d2115be544255c
Author
Owner

ported with https://git.ourworld.tf/herocode/db/commit/0eedec9ed08be6600f320e7334fc15dc71804a8c
timur closed this issue 2025-04-09 09:40:34 +00:00