# Full-Text Search with Tantivy

HeroDB includes powerful full-text search capabilities powered by [Tantivy](https://github.com/quickwit-oss/tantivy), a fast full-text search engine library written in Rust. This provides Redis-compatible search commands similar to RediSearch.

## Overview

The search functionality allows you to:

- Create search indexes with custom schemas
- Index documents with multiple field types
- Perform complex queries with filters
- Work with text, numeric, date, and geographic data
- Search in real time with high performance

## Search Commands

### FT.CREATE - Create Search Index

Create a new search index with a defined schema.

```bash
FT.CREATE index_name SCHEMA field_name field_type [options] [field_name field_type [options] ...]
```

**Field Types:**

- `TEXT` - Full-text searchable text fields
- `NUMERIC` - Numeric fields (integers, floats)
- `TAG` - Tag fields for exact matching
- `GEO` - Geographic coordinates (lat,lon)
- `DATE` - Date/timestamp fields

**Field Options:**

- `STORED` - Store field value for retrieval
- `INDEXED` - Make field searchable
- `TOKENIZED` - Enable tokenization for text fields
- `FAST` - Enable fast access for numeric fields

**Example:**

```bash
# Create a product search index
FT.CREATE products SCHEMA title TEXT STORED INDEXED TOKENIZED description TEXT STORED INDEXED TOKENIZED price NUMERIC STORED INDEXED FAST category TAG STORED location GEO STORED created_date DATE STORED INDEXED
```

### FT.ADD - Add Document to Index

Add a document to a search index.

```bash
FT.ADD index_name doc_id [SCORE score] FIELDS field_name field_value [field_name field_value ...]
```

**Example:**

```bash
# Add a product document
FT.ADD products product:1 SCORE 1.0 FIELDS title "Wireless Headphones" description "High-quality wireless headphones with noise cancellation" price 199.99 category "electronics" location "37.7749,-122.4194" created_date 1640995200000
```

### FT.SEARCH - Search Documents

Search for documents in an index.

```bash
FT.SEARCH index_name query [LIMIT offset count] [FILTER field min max] [RETURN field [field ...]]
```

**Query Syntax:**

- Simple terms: `wireless headphones`
- Phrase queries: `"noise cancellation"`
- Field-specific: `title:wireless`
- Boolean operators: `wireless AND headphones`
- Wildcards: `head*`

**Examples:**

```bash
# Simple text search
FT.SEARCH products "wireless headphones"

# Search with filters
FT.SEARCH products "headphones" FILTER price 100 300 LIMIT 0 10

# Field-specific search
FT.SEARCH products "title:wireless AND category:electronics"

# Return specific fields only
FT.SEARCH products "*" RETURN title price
```

### FT.DEL - Delete Document

Remove a document from the search index.

```bash
FT.DEL index_name doc_id
```

**Example:**

```bash
FT.DEL products product:1
```

### FT.INFO - Get Index Information

Get information about a search index.

```bash
FT.INFO index_name
```

**Returns:**

- Index name and document count
- Field definitions and types
- Index configuration

**Example:**

```bash
FT.INFO products
```

### FT.DROP - Drop Index

Delete an entire search index.

```bash
FT.DROP index_name
```

**Example:**

```bash
FT.DROP products
```

### FT.ALTER - Alter Index Schema

Add new fields to an existing index.

```bash
FT.ALTER index_name SCHEMA ADD field_name field_type [options]
```

**Example:**

```bash
FT.ALTER products SCHEMA ADD brand TAG STORED
```

### FT.AGGREGATE - Aggregate Search Results

Perform aggregations on search results.

```bash
FT.AGGREGATE index_name query [GROUPBY field] [REDUCE function field AS alias]
```

**Example:**

```bash
# Group products by category and count
FT.AGGREGATE products "*" GROUPBY category REDUCE COUNT 0 AS count
```

## Field Types in Detail

### TEXT Fields

- **Purpose**: Full-text search on natural language content
- **Features**: Tokenization, stemming, stop-word removal
- **Options**: `STORED`, `INDEXED`, `TOKENIZED`
- **Example**: Product titles, descriptions, content

### NUMERIC Fields

- **Purpose**: Numeric data for range queries and sorting
- **Types**: I64, U64, F64
- **Options**: `STORED`, `INDEXED`, `FAST`
- **Example**: Prices, quantities, ratings

### TAG Fields

- **Purpose**: Exact-match categorical data
- **Features**: No tokenization, exact string matching
- **Options**: `STORED`, case sensitivity control
- **Example**: Categories, brands, status values

### GEO Fields

- **Purpose**: Geographic coordinates
- **Format**: "latitude,longitude" (e.g., "37.7749,-122.4194")
- **Features**: Geographic distance queries
- **Options**: `STORED`

### DATE Fields

- **Purpose**: Timestamp and date data
- **Format**: Unix timestamp in milliseconds
- **Features**: Range queries, temporal filtering
- **Options**: `STORED`, `INDEXED`, `FAST`

## Search Query Syntax

### Basic Queries

```bash
# Single term
FT.SEARCH products "wireless"

# Multiple terms (AND by default)
FT.SEARCH products "wireless headphones"

# Phrase query
FT.SEARCH products "\"noise cancellation\""
```

### Field-Specific Queries

```bash
# Search in specific field
FT.SEARCH products "title:wireless"

# Multiple field queries
FT.SEARCH products "title:wireless AND description:bluetooth"
```

### Boolean Operators

```bash
# AND operator
FT.SEARCH products "wireless AND headphones"

# OR operator
FT.SEARCH products "wireless OR bluetooth"

# NOT operator
FT.SEARCH products "headphones NOT wired"
```

### Wildcards and Fuzzy Search

```bash
# Wildcard search
FT.SEARCH products "head*"

# Fuzzy search (approximate matching)
FT.SEARCH products "%headphone%"
```

### Range Queries

```bash
# Numeric range in query
FT.SEARCH products "@price:[100 300]"

# Date range
FT.SEARCH products "@created_date:[1640995200000 1672531200000]"
```

## Filtering and Sorting

### FILTER Clause

```bash
# Numeric filter
FT.SEARCH products "headphones" FILTER price 100 300

# Multiple filters
FT.SEARCH products "*" FILTER price 100 500 FILTER rating 4 5
```

### LIMIT Clause

```bash
# Pagination
FT.SEARCH products "wireless" LIMIT 0 10   # First 10 results
FT.SEARCH products "wireless" LIMIT 10 10  # Next 10 results
```

### RETURN Clause

```bash
# Return specific fields
FT.SEARCH products "*" RETURN title price

# Return all stored fields (default)
FT.SEARCH products "*"
```

## Performance Considerations

### Indexing Strategy

- Only index fields you need to search on
- Use `FAST` option for frequently filtered numeric fields
- Consider storage vs. search performance trade-offs

### Query Optimization

- Use specific field queries when possible
- Combine filters with text queries for better performance
- Use pagination with LIMIT for large result sets

### Memory Usage

- Tantivy indexes are memory-mapped for performance
- Index size depends on document count and field configuration
- Monitor disk space for index storage

## Integration with Redis Commands

Search indexes work alongside regular Redis data:

```bash
# Store product data in Redis hash
HSET product:1 title "Wireless Headphones" price "199.99"

# Index the same data for search
FT.ADD products product:1 FIELDS title "Wireless Headphones" price 199.99

# Search returns document IDs that can be used with Redis commands
FT.SEARCH products "wireless"
# Returns: product:1

# Retrieve full data using Redis
HGETALL product:1
```

## Example Use Cases

### E-commerce Product Search

```bash
# Create product catalog index
FT.CREATE catalog SCHEMA name TEXT STORED INDEXED TOKENIZED description TEXT INDEXED TOKENIZED price NUMERIC STORED INDEXED FAST category TAG STORED brand TAG STORED rating NUMERIC STORED FAST

# Add products
FT.ADD catalog prod:1 FIELDS name "iPhone 14" price 999 category "phones" brand "apple" rating 4.5
FT.ADD catalog prod:2 FIELDS name "Samsung Galaxy" price 899 category "phones" brand "samsung" rating 4.3

# Search queries
FT.SEARCH catalog "iPhone"
FT.SEARCH catalog "phones" FILTER price 800 1000
FT.SEARCH catalog "@brand:apple"
```

### Content Management

```bash
# Create content index
FT.CREATE content SCHEMA title TEXT STORED INDEXED TOKENIZED body TEXT INDEXED TOKENIZED author TAG STORED published DATE STORED INDEXED tags TAG STORED

# Search content
FT.SEARCH content "machine learning"
FT.SEARCH content "@author:john AND @tags:ai"
FT.SEARCH content "*" FILTER published 1640995200000 1672531200000
```

### Geographic Search

```bash
# Create location-based index
FT.CREATE places SCHEMA name TEXT STORED INDEXED TOKENIZED location GEO STORED type TAG STORED

# Add locations
FT.ADD places place:1 FIELDS name "Golden Gate Bridge" location "37.8199,-122.4783" type "landmark"

# Geographic queries (future feature)
FT.SEARCH places "@location:[37.7749 -122.4194 10 km]"
```

## Error Handling

Common error responses:

- `ERR index not found` - Index doesn't exist
- `ERR field not found` - Field not defined in schema
- `ERR invalid query syntax` - Malformed query
- `ERR document not found` - Document ID doesn't exist

## Best Practices

1. **Schema Design**: Plan your schema carefully - changes require reindexing
2. **Field Selection**: Only store and index fields you actually need
3. **Batch Operations**: Add multiple documents efficiently
4. **Query Testing**: Test queries for performance with realistic data
5. **Monitoring**: Monitor index size and query performance
6. **Backup**: Include search indexes in backup strategies

## Future Enhancements

Planned features:

- Geographic distance queries
- Advanced aggregations and faceting
- Highlighting of search results
- Synonyms and custom analyzers
- Real-time suggestions and autocomplete
- Index replication and sharding
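
## Appendix: Scripted Pagination Sketch

The `LIMIT offset count` clause of `FT.SEARCH` pages through results by advancing the offset in steps of the page size. The sketch below only prints the commands for successive pages; it does not talk to a server. The helper name `ft_search_page` is hypothetical and not part of HeroDB, and the sketch assumes the query string contains no double quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of HeroDB): print the FT.SEARCH command
# for a zero-based page number using the LIMIT offset/count form.
# Assumption: the query contains no double quotes that would need escaping.
ft_search_page() {
  local index="$1"
  local query="$2"
  local page="$3"
  local page_size="$4"
  # The offset is the number of results skipped before this page.
  local offset=$(( page * page_size ))
  printf 'FT.SEARCH %s "%s" LIMIT %d %d\n' "$index" "$query" "$offset" "$page_size"
}

# Print the commands for the first three pages of 10 results each.
for page in 0 1 2; do
  ft_search_page products wireless "$page" 10
done
```

The first two printed lines match the `LIMIT 0 10` and `LIMIT 10 10` examples in the LIMIT Clause section. The output could be piped to a Redis-compatible command-line client connected to a HeroDB instance, assuming one is available.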