herodb/docs/search.md
2025-08-23 04:58:41 +02:00

Full-Text Search with Tantivy

HeroDB includes full-text search powered by Tantivy, a fast full-text search engine library written in Rust. The search layer exposes Redis-compatible commands similar to those of RediSearch.

Overview

The search functionality allows you to:

  • Create search indexes with custom schemas
  • Index documents with multiple field types
  • Perform complex queries with filters
  • Work with text, numeric, date, and geographic data
  • Run real-time searches with high performance

Search Commands

FT.CREATE - Create Search Index

Create a new search index with a defined schema.

FT.CREATE index_name SCHEMA field_name field_type [options] [field_name field_type [options] ...]

Field Types:

  • TEXT - Full-text searchable text fields
  • NUMERIC - Numeric fields (integers, floats)
  • TAG - Tag fields for exact matching
  • GEO - Geographic coordinates (lat,lon)
  • DATE - Date/timestamp fields

Field Options:

  • STORED - Store field value for retrieval
  • INDEXED - Make field searchable
  • TOKENIZED - Enable tokenization for text fields
  • FAST - Enable fast access for numeric fields

Example:

# Create a product search index
FT.CREATE products SCHEMA 
  title TEXT STORED INDEXED TOKENIZED
  description TEXT STORED INDEXED TOKENIZED  
  price NUMERIC STORED INDEXED FAST
  category TAG STORED
  location GEO STORED
  created_date DATE STORED INDEXED
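
When schemas are generated from application code, it can help to build the FT.CREATE arguments programmatically and validate them before sending. The following is a minimal sketch, not part of HeroDB: `build_ft_create` is a hypothetical helper, and it assumes the field types and options are exactly those listed above. The resulting list could be passed to a generic Redis client call such as redis-py's `execute_command(*create_args)`, assuming HeroDB accepts the command from a standard client.

```python
from typing import Dict, List, Sequence

# Field types and options as listed in this document.
FIELD_TYPES = {"TEXT", "NUMERIC", "TAG", "GEO", "DATE"}
FIELD_OPTIONS = {"STORED", "INDEXED", "TOKENIZED", "FAST"}


def build_ft_create(index: str, schema: Dict[str, Sequence[str]]) -> List[str]:
    """Build FT.CREATE arguments from {field_name: (field_type, *options)}."""
    if not schema:
        raise ValueError("schema must define at least one field")
    args = ["FT.CREATE", index, "SCHEMA"]
    for name, (field_type, *options) in schema.items():
        if field_type not in FIELD_TYPES:
            raise ValueError(f"unknown field type for {name!r}: {field_type}")
        unknown = [opt for opt in options if opt not in FIELD_OPTIONS]
        if unknown:
            raise ValueError(f"unknown options for {name!r}: {unknown}")
        args += [name, field_type, *options]
    return args


# Hypothetical usage mirroring part of the products example above.
create_args = build_ft_create("products", {
    "title": ("TEXT", "STORED", "INDEXED", "TOKENIZED"),
    "price": ("NUMERIC", "STORED", "INDEXED", "FAST"),
    "category": ("TAG", "STORED"),
})
```

Rejecting unknown types and options on the client side surfaces typos before a round trip to the server.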

FT.ADD - Add Document to Index

Add a document to a search index.

FT.ADD index_name doc_id [SCORE score] FIELDS field_name field_value [field_name field_value ...]

Example:

# Add a product document
FT.ADD products product:1 SCORE 1.0 FIELDS 
  title "Wireless Headphones" 
  description "High-quality wireless headphones with noise cancellation"
  price 199.99
  category "electronics"
  location "37.7749,-122.4194"
  created_date 1640995200000
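
The FT.ADD syntax above maps naturally onto a dictionary of field values. The sketch below uses a hypothetical helper, `build_ft_add`, and assumes that all values are sent as strings, as is usual for Redis-protocol clients; it is an illustration, not HeroDB's own client code.

```python
from typing import List, Mapping, Optional, Union

Value = Union[str, int, float]


def build_ft_add(
    index: str,
    doc_id: str,
    fields: Mapping[str, Value],
    score: Optional[float] = None,
) -> List[str]:
    """Build FT.ADD arguments; SCORE is emitted only when a score is given."""
    if not fields:
        raise ValueError("a document needs at least one field")
    args = ["FT.ADD", index, doc_id]
    if score is not None:
        args += ["SCORE", str(score)]
    args.append("FIELDS")
    for name, value in fields.items():
        args += [name, str(value)]
    return args


# Hypothetical usage mirroring part of the product document above.
add_args = build_ft_add(
    "products",
    "product:1",
    {
        "title": "Wireless Headphones",
        "price": 199.99,
        "location": "37.7749,-122.4194",
        "created_date": 1640995200000,
    },
    score=1.0,
)
```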

FT.SEARCH - Search Documents

Search for documents in an index.

FT.SEARCH index_name query [LIMIT offset count] [FILTER field min max] [RETURN field [field ...]]

Query Syntax:

  • Simple terms: wireless headphones
  • Phrase queries: "noise cancellation"
  • Field-specific: title:wireless
  • Boolean operators: wireless AND headphones
  • Wildcards: head*

Examples:

# Simple text search
FT.SEARCH products "wireless headphones"

# Search with filters
FT.SEARCH products "headphones" FILTER price 100 300 LIMIT 0 10

# Field-specific search
FT.SEARCH products "title:wireless AND category:electronics"

# Return specific fields only
FT.SEARCH products "*" RETURN title price
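
The optional clauses of FT.SEARCH can be assembled from structured arguments. This is a minimal sketch with a hypothetical helper, `build_ft_search`; it emits the clauses in the order of the syntax line above (LIMIT, FILTER, RETURN), and whether the server requires a particular order is not stated in this document.

```python
from typing import Iterable, List, Optional, Sequence, Tuple, Union

Number = Union[int, float]


def build_ft_search(
    index: str,
    query: str,
    limit: Optional[Tuple[int, int]] = None,
    filters: Iterable[Tuple[str, Number, Number]] = (),
    return_fields: Sequence[str] = (),
) -> List[str]:
    """Build FT.SEARCH arguments with optional LIMIT, FILTER and RETURN clauses."""
    args = ["FT.SEARCH", index, query]
    if limit is not None:
        offset, count = limit
        args += ["LIMIT", str(offset), str(count)]
    for field, minimum, maximum in filters:
        if minimum > maximum:
            raise ValueError(f"empty range for {field!r}: {minimum} > {maximum}")
        args += ["FILTER", field, str(minimum), str(maximum)]
    if return_fields:
        args += ["RETURN", *return_fields]
    return args


# Hypothetical usage: headphones priced 100 to 300, first ten hits, two fields.
search_args = build_ft_search(
    "products",
    "headphones",
    limit=(0, 10),
    filters=[("price", 100, 300)],
    return_fields=["title", "price"],
)
```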

FT.DEL - Delete Document

Remove a document from the search index.

FT.DEL index_name doc_id

Example:

FT.DEL products product:1

FT.INFO - Get Index Information

Get information about a search index.

FT.INFO index_name

Returns:

  • Index name and document count
  • Field definitions and types
  • Index configuration

Example:

FT.INFO products

FT.DROP - Drop Index

Delete an entire search index.

FT.DROP index_name

Example:

FT.DROP products

FT.ALTER - Alter Index Schema

Add new fields to an existing index.

FT.ALTER index_name SCHEMA ADD field_name field_type [options]

Example:

FT.ALTER products SCHEMA ADD brand TAG STORED

FT.AGGREGATE - Aggregate Search Results

Perform aggregations on search results.

FT.AGGREGATE index_name query [GROUPBY field] [REDUCE function field AS alias]

Example:

# Group products by category and count
FT.AGGREGATE products "*" GROUPBY category REDUCE COUNT 0 AS count
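
To make the grouping semantics concrete, the sketch below computes the same kind of per-category count on the client side over a small, made-up list of documents. It only illustrates what a GROUPBY with a COUNT reducer is expected to produce; it is not how HeroDB implements aggregation, and the server's reply format is not specified here.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def group_count(docs: Iterable[Mapping[str, str]], field: str) -> Dict[str, int]:
    """Count documents per distinct value of `field`, skipping documents without it."""
    return dict(Counter(doc[field] for doc in docs if field in doc))


# Made-up sample documents for illustration only.
sample_docs = [
    {"title": "Wireless Headphones", "category": "electronics"},
    {"title": "Bluetooth Speaker", "category": "electronics"},
    {"title": "Rust in Action", "category": "books"},
    {"title": "Uncategorized item"},
]
counts = group_count(sample_docs, "category")
```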

Field Types in Detail

TEXT Fields

  • Purpose: Full-text search on natural language content
  • Features: Tokenization, stemming, stop-word removal
  • Options: STORED, INDEXED, TOKENIZED
  • Example: Product titles, descriptions, content

NUMERIC Fields

  • Purpose: Numeric data for range queries and sorting
  • Types: I64, U64, F64
  • Options: STORED, INDEXED, FAST
  • Example: Prices, quantities, ratings

TAG Fields

  • Purpose: Exact-match categorical data
  • Features: No tokenization, exact string matching
  • Options: STORED, case sensitivity control
  • Example: Categories, brands, status values

GEO Fields

  • Purpose: Geographic coordinates
  • Format: "latitude,longitude" (e.g., "37.7749,-122.4194")
  • Features: Geographic distance queries (planned; see Future Enhancements)
  • Options: STORED
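
Because GEO values are plain "latitude,longitude" strings, validating them before indexing catches swapped or out-of-range coordinates early. The helpers below (`parse_geo`, `format_geo`) are hypothetical client-side utilities, sketched under the assumption that latitude comes first as described above.

```python
from typing import Tuple


def parse_geo(value: str) -> Tuple[float, float]:
    """Parse "latitude,longitude" into floats and check the valid ranges."""
    lat_text, separator, lon_text = value.partition(",")
    if not separator:
        raise ValueError(f"expected 'latitude,longitude', got {value!r}")
    latitude, longitude = float(lat_text), float(lon_text)
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    return latitude, longitude


def format_geo(latitude: float, longitude: float) -> str:
    """Format coordinates in the "latitude,longitude" form used by GEO fields."""
    return f"{latitude},{longitude}"


san_francisco = parse_geo("37.7749,-122.4194")
```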

DATE Fields

  • Purpose: Timestamp and date data
  • Format: Unix timestamp in milliseconds
  • Features: Range queries, temporal filtering
  • Options: STORED, INDEXED, FAST
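
DATE values are Unix timestamps in milliseconds, so application dates need converting before indexing or filtering. A minimal sketch using only the Python standard library follows; the function names are illustrative. Integer arithmetic on `timedelta` avoids floating-point rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis: int) -> datetime:
    """Convert milliseconds since the Unix epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


start_2022 = to_epoch_millis(datetime(2022, 1, 1, tzinfo=timezone.utc))
start_2023 = to_epoch_millis(datetime(2023, 1, 1, tzinfo=timezone.utc))
```

Here `start_2022` is 1640995200000 and `start_2023` is 1672531200000, the two timestamps used in this document's date examples.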

Search Query Syntax

Basic Queries

# Single term
FT.SEARCH products "wireless"

# Multiple terms (AND by default)
FT.SEARCH products "wireless headphones"

# Phrase query
FT.SEARCH products "\"noise cancellation\""

Field-Specific Queries

# Search in specific field
FT.SEARCH products "title:wireless"

# Multiple field queries
FT.SEARCH products "title:wireless AND description:bluetooth"

Boolean Operators

# AND operator
FT.SEARCH products "wireless AND headphones"

# OR operator  
FT.SEARCH products "wireless OR bluetooth"

# NOT operator
FT.SEARCH products "headphones NOT wired"

Wildcard and Fuzzy Queries

# Wildcard search
FT.SEARCH products "head*"

# Fuzzy search (approximate matching)
FT.SEARCH products "%headphone%"

Range Queries

# Numeric range in query
FT.SEARCH products "@price:[100 300]"

# Date range
FT.SEARCH products "@created_date:[1640995200000 1672531200000]"

Filtering and Sorting

FILTER Clause

# Numeric filter
FT.SEARCH products "headphones" FILTER price 100 300

# Multiple filters
FT.SEARCH products "*" FILTER price 100 500 FILTER created_date 1640995200000 1672531200000

LIMIT Clause

# Pagination
FT.SEARCH products "wireless" LIMIT 0 10    # First 10 results
FT.SEARCH products "wireless" LIMIT 10 10   # Next 10 results
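
LIMIT takes an offset and a count, whereas user interfaces usually think in page numbers. The conversion can be sketched as follows, with a hypothetical `page_to_limit` helper that uses 1-based pages.

```python
from typing import Tuple


def page_to_limit(page: int, page_size: int = 10) -> Tuple[int, int]:
    """Translate a 1-based page number into the (offset, count) pair for LIMIT."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    if page_size < 1:
        raise ValueError("page_size must be positive")
    return (page - 1) * page_size, page_size


first_page = page_to_limit(1)   # offset 0, count 10
second_page = page_to_limit(2)  # offset 10, count 10
```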

RETURN Clause

# Return specific fields
FT.SEARCH products "*" RETURN title price

# Return all stored fields (default)
FT.SEARCH products "*"

Performance Considerations

Indexing Strategy

  • Only index fields you need to search on
  • Use FAST option for frequently filtered numeric fields
  • Consider storage vs. search performance trade-offs

Query Optimization

  • Use specific field queries when possible
  • Combine filters with text queries for better performance
  • Use pagination with LIMIT for large result sets

Memory Usage

  • Tantivy indexes are memory-mapped for performance
  • Index size depends on document count and field configuration
  • Monitor disk space for index storage

Integration with Redis Commands

Search indexes work alongside regular Redis data:

# Store product data in Redis hash
HSET product:1 title "Wireless Headphones" price "199.99"

# Index the same data for search
FT.ADD products product:1 FIELDS title "Wireless Headphones" price 199.99

# Search returns document IDs that can be used with Redis commands
FT.SEARCH products "wireless"
# Returns: product:1

# Retrieve full data using Redis
HGETALL product:1
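
Since the hash and the search document carry the same fields, both commands can be derived from one source of truth so that they do not drift apart. The sketch below uses a hypothetical helper, `hash_and_index_args`, and assumes the hash key doubles as the document ID, as in the example above.

```python
from typing import List, Mapping, Tuple, Union

Value = Union[str, int, float]


def hash_and_index_args(
    index: str, key: str, fields: Mapping[str, Value]
) -> Tuple[List[str], List[str]]:
    """Return (HSET args, FT.ADD args) built from the same field mapping."""
    if not fields:
        raise ValueError("at least one field is required")
    flat: List[str] = []
    for name, value in fields.items():
        flat += [name, str(value)]
    return ["HSET", key, *flat], ["FT.ADD", index, key, "FIELDS", *flat]


hset_args, ft_add_args = hash_and_index_args(
    "products", "product:1", {"title": "Wireless Headphones", "price": 199.99}
)
```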

Example Use Cases

Product Catalog

# Create product catalog index
FT.CREATE catalog SCHEMA 
  name TEXT STORED INDEXED TOKENIZED
  description TEXT INDEXED TOKENIZED
  price NUMERIC STORED INDEXED FAST
  category TAG STORED
  brand TAG STORED
  rating NUMERIC STORED FAST

# Add products
FT.ADD catalog prod:1 FIELDS name "iPhone 14" price 999 category "phones" brand "apple" rating 4.5
FT.ADD catalog prod:2 FIELDS name "Samsung Galaxy" price 899 category "phones" brand "samsung" rating 4.3

# Search queries
FT.SEARCH catalog "iPhone"
FT.SEARCH catalog "phones" FILTER price 800 1000
FT.SEARCH catalog "@brand:apple"

Content Management

# Create content index
FT.CREATE content SCHEMA
  title TEXT STORED INDEXED TOKENIZED
  body TEXT INDEXED TOKENIZED
  author TAG STORED
  published DATE STORED INDEXED
  tags TAG STORED

# Search content
FT.SEARCH content "machine learning"
FT.SEARCH content "@author:john AND @tags:ai"
FT.SEARCH content "*" FILTER published 1640995200000 1672531200000

Geographic Search

# Create location-based index
FT.CREATE places SCHEMA
  name TEXT STORED INDEXED TOKENIZED
  location GEO STORED
  type TAG STORED

# Add locations
FT.ADD places place:1 FIELDS name "Golden Gate Bridge" location "37.8199,-122.4783" type "landmark"

# Geographic queries (future feature)
FT.SEARCH places "@location:[37.7749 -122.4194 10 km]"

Error Handling

Common error responses:

  • ERR index not found - Index doesn't exist
  • ERR field not found - Field not defined in schema
  • ERR invalid query syntax - Malformed query
  • ERR document not found - Document ID doesn't exist
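
Client code often needs to branch on these errors, for example to create a missing index or to report a bad query to the user. The sketch below classifies an error string by matching the messages listed above. The enum and function names are hypothetical, and the matching assumes the server's wording begins exactly as shown in this list, which may differ in practice.

```python
from enum import Enum


class SearchErrorKind(Enum):
    INDEX_NOT_FOUND = "index not found"
    FIELD_NOT_FOUND = "field not found"
    INVALID_QUERY = "invalid query syntax"
    DOCUMENT_NOT_FOUND = "document not found"
    UNKNOWN = "unknown"


def classify_error(message: str) -> SearchErrorKind:
    """Map a server error string onto one of the documented error kinds."""
    text = message.strip().lower()
    if text.startswith("err "):
        text = text[len("err "):]
    for kind in SearchErrorKind:
        if kind is not SearchErrorKind.UNKNOWN and text.startswith(kind.value):
            return kind
    return SearchErrorKind.UNKNOWN


kind = classify_error("ERR index not found")
```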

Best Practices

  1. Schema Design: Plan your schema carefully. FT.ALTER only adds fields; other schema changes require reindexing
  2. Field Selection: Only store and index fields you actually need
  3. Batch Operations: Add multiple documents efficiently
  4. Query Testing: Test queries for performance with realistic data
  5. Monitoring: Monitor index size and query performance
  6. Backup: Include search indexes in backup strategies
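
This document does not name a dedicated bulk-indexing command, so one way to approach batch operations is to group FT.ADD calls on the client. The sketch below only splits documents into fixed-size batches; `chunked` is a hypothetical helper. Each batch could then be sent through a client pipeline (for example redis-py's `pipeline()`), assuming HeroDB handles pipelined commands the way Redis does.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Illustrative document IDs split into batches of two.
batches = list(chunked(["prod:1", "prod:2", "prod:3", "prod:4", "prod:5"], 2))
```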

Future Enhancements

Planned features:

  • Geographic distance queries
  • Advanced aggregations and faceting
  • Highlighting of search results
  • Synonyms and custom analyzers
  • Real-time suggestions and autocomplete
  • Index replication and sharding