despiegk d4d3660bac ...

2025-08-23 04:58:41 +02:00

9.4 KiB

Full-Text Search with Tantivy

HeroDB includes full-text search powered by Tantivy, a fast full-text search engine library written in Rust. It exposes Redis-compatible search commands similar to those of RediSearch.

Overview

The search functionality allows you to:

Create search indexes with custom schemas
Index documents with multiple field types
Perform complex queries with filters
Work with text, numeric, date, and geographic data
Search in real time with high performance

Search Commands

FT.CREATE - Create Search Index

Create a new search index with a defined schema.

FT.CREATE index_name SCHEMA field_name field_type [options] [field_name field_type [options] ...]

Field Types:

TEXT - Full-text searchable text fields
NUMERIC - Numeric fields (integers, floats)
TAG - Tag fields for exact matching
GEO - Geographic coordinates (lat,lon)
DATE - Date/timestamp fields

Field Options:

STORED - Store field value for retrieval
INDEXED - Make field searchable
TOKENIZED - Enable tokenization for text fields
FAST - Enable fast access for numeric and date fields

Example:

# Create a product search index
FT.CREATE products SCHEMA 
  title TEXT STORED INDEXED TOKENIZED
  description TEXT STORED INDEXED TOKENIZED  
  price NUMERIC STORED INDEXED FAST
  category TAG STORED
  location GEO STORED
  created_date DATE STORED INDEXED
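
The schema syntax above can also be assembled programmatically. The following is a minimal Python sketch that builds the FT.CREATE argument list and rejects field types or options not listed in this document. The function name build_ft_create and the dictionary layout are hypothetical conveniences, not part of HeroDB.

```python
from typing import Dict, List, Sequence

# Field types and options as listed in this document.
FIELD_TYPES = {"TEXT", "NUMERIC", "TAG", "GEO", "DATE"}
FIELD_OPTIONS = {"STORED", "INDEXED", "TOKENIZED", "FAST"}


def build_ft_create(index: str, schema: Dict[str, Sequence[str]]) -> List[str]:
    """Return the FT.CREATE argument list for `schema`.

    `schema` maps a field name to (field_type, *options). This layout is a
    hypothetical convenience, not something HeroDB defines.
    """
    if not schema:
        raise ValueError("schema must define at least one field")
    args = ["FT.CREATE", index, "SCHEMA"]
    for name, (field_type, *options) in schema.items():
        if field_type not in FIELD_TYPES:
            raise ValueError(f"unknown field type: {field_type}")
        unknown = set(options) - FIELD_OPTIONS
        if unknown:
            raise ValueError(f"unknown options for {name}: {sorted(unknown)}")
        args += [name, field_type, *options]
    return args


products_schema = {
    "title": ("TEXT", "STORED", "INDEXED", "TOKENIZED"),
    "price": ("NUMERIC", "STORED", "INDEXED", "FAST"),
    "category": ("TAG", "STORED"),
}
create_args = build_ft_create("products", products_schema)
```

The resulting list could be handed to a generic client call such as redis-py's execute_command(*create_args), assuming the server accepts the command over the Redis protocol as described here.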

FT.ADD - Add Document to Index

Add a document to a search index.

FT.ADD index_name doc_id [SCORE score] FIELDS field_name field_value [field_name field_value ...]

Example:

# Add a product document
FT.ADD products product:1 SCORE 1.0 FIELDS 
  title "Wireless Headphones" 
  description "High-quality wireless headphones with noise cancellation"
  price 199.99
  category "electronics"
  location "37.7749,-122.4194"
  created_date 1640995200000
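
As an illustration of the FT.ADD syntax, here is a small Python sketch that turns a document dictionary into the corresponding argument list. The helper name build_ft_add is an assumption made for this example, and values are simply converted with str().

```python
from typing import Dict, List, Optional, Union

Value = Union[str, int, float]


def build_ft_add(
    index: str,
    doc_id: str,
    fields: Dict[str, Value],
    score: Optional[float] = None,
) -> List[str]:
    """Return the FT.ADD argument list (hypothetical helper)."""
    if not fields:
        raise ValueError("at least one field is required")
    args = ["FT.ADD", index, doc_id]
    if score is not None:
        # SCORE is optional in the syntax shown above.
        args += ["SCORE", str(score)]
    args.append("FIELDS")
    for name, value in fields.items():
        args += [name, str(value)]
    return args


add_args = build_ft_add(
    "products",
    "product:1",
    {"title": "Wireless Headphones", "price": 199.99, "category": "electronics"},
    score=1.0,
)
```

Passing the arguments as a list avoids shell-style quoting problems with values that contain spaces, such as the title above.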

FT.SEARCH - Search Documents

Search for documents in an index.

FT.SEARCH index_name query [LIMIT offset count] [FILTER field min max [FILTER field min max ...]] [RETURN field [field ...]]

Query Syntax:

Simple terms: wireless headphones
Phrase queries: "noise cancellation"
Field-specific: title:wireless
Boolean operators: wireless AND headphones
Wildcards: head*

Examples:

# Simple text search
FT.SEARCH products "wireless headphones"

# Search with filters
FT.SEARCH products "headphones" FILTER price 100 300 LIMIT 0 10

# Field-specific search
FT.SEARCH products "title:wireless AND category:electronics"

# Return specific fields only
FT.SEARCH products "*" RETURN title price
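
The optional clauses can be composed in code as well. This Python sketch builds an FT.SEARCH argument list in the clause order of the syntax line above. build_ft_search is a hypothetical helper, and the sketch does not model the reply format, which this document does not specify.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Number = float  # ints are accepted wherever float is annotated


def build_ft_search(
    index: str,
    query: str,
    limit: Optional[Tuple[int, int]] = None,
    filters: Iterable[Tuple[str, Number, Number]] = (),
    return_fields: Sequence[str] = (),
) -> List[str]:
    """Return the FT.SEARCH argument list (hypothetical helper)."""
    args = ["FT.SEARCH", index, query]
    if limit is not None:
        offset, count = limit
        if offset < 0 or count < 0:
            raise ValueError("offset and count must be non-negative")
        args += ["LIMIT", str(offset), str(count)]
    for field, low, high in filters:
        if low > high:
            raise ValueError(f"empty range for {field}: {low} > {high}")
        args += ["FILTER", field, str(low), str(high)]
    if return_fields:
        args += ["RETURN", *return_fields]
    return args


search_args = build_ft_search(
    "products", "headphones", limit=(0, 10), filters=[("price", 100, 300)]
)
```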

FT.DEL - Delete Document

Remove a document from the search index.

FT.DEL index_name doc_id

Example:

FT.DEL products product:1

FT.INFO - Get Index Information

Get information about a search index.

FT.INFO index_name

Returns:

Index name and document count
Field definitions and types
Index configuration

Example:

FT.INFO products

FT.DROP - Drop Index

Delete an entire search index.

FT.DROP index_name

Example:

FT.DROP products

FT.ALTER - Alter Index Schema

Add new fields to an existing index.

FT.ALTER index_name SCHEMA ADD field_name field_type [options]

Example:

FT.ALTER products SCHEMA ADD brand TAG STORED

FT.AGGREGATE - Aggregate Search Results

Perform aggregations on search results.

FT.AGGREGATE index_name query [GROUPBY field] [REDUCE function field AS alias]

Example:

# Group products by category and count
FT.AGGREGATE products "*" GROUPBY category REDUCE COUNT 0 AS count
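
To make the intent of GROUPBY with REDUCE COUNT concrete, the following Python sketch computes the same kind of per-category count locally over a list of dictionaries. It only illustrates the grouping logic. It does not show how HeroDB evaluates the command or how the reply is shaped, and count_by is a name invented for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def count_by(docs: Iterable[Mapping[str, str]], field: str) -> Dict[str, int]:
    """Count documents per distinct value of `field`, skipping docs without it."""
    return dict(Counter(doc[field] for doc in docs if field in doc))


sample_docs = [
    {"title": "Wireless Headphones", "category": "electronics"},
    {"title": "USB Cable", "category": "electronics"},
    {"title": "Rust in Practice", "category": "books"},
    {"title": "Uncategorized item"},
]
counts = count_by(sample_docs, "category")
```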

Field Types in Detail

TEXT Fields

Purpose: Full-text search on natural language content
Features: Tokenization, stemming, stop-word removal
Options: STORED, INDEXED, TOKENIZED
Example: Product titles, descriptions, content

NUMERIC Fields

Purpose: Numeric data for range queries and sorting
Types: I64, U64, F64
Options: STORED, INDEXED, FAST
Example: Prices, quantities, ratings

TAG Fields

Purpose: Exact-match categorical data
Features: No tokenization, exact string matching
Options: STORED, case sensitivity control
Example: Categories, brands, status values

GEO Fields

Purpose: Geographic coordinates
Format: "latitude,longitude" (e.g., "37.7749,-122.4194")
Features: Geographic distance queries (planned; see Future Enhancements)
Options: STORED
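
A client may want to validate coordinates before indexing them. Below is a minimal Python sketch that parses and formats the "latitude,longitude" string described above. The range checks use standard latitude and longitude bounds, and both function names are hypothetical. Whether the server performs similar validation is not stated here.

```python
from typing import Tuple


def parse_geo(value: str) -> Tuple[float, float]:
    """Parse "latitude,longitude" into a (lat, lon) tuple of floats."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'lat,lon', got {value!r}")
    lat, lon = float(parts[0]), float(parts[1])
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


def format_geo(lat: float, lon: float) -> str:
    """Format coordinates as the "latitude,longitude" string used by GEO fields."""
    return f"{lat},{lon}"


san_francisco = parse_geo("37.7749,-122.4194")
```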

DATE Fields

Purpose: Timestamp and date data
Format: Unix timestamp in milliseconds
Features: Range queries, temporal filtering
Options: STORED, INDEXED, FAST
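
Since DATE values are Unix timestamps in milliseconds, application code usually needs a conversion step. This Python sketch converts timezone-aware datetimes to and from that representation using integer arithmetic. The helper names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to Unix milliseconds."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    # Dividing timedeltas yields an int and avoids float rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_unix_ms(ms: int) -> datetime:
    """Convert Unix milliseconds back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


created = to_unix_ms(datetime(2022, 1, 1, tzinfo=timezone.utc))
```

The value 1640995200000 used in the examples in this document corresponds to 2022-01-01 00:00:00 UTC.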

Search Query Syntax

Basic Queries

# Single term
FT.SEARCH products "wireless"

# Multiple terms (AND by default)
FT.SEARCH products "wireless headphones"

# Phrase query
FT.SEARCH products "\"noise cancellation\""

Field-Specific Queries

# Search in specific field
FT.SEARCH products "title:wireless"

# Multiple field queries
FT.SEARCH products "title:wireless AND description:bluetooth"

Boolean Operators

# AND operator
FT.SEARCH products "wireless AND headphones"

# OR operator  
FT.SEARCH products "wireless OR bluetooth"

# NOT operator
FT.SEARCH products "headphones NOT wired"

Wildcards and Fuzzy Search

# Wildcard search
FT.SEARCH products "head*"

# Fuzzy search (approximate matching)
FT.SEARCH products "%headphone%"

Range Queries

# Numeric range in query
FT.SEARCH products "@price:[100 300]"

# Date range
FT.SEARCH products "@created_date:[1640995200000 1672531200000]"
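
The in-query range syntax shown above is easy to generate from code. A minimal Python sketch, assuming the "@field:[min max]" form exactly as written in these examples (range_clause is a hypothetical name):

```python
from typing import Union

Number = Union[int, float]


def range_clause(field: str, low: Number, high: Number) -> str:
    """Build an in-query range clause such as "@price:[100 300]"."""
    if low > high:
        raise ValueError(f"empty range: {low} > {high}")
    return f"@{field}:[{low} {high}]"


price_range = range_clause("price", 100, 300)
date_range = range_clause("created_date", 1640995200000, 1672531200000)
```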

Filtering and Sorting

FILTER Clause

# Numeric filter
FT.SEARCH products "headphones" FILTER price 100 300

# Multiple filters (assumes the index also defines a numeric rating field)
FT.SEARCH products "*" FILTER price 100 500 FILTER rating 4 5

LIMIT Clause

# Pagination
FT.SEARCH products "wireless" LIMIT 0 10    # First 10 results
FT.SEARCH products "wireless" LIMIT 10 10   # Next 10 results
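
Page-based interfaces have to translate a page number into the offset and count that LIMIT expects. The following Python sketch shows that arithmetic for 1-based page numbers. limit_clause is a name chosen for this example.

```python
from typing import List


def limit_clause(page: int, page_size: int = 10) -> List[str]:
    """Return the LIMIT arguments for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    if page_size < 1:
        raise ValueError("page_size must be positive")
    offset = (page - 1) * page_size
    return ["LIMIT", str(offset), str(page_size)]


first_page = limit_clause(1)
second_page = limit_clause(2)
```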

RETURN Clause

# Return specific fields
FT.SEARCH products "*" RETURN title price

# Return all stored fields (default)
FT.SEARCH products "*"

Performance Considerations

Indexing Strategy

Only index fields you need to search on
Use FAST option for frequently filtered numeric fields
Consider storage vs. search performance trade-offs

Query Optimization

Use specific field queries when possible
Combine filters with text queries for better performance
Use pagination with LIMIT for large result sets

Memory Usage

Tantivy indexes are memory-mapped for performance
Index size depends on document count and field configuration
Monitor disk space for index storage

Integration with Redis Commands

Search indexes work alongside regular Redis data:

# Store product data in Redis hash
HSET product:1 title "Wireless Headphones" price "199.99"

# Index the same data for search
FT.ADD products product:1 FIELDS title "Wireless Headphones" price 199.99

# Search returns document IDs that can be used with Redis commands
FT.SEARCH products "wireless"
# Returns: product:1

# Retrieve full data using Redis
HGETALL product:1
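
When the same record is stored as a hash and indexed for search, generating both commands from one source helps keep them consistent. This Python sketch produces the HSET and FT.ADD argument lists from a single dictionary. The helper sync_commands and its indexed_fields parameter are assumptions for illustration, and the sketch does not address atomicity between the two writes.

```python
from typing import Dict, List, Sequence, Union

Value = Union[str, int, float]


def sync_commands(
    index: str,
    key: str,
    record: Dict[str, Value],
    indexed_fields: Sequence[str],
) -> List[List[str]]:
    """Return [HSET args, FT.ADD args] built from one record."""
    hset = ["HSET", key]
    for name, value in record.items():
        hset += [name, str(value)]
    # Only the listed fields are sent to the search index.
    ft_add = ["FT.ADD", index, key, "FIELDS"]
    for name in indexed_fields:
        ft_add += [name, str(record[name])]
    return [hset, ft_add]


commands = sync_commands(
    "products",
    "product:1",
    {"title": "Wireless Headphones", "price": 199.99, "sku": "WH-1"},
    indexed_fields=["title", "price"],
)
```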

Example Use Cases

E-commerce Product Search

# Create product catalog index
FT.CREATE catalog SCHEMA 
  name TEXT STORED INDEXED TOKENIZED
  description TEXT INDEXED TOKENIZED
  price NUMERIC STORED INDEXED FAST
  category TAG STORED
  brand TAG STORED
  rating NUMERIC STORED FAST

# Add products
FT.ADD catalog prod:1 FIELDS name "iPhone 14" price 999 category "phones" brand "apple" rating 4.5
FT.ADD catalog prod:2 FIELDS name "Samsung Galaxy" price 899 category "phones" brand "samsung" rating 4.3

# Search queries
FT.SEARCH catalog "iPhone"
FT.SEARCH catalog "phones" FILTER price 800 1000
FT.SEARCH catalog "@brand:apple"

Content Management

# Create content index
FT.CREATE content SCHEMA
  title TEXT STORED INDEXED TOKENIZED
  body TEXT INDEXED TOKENIZED
  author TAG STORED
  published DATE STORED INDEXED
  tags TAG STORED

# Search content
FT.SEARCH content "machine learning"
FT.SEARCH content "@author:john AND @tags:ai"
FT.SEARCH content "*" FILTER published 1640995200000 1672531200000

Geographic Search

# Create location-based index
FT.CREATE places SCHEMA
  name TEXT STORED INDEXED TOKENIZED
  location GEO STORED
  type TAG STORED

# Add locations
FT.ADD places place:1 FIELDS name "Golden Gate Bridge" location "37.8199,-122.4783" type "landmark"

# Geographic queries (future feature)
FT.SEARCH places "@location:[37.7749 -122.4194 10 km]"

Error Handling

Common error responses:

ERR index not found - Index doesn't exist
ERR field not found - Field not defined in schema
ERR invalid query syntax - Malformed query
ERR document not found - Document ID doesn't exist
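
Client code can map these messages to typed exceptions so callers can react to specific failures. Here is a minimal Python sketch that matches on the message prefixes listed above. The exception classes and classify_error are hypothetical, and the exact wording of server errors may differ from this list.

```python
from typing import Dict, Type


class SearchError(Exception):
    """Base class for search-related errors (hypothetical)."""


class IndexNotFound(SearchError):
    pass


class FieldNotFound(SearchError):
    pass


class InvalidQuery(SearchError):
    pass


class DocumentNotFound(SearchError):
    pass


# Prefixes taken from the error list in this document.
ERROR_PREFIXES: Dict[str, Type[SearchError]] = {
    "ERR index not found": IndexNotFound,
    "ERR field not found": FieldNotFound,
    "ERR invalid query syntax": InvalidQuery,
    "ERR document not found": DocumentNotFound,
}


def classify_error(message: str) -> SearchError:
    """Return a typed exception instance for a server error message."""
    for prefix, exc_type in ERROR_PREFIXES.items():
        if message.startswith(prefix):
            return exc_type(message)
    return SearchError(message)
```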

Best Practices

Schema Design: Plan your schema carefully. FT.ALTER can add new fields, but other schema changes require reindexing
Field Selection: Only store and index fields you actually need
Batch Operations: Add multiple documents efficiently
Query Testing: Test queries for performance with realistic data
Monitoring: Monitor index size and query performance
Backup: Include search indexes in backup strategies

Future Enhancements

Planned features:

Geographic distance queries
Advanced aggregations and faceting
Highlighting of search results
Synonyms and custom analyzers
Real-time suggestions and autocomplete
Index replication and sharding
