# LanceDB Text and Images: End-to-End Example

This guide walks through creating a Lance-backed database, ingesting two text documents and two images, searching over both, and cleaning up the datasets.

## Prerequisites

- Build HeroDB and start the server with JSON-RPC enabled.

Commands:

```bash
cargo build --release
./target/release/herodb --dir /tmp/herodb --admin-secret mysecret --port 6379 --enable-rpc
```

We'll use:

- redis-cli for RESP commands against port 6379
- curl for JSON-RPC against 8080 if desired
- Deterministic local embedders to avoid external dependencies: testhash (text, dim 64) and testimagehash (image, dim 512)

## 0) Create a Lance-backed database (JSON-RPC)

Request:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "herodb_createDatabase",
  "params": ["Lance", { "name": "media-db", "storage_path": null, "max_size": null, "redis_version": null }, null]
}
```

The response returns the db_id (assume 1).

Select DB over RESP:

```bash
redis-cli -p 6379 SELECT 1
# → OK
```

Note that redis-cli opens a new connection for every invocation, so this SELECT does not carry over to the one-shot commands below. Either run them in a single interactive redis-cli session after `SELECT 1`, or add `-n 1` to each invocation.

## 1) Configure embedding providers

We'll create two datasets with independent embedding configs:

- textset → provider testhash, dim 64
- imageset → provider testimagehash, dim 512

Text config:

```bash
redis-cli -p 6379 LANCE.EMBEDDING CONFIG SET textset PROVIDER testhash MODEL any PARAM dim 64
# → OK
```

Image config:

```bash
redis-cli -p 6379 LANCE.EMBEDDING CONFIG SET imageset PROVIDER testimagehash MODEL any PARAM dim 512
# → OK
```

## 2) Create datasets

```bash
redis-cli -p 6379 LANCE.CREATE textset DIM 64
# → OK
redis-cli -p 6379 LANCE.CREATE imageset DIM 512
# → OK
```

## 3) Ingest two text documents (server-side embedding)

```bash
redis-cli -p 6379 LANCE.STORE textset ID doc-1 TEXT "The quick brown fox jumps over the lazy dog" META title "Fox" category "animal"
# → OK
redis-cli -p 6379 LANCE.STORE textset ID doc-2 TEXT "A fast auburn fox vaulted a sleepy canine" META title "Paraphrase" category "animal"
# → OK
```

## 4) Ingest two images

You can provide a URI or base64 bytes.
Use URI for URIs, BYTES for base64 data.

Example using free placeholder images:

```bash
# Store via URI
redis-cli -p 6379 LANCE.STOREIMAGE imageset ID img-1 URI "https://picsum.photos/seed/1/256/256" META title "Seed1" group "demo"
# → OK
redis-cli -p 6379 LANCE.STOREIMAGE imageset ID img-2 URI "https://picsum.photos/seed/2/256/256" META title "Seed2" group "demo"
# → OK
```

If your environment blocks outbound HTTP, you can embed image bytes:

```bash
# Example: read a local file and base64 it (replace path)
b64=$(base64 -w0 ./image1.png)
redis-cli -p 6379 LANCE.STOREIMAGE imageset ID img-b64-1 BYTES "$b64" META title "Local1" group "demo"
```

## 5) Search text

```bash
# Top-2 nearest neighbors for a query
redis-cli -p 6379 LANCE.SEARCH textset K 2 QUERY "quick brown fox" RETURN 1 title
# → 1) [id, score, [k1,v1,...]]
```

With a filter (supports equality on schema or meta keys):

```bash
redis-cli -p 6379 LANCE.SEARCH textset K 2 QUERY "fox jumps" FILTER "category = 'animal'" RETURN 1 title
```

## 6) Search images

```bash
# Provide a URI as the query
redis-cli -p 6379 LANCE.SEARCHIMAGE imageset K 2 QUERYURI "https://picsum.photos/seed/1/256/256" RETURN 1 title

# Or provide base64 bytes as the query
qb64=$(curl -s https://picsum.photos/seed/3/256/256 | base64 -w0)
redis-cli -p 6379 LANCE.SEARCHIMAGE imageset K 2 QUERYBYTES "$qb64" RETURN 1 title
```

## 7) Inspect datasets

```bash
redis-cli -p 6379 LANCE.LIST
redis-cli -p 6379 LANCE.INFO textset
redis-cli -p 6379 LANCE.INFO imageset
```

## 8) Delete by id and drop datasets

```bash
# Delete one record
redis-cli -p 6379 LANCE.DEL textset doc-2
# → OK

# Drop entire datasets
redis-cli -p 6379 LANCE.DROP textset
redis-cli -p 6379 LANCE.DROP imageset
# → OK
```

## Appendix: Using OpenAI embeddings instead of test providers

Text:

```bash
export OPENAI_API_KEY=sk-...
redis-cli -p 6379 LANCE.EMBEDDING CONFIG SET textset PROVIDER openai MODEL text-embedding-3-small PARAM dim 512
redis-cli -p 6379 LANCE.CREATE textset DIM 512
```

Azure OpenAI:

```bash
export AZURE_OPENAI_API_KEY=...
redis-cli -p 6379 LANCE.EMBEDDING CONFIG SET textset PROVIDER openai MODEL text-embedding-3-small \
  PARAM use_azure true \
  PARAM azure_endpoint https://myresource.openai.azure.com \
  PARAM azure_deployment my-embed-deploy \
  PARAM azure_api_version 2024-02-15 \
  PARAM dim 512
```

Notes:

- Ensure dataset DIM matches the configured embedding dimension.
- Lance is only available for non-admin databases (db_id >= 1).
- On Lance DBs, only LANCE.* and basic control commands are allowed.
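
One way to keep the dataset DIM and the embedding dimension from drifting apart is to generate both commands from a single value. The sketch below is illustrative only: `lance_setup_cmds` is a hypothetical helper, not part of HeroDB, and it only prints the two commands used earlier in this guide so they can be reviewed before anything is sent to the server.

```bash
# Hypothetical helper (not part of HeroDB): print the embedding-config and
# dataset-creation commands for one dataset, using the same dimension in both.
# The body runs in a subshell, so the variables stay local to the function.
lance_setup_cmds() (
  dataset="$1"
  provider="$2"
  dim="$3"
  printf 'LANCE.EMBEDDING CONFIG SET %s PROVIDER %s MODEL any PARAM dim %s\n' "$dataset" "$provider" "$dim"
  printf 'LANCE.CREATE %s DIM %s\n' "$dataset" "$dim"
)

# Dry run: print the commands for the two datasets used in this guide.
lance_setup_cmds textset testhash 64
lance_setup_cmds imageset testimagehash 512

# To execute them, the output can be piped to redis-cli, which reads commands
# from standard input. This assumes the Lance database has db_id 1:
# lance_setup_cmds textset testhash 64 | redis-cli -p 6379 -n 1
```

Because the dimension is passed once per dataset, changing it for a different provider updates both commands together.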