LanceDBSink

The LanceDBSink class is a connector for LanceDB, an open-source, serverless vector database built for seamless integration and scale.

Properties

Required properties:

uri: URI for the LanceDB database.
table_name: Name of the LanceDB table to be used.

Optional properties:

api_key: If provided, connect to LanceDB Cloud; otherwise, connect to a database on the file system or cloud storage.
region: Region to use with LanceDB Cloud.
create_index: Boolean to decide whether to create an index for ANN search or use flat search.
metric: The distance metric to use (default is ‘cosine’).
num_partitions: The number of partitions of the index.
num_sub_vectors: The number of sub-vectors created during Product Quantization (PQ).
accelerator: Specifies the accelerator to use for the index creation process (e.g., GPU or MPS).

Index creation is only required when dealing with 100k+ vectors. Below that threshold, set create_index to false. For more information on index creation and on configuring partitions and sub-vectors, see the LanceDB documentation.

from neumai.SinkConnectors import LanceDBSink

# Set up the LanceDBSink with required credentials and index information
lancedb_sink = LanceDBSink(
    uri = "lancedb_uri",
    table_name = "test_table",
    # If using LanceDB Cloud, add:
    # api_key = "lancedb_cloud_api_key",
    # If storing more than 100k vectors, add:
    # create_index = True,
    # Ensure that your vector dimension (e.g. 1536 for OpenAI text-embedding-ada-002) is divisible by num_sub_vectors (default 96).
    # Ensure that num_partitions (default 256) is less than the number of vectors you are adding.
)
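
The index-related constraints mentioned above (the 100k-vector threshold, the divisibility of the vector dimension by num_sub_vectors, and num_partitions being smaller than the number of vectors) can be checked before the sink is constructed. The following is a minimal sketch, not part of neumai or LanceDB: the helper name lancedb_index_options and the constant INDEX_THRESHOLD are hypothetical, and the defaults of 256 and 96 are taken from the comments in the example above.

```python
from typing import Any, Dict

# Hypothetical constant: the 100k-vector threshold quoted in the text above.
INDEX_THRESHOLD = 100_000


def lancedb_index_options(
    num_vectors: int,
    vector_dim: int,
    num_partitions: int = 256,   # default quoted in the example above
    num_sub_vectors: int = 96,   # default quoted in the example above
) -> Dict[str, Any]:
    """Return index-related keyword arguments for LanceDBSink (illustrative helper)."""
    # Below the threshold, no ANN index is needed; flat search is used.
    if num_vectors < INDEX_THRESHOLD:
        return {"create_index": False}

    # Product Quantization splits each vector into equally sized sub-vectors.
    if vector_dim % num_sub_vectors != 0:
        raise ValueError(
            f"vector_dim ({vector_dim}) must be divisible by "
            f"num_sub_vectors ({num_sub_vectors})"
        )

    # There must be fewer partitions than vectors being added.
    if num_partitions >= num_vectors:
        raise ValueError(
            f"num_partitions ({num_partitions}) must be less than "
            f"the number of vectors ({num_vectors})"
        )

    return {
        "create_index": True,
        "num_partitions": num_partitions,
        "num_sub_vectors": num_sub_vectors,
    }


options = lancedb_index_options(num_vectors=250_000, vector_dim=1536)
```

The resulting dictionary could then be unpacked into the constructor, for example LanceDBSink(uri="lancedb_uri", table_name="test_table", **options), assuming the property names listed under Optional properties are accepted as keyword arguments as in the example above.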
