
Pinecone Vector Database Node

The Pinecone Vector Database Node in Fleak allows users to store and manage high-dimensional vector data, supporting applications that involve vector search and metadata filtering. This node is designed to simplify the integration and management of vector-based data within Fleak workflows, enabling efficient knowledge retrieval and data storage operations.

Configurations

  • Pinecone Connection: To use the Pinecone Vector Database Node, start by connecting Fleak to your Pinecone account:

    • Select an Existing Connection: Choose a previously created connection from the dropdown menu.

    • Create a New Connection:

      • Sign up for a Pinecone account if you don’t have one.
      • Obtain your API key by logging into Pinecone, navigating to the API Keys section, and creating or copying a key.
      • In Fleak, select the Pinecone node and click Select or Create New, then Create New. Name your connection and paste the API key from your Pinecone account.
  • Index Setup: Either select an existing index or create a new one by:

    • Navigating to the Indexes section in the Pinecone console.

    • Clicking Create Index and filling in the name, cloud, region, index type, dimension, and metric type.

      • Dimension: The vector dimension of your index must match the dimension of the vectors produced by your embedding model. For example, OpenAI text-embedding-3-small and text-embedding-3-large produce 1536- and 3072-dimensional vectors by default, respectively. Query vectors and stored vectors must have the same dimension for similarity calculations to work, so a mismatch causes errors and failed comparisons.

      • Metric Type: The distance metric defines how the similarity between vectors is calculated.

        • Cosine: Measures the cosine of the angle between vectors, ideal for text and semantic similarity tasks.
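        • Dot Product: Multiplies corresponding vector components and sums the results, so unlike cosine it is sensitive to vector magnitude. Suited for ranking tasks and positive vector spaces.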
        • Euclidean: Computes the straight-line distance between vectors, often used for image or geometric data.
  • Write Data into Pinecone Index

    • Vector: Specify the JSON path to the field in your input data that contains the vector. Fleak uses this path to locate the vector values in each input event and stores them as the vector in the Pinecone index. For example, if your input data looks like this:

      {
        "embedding": [0.25, 0.5, 0.75, 0.1]
      }

      and you specify $.embedding as the JSON path in the Vector field, Fleak extracts [0.25, 0.5, 0.75, 0.1] and saves these values as the vector in Pinecone.

    • Metadata Fields: Attach additional information to each vector as key-value pairs. Metadata gives the vector context and is useful for categorization, retrieval, and analysis.

      • Key Name: The label that represents the metadata field (e.g., category, timestamp, author).
      • Value: A JSON path expression pointing to the field in your input data that contains the metadata value. Fleak extracts this value during processing and stores it alongside the vector in Pinecone.
      • What to Save as Metadata: Metadata can be any relevant data that helps provide context to the vector, such as:
        • Categories or Tags: E.g., {"category": "news"}
        • Timestamps: E.g., {"timestamp": "2024-11-10T10:00:00Z"}
        • Identifiers: E.g., {"user_id": "12345"}
        • Attributes: E.g., {"source": "social media", "author": "John Doe"}
      For more details, see Pinecone’s Filter with Metadata documentation.
    • Namespaces: Namespaces act as containers within an index, letting you separate and organize data without mixing different sets. Data in a namespace can be queried or managed independently, which makes namespaces suitable for segmenting projects or datasets. If no namespace is specified, the default namespace is used.

    • ID Path: Provide the JSON path to the unique identifier for each record. If not specified, a random ID is generated.
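To make the Vector, Metadata Fields, and ID Path mappings concrete, here is a minimal Python sketch of how one input event could be turned into a Pinecone-style record. It is an illustration under stated assumptions, not Fleak's implementation: resolve_path and build_record are hypothetical helpers, and resolve_path supports only simple dotted paths such as $.embedding or $.meta.author, not full JSON path syntax. The id/values/metadata keys follow the record shape used by Pinecone's upsert API; the namespace is not part of the record, because Pinecone takes it as a parameter of the write request.

```python
import uuid
from typing import Any, Dict, Optional


def resolve_path(event: Dict[str, Any], path: str) -> Any:
    """Resolve a simple JSON path such as "$.embedding" or "$.meta.author".

    Hypothetical helper: only "$" followed by dot-separated object keys is
    supported (no array indexes, wildcards, or filters).
    """
    if path != "$" and not path.startswith("$."):
        raise ValueError(f"unsupported JSON path: {path!r}")
    value: Any = event
    # "$.meta.author" -> ".meta.author" -> ["", "meta", "author"] -> ["meta", "author"]
    for key in path[1:].split(".")[1:]:
        if not isinstance(value, dict) or key not in value:
            raise KeyError(f"{path!r} does not match the input event")
        value = value[key]
    return value


def build_record(
    event: Dict[str, Any],
    vector_path: str,
    metadata_paths: Dict[str, str],
    id_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Map one input event to a Pinecone-style record (hypothetical helper)."""
    values = resolve_path(event, vector_path)
    if not isinstance(values, list) or not all(isinstance(v, (int, float)) for v in values):
        raise TypeError(f"{vector_path!r} must point to a list of numbers")
    # Each metadata key name maps to the JSON path that holds its value.
    metadata = {key: resolve_path(event, path) for key, path in metadata_paths.items()}
    # Use the configured ID path if given; otherwise generate a random ID.
    record_id = str(resolve_path(event, id_path)) if id_path else str(uuid.uuid4())
    return {"id": record_id, "values": [float(v) for v in values], "metadata": metadata}


event = {
    "doc_id": "doc-42",
    "embedding": [0.25, 0.5, 0.75, 0.1],
    "category": "news",
    "author": "John Doe",
}
print(build_record(
    event,
    vector_path="$.embedding",
    metadata_paths={"category": "$.category", "author": "$.author"},
    id_path="$.doc_id",
))
# {'id': 'doc-42', 'values': [0.25, 0.5, 0.75, 0.1], 'metadata': {'category': 'news', 'author': 'John Doe'}}
```

Failing loudly when a path does not match (KeyError) or does not point to a list of numbers (TypeError) surfaces configuration mistakes early instead of writing incomplete records.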


For more on JSON path syntax, see the JSON path documentation.
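The Dimension and Metric Type settings under Index Setup can be illustrated with a small pure-Python sketch. Pinecone computes similarity on its side; this code only shows the underlying math and the dimension check, and the helper names (check_dimension, dot_product, cosine_similarity, euclidean_distance) are hypothetical.

```python
import math
from typing import Sequence


def check_dimension(vector: Sequence[float], index_dimension: int) -> None:
    """Raise if a vector's length differs from the index's configured dimension."""
    if len(vector) != index_dimension:
        raise ValueError(
            f"vector has {len(vector)} dimensions, index expects {index_dimension}"
        )


def _require_same_dimension(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of component-wise products; grows with vector magnitude."""
    _require_same_dimension(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; ignores vector magnitude."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    _require_same_dimension(a, b)
    return math.dist(a, b)


query = [1.0, 0.0]
for stored in ([1.0, 0.0], [2.0, 0.0], [0.0, 1.0]):
    print(
        stored,
        cosine_similarity(query, stored),
        dot_product(query, stored),
        round(euclidean_distance(query, stored), 3),
    )
# [1.0, 0.0] 1.0 1.0 0.0
# [2.0, 0.0] 1.0 2.0 1.0
# [0.0, 1.0] 0.0 0.0 1.414
```

The second row shows the practical difference between the metrics: [2.0, 0.0] points in the same direction as the query, so cosine similarity treats it as identical, while the dot product and Euclidean distance both react to its larger magnitude. Calling check_dimension on a 4-dimensional vector against a 1536-dimensional index raises a ValueError, mirroring the mismatch errors described above.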