Pinecone Knowledge Search Node
Overview
The Pinecone Knowledge Search Node in Fleak allows users to efficiently query high-dimensional vector data stored in a Pinecone database. This node facilitates semantic search by comparing query vectors to stored embeddings and returning the most relevant results, making it suitable for applications like intelligent search engines and recommendation systems.
How the Knowledge Search Node Works
Pinecone leverages vector embeddings to perform similarity searches:
- Vector Embeddings: Data is transformed into numerical representations called vectors, which capture semantic meaning and context. These vectors are generated using embedding models like sentence transformers or OpenAI models and are stored in Pinecone's index.
- Similarity Search: When a query is processed, it is converted into a vector using the same embedding model. The query vector is then compared to the stored vectors in Pinecone using similarity metrics such as cosine similarity, dot product, or Euclidean distance to find the most relevant matches.
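For intuition, the three metrics compare two vectors as follows. This is a small illustrative sketch in Python using NumPy with made-up values; it is not part of the node configuration:

import numpy as np

# Two toy 4-dimensional embeddings (illustrative values only).
query = np.array([0.12, 0.45, 0.78, 0.34])
stored = np.array([0.10, 0.40, 0.80, 0.30])

# Cosine: angle between the vectors, independent of their magnitudes.
cosine = np.dot(query, stored) / (np.linalg.norm(query) * np.linalg.norm(stored))

# Dot product: rewards vectors that point the same way and have large magnitudes.
dot_product = np.dot(query, stored)

# Euclidean: straight-line distance; smaller means more similar.
euclidean = np.linalg.norm(query - stored)

print(cosine, dot_product, euclidean)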
Configurations
Pinecone Connection
To use the Pinecone Knowledge Search Node, start by connecting Fleak to your Pinecone account:
- Select an Existing Connection: Choose a previously created connection from the dropdown menu.
- Create a New Connection:
  - Sign up for a Pinecone account if you don’t have one.
  - Obtain your API key by logging into Pinecone, navigating to the API Keys section, and creating or copying a key.
  - In Fleak, select the Pinecone node and click Select or Create New, then Create New. Name your connection and paste the API key from your Pinecone account.
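Outside of Fleak, you can sanity-check the same API key with Pinecone's Python SDK. This is a minimal sketch assuming the pinecone package is installed; YOUR_API_KEY is a placeholder for the key created above:

from pinecone import Pinecone

# Use the same API key that you paste into the Fleak connection dialog.
pc = Pinecone(api_key="YOUR_API_KEY")

# Listing the existing indexes confirms that the key is valid.
print(pc.list_indexes())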
Index Setup
Either select an existing index or create a new one by:
- Navigating to the Indexes section in the Pinecone console.
- Clicking Create Index and filling in the name, cloud, region, index type, dimension, and metric type (a scripted equivalent is sketched after this list).
  - Dimension: Ensure that the vector dimension in your index matches the dimension produced by your embedding model, e.g. OpenAI text-embedding-3-small produces 1536-dimensional vectors by default and text-embedding-3-large produces 3072. Consistent dimensions between query vectors and stored vectors are essential for similarity calculations; mismatched dimensions lead to errors and failed comparisons.
  - Metric Type: The distance metric defines how the similarity between vectors is calculated.
    - Cosine: Measures the cosine of the angle between vectors, ideal for text and semantic similarity tasks.
    - Dot Product: Suited for ranking tasks and positive vector spaces.
    - Euclidean: Computes the straight-line distance between vectors, often used for image or geometric data.
- Once created, return to your Fleak workflow, where the new index will be selectable.
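If you prefer to script index creation rather than use the console, a rough equivalent with the Pinecone Python SDK could look like the sketch below; the index name, cloud, and region are placeholders, and the dimension and metric must match your embedding model as described above:

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# The dimension must match the embedding model, e.g. 1536 for OpenAI text-embedding-3-small.
pc.create_index(
    name="fleak-knowledge-base",  # placeholder index name
    dimension=1536,
    metric="cosine",              # or "dotproduct" / "euclidean"
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)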
Knowledge Search Settings
- Vector Path: Enter the JSON path that points to the field in your input data containing the query vector. This path enables Fleak to extract the vector values and use them in similarity searches within Pinecone. For example, if the input data looks like the following JSON and $.query_vector is specified, Fleak will use [0.12, 0.45, 0.78, 0.34] as the query vector (the sketch after this list illustrates the same extraction and query):

{
  "query_vector": [0.12, 0.45, 0.78, 0.34]
}
- Number of Top Results: Set the number of top results to retrieve during the search.
- Output Field Name: Specify the field name where the search results will be stored. The default name is query_output.
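To illustrate what the node does with these settings, the sketch below extracts the query vector with the configured JSON path (using the jsonpath-ng library) and runs an equivalent Pinecone query for the top results. The index name and code are illustrative only; Fleak performs these steps for you:

from jsonpath_ng import parse
from pinecone import Pinecone

event = {"query_vector": [0.12, 0.45, 0.78, 0.34]}

# Resolve the configured Vector Path ($.query_vector) against the input event.
query_vector = parse("$.query_vector").find(event)[0].value

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("fleak-knowledge-base")  # placeholder index name

# Retrieve the configured number of top results (here, 5).
results = index.query(vector=query_vector, top_k=5, include_metadata=True)

# Fleak writes the matches to the configured output field (default "query_output").
event["query_output"] = [{"id": m.id, "score": m.score} for m in results.matches]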
Advanced Settings
- Rerank Results: Enabling the Rerank Results option applies an additional layer of filtering and sorting on the initial search results. This feature is useful for refining the relevance of results based on a secondary criterion. When toggled, the following configuration fields appear (a conceptual sketch follows this list):
  - Query String Path:
    - Description: Specify the JSON path of the input string that will be used as the target to rerank the retrieved results.
    - Usage: This path points to the input field containing the string that will guide the reranking process.
    - Example: If your input data includes a field {"query_text": "what camera should I bring to the ski trip"}, use $.query_text as the JSON path.
  - Number of Top Rerank Results:
    - Description: Define how many results should be retained after the rerank step. This number must be less than or equal to the number of top results from the initial retrieval.
    - Usage: This ensures that the rerank process only applies to a subset of the initial results for further precision.
    - Best Practice: Set this value based on the level of refinement needed, ensuring it does not exceed the total top results returned in the initial retrieval.
  - Rank Field:
    - Description: Specify the metadata field saved with the vectors that will be used for applying the rerank algorithm. This field is typically a string field and is crucial for effective reranking.
    - Usage: Ensure that the Rank Field metadata is consistently set in the stored vectors to achieve accurate reranking.
    - Example: If your vectors have a metadata field such as {"description": "ADT 9-piece Smart Home Security System with Google Nest Products & Pro"}, use description as the Rank Field.
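Conceptually, the rerank step re-scores the initially retrieved matches against the query string using the chosen Rank Field and keeps only the top rerank results. The sketch below illustrates that idea with a deliberately simple word-overlap scorer; it is not the actual reranking model used by the node:

def rerank(matches, query_text, rank_field, top_n):
    """Re-score matches against query_text using the metadata rank_field and keep top_n."""
    def overlap_score(text):
        # Placeholder scorer: fraction of query words found in the candidate text.
        query_words = set(query_text.lower().split())
        return len(query_words & set(text.lower().split())) / max(len(query_words), 1)

    scored = sorted(matches, key=lambda m: overlap_score(m["metadata"].get(rank_field, "")), reverse=True)
    return scored[:top_n]

# Toy matches with a "description" metadata field (illustrative data only).
matches = [
    {"id": "a", "metadata": {"description": "Lightweight action camera for ski trips"}},
    {"id": "b", "metadata": {"description": "ADT 9-piece Smart Home Security System"}},
]
print(rerank(matches, "what camera should I bring to the ski trip", "description", top_n=1))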
- Namespace: A namespace in Pinecone helps organize data within an index, allowing logical separation for different projects or use cases. If no namespace is specified, the default namespace is used.
- Include Values: Toggle this option to include the actual vector values in the search results.
- Filter with Metadata: Enable this feature to apply metadata-based filters to the search results. Provide a JSON object representing the metadata criteria. For example:

{
  "sub_category": "Cameras & Camcorders",
  "pc_write_ts": { "$gte": 172400 }
}
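Putting the advanced options together, an equivalent Pinecone SDK query with a namespace, returned vector values, and a metadata filter might look like this sketch (the index name and namespace are placeholders):

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("fleak-knowledge-base")  # placeholder index name

results = index.query(
    vector=[0.12, 0.45, 0.78, 0.34],
    top_k=5,
    namespace="electronics",   # placeholder namespace
    include_values=True,       # return the stored vector values with each match
    include_metadata=True,
    filter={
        "sub_category": "Cameras & Camcorders",
        "pc_write_ts": {"$gte": 172400},
    },
)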
For more on JSON path syntax, refer to the JSON Path documentation.