Embedding Node

The Text Embeddings node converts text input into numerical vector representations using advanced embedding models. This node supports multiple providers and models, allowing for flexible text vectorization that can be used for semantic search, text similarity analysis, and other natural language processing tasks.
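To make the idea concrete, embedding vectors are typically compared with cosine similarity: texts with similar meaning produce vectors that point in similar directions. Here is a minimal, self-contained sketch (toy 3-dimensional vectors stand in for real model output, which has 1024-3072 dimensions depending on the model):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two embedding vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of two similar sentences.
v1 = [0.1, 0.9, 0.2]
v2 = [0.12, 0.85, 0.25]
print(round(cosine_similarity(v1, v2), 3))
```

Downstream nodes or vector stores use exactly this kind of comparison for semantic search and similarity ranking.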

Configuration

  • Embedding Provider: Select the provider for text embedding services. Currently supported providers include:

    • OpenAI
    • Pinecone
  • Embedding Model: Choose the embedding model based on your requirements. Available models vary by provider.

    • OpenAI Models:
      • text-embedding-3-large (max input: 8191 tokens, 3072 dimensions)
      • text-embedding-ada-002 (max input: 8192 tokens, 1536 dimensions)
      • text-embedding-3-small (max input: 8191 tokens, 1536 dimensions)
    • Pinecone Models:
      • multilingual-e5-large (max input: 507 tokens, 1024 dimensions)
  • JSON Path: Specify the path to the text field in your input data that you want to create embeddings from. Use JSONPath syntax to reference the desired field, for example $.content or $.text.description. For more information, refer to the Fleak JSONPath syntax documentation.

  • Apply Chunking: Enable this option to break down large texts into smaller segments before creating embeddings. This is useful when dealing with long documents that exceed the token limit of the selected model.

  • Attached To Input Event: When enabled, the embedding output will be attached to the original input event rather than creating a new event. This helps maintain data context and relationships throughout your workflow.
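To illustrate how the JSON Path and Apply Chunking options work together, here is a hedged Python sketch (not Fleak's actual implementation): it resolves a simple dotted path like $.text.description against an input event, then splits long text into chunks sized by a rough characters-per-token heuristic.

```python
def resolve_json_path(event: dict, path: str):
    """Resolve a simple dotted JSONPath like '$.text.description'.
    Only plain field access is handled here; real JSONPath is richer."""
    value = event
    for key in path.lstrip("$.").split("."):
        value = value[key]
    return value

def chunk_text(text: str, max_tokens: int, chars_per_token: int = 4):
    """Rough chunking: ~4 characters per token is a common heuristic.
    A production pipeline would use the model's real tokenizer."""
    limit = max_tokens * chars_per_token
    return [text[i:i + limit] for i in range(0, len(text), limit)]

event = {"text": {"description": "A long document ... " * 100}}
text = resolve_json_path(event, "$.text.description")
chunks = chunk_text(text, max_tokens=100)  # small limit for the demo
print(len(chunks), "chunk(s)")
```

Each chunk would then be embedded separately, keeping every segment within the selected model's token limit.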

Usage Tips

  • Choose the appropriate model based on your needs:
    • text-embedding-3-large for highest accuracy and dimensional richness
    • text-embedding-3-small for a balance of performance and efficiency
    • text-embedding-ada-002 for compatibility with existing systems
  • Consider token limits when selecting your model to ensure your text input will be processed completely
  • Use chunking for long documents to ensure all content is embedded while respecting model token limits
  • The choice of embedding model affects both the quality and dimensionality of the resulting vectors
  • Processing time and cost may vary between models
  • Ensure your JSON path correctly points to the text field you want to embed
  • Remember that the token limit includes any special characters and formatting in your text
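The model limits listed above can be captured in a small lookup table to sanity-check input length before embedding. A minimal sketch, assuming a rough ~4 characters-per-token estimate (a heuristic only, not the models' actual tokenizers):

```python
# Max input tokens and output dimensions, from the model lists above.
MODEL_LIMITS = {
    "text-embedding-3-large": (8191, 3072),
    "text-embedding-ada-002": (8192, 1536),
    "text-embedding-3-small": (8191, 1536),
    "multilingual-e5-large": (507, 1024),
}

def fits_model(text: str, model: str, chars_per_token: float = 4.0) -> bool:
    """Rough pre-check: estimate token count from character count.
    Special characters and formatting count toward the limit too."""
    max_tokens, _dims = MODEL_LIMITS[model]
    return len(text) / chars_per_token <= max_tokens

print(fits_model("hello world", "multilingual-e5-large"))      # short text fits
print(fits_model("x" * 100_000, "multilingual-e5-large"))      # exceeds 507 tokens
```

If the check fails, enable Apply Chunking or switch to a model with a higher token limit.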