Azure Blob Storage Sink Node

The azureblobsink node writes pipeline records to an Azure Blob Storage container as newline-delimited JSON (.jsonl) blobs. It batches records in memory and uploads each batch as a single blob, organized into date-partitioned virtual folders.

Typical use cases include archiving streaming events to Azure cold storage, producing date-partitioned data lake landings consumed by Synapse or Databricks, and regulatory retention of processed records.

Key Features

  • Batched NDJSON writes: records accumulate until the batch reaches batchSize, then are uploaded as a single .jsonl blob
  • Date-partitioned naming: every blob name embeds a UTC yyyy/MM/dd/HH-mm-ss path component plus a UUID, giving downstream tools a date-partitioned folder layout to prune on
  • Dual-mode auth: authenticate with either a full Azure Storage connection string or a credential containing the storage account name and key

Configuration

  • containerName (String, required): Azure Blob Storage container to write to.
  • connectionString (String): Full Azure Storage connection string. One of connectionString or credentialId is required; when both are set, connectionString takes priority.
  • credentialId (String): ID of a UsernamePasswordCredential in jobContext.otherProperties, where username = storage account name and password = storage account key. One of connectionString or credentialId is required.
  • blobNamePrefix (String, optional, default "events/"): Prefix prepended to every generated blob name. Include a trailing / so it acts as a virtual folder.
  • batchSize (Integer, optional, default 100, minimum 1): Number of records buffered in memory before flushing as a single blob.
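
A minimal configuration needs only the container and one form of authentication; every other field falls back to its default (blobNamePrefix: "events/", batchSize: 100). The values below are placeholders, not working credentials.

config:
  containerName: "my-container"
  connectionString: "DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=...;EndpointSuffix=core.windows.net"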

Authentication

Either connectionString or credentialId must be provided. If both are set, connectionString wins.
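
If a DAG happens to set both, the credential is simply ignored, for example:

config:
  containerName: "workflow-output"
  connectionString: "DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=..."
  credentialId: "azure-blob-cred" # ignored: connectionString takes priority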

Connection String

config:
  connectionString: "DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=...;EndpointSuffix=core.windows.net"

Credential

jobContext:
  otherProperties:
    azure-blob-cred:
      username: "mystorage" # storage account name
      password: "<storage account key>"

The connector builds a connection string at runtime from these two values.
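
The exact string it assembles is an implementation detail, but it is reasonable to picture the standard account-key form with the credential's username and password substituted in (the protocol and endpoint suffix shown here are assumptions, not configurable fields):

DefaultEndpointsProtocol=https;AccountName=<username>;AccountKey=<password>;EndpointSuffix=core.windows.net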

Blob Naming

Every blob the sink writes follows the pattern:

{blobNamePrefix}{yyyy/MM/dd/HH-mm-ss}-{uuid}.jsonl

For example, with blobNamePrefix: "events/":

events/2026/04/27/12-34-56-3f1a8b9d-2c5e-4a8c-9d6e-7b1c2a3d4e5f.jsonl

The timestamp is in UTC, and each line in the blob is one record from the workflow batch.
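
Over successive flushes the date component groups blobs into per-day virtual folders under the prefix. An illustrative listing (UUIDs abbreviated to a placeholder) might look like:

events/2026/04/27/12-34-56-<uuid>.jsonl
events/2026/04/27/13-05-10-<uuid>.jsonl
events/2026/04/28/00-00-41-<uuid>.jsonl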

DAG Example

jobContext:
  otherProperties:
    azure-blob-cred:
      username: "mystorage"
      password: "<storage account key>"
  metricTags: {}
  dlqConfig:

dag:
  - id: "source"
    commandName: "kafkasource"
    config:
      broker: "kafka:9092"
      topic: "raw-events"
      groupId: "azure-archiver"
      encodingType: "JSON_OBJECT"
    outputs:
      - "sink"

  - id: "sink"
    commandName: "azureblobsink"
    config:
      containerName: "workflow-output"
      blobNamePrefix: "archive/orders/"
      credentialId: "azure-blob-cred"
      batchSize: 500
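
The same sink node can authenticate with a connection string instead of a registered credential; the value below is a placeholder:

  - id: "sink"
    commandName: "azureblobsink"
    config:
      containerName: "workflow-output"
      blobNamePrefix: "archive/orders/"
      connectionString: "DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=...;EndpointSuffix=core.windows.net"
      batchSize: 500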

Tuning Batch Size

batchSize is a tradeoff between write frequency and memory:

  • Larger batches reduce blob writes and produce fewer, larger objects (cheaper to scan downstream).
  • Smaller batches lower memory usage and reduce time-to-availability of records in storage.

The default of 100 is conservative; for high-throughput pipelines 1,000 to 10,000 is typical.
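
As a rough sizing check, multiply batchSize by the average serialized record size: assuming roughly 1 KB per record (an illustrative figure, not a measurement), batchSize: 5000 buffers about 5 MB per flush and produces ~5 MB blobs. A high-throughput variant of the sink above might look like:

config:
  containerName: "workflow-output"
  blobNamePrefix: "archive/orders/"
  credentialId: "azure-blob-cred"
  batchSize: 5000 # ~5 MB per blob at ~1 KB per record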

Related Nodes

  • azureblobsource: Read blobs from an Azure Blob Storage container
  • gcssink: Write records to a Google Cloud Storage bucket
  • s3sink: Write records to AWS S3