Azure Blob Storage Sink Node
The azureblobsink node writes pipeline records to an Azure Blob Storage container as newline-delimited JSON (.jsonl) blobs. It batches records in memory and uploads each batch as a single blob, organized into date-partitioned virtual folders.
Typical use cases include archiving streaming events to Azure cold storage, producing date-partitioned data lake landings consumed by Synapse or Databricks, and regulatory retention of processed records.
Key Features
- Batched NDJSON writes: records accumulate until the batch reaches batchSize, then are uploaded as a single .jsonl blob
- Date-partitioned naming: every blob name embeds a UTC yyyy/MM/dd/HH-mm-ss path component plus a UUID, ready for tools that expect Hive-style partitions
- Dual-mode auth: authenticate with either a full Azure Storage connection string or a credential containing the storage account name and key
Configuration
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| containerName | String | Yes | — | Azure Blob Storage container to write to |
| connectionString | String | One of connectionString or credentialId | — | Full Azure Storage connection string. When set, takes priority over credentialId |
| credentialId | String | One of connectionString or credentialId | — | ID of a UsernamePasswordCredential in jobContext.otherProperties where username = storage account name and password = storage account key |
| blobNamePrefix | String | No | events/ | Prefix prepended to every generated blob name. Include a trailing / so it acts as a virtual folder |
| batchSize | Integer | No | 100 | Number of records buffered in memory before flushing as a single blob. Minimum: 1 |
Authentication
Either connectionString or credentialId must be provided. If both are set, connectionString wins.
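A compact sketch of that precedence (illustrative pseudocode only, not the connector's source; the config dict shape and error message are assumptions):

```python
def choose_auth(config: dict) -> str:
    # connectionString always takes precedence when both fields are configured
    if config.get("connectionString"):
        return "connectionString"
    if config.get("credentialId"):
        return "credentialId"
    raise ValueError("azureblobsink requires connectionString or credentialId")
```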
Connection String
config:
connectionString: "DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=...;EndpointSuffix=core.windows.net"
Credential
jobContext:
otherProperties:
azure-blob-cred:
username: "mystorage" # storage account name
password: "<storage account key>"
The connector builds a connection string at runtime from these two values.
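One plausible way to assemble such a connection string, shown here with the Python azure-storage-blob SDK purely for illustration (the helper name and SDK choice are assumptions, not the connector's internals):

```python
from azure.storage.blob import BlobServiceClient

def client_from_credential(account_name: str, account_key: str) -> BlobServiceClient:
    # Build a standard Azure Storage connection string from the credential's
    # username (storage account name) and password (storage account key)
    conn_str = (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={account_key};"
        "EndpointSuffix=core.windows.net"
    )
    return BlobServiceClient.from_connection_string(conn_str)
```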
Blob Naming
Every blob the sink writes follows the pattern:
{blobNamePrefix}{yyyy/MM/dd/HH-mm-ss}-{uuid}.jsonl
For example, with blobNamePrefix: "events/":
events/2026/04/27/12-34-56-3f1a8b9d-2c5e-4a8c-9d6e-7b1c2a3d4e5f.jsonl
The timestamp is in UTC, and each line in the blob is one record from the workflow batch.
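A minimal sketch of how a name matching this pattern can be produced (the function name and use of Python's datetime/uuid modules are illustrative; the connector's actual implementation may differ):

```python
import uuid
from datetime import datetime, timezone

def build_blob_name(prefix: str = "events/") -> str:
    # UTC timestamp rendered as yyyy/MM/dd/HH-mm-ss, followed by a random UUID
    ts = datetime.now(timezone.utc).strftime("%Y/%m/%d/%H-%M-%S")
    return f"{prefix}{ts}-{uuid.uuid4()}.jsonl"

print(build_blob_name("archive/orders/"))
# e.g. archive/orders/2026/04/27/12-34-56-3f1a8b9d-2c5e-4a8c-9d6e-7b1c2a3d4e5f.jsonl
```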
DAG Example
jobContext:
otherProperties:
azure-blob-cred:
username: "mystorage"
password: "<storage account key>"
metricTags: {}
dlqConfig:
dag:
- id: "source"
commandName: "kafkasource"
config:
broker: "kafka:9092"
topic: "raw-events"
groupId: "azure-archiver"
encodingType: "JSON_OBJECT"
outputs:
- "sink"
- id: "sink"
commandName: "azureblobsink"
config:
containerName: "workflow-output"
blobNamePrefix: "archive/orders/"
credentialId: "azure-blob-cred"
batchSize: 500
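Once this pipeline is running, a downstream consumer can read the partitioned output directly. The sketch below uses the Python azure-storage-blob SDK and is not part of the connector; the container name, prefix, and date match the example above, and the connection string placeholder is yours to fill in:

```python
import json
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    conn_str="<azure storage connection string>",
    container_name="workflow-output",
)

# List one day's partition under the configured blobNamePrefix
for blob in container.list_blobs(name_starts_with="archive/orders/2026/04/27/"):
    data = container.download_blob(blob.name).readall()
    for line in data.decode("utf-8").splitlines():
        record = json.loads(line)  # one pipeline record per NDJSON line
        print(record)
```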
Tuning Batch Size
batchSize is a tradeoff between write frequency and memory:
- Larger batches reduce blob writes and produce fewer, larger objects (cheaper to scan downstream).
- Smaller batches lower memory usage and reduce time-to-availability of records in storage.
The default of 100 is conservative; for high-throughput pipelines, 1,000–10,000 is typical.
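A quick way to sanity-check a candidate value is to multiply it by your average serialized record size; the figures below are made-up illustrative numbers, not measurements:

```python
# Rough per-batch memory estimate for the sink's in-memory buffer
avg_record_bytes = 2_000   # assumed average size of one serialized JSON record
batch_size = 5_000         # candidate batchSize
batch_mib = avg_record_bytes * batch_size / 1_048_576
print(f"~{batch_mib:.1f} MiB buffered before each flush")  # ~9.5 MiB
```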
Related Nodes
- azureblobsource: Read blobs from an Azure Blob Storage container
- gcssink: Write records to a Google Cloud Storage bucket
- s3sink: Write records to AWS S3