# Azure Blob Storage Source Node
The `azureblobsource` node reads blobs from an Azure Blob Storage container and emits each blob's decoded content as records in the pipeline. It is a batch source: it lists every matching blob in the container, downloads each one, and terminates when all blobs are consumed.
Typical use cases include replaying events archived to Azure cold storage, ingesting CSV reports dropped into a container by another system, and one-off backfills.
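For intuition, here is a minimal sketch of the equivalent list-download-decode loop using the `azure-storage-blob` Python SDK. This is not the connector's actual implementation; the connection string, container, and prefix values are illustrative, borrowed from the examples later in this page.

```python
from azure.storage.blob import BlobServiceClient

# Illustrative values; substitute your own account and container details.
conn_str = (
    "DefaultEndpointsProtocol=https;AccountName=mystorage;"
    "AccountKey=...;EndpointSuffix=core.windows.net"
)

service = BlobServiceClient.from_connection_string(conn_str)
container = service.get_container_client("workflow-input")

# List every blob under the prefix, then download each one in turn.
for blob in container.list_blobs(name_starts_with="logs/2026/04/"):
    data = container.download_blob(blob.name).readall()
    print(f"{blob.name}: {len(data)} bytes")

# The loop ends when the listing is exhausted, matching the batch-source
# behavior described above.
```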
## Key Features
- **Prefix filtering**: only blobs whose names start with `blobPrefix` are read
- **Multiple encodings**: each blob is decoded with the configured `encodingType` (CSV, JSON, NDJSON, XML, Parquet, plain text)
- **Dual-mode auth**: authenticate with either a full Azure Storage connection string or a credential containing the storage account name and key
- **Batch source**: runs the listing once and terminates when all blobs are consumed
## Configuration
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `containerName` | String | Yes | — | Azure Blob Storage container to read from |
| `connectionString` | String | One of `connectionString` or `credentialId` is required | — | Full Azure Storage connection string. When set, takes priority over `credentialId` |
| `credentialId` | String | One of `connectionString` or `credentialId` is required | — | ID of a UsernamePasswordCredential in `jobContext.otherProperties` where username = storage account name and password = storage account key |
| `blobPrefix` | String | No | — | Prefix used to filter blobs. Leave empty to read every blob |
| `encodingType` | String | Yes | — | Format used to decode each blob's bytes. Supported values: `CSV`, `JSON_OBJECT`, `JSON_ARRAY`, `JSON_OBJECT_LINE`, `STRING_LINE`, `TEXT`, `XML`, `PARQUET` |
## Authentication
Either `connectionString` or `credentialId` must be provided. If both are set, `connectionString` wins.
### Connection String
```yaml
config:
  connectionString: "DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=...;EndpointSuffix=core.windows.net"
```
### Credential
```yaml
jobContext:
  otherProperties:
    azure-blob-cred:
      username: "mystorage"             # storage account name
      password: "<storage account key>"
```
The connector builds a connection string at runtime from these two values.
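As a sketch of that assembly, and of the precedence rule above, the snippet below shows one way the pieces fit together. The function names are hypothetical illustrations, not the connector's actual code.

```python
def build_connection_string(account_name: str, account_key: str) -> str:
    # Assembles a connection string in the same format as the example above.
    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={account_key};"
        "EndpointSuffix=core.windows.net"
    )

def resolve_connection_string(config: dict, job_context: dict) -> str:
    # connectionString takes priority over credentialId when both are set.
    if config.get("connectionString"):
        return config["connectionString"]
    cred = job_context["otherProperties"][config["credentialId"]]
    return build_connection_string(cred["username"], cred["password"])
```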
## Encoding Types
The `encodingType` controls how each downloaded blob's bytes are turned into pipeline records. See encoding for the full reference.
| Encoding | Behavior |
|---|---|
| `JSON_OBJECT_LINE` | One JSON object per line; most common for archived event streams |
| `JSON_ARRAY` | A JSON array; each element becomes one record |
| `JSON_OBJECT` | One JSON object per blob |
| `CSV` | Comma-separated rows, first row treated as header |
| `STRING_LINE` / `TEXT` | One record per line of text |
| `XML` | Parsed XML element |
| `PARQUET` | Apache Parquet file |
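To make the most common case concrete, the snippet below illustrates how `JSON_OBJECT_LINE` decoding turns a blob's bytes into records. This is a plain-Python illustration of the behavior in the table, not the connector's decoder, and the sample payload is invented.

```python
import json

blob_bytes = b'{"id": 1, "event": "start"}\n{"id": 2, "event": "stop"}\n'

# Each non-empty line of the blob becomes one pipeline record.
records = [json.loads(line) for line in blob_bytes.splitlines() if line.strip()]
print(records)  # [{'id': 1, 'event': 'start'}, {'id': 2, 'event': 'stop'}]
```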
## DAG Example
```yaml
jobContext:
  otherProperties:
    azure-blob-cred:
      username: "mystorage"
      password: "<storage account key>"
metricTags: {}
dlqConfig:
dag:
  - id: "source"
    commandName: "azureblobsource"
    config:
      containerName: "workflow-input"
      blobPrefix: "logs/2026/04/"
      encodingType: "JSON_OBJECT_LINE"
      credentialId: "azure-blob-cred"
    outputs:
      - "sink"
  - id: "sink"
    commandName: "stdout"
    config:
      encodingType: "JSON_OBJECT"
```
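This pipeline reads every blob under `logs/2026/04/` in the `workflow-input` container using the credential registered as `azure-blob-cred`, decodes each blob as newline-delimited JSON, and prints each resulting record to stdout as a JSON object.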
## Related Nodes
- `azureblobsink`: Write records back to an Azure Blob Storage container
- `gcssource`: Read objects from a Google Cloud Storage bucket
- `s3sink`: Write records to AWS S3