Azure Blob Storage Source Node

The azureblobsource node reads blobs from an Azure Blob Storage container and emits each blob's decoded content as records in the pipeline. It is a batch source — it lists every matching blob in the container, downloads each one, and terminates when all blobs are consumed.

Typical use cases include replaying events archived to Azure cold storage, ingesting CSV reports dropped into a container by another system, and one-off backfills.

Key Features

  • Prefix filtering: only blobs whose name starts with blobPrefix are read (see the sketch after this list)
  • Multiple encodings: each blob is decoded with the configured encodingType (CSV, JSON, NDJSON, XML, Parquet, plain text)
  • Dual-mode auth: authenticate with either a full Azure Storage connection string or a credential containing the storage account name and key
  • Batch source: runs the listing once and terminates when all blobs are consumed
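
As a quick illustration of prefix filtering, the sketch below uses hypothetical blob names (the names and container contents are assumptions for illustration); only names starting with blobPrefix are listed and read:

```yaml
# Illustrative only: hypothetical blob names and the effect of blobPrefix.
#   logs/2026/04/01.ndjson   -> read    (starts with "logs/2026/04/")
#   logs/2026/03/31.ndjson   -> skipped
#   reports/summary.csv      -> skipped
config:
  blobPrefix: "logs/2026/04/"
```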

Configuration

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| containerName | String | Yes | — | Azure Blob Storage container to read from |
| connectionString | String | One of connectionString or credentialId | — | Full Azure Storage connection string. When set, takes priority over credentialId |
| credentialId | String | One of connectionString or credentialId | — | ID of a UsernamePasswordCredential in jobContext.otherProperties where username = storage account name and password = storage account key |
| blobPrefix | String | No | (empty) | Prefix used to filter blobs. Leave empty to read every blob |
| encodingType | String | Yes | — | Format used to decode each blob's bytes. Supported values: CSV, JSON_OBJECT, JSON_ARRAY, JSON_OBJECT_LINE, STRING_LINE, TEXT, XML, PARQUET |
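
Putting the required fields together, a minimal node configuration might look like the sketch below; the container name and connection string are placeholders, not values from a real deployment:

```yaml
- id: "source"
  commandName: "azureblobsource"
  config:
    containerName: "my-container"   # required
    encodingType: "CSV"             # required; one of the supported values above
    connectionString: "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
    # blobPrefix omitted: every blob in the container is read
```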

Authentication

Either connectionString or credentialId must be provided. If both are set, connectionString wins.

Connection String

```yaml
config:
  connectionString: "DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=...;EndpointSuffix=core.windows.net"
```

Credential

```yaml
jobContext:
  otherProperties:
    azure-blob-cred:
      username: "mystorage"              # storage account name
      password: "<storage account key>"  # storage account key
```

The connector builds a connection string at runtime from these two values.
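
Assuming the standard Azure connection-string format, the string assembled from the credential above would look like the following (an illustration of the general shape, not an exact reproduction of the connector's output):

```
DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=<storage account key>;EndpointSuffix=core.windows.net
```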

Encoding Types

The encodingType controls how each downloaded blob's bytes are turned into pipeline records. See encoding for the full reference; a short illustration follows the table below.

| Encoding | Behavior |
| --- | --- |
| JSON_OBJECT_LINE | One JSON object per line; the most common choice for archived event streams |
| JSON_ARRAY | A JSON array; each element becomes one record |
| JSON_OBJECT | One JSON object per blob |
| CSV | Comma-separated rows; the first row is treated as the header |
| STRING_LINE / TEXT | One record per line of text |
| XML | Parsed XML element |
| PARQUET | Apache Parquet file |
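
For example, with JSON_OBJECT_LINE a blob containing the two lines below (hypothetical content) yields two records, one per line:

```
{"event": "login", "user": "alice"}
{"event": "logout", "user": "alice"}
```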

DAG Example

```yaml
jobContext:
  otherProperties:
    azure-blob-cred:
      username: "mystorage"
      password: "<storage account key>"
  metricTags: {}
  dlqConfig:

dag:
  - id: "source"
    commandName: "azureblobsource"
    config:
      containerName: "workflow-input"
      blobPrefix: "logs/2026/04/"
      encodingType: "JSON_OBJECT_LINE"
      credentialId: "azure-blob-cred"
    outputs:
      - "sink"

  - id: "sink"
    commandName: "stdout"
    config:
      encodingType: "JSON_OBJECT"
```
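
If a connection string is preferred over a credential, the same source node can be configured as in this sketch (the connection-string values are placeholders); credentialId is then unnecessary, and would be ignored anyway because connectionString takes priority:

```yaml
- id: "source"
  commandName: "azureblobsource"
  config:
    containerName: "workflow-input"
    blobPrefix: "logs/2026/04/"
    encodingType: "JSON_OBJECT_LINE"
    connectionString: "DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=...;EndpointSuffix=core.windows.net"
  outputs:
    - "sink"
```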
Related Nodes

  • azureblobsink: Write records back to an Azure Blob Storage container
  • gcssource: Read objects from a Google Cloud Storage bucket
  • s3sink: Write records to AWS S3