Azure Blob Storage Source Node

The azureblobsource node reads blobs from an Azure Blob Storage container and emits each blob's decoded content as records in the pipeline. It is a batch source — it lists every matching blob in the container, downloads each one, and terminates when all blobs are consumed.

Typical use cases include replaying events archived to Azure cold storage, ingesting CSV reports dropped into a container by another system, and one-off backfills.

Key Features

  • Prefix filtering: only blobs whose name starts with blobPrefix are read (see the sketch after this list)
  • Multiple encodings: each blob is decoded with the configured encodingType (CSV, JSON, NDJSON, XML, Parquet, plain text)
  • Dual-mode auth: authenticate with either a full Azure Storage connection string or a credential containing the storage account name and key
  • Batch source: runs the listing once and terminates when all blobs are consumed
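
As a quick illustration of prefix filtering, the sketch below uses hypothetical blob names (the names and container contents are assumptions for illustration); only names starting with blobPrefix are listed and read:

```yaml
# Illustrative only: hypothetical blob names and the effect of blobPrefix.
#   logs/2026/04/01.ndjson   -> read    (starts with "logs/2026/04/")
#   logs/2026/03/31.ndjson   -> skipped
#   reports/summary.csv      -> skipped
config:
  blobPrefix: "logs/2026/04/"
```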

Configuration

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| containerName | String | Yes | — | Azure Blob Storage container to read from |
| connectionString | String | One of connectionString or credentialId | — | Full Azure Storage connection string. When set, takes priority over credentialId |
| credentialId | String | One of connectionString or credentialId | — | ID of a UsernamePasswordCredential in jobContext.otherProperties where username = storage account name and password = storage account key |
| blobPrefix | String | No | (empty) | Prefix used to filter blobs. Leave empty to read every blob |
| encodingType | String | Yes | — | Format used to decode each blob's bytes. Supported values: CSV, JSON_OBJECT, JSON_ARRAY, JSON_OBJECT_LINE, STRING_LINE, TEXT, XML, PARQUET |
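
Putting the required fields together, a minimal node configuration might look like the sketch below; the container name and connection string are placeholders, not values from a real deployment:

```yaml
- id: "source"
  commandName: "azureblobsource"
  config:
    containerName: "my-container"   # required
    encodingType: "CSV"             # required; one of the supported values above
    connectionString: "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
    # blobPrefix omitted: every blob in the container is read
```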

Authentication

Either connectionString or credentialId must be provided. If both are set, connectionString wins.

Connection String

```yaml
config:
  connectionString: "DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=...;EndpointSuffix=core.windows.net"
```

Credential

```yaml
jobContext:
  otherProperties:
    azure-blob-cred:
      username: "mystorage"              # storage account name
      password: "<storage account key>"  # storage account key
```

The connector builds a connection string at runtime from these two values.
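
Assuming the standard Azure connection-string format, the string assembled from the credential above would look like the following (an illustration of the general shape, not an exact reproduction of the connector's output):

```
DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=<storage account key>;EndpointSuffix=core.windows.net
```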

Encoding Types

The encodingType controls how each downloaded blob's bytes are turned into pipeline records. See encoding for the full reference; a short illustration follows the table below.

| Encoding | Behavior |
| --- | --- |
| JSON_OBJECT_LINE | One JSON object per line; the most common choice for archived event streams |
| JSON_ARRAY | A JSON array; each element becomes one record |
| JSON_OBJECT | One JSON object per blob |
| CSV | Comma-separated rows; the first row is treated as the header |
| STRING_LINE / TEXT | One record per line of text |
| XML | Parsed XML element |
| PARQUET | Apache Parquet file |
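
For example, with JSON_OBJECT_LINE a blob containing the two lines below (hypothetical content) yields two records, one per line:

```
{"event": "login", "user": "alice"}
{"event": "logout", "user": "alice"}
```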

DAG Example

```yaml
jobContext:
  otherProperties:
    azure-blob-cred:
      username: "mystorage"
      password: "<storage account key>"
  metricTags: {}
  dlqConfig:

dag:
  - id: "source"
    commandName: "azureblobsource"
    config:
      containerName: "workflow-input"
      blobPrefix: "logs/2026/04/"
      encodingType: "JSON_OBJECT_LINE"
      credentialId: "azure-blob-cred"
    outputs:
      - "sink"

  - id: "sink"
    commandName: "stdout"
    config:
      encodingType: "JSON_OBJECT"
```
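
If a connection string is preferred over a credential, the same source node can be configured as in this sketch (the connection-string values are placeholders); credentialId is then unnecessary, and would be ignored anyway because connectionString takes priority:

```yaml
- id: "source"
  commandName: "azureblobsource"
  config:
    containerName: "workflow-input"
    blobPrefix: "logs/2026/04/"
    encodingType: "JSON_OBJECT_LINE"
    connectionString: "DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=...;EndpointSuffix=core.windows.net"
  outputs:
    - "sink"
```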
Related Nodes

  • azureblobsink: Write records back to an Azure Blob Storage container
  • gcssource: Read objects from a Google Cloud Storage bucket
  • s3sink: Write records to AWS S3