Delta Sink Node
When using S3 as the cloud storage backend, Delta Lake does not support concurrent writes from multiple writers because S3 lacks a distributed lock implementation. Deploying a data pipeline job with multiple replicas writing Delta Lake data to the same S3 path is therefore likely to corrupt the table. For S3 deployments, ensure your pipeline runs with a single replica.
Quick Reference
| Name | Description |
|---|---|
| Use Credentials | Credentials used to authenticate with your Delta Lake storage. ex: AWS Prod Credentials |
| Delta Lake Table Path | The full path to the Delta Lake table you want to write to. ex: s3://my-bucket/data/events_delta |
| Select a Saved Table | Choose from previously saved Delta Lake table paths. ex: UserEventsProdDeltaTable |
| Partition Columns | Columns used to partition the Delta Lake table for optimized reads and writes. ex: event_date |
| Hadoop Configuration | Custom Hadoop properties applied during the Delta Lake write. ex: fs.s3a.endpoint = s3.eu-west-1.amazonaws.com |
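The Hadoop Configuration field accepts key/value pairs like the fs.s3a.endpoint example above. As a rough illustration of what such an override means, the sketch below shows how an equivalent property would be applied in a standalone PySpark job; the node applies your entries to its own underlying writer, so the session setup here is an illustrative assumption, not the node's literal implementation.

```python
from pyspark.sql import SparkSession

# Hypothetical standalone equivalent of the node's "Hadoop Configuration" field.
# Prefixing a property with "spark.hadoop." hands it to the underlying Hadoop
# FileSystem (here, the S3A connector) -- the same kind of override the node
# applies on your behalf.
spark = (
    SparkSession.builder
    .appName("delta-sink-hadoop-config-sketch")
    # Equivalent of the table entry: fs.s3a.endpoint = s3.eu-west-1.amazonaws.com
    .config("spark.hadoop.fs.s3a.endpoint", "s3.eu-west-1.amazonaws.com")
    .getOrCreate()
)
```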
Overview
The Delta Lake Sink Node enables you to write processed workflow data directly into a physical storage location in the Delta Lake table format.
Configuration
| Field Name | Description | Required? | Default |
|---|---|---|---|
| Use Credentials | Select a stored credential object (e.g., AWS Keys, GCP Service Account) to authenticate with the storage provider. | Yes (for Cloud) | None |
| Delta Lake Table Path | The full URI to the Delta table folder. Supported schemes include s3a://, gs://, abfs://, hdfs://, and file://. | Yes | N/A |
| Batch Size | The number of records to accumulate before committing a write transaction. The recommended maximum is 10,000. | No | 1000 |
| Partition Columns | A list of column names used to partition the data physically on storage. These columns must exist in the target table's schema. | No | None |
| Hadoop Configuration | Advanced Key-Value pairs to override underlying Hadoop file system settings. | No | None |
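To make these fields concrete, here is a minimal PySpark sketch of the kind of write the node performs for each batch: an append to an existing Delta table path, partitioned by the configured Partition Columns. It assumes the delta-spark package is available; the node's internal engine and exact commit behaviour may differ, and the path and column names are only examples.

```python
from pyspark.sql import SparkSession

# Assumes the delta-spark package is on the classpath; the options below
# enable Delta Lake support in a plain Spark session.
spark = (
    SparkSession.builder
    .appName("delta-sink-write-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# A small illustrative batch (in the node, roughly "Batch Size" records).
batch = spark.createDataFrame(
    [("2024-01-01", "click", "user-1"), ("2024-01-01", "view", "user-2")],
    ["event_date", "event_type", "user_id"],
)

# Append one transaction to the pre-existing table at the configured path,
# physically partitioned by the configured Partition Columns.
(
    batch.write.format("delta")
    .mode("append")
    .partitionBy("event_date")            # Partition Columns
    .save("s3a://my-bucket/data/events_delta")  # Delta Lake Table Path
)
```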
Important Prerequisites
Pre-existing Tables Only: This node does not create new Delta tables automatically. The table must already exist at the specified Delta Lake Table Path with a defined schema. If the table is not found during initialization, the workflow will fail. You can create new Delta tables or import existing ones in the Data Assets section.
Schema Validation: Incoming data is strictly validated against the existing Delta table's schema.
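Assuming a Spark-based writer with the delta-spark package, a standalone job would check the same prerequisites roughly as follows. The path, error handling, and schema comparison are illustrative only; the node performs its own checks during initialization.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Delta-enabled session, as in the earlier sketch.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

table_path = "s3a://my-bucket/data/events_delta"  # Delta Lake Table Path

# Pre-existing table check: the sink never creates the table itself.
if not DeltaTable.isDeltaTable(spark, table_path):
    raise RuntimeError(f"No Delta table found at {table_path}; create or import it in Data Assets first")

# Schema validation: incoming batches must match the existing table's schema.
expected_schema = spark.read.format("delta").load(table_path).schema
print(expected_schema.simpleString())
```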
Storage & Authentication
The node supports various storage backends. You must provide the correct credential type for your chosen storage path.
| Storage Provider | Path Scheme | Required Credential Type | Note |
|---|---|---|---|
| AWS S3 | s3:// or s3a:// | Username/Password | Username = Access Key ID; Password = Secret Access Key |
| Google Cloud (GCS) | gs:// | GCP Credential | Supports Service Account JSON Keyfile or Access Token. |
| Azure Blob (ADLS) | abfs:// | API Key | The API Key is the Storage Account Key. The system attempts to extract the account name from the URL. |
| HDFS | hdfs:// | None | Configure authentication via the Hadoop Configuration section directly. |
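For a Spark-based writer, the credential types above typically end up as filesystem properties similar to the following sketch. Exact property names vary with the Hadoop connector versions in use, so treat these as hedged examples rather than the node's literal behaviour; the placeholder values are not real credentials.

```python
from pyspark.sql import SparkSession

builder = SparkSession.builder.appName("delta-sink-auth-sketch")

# AWS S3 (Username/Password credential): access key + secret key via the S3A connector.
builder = (
    builder
    .config("spark.hadoop.fs.s3a.access.key", "<ACCESS_KEY_ID>")
    .config("spark.hadoop.fs.s3a.secret.key", "<SECRET_ACCESS_KEY>")
)

# Google Cloud Storage (GCP credential): service-account keyfile
# (property name differs across GCS connector versions).
builder = builder.config(
    "spark.hadoop.google.cloud.auth.service.account.json.keyfile",
    "/path/to/service-account.json",
)

# Azure ADLS Gen2 via abfs:// (API Key credential): storage account key,
# keyed by the storage account name extracted from the URL.
builder = builder.config(
    "spark.hadoop.fs.azure.account.key.<storage-account>.dfs.core.windows.net",
    "<STORAGE_ACCOUNT_KEY>",
)

spark = builder.getOrCreate()
```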
When to use the Delta Lake Sink
- Choose Databricks Sink if you are building business-critical tables that analysts query immediately, and you want the safety and ease of Databricks Unity Catalog governance.
- Choose Delta Lake Sink if you are building a raw data landing zone, are sensitive to compute costs, or need to write data to storage that isn't strictly coupled to a Databricks workspace.