Delta Lake Sink Node

warning

When S3 is the cloud storage backend, Delta Lake cannot safely coordinate concurrent writers: S3 provides no distributed lock for the Delta transaction log. Deploying a data pipeline job with multiple replicas writing Delta Lake data to the same S3 table will likely corrupt it. For S3 deployments, ensure your pipeline runs with a single replica.

Quick Reference

Select Table: Select an existing Delta Lake table data asset or create a new one. The data asset encapsulates the storage credentials and table path. Example: UserEventsProdDeltaTable

Partition Columns: Columns used to partition the Delta Lake table for optimized reads and writes. Example: event_date

Hadoop Configuration: Custom Hadoop properties applied during the Delta Lake write. Example: fs.s3a.endpoint = s3.eu-west-1.amazonaws.com

Overview

The Delta Lake Sink Node enables you to write processed workflow data directly into a physical storage location in the Delta Lake table format.

Configuration

| Field Name | Description | Required? | Default |
| --- | --- | --- | --- |
| Select Table | Select an existing Delta Lake table data asset or create a new one. The data asset encapsulates the storage credentials and table path. | Yes | N/A |
| Partition Columns | A list of column names used to partition the data physically on storage. These columns must match the partition columns already defined on the target Delta table. | No | None |
| Hadoop Configuration | Advanced key-value pairs to override underlying Hadoop file system settings. Do not place cloud storage credentials (access keys, secret keys, etc.) here; use the data asset's integration credentials instead. | No | None |
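
These key-value pairs are passed through to the underlying Hadoop file system at write time. As a rough illustration only, the sketch below shows the equivalent property being set on a standalone Spark session; the node applies such properties internally, so the Spark API here is just an analogy:

```python
# A minimal sketch, assuming a Spark-based Delta writer outside the platform.
# The endpoint value mirrors the example above; the spark.hadoop.* prefix is
# standard Spark behavior for forwarding properties to the Hadoop configuration.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("delta-sink-hadoop-config")
    # Equivalent to the node's "fs.s3a.endpoint = s3.eu-west-1.amazonaws.com" entry
    .config("spark.hadoop.fs.s3a.endpoint", "s3.eu-west-1.amazonaws.com")
    .getOrCreate()
)
```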

Important Prerequisites

Pre-existing Tables Only: This node does not create new Delta tables automatically. The table must already exist at the path configured on the selected data asset, with a defined schema. If the table is not found during initialization, the workflow will fail.

You can create new Delta tables or import existing ones in the Data Assets section. When creating a new Delta Table data asset, you must provide an Avro schema that defines the table structure. When importing an existing table, the schema is auto-populated.
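
As an illustration of this prerequisite (not the platform's own asset-creation flow), the following sketch pre-creates a partitioned Delta table with the open-source deltalake (delta-rs) Python package. The path and column names are hypothetical:

```python
# Illustrative only: pre-create the target Delta table so the sink can append to it.
# Path, column names, and the partition column are hypothetical examples.
import pyarrow as pa
from deltalake import DeltaTable

schema = pa.schema(
    [
        ("user_id", pa.string()),
        ("event_type", pa.string()),
        ("event_date", pa.string()),  # partition column, as in the example above
    ]
)

DeltaTable.create(
    "/tmp/user_events",  # in production: the S3 path held by the data asset
    schema=schema,
    partition_by=["event_date"],
)
```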

Schema Validation: Incoming data is strictly validated against the existing Delta table's schema.
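
To make the failure mode concrete, here is a hedged sketch of the same strict-validation behavior, using delta-rs as a stand-in for the node's internal writer:

```python
# Sketch: appends are validated against the table schema; a mismatched batch fails.
import pyarrow as pa
from deltalake import write_deltalake

matching = pa.table(
    {"user_id": ["u1"], "event_type": ["click"], "event_date": ["2024-01-01"]}
)
write_deltalake("/tmp/user_events", matching, mode="append")  # accepted

mismatched = pa.table({"user_id": [123]})  # wrong type, missing columns
write_deltalake("/tmp/user_events", mismatched, mode="append")  # raises a schema error
```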

Write Behavior

  • Append-only: All writes are append operations. Overwrite and merge modes are not supported (see the sketch after this list).
  • Buffered writes: Data is buffered and flushed periodically. The batch size and flush interval are system-managed and not user-configurable.
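
In delta-rs terms (again only as a stand-in for the node's internal writer), each flushed batch behaves like an append-mode write that adds a new table version; existing data is never replaced:

```python
# Sketch: the sink's flushes behave like repeated append-mode writes.
import pyarrow as pa
from deltalake import DeltaTable, write_deltalake

batch = pa.table(
    {"user_id": ["u2"], "event_type": ["view"], "event_date": ["2024-01-02"]}
)
write_deltalake("/tmp/user_events", batch, mode="append")

dt = DeltaTable("/tmp/user_events")
print(dt.version())  # increments with every flushed batch; prior versions remain
```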

Storage & Authentication

When creating a Delta Lake table data asset, the following storage backend is available in the UI:

| Storage Provider | Path Scheme | Required Credential Type | Note |
| --- | --- | --- | --- |
| AWS S3 | s3a:// | Username/Password | Username = Access Key ID; Password = Secret Access Key |
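
Outside the platform, the same Username/Password pair maps onto standard AWS credentials. The sketch below uses delta-rs storage options to show that mapping; the key names are delta-rs conventions rather than the node's UI fields, and all values are placeholders:

```python
# Sketch: how the data asset's Username/Password map to S3 credentials when
# writing with delta-rs directly. All values here are placeholders.
import pyarrow as pa
from deltalake import write_deltalake

storage_options = {
    "AWS_ACCESS_KEY_ID": "<data asset username>",      # Access Key ID
    "AWS_SECRET_ACCESS_KEY": "<data asset password>",  # Secret Access Key
    "AWS_REGION": "eu-west-1",
    # Depending on the delta-rs version, S3 writes may also need a DynamoDB
    # locking provider or "AWS_S3_ALLOW_UNSAFE_RENAME": "true" for
    # single-writer setups (which matches the single-replica warning above).
}

batch = pa.table(
    {"user_id": ["u3"], "event_type": ["click"], "event_date": ["2024-01-03"]}
)
write_deltalake(
    "s3://my-bucket/user_events",  # hypothetical table path
    batch,
    mode="append",
    storage_options=storage_options,
)
```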

When to use the Delta Lake Sink

  • Choose Databricks Sink if you are building business-critical tables that analysts query immediately, and you want the safety and ease of Databricks Unity Catalog governance.
  • Choose Delta Lake Sink if you are building a raw data landing zone, are sensitive to compute costs, or need to write data to storage that isn't strictly coupled to a Databricks workspace.