Delta Sink Node
When using S3 as the cloud storage backend, Delta Lake does not support distributed writing: S3 provides no distributed lock implementation, so deploying a data pipeline job with multiple replicas writing in Delta Lake format to the same S3 table will likely corrupt the data. For S3 deployments, ensure your pipeline runs with a single replica.
Quick Reference
| Field Name | Description | Example |
|---|---|---|
| Select Table | Select an existing Delta Lake table data asset or create a new one. The data asset encapsulates the storage credentials and table path. | UserEventsProdDeltaTable |
| Partition Columns | Columns used to partition the Delta Lake table for optimized reads and writes. | event_date |
| Hadoop Configuration | Custom Hadoop properties applied during the Delta Lake write. | fs.s3a.endpoint = s3.eu-west-1.amazonaws.com |
Overview
The Delta Lake Sink Node enables you to write processed workflow data directly into a physical storage location in the Delta Lake table format.
Configuration
| Field Name | Description | Required? | Default |
|---|---|---|---|
| Select Table | Select an existing Delta Lake table data asset or create a new one. The data asset encapsulates the storage credentials and table path. | Yes | N/A |
| Partition Columns | A list of column names used to partition the data physically on storage. These columns must match the partition columns already defined on the target Delta table. | No | None |
| Hadoop Configuration | Advanced key-value pairs that override underlying Hadoop file system settings. Do not place cloud storage credentials (access keys, secret keys, etc.) here; use the data asset's integration credentials instead. | No | None |
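For intuition, each Hadoop Configuration entry behaves like a standard Hadoop filesystem property applied to the write. Below is a minimal PySpark sketch of the equivalent hand-written setup, assuming a Spark-based writer; the session, app name, and property values are illustrative, and the node applies your entries automatically:

```python
from pyspark.sql import SparkSession

# Illustrative only: the node applies Hadoop Configuration entries for you.
# Prefixing a Hadoop property with "spark.hadoop." is the standard way to
# pass it through to the Hadoop filesystem layer in a Spark session.
spark = (
    SparkSession.builder
    .appName("delta-sink-sketch")  # hypothetical app name
    .config("spark.hadoop.fs.s3a.endpoint", "s3.eu-west-1.amazonaws.com")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")  # common for S3-compatible stores
    .getOrCreate()
)
```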
Important Prerequisites
Pre-existing Tables Only: This node does not create new Delta tables automatically. The table must already exist at the path configured on the selected data asset, with a defined schema. If the table is not found during initialization, the workflow will fail.
You can create new or import existing Delta tables in the Data Assets section. When creating a new Delta Table data asset, you must provide an Avro schema defining the table structure. When importing an existing table, the schema is auto-populated.
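For reference, a minimal Avro schema for a three-column events table might look like the following; the record and field names are purely illustrative:

```json
{
  "type": "record",
  "name": "UserEvents",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "event_type", "type": "string"},
    {"name": "event_date", "type": "string"}
  ]
}
```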
Schema Validation: Incoming data is strictly validated against the existing Delta table's schema.
Write Behavior
- Append-only: All writes are append operations; overwrite and merge modes are not supported (see the sketch after this list).
- Buffered writes: Data is buffered and flushed periodically. The batch size and flush interval are system-managed and not user-configurable.
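As a mental model, each flush behaves like a plain Delta append partitioned by the configured columns. Here is a minimal PySpark sketch, assuming a Spark session with the delta-spark package configured; the table path, column names, and sample row are all hypothetical, and the node handles buffering and flushing itself:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical batch; the node assembles these from buffered workflow records.
df = spark.createDataFrame(
    [("u1", "click", "2024-05-01")],
    schema="user_id string, event_type string, event_date string",
)

# Delta enforces the existing table schema on append, so a mismatched frame
# fails the write rather than corrupting the table.
(
    df.write.format("delta")
    .mode("append")                  # the only mode this sink uses
    .partitionBy("event_date")       # must match the table's partition columns
    .save("s3a://my-bucket/delta/user_events")  # hypothetical table path
)
```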
Storage & Authentication
When creating a Delta Lake table data asset, the following storage backend is available in the UI:
| Storage Provider | Path Scheme | Required Credential Type | Note |
|---|---|---|---|
| AWS S3 | s3a:// | Username/Password | Username = Access Key ID; Password = Secret Access Key |
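To make the credential mapping concrete: with a Hadoop S3A writer, these values typically surface as the standard fs.s3a.* properties. A minimal sketch follows; the property names are standard S3A, but the wiring shown is illustrative and handled by the node internally. Never place these values in the node's Hadoop Configuration field.

```python
# Illustrative only: how the data asset's credentials reach an S3A writer.
hadoop_props = {
    "fs.s3a.access.key": "<Access Key ID>",      # the data asset's "username"
    "fs.s3a.secret.key": "<Secret Access Key>",  # the data asset's "password"
}
```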
When to use the Delta Lake Sink
- Choose the Databricks Sink if you are building business-critical tables that analysts query immediately and you want the safety and ease of Databricks Unity Catalog governance.
- Choose the Delta Lake Sink if you are building a raw data landing zone, are sensitive to compute costs, or need to write data to storage that isn't tightly coupled to a Databricks workspace.