
Delta Sink Node

warning

When using S3 as the cloud storage backend, Delta Lake cannot safely coordinate concurrent writers, because S3 does not provide the distributed locking that Delta Lake's commit protocol relies on. Deploying a data pipeline job with multiple replicas writing Delta Lake data to the same S3 path will likely corrupt the table. For S3 deployments, ensure your pipeline runs with a single replica.

Quick Reference

| Name | Description | Example |
| --- | --- | --- |
| Use Credentials | Credentials used to authenticate with your Delta Lake storage. | AWS Prod Credentials |
| Delta Lake Table Path | The full path to the Delta Lake table you want to write to. | s3://my-bucket/data/events_delta |
| Select a Saved Table | Choose from previously saved Delta Lake table paths. | UserEventsProdDeltaTable |
| Partition Columns | Columns used to partition the Delta Lake table for optimized reads and writes. | event_date |
| Hadoop Configuration | Custom Hadoop properties applied during the Delta Lake write. | fs.s3a.endpoint = s3.eu-west-1.amazonaws.com |

Overview

The Delta Lake Sink Node enables you to write processed workflow data directly into a physical storage location in the Delta Lake table format.

Configuration

| Field Name | Description | Required? | Default |
| --- | --- | --- | --- |
| Use Credentials | Select a stored credential object (e.g., AWS Keys, GCP Service Account) to authenticate with the storage provider. | Yes (for cloud storage) | None |
| Delta Lake Table Path | The full URI to the Delta table folder. Supported schemes include s3a://, gs://, abfs://, hdfs://, and file://. | Yes | N/A |
| Batch Size | The number of records to accumulate before writing a transaction commit. Max recommended is 10,000. | No | 1000 |
| Partition Columns | A list of column names used to partition the data physically on storage. These columns must exist in the target table's schema. | No | None |
| Hadoop Configuration | Advanced key-value pairs to override underlying Hadoop file system settings. | No | None |
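
The node performs the write for you, so no code is required. For intuition only, the following is a minimal sketch of an equivalent batch append using the open-source deltalake (delta-rs) Python package; the bucket, column names, and credentials are hypothetical and not part of the node's actual implementation.

```python
# Minimal sketch of an equivalent batch append using the open-source `deltalake`
# (delta-rs) package. The path, column names, and credentials are hypothetical.
import pyarrow as pa
from deltalake import write_deltalake

# One accumulated batch of processed workflow records (cf. the Batch Size field).
batch = pa.table({
    "user_id": [101, 102, 103],
    "event_type": ["click", "view", "click"],
    "event_date": ["2024-05-01", "2024-05-01", "2024-05-02"],
})

# Append to a pre-existing Delta table. storage_options plays roughly the role of
# the node's "Use Credentials" field (plus endpoint overrides in Hadoop Configuration).
write_deltalake(
    "s3://my-bucket/data/events_delta",
    batch,
    mode="append",
    storage_options={
        "AWS_ACCESS_KEY_ID": "YOUR_ACCESS_KEY_ID",          # Username for S3
        "AWS_SECRET_ACCESS_KEY": "YOUR_SECRET_ACCESS_KEY",  # Password for S3
        "AWS_REGION": "eu-west-1",
        # Depending on the delta-rs version, writing to S3 may additionally require
        # a locking provider or "AWS_S3_ALLOW_UNSAFE_RENAME": "true"; the latter is
        # only safe with a single writer, matching the warning at the top of this page.
    },
)
```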

Important Prerequisites

Pre-existing Tables Only: This node does not create new Delta tables automatically. The table must already exist at the specified Delta Lake Table Path with a defined schema. If the table is not found during initialization, the workflow will fail.

You can create new Delta tables or import existing ones in the Data Assets section.

Schema Validation: Incoming data is strictly validated against the existing Delta table's schema.
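
If you prefer to create the target table outside the platform, the sketch below shows one way to pre-create an empty, partitioned Delta table with an explicit schema, again using the open-source deltalake package; the path, schema, and partition column are hypothetical and must match whatever you configure on the node.

```python
# Hypothetical sketch: pre-create an empty, partitioned Delta table with an explicit
# schema so the sink node has a table to validate against and append to.
import pyarrow as pa
from deltalake import DeltaTable

schema = pa.schema([
    ("user_id", pa.int64()),
    ("event_type", pa.string()),
    ("event_date", pa.string()),
])

DeltaTable.create(
    "s3://my-bucket/data/events_delta",   # must match the node's Delta Lake Table Path
    schema=schema,
    partition_by=["event_date"],          # must match the node's Partition Columns
    # For cloud paths, pass credentials via storage_options as in the append sketch above.
)
```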

Storage & Authentication

The node supports various storage backends. You must provide the correct credential type for your chosen storage path.

| Storage Provider | Path Scheme | Required Credential Type | Note |
| --- | --- | --- | --- |
| AWS S3 | s3:// or s3a:// | Username/Password | Username = Access Key ID; Password = Secret Access Key |
| Google Cloud (GCS) | gs:// | GCP Credential | Supports Service Account JSON Keyfile or Access Token. |
| Azure Blob (ADLS) | abfs:// | API Key | The API Key is the Storage Account Key. The system attempts to extract the account name from the URL. |
| HDFS | hdfs:// | None | Configure authentication via the Hadoop Configuration section directly. |
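
For example, the Hadoop Configuration field can point the S3A connector at a region-specific or S3-compatible endpoint. The property names below are standard Hadoop S3A settings; the values are placeholders, and the pairs are shown as a Python dict purely for readability (in the node they are entered as plain key = value pairs).

```python
# Illustrative Hadoop Configuration overrides for an S3A path. Property names are
# standard hadoop-aws (S3A) settings; the values here are placeholders.
hadoop_configuration = {
    "fs.s3a.endpoint": "s3.eu-west-1.amazonaws.com",  # region-specific or S3-compatible endpoint
    "fs.s3a.path.style.access": "true",               # required by some S3-compatible object stores
    "fs.s3a.connection.ssl.enabled": "true",          # keep the connection on TLS
}
```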

When to use the Delta Lake Sink

  • Choose Databricks Sink if you are building business-critical tables that analysts query immediately, and you want the safety and ease of Databricks Unity Catalog governance.
  • Choose Delta Lake Sink if you are building a raw data landing zone, are sensitive to compute costs, or need to write data to storage that isn't strictly coupled to a Databricks workspace.