Databricks Sink Node
Quick Reference
| Name | Description |
|---|---|
| Table (Asset) | Select or create a target table in Databricks. |
| Select Warehouse | Select the Databricks SQL Warehouse to execute the load command. |
Overview
The Databricks Sink Node allows you to ingest processed data directly into Databricks Unity Catalog tables. Unlike direct file writers, this node leverages the Databricks SQL Compute engine to ensure ACID compliance and governance within the Databricks ecosystem.
How It Works
This node operates in a three-step "Stage and Load" process to maximize reliability and throughput:
- Stage: Data is converted to Parquet format locally.
- Upload: Parquet files are uploaded securely to a Databricks Volume (Unity Catalog managed storage).
- Load: A `COPY INTO` SQL command is executed on your designated SQL Warehouse. This command loads the data from the Volume into your target table transactionally (a minimal sketch of the full flow follows this list).
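The node handles this pipeline internally, but the mechanics can be illustrated with a short sketch. The snippet below is illustrative only, not the node's actual code: it assumes the `pyarrow`, `databricks-sdk`, and `databricks-sql-connector` Python packages, a hypothetical Unity Catalog Volume path, a hypothetical target table `prod.finance.revenue_reports`, and standard Databricks environment variables for authentication.

```python
# Illustrative sketch of the "Stage and Load" flow (not the node's implementation).
# Assumes pyarrow, databricks-sdk, and databricks-sql-connector are installed, and that
# DATABRICKS_HOST / DATABRICKS_TOKEN (SDK) plus DATABRICKS_SERVER_HOSTNAME,
# DATABRICKS_HTTP_PATH, DATABRICKS_TOKEN (SQL connector) are set in the environment.
import os
import pyarrow as pa
import pyarrow.parquet as pq
from databricks.sdk import WorkspaceClient
from databricks import sql

records = [{"region": "EMEA", "revenue": 1250.0}]                   # example batch of records
volume_path = "/Volumes/prod/finance/staging/batch_0001.parquet"    # hypothetical Volume path
target_table = "prod.finance.revenue_reports"                       # hypothetical target table

# 1. Stage: convert the accumulated records to a local Parquet file.
pq.write_table(pa.Table.from_pylist(records), "batch_0001.parquet")

# 2. Upload: push the Parquet file into a Unity Catalog Volume via the Files API.
w = WorkspaceClient()  # reads host/token from the environment
with open("batch_0001.parquet", "rb") as f:
    w.files.upload(volume_path, f, overwrite=True)

# 3. Load: run COPY INTO on the designated SQL Warehouse to load the staged file.
with sql.connect(
    server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],  # workspace host, no scheme
    http_path=os.environ["DATABRICKS_HTTP_PATH"],              # HTTP path of the SQL Warehouse
    access_token=os.environ["DATABRICKS_TOKEN"],
) as conn, conn.cursor() as cursor:
    cursor.execute(
        f"""
        COPY INTO {target_table}
        FROM '{volume_path}'
        FILEFORMAT = PARQUET
        """
    )
```

Because the final step runs through the SQL Warehouse, the load inherits Unity Catalog permissions and the table-level ACID guarantees described above.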
Configuration
| UI Selection | Description |
|---|---|
| Select Table (Asset) | Choose a target table from the dropdown (e.g., prod.finance.revenue_reports). You can select an existing table or define a new one directly in the UI. You can also create or import tables in the Data Assets section. |
| Select Warehouse | Select the Databricks SQL Warehouse to execute the load command. |
Advanced Settings
These settings control the performance and behavior of the ingestion process but do not affect the destination topology. A short sketch after the table shows how Batch Size and Flush Interval interact.
| Field Name | Description | Default |
|---|---|---|
| Batch Size | Number of records to accumulate before triggering a "Stage and Load" operation. | 10,000 |
| Flush Interval | Maximum time (ms) to wait before forcing a write, ensuring low latency for low-volume streams. | 30,000 (30s) |
| Cleanup After Copy | Automatically deletes the temporary Parquet files from the Databricks Volume after a successful load. | True |
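To illustrate how Batch Size and Flush Interval interact, the sketch below models the flush decision with a hypothetical `FlushPolicy` helper (not part of the node): a "Stage and Load" operation is triggered as soon as either the batch fills up or the interval elapses, whichever comes first.

```python
# Behavioral sketch of the batching settings (hypothetical helper, not the node's code).
import time
from typing import Callable

class FlushPolicy:
    """Flush when the batch is full OR the flush interval elapses, whichever comes first."""

    def __init__(self, stage_and_load: Callable[[list], None],
                 batch_size: int = 10_000, flush_interval_ms: int = 30_000):
        self.stage_and_load = stage_and_load        # hook into the Stage/Upload/Load steps above
        self.batch_size = batch_size                # default: 10,000 records
        self.flush_interval_ms = flush_interval_ms  # default: 30,000 ms (30 s)
        self.buffer: list = []
        self.last_flush = time.monotonic()

    def on_record(self, record) -> None:
        self.buffer.append(record)
        elapsed_ms = (time.monotonic() - self.last_flush) * 1000
        if len(self.buffer) >= self.batch_size or elapsed_ms >= self.flush_interval_ms:
            self.stage_and_load(self.buffer)
            self.buffer = []
            self.last_flush = time.monotonic()

# Example: flush after 3 records or 5 seconds, printing each staged batch.
policy = FlushPolicy(stage_and_load=print, batch_size=3, flush_interval_ms=5_000)
for i in range(7):
    policy.on_record({"id": i})
```

When Cleanup After Copy is enabled (the default), the node also removes the temporary Parquet files from the Databricks Volume once the load succeeds, so staged data does not accumulate between batches.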
When to use the Databricks Sink
- Choose Databricks Sink if you are building business-critical tables that analysts query immediately, and you want the safety and ease of Databricks Unity Catalog governance.
- Choose Delta Lake Sink if you are building a raw data landing zone, are sensitive to compute costs, or need to write data to storage that isn't strictly coupled to a Databricks workspace.