S3 Output Node
The S3 Output node writes workflow output data to Amazon S3 via a configured data asset, with optional batching for high-volume streams.
The node uses a data asset to encapsulate the AWS connection details (region, bucket, key path, and file format), making it well suited to data archival, backup, or integration with other AWS services.
Quick Reference
S3 Folder
Select an existing S3 folder data asset or create a new one.
Enable Batching
Toggle to enable batching of records before writing to S3. Default: true.
Batch Size
Number of records to accumulate before writing. Default: 10,000. Only available when batching is enabled.
Flush Interval (ms)
Maximum time in milliseconds to wait before forcing a write. Default: 10,000. Only available when batching is enabled.
Configuration
| Field | Description | Required | Default |
|---|---|---|---|
| S3 Folder | Select an existing S3 folder data asset or create a new one. The data asset encapsulates the AWS connection, region, bucket, key path, and file format. | Yes | N/A |
| Enable Batching | Toggle to enable batching of records before writing. | No | true |
| Batch Size | Number of records to accumulate before writing (only when batching is enabled). Minimum: 1. | No | 10,000 |
| Flush Interval (ms) | Maximum time (ms) to wait before forcing a write (only when batching is enabled). | No | 10,000 |
Batching
When batching is disabled, each record is written to S3 individually. This is simple but can result in a high number of S3 API calls.
When batching is enabled, records accumulate in memory and are flushed to S3 when the first of the following conditions is met:
- The number of accumulated records reaches the Batch Size (default: 10,000)
- The Flush Interval timer expires (default: 10,000 ms)
This reduces the number of write operations and is recommended for high-volume streams.
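The size-or-time flush policy above can be sketched as follows. This is an illustrative sketch, not the node's actual implementation; the `BatchBuffer` class and `write_batch` callback are assumptions standing in for the real S3 write.

```python
import time


class BatchBuffer:
    """Sketch of a size-or-time flush policy: flush when the buffer
    reaches batch_size records, or when flush_interval_ms elapses."""

    def __init__(self, batch_size=10_000, flush_interval_ms=10_000,
                 write_batch=print):
        self.batch_size = batch_size
        self.flush_interval = flush_interval_ms / 1000.0
        self.write_batch = write_batch  # stand-in for the S3 write
        self.records = []
        self.last_flush = time.monotonic()

    def add(self, record):
        self.records.append(record)
        if len(self.records) >= self.batch_size:
            self.flush()

    def maybe_flush(self):
        # Called periodically (e.g. by a timer) to enforce the interval.
        if self.records and time.monotonic() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        if self.records:
            self.write_batch(self.records)
            self.records = []
        self.last_flush = time.monotonic()
```

Whichever condition fires first triggers the flush, so a slow stream still gets written within the flush interval while a fast stream is bounded by batch size.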
File Formats
The output file format is configured on the S3 data asset. Supported formats:
- CSV (`.csv`)
- JSON Object (`.json`) — does not support batching
- JSON Array (`.json`)
- JSON Lines (`.jsonl`)
- Parquet (`.parquet`) — requires batching enabled and an Avro schema configured on the data asset
Output Path Structure
The S3 key path depends on whether batching is enabled.
Batching mode:
`{key}/year=YYYY/month=MM/day=DD/{uuid}.{ext}`
Files are organized into date-partitioned folders using UTC timestamps. Each batch is written with a unique UUID filename.
Without batching:
`{key}/{epochMillis}.{ext}`
Each record is written individually with a millisecond-precision timestamp filename.
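The two key templates above can be sketched in Python. This is an assumed reconstruction for illustration only; the helper names `batched_key` and `unbatched_key` are not part of the product.

```python
import time
import uuid
from datetime import datetime, timezone


def batched_key(key: str, ext: str) -> str:
    """Date-partitioned key used when batching is enabled (UTC-based)."""
    now = datetime.now(timezone.utc)
    return (f"{key}/year={now:%Y}/month={now:%m}/day={now:%d}/"
            f"{uuid.uuid4()}.{ext}")


def unbatched_key(key: str, ext: str) -> str:
    """Per-record key with a millisecond-precision epoch timestamp."""
    return f"{key}/{int(time.time() * 1000)}.{ext}"
```

For example, `batched_key("exports/events", "jsonl")` yields a key like `exports/events/year=2024/month=06/day=01/<uuid>.jsonl`, while the unbatched form yields `exports/events/<epochMillis>.csv`.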
Usage Tips
- Ensure the data asset's AWS credentials have appropriate permissions to write to the target S3 bucket
- Enable batching for high-volume streams to reduce the number of S3 write operations
- Parquet format requires batching to be enabled and an Avro schema on the data asset
- JSON Object format is incompatible with batching
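For the permissions tip, a minimal IAM policy granting write access to the target key path might look like the following. The bucket name and key path are placeholders; your environment may also require additional actions (e.g. for multipart uploads).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::your-bucket/your/key/path/*"
    }
  ]
}
```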