S3 Output Node
The S3 Output Node writes workflow output data to Amazon S3. It uses a data asset to encapsulate the AWS connection details and batches records before each write, which keeps it efficient for high-volume streams. This makes it well suited to data archival, backup, or integration with other AWS services.
Quick Reference
S3 Folder
Select an existing S3 folder (data asset) or create a new one.
Batch Size
Number of records to accumulate before writing. Default: 10,000.
Flush Interval (ms)
Maximum time in milliseconds to wait before forcing a write. Default: 10,000.
Configuration
| Field | Description | Required | Placeholder |
|---|---|---|---|
| S3 Folder | Select an existing S3 folder data asset or create a new one. The data asset encapsulates the AWS connection, region, bucket, key path, and file format. | Yes | N/A |
| Batch Size | Number of records to accumulate before writing. Minimum: 1. | No | 10,000 |
| Flush Interval (ms) | Maximum time (ms) to wait before forcing a write. | No | 10,000 |
Batching
Records are always batched before being written to S3. They accumulate in memory and are flushed to S3 as soon as either of the following conditions is met:
- The number of accumulated records reaches the Batch Size (default: 10,000)
- The Flush Interval timer expires (default: 10,000 ms)
This reduces the number of write operations and keeps the node efficient for high-volume streams.
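The flush rule above can be illustrated with a minimal sketch. It is not the node's actual implementation: the `BatchBuffer` class, its `tick` method, and the injected `flush` callback are hypothetical names, and the choice to start the timer when the first record of a batch arrives is an assumption that the documentation does not confirm.

```python
import time
from typing import Any, Callable, List, Optional


class BatchBuffer:
    """Hypothetical buffer: flushes on record count or elapsed time, whichever comes first."""

    def __init__(
        self,
        flush: Callable[[List[Any]], None],
        batch_size: int = 10_000,
        flush_interval_ms: int = 10_000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._flush = flush
        self._batch_size = batch_size
        self._interval_s = flush_interval_ms / 1000.0
        self._clock = clock
        self._records: List[Any] = []
        # Assumption: the interval is measured from the first record in the batch.
        self._first_record_at: Optional[float] = None

    def add(self, record: Any) -> None:
        if not self._records:
            self._first_record_at = self._clock()
        self._records.append(record)
        # Condition 1: the batch has reached the configured size.
        if len(self._records) >= self._batch_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically; applies condition 2 (the flush interval has expired)."""
        if self._records and self._clock() - self._first_record_at >= self._interval_s:
            self.flush()

    def flush(self) -> None:
        if not self._records:
            return
        batch, self._records = self._records, []
        self._first_record_at = None
        self._flush(batch)
```

In this sketch a small Batch Size produces many small writes, while a long Flush Interval delays delivery of a partially filled batch, which is the trade-off the two settings control.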
File Formats
The output file format is configured on the S3 data asset. Supported formats:
- CSV (.csv)
- JSON Object (.json): does not support batching
- JSON Array (.json)
- JSON Lines (.jsonl)
- Parquet (.parquet): requires an Avro schema configured on the data asset
Output Path Structure
Batches are written using the following S3 key path:
{key}/year=YYYY/month=MM/day=DD/{uuid}.{ext}
Files are organized into date-partitioned folders using UTC timestamps. Each batch is written with a unique UUID filename.
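As a rough illustration of the key layout, the sketch below builds a key of the same shape. The function name `build_object_key` and its parameters are hypothetical, and the use of a random (version 4) UUID is an assumption, since the documentation does not state which UUID version the node generates.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_object_key(key_prefix: str, ext: str, now: Optional[datetime] = None) -> str:
    """Return a date-partitioned key: {key}/year=YYYY/month=MM/day=DD/{uuid}.{ext}."""
    # Partition values come from the UTC date; an aware datetime in another
    # zone is converted to UTC first (a naive datetime is treated as local time).
    ts = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    prefix = key_prefix.strip("/")
    # Assumption: a random UUID4 serves as the unique file name.
    file_name = f"{uuid.uuid4()}.{ext.lstrip('.')}"
    return f"{prefix}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/{file_name}"


# Example with a hypothetical key prefix and a fixed timestamp.
example = build_object_key(
    "exports/orders", "jsonl", datetime(2024, 3, 5, 23, 30, tzinfo=timezone.utc)
)
print(example)  # exports/orders/year=2024/month=03/day=05/<uuid>.jsonl
```

Because the partitions follow UTC, a batch written late in the evening in a timezone behind UTC lands in the next day's folder.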
Usage Tips
- Ensure the data asset's AWS credentials have appropriate permissions to write to the target S3 bucket
- Tune Batch Size and Flush Interval to balance write frequency against latency for your stream volume
- Parquet format requires an Avro schema on the data asset
- JSON Object format is incompatible with batching