S3 Output Node

The S3 Output Node writes workflow output data to AWS S3 via a configured data asset, with optional batching for high-volume streams.

The S3 Output Node stores workflow output data in Amazon S3. It uses a data asset to encapsulate the AWS connection details, making it ideal for data archival, backup, or integration with other AWS services.

Quick Reference

S3 Folder Select an existing S3 folder (asset) or create a new one.

Enable Batching Toggle to enable batching of records before writing to S3. Default: true.

Batch Size Number of records to accumulate before writing. Default: 10,000. Only available when batching is enabled.

Flush Interval (ms) Maximum time in milliseconds to wait before forcing a write. Default: 10,000. Only available when batching is enabled.

Configuration

| Field | Description | Required | Placeholder |
| --- | --- | --- | --- |
| S3 Folder | Select an existing S3 folder data asset or create a new one. The data asset encapsulates the AWS connection, region, bucket, key path, and file format. | Yes | N/A |
| Enable Batching | Toggle to enable batching of records before writing. | No | true |
| Batch Size | Number of records to accumulate before writing (only when batching is enabled). Minimum: 1. | No | 10,000 |
| Flush Interval (ms) | Maximum time (ms) to wait before forcing a write (only when batching is enabled). | No | 10,000 |

Batching

When batching is disabled, each record is written to S3 individually. This is simple but can result in a high number of S3 API calls.

When batching is enabled, records accumulate in memory and are flushed to S3 as soon as either of the following conditions is met:

  • The number of accumulated records reaches the Batch Size (default: 10,000)
  • The Flush Interval timer expires (default: 10,000 ms)

This reduces the number of write operations and is recommended for high-volume streams.
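The flush logic above can be sketched as a small buffer class. This is an illustrative sketch, not the node's actual implementation; the `write_batch` callback stands in for the real S3 upload, and all names here are hypothetical.

```python
import time


class BatchBuffer:
    """Sketch of size- or time-triggered batching (names hypothetical)."""

    def __init__(self, write_batch, batch_size=10_000, flush_interval_ms=10_000):
        self.write_batch = write_batch          # stand-in for the S3 write
        self.batch_size = batch_size
        self.flush_interval_ms = flush_interval_ms
        self.records = []
        self.last_flush = time.monotonic()

    def add(self, record):
        """Buffer a record; flush when either threshold is crossed."""
        self.records.append(record)
        elapsed_ms = (time.monotonic() - self.last_flush) * 1000
        if len(self.records) >= self.batch_size or elapsed_ms >= self.flush_interval_ms:
            self.flush()

    def flush(self):
        """Write any buffered records and reset the interval timer."""
        if self.records:
            self.write_batch(self.records)
            self.records = []
        self.last_flush = time.monotonic()
```

In a real node, `flush()` would also run on shutdown so a partially filled batch is not lost.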

File Formats

The output file format is configured on the S3 data asset. Supported formats:

  • CSV (.csv)
  • JSON Object (.json) — does not support batching
  • JSON Array (.json)
  • JSON Lines (.jsonl)
  • Parquet (.parquet) — requires batching enabled and an Avro schema configured on the data asset
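The format constraints above can be expressed as a small validation check. This is a hedged sketch of the rules listed in this section, not the product's code; the format identifiers are hypothetical.

```python
# Format identifiers below are hypothetical labels for the formats listed above.
SUPPORTED_FORMATS = {"csv", "json_object", "json_array", "jsonl", "parquet"}


def validate_format(fmt: str, batching_enabled: bool, has_avro_schema: bool) -> None:
    """Raise ValueError when a format/batching combination is invalid."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if fmt == "json_object" and batching_enabled:
        raise ValueError("JSON Object format does not support batching")
    if fmt == "parquet" and not (batching_enabled and has_avro_schema):
        raise ValueError("Parquet requires batching and an Avro schema on the data asset")
```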

Output Path Structure

The S3 key path depends on whether batching is enabled.

Batching mode:

{key}/year=YYYY/month=MM/day=DD/{uuid}.{ext}

Files are organized into date-partitioned folders using UTC timestamps. Each batch is written with a unique UUID filename.

Without batching:

{key}/{epochMillis}.{ext}

Each record is written individually with a millisecond-precision timestamp filename.

Usage Tips

  • Ensure the data asset's AWS credentials have appropriate permissions to write to the target S3 bucket
  • Enable batching for high-volume streams to reduce the number of S3 write operations
  • Parquet format requires batching to be enabled and an Avro schema on the data asset
  • JSON Object format is incompatible with batching