GCS Source Node

The gcssource node reads objects from a Google Cloud Storage bucket and emits each object's decoded content as records in the pipeline. It is a batch source — it lists every matching blob in the bucket, downloads each one, and terminates when all blobs are consumed.

Typical use cases include replaying archived events stored as .jsonl exports, ingesting CSV reports dropped into a bucket by another system, and performing one-off backfills from cold storage.

Key Features

  • Prefix filtering: only blobs whose name starts with objectPrefix are read (see the example after this list)
  • Multiple encodings: each blob is decoded with the configured encodingType (CSV, JSON, NDJSON, XML, Parquet, plain text)
  • Credential modes: service account JSON keyfile, OAuth access token, or Application Default Credentials
  • Batch source: runs the listing once and terminates when all blobs are consumed
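
For example, with objectPrefix set to "events/2026/04/" (the object names here are hypothetical), the listing behaves like this:

  events/2026/04/part-0000.jsonl   read
  events/2026/04/part-0001.jsonl   read
  events/2026/03/part-0000.jsonl   skipped (name does not start with the prefix)

The prefix is a literal starts-with match on the blob name, not a glob pattern.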

Configuration

  • bucketName (String, required): GCS bucket to read from.
  • objectPrefix (String, optional): Prefix used to filter blobs. Leave empty to read every object.
  • encodingType (String, required): Format used to decode each object's bytes. Supported values: CSV, JSON_OBJECT, JSON_ARRAY, JSON_OBJECT_LINE, STRING_LINE, TEXT, XML, PARQUET.
  • credentialId (String, optional): ID of a GcpCredential in jobContext.otherProperties. Omit to use Application Default Credentials.
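
A minimal configuration sets only the two required fields; with credentialId omitted, the source falls back to Application Default Credentials (the bucket name below is a placeholder):

config:
  bucketName: "my-bucket"
  encodingType: "JSON_OBJECT_LINE"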

Encoding Types

The encodingType controls how each downloaded blob's bytes are turned into pipeline records. See encoding for the full reference.

  • JSON_OBJECT_LINE: one JSON object per line; most common for archived event streams
  • JSON_ARRAY: a JSON array; each element becomes one record
  • JSON_OBJECT: one JSON object per blob
  • CSV: comma-separated rows; the first row is treated as a header
  • STRING_LINE / TEXT: one record per line of text
  • XML: the blob is parsed as an XML element
  • PARQUET: the blob is read as an Apache Parquet file
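
For example, a blob holding the two lines below (the field names are made up), decoded with JSON_OBJECT_LINE, produces two records:

{"event": "login", "user": "alice", "ts": "2026-04-14T09:00:00Z"}
{"event": "logout", "user": "alice", "ts": "2026-04-14T09:05:00Z"}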

DAG Example

jobContext:
  otherProperties:
    gcs-cred:
      authType: SERVICE_ACCOUNT_JSON_KEYFILE
      jsonKeyContent: |
        {
          "type": "service_account",
          "project_id": "my-project",
          "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
          "client_email": "reader@my-project.iam.gserviceaccount.com"
        }
      projectId: "my-project"
  metricTags: {}
  dlqConfig:

dag:
  - id: "source"
    commandName: "gcssource"
    config:
      bucketName: "my-data-lake"
      objectPrefix: "events/2026/04/"
      encodingType: "JSON_OBJECT_LINE"
      credentialId: "gcs-cred"
    outputs:
      - "sink"

  - id: "sink"
    commandName: "stdout"
    config:
      encodingType: "JSON_OBJECT"
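
Running this DAG lists every blob in my-data-lake whose name starts with events/2026/04/, decodes each line of each blob as one JSON object, and prints the resulting records to stdout; the job terminates once every matching blob has been consumed.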

Credentials

credentialId resolves a GcpCredential from jobContext.otherProperties. The credential's authType determines how authentication is performed:

  • SERVICE_ACCOUNT_JSON_KEYFILE: jsonKeyContent (full keyfile JSON); projectId (optional)
  • ACCESS_TOKEN: accessToken (short-lived ya29.* token); projectId (optional)
  • APPLICATION_DEFAULT: no fields; relies on the runtime's ADC chain

If credentialId is omitted, the source uses Application Default Credentials.
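
For example, a credential entry using a short-lived access token (the token value is truncated):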

jobContext:
  otherProperties:
    gcs-cred:
      authType: ACCESS_TOKEN
      accessToken: "ya29.c.b0AXv..."
      projectId: "my-project"
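
Because ya29.* access tokens expire quickly (typically within about an hour), ACCESS_TOKEN is best suited to one-off runs; recurring jobs should use a service account keyfile or Application Default Credentials.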

The credential's principal needs at least the Storage Object Viewer role (roles/storage.objectViewer) on the bucket.

Related Nodes

  • gcssink: Write records back to a Google Cloud Storage bucket
  • s3sink: Write records to AWS S3
  • kafkasource: Stream-based source for Kafka topics