Introduction to Grid

Grid is Fleak's Kubernetes-based distributed job execution platform for orchestrating and running data pipeline workflows at scale. It runs DAG-configured jobs on fault-tolerant infrastructure, with built-in monitoring and debugging for logs, metrics, and individual tasks.

Overview

Grid serves as the compute backbone for Fleak's data platform. It accepts job submissions in the form of DAG (Directed Acyclic Graph) configurations and distributes execution across a cluster of worker nodes. Whether you run Grid on-premises or use Fleak's managed cloud platform, you get job orchestration with built-in observability.
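To make the DAG idea concrete, here is a minimal sketch of how a set of dependent tasks can be checked for cycles and put into a valid execution order. The dictionary layout and task names are illustrative assumptions, not Grid's actual configuration format; the ordering uses Python's standard graphlib module.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks it depends on. This is NOT Grid's real DAG configuration schema.
dag = {
    "read_kafka": set(),
    "parse": {"read_kafka"},
    "enrich": {"parse"},
    "validate": {"parse"},
    "write_sink": {"enrich", "validate"},
}


def execution_order(graph):
    """Return one valid task order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"not a DAG, cycle found: {exc.args[1]}") from exc


order = execution_order(dag)
print(order)
```

Tasks with no path between them (here "enrich" and "validate") can appear in either order, which is what makes them candidates for parallel execution on different workers.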

Architecture

Grid operates as a Kubernetes-native application with three core components:

JobMaster

The central orchestration service responsible for:

Accepting job submissions via REST API
Managing job lifecycle (scheduling, monitoring, completion)
Distributing tasks and balancing load across workers
Managing state and coordination
Exposing monitoring APIs and UI

Worker Agents

Distributed compute nodes that:

Execute individual tasks from jobs
Report health and metrics to JobMaster
Stream logs to centralized logging infrastructure
Handle task retries and failure recovery
Support horizontal scaling based on workload

Monitoring Stack

Integrated observability components:

VictoriaLogs - Centralized log aggregation and search
InfluxDB v2 - Time-series metrics storage
Vector - High-performance log pipeline
Grafana - Unified visualization dashboard

Key Features

Distributed Execution

Automatic task parallelization across worker nodes
Dynamic load balancing based on worker capacity
Horizontal scaling with Kubernetes
Resource isolation and limits per task
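As an illustration of capacity-based load balancing, the sketch below assigns each task to whichever worker currently has the most free slots. This is one simple greedy strategy, shown under the assumption that capacity can be expressed as a slot count; the source does not describe the algorithm Grid actually uses, and the function and worker names are hypothetical.

```python
import heapq


def assign_tasks(tasks, worker_capacity):
    """Greedily assign each task to the worker with the most free slots.

    worker_capacity maps a worker name to its number of free slots
    (an assumed capacity model, for illustration only).
    Returns (assignment, unassigned).
    """
    # heapq is a min-heap, so store negated free slots to pop the largest first.
    heap = [(-slots, name) for name, slots in worker_capacity.items()]
    heapq.heapify(heap)
    assignment = {name: [] for name in worker_capacity}
    unassigned = []
    for task in tasks:
        if not heap or heap[0][0] == 0:
            # No workers at all, or every worker is full.
            unassigned.append(task)
            continue
        neg_free, name = heap[0]
        # Consume one slot on the chosen worker and restore heap order.
        heapq.heapreplace(heap, (neg_free + 1, name))
        assignment[name].append(task)
    return assignment, unassigned


assignment, unassigned = assign_tasks(
    ["t1", "t2", "t3", "t4", "t5"], {"w1": 3, "w2": 1}
)
print(assignment, unassigned)
```

With four slots and five tasks, the last task stays unassigned until capacity frees up or more workers are added, which is the situation horizontal scaling is meant to address.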

Fault Tolerance

Automatic task retry with configurable backoff
Worker failure detection and task rescheduling
Job recovery from checkpoint state
Dead letter queue for persistent failures
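The retry and dead letter queue behavior listed above can be sketched as follows. This is a generic exponential-backoff pattern, not Grid's implementation: the parameter names (max_attempts, base_delay, factor) and the in-memory dead_letters list are assumptions made for the example.

```python
import time


def run_with_retry(task, max_attempts=3, base_delay=0.5, factor=2.0, sleep=time.sleep):
    """Call task(); after each failure wait, then retry with a longer delay.

    The delay starts at base_delay and is multiplied by factor after every
    failed attempt. The last failure is re-raised to the caller.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= factor


# Hypothetical stand-in for a dead letter queue: tasks that exhausted retries.
dead_letters = []


def execute(name, task, **retry_opts):
    """Run a task with retries; record it in dead_letters if it never succeeds."""
    try:
        return run_with_retry(task, **retry_opts)
    except Exception as exc:
        dead_letters.append((name, repr(exc)))
        return None
```

Passing sleep as a parameter keeps the function easy to test, since a test can record the requested delays instead of actually waiting.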

Observability

Real-time log streaming from running tasks
Time-series metrics for performance monitoring
Task-level debugging with DAG visualization
Cluster health monitoring and alerting

Job Management

RESTful API for job submission and control
Job status tracking with task counters
Kill and cancel operations
External ID mapping for system integration
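Status tracking with task counters can be pictured as a tally of task states from which an overall job status is derived. The state names and the derivation rule below are assumptions chosen for illustration; the source does not list Grid's actual states.

```python
from collections import Counter


def summarize(task_states):
    """Derive a job status and per-state task counters.

    task_states maps a task name to a state string. The states used here
    (PENDING, RUNNING, SUCCEEDED, FAILED) are hypothetical.
    """
    counters = Counter(task_states.values())
    if counters["FAILED"]:
        status = "FAILED"
    elif counters["RUNNING"] or counters["PENDING"]:
        status = "RUNNING"
    else:
        status = "SUCCEEDED"
    return status, dict(counters)


status, counters = summarize(
    {"read": "SUCCEEDED", "parse": "SUCCEEDED", "enrich": "RUNNING", "write": "PENDING"}
)
print(status, counters)
```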

Deployment Models

On-Premises Deployment

Grid can be deployed in your Kubernetes cluster using the provided Helm chart. This deployment includes:

Full monitoring stack (VictoriaLogs, InfluxDB, Grafana)
Integrated Svelte-based monitoring UI
Configurable resource limits and scaling policies
Persistent storage for logs and metrics

Managed Cloud Platform

When using Fleak's managed platform, Grid deployment is fully automated:

Deploy workflows directly from Fleak's web UI
Infrastructure provisioning handled automatically
Integrated monitoring in the workflow builder
No Kubernetes expertise required

Use Cases

Grid is designed for:

Stream Processing - Continuous data transformation from Kafka and other streaming sources
Batch Processing - Large-scale ETL jobs with complex transformation logic
Data Integration - Moving and transforming data between systems
Real-time Analytics - Low-latency processing pipelines with immediate insights
Data Quality - Validation, cleansing, and enrichment workflows

Getting Started

For On-Premises Users

Deploy Grid to your Kubernetes cluster using the Helm chart
Configure monitoring stack credentials
Access the JobMaster UI for job submission
Submit your first DAG configuration via API or UI
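For the final step, a submission over HTTP might look roughly like the sketch below. The base URL, the /jobs path, and the payload fields (dag, externalId) are all placeholders invented for this example; consult the JobMaster API documentation for the real endpoint and schema. The code only builds the request and does not send it.

```python
import json
import urllib.request

# Placeholders, not documented values: replace with your JobMaster address
# and the real submission path from the API reference.
JOBMASTER_URL = "http://localhost:8080"
SUBMIT_PATH = "/jobs"


def build_submit_request(dag_config, external_id=None):
    """Build (but do not send) a JSON POST request carrying a DAG configuration."""
    payload = {"dag": dag_config}  # hypothetical field name
    if external_id is not None:
        payload["externalId"] = external_id  # hypothetical field name
    return urllib.request.Request(
        JOBMASTER_URL + SUBMIT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_submit_request({"nodes": [], "edges": []}, external_id="order-42")
print(req.get_method(), req.full_url)
```

Once the real endpoint is known, the request could be sent with urllib.request.urlopen(req) and the response parsed as JSON.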

For Managed Platform Users

Sign up for Fleak's cloud platform
Build your workflow in the visual DAG builder
Deploy with a single click
Monitor execution through the integrated dashboard

Next Steps

Learn how to Submit and Manage Jobs using the JobMaster UI
Explore Monitoring and Debugging capabilities for observability

For additional help with Grid deployment or usage, contact us at support@fleak.ai
