Introduction to Grid

Grid is Fleak's Kubernetes-based distributed job execution platform for orchestrating and running data pipeline workflows at scale. It runs DAG-configured jobs on fault-tolerant infrastructure, with built-in monitoring and debugging.

Overview

Grid serves as the compute backbone for Fleak's data platform. It accepts job submissions as DAG (Directed Acyclic Graph) configurations and distributes their execution across a cluster of worker nodes. Whether you run Grid on-premises or use Fleak's managed cloud platform, you get job orchestration with built-in observability: logs, metrics, and dashboards.
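To make the DAG idea concrete, here is a minimal sketch of how a dependency graph can be split into "waves" of tasks that are safe to run in parallel. It is an illustration only and does not reflect Grid's actual configuration format or scheduler: the task names and the `execution_waves` helper are hypothetical, and the ordering relies on Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter


def execution_waves(dag):
    """Group DAG nodes into waves; nodes in the same wave have no
    dependencies on each other and could run in parallel.

    `dag` maps each task name to the set of tasks it depends on.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # validates that the graph is acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        waves.append(ready)
        sorter.done(*ready)  # unblock tasks that depended on this wave
    return waves


# Hypothetical pipeline: names are illustrative, not Grid identifiers.
example_dag = {
    "parse": {"read_kafka"},
    "enrich": {"parse"},
    "validate": {"parse"},
    "write_sink": {"enrich", "validate"},
}

if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(example_dag), start=1):
        print(number, wave)
```

In this example, `enrich` and `validate` land in the same wave because both depend only on `parse`, which is the kind of parallelism a distributed executor can exploit.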

Architecture

Grid operates as a Kubernetes-native application with three core components:

JobMaster

The central orchestration service responsible for:

  • Accepting job submissions via REST API
  • Managing job lifecycle (scheduling, monitoring, completion)
  • Distributing tasks and balancing load across workers
  • Managing state and coordination
  • Exposing monitoring APIs and UI

Worker Agents

Distributed compute nodes that:

  • Execute individual tasks from jobs
  • Report health and metrics to JobMaster
  • Stream logs to centralized logging infrastructure
  • Handle task retries and failure recovery
  • Support horizontal scaling based on workload

Monitoring Stack

Integrated observability components:

  • VictoriaLogs - Centralized log aggregation and search
  • InfluxDB v2 - Time-series metrics storage
  • Vector - High-performance log pipeline
  • Grafana - Unified visualization dashboard

Key Features

Distributed Execution

  • Automatic task parallelization across worker nodes
  • Dynamic load balancing based on worker capacity
  • Horizontal scaling with Kubernetes
  • Resource isolation and limits per task
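As an illustration of capacity-based load balancing, the sketch below assigns each task to the worker that currently has the most free slots. This is an assumed, simplified policy and not Grid's actual scheduling algorithm; the `assign_tasks` function, the worker names, and the notion of "free slots" are all hypothetical.

```python
import heapq


def assign_tasks(tasks, capacities):
    """Assign each task to the worker with the most free slots.

    tasks: list of task names.
    capacities: dict mapping worker name -> free slots (hypothetical unit).
    Returns a dict mapping worker name -> list of assigned tasks.
    Raises RuntimeError when no worker has a free slot left.
    """
    # heapq is a min-heap, so store negated slot counts to pop the
    # worker with the MOST free slots; the name breaks ties.
    heap = [(-free, worker) for worker, free in capacities.items() if free > 0]
    heapq.heapify(heap)
    assignment = {worker: [] for worker in capacities}
    for task in tasks:
        if not heap:
            raise RuntimeError("no free worker capacity")
        neg_free, worker = heapq.heappop(heap)
        assignment[worker].append(task)
        if neg_free + 1 < 0:  # worker still has at least one free slot
            heapq.heappush(heap, (neg_free + 1, worker))
    return assignment


if __name__ == "__main__":
    print(assign_tasks(["t1", "t2", "t3"], {"worker-a": 2, "worker-b": 1}))
```

A real scheduler would also account for per-task resource limits and workers joining or leaving the cluster; the sketch only shows the core "pick the least-loaded worker" step.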

Fault Tolerance

  • Automatic task retry with configurable backoff
  • Worker failure detection and task rescheduling
  • Job recovery from checkpoint state
  • Dead letter queue for persistent failures
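The retry and dead-letter behaviour listed above can be sketched as follows. This is a generic exponential-backoff pattern written under stated assumptions, not Grid's implementation: the `run_with_retry` function, its parameter names, and the in-memory `dead_letters` list are hypothetical stand-ins for whatever Grid exposes in its configuration.

```python
import time


def run_with_retry(task, max_attempts=3, base_delay=0.5, factor=2.0,
                   dead_letters=None, sleep=time.sleep):
    """Call task() up to max_attempts times with exponential backoff.

    After the final failed attempt, the task and its exception are
    appended to `dead_letters` (if provided) and the exception is re-raised.
    `sleep` is injectable so the backoff can be observed without waiting.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                # Persistent failure: park it for later inspection.
                if dead_letters is not None:
                    dead_letters.append((task, exc))
                raise
            sleep(delay)     # wait before the next attempt
            delay *= factor  # 0.5s, 1.0s, 2.0s, ... with the defaults
```

Passing a custom `sleep` (for example, a list's `append` method) makes it easy to check the backoff schedule in a unit test without real delays.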

Observability

  • Real-time log streaming from running tasks
  • Time-series metrics for performance monitoring
  • Task-level debugging with DAG visualization
  • Cluster health monitoring and alerting

Job Management

  • RESTful API for job submission and control
  • Job status tracking with task counters
  • Kill and cancel operations
  • External ID mapping for system integration
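One way to think about status tracking with task counters is to derive a job-level status from per-state task counts. The sketch below is an assumption for illustration: the counter fields, the status names, and the precedence rules are hypothetical and may differ from what Grid's API actually reports.

```python
from dataclasses import dataclass


@dataclass
class TaskCounters:
    """Hypothetical per-job task counters."""
    pending: int = 0
    running: int = 0
    succeeded: int = 0
    failed: int = 0

    @property
    def total(self):
        return self.pending + self.running + self.succeeded + self.failed


def job_status(counters, cancelled=False):
    """Derive a coarse job status from task counters (assumed rules)."""
    if cancelled:
        return "CANCELLED"
    if counters.failed > 0:
        return "FAILED"
    if counters.total > 0 and counters.succeeded == counters.total:
        return "SUCCEEDED"
    if counters.running > 0 or counters.succeeded > 0:
        return "RUNNING"
    return "PENDING"


if __name__ == "__main__":
    print(job_status(TaskCounters(pending=2, running=1, succeeded=3)))
```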

Deployment Models

On-Premises Deployment

Grid can be deployed in your Kubernetes cluster using the provided Helm chart. This deployment includes:

  • Full monitoring stack (VictoriaLogs, InfluxDB, Grafana)
  • Integrated Svelte-based monitoring UI
  • Configurable resource limits and scaling policies
  • Persistent storage for logs and metrics

Managed Cloud Platform

When using Fleak's managed platform, Grid deployment is fully automated:

  • Deploy workflows directly from Fleak's web UI
  • Infrastructure provisioning handled automatically
  • Integrated monitoring in the workflow builder
  • No Kubernetes expertise required

Use Cases

Grid is designed for:

  • Stream Processing - Continuous data transformation from Kafka and other streaming sources
  • Batch Processing - Large-scale ETL jobs with complex transformation logic
  • Data Integration - Moving and transforming data between systems
  • Real-time Analytics - Low-latency processing pipelines with immediate insights
  • Data Quality - Validation, cleansing, and enrichment workflows

Getting Started

For On-Premises Users

  1. Deploy Grid to your Kubernetes cluster using the Helm chart
  2. Configure monitoring stack credentials
  3. Access the JobMaster UI for job submission
  4. Submit your first DAG configuration via API or UI
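For step 4, an API submission might look roughly like the sketch below. Everything specific here is an assumption: the `/jobs` path, the `dag` and `externalId` payload fields, and the example host are placeholders, so check the Grid API reference for the real endpoint and schema. The function only builds the request with Python's standard library; it does not send it.

```python
import json
import urllib.request


def build_submit_request(base_url, dag_config, external_id=None):
    """Build (but do not send) a POST request carrying a DAG configuration.

    The URL path and payload field names are hypothetical placeholders.
    """
    payload = {"dag": dag_config}
    if external_id is not None:
        payload["externalId"] = external_id  # hypothetical field name
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/jobs",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_submit_request(
        "http://jobmaster.example",  # placeholder host
        {"nodes": [], "edges": []},  # placeholder DAG body
        external_id="order-42",
    )
    print(request.get_method(), request.full_url)
    # To actually submit: urllib.request.urlopen(request)
```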

For Managed Platform Users

  1. Sign up for Fleak's cloud platform
  2. Build your workflow in the visual DAG builder
  3. Deploy with a single click
  4. Monitor execution through the integrated dashboard

Next Steps

For additional help with Grid deployment or usage, contact us at support@fleak.ai.