Introduction to Grid
Grid is Fleak's Kubernetes-based distributed job execution platform. It orchestrates and executes data pipeline workflows at scale, running DAG-configured jobs on fault-tolerant infrastructure with built-in monitoring and debugging.
Overview
Grid serves as the compute backbone for Fleak's data platform. It accepts job submissions as DAG (Directed Acyclic Graph) configurations and distributes their execution across a cluster of worker nodes. Whether you run Grid on-premises or use Fleak's managed cloud platform, orchestration and observability are built in.
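To make the DAG idea concrete, here is a minimal sketch of how a dependency graph determines a valid execution order. The task names and the dictionary layout are assumptions for illustration and are not Grid's actual configuration schema; the ordering uses Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical DAG: each task maps to the tasks it depends on.
# This layout is illustrative only, not Grid's configuration format.
dag_config = {
    "extract": [],
    "transform": ["extract"],
    "validate": ["extract"],
    "load": ["transform", "validate"],
}


def execution_order(dag):
    """Return one valid execution order, or raise ValueError on a cycle."""
    try:
        # static_order() yields every task after all of its dependencies.
        return list(TopologicalSorter(dag).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"configuration is not acyclic: {exc.args[1]}") from exc


order = execution_order(dag_config)
print(order)
```

In this example `extract` always comes first and `load` always comes last, while `transform` and `validate` have no dependency on each other and could run in parallel on different workers.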
Architecture
Grid operates as a Kubernetes-native application with three core components:
JobMaster
The central orchestration service responsible for:
- Accepting job submissions via REST API
- Managing job lifecycle (scheduling, monitoring, completion)
- Distributing tasks and balancing load across workers
- Managing state and coordination
- Exposing monitoring APIs and UI
Worker Agents
Distributed compute nodes that:
- Execute individual tasks from jobs
- Report health and metrics to JobMaster
- Stream logs to centralized logging infrastructure
- Handle task retries and failure recovery
- Support horizontal scaling based on workload
Monitoring Stack
Integrated observability components:
- VictoriaLogs - Centralized log aggregation and search
- InfluxDB v2 - Time-series metrics storage
- Vector - High-performance log pipeline
- Grafana - Unified visualization dashboard
Key Features
Distributed Execution
- Automatic task parallelization across worker nodes
- Dynamic load balancing based on worker capacity
- Horizontal scaling with Kubernetes
- Resource isolation and limits per task
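Capacity-based load balancing can be illustrated with a small greedy sketch: each task goes to the worker that currently has the most free slots. This is one common strategy, shown under assumed inputs (worker names and slot counts are hypothetical); it is not a description of Grid's actual scheduler.

```python
import heapq


def assign_tasks(tasks, capacity):
    """Greedily place each task on the worker with the most free slots.

    `capacity` maps a worker name to its number of free slots.
    Returns (assignments, unplaced): a worker -> tasks mapping and the
    tasks that could not be placed because no slots were left.
    """
    # heapq is a min-heap, so store negated free slots to pop the freest worker.
    heap = [(-free, name) for name, free in capacity.items() if free > 0]
    heapq.heapify(heap)
    assignments = {name: [] for name in capacity}
    unplaced = []
    for task in tasks:
        if not heap:
            unplaced.append(task)  # every worker is full
            continue
        neg_free, name = heapq.heappop(heap)
        assignments[name].append(task)
        if neg_free + 1 < 0:  # worker still has at least one free slot
            heapq.heappush(heap, (neg_free + 1, name))
    return assignments, unplaced


assignments, unplaced = assign_tasks(
    ["t1", "t2", "t3", "t4"], {"worker-a": 2, "worker-b": 1}
)
print(assignments, unplaced)
```

With three free slots in total, the fourth task stays unplaced until a slot frees up or, in a Kubernetes setting, more workers are added.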
Fault Tolerance
- Automatic task retry with configurable backoff
- Worker failure detection and task rescheduling
- Job recovery from checkpoint state
- Dead letter queue for persistent failures
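The retry and dead letter behavior listed above can be sketched as follows. This is a generic exponential backoff pattern, not Grid's implementation: the parameter names (`max_retries`, `base_delay`, `factor`, `max_delay`) are assumptions, and a plain list stands in for the dead letter queue.

```python
import time


def backoff_delays(max_retries, base_delay=1.0, factor=2.0, max_delay=30.0):
    """Delay before each retry: base, base*factor, ..., capped at max_delay."""
    return [min(base_delay * factor**i, max_delay) for i in range(max_retries)]


def run_with_retry(task, dead_letters, max_retries=3, sleep=time.sleep):
    """Run `task`, retrying after each backoff delay when it raises.

    If the first attempt and every retry fail, the last error is appended
    to `dead_letters` (a stand-in for a dead letter queue) and None is returned.
    """
    last_error = None
    # The leading 0.0 is the first attempt, which runs without waiting.
    for delay in [0.0] + backoff_delays(max_retries):
        if delay:
            sleep(delay)
        try:
            return task()
        except Exception as exc:  # broad on purpose: any failure is retried
            last_error = exc
    dead_letters.append(last_error)
    return None
```

Passing `sleep` as a parameter keeps the sketch testable: a caller can substitute a function that records the delays instead of actually waiting.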
Observability
- Real-time log streaming from running tasks
- Time-series metrics for performance monitoring
- Task-level debugging with DAG visualization
- Cluster health monitoring and alerting
Job Management
- RESTful API for job submission and control
- Job status tracking with task counters
- Kill and cancel operations
- External ID mapping for system integration
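As a rough illustration of status tracking with task counters, the sketch below derives a job-level status from per-state task counts. The counter names and status strings are hypothetical; the fields and states that Grid actually reports may differ.

```python
from dataclasses import dataclass


@dataclass
class TaskCounters:
    # Hypothetical counter names, chosen for illustration only.
    total: int
    succeeded: int = 0
    failed: int = 0
    running: int = 0


def job_status(c: TaskCounters) -> str:
    """Derive a coarse job status from the task counters."""
    if c.failed > 0:
        return "FAILED"
    if c.succeeded == c.total:
        return "COMPLETED"
    if c.running > 0 or c.succeeded > 0:
        return "RUNNING"
    return "PENDING"


def progress(c: TaskCounters) -> float:
    """Fraction of tasks that have finished, successfully or not."""
    return (c.succeeded + c.failed) / c.total if c.total else 0.0


counters = TaskCounters(total=4, succeeded=1, running=2)
print(job_status(counters), progress(counters))
```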
Deployment Models
On-Premises Deployment
Grid can be deployed in your Kubernetes cluster using the provided Helm chart. This deployment includes:
- Full monitoring stack (VictoriaLogs, InfluxDB, Vector, Grafana)
- Integrated Svelte-based monitoring UI
- Configurable resource limits and scaling policies
- Persistent storage for logs and metrics
Managed Cloud Platform
When using Fleak's managed platform, Grid deployment is fully automated:
- Deploy workflows directly from Fleak's web UI
- Infrastructure provisioning handled automatically
- Integrated monitoring in the workflow builder
- No Kubernetes expertise required
Use Cases
Grid is designed for:
- Stream Processing - Continuous data transformation from Kafka and other streaming sources
- Batch Processing - Large-scale ETL jobs with complex transformation logic
- Data Integration - Moving and transforming data between systems
- Real-time Analytics - Low-latency processing pipelines with immediate insights
- Data Quality - Validation, cleansing, and enrichment workflows
Getting Started
For On-Premises Users
- Deploy Grid to your Kubernetes cluster using the Helm chart
- Configure monitoring stack credentials
- Access the JobMaster UI for job submission
- Submit your first DAG configuration via API or UI
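For the last step, submitting through the API amounts to sending the DAG configuration to JobMaster over HTTP. The sketch below only builds such a request with Python's standard library and does not send it. The host, port, `/api/jobs` path, and JSON field names are all placeholders assumed for illustration; use the endpoint and payload format of your own JobMaster deployment.

```python
import json
import urllib.request

# ASSUMPTION: placeholder address and path, not a documented Grid endpoint.
JOBMASTER_URL = "http://jobmaster.example.internal:8080"


def build_submit_request(dag_config, external_id=None):
    """Build (but do not send) a JSON POST request carrying a DAG configuration."""
    payload = {"dag": dag_config}  # hypothetical payload layout
    if external_id is not None:
        # Hypothetical field name for mapping the job to an external system ID.
        payload["externalId"] = external_id
    return urllib.request.Request(
        url=f"{JOBMASTER_URL}/api/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_submit_request(
    {"extract": [], "load": ["extract"]}, external_id="order-sync-42"
)
# To actually submit, pass the request to urllib.request.urlopen(request, timeout=10).
```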
For Managed Platform Users
- Sign up for Fleak's cloud platform
- Build your workflow in the visual DAG builder
- Deploy with a single click
- Monitor execution through the integrated dashboard
Next Steps
- Learn how to Submit and Manage Jobs using the JobMaster UI
- Explore Monitoring and Debugging capabilities for observability
For additional help with Grid deployment or usage, contact us at support@fleak.ai.