Fleak SQL Language Reference
Fleak SQL is a PostgreSQL-compatible SQL dialect with extended support for JSON operations, log parsing, and nested data structures in ZephFlow transformation pipelines.
Introduction
Fleak SQL is a SQL dialect that follows Postgres SQL standards while providing enhanced functionality for handling nested data structures, JSON operations, and log parsing. This reference guide covers the syntax, data types, operators, and functions available in Fleak SQL.
Basic Syntax
Fleak SQL supports standard SQL query syntax:
SELECT [DISTINCT [ON (expression_list)]] column1 [AS alias1], column2 [AS alias2], ...
FROM source_table [AS alias]
[WHERE conditions]
[GROUP BY column1, column2, ...]
[ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...]
[LIMIT count]
[;]
Use records as the source table to represent the input data.
Query Components
SELECT Statement
The SELECT statement retrieves data from tables or expressions:
-- Basic selection
SELECT id, timestamp, status FROM records;
-- Select all columns
SELECT * FROM records;
-- Column aliasing
SELECT timestamp AS event_time, message AS event_message FROM records;
-- DISTINCT to remove duplicates
SELECT DISTINCT status FROM records;
-- DISTINCT ON to keep first row per group
SELECT DISTINCT ON (user_id) user_id, timestamp, message
FROM records
ORDER BY user_id, timestamp DESC;
FROM Clause
The FROM clause specifies the data source:
-- Simple table reference
FROM records
-- Table with alias
FROM records AS e
-- Subquery
FROM (SELECT * FROM raw_events WHERE type = 'error') AS filtered_events
WHERE Clause
The WHERE clause filters rows:
-- Basic condition
WHERE status = 'success'
-- Multiple conditions
WHERE status = 'error' AND severity > 3
-- JSON field conditions
WHERE data->'user'->>'role' = 'admin'
-- Pattern matching
WHERE message LIKE '%error%'
-- Range check
WHERE timestamp >= '2023-01-01' AND timestamp <= '2023-01-31'
-- NULL check
WHERE response_code IS NULL
WHERE response_code IS NOT NULL
JOIN Clause
JOIN clauses are not yet supported. To combine data from multiple sources, use comma-separated tables in the FROM clause with conditions in the WHERE clause, or use LATERAL subqueries.
GROUP BY Clause
GROUP BY groups rows with the same values:
-- Group by one column
GROUP BY status
-- Group by multiple columns
GROUP BY region, status
HAVING clauses are not yet supported. To filter grouped results, use a subquery with a WHERE clause on the outer query.
ORDER BY Clause
ORDER BY sorts the result set:
-- Ascending order (default)
ORDER BY timestamp
-- Descending order
ORDER BY timestamp DESC
-- Multiple sort columns
ORDER BY status ASC, timestamp DESC
LIMIT Clause
LIMIT restricts the number of rows returned:
-- Limit to 10 rows
LIMIT 10
Data Types
Fleak SQL supports the following data types:
| Type Name | Description |
|---|---|
| INT | Integer value |
| TEXT | String value |
| DATE | Date value |
| TIME | Time value |
| TIMESTAMP | Timestamp value |
| JSON | JSON object or array |
| NUMERIC | Decimal value |
| DOUBLE PRECISION | Double precision floating point |
| BOOLEAN, BOOL | Boolean value (true/false) |
Type Casting
Values can be cast to different types using the :: operator:
-- Cast to string
SELECT field::text FROM records
-- Cast to integer
SELECT field::int FROM records
SELECT field::integer FROM records
SELECT field::smallint FROM records
-- Cast to numeric types
SELECT field::numeric FROM records
SELECT field::bigint FROM records
-- Cast to boolean
SELECT field::boolean FROM records
SELECT field::bool FROM records
-- Cast to JSON
SELECT field::json FROM records
Operators
Arithmetic Operators
| Operator | Description | Example |
|---|---|---|
+ | Addition | SELECT amount + tax FROM records |
- | Subtraction | SELECT price - discount FROM records |
* | Multiplication | SELECT quantity * price FROM records |
/ | Division | SELECT total / count FROM records |
% | Modulo | SELECT id % 10 FROM records |
^ | Exponent | SELECT 2 ^ 3 FROM records |
Comparison Operators
| Operator | Description | Example |
|---|---|---|
= | Equal | WHERE status = 'success' |
!=, <> | Not equal | WHERE status != 'error' |
< | Less than | WHERE count < 10 |
> | Greater than | WHERE response_time > 100 |
<= | Less than or equal | WHERE count <= 20 |
>= | Greater than or equal | WHERE response_time >= 50 |
Logical Operators
| Operator | Description | Example |
|---|---|---|
AND | Logical AND | WHERE status = 'error' AND severity > 3 |
OR | Logical OR | WHERE status = 'success' OR status = 'pending' |
NOT | Logical NOT | WHERE NOT status = 'error' |
String Operators
| Operator | Description | Example |
|---|---|---|
| ` | ` | |
LIKE | Pattern matching (case-sensitive) | WHERE name LIKE 'A%' |
ILIKE | Pattern matching (case-insensitive) | WHERE name ILIKE 'a%' |
JSON Operators
| Operator | Description | Example |
|---|---|---|
-> | Get JSON object field or array element as JSON | data->'user'->'name' |
->> | Get JSON object field or array element as text | data->>'email' |
Working with JSON Data
Fleak SQL provides powerful operators for handling JSON data structures.
JSON Access Examples
-- Access object field as JSON
SELECT data->'user'->'name' FROM records;
-- Access object field as text
SELECT data->>'email' FROM records;
-- Access array element
SELECT items->0->'price' FROM records;
-- Access array element and convert to text
SELECT items->0->>'name' FROM records;
Special JSON Behavior
Unlike standard Postgres SQL, Fleak SQL provides more flexible type handling with JSON operators:
- Automatic string-to-JSON casting when possible
->is preferred for chained access to preserve JSON types- Use
->>only when you need the final result as a string
For example:
-- This works in Fleak SQL (auto-casting string to JSON)
SELECT modelOutputs->0->'result'->'AAPL' FROM records;
-- Avoid this pattern (may cause errors)
SELECT modelOutputs->>0->'result' FROM records;
Functions
JSON Functions
| Function | Description | Example |
|---|---|---|
json_build_object | Create a JSON object | json_build_object('name', username, 'age', age) |
json_array_length | Get JSON array length | json_array_length(items) |
json_array_elements | Expand JSON array into rows | SELECT * FROM records, json_array_elements(items) item |
json_agg | Aggregate values as JSON array | json_agg(value) |
String Functions
| Function | Description | Example |
|---|---|---|
concat | Concatenate strings | concat(first_name, ' ', last_name) |
concat_ws | Concatenate with separator | concat_ws(', ', first_name, last_name) |
upper | Convert to uppercase | upper(name) |
lower | Convert to lowercase | lower(email) |
length | String length | length(message) |
char_length | Character count | char_length(message) |
substring | Extract substring | substring(text, 1, 10) |
regexp_replace | Replace using regex | regexp_replace(text, 'pattern', 'replacement') |
regexp_split_to_array | Split string to array | regexp_split_to_array(text, ',') |
regexp_split_to_table | Split string to rows | SELECT * FROM records, regexp_split_to_table(tags, ',') |
regexp_matches | Find regex matches | regexp_matches(text, 'pattern') |
split_part | Get part after splitting | split_part(text, ',', 2) |
trim | Remove whitespace | trim(text) |
ltrim | Remove leading whitespace | ltrim(text) |
rtrim | Remove trailing whitespace | rtrim(text) |
replace | Replace substring | replace(text, 'old', 'new') |
left | Get leftmost characters | left(text, 5) |
right | Get rightmost characters | right(text, 5) |
position | Find substring position | position('needle', 'haystack') |
format | Format string | format('Hello, %s!', name) |
md5 | MD5 hash | md5(text) |
btrim | Trim specified characters | btrim(text, 'xyz') |
reverse | Reverse string | reverse(text) |
to_hex | Convert to hex | to_hex(255) |
Mathematical Functions
| Function | Description | Example |
|---|---|---|
abs | Absolute value | abs(value) |
round | Round number | round(value, 2) |
ceil | Round up to nearest integer | ceil(value) |
floor | Round down to nearest integer | floor(value) |
random | Random value between 0 and 1 | random() |
sqrt | Square root | sqrt(16) |
power | Raise to power | power(2, 3) |
mod | Modulo | mod(10, 3) |
trunc | Truncate decimal | trunc(value, 2) |
sign | Sign (-1, 0, 1) | sign(-15) |
gcd | Greatest common divisor | gcd(12, 8) |
lcm | Least common multiple | lcm(12, 8) |
exp | Exponential | exp(1) |
log | Natural logarithm | log(value) |
ln | Natural logarithm | ln(value) |
log10 | Base-10 logarithm | log10(100) |
log2 | Base-2 logarithm | log2(8) |
pi | PI constant | pi() |
degrees | Radians to degrees | degrees(pi()) |
radians | Degrees to radians | radians(180) |
sin | Sine | sin(radians(30)) |
cos | Cosine | cos(radians(60)) |
tan | Tangent | tan(radians(45)) |
Aggregate Functions
| Function | Description | Example |
|---|---|---|
count | Count rows | SELECT count(*) FROM records |
sum | Sum values | SELECT sum(amount) FROM records |
max | Maximum value | SELECT max(timestamp) FROM records |
min | Minimum value | SELECT min(timestamp) FROM records |
string_agg | Concatenate strings | SELECT string_agg(tag, ',') FROM records, json_array_elements(tags) tag |
Important Note: Aggregation functions in Fleak SQL only work with the current input record list. For HTTP requests with multiple records, aggregation works across all records in the request. For Kafka sources, each input record list typically contains only one message.
Set-returning Functions
These functions return multiple rows from a single input:
| Function | Description | Example |
|---|---|---|
json_array_elements | Expand JSON array into rows | SELECT * FROM records, json_array_elements(items) item |
regexp_split_to_table | Split string to rows | SELECT * FROM records, regexp_split_to_table(tags, ',') |
Log Parsing with grok
The grok function provides powerful log parsing capabilities:
SELECT grok('%{IPV4:client_ip} - %{WORD:method} %{URIPATHPARAM:request}', log_entry) AS extracted_fields
FROM records
This extracts structured data from unstructured log text:
"127.0.0.1 - GET /index.html" → { "client_ip": "127.0.0.1", "method": "GET", "request": "/index.html" }
CASE Expressions
CASE expressions provide conditional logic:
-- Simple CASE
SELECT
CASE status
WHEN 'success' THEN 'Completed'
WHEN 'error' THEN 'Failed'
ELSE 'Unknown'
END AS status_text
FROM records;
-- Searched CASE
SELECT
CASE
WHEN response_time < 100 THEN 'Fast'
WHEN response_time < 300 THEN 'Normal'
ELSE 'Slow'
END AS performance
FROM records;
Subqueries and Derived Tables
You can use subqueries to create derived tables:
-- Subquery in SELECT
SELECT id, (SELECT count(*) FROM json_array_elements(items)) AS item_count
FROM records;
-- Subquery in FROM
SELECT e.*
FROM (SELECT status, count(*) AS cnt FROM records GROUP BY status) e
WHERE e.cnt > 1;
-- Subquery in WHERE
SELECT * FROM records
WHERE status IN (SELECT status FROM event_categories WHERE severity > 3);
Best Practices
-
Use type casting to ensure proper data type handling:
SELECT timestamp::timestamp, amount::numeric FROM records -
Prefer
->for JSON access chains, using->>only for the final extraction:-- Good
SELECT data->'user'->'address'->>'city' FROM records
-- Avoid (may cause errors in parsing nested structures)
SELECT data->>'user'->>'address'->>'city' FROM records -
Use set-returning functions for processing arrays and structured data:
SELECT item.value->>'id' AS item_id
FROM records,
json_array_elements(items) item -
Remember aggregation scope - aggregations work on the current input record list only
Limitations and Special Behaviors
-
Case sensitivity: Table names and columns are case-insensitive.
-
Aggregation scope: Aggregation functions only work within the current input record list.
- For HTTP requests with multiple records, aggregation works across all records in the request
- For Kafka sources, each input record list typically contains only one message, limiting aggregation across messages
-
Implicit grouping: Simple queries with aggregate functions like
SELECT count(*) FROM recordsimply an implicit grouping, equivalent toSELECT count(*) FROM records GROUP BY 1. -
JSON handling: The type system is more flexible than standard Postgres SQL and tries to cast types to container elements when using JSON operators.
-
NULL handling: Operations on null values generally propagate null.