Fleak SQL Language Reference
Introduction
Fleak SQL is a SQL dialect that follows Postgres SQL standards while providing enhanced functionality for handling nested data structures, JSON operations, and log parsing. This reference guide covers the syntax, data types, operators, and functions available in Fleak SQL.
Basic Syntax
Fleak SQL supports standard SQL query syntax:
SELECT [DISTINCT [ON (expression_list)]] column1 [AS alias1], column2 [AS alias2], ...
FROM source_table [AS alias]
[JOIN another_table [ON condition]]
[WHERE conditions]
[GROUP BY column1, column2, ...]
[HAVING group_conditions]
[ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...]
[LIMIT count]
[;]
Always use events
as the source_table to represent the input data.
Query Components
SELECT Statement
The SELECT statement retrieves data from tables or expressions:
-- Basic selection
SELECT id, timestamp, status FROM events;
-- Select all columns
SELECT * FROM events;
-- Column aliasing
SELECT timestamp AS event_time, message AS event_message FROM events;
-- DISTINCT to remove duplicates
SELECT DISTINCT status FROM events;
-- DISTINCT ON to keep first row per group
SELECT DISTINCT ON (user_id) user_id, timestamp, message
FROM events
ORDER BY user_id, timestamp DESC;
FROM Clause
The FROM clause specifies the data source:
-- Simple table reference
FROM events
-- Table with alias
FROM events AS e
-- Subquery
FROM (SELECT * FROM raw_events WHERE type = 'error') AS filtered_events
WHERE Clause
The WHERE clause filters rows:
-- Basic condition
WHERE status = 'success'
-- Multiple conditions
WHERE status = 'error' AND severity > 3
-- JSON field conditions
WHERE data->'user'->>'role' = 'admin'
-- Pattern matching
WHERE message LIKE '%error%'
-- Range check
WHERE timestamp BETWEEN '2023-01-01' AND '2023-01-31'
-- NULL check
WHERE response_code IS NULL
WHERE response_code IS NOT NULL
JOIN Clause
Joins combine rows from multiple tables:
-- INNER JOIN
FROM table1 JOIN table2 ON table1.id = table2.table1_id
-- LEFT JOIN
FROM table1 LEFT JOIN table2 ON table1.id = table2.table1_id
-- RIGHT JOIN
FROM table1 RIGHT JOIN table2 ON table1.id = table2.table1_id
-- FULL JOIN
FROM table1 FULL JOIN table2 ON table1.id = table2.table1_id
-- CROSS JOIN
FROM table1 CROSS JOIN table2
GROUP BY and HAVING Clauses
GROUP BY groups rows with the same values:
-- Group by one column
GROUP BY status
-- Group by multiple columns
GROUP BY region, status
-- Filter groups with HAVING
GROUP BY status HAVING COUNT(*) > 10
ORDER BY Clause
ORDER BY sorts the result set:
-- Ascending order (default)
ORDER BY timestamp
-- Descending order
ORDER BY timestamp DESC
-- Multiple sort columns
ORDER BY status ASC, timestamp DESC
LIMIT Clause
LIMIT restricts the number of rows returned:
-- Limit to 10 rows
LIMIT 10
Data Types
Fleak SQL supports the following data types:
Type Name | Description |
---|---|
INT, INTEGER | Integer value |
TEXT, STRING, VARCHAR | String value |
DATE | Date value |
TIME | Time value |
TIMESTAMP | Timestamp value |
JSON | JSON object or array |
NUMERIC | Decimal value |
DOUBLE PRECISION | Double precision floating point |
BOOLEAN, BOOL | Boolean value (true/false) |
Type Casting
Values can be cast to different types using the ::
operator:
-- Cast to string
SELECT field::text FROM events
-- Cast to integer
SELECT field::int FROM events
SELECT field::integer FROM events
SELECT field::smallint FROM events
SELECT field::bigint FROM events
-- Cast to numeric types
SELECT field::numeric FROM events
-- Cast to boolean
SELECT field::boolean FROM events
SELECT field::bool FROM events
-- Cast to JSON
SELECT field::json FROM events
Operators
Arithmetic Operators
Operator | Description | Example |
---|---|---|
+ | Addition | SELECT amount + tax FROM events |
- | Subtraction | SELECT price - discount FROM events |
* | Multiplication | SELECT quantity * price FROM events |
/ | Division | SELECT total / count FROM events |
% | Modulo | SELECT id % 10 FROM events |
^ | Exponent | SELECT 2 ^ 3 FROM events |
Comparison Operators
Operator | Description | Example |
---|---|---|
= | Equal | WHERE status = 'success' |
!= , <> | Not equal | WHERE status != 'error' |
< | Less than | WHERE count < 10 |
> | Greater than | WHERE response_time > 100 |
<= | Less than or equal | WHERE count <= 20 |
>= | Greater than or equal | WHERE response_time >= 50 |
Logical Operators
Operator | Description | Example |
---|---|---|
AND | Logical AND | WHERE status = 'error' AND severity > 3 |
OR | Logical OR | WHERE status = 'success' OR status = 'pending' |
NOT | Logical NOT | WHERE NOT status = 'error' |
String Operators
Operator | Description | Example |
---|---|---|
` | ` | |
LIKE | Pattern matching (case-sensitive) | WHERE name LIKE 'A%' |
ILIKE | Pattern matching (case-insensitive) | WHERE name ILIKE 'a%' |
JSON Operators
Operator | Description | Example |
---|---|---|
-> | Get JSON object field or array element as JSON | data->'user'->'name' |
->> | Get JSON object field or array element as text | data->>'email' |
Working with JSON Data
Fleak SQL provides powerful operators for handling JSON data structures.
JSON Access Examples
-- Access object field as JSON
SELECT data->'user'->'name' FROM events;
-- Access object field as text
SELECT data->>'email' FROM events;
-- Access array element
SELECT items->0->'price' FROM events;
-- Access array element and convert to text
SELECT items->0->>'name' FROM events;
Special JSON Behavior
Unlike standard Postgres SQL, Fleak SQL provides more flexible type handling with JSON operators:
- Automatic string-to-JSON casting when possible
->
is preferred for chained access to preserve JSON types- Use
->>
only when you need the final result as a string
For example:
-- This works in Fleak SQL (auto-casting string to JSON)
SELECT modelOutputs->0->'result'->'AAPL' FROM events;
-- Avoid this pattern (may cause errors)
SELECT modelOutputs->>0->'result' FROM events;
Functions
JSON Functions
Function | Description | Example |
---|---|---|
json_build_object | Create a JSON object | json_build_object('name', username, 'age', age) |
json_array_length | Get JSON array length | json_array_length(items) |
json_array_elements | Expand JSON array into rows | SELECT * FROM events, json_array_elements(items) item |
json_each | Expand JSON object into key/value rows | SELECT * FROM events, json_each(data) kv |
json_each_text | Expand JSON object into key/value rows as text | SELECT * FROM events, json_each_text(data) kv |
json_object_keys | Get JSON object keys as rows | SELECT * FROM events, json_object_keys(data) k |
json_strip_nulls | Remove object fields with null values | json_strip_nulls(data) |
json_agg | Aggregate values as JSON array | json_agg(value) |
String Functions
Function | Description | Example |
---|---|---|
concat | Concatenate strings | concat(first_name, ' ', last_name) |
concat_ws | Concatenate with separator | concat_ws(', ', first_name, last_name) |
upper | Convert to uppercase | upper(name) |
lower | Convert to lowercase | lower(email) |
length | String length | length(message) |
char_length | Character count | char_length(message) |
substring | Extract substring | substring(text, 1, 10) |
regexp_replace | Replace using regex | regexp_replace(text, 'pattern', 'replacement') |
regexp_split_to_array | Split string to array | regexp_split_to_array(text, ',') |
regexp_split_to_table | Split string to rows | SELECT * FROM events, regexp_split_to_table(tags, ',') |
regexp_matches | Find regex matches | regexp_matches(text, 'pattern') |
split_part | Get part after splitting | split_part(text, ',', 2) |
trim | Remove whitespace | trim(text) |
ltrim | Remove leading whitespace | ltrim(text) |
rtrim | Remove trailing whitespace | rtrim(text) |
replace | Replace substring | replace(text, 'old', 'new') |
left | Get leftmost characters | left(text, 5) |
right | Get rightmost characters | right(text, 5) |
position | Find substring position | position('needle' IN 'haystack') |
format | Format string | format('Hello, %s!', name) |
md5 | MD5 hash | md5(text) |
btrim | Trim specified characters | btrim(text, 'xyz') |
reverse | Reverse string | reverse(text) |
to_hex | Convert to hex | to_hex(255) |
Mathematical Functions
Function | Description | Example |
---|---|---|
abs | Absolute value | abs(value) |
round | Round number | round(value, 2) |
ceil , ceiling | Round up to nearest integer | ceil(value) |
floor | Round down to nearest integer | floor(value) |
random | Random value between 0 and 1 | random() |
sqrt | Square root | sqrt(16) |
power | Raise to power | power(2, 3) |
mod | Modulo | mod(10, 3) |
trunc | Truncate decimal | trunc(value, 2) |
sign | Sign (-1, 0, 1) | sign(-15) |
gcd | Greatest common divisor | gcd(12, 8) |
lcm | Least common multiple | lcm(12, 8) |
exp | Exponential | exp(1) |
log | Natural logarithm | log(value) |
ln | Natural logarithm | ln(value) |
log10 | Base-10 logarithm | log10(100) |
log2 | Base-2 logarithm | log2(8) |
pi | PI constant | pi() |
degrees | Radians to degrees | degrees(pi()) |
radians | Degrees to radians | radians(180) |
sin | Sine | sin(radians(30)) |
cos | Cosine | cos(radians(60)) |
tan | Tangent | tan(radians(45)) |
Date and Time Functions
Function | Description | Example |
---|---|---|
now | Current timestamp | now() |
date_trunc | Truncate timestamp | date_trunc('hour', timestamp) |
to_timestamp | Convert to timestamp | to_timestamp(epoch_value) |
to_date | Convert to date | to_date(string, format) |
Aggregate Functions
Function | Description | Example |
---|---|---|
count | Count rows | SELECT count(*) FROM events |
sum | Sum values | SELECT sum(amount) FROM events |
max | Maximum value | SELECT max(timestamp) FROM events |
min | Minimum value | SELECT min(timestamp) FROM events |
avg | Average value | SELECT avg(score) FROM events |
string_agg | Concatenate strings | SELECT string_agg(tag, ',') FROM events, json_array_elements(tags) tag |
Important Note: Aggregation functions in Fleak SQL only work with the current input event list. For HTTP requests with multiple events, aggregation works across all events in the request. For Kafka sources, each input event list typically contains only one message.
Set-returning Functions
These functions return multiple rows from a single input:
Function | Description | Example |
---|---|---|
json_array_elements | Expand JSON array into rows | SELECT * FROM events, json_array_elements(items) item |
regexp_split_to_table | Split string to rows | SELECT * FROM events, regexp_split_to_table(tags, ',') |
json_each | Expand JSON object into key/value rows | SELECT * FROM events, json_each(data) kv |
Other Functions
Function | Description | Example |
---|---|---|
greatest | Largest value from list | greatest(a, b, c) |
least | Smallest value from list | least(a, b, c) |
Log Parsing with grok
The grok
function provides powerful log parsing capabilities:
SELECT grok('%{IPV4:client_ip} - %{WORD:method} %{URIPATHPARAM:request}', log_entry) AS extracted_fields
FROM events
This extracts structured data from unstructured log text:
"127.0.0.1 - GET /index.html" → { "client_ip": "127.0.0.1", "method": "GET", "request": "/index.html" }
CASE Expressions
CASE expressions provide conditional logic:
-- Simple CASE
SELECT
CASE status
WHEN 'success' THEN 'Completed'
WHEN 'error' THEN 'Failed'
ELSE 'Unknown'
END AS status_text
FROM events;
-- Searched CASE
SELECT
CASE
WHEN response_time < 100 THEN 'Fast'
WHEN response_time < 300 THEN 'Normal'
ELSE 'Slow'
END AS performance
FROM events;
Subqueries and Derived Tables
You can use subqueries to create derived tables:
-- Subquery in SELECT
SELECT id, (SELECT count(*) FROM json_array_elements(items)) AS item_count
FROM events;
-- Subquery in FROM
SELECT u.name, e.count
FROM users u
JOIN (SELECT user_id, count(*) AS count FROM events GROUP BY user_id) e
ON u.id = e.user_id;
-- Subquery in WHERE
SELECT * FROM events
WHERE status IN (SELECT status FROM event_categories WHERE severity > 3);
Best Practices
-
Use type casting to ensure proper data type handling:
SELECT timestamp::timestamp, amount::numeric FROM events
-
Prefer
->
for JSON access chains, using->>
only for the final extraction:-- Good
SELECT data->'user'->'address'->>'city' FROM events
-- Avoid (may cause errors in parsing nested structures)
SELECT data->>'user'->>'address'->>'city' FROM events -
Use set-returning functions for processing arrays and structured data:
SELECT item.value->>'id' AS item_id
FROM events,
json_array_elements(items) item -
Remember aggregation scope - aggregations work on the current input event list only
Limitations and Special Behaviors
-
Case sensitivity: Table names and columns are case-insensitive.
-
Aggregation scope: Aggregation functions only work within the current input event list.
- For HTTP requests with multiple events, aggregation works across all events in the request
- For Kafka sources, each input event list typically contains only one message, limiting aggregation across messages
-
Implicit grouping: Simple queries with aggregate functions like
SELECT count(*) FROM events
imply an implicit grouping, equivalent toSELECT count(*) FROM events GROUP BY 1
. -
JSON handling: The type system is more flexible than standard Postgres SQL and tries to cast types to container elements when using JSON operators.
-
NULL handling: Operations on null values generally propagate null.