Fleak SQL Language Reference

Introduction

Fleak SQL is a SQL dialect that follows Postgres SQL standards while providing enhanced functionality for handling nested data structures, JSON operations, and log parsing. This reference guide covers the syntax, data types, operators, and functions available in Fleak SQL.

Basic Syntax

Fleak SQL supports standard SQL query syntax:

SELECT [DISTINCT [ON (expression_list)]] column1 [AS alias1], column2 [AS alias2], ...
FROM source_table [AS alias] 
[JOIN another_table [ON condition]]
[WHERE conditions]
[GROUP BY column1, column2, ...]
[HAVING group_conditions]
[ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...]
[LIMIT count]
[;]

note

Always use events as the source_table to represent the input data.

Query Components

SELECT Statement

The SELECT statement retrieves data from tables or expressions:

-- Basic selection
SELECT id, timestamp, status FROM events;

-- Select all columns
SELECT * FROM events;

-- Column aliasing
SELECT timestamp AS event_time, message AS event_message FROM events;

-- DISTINCT to remove duplicates
SELECT DISTINCT status FROM events;

-- DISTINCT ON to keep first row per group
SELECT DISTINCT ON (user_id) user_id, timestamp, message 
FROM events 
ORDER BY user_id, timestamp DESC;

FROM Clause

The FROM clause specifies the data source:

-- Simple table reference
FROM events

-- Table with alias
FROM events AS e

-- Subquery
FROM (SELECT * FROM raw_events WHERE type = 'error') AS filtered_events

WHERE Clause

The WHERE clause filters rows:

-- Basic condition
WHERE status = 'success'

-- Multiple conditions
WHERE status = 'error' AND severity > 3

-- JSON field conditions
WHERE data->'user'->>'role' = 'admin'

-- Pattern matching
WHERE message LIKE '%error%'

-- Range check
WHERE timestamp BETWEEN '2023-01-01' AND '2023-01-31'

-- NULL check
WHERE response_code IS NULL
WHERE response_code IS NOT NULL

JOIN Clause

Joins combine rows from multiple tables:

-- INNER JOIN
FROM table1 JOIN table2 ON table1.id = table2.table1_id

-- LEFT JOIN
FROM table1 LEFT JOIN table2 ON table1.id = table2.table1_id

-- RIGHT JOIN
FROM table1 RIGHT JOIN table2 ON table1.id = table2.table1_id

-- FULL JOIN
FROM table1 FULL JOIN table2 ON table1.id = table2.table1_id

-- CROSS JOIN
FROM table1 CROSS JOIN table2

GROUP BY and HAVING Clauses

GROUP BY groups rows with the same values:

-- Group by one column
GROUP BY status

-- Group by multiple columns
GROUP BY region, status

-- Filter groups with HAVING
GROUP BY status HAVING COUNT(*) > 10

ORDER BY Clause

ORDER BY sorts the result set:

-- Ascending order (default)
ORDER BY timestamp

-- Descending order
ORDER BY timestamp DESC

-- Multiple sort columns
ORDER BY status ASC, timestamp DESC

LIMIT Clause

LIMIT restricts the number of rows returned:

-- Limit to 10 rows
LIMIT 10

Data Types

Fleak SQL supports the following data types:

Type Name	Description
INT, INTEGER	Integer value
TEXT, STRING, VARCHAR	String value
DATE	Date value
TIME	Time value
TIMESTAMP	Timestamp value
JSON	JSON object or array
NUMERIC	Decimal value
DOUBLE PRECISION	Double precision floating point
BOOLEAN, BOOL	Boolean value (true/false)

Type Casting

Values can be cast to different types using the :: operator:

-- Cast to string
SELECT field::text FROM events          

-- Cast to integer
SELECT field::int FROM events           
SELECT field::integer FROM events       
SELECT field::smallint FROM events       
SELECT field::bigint FROM events        

-- Cast to numeric types
SELECT field::numeric FROM events        

-- Cast to boolean
SELECT field::boolean FROM events       
SELECT field::bool FROM events          

-- Cast to JSON
SELECT field::json FROM events          

Operators

Arithmetic Operators

Operator	Description	Example
`+`	Addition	`SELECT amount + tax FROM events`
`-`	Subtraction	`SELECT price - discount FROM events`
`*`	Multiplication	`SELECT quantity * price FROM events`
`/`	Division	`SELECT total / count FROM events`
`%`	Modulo	`SELECT id % 10 FROM events`
`^`	Exponent	`SELECT 2 ^ 3 FROM events`

Comparison Operators

Operator	Description	Example
`=`	Equal	`WHERE status = 'success'`
`!=`, `<>`	Not equal	`WHERE status != 'error'`
`<`	Less than	`WHERE count < 10`
`>`	Greater than	`WHERE response_time > 100`
`<=`	Less than or equal	`WHERE count <= 20`
`>=`	Greater than or equal	`WHERE response_time >= 50`

Logical Operators

Operator	Description	Example
`AND`	Logical AND	`WHERE status = 'error' AND severity > 3`
`OR`	Logical OR	`WHERE status = 'success' OR status = 'pending'`
`NOT`	Logical NOT	`WHERE NOT status = 'error'`

String Operators

Operator	Description	Example
`		`
`LIKE`	Pattern matching (case-sensitive)	`WHERE name LIKE 'A%'`
`ILIKE`	Pattern matching (case-insensitive)	`WHERE name ILIKE 'a%'`

JSON Operators

Operator	Description	Example
`->`	Get JSON object field or array element as JSON	`data->'user'->'name'`
`->>`	Get JSON object field or array element as text	`data->>'email'`

Working with JSON Data

Fleak SQL provides powerful operators for handling JSON data structures.

JSON Access Examples

-- Access object field as JSON
SELECT data->'user'->'name' FROM events;

-- Access object field as text
SELECT data->>'email' FROM events;

-- Access array element
SELECT items->0->'price' FROM events;

-- Access array element and convert to text
SELECT items->0->>'name' FROM events;

Special JSON Behavior

Unlike standard Postgres SQL, Fleak SQL provides more flexible type handling with JSON operators:

Automatic string-to-JSON casting when possible
-> is preferred for chained access to preserve JSON types
Use ->> only when you need the final result as a string

For example:

-- This works in Fleak SQL (auto-casting string to JSON)
SELECT modelOutputs->0->'result'->'AAPL' FROM events;

-- Avoid this pattern (may cause errors)
SELECT modelOutputs->>0->'result' FROM events;

Functions

JSON Functions

Function	Description	Example
`json_build_object`	Create a JSON object	`json_build_object('name', username, 'age', age)`
`json_array_length`	Get JSON array length	`json_array_length(items)`
`json_array_elements`	Expand JSON array into rows	`SELECT * FROM events, json_array_elements(items) item`
`json_each`	Expand JSON object into key/value rows	`SELECT * FROM events, json_each(data) kv`
`json_each_text`	Expand JSON object into key/value rows as text	`SELECT * FROM events, json_each_text(data) kv`
`json_object_keys`	Get JSON object keys as rows	`SELECT * FROM events, json_object_keys(data) k`
`json_strip_nulls`	Remove object fields with null values	`json_strip_nulls(data)`
`json_agg`	Aggregate values as JSON array	`json_agg(value)`

String Functions

Function	Description	Example
`concat`	Concatenate strings	`concat(first_name, ' ', last_name)`
`concat_ws`	Concatenate with separator	`concat_ws(', ', first_name, last_name)`
`upper`	Convert to uppercase	`upper(name)`
`lower`	Convert to lowercase	`lower(email)`
`length`	String length	`length(message)`
`char_length`	Character count	`char_length(message)`
`substring`	Extract substring	`substring(text, 1, 10)`
`regexp_replace`	Replace using regex	`regexp_replace(text, 'pattern', 'replacement')`
`regexp_split_to_array`	Split string to array	`regexp_split_to_array(text, ',')`
`regexp_split_to_table`	Split string to rows	`SELECT * FROM events, regexp_split_to_table(tags, ',')`
`regexp_matches`	Find regex matches	`regexp_matches(text, 'pattern')`
`split_part`	Get part after splitting	`split_part(text, ',', 2)`
`trim`	Remove whitespace	`trim(text)`
`ltrim`	Remove leading whitespace	`ltrim(text)`
`rtrim`	Remove trailing whitespace	`rtrim(text)`
`replace`	Replace substring	`replace(text, 'old', 'new')`
`left`	Get leftmost characters	`left(text, 5)`
`right`	Get rightmost characters	`right(text, 5)`
`position`	Find substring position	`position('needle' IN 'haystack')`
`format`	Format string	`format('Hello, %s!', name)`
`md5`	MD5 hash	`md5(text)`
`btrim`	Trim specified characters	`btrim(text, 'xyz')`
`reverse`	Reverse string	`reverse(text)`
`to_hex`	Convert to hex	`to_hex(255)`

Mathematical Functions

Function	Description	Example
`abs`	Absolute value	`abs(value)`
`round`	Round number	`round(value, 2)`
`ceil`, `ceiling`	Round up to nearest integer	`ceil(value)`
`floor`	Round down to nearest integer	`floor(value)`
`random`	Random value between 0 and 1	`random()`
`sqrt`	Square root	`sqrt(16)`
`power`	Raise to power	`power(2, 3)`
`mod`	Modulo	`mod(10, 3)`
`trunc`	Truncate decimal	`trunc(value, 2)`
`sign`	Sign (-1, 0, 1)	`sign(-15)`
`gcd`	Greatest common divisor	`gcd(12, 8)`
`lcm`	Least common multiple	`lcm(12, 8)`
`exp`	Exponential	`exp(1)`
`log`	Natural logarithm	`log(value)`
`ln`	Natural logarithm	`ln(value)`
`log10`	Base-10 logarithm	`log10(100)`
`log2`	Base-2 logarithm	`log2(8)`
`pi`	PI constant	`pi()`
`degrees`	Radians to degrees	`degrees(pi())`
`radians`	Degrees to radians	`radians(180)`
`sin`	Sine	`sin(radians(30))`
`cos`	Cosine	`cos(radians(60))`
`tan`	Tangent	`tan(radians(45))`

Date and Time Functions

Function	Description	Example
`now`	Current timestamp	`now()`
`date_trunc`	Truncate timestamp	`date_trunc('hour', timestamp)`
`to_timestamp`	Convert to timestamp	`to_timestamp(epoch_value)`
`to_date`	Convert to date	`to_date(string, format)`

Aggregate Functions

Function	Description	Example
`count`	Count rows	`SELECT count(*) FROM events`
`sum`	Sum values	`SELECT sum(amount) FROM events`
`max`	Maximum value	`SELECT max(timestamp) FROM events`
`min`	Minimum value	`SELECT min(timestamp) FROM events`
`avg`	Average value	`SELECT avg(score) FROM events`
`string_agg`	Concatenate strings	`SELECT string_agg(tag, ',') FROM events, json_array_elements(tags) tag`

Important Note: Aggregation functions in Fleak SQL only work with the current input event list. For HTTP requests with multiple events, aggregation works across all events in the request. For Kafka sources, each input event list typically contains only one message.

Set-returning Functions

These functions return multiple rows from a single input:

Function	Description	Example
`json_array_elements`	Expand JSON array into rows	`SELECT * FROM events, json_array_elements(items) item`
`regexp_split_to_table`	Split string to rows	`SELECT * FROM events, regexp_split_to_table(tags, ',')`
`json_each`	Expand JSON object into key/value rows	`SELECT * FROM events, json_each(data) kv`

Other Functions

Function	Description	Example
`greatest`	Largest value from list	`greatest(a, b, c)`
`least`	Smallest value from list	`least(a, b, c)`

Log Parsing with `grok`

The grok function provides powerful log parsing capabilities:

SELECT grok('%{IPV4:client_ip} - %{WORD:method} %{URIPATHPARAM:request}', log_entry) AS extracted_fields
FROM events

This extracts structured data from unstructured log text:

"127.0.0.1 - GET /index.html" → { "client_ip": "127.0.0.1", "method": "GET", "request": "/index.html" }

CASE Expressions

CASE expressions provide conditional logic:

-- Simple CASE
SELECT 
  CASE status
    WHEN 'success' THEN 'Completed'
    WHEN 'error' THEN 'Failed'
    ELSE 'Unknown'
  END AS status_text
FROM events;

-- Searched CASE
SELECT 
  CASE 
    WHEN response_time < 100 THEN 'Fast'
    WHEN response_time < 300 THEN 'Normal'
    ELSE 'Slow'
  END AS performance
FROM events;

Subqueries and Derived Tables

You can use subqueries to create derived tables:

-- Subquery in SELECT
SELECT id, (SELECT count(*) FROM json_array_elements(items)) AS item_count
FROM events;

-- Subquery in FROM
SELECT u.name, e.count
FROM users u
JOIN (SELECT user_id, count(*) AS count FROM events GROUP BY user_id) e
ON u.id = e.user_id;

-- Subquery in WHERE
SELECT * FROM events
WHERE status IN (SELECT status FROM event_categories WHERE severity > 3);

Best Practices

Use type casting to ensure proper data type handling:

SELECT timestamp::timestamp, amount::numeric FROM events

Prefer -> for JSON access chains, using ->> only for the final extraction:

-- Good
SELECT data->'user'->'address'->>'city' FROM events

-- Avoid (may cause errors in parsing nested structures)
SELECT data->>'user'->>'address'->>'city' FROM events

Use set-returning functions for processing arrays and structured data:

SELECT item.value->>'id' AS item_id
FROM events,
json_array_elements(items) item

Remember aggregation scope - aggregations work on the current input event list only

Limitations and Special Behaviors

Case sensitivity: Table names and columns are case-insensitive.
Aggregation scope: Aggregation functions only work within the current input event list.

For HTTP requests with multiple events, aggregation works across all events in the request
For Kafka sources, each input event list typically contains only one message, limiting aggregation across messages

Implicit grouping: Simple queries with aggregate functions like SELECT count(*) FROM events imply an implicit grouping, equivalent to SELECT count(*) FROM events GROUP BY 1.
JSON handling: The type system is more flexible than standard Postgres SQL and tries to cast types to container elements when using JSON operators.
NULL handling: Operations on null values generally propagate null.

Introduction​

Basic Syntax​

Query Components​

SELECT Statement​

FROM Clause​

WHERE Clause​

JOIN Clause​

GROUP BY and HAVING Clauses​

ORDER BY Clause​

LIMIT Clause​

Data Types​

Type Casting​

Operators​

Arithmetic Operators​

Comparison Operators​

Logical Operators​

String Operators​

JSON Operators​

Working with JSON Data​

JSON Access Examples​

Special JSON Behavior​

Functions​

JSON Functions​

String Functions​

Mathematical Functions​

Date and Time Functions​

Aggregate Functions​

Set-returning Functions​

Other Functions​

Log Parsing with grok​

CASE Expressions​

Subqueries and Derived Tables​

Best Practices​

Limitations and Special Behaviors​