Parser Node

The parser node is a versatile component in ZephFlow that extracts structured data from string fields in your events. It can parse various log formats into structured key-value pairs, allowing you to transform raw log data into structured events for easier analysis and processing.

Key Features

  • Multi-format Parsing: Support for various log formats including Grok patterns, Syslog, CEF, Windows multiline logs, delimited text (CSV/TSV), JSON, and key-value pairs
  • Flexible Field Selection: Parse specific fields within your events
  • Field Management: Option to remove original raw fields after parsing

Config Object

The full config object for the parser node (ParserConfigs.ParserConfig):

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| targetField | String | Yes | — | Field to parse. Must be a string field; otherwise processing fails |
| removeTargetField | boolean | No | false | When true, the original field is removed after parsing |
| extractionConfig | ExtractionConfig | Yes | — | Defines the parsing method and format (see Extraction Configurations below) |

The extractionConfig object uses a type field as a discriminator to select the parsing method. The available type values are:

| type Value | Extraction Method | Description |
| --- | --- | --- |
| grok | Grok Extraction | Pattern-matching with named regular expressions |
| windows_multiline | Windows Multiline Extraction | Multi-line Windows event logs |
| syslog | Syslog Extraction | RFC 3164/5424 syslog format |
| cef | CEF Extraction | Common Event Format (ArcSight) |
| json | JSON Extraction | Embedded JSON string parsing |
| delimited_text | Delimited Text Extraction | CSV/TSV/custom delimiter |
| kv_pair | Key-Value Pair Extraction | Key=value formatted data |

Example YAML config (within a DAG node definition):

- id: "parse_logs"
  commandName: "parser"
  config:
    targetField: "__raw__"
    removeTargetField: true
    extractionConfig:
      type: "grok"
      grokExpression: "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"
  outputs:
    - "next_node"

Equivalent JSON config:

{
  "targetField": "__raw__",
  "removeTargetField": true,
  "extractionConfig": {
    "type": "grok",
    "grokExpression": "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"
  }
}

Extraction Configurations

ZephFlow provides multiple extraction configurations to handle different log formats efficiently. Each extraction method is optimized for specific log structures and conventions.

Grok Extraction

Grok is a powerful pattern-matching syntax that combines named regular expressions to parse unstructured text into structured data.

Configuration

| Parameter | Description |
| --- | --- |
| grokExpression | A pattern string that defines how to extract fields from the text |

YAML Config

extractionConfig:
  type: "grok"
  grokExpression: "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"

Grok Pattern Syntax

Grok patterns use the format %{SYNTAX:SEMANTIC} where:

  • SYNTAX is the pattern name (like IP, TIMESTAMP, NUMBER)
  • SEMANTIC is the field name you want to assign the matched value to
Common Grok Patterns

| Pattern | Description |
| --- | --- |
| %{NUMBER} | Matches decimal numbers |
| %{IP} | Matches IPv4 addresses |
| %{TIMESTAMP_ISO8601} | Matches ISO8601 timestamps |
| %{LOGLEVEL} | Matches log levels (INFO, ERROR, etc.) |
| %{GREEDYDATA} | Matches everything remaining |
| %{WORD} | Matches word characters (a-z, A-Z, 0-9, _) |
| %{NOTSPACE} | Matches everything until a space |

Example

Grok pattern:

%{IPORHOST:client_ip} %{NOTSPACE:ident} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response_code} %{NUMBER:bytes}

Input:

{
  "__raw__": "192.168.1.1 - - [10/Oct/2023:13:55:36 -0700] \"GET /index.html HTTP/1.1\" 200 2326"
}

Output:

{
  "client_ip": "192.168.1.1",
  "ident": "-",
  "auth": "-",
  "timestamp": "10/Oct/2023:13:55:36 -0700",
  "method": "GET",
  "request": "/index.html",
  "httpversion": "1.1",
  "response_code": "200",
  "bytes": "2326"
}
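
The expansion step can be sketched outside ZephFlow. The snippet below uses a simplified, hypothetical subset of the grok pattern library (the real built-in definitions are stricter regexes) purely to show how `%{SYNTAX:SEMANTIC}` compiles into named capture groups:

```python
import re

# Simplified stand-ins for grok's built-in patterns (assumption: the real
# library uses much stricter regexes for IPORHOST, HTTPDATE, etc.)
PATTERNS = {
    "IPORHOST": r"\S+",
    "NOTSPACE": r"\S+",
    "HTTPDATE": r"[^\]]+",
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
}

def grok_to_regex(expr):
    # Replace each %{SYNTAX:SEMANTIC} with a named capture group
    return re.sub(
        r"%\{(\w+):(\w+)\}",
        lambda m: f"(?P<{m.group(2)}>{PATTERNS[m.group(1)]})",
        expr,
    )

grok = (r'%{IPORHOST:client_ip} %{NOTSPACE:ident} %{NOTSPACE:auth} '
        r'\[%{HTTPDATE:timestamp}\] "%{WORD:method} %{NOTSPACE:request} '
        r'HTTP/%{NUMBER:httpversion}" %{NUMBER:response_code} %{NUMBER:bytes}')
line = '192.168.1.1 - - [10/Oct/2023:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326'
fields = re.match(grok_to_regex(grok), line).groupdict()
```

Running this against the Apache access line above recovers the same field map shown in the example output.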

Windows Multiline Extraction

This extraction method is specifically designed to parse Windows event logs that span multiple lines, which is common in Windows applications and services logs.

Configuration

| Parameter | Description |
| --- | --- |
| timestampLocationType | Specifies where to find the timestamp in the log entry |
| config | Additional configuration parameters as key-value pairs |

YAML Config

extractionConfig:
  type: "windows_multiline"
  timestampLocationType: "FIRST_LINE"

# Using FROM_FIELD to extract the timestamp from a specific field
extractionConfig:
  type: "windows_multiline"
  timestampLocationType: "FROM_FIELD"
  config:
    target_field: "event_time"

Timestamp Location Types

| Type | Description |
| --- | --- |
| NO_TIMESTAMP | Log entries don't contain timestamps |
| FIRST_LINE | Timestamp appears in the first line of each log entry |
| FROM_FIELD | Timestamp is found in a specific field (requires setting target_field in config) |

Syslog Extraction

The Syslog extraction configuration parses standard syslog formatted messages, which are widely used in system and network device logging.

Configuration

| Parameter | Description |
| --- | --- |
| timestampPattern | Java date format pattern for parsing the timestamp component. Required when the TIMESTAMP component is present |
| componentList | Ordered list of syslog components present in the log |
| messageBodyDelimiter | Character that separates the header from the message body (optional) |

YAML Config

extractionConfig:
  type: "syslog"
  timestampPattern: "MMM d HH:mm:ss"
  componentList:
    - "TIMESTAMP"
    - "DEVICE"
    - "APP"
    - "PROC_ID"
  messageBodyDelimiter: ":"

Syslog Components

| Component | Description | Parsed Field Name | Example |
| --- | --- | --- | --- |
| PRIORITY | Log priority enclosed in angle brackets | priority | <13> |
| VERSION | Syslog protocol version | version | 1 |
| TIMESTAMP | Timestamp of the log event. If present, timestampPattern is required | timestamp | Oct 11 22:14:15 |
| DEVICE | Host name or IP address | deviceId | server1 |
| APP | Application or process name | appName | sshd |
| PROC_ID | Process ID | procId | 12345 |
| MSG_ID | Message identifier | msgId | ID47 |
| STRUCTURED_DATA | Structured data in the format [id@domain key="value"] | structuredData | [exampleSDID@32473 iut="3" eventSource="App"] |
| (remaining log) | Any remaining content after the syslog header | content | Failed password for invalid user admin from 192.168.1.10 port 55279 ssh2 |
Example

Input:

{
  "__raw__": "Oct 11 22:14:15 server1 sshd 12345: Failed password for invalid user admin from 192.168.1.10 port 55279 ssh2"
}

Output (with components: TIMESTAMP, DEVICE, APP, PROC_ID and delimiter ':'):

{
  "timestamp": "Oct 11 22:14:15",
  "deviceId": "server1",
  "appName": "sshd",
  "procId": "12345",
  "content": "Failed password for invalid user admin from 192.168.1.10 port 55279 ssh2"
}
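
The header-splitting logic can be sketched as follows. This is an assumption-laden illustration, hardcoded for the TIMESTAMP, DEVICE, APP, PROC_ID component list used above, not ZephFlow's implementation:

```python
# Sketch: split the BSD syslog header into the configured components, then
# treat everything after the message-body delimiter as content.
def parse_bsd_syslog(raw, body_delimiter=":"):
    tokens = raw.split(" ")
    fields = {"timestamp": " ".join(tokens[:3])}  # "MMM d HH:mm:ss" spans three tokens
    fields["deviceId"], fields["appName"], proc_id = tokens[3:6]
    # The delimiter terminates the header, so it may be attached to the last component
    fields["procId"] = proc_id.rstrip(body_delimiter)
    fields["content"] = " ".join(tokens[6:])
    return fields

event = parse_bsd_syslog(
    "Oct 11 22:14:15 server1 sshd 12345: "
    "Failed password for invalid user admin from 192.168.1.10 port 55279 ssh2"
)
```

With the sample input above, `event` matches the example output field-for-field.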

CEF Extraction

Common Event Format (CEF) is a logging and auditing file format developed by ArcSight, widely used in security information and event management (SIEM) systems.

The CEF extraction config doesn't require any additional parameters.

YAML Config

extractionConfig:
  type: "cef"

CEF Format Structure

CEF logs follow this structure:

CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension

The header part (before the Extension) contains pipe-delimited fields, while the Extension part contains key-value pairs.

Example

When events are ingested as plain strings (e.g., via STRING_LINE encoding), the string content is automatically placed into a field called __raw__. This is why the config targets __raw__ and why the output retains it (since removeTargetField is not set).

Input:

{
  "__raw__": "CEF:0|Vendor|Product|1.0|100|Intrusion Detected|10|src=192.168.1.1 dst=10.0.0.1 spt=1234 dpt=80 act=blocked"
}

Output:

{
  "severity": 10,
  "dst": "10.0.0.1",
  "src": "192.168.1.1",
  "deviceVendor": "Vendor",
  "__raw__": "CEF:0|Vendor|Product|1.0|100|Intrusion Detected|10|src=192.168.1.1 dst=10.0.0.1 spt=1234 dpt=80 act=blocked",
  "dpt": "80",
  "deviceVersion": "1.0",
  "version": 0,
  "deviceEventClassId": "100",
  "act": "blocked",
  "spt": "1234",
  "name": "Intrusion Detected",
  "deviceProduct": "Product"
}
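
The header/extension split described above can be sketched in a few lines. This is only an illustration: it assumes no escaped pipes in the header and no spaces inside extension values, both of which full CEF permits via escaping:

```python
# Sketch of CEF parsing: seven pipe-delimited header fields after the "CEF:"
# prefix, followed by a key=value extension (not ZephFlow's implementation).
def parse_cef(raw):
    parts = raw.removeprefix("CEF:").split("|", 7)
    header_keys = ["version", "deviceVendor", "deviceProduct", "deviceVersion",
                   "deviceEventClassId", "name", "severity"]
    fields = dict(zip(header_keys, parts))
    fields["version"] = int(fields["version"])
    fields["severity"] = int(fields["severity"])
    for pair in parts[7].split(" "):
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields

cef_fields = parse_cef(
    "CEF:0|Vendor|Product|1.0|100|Intrusion Detected|10|"
    "src=192.168.1.1 dst=10.0.0.1 spt=1234 dpt=80 act=blocked"
)
```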

Delimited Text Extraction

The Delimited Text extraction configuration parses structured text files where values are separated by a specific delimiter, such as CSV (comma-separated values), TSV (tab-separated values), or custom delimiters.

Configuration

| Parameter | Description |
| --- | --- |
| delimiter | The character(s) used to separate values (e.g., ",", "\t", "\|") |
| columns | Ordered list of column names corresponding to the delimited values |

YAML Config

extractionConfig:
  type: "delimited_text"
  delimiter: ","
  columns:
    - "timestamp"
    - "user_id"
    - "action"
    - "resource"
    - "status"
Features
  • Correctly handles quoted values containing delimiters
  • Supports escaped quotes within values
Example

Input:

{
  "__raw__": "2023-10-15T14:32:01Z,12345,LOGIN,/dashboard,SUCCESS"
}

Output (with columns: timestamp, user_id, action, resource, status):

{
  "timestamp": "2023-10-15T14:32:01Z",
  "user_id": "12345",
  "action": "LOGIN",
  "resource": "/dashboard",
  "status": "SUCCESS"
}
Handling Quoted Values

The parser correctly handles quoted values that contain the delimiter:

Input:

{
  "__raw__": "101,\"Smith, John\",john@example.com,ACTIVE"
}

Output (with columns: user_id, name, email, status):

{
  "user_id": "101",
  "name": "Smith, John",
  "email": "john@example.com",
  "status": "ACTIVE"
}
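
The quoting behavior is the same as standard CSV quoting, so a minimal sketch of the column mapping can lean on Python's csv module (this is an illustration of the semantics, not ZephFlow's code):

```python
import csv
import io

# Sketch: parse one delimited line into named columns. csv.reader handles
# quoted values containing the delimiter and escaped quotes.
def parse_delimited(raw, columns, delimiter=","):
    values = next(csv.reader(io.StringIO(raw), delimiter=delimiter))
    return dict(zip(columns, values))

row = parse_delimited('101,"Smith, John",john@example.com,ACTIVE',
                      ["user_id", "name", "email", "status"])
```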

JSON Extraction

The JSON extraction configuration parses a JSON string and stores the resulting structured object in a specified field.

Configuration

| Parameter | Description |
| --- | --- |
| outputFieldName | The field name to store the parsed JSON structure |

YAML Config

extractionConfig:
  type: "json"
  outputFieldName: "event_data"
Example

Input:

{
  "json_string": "{\"user\":\"alice\",\"action\":\"login\",\"ip\":\"192.168.1.100\"}"
}

Output (with outputFieldName: "event_data", removeTargetField: true):

{
  "event_data": {
    "user": "alice",
    "action": "login",
    "ip": "192.168.1.100"
  }
}
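
The semantics reduce to: parse the target field as JSON, store the result under outputFieldName, and optionally drop the original field. A sketch of that contract (not ZephFlow's code):

```python
import json

# Sketch of JSON extraction: json_string -> event_data, then remove the raw field.
def extract_json(event, target_field, output_field, remove_target=True):
    result = dict(event)  # leave the input event untouched
    result[output_field] = json.loads(result[target_field])
    if remove_target:
        del result[target_field]
    return result

parsed = extract_json(
    {"json_string": '{"user":"alice","action":"login","ip":"192.168.1.100"}'},
    target_field="json_string",
    output_field="event_data",
)
```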

Key-Value Pair Extraction

The Key-Value Pair extraction configuration parses strings containing key-value pairs with configurable separators.

Configuration

| Parameter | Description |
| --- | --- |
| pairSeparator | Character that separates key-value pairs (e.g., ',') |
| kvSeparator | Character that separates keys from values (e.g., '=') |

YAML Config

extractionConfig:
  type: "kv_pair"
  pairSeparator: " "
  kvSeparator: "="

Features
  • Handles quoted values containing separators
  • Supports escaped quotes within values
  • Flexible configuration for different key-value formats
  • Escape sequence support: The pairSeparator and kvSeparator fields support escape sequences including \t (tab), \n (newline), \r (carriage return), and \\ (literal backslash)
  • Duplicate key aggregation: When the same key appears multiple times, values are aggregated into an array. The first occurrence is stored as a string; subsequent occurrences cause the value to become an array of strings.
Duplicate Key Behavior

When a key appears more than once, the parser automatically aggregates the values:

Input:

{
  "metadata": "key1=v1,key1=v2,key2=v3"
}

With pairSeparator="," and kvSeparator="=":

Output:

{
  "key1": ["v1", "v2"],
  "key2": "v3"
}
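
The aggregation rule (first occurrence stays a string, later occurrences promote the value to a list) can be sketched like this; it is an illustration of the documented behavior, not ZephFlow's implementation, and it ignores quoting:

```python
# Sketch of key=value parsing with duplicate-key aggregation.
def parse_kv(raw, pair_sep=",", kv_sep="="):
    fields = {}
    for pair in raw.split(pair_sep):
        key, _, value = pair.partition(kv_sep)
        if key in fields:
            prev = fields[key]
            # Promote to a list on the second occurrence, append thereafter
            fields[key] = prev + [value] if isinstance(prev, list) else [prev, value]
        else:
            fields[key] = value
    return fields

kv = parse_kv("key1=v1,key1=v2,key2=v3")
```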
Example

Input:

{
  "metadata": "user=john status=active role=admin last_login=2023-10-15"
}

Output (with pairSeparator: " ", kvSeparator: "=", removeTargetField: true):

{
  "user": "john",
  "status": "active",
  "role": "admin",
  "last_login": "2023-10-15"
}
Handling Quoted Values

The parser correctly handles quoted values containing separators:

Input:

{
  "metadata": "name=\"Smith, John\" email=john@example.com dept=\"Sales, North\""
}

Output:

{
  "name": "Smith, John",
  "email": "john@example.com",
  "dept": "Sales, North"
}
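
For the space-separated case shown above, Python's shlex tokenizer mirrors the quoted-value behavior: it splits on whitespace while honoring double quotes. This sketch only covers pairSeparator=" " and does not generalize to other pair separators:

```python
import shlex

# Sketch: space-separated key=value pairs with double-quoted values.
def parse_quoted_kv(raw, kv_sep="="):
    fields = {}
    for pair in shlex.split(raw):  # quotes are honored, then stripped
        key, _, value = pair.partition(kv_sep)
        fields[key] = value
    return fields

kv = parse_quoted_kv('name="Smith, John" email=john@example.com dept="Sales, North"')
```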

Multi-Stage Parsing with DAG

For complex log formats that require multiple parsing stages, you can chain multiple parser nodes together in a DAG (Directed Acyclic Graph). Each parser node processes the output of the previous node, enabling sophisticated processing pipelines for nested or multi-layered log formats.

Key Benefits

  • Modularity: Each parsing stage is a separate, testable node
  • Flexibility: Easy to add, remove, or modify parsing stages
  • Conditional Processing: Use filter nodes to branch based on extracted fields
  • Reusability: Common parsing stages can be shared across different pipelines

Complete Example

For a comprehensive example of multi-stage parsing, see the Cisco ASA Log Processing Tutorial, which demonstrates:

  • Parsing syslog headers
  • Extracting application-specific metadata
  • Branching based on message types
  • Message-specific field extraction
  • Transforming to standardized formats (OCSF)

Best Practices

Selecting the Right Extraction Configuration

  1. Analyze your log format first to determine which extraction configuration best matches your needs:
  • Use Grok for most text-based logs with consistent formats
  • Use Syslog for standard system and network device logs
  • Use Windows Multiline for Windows Event logs
  • Use CEF for security and SIEM-related logs
  • Use Delimited Text for CSV, TSV, or other delimiter-separated logs
  • Use JSON for parsing embedded JSON strings
  • Use Key-Value Pair for logs with key=value formatted data
  2. Test with sample data to verify your configuration handles all variations in your log format

  3. Create targeted parsers rather than trying to parse everything with one complex configuration

  4. Use DAG composition for multi-stage parsing - chain parser nodes together rather than creating overly complex single-stage parsers

Common Pitfalls to Avoid

  1. Overly complex Grok patterns can be difficult to maintain - break them down into smaller, reusable patterns

  2. Missing components in Syslog configuration - ensure your component list matches the exact format of your logs

  3. Incorrect timestamp patterns - test thoroughly with various date formats that appear in your logs

  4. Performance considerations - very complex parsing on high-volume logs can impact performance; consider pre-filtering or using simpler patterns where possible

Java SDK Usage

Basic Usage

ZephFlow flow = ZephFlow.startFlow();
flow.parse(parserConfig);

Grok Parser

ParserConfigs.ParserConfig parserConfig = ParserConfigs.ParserConfig.builder()
    .targetField("message")
    .removeTargetField(true)
    .extractionConfig(new GrokExtractionConfig("%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:content}"))
    .build();

ZephFlow flow = ZephFlow.startFlow();
flow.parse(parserConfig);

Apache Access Log Parser

ParserConfigs.ParserConfig apacheConfig = ParserConfigs.ParserConfig.builder()
    .targetField("__raw__")
    .extractionConfig(new GrokExtractionConfig(
        "%{IPORHOST:client_ip} %{NOTSPACE:ident} %{NOTSPACE:auth} \\[%{HTTPDATE:timestamp}\\] \"%{WORD:method} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response_code} %{NUMBER:bytes}"
    ))
    .build();

ASA Log Grok Parser

GrokExtractionConfig grokConfig = GrokExtractionConfig.builder()
    .grokExpression("%ASA-%{INT:level}-%{INT:message_number}: %{GREEDYDATA:message_text}")
    .build();

Windows Multiline Parser

WindowsMultilineExtractionConfig windowsConfig = WindowsMultilineExtractionConfig.builder()
    .timestampLocationType(TimestampLocationType.FIRST_LINE)
    .config(Map.of("key", "value"))
    .build();

Using FROM_FIELD Timestamp Location

WindowsMultilineExtractionConfig windowsConfig = WindowsMultilineExtractionConfig.builder()
    .timestampLocationType(WindowsMultilineExtractionConfig.TimestampLocationType.FROM_FIELD)
    .config(Map.of("target_field", "event_time"))
    .build();

Syslog Parser

BSD Format

ParserConfigs.ParserConfig syslogConfig = ParserConfigs.ParserConfig.builder()
    .targetField("__raw__")
    .extractionConfig(SyslogExtractionConfig.builder()
        .timestampPattern("MMM d HH:mm:ss")
        .componentList(List.of(
            SyslogExtractionConfig.ComponentType.TIMESTAMP,
            SyslogExtractionConfig.ComponentType.DEVICE,
            SyslogExtractionConfig.ComponentType.APP,
            SyslogExtractionConfig.ComponentType.PROC_ID))
        .messageBodyDelimiter(':')
        .build())
    .build();

RFC5424 Format

ParserConfigs.ParserConfig syslog5424Config = ParserConfigs.ParserConfig.builder()
    .targetField("log_message")
    .extractionConfig(SyslogExtractionConfig.builder()
        .timestampPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
        .componentList(List.of(
            SyslogExtractionConfig.ComponentType.PRIORITY,
            SyslogExtractionConfig.ComponentType.VERSION,
            SyslogExtractionConfig.ComponentType.TIMESTAMP,
            SyslogExtractionConfig.ComponentType.DEVICE,
            SyslogExtractionConfig.ComponentType.APP,
            SyslogExtractionConfig.ComponentType.PROC_ID,
            SyslogExtractionConfig.ComponentType.MSG_ID,
            SyslogExtractionConfig.ComponentType.STRUCTURED_DATA))
        .build())
    .build();

CEF Parser

ParserConfigs.ParserConfig cefConfig = ParserConfigs.ParserConfig.builder()
    .targetField("__raw__")
    .extractionConfig(new CefExtractionConfig())
    .build();

Delimited Text (CSV) Parser

ParserConfigs.ParserConfig csvConfig = ParserConfigs.ParserConfig.builder()
    .targetField("__raw__")
    .extractionConfig(DelimitedTextExtractionConfig.builder()
        .delimiter(",")
        .columns(List.of("timestamp", "user_id", "action", "resource", "status"))
        .build())
    .build();

JSON Parser

ParserConfigs.ParserConfig jsonConfig = ParserConfigs.ParserConfig.builder()
    .targetField("json_string")
    .removeTargetField(true)
    .extractionConfig(JsonExtractionConfig.builder()
        .outputFieldName("event_data")
        .build())
    .build();

Key-Value Pair Parser

ParserConfigs.ParserConfig kvConfig = ParserConfigs.ParserConfig.builder()
    .targetField("metadata")
    .removeTargetField(true)
    .extractionConfig(KvPairExtractionConfig.builder()
        .pairSeparator(" ")
        .kvSeparator("=")
        .build())
    .build();

Multi-Stage Parsing Pipeline

ZephFlow flow = ZephFlow.startFlow();

// Stage 1: Parse syslog header
ZephFlow stage1 = flow.fileSource("logs.txt", EncodingType.STRING_LINE)
    .parse(ParserConfigs.ParserConfig.builder()
        .targetField("__raw__")
        .extractionConfig(SyslogExtractionConfig.builder()
            .timestampPattern("MMM dd yyyy HH:mm:ss")
            .componentList(List.of(
                SyslogExtractionConfig.ComponentType.TIMESTAMP,
                SyslogExtractionConfig.ComponentType.DEVICE,
                SyslogExtractionConfig.ComponentType.APP))
            .messageBodyDelimiter(':')
            .build())
        .build());

// Stage 2: Parse application-specific format
ZephFlow stage2 = stage1.parse(ParserConfigs.ParserConfig.builder()
    .targetField("content")
    .removeTargetField(true)
    .extractionConfig(GrokExtractionConfig.builder()
        .grokExpression("%{WORD:severity}-%{INT:code}: %{GREEDYDATA:message}")
        .build())
    .build());

// Stage 3: Branch and parse message-specific details
ZephFlow type1Flow = stage2
    .filter("$.code == '305011'")
    .parse(ParserConfigs.ParserConfig.builder()
        .targetField("message")
        .extractionConfig(GrokExtractionConfig.builder()
            .grokExpression("%{WORD:action} %{WORD:protocol} from %{IP:src_ip}/%{INT:src_port}")
            .build())
        .build());

ZephFlow type2Flow = stage2
    .filter("$.code == '106023'")
    .parse(ParserConfigs.ParserConfig.builder()
        .targetField("message")
        .extractionConfig(/* different pattern for this message type */)
        .build());

// Merge branches and output
ZephFlow output = ZephFlow.merge(type1Flow, type2Flow)
    .stdoutSink(EncodingType.JSON_OBJECT);