Parsing rules

Parsing rules apply to custom feeds (feeds that are specified using the Path element). These rules specify how Feed Utility must parse each feed.

Parsing rules are defined in the Parsing element of feed rules for a custom feed.

The following is an example of parsing rules for a custom feed. These rules specify that the input feed is in JSON format. An MD5 parsing rule is defined for the files/md5 field in the input feed. Values in this field will be parsed as MD5 hashes.

<Feed>

...

<Parsing type="json">

<MD5 type="md5">files/md5</MD5>

</Parsing>

...

</Feed>

Parsing element

The parent Parsing element contains all nested parsing rules. Its attributes define the input format.

This element has the following attributes:

The following example demonstrates how to use the Parsing element for an XML input feed. In this case, parsing rules will be applied to elements nested inside the Feeds > Example > Contents element.

<Feed>

...

<Parsing type="xml" rootElement="Feeds/Example/Contents">

...

</Parsing>

...

</Feed>

Individual parsing rules

Parsing rules for individual fields of an input feed must be nested inside the Parsing element. When Feed Utility processes the input feed, it creates the fields of the output feed according to these rules.

Each rule has the following format:

<%OUTPUT_NAME% type="%VALUE_TYPE%">%INPUT_NAME%</%OUTPUT_NAME%>

Above, the following rule name elements are used:

The following example demonstrates parsing rule syntax for JSON input format:

<Feed>

...

<Parsing type="json">

<Field type="url">URL</Field>

<Field type="ip">IP</Field>

<Field type="context">GEO</Field>

<Field type="md5">files/md5</Field>

</Parsing>

...

</Feed>

The following example demonstrates parsing rule syntax for CSV input format:

<Feed>

...

<Parsing type="csv" delimiter=";">

<URL type="url">1</URL>

<IP type="ip">2</IP>

<GEO type="context">3</GEO>

<MD5 type="md5">4</MD5>

</Parsing>

...

</Feed>

The following example demonstrates parsing rule syntax for XML input format:

<Feed>

...

<Parsing type="xml" rootElement="Feeds/Example/Contents">

<URL type="url">url</URL>

<IP type="ip">ip</IP>

<GEO type="context">context</GEO>

<MD5 type="md5">md5_hash</MD5>

</Parsing>

...

</Feed>

Parsing rules for feeds of email type

To set the parsing rules for a third-party feed, specify the following values for the type attribute:

Parsing the message body (for feeds of the email type)

Feed Utility parses the body of an email loaded from a mail server if the type attribute of the Parsing element is set to messageBody.

The message body is parsed using the regular expressions specified in the Parsing element.

You can set one or several rules with regular expressions for message body parsing.

Each rule has the following form:

<%FIELD_NAME% type="%FIELD_TYPE%">%REG_EXP%</%FIELD_NAME%>

Where:

%FIELD_NAME% defines the name of the field in the output feed. For example, if %FIELD_NAME% is MD5, the field with this value will also be named MD5 in the output feed.

%FIELD_TYPE% is an indicator type.

%REG_EXP% is a regular expression.

Each regular expression applies to the whole message body.
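As an illustration, a set of message body rules might look like the following sketch. The field names MD5 and SHA1 are arbitrary choices for this example, and the md5 and sha1 type values and both regular expressions are taken from the attachment example in this section. The regular expression dialect supported by Feed Utility is not confirmed here.

```xml
<Feed>
  ...
  <!-- Sketch only: parse the body of each loaded email -->
  <Parsing type="messageBody">
    <!-- Each expression is applied to the whole message body -->
    <MD5 type="md5">([\da-fA-F]{32})</MD5>
    <SHA1 type="sha1">([\da-fA-F]{40})</SHA1>
  </Parsing>
  ...
</Feed>
```

Because these expressions have no anchors or boundaries, the 32-character pattern also matches the first 32 characters of a longer hexadecimal string, so more restrictive expressions may be needed in practice.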

The feed is formed according to the content of loaded emails. Formation of the feed meets the following conditions:

Parsing message attachments (for feeds of the email type)

Feed Utility parses email attachments loaded from a mail server if the type attribute of the Parsing element is set to messageAttach.

You can set one or more rules, each for a type of attached file.

Each rule has the following form:

<Attach type="%ATTACH_TYPE%"></Attach>

Where:

%ATTACH_TYPE% is an attachment type.

%ATTACH_TYPE% can have the following values:

The Attach element has at least one value.

You can set one or several rules with regular expressions.

Each rule has the following form:

<%FIELD_NAME% type="%FIELD_TYPE%">%REG_EXP%</%FIELD_NAME%>

Where:

%FIELD_NAME% defines the name of the field in the output feed. For example, if %FIELD_NAME% is MD5, the field with this value will also be named MD5 in the output feed.

%REG_EXP% is a regular expression.

%FIELD_TYPE% is an indicator type. For the %FIELD_TYPE% element, specify the type attribute by using the following values:

The following example demonstrates the parsing rule for a message attachment:

<Attach type="pdf">

<hash1 type="md5">([\da-fA-F]{32})</hash1>

<hash2 type="sha1">([\da-fA-F]{40})</hash2>

</Attach>
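The rule above presumably sits inside a Parsing element whose type attribute is set to messageAttach, as described earlier in this section. The following is a minimal sketch of that nesting. It assumes that the Attach element is a direct child of Parsing. The rest of the Feed content is elided.

```xml
<Feed>
  ...
  <Parsing type="messageAttach">
    <!-- Assumption: Attach is nested directly inside Parsing -->
    <Attach type="pdf">
      <hash1 type="md5">([\da-fA-F]{32})</hash1>
      <hash2 type="sha1">([\da-fA-F]{40})</hash2>
    </Attach>
  </Parsing>
  ...
</Feed>
```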

Feed Utility parses files with the following extensions:

Value in the type attribute of the Parsing element | File extensions
csv | csv and txt
json | json
xml | xml
stix1 | xml
stix2 | json
pdf | pdf

If parsing rules are set simultaneously for stix1 and xml (or stix2 and json), Feed Utility performs the following:

  1. Attempts to parse the attached file as stix (with the xml/json extension).
  2. If no errors occurred while parsing, and the file is a valid stix feed, this file is not parsed according to the rules for parsing xml/json attachments specified in the feed's settings.
  3. If an error occurred while parsing (the file is not a valid stix feed), this file is parsed according to the rules for parsing xml/json attachments specified in the feed's settings.

If an email has more than one attachment, the information from all attachments is combined into a single resulting feed.

The feed is formed according to the content of loaded emails. Formation of the feed meets the following conditions:

Excluded element for PDF feeds and email feeds

If the pdf, messageBody, or messageAttach value is specified in the type attribute of the Parsing element, the Feed element can contain an Excluded section with one or more nested Item elements that define indicator exclusion rules for the resulting feed.

The Excluded section has the following form:

<Excluded>

<Item>{RegExp}</Item>

...

</Excluded>

Where {RegExp} is a regular expression.

The Excluded section and the Item elements are optional.

The following example demonstrates exclusion rules:

<Excluded>

<Item>(\w{3}\s+\d+\s+[\d\:]+)\s</Item>

<Item>(https:\/\/badurl\.com)</Item>

</Excluded>
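To show where these rules might sit, the following sketch places an Excluded section next to a message body parsing rule. It assumes that Excluded is a direct child of Feed alongside Parsing, which follows the description above but is not confirmed here. The URL field name, its regular expression, and the example.com address are hypothetical.

```xml
<Feed>
  ...
  <Parsing type="messageBody">
    <!-- Hypothetical rule: extract URLs from the message body -->
    <URL type="url">(https?:\/\/\S+)</URL>
  </Parsing>
  <!-- Assumption: Excluded is a sibling of Parsing inside Feed -->
  <Excluded>
    <!-- Indicators matching this expression are left out of the resulting feed -->
    <Item>(https:\/\/example\.com)</Item>
  </Excluded>
  ...
</Feed>
```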
