Parsing rules are rules for custom feeds (feeds that are specified using the Path element). These parameters specify how each feed must be parsed by Feed Utility.
Parsing rules are defined in the Parsing element of feed rules for a custom feed.
The following is an example of parsing rules for a custom feed. These rules specify that the input feed is in JSON format. An MD5 parsing rule is defined for the files/md5 field in the input feed. Values in this field will be parsed as MD5 hashes.
<Feed>
    ...
    <Parsing type="json">
        <MD5 type="MD5">files/md5</MD5>
    </Parsing>
    ...
</Feed>
Parsing element
The parent element, Parsing, contains all nested parsing rules. Its attributes define the input format.
This element has the following attributes:
type
Specifies the input feed type.
This attribute can have the following values: json, csv, xml, misp, stix, stix2, pdf, messageBody, messageAttach.
Feed Utility supports STIX versions 1.0, 1.1, 2.0, and 2.1. The exact version of STIX is determined automatically.
A file added to the directory of a pdf feed is not processed if it was created earlier than, or at the same time as, the most recently created file that has already been processed.
delimiter
Specifies the delimiter for CSV input feeds. By default, this value is ';'.
rootElement
Specifies the root element path for XML and JSON input feeds.
You can use the '*' and '?' wildcard characters as substitutes for other characters. The '*' wildcard character stands for a group of characters. The '?' wildcard character stands for a single character.
You cannot specify a part of the rootElement path with wildcard characters only. For example, "Feeds/*/Contents" is invalid.
You can specify a root element value with any nesting level. Separate nesting levels with a "/" character.
The root element parameter can be empty. If it is not empty, the value must not contain empty nesting levels (the substring "//"), and must not start or end with a "/" character.
You cannot use wildcards in the root element for JSON feeds.
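The constraints on the rootElement value listed above can be checked before a configuration is deployed. The following Python function is a minimal sketch of such a check, not part of Feed Utility: the function name, its parameters, and the wording of the messages are assumptions made for illustration, and only the rules themselves come from this section.

```python
WILDCARDS = "*?"

def validate_root_element(value: str, feed_type: str) -> list[str]:
    """Return a list of problems found in a rootElement value (empty list means valid).

    Hypothetical helper: it only encodes the rules stated in this section.
    """
    problems = []
    if value == "":
        return problems  # an empty rootElement is allowed
    if value.startswith("/") or value.endswith("/"):
        problems.append("must not start or end with '/'")
    if "//" in value:
        problems.append("must not contain empty nesting levels ('//')")
    if feed_type == "json" and any(ch in WILDCARDS for ch in value):
        problems.append("wildcards are not allowed for JSON feeds")
    for part in value.split("/"):
        # A path part made up of wildcard characters only is invalid.
        if part and all(ch in WILDCARDS for ch in part):
            problems.append(f"path part {part!r} consists of wildcards only")
    return problems


if __name__ == "__main__":
    for candidate in ("Feeds/Example/Contents", "Feeds/*/Contents", "/Feeds//Contents/"):
        print(candidate, "->", validate_root_element(candidate, "xml") or "OK")
```

In this sketch, "Feeds/Ex*/Contents" passes for an XML feed because the wildcard is combined with other characters, while "Feeds/*/Contents" is reported as invalid.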
The following example demonstrates how to use the Parsing element for an XML input feed. In this case, parsing rules will be applied to elements nested inside the Feeds > Example > Contents element.
<Feed>
    ...
    <Parsing type="xml" rootElement="Feeds/Example/Contents">
        ...
    </Parsing>
    ...
</Feed>
Individual parsing rules
Parsing rules for individual fields of an input feed must be nested inside the Parsing element. When Feed Utility processes the input feed, it creates the fields of the output feed according to these rules.
Each rule has the following format:
<%OUTPUT_NAME% type="%VALUE_TYPE%">%INPUT_NAME%</%OUTPUT_NAME%>
The rule format above uses the following elements:
%OUTPUT_NAME% preserves nested fields. If a field specified in the %INPUT_NAME% is nested, the field in the output feed will also be nested. For example, if %OUTPUT_NAME% is MD5_HASH and %INPUT_NAME% is files/md5, the field in the output feed will be files/MD5_HASH.
For a JSON input feed, %OUTPUT_NAME% must always use the Field value. Feed Utility uses the field names from the original feed.
These values will be handled by Feed Utility according to the specified type. For example, if the output feed contains domain names and URLs, then it will be compiled to the binary format.
The following value types are possible:
url: This value type is used for URLs.
ip: This value type is used for IP addresses.
md5: This value type is used for MD5 hashes.
sha1: This value type is used for SHA1 hashes.
sha256: This value type is used for SHA256 hashes.
domain: This value type is used for domain names.
context: This value type is used for context information.
'/'.
rootElement attribute of Parsing element. The path is case sensitive.
Parsing element must contain no parsing rules.
The following example demonstrates parsing rule syntax for JSON input format:
<Feed>
    ...
    <Parsing type="json">
        <Field type="url">URL</Field>
        <Field type="ip">IP</Field>
        <Field type="context">GEO</Field>
        <Field type="md5">files/md5</Field>
    </Parsing>
    ...
</Feed>
The following example demonstrates parsing rule syntax for CSV input format:
<Feed>
    ...
    <Parsing type="csv" delimiter=";">
        <URL type="url">1</URL>
        <IP type="ip">2</IP>
        <GEO type="context">3</GEO>
        <MD5 type="md5">4</MD5>
    </Parsing>
    ...
</Feed>
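To make the CSV rules easier to read, the following Python sketch shows one way the column numbers in the example could map to named output fields. It is an illustration only and does not describe how Feed Utility is implemented: the RULES table, the parse_csv_feed function, the record layout, and the assumption that column numbers start at 1 are all hypothetical.

```python
import csv
import io

# Hypothetical rule table mirroring the CSV example:
# output field name -> (value type, column number, assumed to be 1-based).
RULES = {
    "URL": ("url", 1),
    "IP": ("ip", 2),
    "GEO": ("context", 3),
    "MD5": ("md5", 4),
}

def parse_csv_feed(text, delimiter=";"):
    """Turn delimited text into a list of records keyed by output field name."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        record = {}
        for name, (value_type, column) in RULES.items():
            if column <= len(row):
                record[name] = {"type": value_type, "value": row[column - 1]}
        records.append(record)
    return records


if __name__ == "__main__":
    sample = "http://example.com/a;192.0.2.1;US;d41d8cd98f00b204e9800998ecf8427e\n"
    for record in parse_csv_feed(sample):
        print(record)
```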
The following example demonstrates parsing rule syntax for XML input format:
<Feed>
    ...
    <Parsing type="xml" rootElement="Feeds/Example/Contents">
        <URL type="url">url</URL>
        <IP type="ip">ip</IP>
        <GEO type="context">context</GEO>
        <MD5 type="md5">md5_hash</MD5>
    </Parsing>
    ...
</Feed>
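The nesting rule for %OUTPUT_NAME% described earlier in this section (files/md5 with the output name MD5_HASH becomes files/MD5_HASH) can be expressed as a short Python sketch. The function name is hypothetical, and applying the same rule to paths deeper than one level is an assumption drawn from the single example in the text.

```python
def output_field_path(output_name: str, input_name: str) -> str:
    """Replace the last path component of input_name with output_name.

    Hypothetical helper illustrating how nesting is preserved in the output feed.
    """
    parent, separator, _leaf = input_name.rpartition("/")
    return f"{parent}{separator}{output_name}"


if __name__ == "__main__":
    print(output_field_path("MD5_HASH", "files/md5"))
    print(output_field_path("MD5_HASH", "md5"))
```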
Parsing rules for feeds of email type
To set the parsing rules for a third-party feed, specify one of the following values for the type attribute:
messageBody: Parsing rules for an email body. This value is applicable if POP3 or IMAP is enabled in the Path element.
messageAttach: Parsing rules for an email attachment. This value is applicable if POP3 or IMAP is enabled in the Path element.
Parsing the message body (for feeds of the email type)
Feed Utility parses the body of an email loaded from a mail server if the messageBody value is set in the type attribute of the Parsing element.
To parse the message body, Feed Utility uses the regular expressions specified in the Parsing element.
You can set one or several rules with regular expressions for message body parsing.
Each rule has the following form:
<%FIELD_NAME% type="%FIELD_TYPE%">%REG_EXP%</%FIELD_NAME%>
Where:
%FIELD_NAME% defines the name of the field in the output feed. For example, if %FIELD_NAME% is MD5, the field with this value will also be named MD5 in the output feed.
%FIELD_TYPE% is an indicator type.
%REG_EXP% is a regular expression.
Each regular expression applies to the whole message body.
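As an illustration of applying each rule to the whole message body, the following Python sketch runs a small set of rules over a sample text. It is a sketch under stated assumptions: the RULES list, the parse_message_body function, the entry layout, and the IP pattern are hypothetical, and Python's re module may differ from the regular expression dialect that Feed Utility uses. The MD5 pattern is taken from the attachment example in this document.

```python
import re

# Hypothetical rules in the form (field name, field type, regular expression).
RULES = [
    ("hash1", "md5", r"([\da-fA-F]{32})"),           # pattern from the attachment example
    ("addr", "ip", r"(\d{1,3}(?:\.\d{1,3}){3})"),    # illustrative pattern, not from the source
]

def parse_message_body(body):
    """Apply every rule to the whole body and collect the first capture group of each match."""
    entries = []
    for field_name, field_type, pattern in RULES:
        for match in re.finditer(pattern, body):
            entries.append(
                {"field": field_name, "type": field_type, "value": match.group(1)}
            )
    return entries


if __name__ == "__main__":
    body = "Seen at 192.0.2.10, sample hash d41d8cd98f00b204e9800998ecf8427e attached."
    for entry in parse_message_body(body):
        print(entry)
```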
The feed is formed from the content of the loaded emails, subject to the following conditions:
Feed Utility stores the date of the feed's latest update. When Feed Utility accesses the mail server, parsing is applied only to the emails received after the previous feed update.
message_from: Email address of the message sender.
message_subject: Subject of the email.
message_date: Date on which the mail server received the email.
type attribute value other than CONTEXT.
type attribute of which has a CONTEXT value. The values are indicated in the resulting feed entries, which contain the indicators (IP/HASH/URL) from the same email.
If more than one value is obtained by one regular expression (with the type attribute having the CONTEXT value), these values are specified in one entry of the resulting feed. The values are separated by a sequence of ";" characters.
Excluded section (see the "Excluded element for PDF feeds and email feeds" section below).
Parsing message attachments (for feeds of the email type)
Feed Utility parses email attachments loaded from a mail server if the messageAttach value is set in the type attribute of the Parsing element.
You can set one or several rules with types of attached files.
Each rule has the following form:
<Attach type="%ATTACH_TYPE%"></Attach>
Where:
%ATTACH_TYPE% is an attachment type.
%ATTACH_TYPE% can have the following values:
csv
json
xml
stix
stix2
pdf
The Attach element has at least one value.
You can set one or several rules with regular expressions.
Each rule has the following form:
<%FIELD_NAME% type="%FIELD_TYPE%">%REG_EXP%</%FIELD_NAME%>
Where:
%FIELD_NAME% defines the name of the field in the output feed. For example, if %FIELD_NAME% is MD5, the field with this value will also be named MD5 in the output feed.
%REG_EXP% is a regular expression.
%FIELD_TYPE% is an indicator type. For the %FIELD_TYPE% element, specify the type attribute by using the following values:
The following example demonstrates the parsing rule for a message attachment:
<Attach type="pdf">
    <hash1 type="md5">([\da-fA-F]{32})</hash1>
    <hash2 type="sha1">([\da-fA-F]{40})</hash2>
</Attach>
Feed Utility parses files with the following extensions:
Value in the | File extensions
csv | csv and txt
json | json
xml | xml
stix1 | xml
stix2 | json
If parsing rules are set simultaneously for stix1 and xml (or stix2 and json), Feed Utility performs the following:
If an email has more than one attachment, the information from each attachment will be in one resulting feed.
The feed is formed from the content of the loaded emails, subject to the following conditions:
Feed Utility stores the date of the feed's latest update. When Feed Utility accesses the mail server, parsing is applied only to the emails received after the previous feed update.
message_from: Email address of the message sender.
message_subject: Subject of the email.
message_date: Date on which the mail server received the email.
attach_name: Name of the attachment.
type attribute value other than CONTEXT.
type attribute of which has a CONTEXT value. The values are indicated in the resulting feed entries, which contain the indicators (IP/HASH/URL) from the same attachment.
If more than one value is obtained by one regular expression (with the type attribute having the CONTEXT value), these values are specified in one entry of the resulting feed. The values are separated by a sequence of ";" characters.
Excluded section (see the "Excluded element for PDF feeds and email feeds" section below).
Excluded element for PDF feeds and email feeds
If the pdf, messageBody, or messageAttach value is specified in the type attribute of the Parsing element, the Feed element can contain an Excluded section with one or more nested Item elements that define indicator exclusion rules for the resulting feed.
The Excluded section has the following form:
<Excluded>
    <Item>{RegExp}</Item>
    ...
</Excluded>
Where {RegExp} is a regular expression.
The Excluded section and the Item elements are optional.
The following example demonstrates exclusion rules:
<Excluded>
    <Item>(\w{3}\s+\d+\s+[\d\:]+)\s</Item>
    <Item>(https:\/\/badurl\.com)</Item>
</Excluded>
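The effect of exclusion rules can be sketched in Python as a filter over extracted indicators. This is only an illustration: the apply_exclusions function is hypothetical, and the assumption that an indicator is dropped when any Item expression matches anywhere in its value is not confirmed by this document. Feed Utility may match differently and may use a different regular expression dialect than Python's re module.

```python
import re

# The two expressions from the example above.
EXCLUDED = [
    r"(\w{3}\s+\d+\s+[\d\:]+)\s",
    r"(https:\/\/badurl\.com)",
]

def apply_exclusions(indicators, patterns=EXCLUDED):
    """Return the indicators that match none of the exclusion patterns.

    Assumption: a match anywhere in the value is enough to exclude it.
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        value
        for value in indicators
        if not any(regex.search(value) for regex in compiled)
    ]


if __name__ == "__main__":
    found = ["https://badurl.com/payload", "https://example.org/", "192.0.2.10"]
    print(apply_exclusions(found))
```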