Parsing rules apply to custom feeds (feeds that are specified by using the Path element). These rules specify how Feed Utility must parse each feed.
Parsing rules are defined in the Parsing element of feed rules for a custom feed.
The following is an example of parsing rules for a custom feed. These rules specify that the input feed is in JSON format. An MD5 parsing rule is defined for the files/md5 field in the input feed. Values in this field will be parsed as MD5 hashes.
<Feed>
    ...
    <Parsing type="json">
        <MD5 type="MD5">files/md5</MD5>
    </Parsing>
    ...
</Feed>
Parsing element
The parent element, Parsing, contains all nested parsing rules. Its attributes define the input format.
This element has the following attributes:
type: Specifies the input feed type.
This attribute can have the following values: json, csv, xml, misp, stix, stix2, pdf, messageBody, messageAttach.
Feed Utility supports STIX versions 1.0, 1.1, 2.0, and 2.1. The exact version of STIX is determined automatically.
A file added to the directory of a pdf feed is not processed if it was created earlier than, or at the same time as, the most recently created file that has already been processed.
delimiter: Specifies the delimiter for CSV input feeds. By default, this value is ';'.
rootElement: Specifies the root element path for XML and JSON input feeds.
You can use the '*' and '?' wildcard characters as substitutes for any other character or group of characters. The '*' wildcard character can be used for a group of characters. The '?' wildcard character can be used for a single character.
A nesting level of the rootElement path cannot consist of wildcard characters only. For example, "Feeds/*/Contents" is invalid.
The root element value can have any nesting depth. Separate the nesting levels with the "/" character.
The rootElement value can be empty. If it is not empty, the value must not contain empty nesting levels (the substring "//") and must not start or end with the "/" character.
You cannot use wildcards in the root element for JSON feeds.
The following example demonstrates how to use the Parsing element for an XML input feed. In this case, parsing rules will be applied to elements nested inside the Feeds > Example > Contents element.
<Feed>
    ...
    <Parsing type="xml" rootElement="Feeds/Example/Contents">
        ...
    </Parsing>
    ...
</Feed>
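The constraints on the rootElement value listed above can be collected into a small check. The following Python sketch is only an illustration of those rules, not Feed Utility's own validation logic; the function name `validate_root_element` and the wording of the messages are hypothetical.

```python
from typing import List


def validate_root_element(value: str, feed_type: str) -> List[str]:
    """Return the rule violations for a rootElement value (empty list = valid)."""
    problems: List[str] = []
    if value == "":
        return problems  # an empty rootElement is allowed

    if value.startswith("/") or value.endswith("/"):
        problems.append('must not start or end with "/"')
    if "//" in value:
        problems.append('must not contain empty nesting levels ("//")')

    # Wildcards are not allowed at all for JSON feeds.
    if feed_type == "json" and any(ch in value for ch in "*?"):
        problems.append("wildcards are not allowed for JSON feeds")

    # A nesting level must not consist of wildcard characters only.
    for segment in value.split("/"):
        if segment and all(ch in "*?" for ch in segment):
            problems.append('level "%s" consists of wildcards only' % segment)
    return problems
```

For example, `validate_root_element("Feeds/Ex*/Contents", "xml")` returns an empty list, while `"Feeds/*/Contents"` is reported as invalid.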
Individual parsing rules
Parsing rules for individual fields of an input feed must be nested inside the Parsing element. When Feed Utility processes the input feed, it creates the fields of the output feed according to these rules.
Each rule has the following format:
<%OUTPUT_NAME% type="%VALUE_TYPE%">%INPUT_NAME%</%OUTPUT_NAME%>
The rule contains the following placeholders:
%OUTPUT_NAME% is the name of the field in the output feed.
%OUTPUT_NAME% preserves nested fields. If the field specified in %INPUT_NAME% is nested, the field in the output feed is also nested. For example, if %OUTPUT_NAME% is MD5_HASH and %INPUT_NAME% is files/md5, the field in the output feed is files/MD5_HASH.
For a JSON input feed, %OUTPUT_NAME% must always be Field. Feed Utility uses the field names from the original feed.
%VALUE_TYPE% is the type of the values in the field.
Feed Utility handles these values according to the specified type. For example, if the output feed contains domain names and URLs, it is compiled to the binary format.
The following value types are possible:
url: This value type is used for URLs.
ip: This value type is used for IP addresses.
md5: This value type is used for MD5 hashes.
sha1: This value type is used for SHA1 hashes.
sha256: This value type is used for SHA256 hashes.
domain: This value type is used for domain names.
context: This value type is used for context information.
%INPUT_NAME% is the field in the input feed.
Nested fields are separated by '/'.
The path is specified relative to the rootElement attribute of the Parsing element. The path is case sensitive.
... the Parsing element must contain no parsing rules.
The following example demonstrates parsing rule syntax for JSON input format:
<Feed>
    ...
    <Parsing type="json">
        <Field type="url">URL</Field>
        <Field type="ip">IP</Field>
        <Field type="context">GEO</Field>
        <Field type="md5">files/md5</Field>
    </Parsing>
    ...
</Feed>
The following example demonstrates parsing rule syntax for CSV input format:
<Feed>
    ...
    <Parsing type="csv" delimiter=";">
        <URL type="url">1</URL>
        <IP type="ip">2</IP>
        <GEO type="context">3</GEO>
        <MD5 type="md5">4</MD5>
    </Parsing>
    ...
</Feed>
The following example demonstrates parsing rule syntax for XML input format:
<Feed>
    ...
    <Parsing type="xml" rootElement="Feeds/Example/Contents">
        <URL type="url">url</URL>
        <IP type="ip">ip</IP>
        <GEO type="context">context</GEO>
        <MD5 type="md5">md5_hash</MD5>
    </Parsing>
    ...
</Feed>
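To make the CSV example above more concrete, the following Python sketch shows one way such rules could map the columns of an input row to named output fields. It is an illustration only, not Feed Utility's implementation. The names `CSV_RULES` and `apply_csv_rules` are hypothetical, and the sketch assumes that column numbers start at 1, which the example suggests but this section does not state.

```python
import csv
import io

# Hypothetical in-memory form of the CSV rules from the example above:
# (output field name, value type, column number).
CSV_RULES = [
    ("URL", "url", 1),
    ("IP", "ip", 2),
    ("GEO", "context", 3),
    ("MD5", "md5", 4),
]


def apply_csv_rules(text, rules=CSV_RULES, delimiter=";"):
    """Map every CSV row to a dict of output fields.

    Assumption: column numbers in the rules are 1-based.
    """
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        record = {}
        for name, value_type, column in rules:
            if column <= len(row):  # skip rules that point past the end of the row
                record[name] = {"type": value_type, "value": row[column - 1]}
        records.append(record)
    return records


sample = "http://example.com/a;192.0.2.1;US;d41d8cd98f00b204e9800998ecf8427e\n"
records = apply_csv_rules(sample)
```

With the sample row, `records[0]["GEO"]` holds the value `US` tagged with the `context` type.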
Parsing rules for feeds of email type
To set the parsing rules for a third-party feed, specify the following values for the type attribute:
messageBody: Parsing rules for an email body. This value is applicable if POP3 or IMAP is enabled in the Path element.
messageAttach: Parsing rules for an email attachment. This value is applicable if POP3 or IMAP is enabled in the Path element.
Parsing the message body (for feeds of the email type)
Feed Utility parses the body of an email loaded from a mail server, if the messageBody value is set in the type attribute of the Parsing element.
The message body is parsed by using the regular expressions specified in the Parsing element.
You can set one or more rules with regular expressions for message body parsing.
Each rule has the following form:
<%FIELD_NAME% type="%FIELD_TYPE%">%REG_EXP%</%FIELD_NAME%>,
Where:
%FIELD_NAME% defines the name of the field in the output feed. For example, if %FIELD_NAME% is MD5, the field with this value will also be named MD5 in the output feed.
%FIELD_TYPE% is an indicator type.
%REG_EXP% is a regular expression.
Each regular expression applies to the whole message body.
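As a rough illustration of rules that are applied to the whole message body, the Python sketch below runs each regular expression over the full text and collects every match. The rule set `BODY_RULES`, the function `parse_body`, and the sample text are hypothetical. The sketch uses Python's `re` module, and the regular expression dialect of Feed Utility may differ.

```python
import re

# Hypothetical rules in the form (field name, field type, regular expression).
BODY_RULES = [
    ("MD5", "md5", r"\b([\da-fA-F]{32})\b"),
    ("IP", "ip", r"\b(\d{1,3}(?:\.\d{1,3}){3})\b"),
]


def parse_body(body, rules=BODY_RULES):
    """Apply every rule to the whole body and return (name, type, value) tuples."""
    found = []
    for name, field_type, pattern in rules:
        # findall returns the content of the single capturing group per match.
        for value in re.findall(pattern, body):
            found.append((name, field_type, value))
    return found


body = (
    "Seen 192.0.2.10 dropping d41d8cd98f00b204e9800998ecf8427e\n"
    "and later 198.51.100.7."
)
matches = parse_body(body)
```

Because each expression scans the entire body, a single rule can yield several values from one email, as the two IP addresses do here.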
The feed is formed from the content of the loaded emails, subject to the following conditions:
Feed Utility stores the date of the feed's latest update. When Feed Utility accesses the mail server, parsing is applied only to the emails received after the previous feed update.
The resulting feed entries contain the following fields:
message_from: Email address of the message sender.
message_subject: Subject of the email.
message_date: Date on which the mail server received the email.
Indicators are obtained from regular expressions with a type attribute value other than CONTEXT.
Context is obtained from regular expressions, the type attribute of which has a CONTEXT value. The values are indicated in the resulting feed entries, which contain the indicators (IP/HASH/URL) from the same email.
If more than one value is obtained per one regular expression (with the type attribute having the CONTEXT value), these values are specified in one entry of the resulting feed. The values are separated by a sequence of ";" characters.
Indicators can be excluded by using the Excluded section (see the "Excluded element for PDF feeds and email feeds" section below).
Parsing message attachments (for feeds of the email type)
Feed Utility parses email attachments loaded from a mail server, if the messageAttach value is set in the type attribute of the Parsing element.
You can set one or several rules with types of attached files.
Each rule has the following form:
<Attach type="%ATTACH_TYPE%"></Attach>,
Where:
%ATTACH_TYPE% is an attachment type.
%ATTACH_TYPE% can have the following values:
csv
json
xml
stix
stix2
pdf
The Attach element has at least one value.
You can set one or several rules with regular expressions.
Each rule has the following form:
<%FIELD_NAME% type="%FIELD_TYPE%">%REG_EXP%</%FIELD_NAME%>,
Where:
%FIELD_NAME% defines the name of the field in the output feed. For example, if %FIELD_NAME% is MD5, the field with this value will also be named MD5 in the output feed.
%REG_EXP% is a regular expression.
%FIELD_TYPE% is an indicator type. For the %FIELD_TYPE% element, specify the type attribute by using the following values:
The following example demonstrates the parsing rule for a message attachment:
<Attach type="pdf">
    <hash1 type="md5">([\da-fA-F]{32})</hash1>
    <hash2 type="sha1">([\da-fA-F]{40})</hash2>
</Attach>
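The two regular expressions from the example above can be tried out in Python to see what they match. This is a sketch with Python's `re` module and a made-up text sample; how Feed Utility itself evaluates the expressions is not described here and may differ. It shows one thing worth checking when writing such rules: without boundaries, the 32-character pattern also matches the first 32 characters of a longer hexadecimal string.

```python
import re

MD5_RULE = r"([\da-fA-F]{32})"   # expression from the hash1 rule
SHA1_RULE = r"([\da-fA-F]{40})"  # expression from the hash2 rule

# Made-up text that contains a single 40-character SHA1 value.
text = "sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709"

md5_hits = re.findall(MD5_RULE, text)
sha1_hits = re.findall(SHA1_RULE, text)

# Hypothetical stricter variant: word boundaries prevent a match
# inside a longer run of hexadecimal characters.
strict_md5_hits = re.findall(r"\b([\da-fA-F]{32})\b", text)

print(md5_hits)         # first 32 characters of the SHA1 value
print(sha1_hits)        # the full SHA1 value
print(strict_md5_hits)  # empty list
```

If both rules are meant to be mutually exclusive, anchoring the expressions with boundaries, as in the stricter variant, is one possible design choice.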
Feed Utility parses files with the following extensions:
Value in the type attribute | File extensions
csv | csv and txt
json | json
xml | xml
stix1 | xml
stix2 | json
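The table above can be read as a lookup from the attachment type to the file extensions it covers. The Python sketch below expresses it that way and shows why one file can qualify for two types, for example an .xml file when both xml and stix1 are configured. The names `EXTENSIONS_BY_TYPE` and `candidate_types` are hypothetical, and case-insensitive extension matching is an assumption.

```python
import os

# Attachment type -> file extensions, as listed in the table above.
EXTENSIONS_BY_TYPE = {
    "csv": {"csv", "txt"},
    "json": {"json"},
    "xml": {"xml"},
    "stix1": {"xml"},
    "stix2": {"json"},
}


def candidate_types(filename, configured_types):
    """Return the configured attachment types whose extensions match the file.

    Assumption: extensions are compared case-insensitively.
    """
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return [
        attach_type
        for attach_type in configured_types
        if extension in EXTENSIONS_BY_TYPE.get(attach_type, set())
    ]
```

For example, `candidate_types("report.xml", ["xml", "stix1", "csv"])` returns `["xml", "stix1"]`.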
If parsing rules are set simultaneously for stix1 and xml (or stix2 and json), Feed Utility performs the following:
If an email has more than one attachment, the information from all attachments is included in a single resulting feed.
The feed is formed from the content of the loaded emails, subject to the following conditions:
Feed Utility stores the date of the feed's latest update. When Feed Utility accesses the mail server, parsing is applied only to the emails received after the previous feed update.
The resulting feed entries contain the following fields:
message_from: Email address of the message sender.
message_subject: Subject of the email.
message_date: Date on which the mail server received the email.
attach_name: Name of the attachment.
Indicators are obtained from regular expressions with a type attribute value other than CONTEXT.
Context is obtained from regular expressions, the type attribute of which has a CONTEXT value. The values are indicated in the resulting feed entries, which contain the indicators (IP/HASH/URL) from the same attachment.
If more than one value is obtained per one regular expression (with the type attribute having the CONTEXT value), these values are specified in one entry of the resulting feed. The values are separated by a sequence of ";" characters.
Indicators can be excluded by using the Excluded section (see the "Excluded element for PDF feeds and email feeds" section below).
Excluded element for PDF feeds and email feeds
If the pdf, messageBody, or messageAttach value is specified in the type attribute of the Parsing element, the Feed element can contain the Excluded section and have one or more nested <Item/> elements with indicator exclusion rules for the resulting feed.
The Excluded section has the following form:
<Excluded>
    <Item>{RegExp}</Item>
    ...
</Excluded>
Where {RegExp} is a regular expression.
The Excluded section and the Item elements are optional.
The following example demonstrates exclusion rules:
<Excluded>
    <Item>(\w{3}\s+\d+\s+[\d\:]+)\s</Item>
    <Item>(https:\/\/badurl\.com)</Item>
</Excluded>
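The effect of such exclusion rules can be sketched in Python as a filter over extracted indicators. This is an illustration under stated assumptions, not a description of Feed Utility's behavior. It assumes that a rule excludes an indicator when the expression matches anywhere in it, which this section does not specify. The function name `apply_exclusions` and the sample indicators are hypothetical.

```python
import re

# Regular expressions taken from the Item elements in the example above.
EXCLUDED = [
    r"(\w{3}\s+\d+\s+[\d\:]+)\s",
    r"(https:\/\/badurl\.com)",
]


def apply_exclusions(indicators, patterns=EXCLUDED):
    """Drop every indicator that matches at least one exclusion pattern.

    Assumption: a pattern excludes an indicator if it matches anywhere in it.
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        indicator
        for indicator in indicators
        if not any(rule.search(indicator) for rule in compiled)
    ]


kept = apply_exclusions(["https://badurl.com/path", "https://example.org/x"])
```

Here only `https://example.org/x` remains, because the first indicator matches the second exclusion rule.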