Parsing rules are rules for custom feeds (feeds that are specified using the Path element). These rules specify how each feed must be parsed by Feed Utility. Parsing rules are defined in the Parsing element of feed rules for a custom feed.
The following is an example of parsing rules for a custom feed. These rules specify that the input feed is in JSON format. An MD5 parsing rule is defined for the files/md5 field in the input feed. Values in this field will be parsed as MD5 hashes.
<Feed>
  ...
  <Parsing type="json">
    <MD5 type="MD5">files/md5</MD5>
  </Parsing>
  ...
</Feed>
Parsing element
The parent element, Parsing, contains all nested parsing rules. Its attributes define the input format.
This element has the following attributes:
type
Specifies the input feed type. This attribute can have the following values: json, stix, csv, xml.
delimiter
Specifies the delimiter for CSV input feeds. By default, this value is ';'.
rootElement
Specifies the root element path for XML input feeds.
You can use the '*' and '?' wildcard characters in the path. The '*' wildcard character stands for a group of characters. The '?' wildcard character stands for a single character.
A part of the rootElement path cannot consist of wildcard characters only. For example, "Feeds/*/Contents" is invalid.
The following example demonstrates how to use the Parsing element for an XML input feed. In this case, parsing rules will be applied to elements nested inside the Feeds > Example > Contents element.
<Feed>
  ...
  <Parsing type="xml" rootElement="Feeds/Example/Contents">
    ...
  </Parsing>
  ...
</Feed>
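The wildcard rules for rootElement described above can be illustrated with a short Python sketch. It is not Feed Utility code: the function names are hypothetical, and it assumes that wildcards are matched within each '/'-separated part of the path, that '*' may also match zero characters, and that matching is case sensitive.

```python
import re


def is_valid_root_element(path: str) -> bool:
    """Return False if any '/'-separated part is empty or made of wildcards only."""
    # strip("*?") leaves an empty string when a part has no literal characters.
    return all(part.strip("*?") for part in path.split("/"))


def _part_to_regex(part: str) -> "re.Pattern[str]":
    """Translate one path part: '*' -> any run of characters, '?' -> one character."""
    pieces = []
    for ch in part:
        if ch == "*":
            pieces.append(".*")   # assumption: '*' may match zero characters
        elif ch == "?":
            pieces.append(".")
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces))


def matches(pattern: str, element_path: str) -> bool:
    """Compare a rootElement pattern with a concrete element path, part by part."""
    pattern_parts = pattern.split("/")
    path_parts = element_path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(
        _part_to_regex(p).fullmatch(actual) is not None
        for p, actual in zip(pattern_parts, path_parts)
    )
```

Under these assumptions, "Feeds/Ex*/Contents" is accepted and matches Feeds/Example/Contents, while "Feeds/*/Contents" is rejected because its middle part contains only a wildcard.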
Individual parsing rules
Parsing rules for individual fields of an input feed must be nested inside the Parsing element. When Feed Utility processes the input feed, it creates the fields of the output feed according to these rules.
Each rule has the following format:
<OUTPUT_NAME type="VALUE_TYPE">INPUT_NAME</OUTPUT_NAME>
Above, the following rule elements are used:
OUTPUT_NAME
The name of the field in the output feed. OUTPUT_NAME preserves nested fields: if the field specified in INPUT_NAME is nested, the field in the output feed is also nested. For example, if OUTPUT_NAME is MD5_HASH and INPUT_NAME is files/md5, the field in the output feed will be files/MD5_HASH.
For JSON input feeds, OUTPUT_NAME must always be Field. In this case, Feed Utility uses the field names from the original feed.
VALUE_TYPE
The type of the values in the field. Feed Utility handles the values according to the specified type. For example, if the output feed contains domain names and URLs, then it will be compiled to the binary format.
The following value types are possible:
url: This value type is used for URLs.
ip: This value type is used for IP addresses.
md5: This value type is used for MD5 hashes.
sha1: This value type is used for SHA1 hashes.
sha256: This value type is used for SHA256 hashes.
domain: This value type is used for domain names.
context: This value type is used for context information.
INPUT_NAME
The field in the input feed that the rule applies to.
For JSON input feeds, INPUT_NAME is the name of the field. Nested fields are separated with '/'.
For CSV input feeds, INPUT_NAME is the number of the column.
For XML input feeds, INPUT_NAME is the path to the element relative to the root element specified in the rootElement attribute of the Parsing element. The path is case sensitive.
For STIX input feeds, the Parsing element must contain no parsing rules.
The following example demonstrates parsing rule syntax for JSON input format:
<Feed>
  ...
  <Parsing type="json">
    <Field type="url">URL</Field>
    <Field type="ip">IP</Field>
    <Field type="context">GEO</Field>
    <Field type="md5">files/md5</Field>
  </Parsing>
  ...
</Feed>
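To make the JSON rules above more concrete, the following Python sketch reads the Parsing element and resolves each input name against a sample record. It only illustrates the rule format and is not how Feed Utility is implemented: the sample record, the helper names, and the assumption that nested fields are plain JSON objects addressed with '/' are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# The Parsing element from the JSON example above.
PARSING_XML = """
<Parsing type="json">
  <Field type="url">URL</Field>
  <Field type="ip">IP</Field>
  <Field type="context">GEO</Field>
  <Field type="md5">files/md5</Field>
</Parsing>
"""


def read_rules(parsing_xml):
    """Return (OUTPUT_NAME, VALUE_TYPE, INPUT_NAME) triples for each nested rule."""
    root = ET.fromstring(parsing_xml)
    return [(rule.tag, rule.get("type"), (rule.text or "").strip()) for rule in root]


def lookup(record, input_name):
    """Follow a '/'-separated field name through nested JSON objects."""
    value = record
    for key in input_name.split("/"):
        if not isinstance(value, dict) or key not in value:
            return None  # the field is absent from this record
        value = value[key]
    return value


# Hypothetical input record, invented for illustration only.
record = json.loads(
    '{"URL": "example.com/path", "IP": "192.0.2.1", "GEO": "us",'
    ' "files": {"md5": "d41d8cd98f00b204e9800998ecf8427e"}}'
)

rules = read_rules(PARSING_XML)
extracted = {
    input_name: (value_type, lookup(record, input_name))
    for _output_name, value_type, input_name in rules
}
```

Here every rule uses Field as its output name, so the sketch keys the extracted values by the original field names.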
The following example demonstrates parsing rule syntax for CSV input format:
<Feed>
  ...
  <Parsing type="csv" delimiter=";">
    <URL type="url">1</URL>
    <IP type="ip">2</IP>
    <GEO type="context">3</GEO>
    <MD5 type="md5">4</MD5>
  </Parsing>
  ...
</Feed>
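The CSV rules above can be read as a mapping from column numbers to named, typed output fields. The Python sketch below shows that mapping on a made-up input line. It assumes that column numbers start at 1 (as the example suggests) and that each line is one record; the function name and the sample data are hypothetical.

```python
import csv
import io

# (OUTPUT_NAME, VALUE_TYPE, column number) taken from the CSV example above.
RULES = [
    ("URL", "url", 1),
    ("IP", "ip", 2),
    ("GEO", "context", 3),
    ("MD5", "md5", 4),
]


def parse_csv_feed(text, delimiter=";"):
    """Split each line on the delimiter and pick columns by 1-based number."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = {}
        for output_name, value_type, column in RULES:
            if column <= len(row):  # ignore rules that point past the last column
                record[output_name] = (value_type, row[column - 1])
        records.append(record)
    return records


# Hypothetical input line, invented for illustration only.
sample = "example.com/path;192.0.2.1;us;d41d8cd98f00b204e9800998ecf8427e\n"
records = parse_csv_feed(sample)
```

Because CSV columns have no names of their own, the output names (URL, IP, GEO, MD5) come entirely from the rules.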
The following example demonstrates parsing rule syntax for XML input format:
<Feed>
  ...
  <Parsing type="xml" rootElement="Feeds/Example/Contents">
    <URL type="url">url</URL>
    <IP type="ip">ip</IP>
    <GEO type="context">context</GEO>
    <MD5 type="md5">md5_hash</MD5>
  </Parsing>
  ...
</Feed>
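As a rough illustration of the XML rules above, the following Python sketch locates the elements named by rootElement and then resolves each rule relative to them. The sample feed, the function name, and the assumption that every element matching rootElement holds one record are hypothetical; wildcards in rootElement are not handled here.

```python
import xml.etree.ElementTree as ET

# Hypothetical input feed, invented for illustration only.
SAMPLE_FEED = """
<Feeds>
  <Example>
    <Contents>
      <url>example.com/path</url>
      <ip>192.0.2.1</ip>
      <context>us</context>
      <md5_hash>d41d8cd98f00b204e9800998ecf8427e</md5_hash>
    </Contents>
  </Example>
</Feeds>
"""

ROOT_ELEMENT = "Feeds/Example/Contents"

# (OUTPUT_NAME, VALUE_TYPE, INPUT_NAME) taken from the XML example above.
RULES = [
    ("URL", "url", "url"),
    ("IP", "ip", "ip"),
    ("GEO", "context", "context"),
    ("MD5", "md5", "md5_hash"),
]


def parse_xml_feed(text, root_element, rules):
    """Apply each rule to the elements nested inside the root element path."""
    document = ET.fromstring(text)
    first, _, rest = root_element.partition("/")
    if document.tag != first:  # tag comparison is case sensitive
        return []
    containers = document.findall(rest) if rest else [document]
    records = []
    for container in containers:
        record = {}
        for output_name, value_type, input_name in rules:
            node = container.find(input_name)  # path relative to the root element
            if node is not None and node.text is not None:
                record[output_name] = (value_type, node.text.strip())
        records.append(record)
    return records


records = parse_xml_feed(SAMPLE_FEED, ROOT_ELEMENT, RULES)
```

ElementTree compares tag names case sensitively, which is consistent with the case-sensitive paths described for XML input feeds.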