Parsing rules

Parsing rules apply to custom feeds (feeds that are specified using the Path element). They define how Feed Utility must parse each feed.

Parsing rules are defined in the Parsing element of feed rules for a custom feed.

The following is an example of parsing rules for a custom feed. These rules specify that the input feed is in JSON format. An MD5 parsing rule is defined for the files/md5 field in the input feed. Values in this field will be parsed as MD5 hashes.

<Feed>
    ...
    <Parsing type="json">
        <MD5 type="MD5">files/md5</MD5>
    </Parsing>
    ...
</Feed>

Parsing element

The parent element, Parsing, contains all nested parsing rules. Its attributes define the input format.

This element has the following attributes:

type
Specifies the input feed type. This attribute can have the following values: json, stix, csv, xml.

delimiter
Specifies the delimiter for CSV input feeds. By default, this value is ';'.

rootElement
Specifies the root element path for XML input feeds.

You can use the '*' and '?' wildcard characters in the rootElement path. The '*' wildcard character substitutes for a group of characters. The '?' wildcard character substitutes for a single character.

A part of the rootElement path cannot consist of wildcard characters only. For example, "Feeds/*/Contents" is invalid.
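The wildcard rule above can be sketched as follows. This is an illustrative fragment, not a complete configuration file: the path "Feeds/Ex*/Contents" is a hypothetical value chosen for this sketch, and it is assumed to be accepted because its middle part mixes literal characters with a wildcard.

```xml
<!-- Assumed to be valid: the middle part of the path combines literal characters with '*' -->
<Parsing type="xml" rootElement="Feeds/Ex*/Contents">
    ...
</Parsing>

<!-- Invalid according to the rule above: the middle part of the path is a wildcard only -->
<Parsing type="xml" rootElement="Feeds/*/Contents">
    ...
</Parsing>
```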

The following example demonstrates how to use the Parsing element for an XML input feed. In this case, parsing rules will be applied to elements nested inside the Feeds > Example > Contents element.

<Feed>
    ...
    <Parsing type="xml" rootElement="Feeds/Example/Contents">
        ...
    </Parsing>
    ...
</Feed>

Individual parsing rules

Parsing rules for individual fields of an input feed must be nested inside the Parsing element. When Feed Utility processes the input feed, it creates the fields of the output feed according to these rules.

Each rule has the following format:

<OUTPUT_NAME type="VALUE_TYPE">INPUT_NAME</OUTPUT_NAME>

This format uses the following placeholders:

OUTPUT_NAME defines the name of the field in the output feed. For example, if OUTPUT_NAME is MD5, the field with this value will also be named MD5 in the output feed.
OUTPUT_NAME preserves nested fields. If a field specified in the INPUT_NAME is nested, the field in the output feed will also be nested. For example, if OUTPUT_NAME is MD5_HASH and INPUT_NAME is files/md5, the field in the output feed will be files/MD5_HASH.

For JSON input feeds, OUTPUT_NAME must always be Field. In this case, Feed Utility uses the field names from the original feed.
VALUE_TYPE is the type of the values stored in this field.
These values will be handled by Feed Utility according to the specified type. For example, if the output feed contains domain names and URLs, then it will be compiled to the binary format.

The following value types are possible:
- url—This value type is used for URLs.
- ip—This value type is used for IP addresses.
- md5—This value type is used for MD5 hashes.
- sha1—This value type is used for SHA1 hashes.
- sha256—This value type is used for SHA256 hashes.
- domain—This value type is used for domain names.
- context—This value type is used for context information.
INPUT_NAME is the name of the field in the input feed. It must be defined according to the input feed format:
- For JSON input feeds, INPUT_NAME must contain the name of the field from the input feed. Nested fields must be delimited by '/'.
- For CSV input feeds, INPUT_NAME must contain the column number.
- For XML input feeds, INPUT_NAME must contain a path to one of the nested elements of the root element. The root element is defined in the rootElement attribute of the Parsing element. The path is case sensitive.
- For STIX input feeds, the Parsing element must contain no parsing rules.
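Two of the cases described above can be sketched as follows: a rule that renames a nested field, and a Parsing element for a STIX input feed. This is an illustrative fragment based only on the descriptions above. The rootElement value is reused from the earlier example, the XML input format is an assumption made for the first rule, and whether an empty Parsing element must be written exactly this way is not confirmed here.

```xml
<!-- Nested field: INPUT_NAME is files/md5 and OUTPUT_NAME is MD5_HASH, -->
<!-- so, as described above, the field in the output feed is files/MD5_HASH -->
<Parsing type="xml" rootElement="Feeds/Example/Contents">
    <MD5_HASH type="md5">files/md5</MD5_HASH>
</Parsing>

<!-- STIX input feed: the Parsing element contains no parsing rules -->
<Parsing type="stix"></Parsing>
```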

The following example demonstrates parsing rule syntax for JSON input format:

<Feed>
    ...
    <Parsing type="json">
        <Field type="md5">files/md5</Field>
    </Parsing>
    ...
</Feed>

The following example demonstrates parsing rule syntax for CSV input format:

<Feed>
    ...
    <Parsing type="csv">
    </Parsing>
    ...
</Feed>
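As a minimal sketch of CSV parsing rules under stated assumptions, the fragment below combines the delimiter attribute with column-number rules. The comma delimiter, the output names MD5 and IP, and the column numbers 1 and 2 are hypothetical values chosen for illustration. This section does not state whether column numbering starts at 0 or 1.

```xml
<Feed>
    ...
    <!-- Hypothetical CSV feed that uses ',' instead of the default ';' delimiter -->
    <Parsing type="csv" delimiter=",">
        <!-- INPUT_NAME is a column number; the numbers below are placeholders -->
        <MD5 type="md5">1</MD5>
        <IP type="ip">2</IP>
    </Parsing>
    ...
</Feed>
```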

The following example demonstrates parsing rule syntax for XML input format:

<Feed>
    ...
    <Parsing type="xml" rootElement="...">
        <GEO type="context">context</GEO>
    </Parsing>
    ...
</Feed>
