Filtering rules are criteria that Feed Utility uses to filter the original feed files.
Filtering rules are specified for each feed in a Filters element. Each filtering rule is set in a Field element: the field name is specified in the name attribute, and the filtering criteria are specified in the value attribute. A field can have only one filtering rule associated with it; you cannot have two Field elements for one field.
The following is an example of filtering rules for a feed. These rules specify that the output feed must include only records that have the popularity field equal to 4 or 5 and the mask field containing .ru or .com.
<Feed>
  ...
  <Filters>
    <Field name="popularity" value="4;5"/>
    <Field name="mask" value=".ru;.com"/>
  </Filters>
  ...
</Feed>
Feed Utility ignores leading and trailing spaces and tabs in the value of the value attribute.
Only those records that match all the specified criteria are included in the output file. If a filtering criterion is specified for a field, and the field is missing from a record, Feed Utility will not include this record in the output file.
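The following Python sketch models the combination rule described above: a record is kept only if every filtered field is present and every rule matches. It is not Feed Utility's implementation; the record representation (a dict of field name to string value) and the names passes_filters and apply_filters are hypothetical, and the two predicates simply restate the Filters example above for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical record type: field name -> raw string value from the feed.
Record = Dict[str, str]
# Hypothetical predicate type: receives a field value, returns True on a match.
Predicate = Callable[[str], bool]


def passes_filters(record: Record, filters: Dict[str, Predicate]) -> bool:
    """Return True only if the record satisfies every filtering rule."""
    for field, predicate in filters.items():
        if field not in record:
            # A filtered field that is missing from the record excludes it.
            return False
        if not predicate(record[field]):
            return False
    return True


def apply_filters(records: List[Record], filters: Dict[str, Predicate]) -> List[Record]:
    """Keep the records that match all rules (logical AND across fields)."""
    return [r for r in records if passes_filters(r, filters)]


# The two rules from the Filters example, restated as predicates.
example_filters: Dict[str, Predicate] = {
    "popularity": lambda v: int(v.strip()) in (4, 5),
    "mask": lambda v: any(s in v for s in (".ru", ".com")),
}

records = [
    {"popularity": "5", "mask": "*.example.com"},  # matches both rules
    {"popularity": "3", "mask": "*.example.ru"},   # popularity does not match
    {"popularity": "4"},                            # mask field is missing
]
print(apply_filters(records, example_filters))
```

Only the first record survives: the second fails the popularity rule, and the third is dropped because a filtered field is absent.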
Defining filtering criteria for numeric values
Numeric values are integers. Decimal values are not supported.
You can define filtering criteria for numeric fields in the following ways:
value="*"
A field can have any value.
For example, <Field name="type" value="*"/> means that the type field can have any value.
value="%value%"
Exact numeric value. A field must be equal to %value%.
For example, <Field name="popularity" value="1"/> means that the popularity field must be equal to 1.
value="%value1%;%value2%"
One of several numeric values. A field must be equal to one of the specified numeric values (%value1% or %value2%). You can specify additional values using ";" as a delimiter.
For example, <Field name="popularity" value="1;3"/> means that the popularity field must be equal to 1 or 3, but not 2.
value="[%value1%;%value2%]"
Range of numeric values. A field must have a value in the specified range from %value1% to %value2%.
For example, <Field name="popularity" value="[1;3]"/> means that the popularity field must have a value from 1 to 3, that is, 1, 2, or 3.
value="[%value1%;*]" or value="[*;%value1%]"
Open range of numeric values. Same as a range, but an asterisk (*) in place of a bound leaves that end of the range unbounded.
For example, <Field name="popularity" value="[2;*]"/> means that the value of the popularity field must be greater than or equal to 2.
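A minimal Python sketch of how the numeric criterion forms above could be turned into a predicate. The function name parse_numeric_criterion is hypothetical; the sketch assumes both range bounds are inclusive (consistent with the examples above) and that a range contains exactly one ";" delimiter. It does not reproduce Feed Utility's actual parser or its error handling.

```python
from typing import Callable


def parse_numeric_criterion(value: str) -> Callable[[int], bool]:
    """Turn a numeric criterion string into a predicate (illustrative sketch)."""
    value = value.strip()  # surrounding spaces and tabs are ignored
    if value == "*":
        # Any value is accepted.
        return lambda n: True
    if value.startswith("[") and value.endswith("]"):
        # Range "[low;high]"; "*" on either side leaves that end open.
        low_s, high_s = (part.strip() for part in value[1:-1].split(";"))
        low = None if low_s == "*" else int(low_s)
        high = None if high_s == "*" else int(high_s)
        # Assumption: both bounds are inclusive.
        return lambda n: (low is None or n >= low) and (high is None or n <= high)
    # Exact value, or a ";"-separated list of allowed values.
    allowed = {int(part) for part in value.split(";")}
    return lambda n: n in allowed


for criterion in ("1", "1;3", "[1;3]", "[2;*]", "*"):
    matches = parse_numeric_criterion(criterion)
    print(criterion, [n for n in range(6) if matches(n)])
```

Because decimal values are not supported, the sketch parses every value with int() and would raise ValueError on input such as "1.5".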
Defining filtering criteria for strings
You can define filtering criteria for string fields in the following ways:
value="*"
A field can have any value.
For example, <Field name="mask" value="*"/> means that the mask field can have any value.
value="%string%"
A field must contain the specified string.
For example, <Field name="geo" value="ru"/> means that the value of the geo field must contain "ru".
value="%string1%;%string2%"
A field must contain at least one of the specified strings.
For example, <Field name="geo" value="ru;us"/> means that the value of the geo field must contain "ru", "us", or both.
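A comparable Python sketch for string criteria, treating each ";"-separated string as a substring to look for. The name parse_string_criterion is hypothetical, and case-sensitive matching is an assumption, since this section does not state whether matching ignores case.

```python
from typing import Callable


def parse_string_criterion(value: str) -> Callable[[str], bool]:
    """Turn a string criterion into a substring-matching predicate (sketch)."""
    value = value.strip()  # surrounding spaces and tabs are ignored
    if value == "*":
        # Any value is accepted.
        return lambda s: True
    # Drop empty parts (e.g. from a trailing ";") so they do not match everything.
    needles = [part for part in value.split(";") if part]
    # Assumption: matching is case-sensitive.
    return lambda s: any(needle in s for needle in needles)


geo_filter = parse_string_criterion("ru;us")
print(geo_filter("ru"), geo_filter("us,de"), geo_filter("de"))  # True True False
```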
Defining filtering criteria for dates
Date values in feeds are formatted either in the pattern "dd.MM.yyyy HH:mm" (for example, "26.04.2014 18:00") or in the pattern "M/d/y h:mm:ss tt" (for example, "4/26/2014 6:00:00 PM"). The "M/d/y h:mm:ss tt" pattern is used in P-SMS Trojan Data Feed.
Only the date part is used in filtering; the time of day is ignored.
You can define filtering criteria for fields with dates in the following ways:
value="*"
A field can have any value.
For example, <Field name="last_seen" value="*"/> means that the last_seen field can have any value.
value="%date%"
A field must match the specified date.
For example, <Field name="first_seen" value="14.10.2015"/> means that the first_seen field value must be 14 October 2015.
value="[%date1%;%date2%]"
A field must contain a date in the specified range.
For example, <Field name="first_seen" value="[01.02.2013;01.02.2015]"/> means that the first_seen field value must be from 1 February 2013 to 1 February 2015.
value="[%date1%;*]" or value="[*;%date1%]"
Open range of dates. Same as a date range (value="[%date1%;%date2%]"), but an asterisk (*) in place of a date leaves that end of the range unbounded.
For example, <Field name="first_seen" value="[*;10.12.2015]"/> means that the first_seen field value must be on or before 10 December 2015.
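The date rules can be sketched in Python the same way: both feed timestamp patterns are parsed, the time of day is discarded, and the resulting date is compared with the criterion. Assumptions: criterion dates use the dd.MM.yyyy form seen in the examples above, the year in "M/d/y" values has four digits as in the sample value, range bounds are inclusive, and the helper names (feed_date, criterion_date, parse_date_criterion) are hypothetical.

```python
from datetime import date, datetime
from typing import Callable, Optional

# The two feed timestamp patterns, in strptime notation.
# "%p" expects "AM"/"PM", which holds in the default C locale.
FEED_DATE_FORMATS = ("%d.%m.%Y %H:%M", "%m/%d/%Y %I:%M:%S %p")


def feed_date(raw: str) -> date:
    """Parse a feed timestamp and keep only its date part."""
    for fmt in FEED_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue  # try the next pattern
    raise ValueError(f"unrecognized feed date: {raw!r}")


def criterion_date(raw: str) -> Optional[date]:
    """Parse a criterion date (assumed dd.MM.yyyy); '*' means an open end."""
    raw = raw.strip()
    return None if raw == "*" else datetime.strptime(raw, "%d.%m.%Y").date()


def parse_date_criterion(value: str) -> Callable[[str], bool]:
    """Turn a date criterion into a predicate over raw feed timestamps."""
    value = value.strip()
    if value == "*":
        return lambda raw: True
    if value.startswith("[") and value.endswith("]"):
        start_s, end_s = value[1:-1].split(";")
        start, end = criterion_date(start_s), criterion_date(end_s)
        return lambda raw: (start is None or feed_date(raw) >= start) and (
            end is None or feed_date(raw) <= end
        )
    exact = criterion_date(value)
    return lambda raw: feed_date(raw) == exact


before_dec_10 = parse_date_criterion("[*;10.12.2015]")
print(before_dec_10("10.12.2015 23:59"))       # True: the time of day is ignored
print(before_dec_10("12/11/2015 9:00:00 AM"))  # False: 11 December 2015
```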
Excluding records with missing fields
In the original feed files, some records can have extra fields or lack some fields. For records with extra fields, Feed Utility includes only the fields that are specified in the RequiredFields element of the feed rules for that feed. Records that lack some fields are still included in the output if they contain at least one of the fields specified in the RequiredFields element; in the processed feed, such records do not contain the missing fields.
If you want to exclude records with missing fields from the output, you must create filtering rules for all required fields.
In the following example, Feed Utility includes records that have the popularity field, the mask field, or both.
<RequiredFields>popularity;mask</RequiredFields>
If you want Feed Utility to include only those records that have both popularity and mask, create a filtering rule for both fields. You can specify criteria for field values, or use an asterisk (*) to allow any value.
In the following example, only records that have both fields (mask and popularity) are included in the resulting feed.
<Filters>
  <Field name="popularity" value="*"/>
  <Field name="mask" value="*"/>
</Filters>
<RequiredFields>popularity;mask</RequiredFields>
You can specify exact criteria in the same manner. The following example instructs Feed Utility to include only records that have the popularity field with a value of 5 and the mask field with any value.
<Filters>
  <Field name="popularity" value="5"/>
  <Field name="mask" value="*"/>
</Filters>
<RequiredFields>popularity;mask</RequiredFields>
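As a rough illustration of how RequiredFields and presence-only ("*") filtering rules interact, here is a Python sketch using the popularity and mask fields from the examples in this section. The names project_required and process are hypothetical, only field presence is checked (value criteria are omitted for brevity), and the order of operations (check filtered fields on the original record, then keep only the RequiredFields) is an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical record type: field name -> raw string value.
Record = Dict[str, str]


def project_required(record: Record, required: List[str]) -> Optional[Record]:
    """Keep only RequiredFields; return None if none of them is present."""
    kept = {name: record[name] for name in required if name in record}
    return kept or None


def process(records: List[Record], required: List[str], filtered: List[str]) -> List[Record]:
    """Drop records missing a filtered field, then project onto RequiredFields."""
    out = []
    for record in records:
        # Assumption: presence checks run on the original record.
        if not all(name in record for name in filtered):
            continue
        projected = project_required(record, required)
        if projected is not None:
            out.append(projected)
    return out


feed_records = [
    {"popularity": "5", "mask": "a.com", "extra": "x"},  # extra field is dropped
    {"popularity": "3"},                                  # mask is missing
    {"other": "y"},                                       # no required field at all
]
required = ["popularity", "mask"]

# RequiredFields only: records with popularity, mask, or both.
print(process(feed_records, required, []))
# With "*" rules on both fields: only records that have both.
print(process(feed_records, required, ["popularity", "mask"]))
```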