About regular expressions

Regular expressions are used to parse incoming events processed by normalizing rules; regular expressions extract information to be checked in feeds and to be used in outgoing events. Regular expressions are defined in elements nested in the InputSettings > RegExps > Source elements of the Feed Service configuration file; normalizing rules are also defined in the InputSettings > RegExps > Source > NormalizingRules elements.

The original Feed Service configuration file of the distribution kit contains regular expressions that correspond to the format of the events used in the verification test.

After the verification test is performed, you may have to add some new regular expressions or change existing ones for use with specific event source software. For examples of regular expressons to be used for parsing events issued by popular devices, see section "Regular expressions for popular devices".

We recommend that you set regular expressions for extracting such data as the IP address and port of the event source and of the event target, the user name, and the date. Use these regular expressions to define the format of the outgoing events (OutputSettings > EventFormat).

When you add, remove, or rename regular expressions, make sure that you also change the name of the regular expression in the input_regexp_to_match attribute of the Feeds > Feed > Field elements. You may also have to change OutputSettings > EventFormat, which might use the names of the regular expressions, and the configuration files of the event target software.

Defining regular expressions

You can use any name for a regular expression except the following ones:

Regular expression parameters

In the configuration file you can set parameters for regular expressions. The parameters are set in attributes of XML elements. Two attributes are supported: concatenate and extract.

Compound values

The concatenate attribute is used to set a rule for creating a compound value from data extracted from an event. A rule refers to groups of extracted data by means of #N symbols, where N is the number of a group (starting from 1). If a backslash (\) precedes the hash symbol (#), the latter is not used in the number of a group; instead, the # is treated merely as a number sign.

The following example event is parsed:

url_1=http://domain test_event url_2=/page/mypage test

The regular expressions used and the results of parsing of the example event are provided in the table below.

Examples of applying regular expressions

Regular expression

Result of parsing

<RE_URL concatenate="#1#2">url_1=(.*?)\stest_event\surl_2=(.*?)\stest</RE_URL>

http://domain/page/mypage

<RE_URL concatenate="#2#1">url_1=(.*?)\stest_event\surl_2=(.*?)\stest</RE_URL>

/page/mypagehttp://domain

<RE_URL concatenate="#2_/_#1">url_1=(.*?)\stest_event\surl_2=(.*?)\stest</RE_URL>

/page/mypage_/_http://domain

If no concatenation rule is set or the value of the concatenate attribute is empty, and the regular expression contains more than one group, the values of the groups are concatenated in the order in which they appear in the regular expression.

If the concatenate attribute contains more groups than the regular expression contains, the extra groups will be ignored and will be substituted with the corresponding #N text.

Event being parsed:

url_1=http://domain test_event url_2=/page/my_page test

Regular expression used:

<RE_URL concatenate="#1#2#3">url_1=(.*?)\stest_event\surl_2=(.*?)\stest</RE_URL>

Result of parsing:

http://domain/page/my_page#3

Multiple matching

When parsing an event by using a regular expression, it is possible to extract all values that match the regular expression. For this purpose, set the value of the extract attribute to "all". If this value is set to "first" or the attribute is not specified, only the first value that matches the regular expression will be extracted.

For every matched value a separate detection event is generated. If the detection process does not affect a certain event field, the value of this field in the output event is set to a hyphen (-).

Event being parsed:

ip1=12.12.12.12 ip2=23.23.23.23 hash1=abc hash2=cde user1=N1 user2=N2

Configuration file elements:

<RegExps>

<Source id="default">

<RE_IP extract="all">...</RE_IP>

<RE_HASH extract="all">...</RE_HASH>

<RE_USER extract="first">...</RE_USER>

</Source>

</RegExps>

<EventFormat>ip=%RE_IP% hash=%RE_HASH% user=%RE_USER% %FeedContext%</EventFormat>

Available feed records:

IP = 12.12.12.12

IP = 23.23.23.23

hash = cde

Detection events generated:

ip=12.12.12.12 hash=- user= N1 <context for 12.12.12.12>

ip=23.23.23.23 hash=- user= N1 <context for 23.23.23.23>

ip=- hash=cde user=N1 <context for cde>

Specifying characters by their hexadecimal code

Feed Service uses regular expressions that conform to PCRE syntax. This syntax allows specifying a character by its code in several ways.

Feed Service does not support specifying a character in \x{hhh..} format. Instead, specify a character by its code in the following way: \uhhhh, where hhhh is the hexadecimal code of the character. For example, you cannot use a ([\x{00a1}-\x{ffff}]) expression, but you can use a ([\u00a1-\uffff]) expression.

Optimization of regular expressions

You can optimize regular expressions to prevent backtracking that interfere with matching a string.

To optimize regular expressions, use the following rules:

Page top