About regular expressions

February 27, 2024

ID 171632

This section describes regular expressions and provides information about using them.

About regular expressions

Regular expressions are used to parse incoming events processed by normalization rules. They extract information to be checked in feeds and to be used in outgoing events.

The preset regular expressions correspond to the format of the events used in the verification test.

After the verification test is performed, you may have to add some new regular expressions or change existing ones for use with specific event source software. For examples of regular expressions to be used for parsing events issued by popular devices, see section "Regular expressions for popular devices".

We recommend that you define regular expressions that extract data such as the IP address and port of the event source and of the event target, the user name, and the date. You can then use these regular expressions to define the format of outgoing events.

About regular expression names

You can use any name for a regular expression except the following ones:

  • SourceId
  • MatchedIndicator
  • RecordContext
  • Category
  • ActionableFields
  • Confidence
  • IndicatorInfo

Compound values

The concatenate attribute sets a rule for building a compound value from the data extracted from an event. The rule refers to groups of extracted data by using #N tokens, where N is the number of a group (starting from 1). If a backslash (\) precedes the hash symbol (#), the hash symbol does not start a group reference; it is treated as a literal number sign.

The following example event is parsed:

url_1=http://domain test_event url_2=/page/mypage test

The regular expressions used and the results of parsing the example event are listed below.

Examples of applying regular expressions

Regular expression used:

<RE_URL concatenate="#1#2">url_1=(.*?)\stest_event\surl_2=(.*?)\stest</RE_URL>

Result of parsing:

http://domain/page/mypage

Regular expression used:

<RE_URL concatenate="#2#1">url_1=(.*?)\stest_event\surl_2=(.*?)\stest</RE_URL>

Result of parsing:

/page/mypagehttp://domain

Regular expression used:

<RE_URL concatenate="#2_/_#1">url_1=(.*?)\stest_event\surl_2=(.*?)\stest</RE_URL>

Result of parsing:

/page/mypage_/_http://domain

If no concatenation rule is set or the value of the concatenate attribute is empty, and the regular expression contains more than one group, the values of the groups are concatenated in the order in which they appear in the regular expression.

If the concatenate attribute refers to more groups than the regular expression contains, the references to the nonexistent groups are not resolved and appear in the result as the literal #N text.

Event being parsed:

url_1=http://domain test_event url_2=/page/my_page test

Regular expression used:

<RE_URL concatenate="#1#2#3">url_1=(.*?)\stest_event\surl_2=(.*?)\stest</RE_URL>

Result of parsing:

http://domain/page/my_page#3
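
The concatenation rules described above can be sketched in Python. This is only an emulation of the documented behavior, not the Kaspersky CyberTrace Service implementation; the function name apply_concatenate is hypothetical, and the handling of multi-digit group numbers (#12 read as group 12) is an assumption.

```python
import re

def apply_concatenate(pattern, rule, event):
    """Emulate the documented concatenate behavior (illustrative only)."""
    match = re.search(pattern, event)
    if match is None:
        return None
    groups = match.groups()
    if not rule:
        # No rule or an empty rule: join the groups in order of appearance.
        return "".join(g or "" for g in groups)

    def substitute(token):
        if token.group(1) is not None:
            # "\#" is an escaped hash and yields a literal number sign.
            return "#"
        index = int(token.group(2))
        if 1 <= index <= len(groups):
            return groups[index - 1] or ""
        # Reference to a group that does not exist: keep the "#N" text.
        return token.group(0)

    return re.sub(r"(\\#)|#(\d+)", substitute, rule)

PATTERN = r"url_1=(.*?)\stest_event\surl_2=(.*?)\stest"
EVENT = "url_1=http://domain test_event url_2=/page/mypage test"

print(apply_concatenate(PATTERN, "#2_/_#1", EVENT))  # /page/mypage_/_http://domain
print(apply_concatenate(PATTERN, "#1#2#3", EVENT))   # http://domain/page/mypage#3
```

A helper like this can be used to check a concatenate rule against a sample event before adding it to the configuration file.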

Multiple matching

When parsing an event by using a regular expression, it is possible to extract all values that match the regular expression. For this purpose, set the value of the extract attribute to "all". If this value is set to "first" or the attribute is not specified, only the first value that matches the regular expression will be extracted.

For every matched value a separate detection event is generated. If the detection process does not affect a certain event field, the value of this field in the output event is set to a hyphen (-).

Event being parsed:

ip1=12.12.12.12 ip2=23.23.23.23 hash1=abc hash2=cde user1=N1 user2=N2

Configuration file elements:

<RegExps>

<Source id="default">

<RE_IP extract="all">...</RE_IP>

<RE_HASH extract="all">...</RE_HASH>

<RE_USER extract="first">...</RE_USER>

</Source>

</RegExps>

<EventFormat>ip=%RE_IP% hash=%RE_HASH% user=%RE_USER% %FeedContext%</EventFormat>

Available feed records:

IP = 12.12.12.12

IP = 23.23.23.23

hash = cde

Detection events generated:

ip=12.12.12.12 hash=- user=N1 <context for 12.12.12.12>

ip=23.23.23.23 hash=- user=N1 <context for 23.23.23.23>

ip=- hash=cde user=N1 <context for cde>
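
The difference between the "all" and "first" values of the extract attribute can be illustrated with a short Python sketch. The patterns below are hypothetical stand-ins, because the patterns in the configuration file elements above are elided, and the extract function is an illustration rather than the product's own logic.

```python
import re

EVENT = "ip1=12.12.12.12 ip2=23.23.23.23 hash1=abc hash2=cde user1=N1 user2=N2"

# Hypothetical patterns, chosen only to fit the example event.
IP_PATTERN = r"ip\d+=(\d{1,3}(?:\.\d{1,3}){3})"
HASH_PATTERN = r"hash\d+=(\w+)"
USER_PATTERN = r"user\d+=(\w+)"

def extract(pattern, text, mode="first"):
    """Return every match for mode "all", otherwise only the first match."""
    if mode == "all":
        return re.findall(pattern, text)
    found = re.search(pattern, text)
    return [found.group(1)] if found else []

print(extract(IP_PATTERN, EVENT, "all"))    # ['12.12.12.12', '23.23.23.23']
print(extract(HASH_PATTERN, EVENT, "all"))  # ['abc', 'cde']
print(extract(USER_PATTERN, EVENT))         # ['N1']
```

With "all", both IP addresses and both hashes are available for checking against the feeds; with "first", only N1 is taken as the user name, which is why every detection event in the example carries user=N1.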

Specifying characters by their hexadecimal code

Kaspersky CyberTrace Service uses regular expressions that conform to PCRE syntax. This syntax allows specifying a character by its code in several ways.

Kaspersky CyberTrace Service does not support specifying a character in \x{hhh..} format. Instead, specify a character by its code in the following way: \uhhhh, where hhhh is the hexadecimal code of the character. For example, you cannot use a ([\x{00a1}-\x{ffff}]) expression, but you can use a ([\u00a1-\uffff]) expression.
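
If existing expressions already use the \x{hhh..} form, the conversion to \uhhhh can be automated. The following Python sketch is one possible helper; the name to_u_escapes is hypothetical. It assumes that only codes of up to four hexadecimal digits need to be converted (longer codes are left unchanged, because only the four-digit \uhhhh form is described above), and it does not handle the case of an escaped backslash directly before x.

```python
import re

def to_u_escapes(pattern):
    r"""Rewrite \x{h..} escapes (1 to 4 hex digits) as \uhhhh."""
    def replace(found):
        # Pad the code to four digits, as the \uhhhh form requires.
        return "\\u" + found.group(1).lower().zfill(4)
    return re.sub(r"\\x\{([0-9A-Fa-f]{1,4})\}", replace, pattern)

print(to_u_escapes(r"([\x{00a1}-\x{ffff}])"))  # ([\u00a1-\uffff])
```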

Optimization of regular expressions

You can optimize regular expressions to prevent backtracking that interferes with matching a string.

To optimize regular expressions, use the following rules:

  • Use possessive quantifiers (++, *+).
  • If possible, use a non-capturing group, (?:...), for outer brackets.
  • Use alternation as little as possible, and try to find matches at the end of the string. The alternation operator has the lowest precedence of all regular expression operators.
  • Use the anchors (^ and $), which match the start and the end of the string.
  • Use an atomic group. Atomic groups automatically discard all backtracking positions remembered by any tokens inside the group. The syntax is (?> ...).
  • In long regular expressions, try to avoid exponentially increasing the amount of backtracking. An example such as (qwerty.*)* is not recommended.
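
Some of the rules above can be tried out in Python. Note that Python's re module is not the PCRE engine used by Kaspersky CyberTrace Service, so this is only a sketch of the general techniques; possessive quantifiers and atomic groups are left out because Python supports them only from version 3.11. The variable names and sample strings are illustrative.

```python
import re

# A nested quantifier such as (qwerty.*)* lets the engine split the input
# in many ways. For whole-line matching, the flat pattern below accepts the
# same strings without the nested repetition.
NESTED = re.compile(r"(qwerty.*)*")
FLAT = re.compile(r"(?:qwerty.*)?")

samples = ["", "qwerty", "qwerty 123 qwerty", "abc", "qwert"]
same_result = all(
    bool(NESTED.fullmatch(s)) == bool(FLAT.fullmatch(s)) for s in samples
)

# A non-capturing group groups tokens without storing the matched text.
capturing_count = re.compile(r"(ab)+").groups        # one stored group
non_capturing_count = re.compile(r"(?:ab)+").groups  # no stored groups

# Anchors reject a candidate string early instead of scanning every offset.
ANCHORED = re.compile(r"^user=(\w+)$")
anchored_hit = ANCHORED.search("user=N1")
anchored_miss = ANCHORED.search("prefix user=N1 suffix")

print(same_result, capturing_count, non_capturing_count)
```

Comparing a rewritten expression against the original on a set of sample events, as done here with same_result, is a simple way to confirm that an optimization did not change what the expression matches.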
