This section describes regular expressions and provides information about using them.
About regular expressions
Regular expressions are used to parse incoming events processed by normalization rules. They extract information to be checked in feeds and to be used in outgoing events.
The preset regular expressions correspond to the format of the events used in the verification test.
After the verification test is performed, you may have to add some new regular expressions or change existing ones for use with specific event source software. For examples of regular expressions to be used for parsing events issued by popular devices, see section "Regular expressions for popular devices".
We recommend that you set regular expressions for extracting data such as the IP address and port of the event source and of the event target, the user name, and the date. Use these regular expressions to define the format of the outgoing events.
About regular expression names
You can use any name for a regular expression except the following ones:
SourceId
MatchedIndicator
RecordContext
Category
ActionableFields
Confidence
IndicatorInfo
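The restriction above can be checked before a name is added to a configuration file. The following Python snippet is only an illustrative sketch: the function name is_allowed_regexp_name is hypothetical, and the exact, case-sensitive comparison is an assumption, because this section does not state whether the reserved names are case-sensitive.

```python
# Names that this section lists as unavailable for regular expressions.
RESERVED_NAMES = frozenset({
    "SourceId",
    "MatchedIndicator",
    "RecordContext",
    "Category",
    "ActionableFields",
    "Confidence",
    "IndicatorInfo",
})


def is_allowed_regexp_name(name: str) -> bool:
    """Return True if the name is not one of the reserved names.

    Assumption: the comparison is exact and case-sensitive.
    """
    return name not in RESERVED_NAMES
```

For example, is_allowed_regexp_name("RE_IP") returns True, while is_allowed_regexp_name("Confidence") returns False.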
Compound values
The concatenate attribute is used to set a rule for creating a compound value from data extracted from an event. A rule refers to groups of extracted data by using #N symbols, where N is the number of a group (starting from 1). If a backslash (\) precedes the hash symbol (#), the hash symbol does not start a group reference; instead, it is treated as a literal number sign.
The following example event is parsed:
url_1=http://domain test_event url_2=/page/mypage test
The regular expressions used and the results of parsing of the example event are provided in the table below.
Examples of applying regular expressions
Regular expression | Result of parsing |
| http://domain/page/mypage |
| /page/mypagehttp://domain |
| /page/mypage_/_http://domain |
If no concatenation rule is set or the value of the concatenate attribute is empty, and the regular expression contains more than one group, the values of the groups are concatenated in the order in which they appear in the regular expression.
If the concatenate attribute refers to more groups than the regular expression contains, the references to the nonexistent groups are not substituted and remain in the result as the literal #N text.
Event being parsed:
url_1=http://domain test_event url_2=/page/my_page test
Regular expression used:
<RE_URL concatenate="#1#2#3">url_1=(.*?)\stest_event\surl_2=(.*?)\stest</RE_URL>
Result of parsing:
http://domain/page/my_page#3
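The concatenation rules described in this subsection can be emulated outside the product to check what a rule would produce. The following Python sketch is not the Kaspersky CyberTrace Service implementation. The function name apply_concatenate is hypothetical, Python's re module stands in for PCRE, and two behaviors are assumptions: N is read as the longest run of digits after the hash symbol, and a regular expression without groups yields the whole match.

```python
import re

# A rule token is either an escaped hash (\#) or a group reference (#N).
_RULE_TOKEN = re.compile(r"\\#|#(\d+)")


def apply_concatenate(pattern, event, rule=None):
    """Build a compound value from the groups that pattern extracts from event."""
    match = re.search(pattern, event)
    if match is None:
        return None
    groups = match.groups()
    if not groups:
        # Assumption: without groups, the whole match is the extracted value.
        return match.group(0)
    if not rule:
        # No rule or an empty rule: join the groups in order of appearance.
        return "".join(value or "" for value in groups)

    def substitute(token):
        if token.group(0) == "\\#":
            return "#"  # escaped hash is a literal number sign
        index = int(token.group(1))
        if 1 <= index <= len(groups):
            return groups[index - 1] or ""
        return token.group(0)  # nonexistent group: keep the literal #N text

    return _RULE_TOKEN.sub(substitute, rule)


PATTERN = r"url_1=(.*?)\stest_event\surl_2=(.*?)\stest"
EVENT = "url_1=http://domain test_event url_2=/page/my_page test"

result = apply_concatenate(PATTERN, EVENT, "#1#2#3")
```

With the event and the regular expression from the example above, result is http://domain/page/my_page#3. Rules such as "#2#1" or "#2_/_#1" (illustrative values, not taken from the product documentation) place the second group first and insert the literal text between the groups.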
Multiple matching
When parsing an event by using a regular expression, it is possible to extract all values that match the regular expression. For this purpose, set the value of the extract attribute to "all". If this value is set to "first" or the attribute is not specified, only the first value that matches the regular expression will be extracted.
For every matched value a separate detection event is generated. If the detection process does not affect a certain event field, the value of this field in the output event is set to a hyphen (-).
Event being parsed:
ip1=12.12.12.12 ip2=23.23.23.23 hash1=abc hash2=cde user1=N1 user2=N2
Configuration file elements:
<RegExps>
  <Source id="default">
    <RE_IP extract="all">...</RE_IP>
    <RE_HASH extract="all">...</RE_HASH>
    <RE_USER extract="first">...</RE_USER>
  </Source>
</RegExps>
<EventFormat>ip=%RE_IP% hash=%RE_HASH% user=%RE_USER% %FeedContext%</EventFormat>
Available feed records:
IP = 12.12.12.12
IP = 23.23.23.23
hash = cde
Detection events generated:
ip=12.12.12.12 hash=- user=N1 <context for 12.12.12.12>
ip=23.23.23.23 hash=- user=N1 <context for 23.23.23.23>
ip=- hash=cde user=N1 <context for cde>
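The difference between the "all" and "first" values of the extract attribute can be illustrated with a small Python sketch. It covers only the extraction step and does not model feed matching or the generation of detection events. The function name extract_values and the three patterns are hypothetical, because the regular expressions in the example above are elided.

```python
import re


def extract_values(pattern, event, extract="first"):
    """Return the values of group 1 that pattern finds in event.

    extract="all" returns every match; extract="first" (the default,
    as when the attribute is not specified) returns only the first one.
    """
    if extract not in ("first", "all"):
        raise ValueError("extract must be 'first' or 'all'")
    values = [match.group(1) for match in re.finditer(pattern, event)]
    return values if extract == "all" else values[:1]


EVENT = "ip1=12.12.12.12 ip2=23.23.23.23 hash1=abc hash2=cde user1=N1 user2=N2"

# Hypothetical patterns chosen only to fit the example event.
ips = extract_values(r"ip\d+=(\S+)", EVENT, extract="all")
hashes = extract_values(r"hash\d+=(\S+)", EVENT, extract="all")
users = extract_values(r"user\d+=(\S+)", EVENT, extract="first")
```

Under these assumptions, ips holds both IP addresses, hashes holds both hash values, and users holds only N1, which is consistent with the single user value that appears in every detection event of the example.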
Specifying characters by their hexadecimal code
Kaspersky CyberTrace Service uses regular expressions that conform to PCRE syntax. This syntax allows specifying a character by its code in several ways.
Kaspersky CyberTrace Service does not support specifying a character in \x{hhh..} format. Instead, specify a character by its code in the following way: \uhhhh, where hhhh is the hexadecimal code of the character. For example, you cannot use a ([\x{00a1}-\x{ffff}]) expression, but you can use a ([\u00a1-\uffff]) expression.
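Existing patterns that use the \x{hhh..} notation can be rewritten mechanically. The following Python sketch shows one possible way to do this. The helper name to_u_escapes is hypothetical, and rejecting codes above FFFF is an assumption based on the four-digit \uhhhh form described above. The helper does not distinguish an escaped backslash followed by x{...}, so review the converted pattern before use.

```python
import re

# Matches \x{h...} with one or more hexadecimal digits inside the braces.
_BRACED_HEX = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def to_u_escapes(pattern):
    """Rewrite every \\x{hhh..} escape in pattern as a four-digit \\uhhhh escape."""

    def convert(match):
        code = int(match.group(1), 16)
        if code > 0xFFFF:
            # Assumption: only four-digit codes fit the \uhhhh form.
            raise ValueError("code point does not fit \\uhhhh: %s" % match.group(0))
        return "\\u%04x" % code

    return _BRACED_HEX.sub(convert, pattern)


converted = to_u_escapes(r"([\x{00a1}-\x{ffff}])")
```

Here converted is the string ([\u00a1-\uffff]), which is the supported form from the example above.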
Optimization of regular expressions
You can optimize regular expressions to prevent backtracking that interferes with matching a string.
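To optimize regular expressions, use the following rules:
Use possessive quantifiers (++, *+).
Use non-capturing groups (?:) with outer brackets.
Use anchors (^, $) that match the starting and the ending position within the string.
Use atomic groups (?> ...).
Avoid nested quantifiers. For example, using (qwerty.*)* is not recommended.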
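The effect of these constructs can be tried out in Python, whose re module is used here only as a stand-in for the PCRE engine. The sketch below is illustrative: all constant names are hypothetical, and possessive quantifiers and atomic groups are compiled only on Python 3.11 or later, where the re module accepts them. It rewrites the nested quantifier (qwerty.*)* as a single optional non-capturing group between anchors, and it shows that a possessive quantifier or an atomic group does not give back characters it has consumed.

```python
import re
import sys

# Nested quantifier of the kind that is not recommended.
NESTED = re.compile(r"^(qwerty.*)*$")

# One optional non-capturing group between anchors. Because .* already
# consumes any repeated "qwerty", it accepts the same strings.
FLAT = re.compile(r"^(?:qwerty.*)?$")


def same_verdict(text):
    """Return True if both patterns agree on whether text matches."""
    return (NESTED.match(text) is None) == (FLAT.match(text) is None)


# A greedy quantifier backtracks: \d+ gives up one digit so that \d can match.
GREEDY = re.compile(r"^\d+\d")

# Possessive quantifiers and atomic groups never give characters back.
HAS_POSSESSIVE = sys.version_info >= (3, 11)
POSSESSIVE = re.compile(r"^\d++\d") if HAS_POSSESSIVE else None
ATOMIC = re.compile(r"^(?>\d+)\d") if HAS_POSSESSIVE else None
```

For the string 123, GREEDY finds a match after backtracking, while POSSESSIVE and ATOMIC (where available) fail immediately because the first part keeps all three digits. This is the behavior that limits backtracking, so use these constructs only where the consumed text never has to be returned to a later part of the expression.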