Contents
Regex security model
The Regex security model lets you implement text data validation based on statically defined regular expressions.
A PSL file containing a description of the Regex security model is located in the KasperskyOS SDK at the following path:
toolchain/include/nk/regex.psl
Regex security model object
The regex.psl
file contains a declaration that creates a Regex security model object named re
. Consequently, inclusion of the regex.psl
file into the solution security policy description will create a Regex security model object by default.
A Regex security model object does not have any parameters.
A Regex security model object can be covered by a security audit. In this case, you also need to define the audit conditions specific to the Regex security model. To do so, use the following constructs in the audit configuration description:
emit : ["match"]
– the audit is performed if thematch
method is called.emit : ["select"]
– the audit is performed if theselect
method is called.emit : ["match", "select"]
– the audit is performed if thematch
orselect
method is called.emit : []
– the audit is not performed.
It is necessary to create additional objects of the Regex security model in the following cases:
- You need to configure a security audit differently for different objects of the Regex security model (for example, you can apply different audit profiles or different audit configurations of the same profile for different objects).
- You need to distinguish between calls of methods provided by different objects of the Regex security model (audit data includes the name of the security model method and the name of the object that provides this method, so you can verify that the method of a specific object was called).
Regex security model methods
The Regex
security model contains the following expressions:
match {text :
<Text
>, pattern :
<Text
>}
Returns a value of the
Boolean
type. If the specifiedtext
matches thepattern
regular expression, it returnstrue
. Otherwise it returnsfalse
.Example:
assert (re.match {text : message.text, pattern : "[0-9]*"})
select {text :
<Text
>}
It is intended to be used as an expression that verifies fulfillment of the conditions in the
choice
construct (for details on thechoice
construct, see "Binding methods of security models to security events"). It checks whether the specifiedtext
matches regular expressions. Depending on the results of this check, various options for security event handling can be performed.Example:
choice (re.select {text : "hello world"}) {
"hello\ .*": grant ()
".*world" : grant ()
_ : deny ()
}
Syntax of regular expressions of the Regex security model
A regular expression for the match
method of the Regex security model can be written in two ways: within the multi-line regex
block or as a text literal.
When writing a regular expression as a text literal, all backslash instances must be doubled.
For example, the following two regular expressions are identical:
// Regular expression within the multi-line regex block
{ pattern:
```regex
Hello\ world\!
```
, text: "Hello world!"
}
// Regular expression as a text literal (doubled backslash)
{ pattern: "Hello\\ world\\!"
, text: "Hello world!"
}
Regular expressions for the select
method of the Regex security model are written as text literals with a double backslash.
A regular expression is defined as a template string and may contain the following:
- Literals (ordinary characters)
- Metacharacters (characters with special meanings)
- White-space characters
- Character sets
- Character groups
- Operators for working with characters
Regular expressions are case sensitive.
Literals and metacharacters in regular expressions
- A literal can be any ASCII character except the metacharacters
.()*&|!?+[]\
and a white-space character. (Unicode characters are not supported.)For example, the regular expression
KasperskyOS
corresponds to the textKasperskyOS
. - Metacharacters have special meanings that are presented in the table below.
Special meanings of metacharacters
Metacharacter
Special meaning
[]
Square brackets (braces) denote the beginning and end of a set of characters.
()
Round brackets (parentheses) denote the beginning and end of a group of characters.
*
An asterisk denotes an operator indicating that the character preceding it can repeat zero or more times.
+
A plus sign denotes an operator indicating that the character preceding it can repeat one or more times.
?
A question mark denotes an operator indicating that the character preceding it can repeat zero or one time.
!
An exclamation mark denotes an operator excluding the subsequent character from the list of valid characters.
|
A vertical line denotes an operator for selection between characters (logically close to the "OR" conjunction).
&
An ampersand denotes an operator for overlapping of multiple conditions (logically close to the "AND" conjunction).
.
A dot denotes any character.
For example, the regular expression
K.S
corresponds to the sequences of charactersKOS
,KoS
,KES
and a multitude of other sequences consisting of three characters that begin withK
and end withS
, and in which the second character can be any character: literal, metacharacter, or dot.\
\
<metaSymbol
>A backslash indicates that the metacharacter that follows it will lose its special meaning and instead be interpreted as a literal. A backslash placed before a metacharacter is known as an escape character.
For example, a regular expression that consists of a dot metacharacter (
.
) corresponds to any character. However, a regular expression that consists of a backslash with a dot (\.
) corresponds to only a dot character.Accordingly, a backslash also escapes another subsequent backslash. For example, the regular expression
C:\\Users
corresponds to the sequence of charactersC:\Users
. - The
^
and$
characters are not used to designate the start and end of a line.
White-space characters in regular expressions
- A space character has an ASCII code of
20
in a hexadecimal number system and has an ASCII code of40
in an octal number system. Although a space character does not infer any special meaning, it must be escaped to avoid any ambiguous interpretation by the regular expression interpreter.For example, the regular expression
Hello\ world
corresponds to the sequence of charactersHello world
. \r
Carriage return character.
\n
Line break character.
\t
Horizontal tab character.
Definition of a character based on its octal or hexadecimal code in regular expressions
\x{
<hex
>}
Definition of a character using its
hex
code from the ASCII character table. The character code must be less than0x100
.For example, the regular expression
Hello\x{20}world
corresponds to the sequence of charactersHello world
.\o{
<octal
>}
Definition of a character using its
octal
code from the ASCII character table. The character code must be less than0o400
.For example, the regular expression
\o{75}
corresponds to the=
character.
Sets of characters in regular expressions
A character set is defined within square brackets []
as a list or range of characters. A character set tells the regular expression interpreter that only one of the characters listed in the set or range of characters can be at this specific location in a sequence of characters. A character set cannot be left blank.
[
<BracketSpec
>]
– character set.One character corresponds to any character from the
BracketSpec
character set.For example, the regular expression
K[OE]S
corresponds to the sequences of charactersKOS
andKES
.[^
<BracketSpec
>]
– inverted character set.One character corresponds to any character that is not in the
BracketSpec
character set.For example, the regular expression
K[^OE]S
corresponds to the sequences of charactersKAS
,K8S
and any other sequences consisting of three characters that begin withK
and end withS
, excludingKOS
andKES
.
The BracketSpec
character set can be listed explicitly or can be defined as a range of characters. When defining a range of characters, the first and last character in the set must be separated with a hyphen.
[
<Digit1
>-
<DigitN
>]
Any number from the range
Digit1
,Digit2
, ... ,DigitN
.For example, the regular expression
[0-9]
corresponds to any numerical digit. The regular expressions[0-9]
and[0123456789]
are identical.Please note that a range is defined by one character before a hyphen and one character after the hyphen. The regular expression
[1-35]
corresponds only to the characters1
,2
,3
and5
, and does not represent the range of numbers from1
to35
.[
<Letter1
>-
<LetterN
>]
Any English letter from the range
Letter1
,Letter2
, ... ,LetterN
(these letters must be in the same case).For example, the regular expression
[a-zA-Z]
corresponds to all letters in uppercase and lowercase from the ASCII character table.
The ASCII code for the upper boundary character of a range must be higher than the ASCII code for the lower boundary character of the range.
For example, the regular expressions [5-2]
or [z-a]
are invalid.
The hyphen (minus) -
character is interpreted as a special character only within a set of characters. Outside of a character set, a hyphen is a literal. For this reason, the \
metacharacter does not have to precede a hyphen. To use a hyphen as a literal within a character set, it must be indicated first or last in the set.
Examples:
The regular expressions [-az]
and [az-]
correspond to the characters a
, z
and -
.
The regular expression [a-z]
corresponds to any of the 26 English letters from a
to z
in lowercase.
The regular expression [-a-z]
corresponds to any of the 26 English letters from a
to z
in lowercase and -
.
The circumflex (caret character) ^
is interpreted as a special character only within a character set when it is located directly after an opening square bracket. Outside of a character set, a circumflex is a literal. For this reason, the \
metacharacter does not have to precede a circumflex. To use a circumflex as a literal within a character set, it must be indicated in a location other than first in the set.
Examples:
The regular expression [0^9]
correspond to the characters 0
, 9
and ^
.
The regular expression [^09]
corresponds to any character except 0
and 9
.
Within a character set, the metacharacters *.&|!?+
lose their special meaning and are instead interpreted as literals. Therefore, they do not have to be preceded by the \
metacharacter. The backslash \
retains its special meaning within a character set.
For example, the regular expressions [a.]
and [a\.]
are identical and correspond to the character a
and a dot interpreted as a literal.
Groups of characters and operators in regular expressions
A character group uses parentheses ()
to distinguish its portion (subexpression) within a regular expression. Groups are normally used to allocate subexpressions as operands. Groups can be embedded into each other.
Operators are applied to more than one character in a regular expression only if they are immediately before or after the definition of a set or group of characters. If this is the case, the operator is applied to the entire group or set of characters.
The syntax contains definitions of the following operators (listed in descending order of their priority):
!
<Expression
>, whereExpression
can be a character, set or group of characters.This operator excludes the
Expression
from the list of valid expressions.Examples:
The regular expression
K!OS
corresponds to the sequences of charactersKoS
,KES
, and a multitude of other sequences that consist of three characters and begin withK
and end withS
, excludingKOS
.The regular expression
K!(OS)
corresponds to the sequences of charactersKos
,KES
,KOT
, and a multitude of other sequences that consist of three characters and begin withK
, excludingKOS
.The regular expression
K![OE]S
corresponds to the sequences of charactersKoS
,KeS
,K;S
, and a multitude of other sequences that consist of three characters and begin withK
and end withS
, excludingKOS
andKES
.- <
Expression
>*
, whereExpression
can be a character, set or group of characters.This operator means that the
Expression
may occur in the specific position zero or more times.Examples:
The regular expression
0-9*
corresponds to the sequences of characters0-
,0-9
,0-99
, ... .The regular expression
(0-9)*
corresponds to the empty sequence""
and the sequences of characters0-9
,0-90-9
, ... .The regular expression
[0-9]*
corresponds to the empty sequence""
and any non-empty sequence of numbers. - <
Expression
>+
, whereExpression
can be a character, set or group of characters.This operator means that the
Expression
may occur in the specific position one or more times.Examples:
The regular expression
0-9+
corresponds to the sequences of characters0-9
,0-99
,0-999
, ... .The regular expression
(0-9)+
corresponds to the sequences of characters0-9
,0-90-9
, ... .The regular expression
[0-9]+
corresponds to any non-empty sequence of numbers. - <
Expression
>?
, whereExpression
can be a character, set or group of characters.This operator means that the
Expression
may occur in the specific position zero or one time.Examples:
The regular expression
https?://
corresponds to the sequences of charactershttp://
andhttps://
.The regular expression
K(aspersky)?OS
corresponds to the sequences of charactersKOS
andKasperskyOS
. - <
Expression1
><Expression2
> – concatenation.Expression1
andExpression2
can be characters, sets or groups of characters.This operator does not have a specific designation. In the resulting expression,
Expression2
followsExpression1
.For example, concatenation of the sequences of characters
micro
andkernel
will result in the sequence of charactersmicrokernel
. - <
Expression1
>|
<Expression2
> – disjunction.Expression1
andExpression2
can be characters, sets or groups of characters.This operator selects either
Expression1
orExpression2
.Examples:
The regular expression
KO|ES
corresponds to the sequences of charactersKO
andES
, but notKOS
orKES
because the concatenation operator has a higher priority than the disjunction operator.The regular expression
Press (OK|Cancel)
corresponds to the sequences of charactersPress OK
orPress Cancel
.The regular expression
[0-9]|()
corresponds to numbers from0
to9
or an empty string. - <
Expression1
>&
<Expression2
> – conjunction.Expression1
andExpression2
can be characters, sets or groups of characters.This operator intersects the result of
Expression1
with the result ofExpression2
.Examples:
The regular expression
[0-9]&[^3]
corresponds to numbers from0
to9
, excluding3
.The regular expression
[a-zA-Z]&()
corresponds to all English letters and an empty string.