The Regex security model lets you implement text data validation based on statically defined regular expressions.
A PSL file containing a description of the Regex security model is located in the KasperskyOS SDK at the following path:
toolchain/include/nk/regex.psl
Regex security model object
The regex.psl file contains a declaration that creates a Regex security model object named re. Consequently, inclusion of the regex.psl file into the solution security policy description will create a Regex security model object by default.
A Regex security model object does not have any parameters.
A Regex security model object can be covered by a security audit. In this case, you also need to define the audit conditions specific to the Regex security model. To do so, use the following constructs in the audit configuration description:
emit : ["match"] – the audit is performed if the match method is called.
emit : ["select"] – the audit is performed if the select method is called.
emit : ["match", "select"] – the audit is performed if the match or select method is called.
emit : [] – the audit is not performed.
It is necessary to create additional objects of the Regex security model in the following cases:
Regex security model methods
The Regex security model contains the following expressions:
match {text : <Text>, pattern : <Text>}
Returns a value of the Boolean type. If the specified text matches the pattern regular expression, it returns true. Otherwise it returns false.
Example:
assert (re.match {text : message.text, pattern : "[0-9]*"})
select {text : <Text>}
It is intended to be used as an expression that verifies fulfillment of the conditions in the choice construct (for details on the choice construct, see "Binding methods of security models to security events"). It checks whether the specified text matches regular expressions. Depending on the results of this check, various options for security event handling can be performed.
Example:
choice (re.select {text : "hello world"}) {
    "hello\\ .*" : grant ()
    ".*world"    : grant ()
    _            : deny ()
}
Syntax of regular expressions of the Regex security model
A regular expression for the match method of the Regex security model can be written in two ways: within the multi-line regex block or as a text literal.
When writing a regular expression as a text literal, all backslash instances must be doubled.
For example, the following two regular expressions are identical:
// Regular expression within the multi-line regex block
{ pattern:
```regex
Hello\ world\!
```
, text: "Hello world!"
}
// Regular expression as a text literal (doubled backslash)
{ pattern: "Hello\\ world\\!"
, text: "Hello world!"
}
Regular expressions for the select method of the Regex security model are written as text literals with a double backslash.
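As a minimal sketch of the doubled-backslash rule for select, the fragment below handles a text value using two patterns that contain an escaped dot and an escaped space. The message.path field and both patterns are hypothetical and serve only as an illustration. The choice, re.select, grant and deny constructs are taken from the example above.

```psl
// Hypothetical field message.path. The patterns are text literals,
// so every backslash of the regular expression is doubled.
choice (re.select {text : message.path}) {
    // Regular expression /etc/.*\.conf (a literal dot before "conf")
    "/etc/.*\\.conf" : grant ()
    // Regular expression /tmp/my\ file (an escaped space)
    "/tmp/my\\ file" : grant ()
    _                : deny ()
}
```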
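A regular expression is defined as a template string and may contain the following: literals (ordinary characters), metacharacters (characters with special meanings), white-space characters, character sets, character groups, and operators.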
Regular expressions are case sensitive.
Literals and metacharacters in regular expressions
A literal can be any ASCII character except the metacharacters .()*&|!?+[]\ and a white-space character. (Unicode characters are not supported.)
For example, the regular expression KasperskyOS corresponds to the text KasperskyOS.
Special meanings of metacharacters
Metacharacter | Special meaning |
---|---|
`[]` | Square brackets denote the beginning and end of a set of characters. |
`()` | Round brackets (parentheses) denote the beginning and end of a group of characters. |
`*` | An asterisk denotes an operator indicating that the character preceding it can repeat zero or more times. |
`+` | A plus sign denotes an operator indicating that the character preceding it can repeat one or more times. |
`?` | A question mark denotes an operator indicating that the character preceding it can repeat zero or one time. |
`!` | An exclamation mark denotes an operator excluding the subsequent character from the list of valid characters. |
`\|` | A vertical line denotes an operator for selection between characters (logically close to the "OR" conjunction). |
`&` | An ampersand denotes an operator for overlapping of multiple conditions (logically close to the "AND" conjunction). |
`.` | A dot denotes any character. |
`\` | A backslash indicates that the metacharacter that follows it will lose its special meaning and instead be interpreted as a literal. A backslash placed before a metacharacter is known as an escape character. For example, a regular expression that consists of a dot metacharacter (`.`) corresponds to any character, whereas a backslash followed by a dot (`\.`) corresponds only to a dot character. Accordingly, a backslash also escapes another subsequent backslash: `\\` corresponds to a single backslash character. |
The ^ and $ characters are not used to designate the start and end of a line.
White-space characters in regular expressions
A space character has an ASCII code of 20 in a hexadecimal number system and an ASCII code of 40 in an octal number system. Although a space character does not have any special meaning, it must be escaped to avoid any ambiguous interpretation by the regular expression interpreter.
For example, the regular expression Hello\ world corresponds to the sequence of characters Hello world.
\r
Carriage return character.
\n
Line break character.
\t
Horizontal tab character.
Definition of a character based on its octal or hexadecimal code in regular expressions
\x{<hex>}
Definition of a character using its hex code from the ASCII character table. The character code must be less than 0x100.
For example, the regular expression Hello\x{20}world corresponds to the sequence of characters Hello world.
\o{<octal>}
Definition of a character using its octal code from the ASCII character table. The character code must be less than 0o400.
For example, the regular expression \o{75} corresponds to the = character.
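The character-code notation can be combined with the text-literal form of a pattern. The fragment below is a sketch under two assumptions: the message.greeting and message.pair fields are hypothetical, and the assert form follows the match example given earlier. Because the patterns are text literals, the backslash before x and o is doubled.

```psl
// Regular expression Hello\x{20}world, where \x{20} is the space character.
assert (re.match {text : message.greeting, pattern : "Hello\\x{20}world"})

// Regular expression key\o{75}value, where \o{75} is the = character.
assert (re.match {text : message.pair, pattern : "key\\o{75}value"})
```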
Sets of characters in regular expressions
A character set is defined within square brackets []
as a list or range of characters. A character set tells the regular expression interpreter that only one of the characters listed in the set or range of characters can be at this specific location in a sequence of characters. A character set cannot be left blank.
[<BracketSpec>] – character set.
One character corresponds to any character from the BracketSpec character set.
For example, the regular expression K[OE]S corresponds to the sequences of characters KOS and KES.
[^<BracketSpec>] – inverted character set.
One character corresponds to any character that is not in the BracketSpec character set.
For example, the regular expression K[^OE]S corresponds to the sequences of characters KAS, K8S and any other sequences consisting of three characters that begin with K and end with S, excluding KOS and KES.
The BracketSpec character set can be listed explicitly or can be defined as a range of characters. When defining a range of characters, the first and last character in the set must be separated with a hyphen.
[<Digit1>-<DigitN>]
Any digit from the range Digit1, Digit2, ... , DigitN.
For example, the regular expression [0-9] corresponds to any numerical digit. The regular expressions [0-9] and [0123456789] are identical.
Please note that a range is defined by one character before a hyphen and one character after the hyphen. The regular expression [1-35] corresponds only to the characters 1, 2, 3 and 5, and does not represent the range of numbers from 1 to 35.
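Since a range covers single characters only, a numeric interval such as 1 to 35 has to be spelled out digit by digit. The following sketch shows one way to do so. It assumes that match checks the whole text against the pattern and it relies on the disjunction operator described later in this section. The message.text field is the one used in the match example above.

```psl
// Accepts the decimal numbers 1..35 without leading zeros:
//   [1-9]      a single digit from 1 to 9
//   [12][0-9]  10 to 29
//   3[0-5]     30 to 35
assert (re.match {text : message.text, pattern : "[1-9]|[12][0-9]|3[0-5]"})
```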
[<Letter1>-<LetterN>]
Any English letter from the range Letter1, Letter2, ... , LetterN (these letters must be in the same case).
For example, the regular expression [a-zA-Z] corresponds to all letters in uppercase and lowercase from the ASCII character table.
The ASCII code for the upper boundary character of a range must be higher than the ASCII code for the lower boundary character of the range.
For example, the regular expressions [5-2] or [z-a] are invalid.
The hyphen (minus) - character is interpreted as a special character only within a set of characters. Outside of a character set, a hyphen is a literal. For this reason, the \ metacharacter does not have to precede a hyphen. To use a hyphen as a literal within a character set, it must be indicated first or last in the set.
Examples:
The regular expressions [-az] and [az-] correspond to the characters a, z and -.
The regular expression [a-z] corresponds to any of the 26 English letters from a to z in lowercase.
The regular expression [-a-z] corresponds to any of the 26 English letters from a to z in lowercase and -.
The circumflex (caret character) ^ is interpreted as a special character only within a character set when it is located directly after an opening square bracket. Outside of a character set, a circumflex is a literal. For this reason, the \ metacharacter does not have to precede a circumflex. To use a circumflex as a literal within a character set, it must be indicated in a location other than first in the set.
Examples:
The regular expression [0^9] corresponds to the characters 0, 9 and ^.
The regular expression [^09] corresponds to any character except 0 and 9.
Within a character set, the metacharacters *.&|!?+ lose their special meaning and are instead interpreted as literals. Therefore, they do not have to be preceded by the \ metacharacter. The backslash \ retains its special meaning within a character set.
For example, the regular expressions [a.] and [a\.] are identical and correspond to the character a and a dot interpreted as a literal.
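The rules for character sets can be sketched in two validation checks. The message.id and message.name fields are hypothetical, and the + and * operators used here are described in the next subsection.

```psl
// One or more hexadecimal digits: three ranges combined in one set.
assert (re.match {text : message.id, pattern : "[0-9a-fA-F]+"})

// A lowercase letter followed by lowercase letters, digits or hyphens.
// The hyphen is listed first in the second set, so it is a literal.
assert (re.match {text : message.name, pattern : "[a-z][-a-z0-9]*"})
```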
Groups of characters and operators in regular expressions
A character group uses parentheses () to distinguish its portion (subexpression) within a regular expression. Groups are normally used to allocate subexpressions as operands. Groups can be embedded into each other.
Operators are applied to more than one character in a regular expression only if they are immediately before or after the definition of a set or group of characters. If this is the case, the operator is applied to the entire group or set of characters.
The syntax contains definitions of the following operators (listed in descending order of their priority):
!<Expression>, where Expression can be a character, set or group of characters.
This operator excludes the Expression from the list of valid expressions.
Examples:
The regular expression K!OS corresponds to the sequences of characters KoS, KES, and a multitude of other sequences that consist of three characters and begin with K and end with S, excluding KOS.
The regular expression K!(OS) corresponds to the sequences of characters Kos, KES, KOT, and a multitude of other sequences that consist of three characters and begin with K, excluding KOS.
The regular expression K![OE]S corresponds to the sequences of characters KoS, KeS, K;S, and a multitude of other sequences that consist of three characters and begin with K and end with S, excluding KOS and KES.
<Expression>*, where Expression can be a character, set or group of characters.
This operator means that the Expression may occur in the specific position zero or more times.
Examples:
The regular expression 0-9* corresponds to the sequences of characters 0-, 0-9, 0-99, ... .
The regular expression (0-9)* corresponds to the empty sequence "" and the sequences of characters 0-9, 0-90-9, ... .
The regular expression [0-9]* corresponds to the empty sequence "" and any non-empty sequence of digits.
<Expression>+, where Expression can be a character, set or group of characters.
This operator means that the Expression may occur in the specific position one or more times.
Examples:
The regular expression 0-9+ corresponds to the sequences of characters 0-9, 0-99, 0-999, ... .
The regular expression (0-9)+ corresponds to the sequences of characters 0-9, 0-90-9, ... .
The regular expression [0-9]+ corresponds to any non-empty sequence of digits.
<Expression>?, where Expression can be a character, set or group of characters.
This operator means that the Expression may occur in the specific position zero or one time.
Examples:
The regular expression https?:// corresponds to the sequences of characters http:// and https://.
The regular expression K(aspersky)?OS corresponds to the sequences of characters KOS and KasperskyOS.
<Expression1><Expression2> – concatenation. Expression1 and Expression2 can be characters, sets or groups of characters.
This operator does not have a specific designation. In the resulting expression, Expression2 follows Expression1.
For example, concatenation of the sequences of characters micro and kernel will result in the sequence of characters microkernel.
<Expression1>|<Expression2> – disjunction. Expression1 and Expression2 can be characters, sets or groups of characters.
This operator selects either Expression1 or Expression2.
Examples:
The regular expression KO|ES corresponds to the sequences of characters KO and ES, but not KOS or KES because the concatenation operator has a higher priority than the disjunction operator.
The regular expression Press\ (OK|Cancel) corresponds to the sequences of characters Press OK or Press Cancel.
The regular expression [0-9]|() corresponds to digits from 0 to 9 or an empty string.
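The following sketch combines a group, disjunction and the ? operator in a choice construct. The message.text field and both patterns are illustrative assumptions. The group limits the scope of the disjunction because concatenation has a higher priority, and the space is escaped with a doubled backslash because the pattern is a text literal.

```psl
choice (re.select {text : message.text}) {
    // http:// or https:// followed by any characters
    "https?://.*" : grant ()
    // start, stop or restart, then a space, then one or more digits
    "(start|stop|restart)\\ [0-9]+" : grant ()
    _ : deny ()
}
```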
<Expression1>&<Expression2> – conjunction. Expression1 and Expression2 can be characters, sets or groups of characters.
This operator intersects the result of Expression1 with the result of Expression2.
Examples:
The regular expression [0-9]&[^3] corresponds to digits from 0 to 9, excluding 3.
The regular expression [a-zA-Z]&() corresponds to all English letters and an empty string.
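Conjunction can also combine a content condition with a length condition. The fragment below is a sketch only. It assumes that match checks the whole text against the pattern and that the conjunction operator has the lowest priority, as in the operator list above, so each side is evaluated as a complete expression. The message.pin field is hypothetical.

```psl
// Left side:  [0-9]+  digits only.
// Right side: ....    exactly four characters of any kind.
// The intersection is expected to accept only four-digit texts such as 0420.
assert (re.match {text : message.pin, pattern : "[0-9]+&...."})
```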