Regex security model
The Regex security model implements text data validation based on statically defined regular expressions.
A PSL file containing a description of the Regex security model is located in the KasperskyOS SDK at the following path:
toolchain/include/nk/regex.psl
Regex security model object
The regex.psl file contains a declaration that creates a Regex security model object named re. Consequently, inclusion of the regex.psl file into the solution security policy description will create a Regex security model object by default.
A Regex security model object does not have any parameters.
A Regex security model object can be covered by a security audit. In this case, you also need to define the audit conditions specific to the Regex security model. To do so, use the following constructs in the audit configuration description:
- emit : ["match"] – the audit is performed if the match method is called.
- emit : ["select"] – the audit is performed if the select method is called.
- emit : ["match", "select"] – the audit is performed if the match or select method is called.
- emit : [] – the audit is not performed.
It is necessary to create additional objects of the Regex security model in the following cases:
- You need to configure a security audit differently for different objects of the Regex security model (for example, you can apply different audit profiles or different audit configurations of the same profile for different objects).
- You need to distinguish between calls of methods provided by different objects of the Regex security model (audit data includes the name of the security model method and the name of the object that provides this method, so you can verify that the method of a specific object was called).
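A minimal sketch of an additional object follows. It assumes the general PSL declaration form policy object <name> : <model> {} and that regex.psl is already included in the policy description. The object name reUrl is hypothetical, and message.text is an assumed field borrowed from the match example in this section.

```psl
// Hypothetical second Regex object; the model has no parameters, so the body is empty.
policy object reUrl : Regex {}

// Calls through reUrl appear in audit data under the object name reUrl rather than re.
assert (reUrl.match {text : message.text, pattern : "[0-9]*"})
```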
Regex security model methods
The Regex security model contains the following expressions:
match {text : <Text>, pattern : <Text>}
Returns a value of the Boolean type. If the specified text matches the pattern regular expression, it returns true. Otherwise it returns false.
Example:
assert (re.match {text : message.text, pattern : "[0-9]*"})
select {text : <Text>}
It is intended to be used as an expression that verifies fulfillment of the conditions in the choice construct (for details on the choice construct, see "Binding methods of security models to security events"). It checks whether the specified text matches regular expressions. Depending on the results of this check, various options for security event handling can be performed.
Example:
choice (re.select {text : "hello world"}) {
    "hello\\ .*" : grant ()
    ".*world"    : grant ()
    _            : deny ()
}
Syntax of regular expressions of the Regex security model
A regular expression for the match method of the Regex security model can be written in two ways: within the multi-line regex block or as a text literal.
When writing a regular expression as a text literal, all backslash instances must be doubled.
For example, the following two regular expressions are identical:
// Regular expression within the multi-line regex block
{ pattern:
```regex
Hello\ world\!
```
, text: "Hello world!"
}
// Regular expression as a text literal (doubled backslash)
{ pattern: "Hello\\ world\\!"
, text: "Hello world!"
}
Regular expressions for the select method of the Regex security model are written as text literals with a double backslash.
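As a minimal sketch of the doubling rule for the select method, the fragment below reuses the Hello world! pattern from the example above as a text literal inside a choice construct. The message.text field is an assumption borrowed from the match example earlier in this section.

```psl
// Sketch: select patterns are text literals, so every backslash is doubled.
// message.text is an assumed field of the checked message.
choice (re.select {text : message.text}) {
    "Hello\\ world\\!" : grant ()
    _                  : deny ()
}
```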
A regular expression is defined as a template string and may contain the following:
- Literals (ordinary characters)
- Metacharacters (characters with special meanings)
- White-space characters
- Character sets
- Character groups
- Operators for working with characters
Regular expressions are case sensitive.
Literals and metacharacters in regular expressions
- A literal can be any ASCII character except the metacharacters .()*&|!?+[]\ and a white-space character. (Unicode characters are not supported.)
  For example, the regular expression KasperskyOS corresponds to the text KasperskyOS.
- Metacharacters have special meanings that are presented in the table below.
Special meanings of metacharacters

Metacharacter     Special meaning
[]                Square brackets denote the beginning and end of a set of characters.
()                Round brackets (parentheses) denote the beginning and end of a group of characters.
*                 An asterisk denotes an operator indicating that the character preceding it can repeat zero or more times.
+                 A plus sign denotes an operator indicating that the character preceding it can repeat one or more times.
?                 A question mark denotes an operator indicating that the character preceding it can repeat zero or one time.
!                 An exclamation mark denotes an operator excluding the subsequent character from the list of valid characters.
|                 A vertical line denotes an operator for selection between characters (logically close to the "OR" conjunction).
&                 An ampersand denotes an operator for overlapping of multiple conditions (logically close to the "AND" conjunction).
.                 A dot denotes any character.
                  For example, the regular expression K.S corresponds to the sequences of characters KOS, KoS, KES and a multitude of other sequences consisting of three characters that begin with K and end with S, and in which the second character can be any character, whether a literal or a metacharacter (including a dot).
\\<metaSymbol>    A backslash indicates that the metacharacter that follows it will lose its special meaning and instead be interpreted as a literal. A backslash placed before a metacharacter is known as an escape character.
                  For example, a regular expression that consists of a dot metacharacter (.) corresponds to any character. However, a regular expression that consists of a backslash with a dot (\.) corresponds to only a dot character.
                  Accordingly, a backslash also escapes another subsequent backslash. For example, the regular expression C:\\Users corresponds to the sequence of characters C:\Users.

- The ^ and $ characters are not used to designate the start and end of a line.
White-space characters in regular expressions
- A space character has the ASCII code 20 in the hexadecimal number system and 40 in the octal number system. Although a space character does not carry any special meaning, it must be escaped to avoid ambiguous interpretation by the regular expression interpreter.
  For example, the regular expression Hello\ world corresponds to the sequence of characters Hello world.
- \r
  Carriage return character.
- \n
  Line break character.
- \t
  Horizontal tab character.
Definition of a character based on its octal or hexadecimal code in regular expressions
- \x{<hex>}
  Definition of a character using its hex code from the ASCII character table. The character code must be less than 0x100.
  For example, the regular expression Hello\x{20}world corresponds to the sequence of characters Hello world.
- \o{<octal>}
  Definition of a character using its octal code from the ASCII character table. The character code must be less than 0o400.
  For example, the regular expression \o{75} corresponds to the = character.
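The character-code forms can be combined with the text-literal rule described earlier. The fragment below is a sketch under that assumption: the pattern is intended to correspond to the text key = value, with both spaces given by hexadecimal code and the = character by octal code. The message.text field is assumed, as in the match example.

```psl
// Sketch: \x{20} (space) and \o{75} (=) inside a text literal, so each backslash is doubled.
// message.text is an assumed field of the checked message.
assert (re.match {text : message.text, pattern : "key\\x{20}\\o{75}\\x{20}value"})
```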
Sets of characters in regular expressions
A character set is defined within square brackets [] as a list or range of characters. A character set tells the regular expression interpreter that only one of the characters listed in the set or range of characters can be at this specific location in a sequence of characters. A character set cannot be left blank.
- [<BracketSpec>] – character set.
  One character corresponds to any character from the BracketSpec character set.
  For example, the regular expression K[OE]S corresponds to the sequences of characters KOS and KES.
- [^<BracketSpec>] – inverted character set.
  One character corresponds to any character that is not in the BracketSpec character set.
  For example, the regular expression K[^OE]S corresponds to the sequences of characters KAS, K8S and any other sequences consisting of three characters that begin with K and end with S, excluding KOS and KES.
The BracketSpec character set can be listed explicitly or can be defined as a range of characters. When defining a range of characters, the first and last character in the set must be separated with a hyphen.
- [<Digit1>-<DigitN>]
  Any digit from the range Digit1, Digit2, ... , DigitN.
  For example, the regular expression [0-9] corresponds to any numerical digit. The regular expressions [0-9] and [0123456789] are identical.
  Please note that a range is defined by one character before a hyphen and one character after the hyphen. The regular expression [1-35] corresponds only to the characters 1, 2, 3 and 5, and does not represent the range of numbers from 1 to 35.
- [<Letter1>-<LetterN>]
  Any English letter from the range Letter1, Letter2, ... , LetterN (these letters must be in the same case).
  For example, the regular expression [a-zA-Z] corresponds to all letters in uppercase and lowercase from the ASCII character table.
The ASCII code for the upper boundary character of a range must be higher than the ASCII code for the lower boundary character of the range.
For example, the regular expressions [5-2] or [z-a] are invalid.
The hyphen (minus) - character is interpreted as a special character only within a set of characters. Outside of a character set, a hyphen is a literal. For this reason, the \ metacharacter does not have to precede a hyphen. To use a hyphen as a literal within a character set, it must be indicated first or last in the set.
Examples:
The regular expressions [-az] and [az-] correspond to the characters a, z and -.
The regular expression [a-z] corresponds to any of the 26 English letters from a to z in lowercase.
The regular expression [-a-z] corresponds to any of the 26 English letters from a to z in lowercase and -.
The circumflex (caret character) ^ is interpreted as a special character only within a character set when it is located directly after an opening square bracket. Outside of a character set, a circumflex is a literal. For this reason, the \ metacharacter does not have to precede a circumflex. To use a circumflex as a literal within a character set, it must be indicated in a location other than first in the set.
Examples:
The regular expression [0^9] corresponds to the characters 0, 9 and ^.
The regular expression [^09] corresponds to any character except 0 and 9.
Within a character set, the metacharacters *.&|!?+ lose their special meaning and are instead interpreted as literals. Therefore, they do not have to be preceded by the \ metacharacter. The backslash \ retains its special meaning within a character set.
For example, the regular expressions [a.] and [a\.] are identical and correspond to the character a and a dot interpreted as a literal.
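The rules for ranges and for a literal hyphen can be combined in one set. The following fragment is a sketch only: it assumes that several ranges may be listed in one set, as in [a-zA-Z], and that message.text is a field of the checked message. The + operator used here is described in the next subsection.

```psl
// Sketch: one or more characters, each a lowercase letter, a digit, or a hyphen.
// The hyphen is listed first in the set so that it is read as a literal.
assert (re.match {text : message.text, pattern : "[-a-z0-9]+"})
```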
Groups of characters and operators in regular expressions
A character group uses parentheses () to set off a portion (subexpression) of a regular expression. Groups are normally used to mark subexpressions as operands. Groups can be nested within each other.
Operators are applied to more than one character in a regular expression only if they are immediately before or after the definition of a set or group of characters. If this is the case, the operator is applied to the entire group or set of characters.
The syntax contains definitions of the following operators (listed in descending order of their priority):
- !<Expression>, where Expression can be a character, set or group of characters.
  This operator excludes the Expression from the list of valid expressions.
  Examples:
  The regular expression K!OS corresponds to the sequences of characters KoS, KES, and a multitude of other sequences that consist of three characters and begin with K and end with S, excluding KOS.
  The regular expression K!(OS) corresponds to the sequences of characters Kos, KES, KOT, and a multitude of other sequences that consist of three characters and begin with K, excluding KOS.
  The regular expression K![OE]S corresponds to the sequences of characters KoS, KeS, K;S, and a multitude of other sequences that consist of three characters and begin with K and end with S, excluding KOS and KES.
- <Expression>*, where Expression can be a character, set or group of characters.
  This operator means that the Expression may occur in the specific position zero or more times.
  Examples:
  The regular expression 0-9* corresponds to the sequences of characters 0-, 0-9, 0-99, ... .
  The regular expression (0-9)* corresponds to the empty sequence "" and the sequences of characters 0-9, 0-90-9, ... .
  The regular expression [0-9]* corresponds to the empty sequence "" and any non-empty sequence of digits.
- <Expression>+, where Expression can be a character, set or group of characters.
  This operator means that the Expression may occur in the specific position one or more times.
  Examples:
  The regular expression 0-9+ corresponds to the sequences of characters 0-9, 0-99, 0-999, ... .
  The regular expression (0-9)+ corresponds to the sequences of characters 0-9, 0-90-9, ... .
  The regular expression [0-9]+ corresponds to any non-empty sequence of digits.
- <Expression>?, where Expression can be a character, set or group of characters.
  This operator means that the Expression may occur in the specific position zero or one time.
  Examples:
  The regular expression https?:// corresponds to the sequences of characters http:// and https://.
  The regular expression K(aspersky)?OS corresponds to the sequences of characters KOS and KasperskyOS.
- <Expression1><Expression2> – concatenation. Expression1 and Expression2 can be characters, sets or groups of characters.
  This operator does not have a specific designation. In the resulting expression, Expression2 follows Expression1.
  For example, concatenation of the sequences of characters micro and kernel will result in the sequence of characters microkernel.
- <Expression1>|<Expression2> – disjunction. Expression1 and Expression2 can be characters, sets or groups of characters.
  This operator selects either Expression1 or Expression2.
  Examples:
  The regular expression KO|ES corresponds to the sequences of characters KO and ES, but not KOS or KES because the concatenation operator has a higher priority than the disjunction operator.
  The regular expression Press\ (OK|Cancel) corresponds to the sequences of characters Press OK or Press Cancel (the space is escaped, as required for white-space characters).
  The regular expression [0-9]|() corresponds to digits from 0 to 9 or an empty string.
- <Expression1>&<Expression2> – conjunction. Expression1 and Expression2 can be characters, sets or groups of characters.
  This operator intersects the result of Expression1 with the result of Expression2.
  Examples:
  The regular expression [0-9]&[^3] corresponds to digits from 0 to 9, excluding 3.
  The regular expression [a-zA-Z]&() corresponds to all English letters and an empty string.
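Several of the operators above can be combined in one pattern. The following fragment is a sketch, not a recommended URL check: the pattern is intended to correspond to texts such as http://example or https://www.example.com, where ? applies to the single character s, + applies to the set [a-z], and * applies to the whole group. The message.text field is assumed, and the escaped dot is written with a doubled backslash because select patterns are text literals.

```psl
// Sketch: optional "s", then a lowercase label, then zero or more ".label" groups.
// message.text is an assumed field of the checked message.
choice (re.select {text : message.text}) {
    "https?://[a-z]+(\\.[a-z]+)*" : grant ()
    _                             : deny ()
}
```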