Regex security model

August 2, 2023

ID ssp_descr_security_models_regex

The Regex security model lets you implement text data validation based on statically defined regular expressions.

A PSL file containing a description of the Regex security model is located in the KasperskyOS SDK at the following path:

toolchain/include/nk/regex.psl

Regex security model object

The regex.psl file contains a declaration that creates a Regex security model object named re. Consequently, inclusion of the regex.psl file into the solution security policy description will create a Regex security model object by default.

A Regex security model object does not have any parameters.

A Regex security model object can be covered by a security audit. In this case, you also need to define the audit conditions specific to the Regex security model. To do so, use the following constructs in the audit configuration description:

  • emit : ["match"] – the audit is performed if the match method is called.
  • emit : ["select"] – the audit is performed if the select method is called.
  • emit : ["match", "select"] – the audit is performed if the match or select method is called.
  • emit : [] – the audit is not performed.

It is necessary to create additional objects of the Regex security model in the following cases:

  • You need to configure a security audit differently for different objects of the Regex security model (for example, you can apply different audit profiles or different audit configurations of the same profile for different objects).
  • You need to distinguish between calls of methods provided by different objects of the Regex security model (audit data includes the name of the security model method and the name of the object that provides this method, so you can verify that the method of a specific object was called).

Regex security model methods

The Regex security model contains the following expressions:

  • match {text : <Text>, pattern : <Text>}

    Returns a value of the Boolean type. If the specified text matches the pattern regular expression, it returns true. Otherwise it returns false.

    Example:

    assert (re.match {text : message.text, pattern : "[0-9]*"})

  • select {text : <Text>}

    It is intended to be used as an expression that verifies fulfillment of the conditions in the choice construct (for details on the choice construct, see "Binding methods of security models to security events"). It checks whether the specified text matches regular expressions. Depending on the results of this check, various options for security event handling can be performed.

    Example:

    choice (re.select {text : "hello world"}) {
        "hello\\ .*": grant ()
        ".*world" : grant ()
        _ : deny ()
    }

Syntax of regular expressions of the Regex security model

A regular expression for the match method of the Regex security model can be written in two ways: within the multi-line regex block or as a text literal.

When writing a regular expression as a text literal, all backslash instances must be doubled.

For example, the following two regular expressions are identical:

// Regular expression within the multi-line regex block
{ pattern:
```regex
Hello\ world\!
```
, text: "Hello world!"
}

// Regular expression as a text literal (doubled backslash)
{ pattern: "Hello\\ world\\!"
, text: "Hello world!"
}

Regular expressions for the select method of the Regex security model are written as text literals with a double backslash.
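
The doubling rule can be illustrated outside of PSL. The following Python sketch is not part of the KasperskyOS SDK, and the helper name to_text_literal is hypothetical. It takes a pattern as it would be written inside a regex block and doubles each backslash, which is the form a text literal requires.

```python
def to_text_literal(block_pattern: str) -> str:
    """Double every backslash so the pattern can be used in a text literal.

    Hypothetical helper for illustration; not part of the KasperskyOS SDK.
    """
    return block_pattern.replace("\\", "\\\\")


# A raw string keeps the backslashes exactly as they appear in the regex block.
print(to_text_literal(r"Hello\ world\!"))  # prints Hello\\ world\\!
```

A pattern that contains no backslashes is returned unchanged.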

A regular expression is defined as a template string and may contain the following:

  • Literals (ordinary characters)
  • Metacharacters (characters with special meanings)
  • White-space characters
  • Character sets
  • Character groups
  • Operators for working with characters

Regular expressions are case sensitive.

Literals and metacharacters in regular expressions

  • A literal can be any ASCII character except the metacharacters .()*&|!?+[]\ and a white-space character. (Unicode characters are not supported.)

    For example, the regular expression KasperskyOS corresponds to the text KasperskyOS.

  • Metacharacters have special meanings that are presented in the table below.

    Special meanings of metacharacters

    Metacharacter

    Special meaning

    []

    Square brackets denote the beginning and end of a set of characters.

    ()

    Round brackets (parentheses) denote the beginning and end of a group of characters.

    *

    An asterisk denotes an operator indicating that the character, set, or group preceding it can repeat zero or more times.

    +

    A plus sign denotes an operator indicating that the character, set, or group preceding it can repeat one or more times.

    ?

    A question mark denotes an operator indicating that the character, set, or group preceding it can occur zero times or one time.

    !

    An exclamation mark denotes an operator that excludes the character, set, or group following it from the list of valid expressions.

    |

    A vertical line denotes an operator for selection between characters (logically close to the "OR" conjunction).

    &

    An ampersand denotes an operator for overlapping of multiple conditions (logically close to the "AND" conjunction).

    .

    A dot denotes any character.

    For example, the regular expression K.S corresponds to the sequences of characters KOS, KoS, KES and a multitude of other sequences consisting of three characters that begin with K and end with S, and in which the second character can be any character, including one that is otherwise a metacharacter.

    \

    \<metaSymbol>

    A backslash indicates that the metacharacter that follows it will lose its special meaning and instead be interpreted as a literal. A backslash placed before a metacharacter is known as an escape character.

    For example, a regular expression that consists of a dot metacharacter (.) corresponds to any character. However, a regular expression that consists of a backslash with a dot (\.) corresponds to only a dot character.

    Accordingly, a backslash also escapes another subsequent backslash. For example, the regular expression C:\\Users corresponds to the sequence of characters C:\Users.

  • The ^ and $ characters are not used to designate the start and end of a line.

White-space characters in regular expressions

  • A space character has the ASCII code 20 in hexadecimal notation and 40 in octal notation. Although a space character does not carry any special meaning, it must be escaped to avoid ambiguous interpretation by the regular expression interpreter.

    For example, the regular expression Hello\ world corresponds to the sequence of characters Hello world.

  • \r

    Carriage return character.

  • \n

    Line break character.

  • \t

    Horizontal tab character.
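
The escaping rules for metacharacters and white-space characters can be summarized in a short Python sketch. It is only an illustration under stated assumptions: the helper name escape_literal is hypothetical and is not provided by the KasperskyOS SDK, and the sketch does not check that the input is limited to ASCII. It builds a pattern that corresponds to a given text literally, in the form used inside a regex block.

```python
# Metacharacters listed in this article; each needs a preceding backslash
# to be interpreted as a literal. The space character is handled the same way.
METACHARACTERS = set(".()*&|!?+[]\\")

# White-space characters that have their own escape notation.
WHITESPACE_ESCAPES = {"\r": "\\r", "\n": "\\n", "\t": "\\t"}


def escape_literal(text: str) -> str:
    """Return a pattern that corresponds to `text` character by character.

    Hypothetical helper for illustration; not part of the KasperskyOS SDK.
    """
    parts = []
    for ch in text:
        if ch in WHITESPACE_ESCAPES:
            parts.append(WHITESPACE_ESCAPES[ch])
        elif ch in METACHARACTERS or ch == " ":
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)


print(escape_literal("Hello world!"))  # prints Hello\ world\!
```

To place the result in a text literal rather than a regex block, every backslash in it would additionally have to be doubled.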

Definition of a character based on its octal or hexadecimal code in regular expressions

  • \x{<hex>}

    Definition of a character using its hex code from the ASCII character table. The character code must be less than 0x100.

    For example, the regular expression Hello\x{20}world corresponds to the sequence of characters Hello world.

  • \o{<octal>}

    Definition of a character using its octal code from the ASCII character table. The character code must be less than 0o400.

    For example, the regular expression \o{75} corresponds to the = character.
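
The two code notations and their upper bounds can be checked with a small Python sketch. The function names char_from_hex and char_from_octal are hypothetical and serve only to illustrate the rules stated above. Both limits, 0x100 and 0o400, are the same value (256).

```python
def char_from_hex(code: str) -> str:
    """Character denoted by \\x{<hex>}; the code must be less than 0x100."""
    value = int(code, 16)
    if value >= 0x100:
        raise ValueError("character code must be less than 0x100")
    return chr(value)


def char_from_octal(code: str) -> str:
    """Character denoted by \\o{<octal>}; the code must be less than 0o400."""
    value = int(code, 8)
    if value >= 0o400:
        raise ValueError("character code must be less than 0o400")
    return chr(value)


print(repr(char_from_hex("20")))  # prints ' '
print(char_from_octal("75"))      # prints =
```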

Sets of characters in regular expressions

A character set is defined within square brackets [] as a list or range of characters. A character set tells the regular expression interpreter that only one of the characters listed in the set or range of characters can be at this specific location in a sequence of characters. A character set cannot be left blank.

  • [<BracketSpec>] – character set.

    One character corresponds to any character from the BracketSpec character set.

    For example, the regular expression K[OE]S corresponds to the sequences of characters KOS and KES.

  • [^<BracketSpec>] – inverted character set.

    One character corresponds to any character that is not in the BracketSpec character set.

    For example, the regular expression K[^OE]S corresponds to the sequences of characters KAS, K8S and any other sequences consisting of three characters that begin with K and end with S, excluding KOS and KES.

The BracketSpec character set can be listed explicitly or can be defined as a range of characters. When defining a range of characters, the first and last character in the set must be separated with a hyphen.

  • [<Digit1>-<DigitN>]

    Any digit from the range Digit1, Digit2, ... , DigitN.

    For example, the regular expression [0-9] corresponds to any numerical digit. The regular expressions [0-9] and [0123456789] are identical.

    Please note that a range is defined by one character before a hyphen and one character after the hyphen. The regular expression [1-35] corresponds only to the characters 1, 2, 3 and 5, and does not represent the range of numbers from 1 to 35.

  • [<Letter1>-<LetterN>]

    Any English letter from the range Letter1, Letter2, ... , LetterN (these letters must be in the same case).

    For example, the regular expression [a-zA-Z] corresponds to all letters in uppercase and lowercase from the ASCII character table.

The ASCII code for the upper boundary character of a range must be higher than the ASCII code for the lower boundary character of the range.

For example, the regular expressions [5-2] or [z-a] are invalid.

The hyphen (minus) - character is interpreted as a special character only within a set of characters. Outside of a character set, a hyphen is a literal. For this reason, the \ metacharacter does not have to precede a hyphen. To use a hyphen as a literal within a character set, it must be indicated first or last in the set.

Examples:

The regular expressions [-az] and [az-] correspond to the characters a, z and -.

The regular expression [a-z] corresponds to any of the 26 English letters from a to z in lowercase.

The regular expression [-a-z] corresponds to any of the 26 English letters from a to z in lowercase and -.

The circumflex (caret character) ^ is interpreted as a special character only within a character set when it is located directly after an opening square bracket. Outside of a character set, a circumflex is a literal. For this reason, the \ metacharacter does not have to precede a circumflex. To use a circumflex as a literal within a character set, it must be indicated in a location other than first in the set.

Examples:

The regular expression [0^9] corresponds to the characters 0, 9 and ^.

The regular expression [^09] corresponds to any character except 0 and 9.
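
The rules for ranges, hyphens and circumflexes can be modeled with the following Python sketch. It is a simplified illustration and not the actual interpreter: the function name expand_bracket_spec is hypothetical, backslash escapes inside a set are not handled, and a range whose boundaries are equal is rejected on the assumption that "higher" is meant strictly.

```python
def expand_bracket_spec(spec: str):
    """Return (inverted, characters) for the text between [ and ].

    Simplified model for illustration; backslash escapes are not handled.
    """
    # A circumflex is special only directly after the opening bracket.
    inverted = spec.startswith("^")
    body = spec[1:] if inverted else spec
    if not body:
        raise ValueError("a character set cannot be empty")

    chars = set()
    i = 0
    while i < len(body):
        # A hyphen denotes a range only when a character precedes and follows it.
        if i + 2 < len(body) and body[i + 1] == "-":
            low, high = body[i], body[i + 2]
            # Assumption: the upper boundary must be strictly higher.
            if ord(high) <= ord(low):
                raise ValueError(f"invalid range {low}-{high}")
            chars.update(chr(c) for c in range(ord(low), ord(high) + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1
    return inverted, chars


print(expand_bracket_spec("1-35"))
```

For the specification 1-35, the sketch yields the characters 1, 2, 3 and 5, not inverted. For ^09 it yields the characters 0 and 9 with the inversion flag set.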

Within a character set, the metacharacters *.&|!?+ lose their special meaning and are instead interpreted as literals. Therefore, they do not have to be preceded by the \ metacharacter. The backslash \ retains its special meaning within a character set.

For example, the regular expressions [a.] and [a\.] are identical and correspond to the character a and a dot interpreted as a literal.

Groups of characters and operators in regular expressions

A character group uses parentheses () to mark off a portion (subexpression) of a regular expression. Groups are normally used to designate subexpressions as operands. Groups can be nested.

Operators are applied to more than one character in a regular expression only if they are immediately before or after the definition of a set or group of characters. If this is the case, the operator is applied to the entire group or set of characters.

The syntax contains definitions of the following operators (listed in descending order of their priority):

  • !<Expression>, where Expression can be a character, set or group of characters.

    This operator excludes the Expression from the list of valid expressions.

    Examples:

    The regular expression K!OS corresponds to the sequences of characters KoS, KES, and a multitude of other sequences that consist of three characters and begin with K and end with S, excluding KOS.

    The regular expression K!(OS) corresponds to the sequences of characters Kos, KES, KOT, and a multitude of other sequences that consist of three characters and begin with K, excluding KOS.

    The regular expression K![OE]S corresponds to the sequences of characters KoS, KeS, K;S, and a multitude of other sequences that consist of three characters and begin with K and end with S, excluding KOS and KES.

  • <Expression>*, where Expression can be a character, set or group of characters.

    This operator means that the Expression may occur in the specific position zero or more times.

    Examples:

    The regular expression 0-9* corresponds to the sequences of characters 0-, 0-9, 0-99, ... .

    The regular expression (0-9)* corresponds to the empty sequence "" and the sequences of characters 0-9, 0-90-9, ... .

    The regular expression [0-9]* corresponds to the empty sequence "" and any non-empty sequence of digits.

  • <Expression>+, where Expression can be a character, set or group of characters.

    This operator means that the Expression occurs in the specific position one or more times.

    Examples:

    The regular expression 0-9+ corresponds to the sequences of characters 0-9, 0-99, 0-999, ... .

    The regular expression (0-9)+ corresponds to the sequences of characters 0-9, 0-90-9, ... .

    The regular expression [0-9]+ corresponds to any non-empty sequence of digits.

  • <Expression>?, where Expression can be a character, set or group of characters.

    This operator means that the Expression may occur in the specific position zero or one time.

    Examples:

    The regular expression https?:// corresponds to the sequences of characters http:// and https://.

    The regular expression K(aspersky)?OS corresponds to the sequences of characters KOS and KasperskyOS.

  • <Expression1><Expression2> – concatenation. Expression1 and Expression2 can be characters, sets or groups of characters.

    This operator does not have a specific designation. In the resulting expression, Expression2 follows Expression1.

    For example, concatenation of the sequences of characters micro and kernel will result in the sequence of characters microkernel.

  • <Expression1>|<Expression2> – disjunction. Expression1 and Expression2 can be characters, sets or groups of characters.

    This operator selects either Expression1 or Expression2.

    Examples:

    The regular expression KO|ES corresponds to the sequences of characters KO and ES, but not KOS or KES because the concatenation operator has a higher priority than the disjunction operator.

    The regular expression Press\ (OK|Cancel) corresponds to the sequences of characters Press OK or Press Cancel.

    The regular expression [0-9]|() corresponds to any single digit from 0 to 9 or an empty string.

  • <Expression1>&<Expression2> – conjunction. Expression1 and Expression2 can be characters, sets or groups of characters.

    This operator intersects the result of Expression1 with the result of Expression2.

    Examples:

    The regular expression [0-9]&[^3] corresponds to any single digit from 0 to 9, excluding 3.

    The regular expression [a-zA-Z]&() corresponds to all English letters and an empty string.
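
Some of the operator rules above can be tried out with Python's re module as a stand-in. This is only a sketch: Python uses a different regular expression engine, it has no ! or & operators, and matching against the whole text is an assumption. Only repetition, grouping and disjunction, which behave the same way in both syntaxes, are exercised with re. Conjunction of single-character sets is modeled with plain set intersection. The function name matches is hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Whole-text match, using Python's re module as a stand-in engine."""
    return re.fullmatch(pattern, text) is not None


# Repetition applies to one character unless a group or set precedes it.
print(matches(r"0-9*", "0-99"))      # True: only the 9 repeats
print(matches(r"(0-9)*", "0-90-9"))  # True: the whole group repeats
print(matches(r"[0-9]*", ""))        # True: zero digits are allowed
print(matches(r"[0-9]+", ""))        # False: at least one digit is required

# Concatenation has a higher priority than disjunction.
print(matches(r"KO|ES", "KOS"))      # False: the alternatives are KO and ES
print(matches(r"K(O|E)S", "KOS"))    # True: the group limits the disjunction

# [0-9]&[^3] modeled as the intersection of two single-character sets.
ASCII = {chr(code) for code in range(128)}
digits = set("0123456789")
print(sorted(digits & (ASCII - {"3"})))
```

The last line prints the digits 0 to 9 without 3, which agrees with the conjunction example above.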
