URL normalization rules

Malicious software often tries to hide its activity by obfuscating URLs, for example by using national domain names (including single-character ones), writing IP addresses in octal notation, or inserting repeated slashes. This means that the same content can often be reached through technically different addresses (for example, addresses that differ in scheme, port, or character case).

As a result, if URLs are matched against lists of indicators of compromise (IoCs) in their original form, no match occurs and threats are missed.

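For example, github.com@520966948 is an obfuscated form of the IP address 31.13.83.36, which actually belongs to facebook.com. The part before the @ sign is treated as user information, and 520966948 is the same address written as a single decimal number.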
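The decoding in this example can be reproduced with a minimal Python sketch that uses only the standard library. The function name real_host is a hypothetical helper chosen for illustration; it is not part of CyberTrace, and it handles only the decimal-integer form of an IPv4 address.

```python
import ipaddress
import re
from urllib.parse import urlsplit


def real_host(url: str) -> str:
    """Return the host a client would contact, expanding integer IPv4 forms."""
    # urlsplit only recognises the authority part after "//", so add it
    # when the URL has no scheme.
    if not re.match(r"[A-Za-z][A-Za-z0-9+.-]*://", url):
        url = "//" + url
    host = urlsplit(url).hostname or ""  # hostname excludes any "user@" prefix
    if host.isascii() and host.isdigit():
        # A bare decimal number is read as a 32-bit IPv4 address.
        return str(ipaddress.IPv4Address(int(host)))
    return host


print(real_host("github.com@520966948"))  # 31.13.83.36
```

The number decodes as 31 * 256^3 + 13 * 256^2 + 83 * 256 + 36 = 520966948. A value that does not fit into 32 bits raises an error in this sketch instead of being silently accepted.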

CyberTrace has two advantageous features:

Kaspersky data feeds cannot include 13 variants of each URL, each normalized differently, because this would unreasonably increase the feed size. However, if a user sends a known URL in some specific format, CyberTrace can transform it, search for matches in the feeds, and detect it by using normalization.
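The idea of storing one form of a URL and normalizing the incoming URL before the lookup can be sketched as follows. This is an illustration only: the four steps shown (dropping the scheme, lowercasing the host, dropping a default port, collapsing repeated slashes) are assumptions based on the differences mentioned above and are not the 13 rules that CyberTrace applies. The names normalize, is_known, and FEED are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Default ports that carry no information once the scheme is known.
DEFAULT_PORTS = {"http": 80, "https": 443}

# Hypothetical feed: each URL is stored once, in normalized form.
FEED = {"example.com/a/b"}


def normalize(url: str) -> str:
    """Reduce a URL to one canonical form (illustrative rules only)."""
    if not re.match(r"[A-Za-z][A-Za-z0-9+.-]*://", url):
        url = "//" + url  # let urlsplit see the host of a scheme-less URL
    parts = urlsplit(url)
    host = parts.hostname or ""  # already lowercased, user info removed
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{parts.port}"  # keep only non-default ports
    path = re.sub(r"/{2,}", "/", parts.path)  # collapse repeated slashes
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"


def is_known(url: str) -> bool:
    """Look up the normalized form of the URL in the feed."""
    return normalize(url) in FEED


print(is_known("HTTP://Example.COM:80//a//b"))  # True
print(is_known("https://example.com:8443/a/b"))  # False
```

With this approach the feed holds a single entry, and every spelling that normalizes to that entry is detected.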

Currently, 13 URL normalization rules are used. The following are examples of applying these rules:

To cover groups of malicious URLs, the feeds use eight types of entries, which are divided into masked and unmasked entries.

When a normalized URL is matched against URL-based entries from the databases, the purpose of each entry type must be taken into account. Using URL normalization together with masks increases the feed's detection rate, minimizes the volume of supplied data, and reduces false positives.
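The difference between masked and unmasked entries can be sketched as follows. The mask syntax here is an assumption made for illustration: it uses shell-style wildcards from the Python standard library, where * matches any run of characters. The actual eight entry types and their matching rules are described in the Implementation Guide. The names UNMASKED, MASKED, and match_entry are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical entries; the URLs are assumed to be normalized already.
UNMASKED = {"bad.example/payload.exe"}  # must match exactly
MASKED = ["*.evil.example/*", "evil.example/*"]  # "*" matches any characters


def match_entry(normalized_url: str):
    """Return the entry that matches the URL, or None if nothing matches."""
    if normalized_url in UNMASKED:
        return normalized_url
    for mask in MASKED:
        if fnmatchcase(normalized_url, mask):
            return mask
    return None


print(match_entry("cdn.evil.example/a/b"))     # *.evil.example/*
print(match_entry("bad.example/payload.exe"))  # bad.example/payload.exe
print(match_entry("good.example/"))            # None
```

A single masked entry covers a whole group of URLs, which is how masks reduce the volume of supplied data, while an exact unmasked entry limits the match to one specific URL.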

Detailed information is provided in the Kaspersky Threat Intelligence Data Feeds Implementation Guide.
