Threat Actors Customize URLs to Avoid Detection
Threat actors have many ways to avoid being detected. Today, let’s look at how they tweak URLs to bypass firewall rules—and what you can do to stop them from succeeding.
By combining various components of Uniform Resource Locators (URLs) and certain methods of encoding and obfuscation, OLE Object Relationships can be abused to download malicious content while avoiding many forms of detection both dynamically and statically. Because different operating systems and versions of Microsoft Office handle the URLs involved in these relationships differently, threat actors can craft URLs that cannot be easily detected with the same static firewall rule across those systems. Variations in the method of handling can allow malicious content to be loaded without the victim’s consent or knowledge.
Object Relationships: A Refresher
Microsoft created the Object Linking and Embedding (OLE) architecture so that different types of files, such as a Word document and an Excel spreadsheet, can share information with one another. Shared information is represented as an object. An OLE object maintains a relationship with the application that created it, and that relationship can be static, embedded, or linked.
Threat actors can abuse object relationships within Open XML Document files (such as .docx and .pptx)—a tactic Cofense Intelligence™ has previously explored. With this type of abuse, the object relationships in .docx files specify the target object as being “external”, and when the document is opened, it attempts to load the external resource. An example of the XML markup file denoting this linked relationship can be seen in Figure 1.
Figure 1: External Resource Target
The basic version of this external object linking method is easily detected by unpacking the document and scanning for strings such as http://. This method can also be detected by most network traffic monitoring systems because of the unusual User-Agent and HTTP request methods (Figure 2).
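The unpack-and-scan step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production scanner: it assumes the document is a well-formed Open XML package whose relationship parts end in ".rels", and the function name external_targets is ours, not part of any tool mentioned here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def external_targets(docx_bytes):
    """Return (part name, target) pairs for every relationship marked External."""
    found = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for name in archive.namelist():
            # Relationship parts live in files such as word/_rels/document.xml.rels
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(archive.read(name))
            for element in root.iter():
                # Tags carry the package namespace, so compare the local name only
                if not element.tag.endswith("Relationship"):
                    continue
                if element.get("TargetMode", "").lower() == "external":
                    found.append((name, element.get("Target", "")))
    return found
```

Reading the Target attribute rather than searching for the literal string http:// means the same check still reports targets that use another scheme or an encoded authority.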
Figure 2: Unusual Network Traffic
A file like this will typically get upwards of 20/60 detections by antivirus companies on VirusTotal.
Combining these methods of detection with policies that prompt users (as shown in Figure 3) before external resources are loaded can help prevent malware from downloading its next stage.
Figure 3: External Link Prompt
Methods of Encoding and Obfuscating
IP Address Modification
The encoding and obfuscation methods described in this document adhere to the standards set out by Microsoft, which are detailed in Microsoft's own documentation. A basic URL consists of a “scheme,” “authority,” and “path.”
There are several accepted schemes, but the most relevant ones are FTP, HTTP(s), and file. By changing the scheme, threat actors can circumvent some protections. A recent example of this tactic is discussed in a Cofense Intelligence blog.
The authority portion of a URL allows more flexibility in terms of content modification and selection. A simple example of this is converting an IP address into a decimal representation. For example, the IP address 192.168.10.1 becomes the decimal value 3232238081. This conversion enables threat actors to avoid tools that look for the standard dotted decimal format of an IP address or a domain name in the strings of a document. The before (Target_00) and after (Target_01) URLs can be seen in Figure 4.
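The conversion is simple arithmetic: each octet is weighted by a power of 256, so 192.168.10.1 is 192×256³ + 168×256² + 10×256 + 1. As a small sketch using Python's standard ipaddress module (the helper names are ours):

```python
import ipaddress


def dotted_to_decimal(address):
    """Convert a dotted-decimal IPv4 address to its single-integer form."""
    return int(ipaddress.IPv4Address(address))


def decimal_to_dotted(value):
    """Convert the single-integer form back to dotted-decimal notation."""
    return str(ipaddress.IPv4Address(int(value)))


print(dotted_to_decimal("192.168.10.1"))   # 3232238081
print(decimal_to_dotted(3232238081))       # 192.168.10.1
```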
Figure 4: IP Address and Decimal Converted URL
By converting an IP address to a decimal value, the threat actors also reduce the network traffic to a single GET request (see Figure 5). HTTP GET requests are not inherently unusual, whereas the OPTIONS, HEAD, and PROPFIND request methods and the corresponding User-Agents shown in Figure 2 are unusual and suspicious enough to be noticed by network monitoring systems. Without these indicators, the network traffic is significantly less likely to be detected.
Figure 5: Decimal IP Address Network Traffic
The combination of encoded authority and altered network traffic reduces the detections from upwards of 20 to upwards of 10/60 by antivirus companies on VirusTotal. This low detection rate can be combatted by applying signatures that search for a decimal formatted IP address and having network monitors look for unusual NBNS queries.
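A signature for decimal-formatted IP addresses could look something like the sketch below. It is an assumption-laden illustration rather than a vetted rule: the list of schemes, the regular expression, and the function name decimal_hosts are ours, and a real signature would need tuning against legitimate documents.

```python
import ipaddress
import re

# A URL whose authority is nothing but digits, e.g. http://3232238081/file.doc.
# The negative lookahead rejects dotted addresses and ordinary host names.
DECIMAL_AUTHORITY = re.compile(
    r"\b(?:https?|ftp|file)://(\d{1,10})(?![\w.%-])", re.IGNORECASE
)


def decimal_hosts(text):
    """Return (matched URL prefix, dotted-decimal address) for each decimal authority."""
    hits = []
    for match in DECIMAL_AUTHORITY.finditer(text):
        value = int(match.group(1))
        if value <= 0xFFFFFFFF:  # must fit in 32 bits to be an IPv4 address
            hits.append((match.group(0), str(ipaddress.IPv4Address(value))))
    return hits
```

Reporting the dotted-decimal equivalent alongside the match lets an analyst check the address against existing block lists without converting it by hand.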
The URL standards discussed above allow the selection of different schemes, including “file.” The “file” scheme fails to work in some situations (such as when using an IP address as the authority), but is very effective when used with standard domain names to download OLE objects. When using the “file” scheme, the document first attempts to load the object over SMB. If this is not available (as is often the case), the document defaults back to HTTP GET requests (as can be seen in Figure 6).
Figure 6: File Scheme Network Traffic
While this method avoids tools that search for the “http” scheme in file contents, it is vulnerable to network traffic monitoring for the HTTP request methods and SMB communications. Any SMB traffic to a host outside of an organization’s local network should be a major red flag, because legitimate SMB traffic to external hosts almost always takes place over a VPN. VPN traffic should always be encrypted, so any plaintext SMB traffic traversing that boundary should be rare enough to warrant immediate investigation.
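As a rough sketch of that boundary check, the snippet below filters flow records for SMB connections that leave the local network. Everything about the input is an assumption made for illustration: the (source, destination, destination port) tuple layout is hypothetical, and the internal ranges listed are simply the common private IPv4 ranges, which should be replaced with an organization's real address plan.

```python
import ipaddress

# Assumed internal ranges; substitute the networks actually in use.
INTERNAL_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
SMB_PORTS = {139, 445}


def is_internal(address):
    """True when the address falls inside one of the assumed internal ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in INTERNAL_NETWORKS)


def external_smb_flows(flows):
    """Keep flows where an internal host talks SMB to a host outside the network."""
    return [
        (src, dst, port)
        for src, dst, port in flows
        if port in SMB_PORTS and is_internal(src) and not is_internal(dst)
    ]
```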
The URLs used in normal interactions with web resources are not designed to accept certain characters directly. These characters include spaces, certain symbols, and non-ASCII characters. To be processed correctly, they must be encoded using a standardized syntax: the hexadecimal value of each character, preceded by a “%”. Although this is not obvious to users, when a URL is typed into a browser, the browser automatically converts certain characters to hexadecimal notation, such as white space to %20. Like many features that automatically convert data, this can be exploited by threat actors to “encode” full URLs into a format that is not easily human readable. This encoding can also be applied to ordinary characters and letters such as “A” or “B”.
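To make the difference concrete, the sketch below contrasts ordinary percent-encoding, which only touches characters that need it, with encoding every character. It uses Python's standard urllib.parse module; percent_encode_all is a helper name of our own.

```python
from urllib.parse import quote, unquote


def percent_encode_all(text):
    """Percent-encode every byte, including letters that never need encoding."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


print(quote("a b"))                # a%20b   (only the space is encoded)
print(percent_encode_all("AB"))    # %41%42  (every character is encoded)
print(unquote("%41%42"))           # AB      (both forms decode to the same text)
```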
Figure 7: Example Two Layers of Hexadecimal Encoding
In Figure 7, the letters “A” and “B” in the target URL of Layer_1 are replaced with their hexadecimal equivalents in the target URL of Layer_2. In the target URL of Layer_3 the entire target URL of Layer_1 has been encoded with hexadecimal. This change makes the URL harder to read by normal users but has minimal effect on network monitoring systems. It also makes the document more likely to be detected by antivirus technologies due to its use of hexadecimal encoding, which is rare in legitimate documents.
The use of hexadecimal encoding is not limited to the “http” scheme. It can also be combined with other obfuscation methods and applied to other schemes such as “file.” By applying hexadecimal encoding to URLs with the “file” scheme, it is also possible to avoid the “update links” prompt which often displays when an “http” scheme URL is hexadecimal encoded. This method provides not only URL obfuscation, but the added benefits of network traffic that is inconsistent with its intended purpose, and a mechanism for avoiding antivirus that is not adjusted to look for the presence of hexadecimal encoding. These benefits are all negated if antivirus and network monitoring systems are designed to monitor for unusual activity as opposed to known malicious software and network activities.
Conversely, combining hexadecimal encoding with the decimal conversion of IP addresses forces behavior that is not as easy to detect. This technique provides all the protections of a decimal IP address (including the forced GET request) along with the basic detection avoidance of hexadecimal encoding. A file crafted to take advantage of this combination can have as few as 3/58 detections by antivirus companies on VirusTotal. Mitigating this type of threat mirrors creating it: by combining the mitigation steps for each layer of obfuscation or encoding, this type of threat can be detected and prevented.
Figure 8: Example Layers of IP Address Conversion and Encoding
As shown in Figure 8, when comparing the detections of the test documents crafted with different versions of the same end payload URL, we see:
- the test document using only the IP address (Id="IP") had 18/59 detections
- the test document using the decimal encoded version of the IP address (Id="DecimalIP") had 11/60 detections
- the test document using the hexadecimal encoded and decimal encoded IP address (Id="HexadecimalDecimalIP") had 3/58 detections
Each additional layer of obfuscating and encoding reduces the number of detections by a significant amount.
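On the defensive side, undoing the layers in reverse order recovers the address a signature can match. The sketch below is an illustration under our own assumptions: it handles a single layer of hexadecimal encoding over a decimal IPv4 value, and normalize_host is a hypothetical helper, not part of any product named here.

```python
import ipaddress
from urllib.parse import unquote


def normalize_host(host):
    """Undo one layer of percent-encoding, then any decimal IPv4 conversion."""
    decoded = unquote(host)
    if decoded.isascii() and decoded.isdigit() and int(decoded) <= 0xFFFFFFFF:
        return str(ipaddress.IPv4Address(int(decoded)))
    return decoded


# The decimal value 3232238081 with every digit hexadecimal encoded
print(normalize_host("%33%32%33%32%32%33%38%30%38%31"))  # 192.168.10.1
```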
A single layer of encoding in a document does offer advantages particularly if it is combined with other techniques, but multiple layers of encoding allow for different possibilities. In particular, the target seen in Figure 9 is interpreted at least three different ways.
Figure 9: Example Three Layers of Hexadecimal Encoding
A portion of the URL in the image above has been encoded three times.
Figure 10 shows how this URL is processed when using a Windows 7 OS with Microsoft Office 2014.
Figure 10: Windows 7 Microsoft Office 2014 Network Traffic
Figure 11 shows how the exact same document is processed when using Windows 10 OS with Microsoft Office 2016.
Figure 11: Windows 10 Microsoft Office 2016 Network Traffic
Figure 12 shows how the exact same document is processed when using macOS High Sierra with Microsoft Office 2016.
Figure 12: Mac Microsoft Office 2016 Network Traffic
The URLs seen in these network captures differ, which makes a single static network signature for all platforms difficult to write. The way these URLs are decoded is also inconsistent, meaning that the order in which hexadecimal pairs are decoded cannot be easily predicted for each platform. In each situation, the original URL from Figure 9 is not fully decoded, yet the original target file is still downloaded even though the requests seen in the network traffic do not show this, as can be seen in Figure 13. This server-side decoding happens for at least 6 layers of encoding.
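Because the client may decode only some of the layers, an analyst or tool inspecting a captured request can decode repeatedly until the value stops changing. The sketch below assumes each layer is standard percent-encoding; the function name and the max_layers safety limit are our own choices, not something taken from the captures above.

```python
from urllib.parse import unquote


def decode_layers(value, max_layers=10):
    """Percent-decode repeatedly; return the final text and the layers removed."""
    layers = 0
    while layers < max_layers:
        decoded = unquote(value)
        if decoded == value:  # nothing left to decode
            break
        value = decoded
        layers += 1
    return value, layers


# "A" -> "%41" -> "%2541" -> "%252541": three layers, each re-encoding the "%"
print(decode_layers("%252541"))  # ('A', 3)
```

Comparing the fully decoded form, rather than the raw request, gives one value to match across platforms even when each platform sends a differently encoded URL.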
Figure 13: File Download Despite Incomplete Decoding
Multi-layer encoding can be used to generate network traffic that diverts automated systems and analyst attention, making it difficult to manually determine the actual targeted resource.
What You Can Do to Block these Tactics
As we’ve shown, threat actors can use combinations of multiple legitimate methods of encoding and obfuscation to retrieve online resources while avoiding both automated systems and analysts. To detect and prevent these attempts, preparation and intelligence are necessary.
By setting antivirus to look for suspicious indicators that rarely appear in legitimate documents, such as hexadecimal encoded characters, unusual URL authority contents, and URL schemes that are used incorrectly, you can make previously undetected or unknown events easy to recognize and prevent.
Similarly, by setting network traffic monitoring systems to look for indicators such as multiple HTTP requests with WebDAV user agents, NBNS queries to websites or for decimal values, and HTTP requests with hexadecimal encoding of normal characters (such as %41 instead of A), you can make anomalous behavior much more evident.
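A few of these network indicators can be expressed as simple checks on a parsed HTTP request. The sketch below is illustrative only: the flag names, the choice of methods (taken from the unusual methods discussed earlier), and the substring test for WebDAV user agents are our assumptions, and a real monitoring rule would need tuning to avoid flagging legitimate WebDAV use.

```python
import re

# Characters that never need percent-encoding in a URL (letters, digits, - . _ ~)
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PERCENT_PAIR = re.compile(r"%([0-9A-Fa-f]{2})")
UNUSUAL_METHODS = {"OPTIONS", "HEAD", "PROPFIND"}


def needlessly_encoded(path):
    """Return percent-encoded pairs that decode to ordinary characters, e.g. %41."""
    return [
        match.group(0)
        for match in PERCENT_PAIR.finditer(path)
        if chr(int(match.group(1), 16)) in UNRESERVED
    ]


def request_flags(method, path, user_agent):
    """Collect the indicators present in one HTTP request."""
    flags = []
    if method.upper() in UNUSUAL_METHODS:
        flags.append("unusual-method")
    if "webdav" in user_agent.lower():
        flags.append("webdav-user-agent")
    if needlessly_encoded(path):
        flags.append("encoded-unreserved-characters")
    return flags
```

A space encoded as %20 is expected and is not flagged; only pairs such as %41 that stand in for ordinary characters count as an indicator.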
Also, human-verified threat intelligence reinforces these preparations, enabling you to detect and prevent threat actors’ evasions. Learn how Cofense Intelligence delivers phishing-specific threat intelligence to keep you in front of trouble.
Disclosure: The domains example.com and ip.anysrc.net are examples and do not host malware.
The IP address 31[.]220[.]40[.]22 is known to host malicious binaries, malware command and control centers, and domains for credential phishing websites. See ThreatHQ for more information on the relevant malware and credential phishing reports.
All third-party trademarks referenced by Cofense™ whether in logo form, name form or product form, or otherwise, remain the property of their respective holders, and use of these trademarks in no way indicates any relationship between Cofense and the holders of the trademarks.