Turning a blind eye: How end-users and NLP AI are being tricked by clever phishing techniques like ZeroFont
Recently, an older email security detection bypass method was seen successfully evading Microsoft’s spam and phishing filters. The technique combines two methods and was dubbed “ZeroFont Phishing” by Avanan. In ZeroFont Phishing, attackers insert random strings within the keywords or phrases that many artificially intelligent systems use to identify malicious or suspicious content. When these strings are wrapped in HTML span tags whose font-size property is set to zero, they become invisible to the end user, yet they appear to neuter the ability of existing Natural Language Processing (NLP), Machine Learning (ML), and Artificial Intelligence (AI) systems to understand what is in the plaintext of the email. In most implementations, NLP attempts to understand the meaning of the email text to determine context and patterns that will assist in overall classification. These methods are not new, so we decided to take a deeper look at them and explore potential variants that could have similar results.
Sample Phishing Email
We started by creating a simple email HTML template that should trigger any moderately intelligent spam and/or phishing detection algorithm.
Figure 1 – Sample Phishing Email HTML Code
When rendered in an email, this template takes on a convincing look and feel, urging the recipient to update their Microsoft Office 365 password.
Figure 2 – Rendered Sample Phishing Email
Sample ZeroFont Implementation
Taking the same HTML code as our sample phishing email, we then peppered random strings throughout the phrases and keywords that any NLP-based system would flag as potentially malicious.
Figure 3 – Sample ZeroFont Phishing Email HTML Code
When rendered in an email, this template has the exact same look and feel as the original sample phishing email. All the extra text is hidden from the user.
Figure 4 – Rendered Sample ZeroFont Phishing Email
Plaintext Review of Phishing Samples
To the naked eye, there is no visible difference between these two rendered emails. To uncover how this email could bypass systems that “read” the text in order to evaluate it, we need to look at the plaintext portions of these emails.
Figure 5 – Original Phishing Sample Plaintext
Figure 6 – ZeroFont Phishing Sample Plaintext
There is a striking difference in what is presented to the scanning engines. Presented with this type of garbled text, most solutions would be prone to marking the email as spam and would be unlikely to detect it as a phishing attempt.
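The gap between the two plaintext views can be reproduced with a minimal sketch. The sample markup, the keyword list, and the helper names below are assumptions made for illustration; they are not the template shown in the figures, and the naive tag stripping only stands in for whatever text extraction a given filter actually performs.

```python
from html.parser import HTMLParser

# Hypothetical keyword list; real filters rely on far richer models.
KEYWORDS = ("password", "office 365")


class NaiveTextExtractor(HTMLParser):
    """Collects every text node and ignores all styling."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def naive_plaintext(html):
    """Return the text a style-unaware scanner would extract."""
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def keyword_hits(text):
    """Return the keywords that appear verbatim in the text."""
    lowered = text.lower()
    return [k for k in KEYWORDS if k in lowered]


# Illustrative markup only, not the template from the figures.
ORIGINAL = "<p>Update your Office 365 password</p>"
ZEROFONT = (
    '<p>Update your Off<span style="font-size: 0">qzx</span>ice 3'
    '<span style="font-size: 0">jkw</span>65 pass'
    '<span style="font-size: 0">vbn</span>word</p>'
)

for label, html in (("original", ORIGINAL), ("zerofont", ZEROFONT)):
    text = naive_plaintext(html)
    print(label, "|", text, "|", keyword_hits(text))
```

In this sketch the original markup matches both keywords, while the ZeroFont variant matches none, even though a mail client would render the two identically.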
Alternative Obfuscation Methods
A font size of zero achieves the goal of hiding the plaintext nonsense from the end user while leaving it present in the scanned part of the email. However, this is not the only way in which data can be hidden or manipulated within HTML code.
Let’s review two more methods that can be used.
Figure 7 – HTML Span Style=”display: none”
Replacing font-size: 0 with the above HTML span will give the exact same results. Text between the span tags will vanish from the user’s view but remain in the analyzed plaintext. A slightly more interesting example would be the abuse of Unicode styling and the direction in which text is rendered.
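One way a scanner could narrow this gap is to extract only the text a user would actually see. The following is a minimal sketch under several assumptions: it inspects inline style attributes only (not stylesheets or classes), it expects well-formed markup with matching closing tags, and the class name, regex, and sample markup are hypothetical.

```python
import re
from html.parser import HTMLParser

# Inline styles treated as "hidden" in this sketch (an assumed, incomplete list).
HIDDEN_STYLE = re.compile(
    r"font-size\s*:\s*0(?![\d.])|display\s*:\s*none", re.IGNORECASE
)

# Elements without closing tags; they must not affect the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes that are not inside a hidden element."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.hidden_depth = 0  # greater than 0 while inside a hidden element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        # Everything nested inside a hidden element stays hidden.
        if self.hidden_depth or HIDDEN_STYLE.search(style):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return an approximation of the text a mail client would render."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Illustrative markup only.
SAMPLE = (
    '<p>pass<span style="font-size: 0">vbn</span>word '
    're<span style="display: none">xyz</span>set</p>'
)
print(visible_text(SAMPLE))  # password reset
```

Feeding the visible text, rather than the raw plaintext, to a classifier is a design choice with its own limits: styles applied through classes or external CSS would slip past this sketch entirely.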
Figure 8 – HTML Span Style=”Unicode-bidi”
With this styling, the scanning engine will see the following:
devreser sthgir llA .noitaroproC tfosorciM 8102
The end user will see this instead:
2018 Microsoft Corporation. All rights reserved
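The reversal can be checked with a short sketch. It assumes the text is stored in plain reversed order inside a span styled with both `unicode-bidi: bidi-override` and `direction: rtl`, which is one way to get this effect; the exact styling in Figure 8 is not reproduced here. The regex and function names are hypothetical, and the sketch handles only simple, non-nested spans.

```python
import re

# Matches a non-nested span with a double-quoted inline style (an assumed shape).
SPAN = re.compile(
    r'<span\s+style="([^"]*)">(.*?)</span>', re.IGNORECASE | re.DOTALL
)


def _visual_order(match):
    """Reverse the text of spans that force right-to-left rendering."""
    style = match.group(1).lower().replace(" ", "")
    text = match.group(2)
    if "unicode-bidi:bidi-override" in style and "direction:rtl" in style:
        return text[::-1]
    return match.group(0)  # leave every other span untouched


def undo_bidi_override(html):
    """Rewrite overridden spans as the text a user would read."""
    return SPAN.sub(_visual_order, html)


scanned = (
    '<span style="unicode-bidi: bidi-override; direction: rtl">'
    "devreser sthgir llA .noitaroproC tfosorciM 8102</span>"
)
print(undo_bidi_override(scanned))  # 2018 Microsoft Corporation. All rights reserved
```

A simple character reversal is enough for the ASCII text shown here; text with combining characters or mixed scripts would need proper bidirectional handling.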
We have now covered a few examples of HTML code that can be used to obfuscate or manipulate the text presented to end users and, ultimately, to NLP filters and other text- or keyword-based email solutions. Based on testing performed by Cofense™, multiple NLP-based detection solutions can now mostly identify ZeroFont-coded emails. Unfortunately, several variations of similar HTML coding methods that achieve similar results are still going undetected.
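As a rough illustration of how the three styles discussed in this post could be surfaced for review, the sketch below counts their occurrences in raw HTML. The pattern names and the function are hypothetical, the list covers only the variants named here, and a non-zero count is a signal rather than a verdict, since legitimate email can also contain hidden elements.

```python
import re

# Only the three styles discussed in this post; not an exhaustive list.
SUSPICIOUS_STYLES = {
    "zero font size": re.compile(r"font-size\s*:\s*0(?![\d.])", re.IGNORECASE),
    "display none": re.compile(r"display\s*:\s*none", re.IGNORECASE),
    "bidi override": re.compile(
        r"unicode-bidi\s*:\s*bidi-override", re.IGNORECASE
    ),
}


def hidden_style_counts(html):
    """Count how often each text-hiding style appears in the raw HTML."""
    return {
        name: len(pattern.findall(html))
        for name, pattern in SUSPICIOUS_STYLES.items()
    }


# Illustrative markup only.
sample = (
    '<p>pass<span style="font-size: 0">vbn</span>word '
    '<span style="display: none">xyz</span>'
    '<span style="font-size: 0px">q</span></p>'
)
print(hidden_style_counts(sample))
```

An unusually high count relative to the length of the message could then be passed along as one feature among many.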
Armed with the knowledge that targets use systems that scan the plaintext of emails, any attacker with even a basic understanding of HTML/CSS could bypass those systems in a variety of ways. Adversaries will constantly change their tactics and adapt to the security measures that prevent their success. Relying on machines alone is a losing strategy, which is why conditioning users to recognize and report threats that use these techniques to bypass email security filters is the best method. People will always be the first and best line of defense against phishing.
To be proactive and keep ahead of threats, sign up for Cofense Threat Alerts. These alerts are a simple way to stay on top of emerging phishing and malware threats and attacks. Our Threat Alerts subscription service was developed to provide all businesses with fast delivery and immediate visibility into emerging or changing phishing and malware trends.