Deriving Malware Context Requires Human Analysis
Man versus machine is one of the oldest technology tropes. In the modern tech economy, it is a major driving force in the many industries where robotics and automation are used to streamline processes. For the threat intelligence industry, the automated malware sandbox is the machine that has been put in place to replace the work done by analysts. However, while some automation can enhance the production of high-quality threat intelligence, completely replacing the human element greatly reduces the quality of the analysis.
The automated sandbox provides a snapshot of a malware’s behavior—what it does and how—but it often leaves out important context such as why. Another way to describe this is to consider much of what a sandbox collects as quantitative data that lacks qualitative explanation. Quantitative characteristics of indicators include facts such as the type of indicator (URL, IPv4 Address, etc.) while qualitative characteristics provide insight into the role this indicator plays in the malware’s lifecycle and botnet infrastructure. It is these qualitative characteristics that provide the most insight into how the malware operates and how organizations leveraging threat intelligence can mitigate the threat.
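One way to picture the distinction is as a record with two kinds of fields. The sketch below is illustrative only: the class names, the role labels, and the example values (reserved documentation addresses) are assumptions made for this example and do not describe PhishMe's actual data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IndicatorType(Enum):
    """Quantitative fact: what kind of indicator this is."""
    URL = "URL"
    IPV4 = "IPv4 Address"


class Role(Enum):
    """Qualitative context: hypothetical role labels for illustration."""
    PAYLOAD_LOCATION = "payload location"
    COMMAND_AND_CONTROL = "command and control"


@dataclass
class Indicator:
    value: str                   # quantitative: the raw observable
    type: IndicatorType          # quantitative: its type
    role: Optional[Role] = None  # qualitative: its part in the malware's lifecycle
    note: str = ""               # qualitative: free-form explanation

    def has_context(self) -> bool:
        # An indicator without a role says what was seen, but not why.
        return self.role is not None


# What a sandbox alone might report: type and value, no explanation.
raw = Indicator("192.0.2.10", IndicatorType.IPV4)

# The same kind of record once a role has been assigned.
enriched = Indicator(
    "http://example.com/update.bin",
    IndicatorType.URL,
    role=Role.PAYLOAD_LOCATION,
    note="Downloaded by the initial dropper",
)
```

Here `raw` carries only quantitative data, while `enriched` also records the role the indicator plays, which is the part that supports mitigation decisions.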
For example, even the longest-lived malware families and types can be subject to sudden change at the whim of a threat actor. The characteristics and traits that represent established indicators for a certain malware type can change overnight. When a change like this takes place, automated sandboxes may not produce the expected analysis results. If the results do not match existing rules, the machine may not recognize that the application is malicious. This may allow new malware binaries to slip past automated defenses.
However, humans have a greater ability to identify unwanted behavior even when that behavior does not match any known rules. In these cases, an analyst can recognize that an application is hostile and define what makes it hostile, even if the malware has not been previously defined.
Identifying these qualitative characteristics can be a complex task. The process by which this definition takes place must consider the unique context of every malware sample analyzed while also providing a consistent framework for identifying the role each associated indicator plays in a malware’s lifecycle. PhishMe’s malware analysis is driven by human beings who manipulate the malware’s execution within a specialized environment. This human-driven analysis process gives PhishMe analysts an intimate and contextual understanding of the malware’s lifecycle.
Having analysts involved in this process means that communications between malware samples and their supporting infrastructure are subject to scrutiny in real time. This in turn means that analysis results include a one-to-one parity between observations of a malware’s behavior and its use of supporting infrastructure. This has two implications. First, it allows for the detailed classification and qualification of a malware’s infrastructure. Second, it reduces the incidence of false positives, since each quantitative indicator is matched to a behavior, which adds a vetting step to malware analysis.
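The vetting idea can be sketched in a few lines: an indicator is reported only if it can be paired with at least one observed behavior. This is a minimal illustration under stated assumptions. The `Observation` record, the `vet_indicators` function, and the sample values are hypothetical, and PhishMe's real process is driven by analysts rather than by a script like this.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Observation:
    """A behavior an analyst saw, tied to the infrastructure it used."""
    indicator: str
    behavior: str


def vet_indicators(
    candidates: Iterable[str],
    observations: Iterable[Observation],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split candidates into those matched to a behavior and those not."""
    behaviors: Dict[str, List[str]] = {}
    for obs in observations:
        behaviors.setdefault(obs.indicator, []).append(obs.behavior)

    vetted: Dict[str, List[str]] = {}
    unvetted: List[str] = []
    for indicator in candidates:
        if indicator in behaviors:
            vetted[indicator] = behaviors[indicator]
        else:
            # No observed behavior: hold back as a possible false positive.
            unvetted.append(indicator)
    return vetted, unvetted


# Hypothetical sample data using reserved documentation addresses.
candidates = ["192.0.2.10", "http://example.com/update.bin", "198.51.100.7"]
observations = [
    Observation("http://example.com/update.bin", "downloaded second-stage payload"),
    Observation("192.0.2.10", "received check-in from infected host"),
]

vetted, unvetted = vet_indicators(candidates, observations)
```

In this sample, `198.51.100.7` ends up in `unvetted` because nothing the malware did was tied to it, so it would be held for review rather than published as an indicator.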
Given the controlled nature of PhishMe’s analysis, it is straightforward to construct a distinct ontology for each malware sample based on the parity that can be drawn between infrastructure usage and the resulting behavior. It is this understanding of cause-and-effect relationships that provides the context for categorizing the qualitative characteristics of malware indicators. Those characteristics, vetted by human analysts, form the core of the rich intelligence provided by PhishMe.