Hunting Malware Threats from Just One Word: How to Perform a Fruitful Investigation with Practically Nothing
Posted by: Jason Meurer, Researcher, Cofense
As security researchers, we sometimes have very little information to begin our investigations or research activities. A rumor here or there can sometimes spread from a single word attributed to a current phishing or malware campaign. This was exactly the case for us on February 27th, when we identified a phishing campaign but were provided with very limited information to aid us in starting our research.
While the campaign itself was not particularly novel, we thought it might be interesting to discuss the overall research and investigative processes that we followed. This article will demonstrate how we started with a single data point (one word, a brand name) and were ultimately able to identify multiple related email phishing campaigns, samples, and actionable IOCs.
The Hunt Begins:
We started with the name of a popular online reservation service being used in an ongoing phishing campaign, one that leveraged their brand to trick users into installing malicious software.
- We began our search in much the same way as many would: by leveraging VirusTotal and other online intelligence repositories to look for anything that might resemble the rumored campaign. A quick search for the brand name returned a large number of hits.
- Unfortunately, this brand name happens to overlap with common programming terminology, so most of the hits were unrelated. No worries; we just pivoted.
- We believed this to be an email-based threat leveraging malicious documents, so we added a filter to extract anything that might be an email or office document.
- Success! We found one document that appeared to have this brand name in the filename as well as a few AV hits.
- At this point we downloaded the identified file and began to examine it to verify whether it was related to the rumors.
To begin our analysis, we fired up our Linux-based analysis environment and took a closer look at the suspect document.
- The document was labeled with the .doc extension, but dumping the raw contents with xxd quickly revealed that it was an RTF doc.
Figure 1 Hexadecimal output of rtf document from xxd
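The extension check above comes down to comparing the first few bytes of the file against known signatures. Here is a minimal Python sketch of that idea. The helper name `sniff_format` and the short signature table are our own illustration, not part of any tool used in this investigation, and the table is far from exhaustive.

```python
# Hypothetical helper: classify a file by its leading bytes, not its extension.
# The signature list is deliberately short and only covers formats relevant here.
MAGIC_SIGNATURES = [
    (b"{\\rtf", "RTF"),                             # Rich Text Format
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2"),  # legacy .doc/.xls/.ppt
    (b"PK\x03\x04", "ZIP"),                         # .docx and friends
    (b"MZ", "MZ executable"),                       # DOS/PE executables
]


def sniff_format(data: bytes) -> str:
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, label in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

Reading only the first eight bytes of a sample is enough for this table, for example `sniff_format(open(path, "rb").read(8))`, where `path` points at the file under analysis.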
We found this quite interesting and further examined the document for other possible information.
- By using the strings command, we discovered that there were several embedded objects as well as an interesting URL in an INCLUDEPICTURE section of the doc.
- Philippe Lagadec’s suite of tools, oletools, was then used to extract the objects that were located within the file.
Figure 2 Output from rtfobj utility
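For readers who prefer to script the strings step, the sketch below approximates it in Python and then looks for a quoted URL following an INCLUDEPICTURE field. It is an illustration only: the function names are hypothetical, and we assume the URL appears in double quotes on the same printable run as the field name, which may not hold for every sample.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII, similar in spirit to the strings utility."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


# Assumption: the field looks like  INCLUDEPICTURE "http://host/path"
URL_AFTER_INCLUDEPICTURE = re.compile(
    r'INCLUDEPICTURE[^"]*"(https?://[^"\s]+)"', re.IGNORECASE
)


def includepicture_urls(data: bytes):
    """Return every URL found inside an INCLUDEPICTURE field."""
    urls = []
    for text in printable_strings(data):
        urls.extend(URL_AFTER_INCLUDEPICTURE.findall(text))
    return urls
```

Calling `includepicture_urls(open(path, "rb").read())` on a sample returns a list of candidate check-in URLs to review by hand.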
Interestingly, this returned an embedded script to review, along with a few other objects.
- A quick review of the scriptlet showed that it included VBScript to execute what appeared to be a screensaver program named “intel.scr” from the system temp folder.
Figure 3 Contents of embedded scriptlet
We knew what the screensaver program was called, but we didn’t know where it came from.
- Rtfobj is a program written to extract any embedded objects from an RTF document. We reviewed the other contents it dumped in the hope of finding additional clues. Since only one of the files had any size to it, we focused on that specific embedded file.
Figure 4 Hexadecimal output of embedded object
We immediately found some interesting information within the first section of the file. This was the file dropped as intel.scr, the screensaver we identified earlier. It also contained an executable, as shown by the MZ file header.
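Spotting the MZ header by eye works well for a single file. For a larger set of extracted objects, the same check can be scripted. The sketch below is our own illustration (the function name is hypothetical): it searches a blob for "MZ" and only accepts a hit if the 32-bit value at offset 0x3C from that point leads to a "PE\0\0" signature, which filters out coincidental "MZ" byte pairs.

```python
import struct


def find_pe_offsets(blob: bytes):
    """Return offsets where an MZ header is followed by a valid PE signature."""
    offsets = []
    start = 0
    while True:
        idx = blob.find(b"MZ", start)
        if idx == -1:
            break
        # e_lfanew, the offset of the PE header, is a little-endian
        # 32-bit value stored 0x3C bytes after the MZ marker.
        if idx + 0x40 <= len(blob):
            (e_lfanew,) = struct.unpack_from("<I", blob, idx + 0x3C)
            pe_pos = idx + e_lfanew
            if blob[pe_pos:pe_pos + 4] == b"PE\x00\x00":
                offsets.append(idx)
        start = idx + 1
    return offsets
```

A non-zero offset in the result is consistent with what we saw: the executable sits behind a small header that carries the file name.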
Now that we had a good idea of what the rtf was doing, we took another look at that INCLUDEPICTURE URL that we found earlier.
Figure 5 INCLUDEPICTURE string that performs check-in
- Our thought at this point was that it could be very interesting to investigate the remote host that the URL pointed to and see if it hosted any other files of interest.
- After logging in to our secure sandbox we navigated to the URL and the COD folder it referenced. There we found several php and text files.
Figure 6 Directory listing from check-in site
- The most interesting file we came across was the stats.txt text file. This file contained all of the check-ins performed by affected systems, which essentially gave us a measure of the campaign's effectiveness.
- Since we knew that the two text files appeared to be linked to the dropper, it made sense to do some quick Google dorking and see if any other sites looked similar.
Figure 7 Google dork to identify other hosts
After reviewing a few of the returned sites, we were confident that this was more widespread than just the one domain we initially found.
- We used this newfound information to create a Yara rule to look for other files that matched the one we discovered. The rule to hunt for this was simple to create.
Figure 8 Yara rule to identify further samples
After letting this rule run against samples in our malware zoo as well as in VT, we came across several other samples that used this style of check-in for affected hosts.
- Some quick command line magic provided us with a unique list of hosts to help search for more data about this malicious dropper.
Figure 9 Listing of all check-in hosts found across sample set
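The de-duplication step can also be done in a few lines of Python. This sketch is an illustration with a hypothetical function name: it takes the check-in URLs pulled from each sample and groups the directories seen under each host, which makes domain reuse easy to spot.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Map each host to the sorted list of directories seen under it."""
    grouped = defaultdict(set)
    for url in urls:
        parts = urlsplit(url.strip())
        if parts.hostname:  # skip lines that are not usable URLs
            directory = posixpath.dirname(parts.path) or "/"
            grouped[parts.hostname].add(directory)
    return {host: sorted(dirs) for host, dirs in sorted(grouped.items())}
```

Hosts with more than one directory in the result are the reuse cases; `list(group_by_host(urls))` gives the plain unique host list.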
We quickly saw that there was some domain reuse, with the same domains hosting multiple directories. We also found a few unexpected domains: google.com and localhost (more on this later).
- We were able to gather the stats file from each of these hosts to determine a rough count of hosts that may have been infected.
Figure 10 Command to pull down stats.txt file from listing generated
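A rough Python equivalent of the download loop is sketched below. It is not the command we ran. The function names are hypothetical, and it assumes that stats.txt sits in the same directory as the check-in script, as it did on the host we browsed. Network errors and timeouts are recorded rather than raised, so dead hosts do not stop the run. As with any contact with attacker infrastructure, this belongs inside an isolated analysis environment.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def stats_url(checkin_url: str, filename: str = "stats.txt") -> str:
    """Assumption: the stats file lives next to the check-in script."""
    return urljoin(checkin_url, filename)


def fetch_all(checkin_urls, fetch=None, timeout=10):
    """Return ({url: body}, [unreachable urls]) for the given check-in URLs."""
    if fetch is None:
        def fetch(url):
            with urlopen(url, timeout=timeout) as resp:
                return resp.read()

    results, failed = {}, []
    for url in sorted({stats_url(u) for u in checkin_urls}):
        try:
            results[url] = fetch(url)
        except OSError:  # URLError and socket timeouts are OSError subclasses
            failed.append(url)
    return results, failed
```

The optional `fetch` argument exists so the loop can be exercised offline with a stand-in function before it is pointed at real hosts.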
We discovered that several of the hosts appeared to be down, as the connections timed out.
- We concatenated our gathered stats, extracted the unique IP and user agent string pairs, and counted them.
Figure 11 Quick summation of unique hosts discovered in stats.txt
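The counting step is easy to reproduce in Python once the layout of the stats file is known. The sketch below is an illustration only: the delimiter and field positions are assumptions made for the example (a "|" separator with the IP address first and the user agent second), so adjust `sep`, `ip_field`, and `ua_field` to match the real file.

```python
def unique_ip_ua_pairs(lines, sep="|", ip_field=0, ua_field=1):
    """Return the set of distinct (IP, user agent) pairs found in the lines.

    The separator and field positions are assumptions about the log layout.
    """
    pairs = set()
    needed = max(ip_field, ua_field)
    for line in lines:
        fields = [field.strip() for field in line.split(sep)]
        if len(fields) > needed and fields[ip_field]:
            pairs.add((fields[ip_field], fields[ua_field]))
    return pairs
```

Taking `len()` of the result gives the kind of lower-bound host count discussed here, since repeated check-ins from the same IP and user agent collapse into one entry.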
We can conservatively state that there were at least 1804 hosts that had been infected, but potentially a lot more as NAT may have been masking multiple hosts within a single organization. While this is not a tremendous number, it indicates that there was at least some effort being put into this campaign.
The discovery of the google.com and localhost domains in some samples was peculiar. This is very likely an indication that the samples were being generated by some kit that was not yet fully understood by the attacker.
Being able to pivot quickly on information discovered along the way is a critical skill for everyone who works in security. We were asked to discover a threat based solely on a single word and ended up stumbling upon a new exploit kit.
Never underestimate the power of simple command line tools to gather information quickly.
- cut – Utility for quickly chopping up data in stdout. Helpful if trying to extract specific fields for further processing.
- grep – Tool for finding patterns within files
- oletools – Suite of tools for handling OLE2 files
- sed – Command to parse and transform text
- sort – Utility for sorting data
- strings – Command line utility to display any printable characters from a file
- wc – Tool to get counts of words, bytes, or lines
- wget – Network download tool
- xxd – Command line utility to display the hex output of a file
- Yara – Powerful matching language