Using RTF Files as a Delivery Vector for Malware
During malware analysis we often see attackers using features in creative ways to deliver and obfuscate malware. We’ve recently seen an increase in samples leveraging RTF temp files as a delivery method to encapsulate and drop malware.
The attack uses the following process to drop and execute the payload on a system.
Figure 1 – Malware Delivery
- The user opens the Office document and enables macros.
- The macro saves the active document as an RTF file.
- The macro silently opens the RTF document.
- On open, the RTF document drops the embedded object to the Temp directory.
- The macro executes the dropped file.
To better understand how this delivery method works, we need to take a look at how binaries are encapsulated within RTF documents and the default behavior when handling these embedded objects.
When an RTF-formatted document that includes embedded objects (objects inserted from a file) is opened, the objects are extracted to the user’s temp directory, where they are launched with the default handler if the user clicks the object within the document. Once the document is closed, the files are cleaned up by removing them from the user’s temp directory. While the document is open, however, these files are available to other processes on the system.
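One way to observe this behavior in an isolated analysis VM is to compare the contents of the temp directory before and after the document is opened. The following is a minimal sketch; the function names `snapshot` and `dropped_files` are hypothetical and not part of any tool mentioned in this post.

```python
import os


def snapshot(directory):
    """Return the set of file paths currently found under directory."""
    found = set()
    for root, _dirs, files in os.walk(directory):
        for name in files:
            found.add(os.path.join(root, name))
    return found


def dropped_files(before, after):
    """Return the paths present in the second snapshot but not the first."""
    return sorted(after - before)
```

Take one snapshot of the temp directory (for example `tempfile.gettempdir()`) before opening the document and a second one while it is still open, then pass both sets to `dropped_files`. Files extracted from embedded objects should appear in the result until the document is closed.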
When an object is embedded into a document from a file, it uses the Packager Object Server, a legacy implementation from OLE1. Unfortunately, the Packager format has not been publicly documented and is not included within the Microsoft Open Specifications.
The objects are stored in Embedded Object (objemb) sections of a document. The header fields define the object server, size and other metadata about the underlying embedded data.
Figure 2 – Object Header
More information on the header format can be found in MS-OLEDS – Section 2.2.4 ObjectHeader
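As a rough illustration of how the header can be read, the sketch below pulls the hex-encoded `\objdata` blob out of an RTF document and parses the ObjectHeader fields, followed by the native data size and data of an embedded object. It assumes the layout described in MS-OLEDS (little-endian 4-byte integers and length-prefixed ANSI strings) and an unobfuscated `\objdata` block; malicious samples often break simple regex extraction like this. The function names are hypothetical.

```python
import re
import struct

# Assumption: the hex blob directly follows the \objdata control word.
OBJDATA_RE = re.compile(rb"\\objdata\s+([0-9A-Fa-f\s]+)")


def extract_objdata(rtf_bytes):
    """Yield the decoded bytes of each \\objdata hex blob in an RTF document."""
    for match in OBJDATA_RE.finditer(rtf_bytes):
        hex_text = re.sub(rb"\s+", b"", match.group(1)).decode("ascii")
        if len(hex_text) % 2:  # drop a dangling nibble rather than fail
            hex_text = hex_text[:-1]
        yield bytes.fromhex(hex_text)


def read_ansi_string(buf, offset):
    """Read a 4-byte length followed by that many bytes (NUL included)."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    raw = buf[start:start + length]
    return raw.rstrip(b"\x00").decode("latin-1"), start + length


def parse_object_header(buf):
    """Parse the ObjectHeader and the native data that follows it."""
    ole_version, format_id = struct.unpack_from("<II", buf, 0)
    class_name, offset = read_ansi_string(buf, 8)
    topic_name, offset = read_ansi_string(buf, offset)
    item_name, offset = read_ansi_string(buf, offset)
    (native_size,) = struct.unpack_from("<I", buf, offset)
    data_start = offset + 4
    return {
        "ole_version": ole_version,
        "format_id": format_id,  # 1 = linked, 2 = embedded per MS-OLEDS
        "class_name": class_name,  # "Package" for Packager objects
        "topic_name": topic_name,
        "item_name": item_name,
        "native_data": buf[data_start:data_start + native_size],
    }
```

For a Packager object, `class_name` would be expected to read `Package`, and `native_data` would hold the package stream discussed in the next section.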
Understanding the OLE1 Packager Format
Analyzing a number of samples, we’ve been able to understand the majority of the Packager format and how it relates to embedded objects within documents. The table in Figure 3 outlines the legacy Packager data format and can be used as a guide to parse package data streams.
|Field||Size (hex characters)||Description|
|Header||4||Stream header, always set to 0200|
|Label||Variable||Label of embedded object, defaulted to filename. (Null Terminated)|
|OrgPath||Variable||Original path of embedded object. (Null Terminated)|
|UType||8||Unknown, possibly a FormatId. Set to 00000300 for embedded objects and 00000100 for linked objects.|
|DataPathLen||8||Length of DataPath|
|DataPath||Variable||Extract path and file name, defaulted to %localappdata%/Temp of the source system. (Null Terminated)|
|DataLen||8||Length of embedded data.|
|OrgPathWLen||8||Length of OrgPathW|
|OrgPathW||Variable||Original path of embedded object. (WChar)|
|LabelLen||8||Length of LabelW|
|LabelW||Variable||Label of embedded object, defaulted to filename. (WChar)|
|DefPathWLen||8||Length of DefPathW|
|DefPathW||Variable||Original path of embedded object. (WChar)|
Figure 3 – OLE 1 Packager Format
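The table can be turned into a small parser. The sketch below is not the psparser tool; it is a minimal illustration that works on the hex-decoded stream, so each fixed-size field is half the width listed in Figure 3 (4 hex characters become 2 bytes, 8 become 4 bytes). It assumes that length fields are little-endian and that the embedded data immediately follows DataLen, and it stops before the wide-character fields because their length units are not stated in the table.

```python
import struct


def read_cstring(buf, offset):
    """Read a NUL-terminated ANSI string and return it with the next offset."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("latin-1"), end + 1


def parse_package(buf):
    """Parse the ANSI portion of a Packager stream into a dict."""
    if buf[:2] != b"\x02\x00":
        raise ValueError("unexpected stream header")
    label, offset = read_cstring(buf, 2)
    org_path, offset = read_cstring(buf, offset)
    utype = buf[offset:offset + 4].hex()  # e.g. "00000300" for embedded
    offset += 4
    (data_path_len,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    raw_path = buf[offset:offset + data_path_len]
    data_path = raw_path.rstrip(b"\x00").decode("latin-1")
    offset += data_path_len
    (data_len,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    data = buf[offset:offset + data_len]  # assumption: data follows DataLen
    if len(data) != data_len:
        raise ValueError("stream is shorter than DataLen claims")
    return {
        "label": label,
        "org_path": org_path,
        "utype": utype,
        "data_path": data_path,
        "data": data,
        "trailing": buf[offset + data_len:],  # unparsed WChar fields
    }
```

Returning the unparsed remainder as `trailing` keeps the wide-character fields available for manual inspection without guessing at their encoding.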
In addition to the embedded binary the format also includes metadata for the embedded object which may be helpful in DFIR.
- By default the Label value will consist of the filename used in OrgPath and ObjFile (the DataPath field in Figure 3). If the Label differs from that filename, it indicates that the label was modified.
- The OrgPath will include the original path of the binary which was embedded.
- The ObjFile will include the authoring system’s %localappdata% path, which will include the username: C:\Users\<username>\…
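These checks are straightforward to automate once the fields have been parsed. The sketch below takes the Label, OrgPath and extract path (DataPath in Figure 3) as strings and reports whether the label appears to have been modified and which username, if any, is present in the extract path. The function name and the returned keys are hypothetical, and the username pattern assumes the path contains a `\Users\<username>\` component.

```python
import ntpath
import re

# Assumption: the authoring profile appears as \Users\<username>\ in the path.
USER_RE = re.compile(r"\\Users\\([^\\]+)\\", re.IGNORECASE)


def package_indicators(label, org_path, data_path):
    """Derive simple DFIR indicators from Packager metadata fields."""
    # ntpath handles Windows-style paths regardless of the analysis platform.
    expected_names = {
        ntpath.basename(org_path).lower(),
        ntpath.basename(data_path).lower(),
    }
    match = USER_RE.search(data_path)
    return {
        "label_modified": label.lower() not in expected_names,
        "author_username": match.group(1) if match else None,
    }
```

A label such as `invoice.pdf` attached to an OrgPath ending in `payload.exe` would be flagged as modified, which is one of the cheaper signals for spotting a disguised executable.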
To help analyze suspicious package streams we’ve written a Python tool, psparser, which processes the data format, outputs the metadata and optionally extracts the embedded object. Using the tool to analyze malicious RTF files we’ve seen in a couple of recent phishing campaigns, we can see a number of potential similarities in the metadata of the embedded objects.
Figure 4 – Sample Analysis
Using the embedded metadata gives us some primary indicators we can use to find further related samples for analysis.
To accomplish this delivery method, attackers start with an RTF document and embed a malicious executable; the document is then converted to a Word (.doc) file. Once in Word, the attacker adds the macro calls required to save, open and execute the payload encapsulated within the source document.
Figure 5 – Malicious Macro
Figure 5 shows an example of a malicious macro we’ve seen leveraging this delivery vector. The macro calls SaveAs to save the active document in RTF format (wdFormatRTF), then uses CreateObject to start a new instance of Word that opens the saved document silently. Once the RTF document is open, the macro executes the payload that was extracted to the user’s Temp directory.
Although the executable files are removed once the instance of Word is closed, the RTF files the macro wrote to the temp directory can persist and be used as a host indicator during triage or response activities. Examination of the macro can quickly confirm if the RTF files are also removed once the payload is executed.
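During triage, leftover RTF files can be located by content rather than by extension, since the macro controls the file name it saves under. The following sketch walks a directory and reports files that start with the RTF signature and contain an `\objemb` control word. The function name is hypothetical, the directory to scan is left to the analyst, and the `\objemb` check is only a heuristic that an obfuscated document could evade.

```python
import os

RTF_MAGIC = b"{\\rtf"
EMBED_MARKER = b"\\objemb"


def find_rtf_with_embedded_objects(directory):
    """Return paths of RTF files under directory that embed an object."""
    hits = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, "rb") as handle:
                    content = handle.read()
            except OSError:
                continue  # locked or unreadable file, skip it
            if content.startswith(RTF_MAGIC) and EMBED_MARKER in content:
                hits.append(path)
    return sorted(hits)
```

Running this against a user’s Temp directory on a suspect host gives a short list of candidate files that can then be examined with the parsing steps described earlier.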
RTF to Doc and back again…
Although the default behavior differs in that the embedded object is no longer extracted to the Temp directory when the Word file is opened, analyzing the embedded Ole10Native Stream shows that it is not modified when the file is saved in the new format.
When the malicious macro saves the document back to RTF format, there are changes to the overall document with additional fields and formatting, but the stream with the malicious payload remains unchanged.
Figure 6 – Original vs Saved RTF Document
Once the document is back in the original format, the default behavior occurs where the embedded object is automatically extracted to the temp directory when the file is opened.
We can see that this is a creative use of a default behavior to encapsulate and deliver a malicious binary. We’ve also seen attackers apply additional layers of obfuscation (xor) to better mask the inclusion of a binary file within the containing document.
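The samples are not described in enough detail here to know the exact XOR scheme used, so the sketch below assumes the simplest case: a single-byte key applied to the whole binary. It searches a blob for the standard DOS stub message encoded under each possible non-zero key and reports any key and offset where it is found. The function name is hypothetical, and multi-byte or rolling keys would need a different approach.

```python
# Text found in the DOS stub of most Windows executables.
DOS_STUB_TEXT = b"This program cannot be run in DOS mode"


def find_single_byte_xor_pe(blob):
    """Return (key, offset) pairs where a XOR-encoded DOS stub is found."""
    hits = []
    for key in range(1, 256):  # key 0 would mean no obfuscation
        encoded = bytes(b ^ key for b in DOS_STUB_TEXT)
        position = blob.find(encoded)
        if position != -1:
            hits.append((key, position))
    return hits


def xor_decode(blob, key):
    """Decode blob with a single-byte XOR key."""
    return bytes(b ^ key for b in blob)
```

Once a key is recovered, `xor_decode` can be applied to the embedded data and the result checked for the `MZ` signature at the expected position.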
Looking at the malicious macros in these samples, there are no explicit calls to write or download the binary file which is executed as the payload. This could cause some confusion during malware triage when identifying the initial delivery vector of the malicious binary.