:: hiddenillusion ::: classification

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates head here.

updated 08/20/2012 - added two new signatures

There have been some recent discussions about bypassing security products, or even the end user, by wrapping a PDF file in an XML Data Package (XDP) file. If you aren't familiar with XDP files, don't feel bad... neither was I. According to the information Adobe provides, this is essentially a wrapper for PDF files so they can be treated as XML files. If you want to know more about this file type then take a look at the link above; I'm not going to go heavily into detail, and note that the documentation is a bit on the light side as it is. There are other things that can be included in an XDP file, but for this post we're looking at the ability to have a PDF within it.

Adobe states that:

"The PDF packet encloses the remainder of the PDF document that resulted from extracting any subassemblies into the XDP. XML is a text format, and is not designed to host binary content. PDF files are binary and therefore must be encoded into a text format before they can be enclosed within an XML format such as XDP. The most common method for encoding binary resources into a text format, and the method used by the PDF packet, is base64 encoding [RFC2045]."

Based on my limited testing, Adobe Reader recognizes XDP files and is their default handler. When one is opened, Adobe Reader decodes the base64 stream (the PDF within it), saves it to the %temp% directory and then opens it.

Brandon's post included a Snort signature for this type of file, but I wanted some identification/classification geared more toward host-based analysis. Since I couldn't get hold of a big data set, I grabbed a few samples (Google dork = ext:xdp) and thought I'd first try TrID, but that generally just classified them as XML files (with a few exceptions), and the same went for 'file'. I can't blame them, I mean they are XML files, but I wanted to flag them as XDP files with embedded PDFs if that was the case. That way I could do post-processing: extract the base64 encoded PDF from within the XDP file and then process it as a standard PDF file in an automated fashion.

I then looked to TrIDScan but unfortunately that didn't work as hoped. I tried creating my own XML signature for it as well but kept getting segfaults, so... no bueno. My next thought was to put it into a YARA rule, but I thought I'd try something else that was on my mind. I've been told in the past to mess around with ClamAV's sectional MD5 hashing, but that's generally done by extracting a PE file's sections and then hashing those. Since this is an XML file, that wasn't going to work. I remembered some slides I'd looked at a while ago on writing ClamAV signatures, and when I revisited them the lightbulb about Logical Signatures came back on.

ClamAV's Logical Signatures

Logical Signatures in ClamAV are very similar in thought/flow to YARA signatures in that they allow you to create detection based on, well, logic. The structure is shown below. The 'Subsig*' fields are hex values, so you can either use an online/local resource to convert your ASCII to hex or you can leverage ClamAV's sigtool (remember to delete the trailing 0a, i.e. the newline, though):

sigtool --hex-dump

Logical Signature Structure:

SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;Subsig1;Subsig2;...
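If sigtool isn't handy, the ASCII-to-hex step and the field layout above can be sketched in a few lines of Python. This is only an illustration of the structure: the function names and the `Example_Sig` name are made up for this example, and the sketch does not validate anything the way ClamAV itself would.

```python
from typing import List


def to_clamav_hex(text: str) -> str:
    """Hex-encode an ASCII string for use as a subsignature (no trailing 0a)."""
    return text.encode("ascii").hex()


def build_logical_signature(name: str, target: str, expression: str,
                            subsigs: List[str]) -> str:
    """Join the fields with ';' in the order the structure above lists them."""
    return ";".join([name, target, expression] + [to_clamav_hex(s) for s in subsigs])


# Hypothetical signature name; the subsignatures are given as plain ASCII.
print(to_clamav_hex("<pdf xmlns="))
print(build_logical_signature("Example_Sig", "Target:0", "(0&1)",
                              ["<pdf xmlns=", "</pdf>"]))
```

Because the string is encoded directly rather than piped through a shell, there is no trailing newline to strip.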

Looking back to Adobe's information they also mention that the PDF packet has the following format:

<pdf xmlns="http://ns.adobe.com/xdp/pdf/">
     <document>
          <chunk>
               ...base64 encoded PDF content...
          </chunk>
     </document>
</pdf>
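Given that packet format, the post-processing step (pulling the base64 encoded PDF back out) could look roughly like the sketch below. It assumes the XDP is well-formed XML and that the PDF lives in a single `<chunk>` element; the function name and the tiny sample are made up for illustration, and real-world samples may well break a strict parser. For untrusted files a hardened XML parser is worth considering.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Optional

PDF_NS = "http://ns.adobe.com/xdp/pdf/"  # namespace from the packet format above


def extract_pdf_from_xdp(xdp_bytes: bytes) -> Optional[bytes]:
    """Return the decoded contents of the first non-empty <chunk>, or None."""
    root = ET.fromstring(xdp_bytes)
    # iter() also visits the root, so this works whether <pdf> is the root
    # element or nested inside an outer wrapper element.
    for chunk in root.iter("{%s}chunk" % PDF_NS):
        if chunk.text and chunk.text.strip():
            # b64decode ignores whitespace and newlines by default.
            return base64.b64decode(chunk.text)
    return None


# Made-up minimal sample, not a real XDP file.
sample = (
    b'<pdf xmlns="http://ns.adobe.com/xdp/pdf/"><document><chunk>'
    + base64.b64encode(b"%PDF-1.4\n%%EOF\n")
    + b"</chunk></document></pdf>"
)
pdf = extract_pdf_from_xdp(sample)
print(pdf[:5])  # b'%PDF-'
```

The returned bytes can then be written to disk and handed to whatever PDF tooling you already use.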

ClamAV Signature

The beauty is that you can create your own custom Logical Database (.ldb) and pop it into your default ClamAV directory (e.g. /var/lib/clamav) with the other databases and it'll automatically be included in your scans. Detecting this alone doesn't mean the file is malicious, but at least it's a way to flag the file for further analysis/post-processing. So based on everything I now know, I can create the following ClamAV signature:

XDP_embedded_PDF;Target:0;(0&1&2);3c70646620786d6c6e733d;3c6368756e6b3e4a564245526930;3c2f7064663e

Explained:

XDP_embedded_PDF - Signature name

Target:0 - Any file

(0&1&2) - match all of the following

0
ASCII : <pdf xmlns=
HEX : 3c70646620786d6c6e733d

1
ASCII : <chunk>JVBERi0
HEX : 3c6368756e6b3e4a564245526930

* JVBERi0 is the Base64 encoding of the ASCII text "%PDF-", which is the start of the PDF header. It was converted into HEX and added to the end of the '<chunk>' tag to help catch the PDF

2
ASCII : </pdf>
HEX : 3c2f7064663e
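As a quick sanity check on the JVBERi0 value, here is a small Python sketch (the helper name is made up for this example). It also shows a caveat: the prefix only shows up when "%PDF-" sits at the very start of the encoded data, since base64 works on 3-byte groups.

```python
import base64


def pdf_header_b64(version: str) -> str:
    """Base64-encode a PDF header such as '%PDF-1.4'."""
    return base64.b64encode(("%PDF-" + version).encode("ascii")).decode("ascii")


for version in ("1.0", "1.4", "1.7"):
    print(version, pdf_header_b64(version))

# Shifting the header by one byte changes the base64 alignment,
# so the familiar prefix no longer appears in the encoded text.
shifted = base64.b64encode(b"\n%PDF-1.4").decode("ascii")
print("JVBERi0" in shifted)  # False
```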

update #1 on 08/20/2012 :
The first ClamAV signature above works, but I started to think that '<chunk>' and 'JVBERi0' may not be right next to each other in all cases... not sure if they have to be or not by specification, but this is Adobe so I'd rather separate them and match on both anyway.

XDP_embedded_PDF_v2;Target:0;(0&1&2&3);3c70646620786d6c6e733d;3c6368756e6b3e;4a564245526930;3c2f7064663e
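To see why splitting the subsignature matters, here is a rough Python stand-in for the matching logic. It is not how ClamAV's engine works internally: it only handles the plain AND form used in these two signatures, and the function name and sample bytes are made up for this example.

```python
V1 = ("XDP_embedded_PDF;Target:0;(0&1&2);3c70646620786d6c6e733d;"
      "3c6368756e6b3e4a564245526930;3c2f7064663e")
V2 = ("XDP_embedded_PDF_v2;Target:0;(0&1&2&3);3c70646620786d6c6e733d;"
      "3c6368756e6b3e;4a564245526930;3c2f7064663e")


def match_and_signature(signature_line: str, data: bytes) -> bool:
    """Return True if every referenced hex subsignature occurs in data."""
    _name, _target, expression, *subsigs = signature_line.strip().split(";")
    if set(expression) - set("()&0123456789"):
        raise ValueError("only pure AND expressions are handled in this sketch")
    indexes = [int(i) for i in expression.strip("()").split("&")]
    return all(bytes.fromhex(subsigs[i]) in data for i in indexes)


# Made-up sample where a newline separates <chunk> from the base64 data.
sample = (b'<pdf xmlns="http://ns.adobe.com/xdp/pdf/">\n<document>\n<chunk>\n'
          b'JVBERi0xLjQK\n</chunk>\n</document>\n</pdf>')

print(match_and_signature(V1, sample))  # False
print(match_and_signature(V2, sample))  # True
```

With the line break in place the first signature misses while the second still hits, which is exactly the case the update is meant to cover.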

update #2 on 08/20/2012:

YARA signature:

rule XDP_embedded_PDF
{
    meta:
        author = "Glenn Edwards (@hiddenillusion)"
        version = "0.1"
        ref = "http://blog.9bplus.com/av-bypass-for-malicious-pdfs-using-xdp"

    strings:
        $s1 = "<pdf xmlns="
        $s2 = "<chunk>"
        $s3 = "</pdf>"
        $header0 = "%PDF"
        $header1 = "JVBERi0"

    condition:
        all of ($s*) and 1 of ($header*)
}
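For anyone without YARA installed, the condition in the rule above can be approximated in plain Python. This is only a sketch of the same logic (case-sensitive substring checks), not a replacement for YARA, and the function name is made up for this example.

```python
def looks_like_xdp_with_pdf(data: bytes) -> bool:
    """Mirror the rule: all of ($s*) and 1 of ($header*)."""
    structure = (b"<pdf xmlns=", b"<chunk>", b"</pdf>")  # $s1, $s2, $s3
    headers = (b"%PDF", b"JVBERi0")                      # $header0, $header1
    return all(s in data for s in structure) and any(h in data for h in headers)


# Made-up sample bytes for illustration.
hit = b'<pdf xmlns="http://ns.adobe.com/xdp/pdf/"><chunk>\nJVBERi0xLjQK</chunk></pdf>'
miss = b'<pdf xmlns="http://ns.adobe.com/xdp/pdf/"><chunk>AAAA</chunk></pdf>'

print(looks_like_xdp_with_pdf(hit))   # True
print(looks_like_xdp_with_pdf(miss))  # False
```

To check a file on disk, read it in binary mode and pass the bytes in.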

Questions to answer

Actors are always trying to find new ways to exploit/take advantage of users/applications, so it's good that this was brought to attention, as we can now be aware of it and look for it. While the above signatures will trigger on an XDP file with a PDF (from what I had to test on), there are still questions to be answered, and without more samples or information they stand unanswered at this point:

Could these values within the XDP file be encoded and still be recognized, as with other parts of the PDF spec?
Can the PDF be encoded with something other than base64 and still work?
Will other PDF readers, like Foxit, treat these files the same way Adobe Reader does?

Comments and questions are always welcome ... never know if someone else has a better way or something I said doesn't work.

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates head here.

As you are all aware, there are a ton of different tools out there and the list just keeps growing. A coworker of mine is working on some malware automation, and we often needed to determine which tools to run against a given file. The answer varies based on what type of file it can be identified and classified as, of course, but still... what do you use? I know I can't remember everything I come across every day, and sometimes I have a brain fart and forget what tool can be used in what situation or on what type of file, so I started a spreadsheet to help with this kind of debacle. The list started to take on a life of its own and, as you can imagine, the scope can get very large depending on what your goals are and how you want to store this information.

This list could very well be made into a DB and represented/maintained better, but for quick answers on the road this was the best option for me. I'm posting this up on Google Docs, where I will periodically update the list (feel free to give me recommendations to add -> @hiddenillusion). Hopefully it will be of use to others, as I know it's come in handy for me and some others already. If you need to modify it for your own means then please do - just download a local copy and have at it. Don't send me feedback that there's a row incomplete, I'm aware.

As I started to say, my first intention was to have a list of tools broken down by what files they could be used to analyze. Because a tool can be used to analyze more than one file extension (e.g. 7zip for .zip/.rar/.7z/.jar etc.), I have certain tools listed multiple times. When I have a sortable list, I don't want to have to resort to searching the spreadsheet to find what I'm looking for; I would rather just filter by the file extension and see what results I have stored.

Update : changed this so multiple file extensions would be listed next to the same tool. It was convenient to filter just by the extension but I got tired of having so many duplicate lines for a single tool. You can just as easily click on that column and filter based on a cell containing what you're looking to analyze.

As I continued to populate this list I thought, why just list out tools for malware analysis? There are plenty of DFIR & general use tools/sites that this would be applicable to in my everyday environment, so I added a few other columns. You may notice that some of it is incomplete (i.e. not all fields are filled in for every row) or that I may have forgotten some common tools, but hey, we all get busy and it's a living file - meaning it will never be complete because new things are always being released.

Some other notes to take into account: for me personally, I would like to know if it's a CLI/GUI (or both) type of tool, what the tool is described as, where I can get it, any useful switches that I should know about, whether it requires an install to use and, finally, whether it's part of anything else I may already have so I don't have to go and get it. With that being said, the structure of the spreadsheet is as follows:

-----------------------
Column - Purpose
-----------------------

File Ext - what file extensions can be processed by this tool
Tool - the name of the tool
Category - what's the best fitting main category to apply to this tool (you'll notice that there's overlap)
Sub-Category - helps narrow things down for particular analysis situations (e.g. I may be looking for a tool to use for ADS or VSCs or Rootkits). This is especially helpful for those tools that can't just be classified by the file extensions they handle.
Useful Switches - helps save time reading man pages or looking it up online
Type - useful to know if it's a CLI/GUI/Both for scripting purposes & forensic footprints on IR engagements
Tool Description - quick summary of what the tool is or what it can do. In full disclosure - most of the time I did not personally write these; I usually just copied and pasted them from the author's description or wherever I found out about the tool. Why re-invent the wheel? Credit goes to the other guys when appropriate.
Linkage - helpful to know where to get the tool at...
Require Install? - this is very important to know in certain situations so if I know that there's a full install required it will have some impact on my decision if I'm on an IR engagement and not doing some postmortem analysis in a lab.
Included In? - I started to put in some of the common frameworks/distros such as TSK, REMnux, SIFT etc.
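If you download a local copy as CSV, filtering by extension could be scripted along these lines. This is a minimal sketch: it assumes the column headers match the names above, that multiple extensions in one cell are separated by "/", and the sample rows (including the `ExamplePdfTool` entry) are made up for illustration rather than taken from the actual spreadsheet.

```python
import csv
import io

# Made-up rows using a few of the column names listed above.
SAMPLE_CSV = """File Ext,Tool,Type
.zip/.rar/.7z/.jar,7zip,Both
.pdf,ExamplePdfTool,CLI
"""


def tools_for_extension(csv_text: str, extension: str) -> list:
    """Return the tools whose 'File Ext' cell lists the given extension."""
    wanted = extension.lower()
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        extensions = [e.strip().lower() for e in row["File Ext"].split("/")]
        if wanted in extensions:
            matches.append(row["Tool"])
    return matches


print(tools_for_extension(SAMPLE_CSV, ".jar"))  # ['7zip']
```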

If the link to the spreadsheet doesn't work, head over to the menu bar > Spotlight > Files > Google Docs > Tools classify list ... sometimes the link gets screwy when it's edited :(


Tuesday, June 19, 2012

XDP files and ClamAV


Monday, February 13, 2012

What to use for analysis on a per file extension -or- category basis
