Thursday, April 19, 2012

YARA + Volatility ... the beginning

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates, head here.

YARA: the sleeping giant.  There's been mention of it over the last few years, but as far as adoption goes, I think it's still lacking in the tool set of many analysts. I personally like to leverage YARA on its own, within pescanner and most definitely within Volatility's malfind.  I've recently encountered two obstacles: (1) converting ClamAV signatures to YARA rules and (2) processing multiple YARA rule files.  If you take a look at page 26 of YARA's v1.6 User's Manual you'll see it outlines an option to include multiple rule files from within a single file (thanks Par).  That matters because if you use the standard syntax for calling YARA from the CLI, "yara /path/to/rules.yara <file>", you can't specify multiple rule files (without some foo of course).  Another prime example is MHL's pescanner, where you define the location of your rules file at the bottom, but again, it's a single rules file:

The above image shows the configuration within pescanner where you define the path to your YARA rules.  This particular example is taken from REMnux and is already filled out; generally it's left blank for your own configuration.

The 'include' feature is one way around this restriction: put an include statement with the path to each of your other rule files at the top of the main rules file you're invoking, and YARA will automatically process those additional rule files as well.  Here's an example of what I mean:

Simple and straightforward.  Just pop that syntax into the top of your main rule file and you're good to go.
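
For reference, an include line takes a quoted path. Here is a minimal sketch of a main rule file written out from the shell. The function name, every path and the rule itself are hypothetical placeholders, not anything from pescanner or REMnux:

```shell
#!/usr/bin/env bash
# Write a main YARA rule file that pulls in two other rule files.
# Assumption: all paths and the example rule are made-up placeholders.
write_main_rules() {
    cat > "$1" <<'EOF'
include "/path/to/rules/packers.yara"
include "/path/to/rules/clamav.yara"

rule example_marker
{
    strings:
        $a = "example string"
    condition:
        $a
}
EOF
}

# Usage: write_main_rules /path/to/rules/main.yara
```

Point YARA (or pescanner) at the main file and the included files should get compiled along with it.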

So.. cool, right?  Sort of... it's maybe useful if you have certain rule files you want to use for certain things, like pescanner, but I have a lot of files :/ . If you don't have many rule files then sure... but what if you have a bunch of different ones and foresee yourself continuing to split them up or create new ones?  Having to constantly update the main rule file with an 'include "/path/to/new/rules.yara"' line every time just sounds like too much upkeep.  You don't see yourself having enough rule files for it to be a concern, you say? ... Well, what if, for example, you convert the ClamAV signatures to YARA rules?

The Malware Analyst's Cookbook provides a script for this.  At the time of writing there is an open issue with this script, but there are a couple of modified versions which work a bit better; they still produce some errors, but not nearly as many.  There are a few tutorials out there on how to convert ClamAV signatures to YARA rules and it looks pretty straightforward, but I found that some things have either changed or people just left out details.  If you have a fresh install of ClamAV you need to make sure you unpack its signature file before you can use the conversion script on it.  This can be done using ClamAV's sigtool:
$ sigtool -u /var/lib/clamav/main.cvd
which, when complete, will leave you with the following:

Once you have the .ndb file you can proceed to converting as follows:
$ python -f main.ndb -o clamav.yara
Based on what I've encountered, depending on which version of the ClamAV signature DB and which version of the script you have, you may or may not get some signatures which YARA won't process.  I happened to get the problem child this time around. If you get errors relating to invalid jumps etc., you can just remove those rules as needed, since the errors are nice enough to tell you which lines YARA doesn't like.

The resulting file was ~18 MB of newly generated YARA rules based off the ClamAV signatures.. phew.. that's a lot.  I tried multiple ways to get YARA to use this rule file but failed every time.  My assumption was that it's just too big to process in a timely manner, unlike all of the other (smaller) rule files.  But I had a thought... so I started to split this big ol' file into smaller chunks to see what size would be ideal.  Finally, at ~512K, it seemed to be pretty fast and effective.  To split the file in an easy fashion you can use some form of the 'split' command, e.g.:
$ split -d -b 512k clamav.yara
* If you split based on size like I did here, you need to realize that it's going to cut the top/bottom signatures of each chunk into pieces, because you're only taking size into consideration and not splitting based on the signatures' structure. This can be easily fixed by going through each chunk and re-assembling just those two rules, but if you don't do this, YARA is going to scream about the broken rules.
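
One way to avoid re-assembling cut rules by hand is to split only where a new rule starts. The sketch below is a rough take on that idea, not what I actually ran. It assumes every rule in the converted file begins on its own line with the word "rule" at the start of the line; the function name and the chunk naming are made up for illustration:

```shell
#!/usr/bin/env bash
# Split a large YARA rule file into chunks of roughly max_bytes each,
# breaking only at lines that start a new rule.
# Assumption: each rule starts with "rule" at column 0 (no modifiers).
split_rules() {
    local input=$1 max_bytes=$2 prefix=$3
    awk -v max="$max_bytes" -v prefix="$prefix" '
        BEGIN { n = 0; size = 0 }
        # A new rule begins: roll over to the next chunk if this one is full.
        /^rule[ \t]/ && size >= max { close(out); n++; size = 0 }
        {
            out = sprintf("%s%02d.yara", prefix, n)
            print > out
            size += length($0) + 1   # +1 for the newline
        }
    ' "$input"
}

# Usage: split_rules clamav.yara 524288 clamav_
```

Each chunk ends at the first rule boundary after it reaches the size limit, so chunks come out slightly over 512K rather than exactly on it.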

If you did the math, now you can see where I'm going.  This little workaround produced 33 YARA rule files and no, I don't want to add them all statically in case something changes.  When I'm doing some Volatility automation I usually define the path to a single YARA rules file at the beginning, but because of what we've just found out, a simple workaround to use instead is:
YARA_Rules=(`find /path/to/rules/ -type f -iname '*.yara' -exec ls {} \;`);
What this essentially does (in Bash) is point to the location where you keep all of your YARA rules and list every rule file found there, so that you don't have to list them one by one. The results land in an array, which you can then loop over:
for rule in "${YARA_Rules[@]}"; do 
and then pass them to the normal volatility syntax from within your volatility automation script, i.e:
YARA_Rules=(`find /path/to/rules/ -type f -iname '*.yara' -exec ls {} \;`);

for rule in "${YARA_Rules[@]}"; do -f <mem.raw> --profile=<profile> malfind -Y $rule -D /path/to/dump/directory >> log
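
Put together as a complete script, the same idea might look like the sketch below. The function names are made up, and it assumes Volatility is started with "vol.py" (override VOL_CMD if your install differs). The malfind options are the ones shown above. It also reads the file list null-delimited so paths with spaces survive:

```shell
#!/usr/bin/env bash
# Assumption: Volatility is invoked as "vol.py"; set VOL_CMD to change that.
VOL_CMD=${VOL_CMD:-vol.py}

# Fill the YARA_Rules array with every *.yara file under a directory.
collect_rules() {
    local rule
    YARA_Rules=()
    while IFS= read -r -d '' rule; do
        YARA_Rules+=("$rule")
    done < <(find "$1" -type f -iname '*.yara' -print0 | sort -z)
}

# Run malfind once per rule file, appending all output to one log.
scan_with_rules() {
    local mem=$1 profile=$2 dump_dir=$3 log=$4 rule
    for rule in "${YARA_Rules[@]}"; do
        $VOL_CMD -f "$mem" --profile="$profile" malfind -Y "$rule" -D "$dump_dir" >> "$log"
    done
}

# Usage:
#   collect_rules /path/to/rules/
#   scan_with_rules <mem.raw> <profile> /path/to/dump/directory log
```

Keep in mind this still runs malfind once per rule file, so the total scan time grows with the number of files.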
Hopefully my troubles and workarounds will help someone else out there.. as always, ping me for feedback, tips etc.


  1. Ah yes, 18MB of converted clamav signatures is a lot. That's why in the book we said "it is not useful to convert *all* ClamAV signatures" ;-)

    If your goal is to scan memory with all clamav signatures, and you already have clamav installed, which you must in order to use sigtool, I'd suggest either:

    1) use vaddump and moddump to extract data to disk, then run clamscan on the directory
    2) write a volatility plugin that uses pyclamd API or invokes clamscan

    The problem with your method above is that you're calling malfind once for each yara rules file, and you have 33, which results in the entire scan taking 33 times longer than it normally would.

    Just to see how much effort was involved, I wrote a few sample plugins which are posted here: If you want to combine scanning to use all clamav rules and your custom yara rules which are spread across multiple rules files, do the rules file enumeration inside the plugin. That way, the data you're scanning is only carved from the memory dump once, and it will all be a lot faster.

    1. Thanks for the great feedback, Mike! With all the page views that this post has gotten, you're the only one to provide any input. It's invaluable to get others' feedback and opinions, as someone else may know a better way to do what I was seeking to accomplish.

      I realize now that converting all of the ClamAV sigs to YARA rules is a hefty operation, but I wanted to write up how to at least do it since the other tuts I looked at weren't complete. It could also prove useful if I were on a system without ClamAV installed.

      You're completely right that I'm causing more work by calling each of the YARA rules to malfind, however, I was working within the constraints of the environment and wanted something that could be used on a standard volatility build - not having to rely on custom code/plugins.

      Your method is a lot more efficient though - wish writing the plugins came as easy to me as they do for you. I see you added the examples in that pastebin (volclamapi & volclamcli) in the 2.1 alpha branch at one point but when I re-checked they were gone - plans on incorporating them?