Thursday, June 21, 2012

Getting what you want out of a PDF with REMnux

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates head here.


I was talking recently with someone, Melissa, who brought up the fact that she was having a problem extracting something from a PDF.  It was cheating a little bit since we knew there was definitely something there to extract and look for because of another analysis previously posted.  When I read a post about someone's analysis I always like when they show a few more details about how they got to the end result instead of just showing the end result - and this was a case of the latter.  As a result of this little exercise I thought I would write a quick post on how to do the same type of thing with the CVE-2010-0188 sample shown here.

I know there's a wealth of write-ups on analyzing PDFs but only a handful are done solely in REMnux and they don't always show multiple ways to get the job done.  I have no problem analyzing on a Windows system with something like PDF Stream Dumper (love the new JS UI), but the fact that REMnux is so feature- and tool-packed makes it possible to stick solely within its environment to tackle your analysis if need be.

One of the first things I run on any file I'm analyzing is 'hachoir-subfile'.  There are other tools within this suite which are also useful, but this one isn't file-type specific so it's a great tool to run during your analysis to see if you can get any hits... unfortunately, I didn't get any in this instance.
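For reference, the invocation is as simple as pointing it at the file in question (the file name here is just a placeholder), and if memory serves you can also tack on an output directory to have it carve out anything it finds:

$ hachoir-subfile file.pdf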

Method 1

Most of you are probably familiar with pdfxray and while the full power of it isn't within REMnux, there's still a slimmed-down version, pdfxray_lite, which can provide an easy-to-view overview of the PDF:

$ pdfxray_lite -f file.pdf -r rpt_


No, that's not a typo in the report name - I added the "_" so that it would be separated from the default text appended to the report name, which is its MD5 hash.  If we take a look at the HTML report in Firefox, Object 122 stands out as being sketchy.  It looks to contain an '/EmbeddedFile' and the decoded stream looks like it's Base64 encoded... the repeated characters seen also resemble a NOP sled:






Another one of my favorites is 'pdfextract' from the Origami Framework as it can also extract various data such as streams, scripts, images, fonts, metadata, attachments etc.  It's nice sometimes to have something just go and do the heavy lifting for you, and even if you don't get what you wanted extracted, you still might get some other useful information:

$ pdfextract file.pdf
 
The above command results in a directory named '<file>.dump' with sub-directories based on what it tried to extract:



Now.. we're after a TIFF file in this case, but even this tool doesn't seem to have extracted it for us... something unusual must be going on since the above two tools are great for this type of task 9 times out of 10.  In this particular instance, if we list the contents of the dump directory we can see 'script_<numbers>.js' in the root.  Typically, this would be included in the '/scripts' sub-directory, so let's take a look at what it holds:


Looks like there was something in the PDF referencing an image field linked to 'exploit.tif'.  People get lazy with their naming conventions or sometimes even just copy stuff that's obvious (check @nullandnull's slides as he talks more about this trend).  Since we don't have any extracted images we can check out the contents of the other files extracted.  Pdfxray_lite gave us a starting point, so let's dig deeper into Object 122 and check out its extracted stream from pdfextract:



 

Hm... the content type is 'image/tif' and the HREF link looks empty, followed by a blob of Base64 encoded data.  There are online resources to decode Base64, or maybe you've written something yourself, but in a pinch it's nice to know REMnux has this built in by default with the 'base64' command.  If you just try:

$ base64 -d stream_122.dmp > decoded_file

you'll get an error stating "base64: invalid input".  You need to edit that file to only contain the Base64 data.  I popped it into vi and edited it so the file started like so:



 and ended like this:


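If you'd rather not hand-edit it, a quick-and-dirty one-liner along these lines should also isolate the Base64 runs before decoding - just a sketch that has worked for me on similar dumps, so adjust the character class/length to taste:

$ grep -o '[A-Za-z0-9+/=]\{40,\}' stream_122.dmp | base64 -d > decoded_file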
Now that we've gotten the other junk out of the file we can re-run the previous command:


$ base64 -d stream_122.dmp > decoded_file
 

and if we do a 'file' on the 'decoded_file' we see we now have a TIFF image:

$ file decoded_file




To see if it matches what we saw in the other analysis we can take a look at it through 'xxd' :

$ xxd decoded_file | less 


The top of the file matches and shows some of its commands, and the bottom shows the NOP sled in the middle down to those *nix commands:


Method 2


Lenny had a good write-up on using peepdf to analyze PDFs, and its latest release added a couple of other handy features.  Peepdf gives you the ability to quickly interact with the PDF and pull out information or perform the tasks you're probably seeking to accomplish, all within itself.  It's stated that you can script it by supplying a file with the commands you want to run... and while that might be good for some things like general information, I found it difficult to do for what I was trying to accomplish here.  Mainly, on a massive scale I would have to know exactly what I wanted to do on every file, and that's not always the case, as with this example.  To enter its interactive console type:


$ peepdf -i file.pdf

This will drop you into peepdf's interactive mode and display info about the PDF:


The latest version of peepdf also states there's a new way to redirect the console output, but since I was working on an older version in REMnux I just changed the output log.  This essentially "tees" the output from whatever I do within the peepdf console to STDOUT and to the log file I set it to:
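(In the console that's just the same 'set output' command used again later, pointed at 122.txt in this case:)

> set output file 122.txt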


You may not need to do the above step in all of your situations, but I did it for a certain reason which I'll get to in a minute... Since we already know from the previous tools that object 122 needs some attention, we can issue 'object 122' from within peepdf, which will display the object's contents after being decoded/decrypted:


The top part of the screenshot is the command and the second half is another shell showing the logged output of that command, which was sent to what I set my output log to (122.txt) previously.  We already saw that we could use the built-in 'base64' command in REMnux to decode our stream, but I wanted to highlight that you can do it within peepdf as well with one of its many commands, 'decode'.  This command enables you to decode variables, offsets or *files*.  Since we logged the content of object 122 to a file, we can use this filter from within peepdf's console.  I wasn't able to do it all within the console (someone else may shed some light on what I missed?) but I believe it's the same situation where you need to remove the junk other than what you want to Base64 decode.  As such, if I just opened another shell and vi'ed the output log (122.txt) so it only contained the Base64 encoded data, like we did earlier, then I could issue the following from within peepdf:

> set output file decoded.txt
> decode file 122.txt b64

The above commands change the output log file of peepdf to "decoded.txt" and then tell peepdf to decode that file using the base64/b64 filter:


I can once again verify my file in another shell with :

$ file decoded.txt

which as you can see in the bottom half of the above screenshot shows it's a TIFF image.

I've only outlined a few of the many tools within REMnux and touched on some of their individual features, but if you haven't had the time to try it, or never knew of REMnux before, I urge you to start utilizing it.  Peepdf alone has a ton of other really great features for XORing, decoding, shellcode analysis and JS analysis, and there are other general tools like pdfid & pdf-parser - it's important to know what tools are available to you and what you can expect from them.

Tuesday, June 19, 2012

XDP files and ClamAV

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates head here.


updated 08/20/2012 - added two new signatures

There were some recent discussions going on regarding the use, or possible use, of an XML Data Package (XDP) file to slip a PDF file past security products or even the end user.  If you aren't familiar with XDP files, don't feel bad... neither was I.  According to the information Adobe provides, this is essentially a wrapper for PDF files so they can be treated as XML files.  If you want to know more about this file type then take a look at the link above as I'm not going to go heavily into detail - and note that the documentation is a bit on the light side as it is.  There are other things that can be included in the XDP file, but for this post we're looking at the ability to have a PDF within it.


Adobe states that:
"The PDF packet encloses the remainder of the PDF document that resulted from extracting any subassemblies into the XDP.  XML is a text format, and is not designed to host binary content. PDF files are binary and therefore must be encoded into a text format before they can be enclosed within an XML format such as XDP. The most common method for encoding binary resources into a text format, and the method used by the PDF packet, is base64 encoding [RFC2045]."

Based on my limited testing, when you open an XDP file, Adobe Reader recognizes it and is the default handler.  When the file is opened, Adobe Reader decodes the base64 stream (the PDF within it), saves it to the %temp% directory and then opens it.

Brandon's post included a Snort signature for this type of file, but I wanted some identification/classification for more of a host-based analysis.  Since I couldn't get hold of a big data set I grabbed a few samples (Google dork = ext:xdp) and thought I'd first try TrID - but that generally just classified them as XML files (with a few exceptions), and the same thing happened with 'file'.  I can't blame them, I mean they are XML files, but I wanted to flag them as XDP files containing PDFs if that was the case - that way I could do post-processing, extract the base64 encoded PDF from within the XDP file and then process it as a standard PDF file in an automated fashion.

I then looked to TrIDScan but unfortunately that didn't work as hoped.  I tried creating my own XML signature for it as well but kept receiving seg. faults.. so... no bueno.  My next thought was to put it into a YARA rule, but I figured I'd try something else that was on my mind first.  I've been told in the past to mess around with ClamAV's sectional MD5 hashing, but that's generally done by extracting a PE file's sections and then hashing those.  Since this is XML, that wasn't going to work.  I remembered some slides I'd looked at a while back regarding writing ClamAV signatures, and when I revisited them the lightbulb about the ability to create Logical Signatures came back on.

ClamAV's Logical Signatures


Logical Signatures in ClamAV are very similar in thought/flow to YARA signatures in that they allow you to create detection based on.. well.. logic.  The following is the structure; the 'Subsig*' entries are HEX values... so you can either use an online/local resource to convert your ASCII to HEX or you can leverage ClamAV's sigtool (remember to delete the trailing 0a though):
 sigtool --hex-dump
Logical Signature Structure:
SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;Subsig1;Subsig2;...
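For example, feeding a string to sigtool on stdin spits out the hex for a subsig - using printf instead of echo avoids the trailing newline (0a) mentioned above, and the output here matches subsig 2 used later:

$ printf '%s' '</pdf>' | sigtool --hex-dump
3c2f7064663e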
 

Looking back to Adobe's information they also mention that the PDF packet has the following format:

<pdf xmlns="http://ns.adobe.com/xdp/pdf/">
     <document>
          <chunk>
               ...base64 encoded PDF content...
          </chunk>
     </document>
</pdf>
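
Incidentally, once you know the packet format, carving the embedded PDF back out for the post-processing mentioned earlier only takes a couple of commands.  A rough sketch, assuming a single well-formed <chunk> (file names are just placeholders):

$ tr -d '\r\n' < sample.xdp | sed -n 's:.*<chunk>\(.*\)</chunk>.*:\1:p' | base64 -d > carved.pdf
$ file carved.pdf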

ClamAV Signature

The beauty is that you can create your own custom Logical Database (.ldb) and pop it into your default ClamAV directory (e.g. /var/lib/clamav) with the other databases and it'll automatically be included in your scan.  While just detecting this may not indicate it's malicious, at least it's a way to detect the presence of the file for further analysis/post-processing.  So based on everything I now know I can create the following ClamAV signature:

XDP_embedded_PDF;Target:0;(0&1&2);3c70646620786d6c6e733d;3c6368756e6b3e4a564245526930;3c2f7064663e

Explained: 

XDP_embedded_PDF - Signature name

Target:0 - Any file

(0&1&2) - match all of the following

0
ASCII :  <pdf xmlns=
HEX  : 3c70646620786d6c6e733d

1
ASCII :  <chunk>JVBERi0
HEX :  3c6368756e6b3e4a564245526930
* JVBERi0 is the Base64 encoding of the ASCII text "%PDF-", which signifies the PDF header.  It was converted into HEX and appended to the end of the 'chunk' tag to help catch the PDF

2
ASCII :  </pdf>
HEX :  3c2f7064663e
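To try it out without touching the system-wide databases, you can also point clamscan directly at a custom .ldb (the file names here are just placeholders):

$ echo 'XDP_embedded_PDF;Target:0;(0&1&2);3c70646620786d6c6e733d;3c6368756e6b3e4a564245526930;3c2f7064663e' > test.ldb
$ clamscan -d test.ldb sample.xdp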

update #1 on 08/20/2012 : 
The first ClamAV signature above works, but I started to think that '<chunk>' and 'JVBERi0' may not be next to each other in all cases... I'm not sure whether they have to be by specification, but this is Adobe, so I'd rather separate them and match on both anyway..

XDP_embedded_PDF_v2;Target:0;(0&1&2&3);3c70646620786d6c6e733d;3c6368756e6b3e;4a564245526930;3c2f7064663e 


update #2 on 08/20/2012:

YARA signature:


rule XDP_embedded_PDF
{
    meta:
        author = "Glenn Edwards (@hiddenillusion)"
        version = "0.1"
        ref = "http://blog.9bplus.com/av-bypass-for-malicious-pdfs-using-xdp"

    strings:
        $s1 = "<pdf xmlns="
        $s2 = "<chunk>"
        $s3 = "</pdf>"
        $header0 = "%PDF"
        $header1 = "JVBERi0"

    condition:
        all of ($s*) and 1 of ($header*)
}
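Checking the rule against a sample is then the usual one-liner (again, file names are placeholders):

$ yara XDP_embedded_PDF.yara sample.xdp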


Questions to answer

Actors are always trying to find new ways to exploit/take advantage of users and applications, so it's good that this was brought to attention as we can now be aware and look for it.  While the above signature will trigger on an XDP file with a PDF (from what I had to test on), there are still questions to be answered, and without more samples or information they stand unanswered at this point:

  1. Could these values within the XDP file be encoded and still be recognized, like other PDF specs allow?
  2. Can the PDF be encoded with something other than base64 and still work?
  3. Will other PDF readers like Foxit treat them/work the same as Adobe Reader?

Comments and questions are always welcome ... never know if someone else has a better way or something I said doesn't work.

Wednesday, May 9, 2012

What's in your logs?

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates head here.


I've had this on the back burner for a few months but I'm finally getting around to writing up a post about it.  I re-tested the scenarios listed below with log2timeline v0.63 in SIFT v2.12 and verified they're still applicable.

The scenario

I was investigating an image of a web server which was thought to have some data exfiltrated yada yada.. Log analysis was going to be a key part of this investigation and I had gigs to sift through.

Among a few other tools, I ran the logs through log2timeline and received my timeline - or so I thought.  There wasn't any indication in the STDOUT that entire files couldn't be parsed or that files were skipped, so one would assume everything was successful - right?  Not so much.  I don't like to stick to one tool and this wasn't going to be any different.  I loaded the logs with a few other tools (Notepad++, Highlighter, Splunk, Bash etc.) and verified my results.  As a result of being thorough, I noticed that there were a bunch of lines from the apache2 error logs which were present in the other tools' outputs but were noticeably missing from my timeline.  After some digging around and some additional testing with sample data sets I noticed there were a few problems.

The problems 

1) The apache2_error parser says a line has to match the regex of Apache's defined format or log2timeline won't process it:

"#       DOW    month    day    hour   min    sec    year           level       ip           message
  #  ^\[[^\s]+ (\w\w\w) (\d\d) (\d\d):(\d\d):(\d\d) (\d\d\d\d)\] \[([^\]]+)\] (\[([^\]]+)\])? (.*)$

  #print "parsing line\n";
  if ($line =~ /^\[[^\s]+ (\w\w\w) (\d\d) (\d\d):(\d\d):(\d\d) (\d\d\d\d)\] \[([^\]]+)\] (\[([^\]]+)\]) (.*)$/ )
  {
    $li{'month'} = lc($1);
    $li{'day'} = $2;
    $li{'hour'} = $3;
    $li{'min'} = $4;
    $li{'sec'} = $5;
    $li{'year'} = $6;
    $li{'severity'} = $7;
    $li{'client'} = $8;
    $li{'message'} = $10;

    if ($li{'client'} =~ /client ([0-9\.]+)/) 
    {
      $li{'c-ip'} = $1;
    }
  }
  elsif ($line =~ /^\[[^\s]+ (\w\w\w) (\d\d) (\d\d):(\d\d):(\d\d) (\d\d\d\d)\] \[([^\]]+)\] (.*)$/ ) 
  {
    $li{'month'} = lc($1);
    $li{'day'} = $2;
    $li{'hour'} = $3;
    $li{'min'} = $4;
    $li{'sec'} = $5;
    $li{'year'} = $6;
    $li{'severity'} = $7;
    $li{'message'} = $8;
  }
  else 
  {
    print STDERR "Error, not correct structure ($line)\n";
    return;
  }"

...so while some of the lines in the logs followed it, it was later noticed that others were far from the required standard, which resulted in a loss of data in the output.  Examples of what I mean are:

cat: /etc/passwrd: No such file or directory
find: `../etc/shadow': Permission denied

As shown above, some of the log lines were errors, permission denied statements etc. as a result of the external actor trying to issue commands via his shell (obviously not fitting the standard format).  Once I noticed not all of the lines were being parsed, I checked what else this parser required for a valid line, then did a quick sed on the fly to find any log entry that didn't match the required format and add a dummy beginning (date, time etc.) so it would at least parse everything.

This could have been done in many ways, with other regexes etc., but for this example I just wanted a quick look at exactly how many lines in the files didn't adhere to the standard format, so I did it this way:

hehe@SIFT : cat error.log | grep "^\[" > error.log.fixed
hehe@SIFT : cat error.log | grep -v "^\[" > problems.txt
hehe@SIFT : cat problems.txt | sed 's/^/[Fri Dec 25 02:24:08 2010] [error] [client log problem] /' >> error.log.fixed

It was a quick hack, but not an ultimate solution.

2) Even though some of the files didn't contain valid lines, others were completely fine - yet still, to my surprise, they weren't parsed.  It seemed that if certain lines existed within the logs they wouldn't get parsed... maybe even the possibility that at some point log2timeline would just skip the rest of the files and not try to parse them at all :/


The testing

Here's an example of the type of data I used for the re-testing:

error_fail.log
cat: /etc/passwrd: No such file or directory
find: `../etc/shadow': Permission denied
 
error_mix.log
[Fri Dec 25 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico
cat: /etc/passwrd: No such file or directory
find: `../etc/shadow': Permission denied
[Fri Dec 30 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico

error_ok.log
[Fri Dec 23 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico

error_ok2.log
[Fri Dec 24 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico


*I flip-flopped the number of lines contained in the logs on occasion as well as the dates & order within a single file to test multiple scenarios and to see if certain lines were getting parsed.



Processing multiple files:

So here are two files, both containing all valid lines:
 
So let's try saying "*.log" for the file to be parsed:

...but by doing that log2timeline will only take the first file and skip everything else as the above image shows.  I'll admit that fooled me for a bit, I thought it would work.

If you supply the (-r) option you can't supply "*.log" as it'll result in an empty file (yes, I deleted the test.csv prior):

However, if you supply the (-r) option with a directory (i.e. $PWD) it will try to parse everything & tell you which files it can't open.  It will also tell you if a log's line couldn't be processed; however, it doesn't tell you from which file (if there are multiple being processed):

and it also doesn't state that it stopped and didn't continue parsing - if you look above, error_mix.log had a date of 12/30/2010 after its invalid lines, which doesn't end up in our results... whoops:

So it looks like if there's an invalid line within a log being parsed that log2timeline will stop processing that file? :/ ... not much indication of that unless we already know what's in our data set being parsed.

Right about now some of you are saying... hey man, there's a verbose switch.  Correct, there is.  And while it's helpful to tackle some of the things I've mentioned, it still isn't the savior.  When I ran the following:

hehe@SIFT: log2timeline -z UTC -f apache2_error -v -r $PWD -w test_verbose.csv

I received this to STDOUT:



So the verbose switch told me it was processing the file, that it didn't like a line within the file and that it finished processing the file... but it still didn't process the entire error_mix.log file again:


* same held true for very verbose

Now it's possible that this has something to do with the number of lines that are read to determine whether there's an actual Apache2 error log base:

# defines the maximum amount of lines that we read until we determine that we do not have a Apache2 error file
    my $max = 15;
    my $i   = 0;
But if that were the case I thought I'd see an error like this:

Ok.. so the above STDOUT at least tells us the file couldn't be parsed because its first 15 lines weren't valid, which goes along with the previously shown snippet about the 15-line check.  So what happens if we add other files to be parsed in the same directory as that file - same notification?

Nope - it appears we don't get any notification that a file couldn't be processed _but_ with the (-v) switch on we do get this information.  At this point error_fail.log doesn't have 15 valid lines, so just for troubleshooting purposes I altered error_mix.log to contain the following:

(19x) [Fri Dec 25 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico
cat: /etc/passwrd: No such file or directory
find: `../etc/shadow': Permission denied
(15x) [Fri Dec 30 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico

This data set would suffice since there are at least 15 valid lines at the beginning of the file for it to be considered a valid file to parse, so let's try to parse a directory with the new error_mix.log file and two files with all valid entries (error_ok.log & error_ok2.log):


We see again that there was a file that contained an invalid line.  In the images below, it appears the error_ok.log (12/23/2010) & error_ok2.log (12/24/2010) files were parsed but error_mix.log (12/25/2010, <errors>, 12/30/2010) wasn't.  The above STDOUT shows that it didn't like one of the log's lines, but it doesn't state that it didn't parse the rest of the file at all :/



Even with the verbose switch on, there still wasn't any indication that it didn't continue parsing the file or that it skipped over any other parts of it besides the invalid line it pointed out.

Proposed Solution

I have some ideas of what can be done, but I opened an issue ticket so others in the community could chime in as well.  I talked with the plugin's author @williballenthin and provided my test samples & findings, and he agreed that there should be input from others on the solution.  Here's what I thought...
*the ticket has a typo, it was re-tested in SIFT v2.12 (2.13 wasn't out yet :) )

1) Either the first line in the error log has to fit the standard format, or one of the first x lines does (right now it's set to within the first 15); if not, spit out an error stating that particular file couldn't be parsed & continue on to the next log file if there are multiple, since the next one may have valid entries.

2) As long as at least one line is found to meet the standard format, then once a line is found that doesn't meet it (i.e. doesn't start with [DOW month ...]), copy the timestamp information from the valid line before it and add it to the beginning so the line meets the format and can be put into the timeline of events.  A rough sketch of that idea is shown below.
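Something along the lines of this quick awk sketch is what I have in mind for #2 - it's not anything the parser does today, and the '[error] [client unknown]' filler is just my own placeholder:

$ awk '
    /^\[/ { hdr = $1 " " $2 " " $3 " " $4 " " $5; print; next }  # valid line - remember its timestamp block
    hdr   { print hdr " [error] [client unknown] " $0; next }    # invalid line - borrow the last good timestamp
          { print }                                              # nothing valid seen yet - pass it through
  ' error_mix.log > error_mix.log.fixed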

 

Conclusion

So why did I write all this up and why do you care?  Log2timeline is purely awesome.  It's changed many aspects of DFIR, but there are always going to be improvements needed.  It's open source and for the community, so the feedback will only make it better.  Someone else may be dealing, or have to deal, with exactly what you've come across, so why not make it known?  It's crucial that you understand how the tools/techniques you're using work to the best of your ability.  If solely relying on clicking buttons is your method of expertise, you're gonna get caught at some point.  Even though Willi didn't have these types of examples to test the parser on when he originally created it, I wanted to get this information out there because I fear there are others who might not have realized what I did.  If I hadn't checked my timeline against other tools I would have missed key information for this analysis.  Do you double check your results?  Are you seeing the whole picture?

Monday, April 30, 2012

Let Me In

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates head here.


A few months ago I was doing some research, for an article I was publishing in Digital Forensics Magazine entitled "Let Me In", regarding various ways incident responders could unlock both a live and a dead system. If you're not a subscriber to that magazine, the article essentially listed some tools (Kon-Boot, Ophcrack, BackTrack, Inception etc.) and reasons for needing to perform such tasks (EFS, FDE, needing to use the proprietary software on the system to open data etc.). While it was supposed to be in an earlier issue, it got pushed back to Issue 11 - May of 2012.

There was a good amount of content I had to trim out of that article so I decided I would write up a post to further elaborate on one of the sections - unlocking a live system. When I say 'unlock' I am simply referring to bypassing the authentication at the Operating System (OS) level, and since Windows is still the most dominant platform on the market it will serve as the main OS discussed. So why not just follow traditional methods and image the disk to perform forensics offline? There may come a time when you are presented with a locked system and are unable to shut it down because the volatile data is imperative to your investigation, it has Full Disk Encryption (FDE) or maybe it is a critical server. Whatever the reason may be, I asked the question - "What would you do?"

Considerations


Most modern techniques for unlocking a live system rely on the IEEE 1394, or FireWire, interface. FireWire is a serial bus interface which allows for fast data transfer. The reason it is able to achieve this, and why we care about it for Incident Response, is because FireWire provides the ability to read/write directly to a system's memory through Direct Memory Access (DMA). By doing so, we are able to bypass the system's Central Processing Unit (CPU) and OS to circumvent any restrictions which would otherwise prohibit such ability. Before jumping into these techniques you should test and validate your trials to ensure you are aware of the benefits, artifacts created and possible limitations. Some of the considerations that came to my mind were:

  1. Will you have physical access to the system?
  2. Does the target system have Full Disk Encryption (FDE)?
  3. Is there a FireWire port on the target system? If not, can you add one via an expansion slot (PCIe, ExpressCard etc.)? Will that FireWire port suffice?
  4. Is the 1394 stack disabled on the target system?
  5. What OS and patch level does the target system have?
  6. How much Random Access Memory (RAM) does the target system have?
  7. Did the FireWire driver install successfully on the target system?
  8. Is this forensically sound and will it hold up as acceptable/repeatable if questioned in court? Let’s remember that if we choose to unlock the system we are actively writing back to the target system, which could mean we write outside of the memory we want or cause the system to blue screen.

Unlocking a live system with Inception


While the concept of using FireWire to bypass the Windows lock screen has been discussed and presented since 2004 - most notably Winlockpwn by Adam Boileau, which used raw1394 - there wasn't a whole lot of development or maintenance of such methods. During my research into this area I came across a tool called "Fire Through the Wire Autopwn", or FTWAutopwn, which provided a more stable and reliable means than previous tools such as Winlockpwn. This was because it incorporated a new open source library called libforensic1394, which uses the new Juju FireWire stack and allows you to present a Serial Bus Protocol 2 (SBP-2) unit directory with original FireWire bus information from your machine to the target system. As previously stated, my article got pushed back an issue, and as luck would have it the author of FTWAutopwn renamed the tool to "Inception" - the same project, just renamed and updated since my initial testing.

* If you're interested in this topic I suggest reading this paper by Freddie Witherden.

Inception is actively maintained, which means its author is constantly adding new features, bug fixes and more reliable unlocking techniques. I exchanged a few emails with the tool's author back when I was testing the original FTWAutopwn and provided some feedback, such as: when there are multiple signatures/offsets for a target and the correct combo unlocks the system, quit and don't continue to try other combos. After going back to the tool's site recently it appears new signatures and methods have been incorporated and a couple of the things I brought up have been addressed, so it's nice to see the active maintenance.

This tool works great for Windows XP SP0-3 and Windows 7 x86 SP0-1; however, it may be hit or miss if you are trying it on Windows x64 systems, based on my testing a few months ago - but again, you might have more luck these days. The main reason you might fail at unlocking is because the method relies on the signature it is patching being at a specific offset, and on 64-bit systems the offset address is less stable and more likely to change. If the signatures and offsets within the configuration file are not working for your scenario and you have some disassembly knowledge, you can load the specific msv1_0.dll version into a disassembler and determine the signature/offset combination that you need to add to Inception. Instead of re-posting how to do this, check out here and here.

In Windows, the Dynamic Link Library (DLL) msv1_0.dll is the Microsoft Authentication Package, which is responsible for validating a user's password. Within this DLL is a function called 'MsvpPasswordValidate' which performs the comparison between an entered password and the correct password. Inception patches this comparison to say that the correct password was entered regardless of what, or if anything, was entered at all. Since this is all done in memory, the patching is not persistent and restarting the system will restore its normal authentication (if all goes well, of course).

Once you have your system properly configured and DMA access to your target system, choose which target you want to unlock, and if you are successful you will see a screen similar to this (screenshot is from FTWAutopwn):


Dumping the memory of a live system


Besides being able to unlock a live system on the fly, the libforensic1394 library also provides a means for dumping the memory of a live system. If you take a look at the author's paper [PDF] he provides some additional insight into how to do this. The only additional requirement is a little knowledge of Python. While doing my research I came across another paper [PDF] where a researcher was testing Mac OS X Lion memory acquisition using FireWire. While he also utilized the libforensic1394 library, he additionally included a PoC Python script to dump the memory of a live system. This was another bit of information I passed along to @breaknenter, and it looks like the updated tool incorporated this feature as well (score).

Start-up script


Instead of remembering what commands need to be entered, what files need to be downloaded and what packages are required, I wrote a simple setup script for BackTrack to automate the process. Additionally, it was written to be used with a non-persistent system (Live CD/USB) as well as a system with a persistent configuration. In my opinion, creating a USB with persistent storage works best, but if you are going to run this type of script on a non-persistent system, Internet access is required unless the required files/packages are downloaded ahead of time and stored on some other removable media, which would then have to be configured in the script as well. Since the tool has changed and the new version has its own setup script I'm not sure it's worth changing my start-up script :( ... I don't believe Inception checks for all the required files (libforensic1394 etc.), and if you're using Inception on a distro like BackTrack I don't think it will set the environment accordingly, so if I see a need I'll make some modifications.

Wednesday, April 25, 2012

Deobfuscating JavaScript with Malzilla

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates head here.


I was asked a question a little while ago from a fellow forensicator about deobfuscating some JS that he came across.  The JS didn't take long to reverse, but I suspect there are others out there who would benefit from a quick post on another way to go about this task.  While there's jsunpack, js-beautify etc., I chose to run it through Malzilla for this example.

The structure of the JS was noticeably familiar and turns out to be related to an exploit pack, which is where a lot of the JS you might come across in the DFIR field comes from these days.  These types of kits make it point-and-click easy not only to distribute malware but also to obfuscate the code on their pages.

The first thing to do is copy out what's between the '<script>' tags and place it in the top box of the 'Decoder' tab within Malzilla - we don't need the other <html> tags etc., we only need the goods.  The next step is to get rid of what we don't necessarily need at this point (shown commented out with '//').  This will vary depending on what you're analyzing and may take a bit more knowledge to recognize, but just remember what your goals are - there will be junk thrown into the mix, and since all I care about at this point is seeing what gets produced (URL etc.), the top part didn't look relevant to getting my question answered:



At this point you have a few options: (1) replace the eval(), (2) run it through debugging to verify it's working, or (3) run the script.  Everything looks good enough to work so let's just go ahead and choose to run the script:



Note that even though the bottom text displays "Script can't be compiled" (seen above)... the eval results were still produced.  To see the results, click on 'Show eval() results', then double-click on each of the results (one in this instance) and they will be displayed in the lower pane - this time showing the produced iframe:


There's almost always more than one way to get the results you need, so hopefully this will help some of you next time.

Thursday, April 19, 2012

YARA + Volatility ... the beginning

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates head here.


YARA - the sleeping giant.  There's been mention of it over the last few years, but as far as adoption goes I think it's still lacking in the tool set of many analysts.  I personally like to leverage YARA on its own, within pescanner and most definitely within Volatility's malfind.  I've recently encountered two obstacles: (1) converting ClamAV signatures to YARA signatures and (2) how to process multiple YARA rule files.  If you take a look at page 26 of YARA's v1.6 User's Manual you'll see it outlines an option to include multiple rule files from within a single file (thanks Par).  In other words, if you use the standard syntax for calling YARA from the CLI, "yara /path/to/rules.yara <file>", you can't specify multiple rule files (without some foo of course).  Another prime example is within MHL's pescanner, where you define the location of your rules file at the bottom - but again, a single rules file:


The above image shows the configuration within pescanner where you define the path to your YARA rules.  This particular example is taken from REMnux and is already filled out, generally it's left blank for your own configuration.

The use of the 'include' feature is one way of circumventing such a restriction: by placing include statements with the paths to your other rule files at the top of the main rules file you're invoking, YARA will automatically process those additional rule files as well.  Here's an example of what I mean:
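The directives themselves are just lines like these at the top of the main rule file you pass to YARA (paths are made up for illustration):

$ cat >> /path/to/rules/master.yara << 'EOF'
include "/path/to/rules/capabilities.yara"
include "/path/to/rules/packers.yara"
EOF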

Simple and straightforward.  Just pop that syntax into the top of your main rule file and you're good to go.

So.. cool right?  Sort of... maybe useful if you have certain rule files you want to use for certain things, like pescanner, but I have a lot of files :/ . If you don't have many rule files then sure... but what if you have a bunch of different ones and foresee yourself continuing to split up or create new ones?  Having to constantly update the main rule file with an "include /path/to/new/rules.yara" every time just sounds like too much upkeep.  Say what..you don't see yourself having that many rule files for it to be a concern you say? ... Well what if, for example, you convert the ClamAV signatures to YARA rules?

The Malware Analyst's Cookbook provides such a means with clamav_to_yara.py.  At the time of writing there is an open issue with this script, but there are a couple of modified versions which work a bit better - they still produce some errors, but not nearly as many.  There are a few tutorials out there on how to convert ClamAV signatures to YARA rules and it looks pretty straightforward, but I found some things have either changed or people just left out details.  If you have a fresh install of ClamAV you need to make sure you unpack its signature file before you can use the conversion script on it.  This can be done using ClamAV's sigtool:
$ sigtool -u /var/lib/clamav/main.cvd
which, when complete, will present you with the following:


Once you have the .ndb file you can proceed to converting as follows:
$ python clamav_to_yara.py -f main.ndb -o clamav.yara
Based on what I've encountered, I believe that depending on which version of the ClamAV signature DB and which version of the clamav_to_yara.py script you have, you may or may not get some signatures which YARA won't process.  I happened to get the problem child this time around, and if you get errors relating to invalid jumps etc. you can just remove those rules as needed since the errors are nice enough to tell you which lines it doesn't like.

The resulting file was ~18 MB of newly generated YARA rules based off the ClamAV signatures... phew... that's a lot.  I tried multiple ways/attempts to get YARA to use this rule file but failed every time.  My assumption was that it's just too big to process in a timely manner like all of the other (smaller) rule files.  But I had a thought... so I started to split this big ol' file into smaller chunks and wanted to see what size would be ideal.  Finally, at ~512K it seemed to be pretty fast and effective.  To split the file in an easy fashion you can use some form of the 'split' command... i.e.:
$ split -d -b 512k clamav.yara
* if you split based on size like I did here you need to realize that it's going to cut the top/bottom signatures into pieces because you're only taking size into consideration and not splitting based on the signatures' structure.  This can easily be fixed by going through each chunk and re-assembling just those two rules, but if you don't, YARA is going to scream about the broken rules.
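If you'd rather not clean up by hand, a structure-aware split is easy enough too.  Here's a rough awk sketch that only starts a new chunk on a 'rule' boundary - the 2000-rules-per-file count and output names are arbitrary:

$ awk '/^rule / { if (++n > 2000) { part++; n = 1 } }
       { out = sprintf("clamav_%02d.yara", part); print > out }' clamav.yara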

If you did the math, now you can see where I'm going.  This little workaround produced (33) YARA rule files and no, I don't want to add them all statically in case something changes.  When I'm doing some Volatility automation I usually define the path to my YARA rules in the beginning, i.e. :
YARA_Rules="/path/to/capabilities.yara"
but because of what we've just found out, a simple workaround to use instead is:
YARA_Rules=(`find /path/to/rules/ -type f -iname '*.yara'`)
What this essentially does (in Bash) is point to the location where you keep all of your YARA rules and list them all so that you don't have to add them one-by-one... you can then loop over them in an array:
for rule in "${YARA_Rules[@]}"; do 
and then pass them to the normal volatility syntax from within your volatility automation script, i.e:
YARA_Rules=(`find /path/to/rules/ -type f -iname '*.yara'`)

for rule in "${YARA_Rules[@]}"; do
   vol.py -f <mem.raw> --profile=<profile> malfind -Y "$rule" -D /path/to/dump/directory >> log
done
Hopefully my troubles and workarounds will help someone else out there.. as always, ping me for feedback, tips etc.

Monday, March 26, 2012

Making Volatility work for you

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates head here.


Lately I've been spending some time customizing Volatility to meet some of the needs I was facing.  What were they?  I needed an automated way to leverage Volatility to perform an analysis, and while doing so I noticed there were some small changes to some of its files that I wanted to make so certain information was displayed differently.  The latter is what I'm going to quickly touch on in this post, as others may find it beneficial for their own needs - and to me personally, it just made sense to display the output as I'll show.  While there are a few branches, the following is focused on the current trunk (v2.0.0) at the time of writing.  I put in the line numbers, but in full disclosure, things are always changing, so look for the text instead of the line number and you're likely to get a better hit.

* I'm not a Volatility expert, I just wanted things displayed differently for my own needs.  If there's something I did wrong or could've done differently, by all means drop me a line *

The below set of modifications resulted from analyzing the output of some plugins that 'dump' files from the memory image.  I noticed that the way those dumped files were being named was stuck with some static text instead of displaying useful information I cared about.  After the static text, the naming convention consists of the PID and sometimes the base address, followed by a static file extension depending on the plugin.  What I didn't want to have to do was look at all of the dumped files and then have to look up the process name corresponding to the PID.  All of that information is already there, so why not include the "process name+PID+base(varies).extension" and so on?

With the information presented in that new format I no longer have to look in separate places to understand what I'm looking at, and it saves me a step - sometimes that could mean a lot of time if I have a lot of dumped files to correlate with another plugin's output.

The 'procdump' plugin dumps files with the following naming convention: 'executable.pid.exe'.  Note that this plugin has two lines to change unlike the other examples later on:

File: /path/to/volatility/plugins/procdump.py
Line: 58
From: outfd.write("Dumping {0}, pid: {1:6} output: {2}\n".format(task.ImageFileName, pid, "executable." + str(pid) + ".exe"))
To: outfd.write("Dumping {0}, pid: {1:6} output: {2}\n".format(task.ImageFileName, pid, task.ImageFileName + "." + str(pid) + ".exe"))


Line: 59
From: of = open(os.path.join(self._config.DUMP_DIR, "executable." + str(pid) + ".exe"), 'wb')
To: of = open(os.path.join(self._config.DUMP_DIR, task.ImageFileName + "." + str(pid) + ".exe"), 'wb')

After modification we got rid of the static 'executable' text and added the actual process name to the output file name... much better.

The 'dlldump' plugin dumps files with the following naming convention: 'module.pid.procOffset.DllBase'
File: /path/to/volatility/plugins/dlldump.py
Line: 94
From: dump_file = "module.{0}.{1:x}.{2:x}.dll".format(proc.UniqueProcessId, process_offset, mod.DllBase)
To: dump_file = "{0}.{1}.dll".format(mod.BaseDllName, proc.ImageFileName)

There are many ways to change the output around, but for this example I got rid of the static 'module' text and modified it so the file saves as 'what's being dumped.where it came from.dll'... this could include the PID, offset, base etc. - whatever fits your needs.

The 'moddump' plugin dumps files with the following naming convention: 'driver.modBase.sys'

File: /path/to/volatility/plugins/moddump.py
Line: 100  
From: dump_file = "driver.{0:x}.sys".format(mod_base)
To: dump_file = "{0}.{1:x}.sys".format(mod_name, mod_base)

Once again, you can see that after the modification the dumped SYS file now starts with the actual module name... yes, there may be an extra file extension in the beginning, but this is just an example - you're free to change it as needed.  For me, the biggest thing was pulling all the information into the dumped file name so I didn't have to look in multiple places.
The point here... open source is great.  You have the ability to give back to the community and customize it to meet your needs, so if you want something changed, as I did, don't settle - change it... just be cautious of the project's updates in case one of them conflicts with your modifications.