
Monday, October 1, 2012

dbmgr reloaded

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates, head here.


I recently had a discussion with another coworker regarding scenarios where you can try and determine if something malicious is or was on a system based on mutexes.  For those unfamiliar with what a mutex/mutant is, a definition:

"Stands for Mutual Exclusion Object, a programming object that may be created by malware to signify that it is currently running in the computer. This can be used as an infection 'marker' in order to prevent multiple instances of the malware from running in the infected computer, thus possibly arousing suspicion."

Mutexes are referred to as mutants when they're in the Windows kernel but for the purpose of this post I'm going to only refer to mutexes even when mutant might be the correct technical term (deal with it).  So in theory, and in practice, by enumerating mutexes on a system and then comparing them to a list of mutexes known to be used by malware you would have good reason to believe something malicious is/was on the system - or at least a starting point of something to dig into if you're in the 'needle in a haystack' situation.  During our conversation I remembered a script from the Malware Analyst's Cookbook which scraped ThreatExpert reports and populated a DB (Note: this script requires the 'avsubmit.py' file from the MACB as well since it takes the ThreatExpert class from it).  After taking another look at the script, I figured it would be less time-consuming to modify it to fit my needs instead of starting from scratch.  This idea can be implemented across other online sandboxes as well but in this instance I'm just going to touch on ThreatExpert.

I grabbed the latest copy of the 'dbmgr.py' script but when I went to verify it was functioning properly prior to making any modifications I ran into a tiny hiccup.  As a result of a simple typo in this version of the script, the processing would come to a halt and not complete ... I submitted a quick bugfix and within ~2 mins MHL acknowledged the issue, commented and fixed it.  I know it was a small fix but man, what service!

Now that there was a working copy up I took a look at the params/args which ThreatExpert made available and noticed I could use the 'find' parameter in addition to the 'page' parameter (which the script already included) and supply it with whatever I wanted to search for within the archived reports.

The addition of 'sl=1' is credited to another post MHL pointed out a little while ago where another user noted this would filter ThreatExpert's results to only show 'known bad' ... after all, for the purposes most of us will be using this for, we don't really want to have 'good' results.  When you query ThreatExpert you receive ~20 results per page and ~200 pages max from what I've seen.  The other post mentioned above included a quick external bash script to loop the dbmgr.py script and supply it with a new value to grab different pages for bulk results.  To make things easier, I added another def to the script so you have the ability to loop through multiple result pages and I also put in a simple check to stop processing results if there are none left (i.e. - if you tell it to search 5 pages but only 3 are returned, instead of trying to process the last two it checks for the 'No further results to process' text which ThreatExpert produces and exits).
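
To give a feel for how that paging loop works, here's a minimal sketch (not the actual dbmgr.py patch) of querying ThreatExpert with the 'find', 'sl' and 'page' parameters discussed above and stopping when the 'No further results' text shows up.  The base URL is an assumption on my part, so adjust it to whatever the script actually uses:

# Minimal sketch, not the real patch: loop ThreatExpert result pages for a
# search term and stop when the site says there's nothing left to process.
import urllib
import urllib2

BASE = "http://www.threatexpert.com/reports.aspx"  # assumed search endpoint

def fetch_pages(term, max_pages=5):
    pages = []
    for page in range(1, max_pages + 1):
        url = "%s?find=%s&sl=1&page=%d" % (BASE, urllib.quote(term), page)
        html = urllib2.urlopen(url).read()
        if "No further results" in html:   # nothing left, stop looping
            break
        pages.append(html)
    return pages

if __name__ == "__main__":
    results = fetch_pages("mutex", max_pages=3)
    print("fetched %d result page(s)" % len(results))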



Example search terms which might be of interest:
  • mutex
    • would  produce results which have a greater chance of containing mutexes since that's a required word within the report based on what we're querying.
  • exploit.java || exploit.swf
    • either of these would produce results which involve either 'exploit.java' or 'exploit.swf' in their A/V name
  • wpbt0.dll
    • could be used to look at reports involving a file commonly associated with BHEK (the Blackhole exploit kit)


There are also a few other cosmetic changes that you'll notice in the patch, mainly to display things the way I wanted to see them.  I also came across an instance where some funky encoding on a file name it was trying to insert caused it to fail, so I added a little sanity check there as well.



So what's the point of this all and why do you care?  One of the reasons which I mentioned above was to populate a DB with known malicious mutexes (without wasting time grabbing a bunch of other reports that aren't relevant to your needs).  



This becomes even more handy when you're analyzing a memory image and want to do a cross-reference with volatility's 'mutantscan' command.  In fact, if you read the blurb under that command's reference you'll notice the volatility folks actually mentioned a similar PoC they tested so it's good to see others thinking the same way.  Other ways of interest could be to populate a DB and start to put together some stats regarding which registry keys are commonly associated with malware, which registry values, common file names, common file locations targeted, IP addresses contacted by the malware etc. ... there's a wealth of data mining that can be done and the great thing is (1) it can be automated and (2) you don't have to have the samples or waste the time processing them in your own sandbox as you can just leverage this free resource.
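
As a rough illustration of that cross-reference idea, here's a hypothetical sketch: it assumes the scraped mutex names ended up in a local SQLite database (the file, table and column names below are made up, so swap them for however your DB is actually laid out) and that you've exported the named mutexes from Volatility's mutantscan output into a plain text file, one per line:

# Hypothetical sketch of cross-referencing mutantscan output with a DB of
# scraped mutex names.  Schema and file names are illustrative only.
import sqlite3

def load_known_bad(db_path="threatexpert.db"):
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT name FROM mutexes")   # hypothetical table/column
    known = set(r[0].strip().lower() for r in rows)
    conn.close()
    return known

def check_memory_image(mutantscan_txt, known):
    hits = []
    for line in open(mutantscan_txt):
        name = line.strip()
        if name and name.lower() in known:
            hits.append(name)
    return hits

if __name__ == "__main__":
    for m in check_memory_image("mutants.txt", load_known_bad()):
        print("possible known-bad mutex: %s" % m)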

If you want to play around with the patch I put out, head over to my github and follow the instructions for patching the original version.

:: Note - during recent testing I noticed I wasn't getting results but I believe this might be due to something on ThreatExpert's side, or I'm just being throttled... either way, it works but just be aware in case you aren't getting results every time (even with the original script) ::

Wednesday, September 19, 2012

SWF-ing away

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates, head here.


:: Disclaimer - the intent of this post is for educational and research purposes only.  Don't be lame and use it to steal copyrighted material ::

There's been quite a bit of chatter lately with the recent discovery of the latest IE 0-day.  While reading through one of the other researchers' posts I decided to take a deeper look into some of the files being used in these reported attacks.  The issue that some might be experiencing while trying to analyze the related flash files, as I did, is that they're encrypted with doSWF and therefore take a little more effort in order to get down to what we care about.  A quick search about how to go about decrypting this particular type of encryption led me to two good articles (one | two).

With the addition of another user posting a decompiled version of the ActionScript within the file I was looking at, I decided to take a quick look into this referenced script and modify its 'decrypt' function to correspond with the information provided.  I thought it would be an easy task but it later turned out to result in errors - Java is something I don't care to debug... Having a script to aid in the future automation of decrypting such files would be helpful and might be re-visited, but there's also a learning aspect to doing it manually.  With that being said I opted to take a different approach (attaching OllyDbg to IE and dumping the SWF from memory) which would be repeatable in future analysis efforts even if the type of encryption used was different - which makes it more reliable/flexible in my eyes.  There will be some overlap here with previously linked posts but instead of just saying what was done I feel it's useful for others to see how to do it.

So after wiping off the dust from OllyDbg I was able to get the end results I was seeking from my analysis by performing the steps outlined below.  Note that while some of the steps and content below are specific to the file I set out to analyze, other steps can be applied to help analyze other situations you might encounter.

  1. This initial step can be bypassed depending on what you're analyzing and what your goals are but for this particular situation I started off by commenting out the part of the initial landing page which was responsible for initializing the variables and just left in the part which loaded the SWF file:

  2. Open up IE (this could be another browser, again depending on what you're analyzing)
  3. Open OllyDbg and attach the IE process (File > Attach).  If you have more than one instance of IE open, make sure you are attaching to the right one!
  4. Open exploit.htm in IE which will load the SWF file
  5. Locate the SWF loaded within IE's memory.  Go back to OllyDbg:
    View > Memory Map > right click > Search (Ctrl + B)
    The search criteria will be dependent on what you're looking to locate, and be mindful of little endian.  In this case I decided to search for "doSWF" which would be displayed here as "64 6F 53 57 46" in HEX.

     
  6. There may be multiple hits of the text you are searching for - each of which will pop up in its own 'Dump' window.  Take a look at the context of where the found text is located and continue (CTRL + L) until you get one that looks like it's right.

  7. Once you come across what looks to be your decrypted SWF file, within its Dump window; right click > Copy > Select All.  Now you might disagree with this approach and ask why copy everything out instead of just what you're after, but I'd rather copy it all out and worry about carving the exact SWF file later rather than manually calculating the correct SWF length and carving it out from OllyDbg (more on that later).

    Once it's all highlighted; right click > Binary > Binary Copy


    Open a hex editor (HxD in this example) and paste the copied information we took from OllyDbg into a new HEX file; Edit > Paste write (CTRL + B )

    You can scroll down to see the 'goodies' which also help in showing it's not encrypted anymore since these are strings we were unable to previously see:
    • File header (46 57 53 | FWS)
    • iFrame reference and 'call' statement to blob

    (image was widened to show it all, not all hex columns are shown as a result)
  8. Since I copied everything over you can obviously see there's other junk there which we don't care about and which will prevent us from isolating the sought after SWF file.  As mentioned earlier, this can be solved by either manually carving it out based on Adobe's SWF specifications [PDF] (a small sketch automating this carve follows the step list below):

    •   first 3 bytes = header
    •   next 1 byte = version (8-bit number, not ASCII representation)
    •   next 4 bytes = total length including header (varies if compressed)

    In this file:
    • 46 57 53 = FWS
    • 09 = version 9
    • BD 18 00 00 = 0x18BD (little-endian hex), or 6333 in decimal
  9. (you can see how these values correspond to the carved SWF in an image below)

    ...or have a tool help you out.  While it's useful to know the specifications of what you're analyzing, having a tool to help limit the chance of you creating an error is always nice so I opted to use Alex's kick-ass tool xxxswf to do the thinking and lifting for me.
       


    No matter how you go about it, once you've successfully carved it out you should now have just the unencrypted SWF and can continue your analysis:


  10. Open up the unencrypted SWF with your tool of choice for analysis.  If you're using Adobe's SWF Investigator then copy out the disassembled text by; SWF Disassembler tab > click 'open with text editor'.

    Once you have the disassembled code, scroll down until you see the blob being passed into the ByteArray and copy out what's in between the quotes:



  11. Take that copied out data and paste it into a hex editor.  Since we see some 'eval' occurring and this blob being a variable in the ByteArray I'm going to say this looks to be the shellcode, so let's go ahead and run this extracted code through a shellcode analyzer for ease (scDbg/libemu in this example).  When initially analyzed it detects that the data is XOR'ed with the key '0xE2' :

    In the output of the first run through libemu we noticed there was an error, so after adding the '-d' switch and running it again we get the following:
  12. Well that's helpful - now we don't have to question whether it's encoded and/or search for the key since it's already provided to us.  This allows us to move on to reversing the XOR with a capable program (xor.exe in this example):


  13. A quick test to determine what the actual file is (shown in the second command above) after being reversed shows it's just 'data'.  Hm, that doesn't appear to be what I'd expect as a result of the shellcode - or did something fail when performing the XOR?  If we view the strings of this data we see it displays a link to a file which could indicate a 2nd stage download:
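
As promised back in step 8, here's a rough sketch of automating that manual carve: find the uncompressed SWF magic ('FWS') in the binary-copied dump and cut out exactly as many bytes as the little-endian length field says.  The file names are just examples.

# Rough sketch of the step 8 carve: locate 'FWS' in the dumped memory and
# carve out 'total length' bytes based on the 4-byte little-endian size field.
import struct

def carve_swf(dump_path, out_path="carved.swf"):
    data = open(dump_path, "rb").read()
    offset = data.find("FWS")                  # 46 57 53 - uncompressed SWF magic
    if offset == -1:
        return None
    version = ord(data[offset + 3])            # single byte version number
    length = struct.unpack("<I", data[offset + 4:offset + 8])[0]
    open(out_path, "wb").write(data[offset:offset + length])
    return version, length

if __name__ == "__main__":
    info = carve_swf("ollydbg_dump.bin")
    if info:
        print("carved SWF v%d, %d bytes" % info)

Against the values above you'd expect it to report version 9 and 6333 bytes.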



You can continue your analysis as needed from here but for the purpose of this post hopefully this showed you a thing or two that you either previously weren't aware of or that was causing a road block in your analysis efforts.  As always, there are many ways to accomplish the task at hand and if you have a more efficient way to accomplish what I've touched on, by all means pass it along.

Tuesday, July 17, 2012

Customizing cuckoo to fit your needs

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates, head here.


With talk of the 0.4 release of cuckoo being publicly released shortly, I figured I should get this post out as some of the things I talk about here are said to be addressed and included in that release.  If you don't want to wait for that release, or something I touch on here isn't included in it, then hopefully the information below will be of use to you. In full disclosure, I'm not a python guru so if you see something that could have been done an easier way or something turns out not to be working for you please let me know...I found out the hard way python is strict on spacing.  Throughout my testing it all seemed to work fine for me but there may be some scenario I didn't test or think of.

(patches available on my github)

General Notes

The installation notes are pretty straightforward to get you up and running and after you successfully do it the first time, any subsequent installation process should be even faster for you.  There are a couple of notes worth mentioning though:
  • The first user you create during your Ubuntu installation is an admin user.  This is important to remember if you want your cuckoo user to be a limited user.
  • When you add the cuckoo user to its group, you need to log out and log back in for it to take effect.
  • To ensure there are no permission issues, you should do the virtualbox setup as the cuckoo user instead of another admin/root account.
  • If during your analysis the VM isn't able to be restored or you need to kill cuckoo.py then you need to run virtualbox after and take the VM out of 'saved' mode by discarding it.
  • If you are installing 3rd party applications (and you should be if you want to test exploitation), make sure you're properly pointed to them within their appropriate analyzer file "/path/to/cuckoo/setup/packages"
  • There's a default list of hashes for common programs that are automatically discarded in the dropped files section so be aware of them "/path/to/cuckoo/shares/setup/conf/analyzer.conf"

Patching

Instead of re-posting all of the files in the cuckoo repo I decided the easiest way to go about releasing these patches/modifications was to utilize the diff & patch commands in *nix. To create the patches:

diff -u 'original' 'new' > 'file.patch'

and once the patches are downloaded from my github, all you need to do is run:

patch '/path/to/original/cuckoo/file' < 'file.patch'

Customization

Web Reports/Portal

At first I couldn't understand why I was able to continuously reanalyze a sample but when I thought about it, it made sense.  Since cuckoo gives you the ability to analyze a file in multiple VMs, it has to be processed more than once (duh)…maybe a better approach would be to only have that sample be analyzed once by the same VM.

In the main web portal page you are presented with a single search box to search for a file's MD5 hash. For convenience and as a time saver I hyper-linked the file's MD5 hash in the general information section as well as the dropped files section so you can quickly see if/when it was analyzed previously instead of having to copy and paste it into the main search box every time.

I didn’t want to clutter up the general information section of the report with all of the scans and lookups I was adding to the report so I created two other sections for the report (signatures & lookups).


Signatures


Within the signatures section I added the following: ClamAV (two versions) and YARA.  If you have other scan engines you wish to run against your files then the same type of method could be re-used.  With all three of these features you need to configure the location of their corresponding signatures within "/path/to/cuckoo/processing/file.py".

ClamAV

(besides the above noted change, you also need to edit the path to your clamscan)
I'm a fan of ClamAV and the numerous ways it can be leveraged just make it ideal to have included in my automated processes.  If you've read the Malware Analyst's Cookbook (MACB) you might recall that there's some really handy code made available, one of which shows how to do exactly what I wanted to do - scan the files with ClamAV and show the results.  I don't like to re-do what someone else has done if it works how I need it to so I made one or two modifications and plugged it in as necessary.
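
For reference, the general shape of it is just shelling out to clamscan and pulling back the signature name - something like the sketch below (this is not the MACB recipe verbatim; the binary and database paths are examples you'd point at your own setup):

# Minimal sketch of a clamscan wrapper for the report; not the MACB code.
import subprocess

CLAMSCAN = "/usr/bin/clamscan"              # example path to clamscan
CUSTOM_DB = "/opt/signatures/custom.ldb"    # example custom database

def clam_scan(path, database=None):
    cmd = [CLAMSCAN, "--no-summary", path]
    if database:
        cmd[1:1] = ["-d", database]         # scan with a specific database
    output = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]
    # clamscan prints e.g. "/tmp/sample.exe: Trojan.Foo FOUND" or "...: OK"
    for line in output.splitlines():
        if line.endswith(" FOUND"):
            return line.rsplit(": ", 1)[-1][:-len(" FOUND")]
    return "OK"

# e.g. report["signatures"]["clamav"] = clam_scan(file_path)
# e.g. report["signatures"]["clamav_custom"] = clam_scan(file_path, CUSTOM_DB)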

Custom ClamAV

Using the traditional signatures database from ClamAV is good but it can also be worthwhile to create some of your own signatures (remember how logical signatures can be a big help) so I also added a section where you can point it to your custom ClamAV database so it can pick up on other signatures you've personally written/acquired.

YARA

On the cuckoo mailing list I came across another user who said he had patches for implementing YARA into cuckoo.  If you've read any of my past posts or follow me on twitter you'll know that I'm a fan of YARA's capabilities and as such I contacted him to see what he had written.  The patches themselves were very straightforward and since they worked I didn't see a need to change them.  He provided me a link to them on his personal GDrive so if you only want to implement that feature into cuckoo then you can use his files; however, the files I'm releasing have that already implemented so no need to do double the work otherwise.  When/if more than one YARA rule is matched, they'll be comma separated within brackets (a minimal sketch of that behavior follows the links below).  The additional files needed, besides the ones in my github, that you'll need to download and install are:
  • http://yara-project.googlecode.com/files/yara-1.6.tar.gz
  • http://yara-project.googlecode.com/files/yara-python-1.6.tar.gz
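
This isn't the contributed patch itself, just a minimal yara-python sketch of the behavior described above - run the rules against the sample and hand back the matching rule names comma separated within brackets.  The rules path is an example; point it at wherever you configured your signatures in file.py:

# Minimal yara-python sketch; not the actual cuckoo patch.
import yara

RULES_PATH = "/opt/signatures/index.yara"   # example location

def yara_scan(sample_path):
    rules = yara.compile(RULES_PATH)
    matches = rules.match(sample_path)
    if not matches:
        return "None matched"
    return "[%s]" % ", ".join(m.rule for m in matches)

# e.g. report["signatures"]["yara"] = yara_scan(file_path)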

    Lookups


    The lookups section only contains two actual lookups at the moment but also contains what I refer to as 'future linkage'.  I didn't add the lookups section to the dropped files section because I plan on analyzing them automatically with the modifications mentioned earlier and that would just be too repetitive and a waste of time.  As far as actual lookups go I put in Cymru and VirusTotal for right now so if there's an Internet connection they will pull the last time the sample was scanned/seen with their services and the A/V detection rate (note - I'm only querying for the hashes, I don't like submitting for a few reasons).

    Team Cymru 

    Team Cymru offers a couple of very useful services and one which I use during investigations is their Malware Hash Registry (MHR). MHR takes the hash(es) you supply it and tells you if it's a known bad file, the last time they've seen it and an approximate percentage for A/V detection. MACB also had a recipe for adding this to a script so once again I just modified as necessary and inserted it to fit cuckoo.
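
    As a rough idea of what that lookup boils down to (this is a sketch, not the MACB recipe), MHR can be queried over Team Cymru's whois interface; the parsing below assumes the usual "hash last_seen_epoch detection_percent" response line, so treat it as illustrative:

# Sketch of an MHR lookup via whois; response parsing is an assumption.
import subprocess
import time

def cymru_mhr(md5):
    out = subprocess.Popen(["whois", "-h", "hash.cymru.com", md5],
                           stdout=subprocess.PIPE).communicate()[0]
    lines = [l for l in out.splitlines() if l and not l.startswith("#")]
    fields = lines[-1].split() if lines else []
    if len(fields) < 3 or "NO_DATA" in out:
        return "Not found in MHR"
    last_seen = time.strftime("%Y-%m-%d", time.gmtime(int(fields[1])))
    return "last seen %s, ~%s%% A/V detection" % (last_seen, fields[2])

# e.g. report["lookups"]["cymru"] = cymru_mhr(file_md5)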

    VirusTotal

    There are a few scripts online to utilize VirusTotal's API and submit/query their site but I decided to use this script.  You can use any method you'd like but if you use the patches I provided just install that script and supply your API key in "/path/to/cuckoo/processing/file.py".  I didn't want to overly insert code into the existing cuckoo files so I opted to build this file and then import it from within cuckoo.  Essentially I take the file's hash and try to get a report of it and if it exists just pull the last scan date and detection rate.  While it can be useful to see what the A/V's detected it as, I didn't want to waste time making a collapsible table including all of this information if the new release of cuckoo will already do this.  If it doesn't, then I'll re-visit it.

    If the sample doesn't have any VT detection or doesn't exist there then I have it just state that, and if there's no current Internet connection then state an error.  The latter is very important because I've seen others trying to stuff this capability into their code but they fail to address the scenario when there's no Internet connectivity and therefore their report will fail to be created because they don't handle the error created.  I wrote it to catch errors generically because I don't want my report to fail because of this, so if there's no Internet connection or another error occurs (note that this will also suppress the error that your API key may be wrong!) and the rest of the report is fine to generate then it can still generate.  The same holds true for the snippet for the Cymru check.
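
    The hash-only lookup with that blanket error handling boils down to something like the sketch below (public v2 API; the key is a placeholder and the response fields are assumptions based on the API documentation of the time):

# Simplified sketch of a hash-only VT lookup that degrades to a string on any
# failure (no Internet, throttling, bad key) so the report still generates.
import json
import urllib
import urllib2

VT_KEY = "YOUR_API_KEY_HERE"   # placeholder

def vt_lookup(md5):
    try:
        params = urllib.urlencode({"apikey": VT_KEY, "resource": md5})
        response = urllib2.urlopen(
            "https://www.virustotal.com/vtapi/v2/file/report", params, timeout=15)
        result = json.loads(response.read())
        if result.get("response_code") == 1:
            return "last scanned %s, detection %s/%s" % (
                result.get("scan_date"), result.get("positives"), result.get("total"))
        return "No results found on VirusTotal"
    except Exception:
        return "Error - VirusTotal lookup skipped"   # keep the report alive

# e.g. report["lookups"]["virustotal"] = vt_lookup(file_md5)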

    Internet connection and results found :



    No Internet connection :



    Internet connection and no results found :


    Future Linkage

    I thought it was useful to pre-link the samples to common online sites people use for additional reference/analysis (malwr, shadowserver and threatexpert).  Instead of slowing down the analysis by trying to pull down all of these reports (if they exist) and then parse them, I decided it was just easier to create a link for them based on the sample's hash.  That way, even if the sample hasn't been analyzed on any of these sites at the time of my analysis, I can go back to them at a later time and check if a report exists since then.  Just another way to save some time and make life easier.

    Dropped Files

    Cuckoo will take any dropped files during the analysis of the sample and copy them back over to the host machine under the structure "/path/to/cuckoo/analysis/<#>/files".  By default those files are just left in that subfolder and not analyzed (they will have basic information such as file type and hash in the report though) but I felt it didn't make sense to just leave them in that sub-directory (at least for my goals) so I opted to change "/path/to/cuckoo/processing/data.py" so it would take those files and move them to my samples directory (/opt/samples):
     shutil.move(cur_path,'/opt/samples')
    This samples folder is the folder that I'm going to be monitoring for new/created files and automatically process them to be analyzed as mentioned later via the watcher.rb script.  Once I did that I noticed another side effect... if there was a queue in the samples directory and the files being moved from the dropped files folder to the samples folder were the same then it would crap out.  I thought the move command would overwrite it but it didn't.  I figured this could be fixed by either copying instead or, what I chose to do, checking if it exists and if so just deleting it from the dropped files folder since it was going to be processed anyway:

            # if the dropped file is already queued in the samples directory,
            # just remove the copy in the dropped files folder...
            check = os.path.join('/opt/samples/', cur_file)
            if os.path.exists(check):
                os.remove(cur_path)
            else:
                # ...otherwise copy it over for automated analysis and then
                # clean up the original
                shutil.copy(cur_path,'/opt/samples')
                os.remove(cur_path)

            return dropped
    This may not be something that everyone feels they want to do since one obvious consequence I could think of is that, because every file is being moved out of the dropped files directory, any special configuration file etc. that you might be interested in won't be there (unless you do file type identification and only move files which can even be processed, or if a file can't be processed, move it out of the samples directory to another folder to store dropped files that couldn't get processed, i.e. - html files, js etc.).  Another reason might be that it may end up being a continual loop.  Some malware will go out and download another copy of itself etc. and as such continuing to automatically analyze them will just cause a loop.  This will vary of course by sample, whether the Internet is connected and what you want out of your analysis.  Other than that, your analysis task numbers might rise quickly but that shouldn't be of concern because you aren't going to have a sequential set since there are going to be times when a file can't be processed.


    Samples Directory Watcher

    Melissa wrote a post a little bit ago on integrating cuckoo with NTR and in that post she touched upon the usefulness of having a script running to automatically realize that a new file was created or moved to a certain directory and then take action on that file.  I thought it was nifty and since it was already built into Ruby, I wasn't going to try and hack something else together and see how it held up.  I've read that INotify can be a memory hog so that's something that should be paid attention to although I haven't had any noticeable issues thus far.  If you read the original post you'll soon realize there are some typos... Melissa pointed one out but there are a couple others that might make you frustrated when troubleshooting, so to make things easier, I took care of them already.  To get this directory watcher up and running do the following:

    sudo apt-get install ruby rubygems 
    sudo gem install rb-inotify  

    Update 07-18-12 : ... had the wrong command for installing rb-inotify

    Download the modified watcher.rb script (on my github too) and edit it to point to the directory you want to watch and the script you want it to execute upon an action/event occurring.  Instead of having an interim script here you can just pass the new sample to "/path/to/cuckoo/submit.py" but I realized I needed an interim script because the sample might be password protected or in a format that cuckoo wouldn't take (i.e. an archive file).

    That's the basic customization you need to do for this script, however, you can change it as you see fit.  Initially when I was talking to some Ruby gurus they said that using the IO.popen method was overkill for what I wanted to do since all I'm essentially doing is passing along a string (new file created/moved) to another file to process.  For testing purposes, I changed it to use exec instead... which worked, but would kill the watcher script after each event.... and that basically killed the purpose of me even having it running so I opted to keep the original method. Once you have all of the pre-reqs installed and the script modified to your needs just open another tab in your shell and let it fly (you don't need the '&' at the end but I like to get my terminal back):

    ruby watcher.rb &

    Archive Parser


    If you're like me then you might have some emails which contain malware samples as attachments or download/get sent password protected archives with possible malware.  If you hand cuckoo an archive or email file (pst etc.) then nothing will happen as it doesn't have a default module to handle them.  As far as the email situation goes, the sheer thought of individually saving each sample one by one doesn't sound like fun, so I figured within the interim script I'm calling from the watcher script there would be a check for a Microsoft Outlook data file and if so, run pffexport against the file.  The thought process is to basically just recursively extract everything out of the email messages and attempt to process them with cuckoo (if you install libpff, remember to sudo ldconfig after you install it).
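
    A sketch of that interim-script check might look like the following - it assumes PST/OST files start with the "!BDN" magic and that pffexport drops its output into a "<file>.export" directory (adjust for your libpff version); paths are examples:

# Sketch: if the new file looks like an Outlook data file, run pffexport and
# push anything it extracts back into the watched samples directory.
import os
import shutil
import subprocess

SAMPLES_DIR = "/opt/samples"   # directory the watcher monitors

def handle_outlook_file(path):
    with open(path, "rb") as fh:
        magic = fh.read(4)
    if magic != "!BDN":                    # PST/OST signature (assumed check)
        return False
    subprocess.call(["pffexport", path])
    export_dir = path + ".export"          # default output location (assumed)
    for root, _, files in os.walk(export_dir):
        for name in files:
            shutil.move(os.path.join(root, name),
                        os.path.join(SAMPLES_DIR, name))
    return True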


    To address the archives/pw protected archives issue I try to identify it as an archive file and if so, try to unzip it both with and without a password.  I wasn't aware that if you supply a wrong password to unzip a file with 7zip it will still unzip the archive if it turns out that there isn't even a password protecting the archive (thanks Pär).  I also have a little array set up which contains some of the common password schemes used to password protect malware archives, that way I can also add to it in the future (sort of like a dictionary).
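
    The dictionary loop itself is simple enough; here's a rough sketch relying on the behavior just mentioned (a wrong password still extracts an unprotected archive), with example passwords and paths:

# Sketch: try each candidate password with 7zip until one extracts cleanly.
import subprocess

PASSWORDS = ["infected", "malware", "password"]   # example dictionary

def extract_archive(archive, out_dir="/tmp/extracted"):
    for pw in PASSWORDS:
        cmd = ["7z", "x", "-y", "-o%s" % out_dir, "-p%s" % pw, archive]
        rc = subprocess.call(cmd, stdout=open("/dev/null", "w"))
        if rc == 0:          # 7z returns 0 on a successful extraction
            return pw        # remember which password (if any) worked
    return None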

       Additional Software

      Depending on the installation you're performing and what additional features you're going to be installing there might be some additional software required which could include:
      • YARA
        • sudo apt-get install libpcre3 libpcre3-dev 
      • python
        • sudo apt-get install python python2.7-dev python-magic python-dpkt python-mako
      • ssdeep
        • http://sourceforge.net/projects/ssdeep/files/ssdeep-2.8/ssdeep-2.8.tar.gz/download
        • svn checkout http://pyssdeep.googlecode.com/svn/trunk/ pyssdeep
        • g++
          • sudo apt-get install g++
        • subversion
          • sudo apt-get install subversion
        • 7zip
          • sudo apt-get install p7zip

        To-do/Wish List

        • The cuckoo DB that's created "/path/to/cuckoo/db/cuckoo.db" only stores a limited amount of information within it.  Even though information regarding a file's SHA1/SHA256 hashes, ssdeep hash, mutexes, IP/domains etc. is included in the sample's report, it isn't stored in the DB.  This helps keep the DB to a limited size but doesn't help if I want to search my repository of analyzed samples for all samples which called a particular IP/host etc.  I didn't want to start changing big chunks of the code to implement this at this point because updates may kill it etc... so I think the better solution will be to only change the snippet which says which fields to create in the DB and to store other selected fields into that DB after analysis.  Another script can then be used to query that DB, as it's a common task many of us do anyway.
        • The file identification process for determining what type of file the sample is and if it should be processed is pretty basic at this point.  It does the job but at times could use a boost.  A related thing I noticed is that if there are certain characters in the sample's file name then it won't get processed.  This looks like it could be a one or two line fix with something like Python's string.printable (see the sketch after this list).
        • After talking with one of my friends about cuckoo he noted that he's observed not all of the dropped files from the sample being analyzed were being copied back over to the host after the analysis.  This is no bueno... and while I haven't verified this at this time, a simple solution looks to be installing CaptureBAT on the Windows VM and using something (xcopy or robocopy) to copy all of the files caught by CaptureBAT back over to the host after analysis.
        • I'm debating adding a switch so I can choose for the analysis to either run wild on the Internet or feed it something like INetSim for simulation.  There are pros and cons to each scenario and maybe a better solution is to use something like Tor ... but I'm up in the air.  As a side note, installing INetSim can be a pain and I'm spoiled as I'm used to it already being installed, so another option to look at could be something like HoneyD
        • I'd like to modify some of the existing analyzers to run additional programs against a sample and report on their results (i.e. hachoir-subfile, pdfextract etc.)
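
        Regarding the file name issue above, the one-or-two line idea is simply stripping anything outside string.printable before handing the name to the submission code - roughly:

# Sketch of the string.printable idea for unfriendly sample file names.
import string

def sanitize_filename(name):
    printable = set(string.printable)
    return "".join(c for c in name if c in printable) or "sample.bin"

# sanitize_filename(u"bad\u2013name.exe") -> "badname.exe"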


        Monday, July 2, 2012

        Is that an infection or a false positive?

        This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates, head here.


        Have you been in a situation where there's a file being flagged by A/V and you don't really agree? (...is that rhetorical?)  I was in a situation where I was noticing files being flagged as a generic variant of ZeuS and while at first you can't necessarily disregard the alert -no matter your feelings on the A/V- you can do a little digging and try to determine what's actually going on.  This is not something you do, or should do, for every infection you come across, but rather a more practical use is to understand why certain files may be mis-classified when they are in fact benign.


        The particular A/V vendor that was reporting the alerts was classifying them as 'W32/Zbot.gen.*' ... the 'gen.b' was most noticeable.  I grabbed one of the files in question and started to poke around.  Some of the usual first steps led nowhere - internal hash lookups, external hash lookups (cymru, VT etc.), pescanner had a generic YARA hit for banker based on a string which looked all too common, dynamic analysis didn't show anything... so I started to extend some of my initial steps:

        I extracted the PE sections with 7zip so I could do sectional MD5 hashing and see if I could get any leads by comparing those to other known bad sectional hashes :

        $ 7z x <file>.exe -osections

        The above syntax will extract the contents of each of the PE sections, in a tree structure (note that these will most likely be hidden since they start with "." so make sure you list all) :




        Now that the PE sections are dumped I opted to use ClamAV for creating the sectional based MD5's.  ClamAV gives you this ability by using the following syntax:

        $ sigtool --mdb <file>

        The format that gets created for these signatures is:
        PESectionSize:MD5:MalwareName

         If you do it right, you should see similar output to this:

        57344:fceb22b4c5be5a981e6b7bd1e47dca63:.data
        45:8e5a1b84a87bcd2eed7b3ab698a72123:files.txt
        77824:34aeb441429d8f6184d0f6ba5d34cddd:.rdata
        479232:68c0e32b605b7ccead4ad9520d5a5acc:.text

        Welp... no luck on that front either -  but since I didn't have all my samples to cross-check them against, it was more of a long shot anyway.  Now what sparked curiosity is that ClamAV was also raising alerts on this particular file with the name "Trojan.Murofet".  That name is interchanged with Zbot depending on which vendor you're using so it was still leaning towards the same kind of classification for this file.  Hey, if two A/V's are flagging it for pretty much the same thing isn't that more credibility?

        I've been incorporating ClamAV and its misc. tools more into my process because it's free, maintained, cross-platform, I'm able to create my own signatures and I can even view/edit theirs.  Think about how great the latter part of that statement is... If I don't like how something's being detected, I can change it myself.  If I want to catch something from my personal collection I can create my own signature and here's the greatest part - I can see what was being used in order to classify a detection.  Most of the bigger A/V companies hold that little gem to themselves and thus make this type of analysis difficult.

        Using ClamAV's sigtool I decompressed its main signature database :

        $ sigtool -u /path/to/main.cvd


        The second part of the above image shows me searching for the detection it was classifying it as (Murofet).  You'll notice that there's more than one entry in this case and that they're both a bit different.  The first hit, Trojan.Murofet, is a sectional hash signature taking on the following format:

        69632:7e82be33bfa6b241bf081909d40e265c:Trojan.Murofet

        Explained:
        PESectionSize:MD5:MalwareName

        The second hit, W32.Murofet, is a regular signature taking on the following format:

        W32.Murofet:1:EP+0:e850000000e9????????73686c776170692e646c6c0061647661706933322e646c6c0075726c6d6f6e2e646c6c00536f6674776172655c4d6963726f736f667400746d7000687474703a2f2f002f666f72756d2f005589e583ec0453e8ea0100008945fc

        Explained:
        MalwareName:TargetType:Offset:HexSignature[:MinFL:[MaxFL]]


        The second hit is of more interest in this case if we take a look at what it's really saying...

        For any 32/64 bit EXE, if at the entrypoint you see "èPéshlwapi.dlladvapi32.dllurlmon.dllSoftware\Microsofttmphttp:///forum/U‰åƒì Sèê ‰Eü" then flag it as W32.Murofet.

        The HEX string shown in the hit can be decoded from HEX to ASCII, which will reveal the string displayed above between the quotes.  The double question marks in between are a wildcard stating "match any byte".
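
        If you want to eyeball a signature body like that quickly, a few lines of Python will decode it, showing the wildcards and non-printable bytes as dots:

# Decode a ClamAV hex signature body to ASCII, '.' for wildcards/non-printables.
import re

def decode_clam_hex(sig):
    out = []
    for token in re.findall("..", sig):      # walk two characters at a time
        if token == "??":
            out.append(".")                  # wildcard: match any byte
        else:
            byte = chr(int(token, 16))
            out.append(byte if 32 <= ord(byte) < 127 else ".")
    return "".join(out)

print(decode_clam_hex("e850000000e9????????73686c776170692e646c6c"))
# -> ".P........shlwapi.dll"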


        Since I now know what the signature within ClamAV was triggering on I wanted to take a look at the EXE's entry point and see if those strings were in fact there.  Even though this could have still been done within REMNux, I flipped over to a Windows analysis box and opened the file in CFF Explorer to get a different view of things.  From within the 'Sectional Headers' I could see the entrypoint (bottom right):




        Geared with that value I opened a hex editor, HxD in this example, and pointed it to go to that offset :



        and wouldn't ya know it ... I was presented with "shlwapi.dll, advapi32.dll, urlmon.dll, Software\Microsoft, tmp, http://, forum".  So does the presence of these strings make the file malicious or do they simply help in trying to determine its characteristics/capabilities from a static analysis perspective?  If you've ever analyzed a ZeuS sample you'd notice that what was uncovered here doesn't quite line up with the normal data encountered, however, what about ZeuS-Licat?  Trend Micro has a great write up [PDF] here.  What appears to have happened is that there was a version of ZeuS somewhere else which dropped Licat, and what we saw at the file's entry point was newly added malicious code - appended to this once legitimate file (a file infector characteristic).

        Even if you can't dig into the responsible signature on the other A/V you shouldn't call it quits.  If you can find another tool (such as ClamAV) which is classifying it in a similar way then there's a good chance that it's following some of the same logic as the uncovered signature within ClamAV and you have an idea of how/if it was being mis-classified. Even if you look at a file and a large majority of it looks legitimate -and- it may even still run as it once did (in this example the malicious code would execute at the entry point and then jump back to the file's original entry point so it could run as it normally did), try to look for anomalies and if possible cross reference the file in question with another version of the original file to find discrepancies.

        More information on ClamAV signatures.

        Thursday, June 21, 2012

        Getting what you want out of a PDF with REMnux

        This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates, head here.


        I was talking recently with Melissa, who brought up the fact that she was having a problem extracting something from a PDF.  It was cheating a little bit since we knew there was definitely something there to extract and look for because of another analysis previously posted.  When I read a post about someone doing an analysis I always like when they show a little more detail about how they got to the end result and not just the end result itself - and this was a case of the latter.  As a result of this little exercise I thought I would write a quick post on how to do the same type of thing with the CVE-2010-0188 exploit shown here.

        I know there's a wealth of write-ups for analyzing PDFs but only a handful are solely done in REMnux and they don't always show multiple ways to get the job done.  I have no problem analyzing on a Windows system with something like PDF Stream Dumper (love the new JS UI) but the fact that REMnux is so feature and tool packed makes it possible to solely stick within its environment to tackle your analysis if need be.

        One of the first things I run on any file I'm analyzing is 'hachoir-subfile'.  There are other tools within this suite which are also useful but this one isn't necessarily file type specific so it's a great tool to run during your analysis and see if you can get any hits... unfortunately, I didn't get any in this instance.

        Method 1

        Most of you are probably familiar with pdfxray and while the full power of it isn't within REMnux, there's still a slimmed down version, pdfxray_lite, which can provide you an easy to view overview of the PDF:

        $ pdfxray_lite -f file.pdf -r rpt_


        No, that's not a typo in the report name, I added the "_" so that it would be separated from the default text added to the report name, which is its MD5 hash.  If we take a look at the HTML report in Firefox, Object 122 stands out as being sketchy.  It looks to contain an '/EmbeddedFile' and the decoded stream looks like it's Base64 encoded ... the repeated characters seen also resemble a NOP sled :






        Another one of my favorites is 'pdfextract' from the Origami Framework as it can also extract various data such as streams, scripts, images, fonts, metadata, attachments etc.  It's nice sometimes to have something just go and do the heavy lifting for you, and even if you don't get what you wanted extracted, you still might get some other useful information :

        $ pdfextract file.pdf
         
        The above command results in a directory named '<file>.dump' with sub-directories based on what it tried to extract:



        Now.. we're after a TIFF file in this case but still even this tool doesn't seem to have extracted it for us... something unusual must be going on since the above two tools are great for this type of task 9 times out of 10.  In this particular instance, if we list the contents of this dump directory we can see 'script_<numbers>.js' in the root.  Typically, this would be included in the '/scripts' sub-directory so let's take a look at what it holds:


        Looks like there was something in the PDF referencing an image field linked to 'exploit.tif'.  People get lazy with their naming conventions or sometimes even just copy stuff that's obvious (check @nullandnull 's slides as he talks more about this trend).  Since we don't have any extracted images we can check out the contents of the other files extracted.  Pdfxray_lite gave us a starting point so let's dig deeper into Object 122 and check out its extracted stream from pdfextract :



         

        Hm... the content type is 'image/tif' and the HREF link looks empty, followed by a blob of Base64 encoded data.  There are online resources to decode Base64, or maybe you've written something yourself, but in a pinch it's nice to know REMnux has this built in by default with the 'base64' command.  If you just try :

        $ base64 -d stream_122.dmp > decoded_file

        you'll get an error stating "base64: invalid input".  You need to edit that file to only contain the Base64 data.  I popped it into vi and edited it so the file started like so:



         and ended like this:


        Now that we got the other junk out of the file we can re-run the previous command :


        $ base64 -d stream_122.dmp > decoded_file
         

        and if we do a 'file' on the 'decoded_file' we see we now have a TIFF image:

        $ file decoded_file




        To see if it matches what we saw in the other analysis we can take a look at it through 'xxd' :

        $ xxd decoded_file | less 


        The top of the file matches and shows some of its commands, and the bottom shows the NOP sled in the middle down to those *nix commands :
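
        As an aside, the manual vi clean-up can be scripted too: grab the longest Base64-looking run out of the dumped stream, decode it and write it out.  A rough sketch, assuming the dump layout shown above (file names are the ones used in this example):

# Sketch: strip the non-Base64 junk from the dumped stream and decode it.
import base64
import re

def decode_stream(dump_path, out_path="decoded_file"):
    data = open(dump_path).read()
    # longest run of Base64-looking characters is assumed to be the payload
    blobs = re.findall(r"[A-Za-z0-9+/=\r\n]{100,}", data)
    if not blobs:
        return None
    blob = re.sub(r"\s", "", max(blobs, key=len))
    blob += "=" * (-len(blob) % 4)            # pad if the run got truncated
    decoded = base64.b64decode(blob)
    open(out_path, "wb").write(decoded)
    return decoded[:4]

print(decode_stream("stream_122.dmp"))
# a TIFF should start with 'II*\x00' (little endian) or 'MM\x00*' (big endian)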


        Method 2


        Lenny had a good write up on using peepdf to analyze PDFs and its latest release added a couple of other handy features.  Peepdf gives you the ability to quickly interact with the PDF and pull out information or perform the tasks that you are probably seeking to accomplish all within itself.  It's stated that you can script it by supplying a file with the commands you want to run ... and while that might be good for some things like general information, I found it difficult to do that for what I was trying to accomplish here. Mainly, on a massive scale I would have to know exactly what I wanted to do on every file and that's not always the case, as with this example.  To enter its interactive console type:


        $ peepdf -i file.pdf

         This will drop you into peepdf's interactive mode and display info about the pdf :


        The latest version of peepdf also states there's a new way to redirect the console output but since I was working with an older version on REMnux I just changed the output log.  This essentially "tee's" the output from whatever I do within the peepdf console to STDOUT and to the log file I set it to:


        You may not need to do the above step in all of your situations but I did it for a certain reason which I'll get to in a minute... Since we already know from previous tools that object 122 needs some attention we can issue 'object 122' from within peepdf which will display the object's contents after being decoded/decrypted:


        The top part of the screenshot is the command and the second half of the screenshot is another shell showing the logged output of that command which was sent to what I set my output log to (122.txt)  previously.  We already saw that we could use the built in 'base64' command in REMnux to decode our stream but I wanted to highlight that you can do it within peepdf as well with one of its many commands, 'decode'.  This command enables you to decode variables, offsets or *files*.  Since we logged the content of object 122 to a file we can use this filter from within peepdf's console - I wasn't able to do it all within the console (someone else may shed some light on what I missed?) but I believe it's the same situation where you need to remove the junk other than what you want to Base64 decode.  As such, if I just opened another shell and vi'ed the output log (122.txt) to only contain the base64 encoded data like we did earlier then I could issue the following from within peepdf:

        > set output file decoded.txt
        > decode file 122.txt b64

        The above commands change the output log file of peepdf to "decoded.txt" and then tells peepdf to decode that file by using the base64/b64 filter :


        I can once again verify my file in another shell with :

        $ file decoded.txt

        which as you can see in the bottom half of the above screenshot shows it's a TIFF image.

        I've only outlined a few of the many tools within REMnux and touched on some of their many individual features but if you haven't had the time to or never knew of REMnux before I urge you to start utilizing it. Peepdf alone has a ton of other really great features for xoring, decoding, shell code analysis and JS analysis and there are other general tools like pdfid & pdf-parser but it's important to know what tools are available to you and what you can expect from them.

        Tuesday, June 19, 2012

        XDP files and ClamAV

        This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates, head here.


        updated 08/20/2012 - added two new signatures

        There were some recent discussions going on regarding the use, or possible use, of an XML Data Package (XDP) file carrying a PDF file to bypass security products or even the end user.  If you aren't familiar with XDP files, don't feel bad... neither was I.  According to the information Adobe provides, this is essentially a wrapper for PDF files so they can be treated as XML files.  If you want to know more about this file type then take a look at the link above as I'm not going to go heavily into detail, but note that the documentation is a bit on the light side as it is.  There are other things that can be included in the XDP file but for this post we're looking at the ability to have a PDF within it.


         Adobe states that :
        "The PDF packet encloses the remainder of the PDF document that resulted from extracting any subassemblies into the XDP.  XML is a text format, and is not designed to host binary content. PDF files are binary and therefore must be encoded into a text format before they can be enclosed within an XML format such as XDP. The most common method for encoding binary resources into a text format, and the method used by the PDF packet, is base64 encoding [RFC2045]."

        Based on my limited testing, when you open a XDP file, Adobe Reader recognizes it and is the default handler.  When the file is opened, Adobe Reader decodes the base64 stream (the PDF within it), saves it to the %temp% directory and then opens it.

        Brandon's post included a SNORT signature for this type of file but I wanted to get some identification/classification for more of a host based analysis.  Since I couldn't get a hold of a big data set I grabbed a few samples (Google dork = ext:xdp) and thought I'd first try TrID - but that generally just classified them as XML files (with a few exceptions) and the same thing with 'file'.  I can't blame them, I mean they are XML files but I wanted to show them as XDP files with PDF's if that was the case - that way I could do post-processing and extract the base64 encoded PDF from within the XDP file and then process it as a standard PDF file in an automated fashion.  

        I then looked to TrIDScan but unfortunately that didn't work as hoped.  I tried creating my own XML signature for it as well but kept receiving seg. faults .. so... no bueno. My next thought was to put it into a YARA rule but I thought I'd try something else that was on my mind.  I've been told in the past to mess around with ClamAV's sectional MD5 hashing but that's generally done by extracting the PE file's sections then hashing those.  Since this is XML, that wasn't going to work.  I remembered some slides I looked at a bit ago regarding writing ClamAV signatures so when I revisited them the lightbulb about the ability to create Logical Signatures came back to me.

        ClamAV's Logical Signatures


        Logical Signatures in ClamAV are very similar to the thought/flow of YARA signatures in that they allow you to create detection based on..well.. logic.  The following is the structure, the 'Subsig*' are HEX values... so you can either use an online/local resource to convert your ASCII to HEX or you can leverage ClamAV's sigtool (remember to delete trailing 0a though):
         sigtool --hex-dump
        Logical Signature Structure:
        SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;Subsig1;Subsig2;...
         

        Looking back to Adobe's information they also mention that the PDF packet has the following format:

        <pdf xmlns="http://ns.adobe.com/xdp/pdf/">
             <document>
                  <chunk>
                       ...base64 encoded PDF content...
                  </chunk>
             </document>
        </pdf>

        ClamAV Signature

        The beauty is that you can create your own custom Logical Database (.ldb) and pop it into your default ClamAV directory (i.e. /var/lib/clamav) with the other databases and it'll automatically be included in your scan. While just detecting this may not indicate it's malicious, at least it's a way to detect the presence of the file for further analysis/post-processing.  So based on everything I now know I can create the following ClamAV signature :

        XDP_embedded_PDF;Target:0;(0&1&2);3c70646620786d6c6e733d;3c6368756e6b3e4a564245526930;3c2f7064663e

        Explained: 

        XDP_embedded_PDF - Signature name

        Target:0 - Any file

        (0&1&2) - match all of the following

        0
        ASCII :  <pdf xmlns=
        HEX  : 3c70646620786d6c6e733d

        1
        ASCII :  <chunk>JVBERi0
        HEX :  3c6368756e6b3e4a564245526930
        * JVBERi0 is the Base64 encoded ASCII text " %PDF- ", which signifies the PDF header.  It was converted into HEX and added to the end of the 'chunk' to help catch the PDF

        2
        ASCII :  </pdf>
        HEX :  3c2f7064663e
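
        If you'd rather not use sigtool --hex-dump (or an online converter) for those subsignatures, hex-encoding the ASCII in Python 2 gives the same values, which is also a quick way to double-check them:

# Generate/verify the hex subsignatures used above (Python 2).
strings = ["<pdf xmlns=", "<chunk>JVBERi0", "</pdf>"]
for s in strings:
    print("%s -> %s" % (s, s.encode("hex")))

# <pdf xmlns=    -> 3c70646620786d6c6e733d
# <chunk>JVBERi0 -> 3c6368756e6b3e4a564245526930
# </pdf>         -> 3c2f7064663e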

        update #1 on 08/20/2012 : 
        The first ClamAV signature created above works, but I started to think that the '<chunk>' and 'JVBERi0' may not be next to each other in all cases ... not sure whether they have to be or not by specification, but this is Adobe so I'd rather separate them and match on both anyway..

        XDP_embedded_PDF_v2;Target:0;(0&1&2&3);3c70646620786d6c6e733d;3c6368756e6b3e;4a564245526930;3c2f7064663e 


        update #2 on 08/20/2012:

        YARA signature:


        rule XDP_embedded_PDF
        {
        meta:
        author = "Glenn Edwards (@hiddenillusion)"
        version = "0.1"
        ref = "http://blog.9bplus.com/av-bypass-for-malicious-pdfs-using-xdp"

        strings:
        $s1 = "<pdf xmlns="
        $s2 = "<chunk>"
        $s3 = "</pdf>"
        $header0 = "%PDF"
        $header1 = "JVBERi0"

        condition:
        all of ($s*) and 1 of ($header*)
        }


        Questions to answer

        Actors are always trying to find new ways to exploit/take advantage of users/applications so it's good that this was brought to attention as we can now be aware and look for it.  While the above signature will trigger on an XDP file with a PDF (from what I had to test on), there are still questions to be answered and without having more samples or information they stand unanswered at this point:

        1. Could these values within the XDP file be encoded and still be recognized, like other PDF specs?
        2. Can it be encoded with something other than base64 and still work?
        3. Will any other PDF readers, like FoxIT, treat them/work the same as Adobe Reader?

        Comments and questions are always welcome ... never know if someone else has a better way or something I said doesn't work.

        Wednesday, May 9, 2012

        What's in your logs?

        This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates, head here.


        I've had this on the back burner for a few months but I'm finally getting around to writing up a post about it.  I re-tested the scenarios listed below with log2timeline v0.63 in SIFT v2.12  and verified it's still applicable.

        The scenario

        I was investigating an image of a web server which was thought to have some data exfiltrated yada yada.. Log analysis was going to be a key part of this investigation and I had gigs to sift through.

        Among a few other tools, I ran the logs through log2timeline and received my timeline - or so I thought.  There wasn't any indication that entire files couldn't be parsed or files that were skipped in the STDOUT so one would assume everything was successful- right?  Not so much.  I don't like to stick to one tool and this wasn't going to be any different.  I loaded the logs with a few other tools (Notepad++, Highlighter, Splunk, Bash etc.) and verified my results.  As a result of being thorough, I noticed that there were a bunch of lines from the apache2 error logs which were present in the other tools outputs but were noticeably missing in my timeline.  After some digging around and some additional testing with sample data sets I noticed there were a few problems.

        The problems 

        1) The apache2_error parser  says it has to match the regex of Apache's defined format or log2timeline won't process it:

        "#       DOW    month    day    hour   min    sec    year           level       ip           message
          #  ^\[[^\s]+ (\w\w\w) (\d\d) (\d\d):(\d\d):(\d\d) (\d\d\d\d)\] \[([^\]]+)\] (\[([^\]]+)\])? (.*)$

          #print "parsing line\n";
          if ($line =~ /^\[[^\s]+ (\w\w\w) (\d\d) (\d\d):(\d\d):(\d\d) (\d\d\d\d)\] \[([^\]]+)\] (\[([^\]]+)\]) (.*)$/ )
          {
            $li{'month'} = lc($1);
            $li{'day'} = $2;
            $li{'hour'} = $3;
            $li{'min'} = $4;
            $li{'sec'} = $5;
            $li{'year'} = $6;
            $li{'severity'} = $7;
            $li{'client'} = $8;
            $li{'message'} = $10;

            if ($li{'client'} =~ /client ([0-9\.]+)/) 
            {
              $li{'c-ip'} = $1;
            }
          }
          elsif ($line =~ /^\[[^\s]+ (\w\w\w) (\d\d) (\d\d):(\d\d):(\d\d) (\d\d\d\d)\] \[([^\]]+)\] (.*)$/ ) 
          {
            $li{'month'} = lc($1);
            $li{'day'} = $2;
            $li{'hour'} = $3;
            $li{'min'} = $4;
            $li{'sec'} = $5;
            $li{'year'} = $6;
            $li{'severity'} = $7;
            $li{'message'} = $8;
          }
          else 
          {
            print STDERR "Error, not correct structure ($line)\n";
            return;
          }"

        ...so while some of the lines in the logs followed it, it turned out others were far from the required standard, which resulted in data being lost from the output.  Examples of what I mean:

        cat: /etc/passwrd: No such file or directory
        find: `../etc/shadow': Permission denied

        As shown above, some of the log lines were errors, permission denied statements etc. resulting from the external actor issuing commands via his shell (obviously not fitting the standard format).  Once I noticed that not all of the lines were being parsed, I checked what else this parser required for a valid line, then did a quick sed on the fly to find any log entry that didn't match the required format and add a dummy beginning (date, time etc.) so everything would at least get parsed.

        This could have been done in many ways, with other regexes etc., but for this example I just wanted a quick look at exactly how many lines in the files didn't adhere to the standard format, so I did it this way:

        hehe@SIFT : cat error.log | grep "^\[" > error.log.fixed
        hehe@SIFT : cat error.log | grep -v "^\[" > problems.txt
        hehe@SIFT : cat problems.txt | sed 's/^/[Fri Dec 25 02:24:08 2010] [error] [client log problem] /' >> error.log.fixed

        It was a quick hack, but not an ultimate solution.
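        If you'd rather do it in one pass, a quick Python sketch along these lines would work just as well (the script name, the regex and the dummy header value here are mine, not part of log2timeline - the regex simply mirrors the structure the parser expects):

        #!/usr/bin/env python
        # fix_apache_error.py - rough sketch, not part of log2timeline.
        # Prepends a dummy Apache error-log header to any line that doesn't
        # already start with a "[DOW Month DD HH:MM:SS YYYY] [level]" prefix,
        # so the apache2_error parser will at least see every line.
        import re
        import sys

        # Mirrors the basic structure the parser's regex expects.
        VALID = re.compile(r'^\[[^\s]+ \w\w\w \d\d \d\d:\d\d:\d\d \d\d\d\d\] \[[^\]]+\]')

        # Dummy header - same idea as the sed one-liner above.
        DUMMY = '[Fri Dec 25 02:24:08 2010] [error] [client log problem] '

        out = open(sys.argv[1] + '.fixed', 'w')
        for line in open(sys.argv[1]):
            if VALID.match(line):
                out.write(line)
            else:
                out.write(DUMMY + line)
        out.close()

        Running something like "python fix_apache_error.py error.log" then leaves you with an error.log.fixed where every original line is preserved and parsable.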

        2) Even though some of the files didn't contain valid lines, others were completely fine, yet to my surprise they still weren't parsed.  It seemed that if certain lines existed within a log the rest of it wouldn't get parsed ... possibly even that at some point log2timeline would just skip the remaining files and not try to parse them at all :/


        The testing

        Here's an example of the type of data I used for the re-testing:

        error_fail.log
        cat: /etc/passwrd: No such file or directory
        find: `../etc/shadow': Permission denied
         
        error_mix.log
        [Fri Dec 25 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico
        cat: /etc/passwrd: No such file or directory
        find: `../etc/shadow': Permission denied
        [Fri Dec 30 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico

        error_ok.log
        [Fri Dec 23 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico

        error_ok2.log
        [Fri Dec 24 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico


        *I flip-flopped the number of lines contained in the logs on occasion as well as the dates & order within a single file to test multiple scenarios and to see if certain lines were getting parsed.



        Processing multiple files:

        So here are two files, both containing all valid lines:
         
        So let's try saying "*.log" for the file to be parsed:

        ...but by doing that log2timeline will only take the first file and skip everything else, as the above image shows.  I'll admit that fooled me for a bit; I thought it would work.

        If you supply the (-r) option you can't supply "*.log" as it'll result in an empty file (yes, I deleted the test.csv prior):

        However, if you supply the (-r) option with a directory (i.e. $PWD) it will try to parse everything & tell you what files it can't open.  It will also tell you if a log's line couldn't be processed; however, it doesn't tell you from which file (if there are multiple being processed):

        and it also doesn't state that it stopped and didn't continue parsing - if you look above, error_mix.log had a date of 12/30/2010 after its invalid lines which doesn't end up in our results ... whoops:

        So it looks like if there's an invalid line within a log being parsed, log2timeline will stop processing that file? :/ ... There's not much indication of that unless we already know what's in the data set being parsed.

        Right about now some of you are saying... hey man, there's a verbose switch.  Correct, there is.  And while it's helpful to tackle some of the things I've mentioned, it still isn't the savior.  When I ran the following:

        hehe@SIFT: log2timeline -z UTC -f apache2_error -v -r $PWD -w test_verbose.csv

        I received this to STDOUT:



        So the verbose switch told me it was processing the file, that it didn't like a line within the file and that it finished processing the file ... but, again, it still didn't process the entire error_mix.log file:


        * same held true for very verbose

        Now it's possible that this has something to do with the number of lines that are read to determine whether there's an actual Apache2 error log to parse:

        # defines the maximum amount of lines that we read until we determine that we do not have a Apache2 error file
            my $max = 15;
            my $i   = 0;
        But if that were the case I thought I'd see an error like this:

        Ok.. so the above STDOUT at least tells us the file couldn't be parsed because its first 15 lines weren't valid, which lines up with the snippet above about that 15-line check.  So what happens if we add other files to be parsed in the same directory as that file - do we get the same notification?:

        Nope - it appears we don't get any notification that a file couldn't be processed, _but_ with the (-v) switch on we do get this information.  At this point error_fail.log doesn't have 15 valid lines, so just for troubleshooting purposes I altered error_mix.log to contain the following:

        (19x) [Fri Dec 25 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico
        cat: /etc/passwrd: No such file or directory
        find: `../etc/shadow': Permission denied
        (15x) [Fri Dec 30 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico

        This data set should suffice since there are at least 15 valid lines at the beginning of the file for it to be considered a valid file to parse, so let's try to parse a directory with the new error_mix.log file and two files with all valid entries (error_ok.log & error_ok2.log):
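        If you want to rebuild that altered data set yourself, a quick Python sketch like the following will do it (the file name, counts and dates are simply the ones described above):

        #!/usr/bin/env python
        # Rough sketch to rebuild the altered error_mix.log described above:
        # 19 valid entries, the two invalid lines, then 15 more valid entries.
        valid_dec25 = '[Fri Dec 25 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico\n'
        valid_dec30 = '[Fri Dec 30 02:24:08 2010] [error] [client 1.2.3.4] File does not exist: /var/www/favicon.ico\n'
        invalid = ('cat: /etc/passwrd: No such file or directory\n'
                   "find: `../etc/shadow': Permission denied\n")

        with open('error_mix.log', 'w') as f:
            f.write(valid_dec25 * 19)
            f.write(invalid)
            f.write(valid_dec30 * 15)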


        We see again that there was a file that contained an invalid line.  In the images below, it appears that the error_ok.log (12/23/2010) & error_ok2.log (12/24/2010) files were parsed but the error_mix.log (12/25/2010, <errors>, 12/30/2010) file wasn't.  The above STDOUT shows that it didn't like one of the log's lines but it doesn't state that it didn't parse the file at all :/



        Even with the verbose switch on, there still wasn't any indication that it stopped parsing the file or skipped over any parts of it besides the invalid line it pointed out.

        Proposed Solution

        I have some ideas of what can be done, but I opened an issue ticket so others in the community could chime in as well.  I talked with the plugin's author @williballenthin and provided my test samples & findings, and he agreed there should be input from others on the solution.  Here's what I thought...
        *the ticket has a typo, it was re-tested in SIFT v2.12 (2.13 wasn't out yet :) )

        1) Either the first line in the error log has to fit the standard format or one of the first x lines does (right now it's set to within the first 15); if not, spit out an error stating that particular file couldn't be parsed & continue on to the next log file if there are multiple, since the next one may have valid entries.

        2) As long as at least one line is found to meet the standard format, then once a line is found after that which doesn't meet the standard format (i.e. doesn't start with [DOW month …]), copy that information from the line before it (with the valid format/timestamp) and add it to the beginning so it meets the format and can be put into the timeline of events (a rough sketch of this idea follows below).
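        To give an idea of what that second option could look like, here's a rough Python sketch of it as a standalone preprocessing step (again, this is not part of log2timeline and the script/file names are only placeholders - it just carries the last valid prefix forward over STDIN/STDOUT):

        #!/usr/bin/env python
        # carry_prefix.py - rough sketch of proposed solution 2.
        # Remembers the most recent valid
        # "[DOW Month DD HH:MM:SS YYYY] [level] [client x.x.x.x]" prefix and
        # prepends it to any line that doesn't match the standard format, so
        # the event still ends up in the timeline near its likely time.
        import re
        import sys

        PREFIX = re.compile(r'^(\[[^\s]+ \w\w\w \d\d \d\d:\d\d:\d\d \d\d\d\d\] \[[^\]]+\](?: \[[^\]]+\])?) ')

        last_prefix = None
        for line in sys.stdin:
            match = PREFIX.match(line)
            if match:
                last_prefix = match.group(1)
                sys.stdout.write(line)
            elif last_prefix:
                # Borrow the previous valid line's timestamp/level/client fields.
                sys.stdout.write(last_prefix + ' ' + line)
            else:
                # No valid line seen yet; pass the line through untouched.
                sys.stdout.write(line)

        Something along the lines of "python carry_prefix.py < error_mix.log > error_mix.fixed.log" would then hand log2timeline a file where every event has a usable timestamp.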

         

        Conclusion

        So why did I write all this up and why do you care?  Log2timeline is purely awesome.  It's changed many aspects of DFIR, but there are always going to be improvements needed.  It's open source and for the community, so the feedback will only make it better.  Someone else may be dealing with, or have to deal with, exactly what you've come across, so why not make it known?  It's crucial that you understand, to the best of your ability, how the tools/techniques you're using work.  If solely relying on clicking buttons is your method of expertise, you're gonna get caught at some point.  Even though Willi didn't have these types of examples to test the parser on when he originally created it, I wanted to get this information out there because I fear there are others who might not have realized what I did.  If I hadn't checked my timeline against other tools I would have missed key information for this analysis.  Do you double check your results?  Are you seeing the whole picture?