Monday, February 13, 2012

What to use for analysis on a per file extension -or- category basis

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates head here.


As you are all aware, there are a ton of different tools out there and the list just keeps growing.  A coworker of mine is working on some malware automation and oftentimes we needed to determine which tools we wanted to run against a given file.  The answer varies based on the type of file it can be identified and classified as, of course, but still... what do you use?  I can't remember everything I come across every day, or I sometimes have a brain fart and forget which tool can be used in what situation or on what type of file, so I started to create a spreadsheet to help with this type of debacle.  The list started to take on a life of its own and, as you can imagine, the scope can be very large depending on what your goals are and how you want to store this information.

This list could very well be made into a DB and represented/maintained better but for quick answers on the road this was the best option for me.  I'm posting this up on Google Docs where I will periodically update the list (feel free to give me recommendations to add -> @hiddenillusion).  Hopefully it will be of use to others as it's already come in handy for me and a few other people.  If you need to modify it for your own needs then please do - just download a local copy and have at it.  Don't send me feedback that there's a row incomplete, I'm aware.

As I started to say, my first intention was to have a list of tools broken down by which files they could be used to analyze.  Because a tool can be used to analyze more than one file extension (i.e. 7zip for .zip/.rar/.7z/.jar etc.) I had certain tools listed multiple times.  When I have a sortable list, I don't want to have to resort to searching the spreadsheet to find what I'm looking for but would rather just filter by the file extension and see what results I have stored.

Update : I changed this so multiple file extensions are listed next to the same tool.  It was convenient to filter just by the extension but I got tired of having so many duplicate lines for a single tool.  You can just as easily click on that column and filter based on a cell containing what you're looking to analyze.

As I continued to populate this list I thought: why just list out tools for malware analysis?  There are plenty of DFIR and general-use tools/sites this would be applicable to in my everyday environment, so I added a few other columns.  You may notice that some of it is incomplete (i.e. not all fields are filled in for every row) or that I may have forgotten some common tools but hey, we all get busy and it's a living file - meaning it will never be complete because new things are always being released.

Some other things to take into account are that, for me personally, I would like to know whether it's a CLI/GUI (or both) type of tool, what the tool is described as, where I can get it, any useful switches that I should know about, whether it requires an install to use and, finally, whether it's part of anything else I may already have so I don't have to go and get it.  With that being said, the structure of the spreadsheet is as follows:

-----------------------
Column - Purpose
-----------------------

File Ext - which file extensions can be processed by this tool
Tool - the name of the tool
Category - the best-fitting main category to apply to this tool (you'll notice that there's overlap)
Sub-Category - helps narrow things down for particular analysis situations (i.e. I may be looking for a tool to use for ADS or VSCs or rootkits).  This is especially helpful for those tools that can't simply be classified by the file extensions they handle.
Useful Switches - helps save time reading man pages or looking it up online
Type - useful to know whether it's CLI/GUI/both for scripting purposes & forensic footprint on IR engagements
Tool Description - quick summary of what the tool is or what it can do.  In full disclosure - most of the time I did not personally write these; I usually just copied and pasted them from the author's description or wherever I found out about the tool.  Why re-invent the wheel - credit goes to the other guys where appropriate.
Linkage - helpful to know where to get the tool...
Require Install? - this is very important to know in certain situations; if a full install is required it will have some impact on my decision when I'm on an IR engagement rather than doing postmortem analysis in a lab.
Included In? - I started to put in some of the common frameworks/distros such as TSK, REMnux, SIFT etc.
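
To make the layout a little more concrete, here is what a single row might look like - the 7-Zip entry below is just an illustration filled in off the top of my head, so treat the details as approximate:

File Ext: .zip / .rar / .7z / .jar
Tool: 7-Zip
Category: Archive/Compression
Sub-Category: -
Useful Switches: x (extract with full paths), l (list contents)
Type: Both
Tool Description: open source archiver that handles a wide range of archive formats
Linkage: http://www.7-zip.org
Require Install?: No (a standalone command line version is available)
Included In?: -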

If the link to the spreadsheet doesn't work, head over to the menu bar > Spotlight > Files > Google Docs > Tools classify list ... sometimes the link gets screwy when it's edited :(

Monday, January 9, 2012

Total number of connections to a server from proxy logs

This has been ported over to my GitHub site and is no longer being maintained here. For any issues, comments or updates head here.


Goal :  Go through every log file for a day, print the server/IP that clients were communicating with and give a total sum for the number of times each server/IP was communicated with.
Notes : Each day has 30 or more log files created by multiple sensors which archive the logs in a centralized location, and the naming convention for the logs starts with %Y-%d-%m for each log.  I also wanted a timer to see how long it took to process each day's logs as well as a for-loop which would be supplied a # of days to recurse back through.
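
To make that naming convention a little more concrete, the date prefix for a log from one day back can be generated like this (same format string used in the script further down):

date --date="-1 day" +%Y-%d-%m   # e.g. the prefix to grep for in yesterday's log names

Run on the day this was written (January 9, 2012) that prints 2012-08-01 - year, then day, then month.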
Problems :
  1. A given server/IP could show up in multiple files for the same day, so I couldn't just do a unique sort on each file during my initial loop - I wouldn't get the exact number of hits for that server/IP, only a sample.
i.e. -   awk '{print $11}' | sort -u | perl -ne 'chomp; if (/.*\..*?$/){print "$_\n";}' 

The above line just tells awk to print the 11th field (the server/IP in this case), sorts the results and removes duplicates, then gets rid of anything that doesn't look like a website/IP.
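
To see what that filter actually keeps and drops, here's a quick test feeding a few made-up field values straight into the sort/perl portion (none of these are from real logs):

thatdude@lol:~> printf 'www.google.com\nwww.google.com\n10.20.30.40\ninternalhost\n' | sort -u | perl -ne 'chomp; if (/.*\..*?$/){print "$_\n";}'
10.20.30.40
www.google.com

The duplicate www.google.com collapses to one line and internalhost gets thrown out because it doesn't contain a dot.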

Here is an example of the data set I was working with:

thatdude@lol:~> cat sample.txt | grep "0.0.0.0$" | tee 0.0.0.0.txt
     1 0.0.0.0
      1 0.0.0.0
      1 0.0.0.0
      1 0.0.0.0
      1 0.0.0.0
      1 0.0.0.0
      1 0.0.0.0
    11 0.0.0.0
    12 0.0.0.0
    15 0.0.0.0
      2 0.0.0.0
      2 0.0.0.0
      2 0.0.0.0
      2 0.0.0.0
      2 0.0.0.0
    28 0.0.0.0
    29 0.0.0.0
    33 0.0.0.0
    37 0.0.0.0
      4 0.0.0.0
      4 0.0.0.0
      4 0.0.0.0
      5 0.0.0.0
      5 0.0.0.0
      5 0.0.0.0
      6 0.0.0.0
      9 0.0.0.0
      9 0.0.0.0
thatdude@lol:~> cat 0.0.0.0.txt | awk '{ sum+=$1 } END {print sum}'
233
So now you can see that the IP '0.0.0.0' was contained within multiple log files for the same day.  Now that I had a count of how many times that server/IP was listed within each log file, I needed to combine all of the matching server/IP values together.  I was given advice to put it into an array in perl but realized I could also leverage awk to do the same thing.

i.e. - awk '{array[$2]+=$1} END { for (i in array) {print array[i], i}}'
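
Before dropping it into the script, here it is run against a couple of made-up count lines (the hostname and IP are placeholders, not real data), with a sort -nr tacked on to keep the biggest totals on top:

thatdude@lol:~> printf '3 foo.example.com\n5 foo.example.com\n2 10.20.30.40\n' | awk '{array[$2]+=$1} END { for (i in array) {print array[i], i}}' | sort -nr
8 foo.example.com
2 10.20.30.40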

It's a beautiful thing when it works... the above awk line builds an array keyed on the 2nd column (server/IP), adding the first column's value to a running total each time the same server/IP is seen, and the for-loop at the end prints out each total.  To put it all into perspective, the script below follows this decision flow:
  1. List the directory where the logs are located and find all of the logs for a given day
  2. Use a for-loop to tell it how many days to recurse back to
  3. Calculate how long it takes to process each day's logs
  4. Once all of the logs are found for a given day, search through each one and print the field containing the server/IP, sort the results, get rid of anything that doesn't look like a website or IP address, then print a unique count for each server/IP found within that day's logs
  5. Open up the results from a given day and combine them so each unique server/IP is displayed once with its total number of hits
  6. To save space, compress the results
... and the script:
#!/bin/bash
# Where the compressed proxy logs live & where the daily stats get written
Log_Path="/path/to/logs"
Daily_Stats_Path="/path/to/export"

# Walk back through the last 50 days worth of logs
for ((n=0; n<=50; n++)); do

    tic=$(date +%s)
    # Log names start with %Y-%d-%m; the results file is named %m-%d-%Y
    Yesterday=$(date --date="-$n day" +%Y-%d-%m)
    CompressedDate=$(date --date="-$n day" +%m-%d-%Y)

    # Per log file: pull out the server/IP field, filter the junk & get a count
    ls "$Log_Path" | grep "$Yesterday" | while read files; do
        zcat "$Log_Path/$files" | awk '{print $11}' | sort | perl -ne 'chomp; if (/.*\..*?$/){print "$_\n";}' | uniq -c >> "$Daily_Stats_Path/$Yesterday.tmp"
    done

    # Merge the per-file counts so each server/IP gets one line with its daily total
    awk '{array[$2]+=$1} END { for (i in array) {print array[i], i}}' "$Daily_Stats_Path/$Yesterday.tmp" | sort -nr >> "$Daily_Stats_Path/$CompressedDate.txt"
    rm "$Daily_Stats_Path/$Yesterday.tmp"
    gzip -9 "$Daily_Stats_Path/$CompressedDate.txt"

    toc=$(date +%s)
    total=$(expr $toc - $tic)
    min=$(expr $total / 60)
    sec=$(expr $total % 60)
    echo "$CompressedDate.txt took :" $min"m":$sec"s"
done
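
Once a day has been processed, a quick look at the top talkers is just a matter of reading the gzipped results back in - the file name and values below are made up for illustration, but the format is what the sort -nr above spits out:

thatdude@lol:~> zcat /path/to/export/01-08-2012.txt.gz | head -3
233 server1.example.com
187 server2.example.com
54 10.20.30.40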