Showing posts with label xor. Show all posts
Showing posts with label xor. Show all posts

Wednesday, March 12, 2014

Bruteforcing XOR with YARA

This has been ported over to my GitHub site and is not longer being maintained here. For any issues, comments or updates head here.


In a previous post I looked at coming up with a process for determining XOR keys that were 256 bytes.  I've received and have read some great feedback/posts regarding the tool and even though I wrote it in such a way to try and still possibly see patterns/repetitive bytes for smaller XOR keys, that wasn't its purpose.  There are plenty of other tools out there to try and assist oneself when dealing with XOR'ed files, however, recently a co-worker and I were left unsuccessful after exhausting those resources.

I'm often asked to look at some artifact that's believed to be encoded in some fashion or hear that even if something is XOR'ed that they wouldn't know how to go about decrypting/decoding it.  I'm by no means an expert and sometimes find myself just as lost as you might feel but I thrive on learning and challenges, hence why I decided to work in the dfir space.

I believe this type of scenario is just like most others - the more time you spend doing it, the easier it becomes.  Additionally, pattern recognition is key when it comes to XOR (pun intended).  Determining the XOR key and any other skips etc. that might be used can be quite trivial, but let's look at a few ways that make this type of scenario harder:
  • You don't have access to the source code of the file responsible for performing the XOR 
  • You don't have access to the binary  responsible for performing the XOR 
  • You don't have the knowledge/skills/resources
  • The key you think should work isn't working
So you just have a file that you believe is encoded but you're not sure how (e.g. - you try to open it and you don't see any plain text). One of the easiest ways to determine if it's XOR'ed is if while scrolling through it you start to see patterns emerging.  This could be horizontal, vertically or maybe just repetitive characters constantly appearing - all depends on the key length and any other skips that might be in play.  When I say skips I'm referring to the XOR routine skipping null bytes, line feeds, carriage returns, not XOR'ing itself (e.g. - if the key is A5 then maybe if it sees A5 it skips it instead of XOR'ing itself) or some other trick.  Again, these are easier to determine if you have either of the first two bullet points listed above...but unfortunately that's not always the case.

In a recent blog post there was mention of the malware named XtremeRAT and additionally a few tools to help in scenarios where you're investigating incidents involving it.  One of the scripts listed there is for decrypting a keylog file created from XtremeRAT with a two byte XOR key of '3fa5'.  While it's helpful to know that two byte XOR key is used, what if it doesn't work on your file (bullet point number 4 mentioned above)?  Or what if there's a new variant using a different XOR key that you now need to try and figure out?  To try and solve these questions I decided to leverage a combination of YARA, the script xortools from Malware Analysts Cookbook (the book that keeps on giving) and use case examples from some others within the YaraExchange.  Xortools has some useful functions for creating different XOR's, permutations and then spitting them out into YARA rules... sweet, right?

The functions within xortools didn't quite have a solution for what I was trying to do but some quick modifications to a couple of them was easy enough to implement.  Let's break down the thought process:

  1. I wanted to generate a list of all possible combinations of two byte XOR keys (e.g. - 1010, 1011, 1012 etc.).
  2. Using those combinations I then wanted to XOR a string of my choosing
  3. With the resulting XOR'ed string I wanted to create a YARA rule for their hex bytes.  
  4. I also wanted to keep track of the two byte XOR key being used for each rule and add them to the rules name so if/when a rule triggers, the XOR key is easily identifiable - this wasn't currently included in xortools so see my modified functions
  5. Wash, Rise, Repeat.... this would entail creating different strings that you wanted XOR'ed.  I have a list that I usually feed to xorsearch such as 'http', 'explorer', 'kernel32' but in this particular instance I needed a list of strings that were likely to appear in a keylog file, such as:
  • Backspace
  • Delete
  • CLIPBOARD
  • Arrow Left
  • Arrow Right
  • Caps Lock
  • Left Ctrl
  • Right Ctrl

    For some additional hints on what you might see within a keylog file, check out Ian's YARA rule for DarkComet_Keylogs_Memory.
Good thought process thus far, but what if those strings aren't contained within the keylog file?  You wouldn't necessarily know unless you've previously dealt with this malware or have come across an example online...so another approach to think about is what is likely to be recorded on the system?  Here are some examples I've found helpful:

  • Company name (most likely keylogged email and/or Internet browsing)
  • The persons name/user name
  • Microsoft
This should help make things more flexible and tackling the unknown aspect.

First things first... create a function to generate every combo of two byte XOR keys:


The top is the original and the bottom is an example of how to generate the pair by adding another loop and at the end saving the two byte key for use in the rule name.  Note: Doing it this way may produce hex characters that are only a nibble and YARA will not like that if you're trying to match on hex characters so to circumvent it, I decided to add a wild card (?) as the other nibble.

Next, we need to feed those two bytes to an XOR function and XOR the string we passed it.  Finally, leverage the 'yaratize' function to create the YARA rule.  I got things working and when I went to scan the XOR'ed keylog files I received 'Error 25' from YARA (sad face).  After some troubleshooting I was told this issue was being caused by having too many strings in a single rule. Essentially, Error 25 'ERROR_EXEC_STACK_OVERFLOW' meant I was hitting a hard limit on the stack size. No bueno... My options were to tweak line 24 in libyara/exec.c or create better YARA rules.  By creating so many strings and using the pre-existing 'yaratize' function within xortools my rule looked followed this structure:



You'll notice it's the standard rule format most of you are probably familiar with seeing: rule name followed by the strings to match and at the bottom (not shown) would be the condition.  After some testing I determined that ~16k strings to match on seemed to be the limit that YARA would accept in a single rule (that's based on my systems config. + length of string to match etc.).  Back to my options - I could tweak that setting in YARA which I didn't want to, have a counter and only add X amount of strings to match per rule or the third option of creating one rule per string.  The third might not be familiar to some of you but that's what I opted to go with.  It creates a larger file because of all the extra characters you're adding but with the new version of YARA, performance shouldn't really be too much of a factor.  An example of this type of format is:



Now that this hurdle was bypassed, I was able to use the YARA rules generated.  On a test file that I XOR'ed with the key '3fa5' the YARA rules worked ...however, they still weren't working on the keylog files from XtremeRAT - Err!  



Note: the (-s) switch to YARA tells it to print out what matched, which is important here because our string name has the XOR key in it and the (-f) switch tells it to use fast matching mode, which only prints out the first match in the file instead of every time it's matched.

Alright, so let's pop open the XOR'ed test file I created and check out its hex and compare it to what I was seeing in the XtremeRAT files:

Here's what the test file looks like XOR'ed and in plain text, respectively:



And here is an image of the first 10 lines of two keylog files from XtremeRAT.  If you scroll through this example you'll notice the first file has a second byte consistently of '00' while the second file has a second byte consistently 'a5':




If you've read anything on XOR'ing before you may be aware that XOR keys can present themselves based on what they're XOR'ing (hence why sometimes they have skips/checks implemented).  Focusing on the bottom file, I'd say 'a5' is part of the XOR key - if not the key itself (depends on the length you're dealing with). Circling back to the XtremeRAT blog post we know a common key is '3fa5' so it appears we're being presented with half the key when we browse through the XOR'ed keylog file.  Now if you recall back to previous YARA rules being created, I was producing a straight two byte XOR without any skips... if you look at the above files you'll realize, or maybe after some troubleshooting, that this conversion won't work in this instance as the keylog file doesn't have each byte sequentially (e.g - If the word within the keylog file we're looking for is 'Microsoft', the keylog file doesn't show it as that word XOR'ed in order, but rather with 'a5' in between each XOR'ed character.) Hm, what's happening?  According to the blog post,

"XtremeRAT's key scheduling algorithm (KSA) implementation contains a bug wherein it only considers the length of the key string, not including the null bytes between each character, as found in these Unicode strings".

Now without having the binary or source code to make that determination (which I didn't), it should still become evident if you try and do a comparison:




On the left hand side of the above image is another look at the previously shown test file I created with some common keywords typically found in a keylogger file and on the right hand side is a sanitized copy of one created by XtremeRAT.  In each of the panes, the word 'Microsoft' is highlighted in the format of the particular file it's part of.  For a visual guide of what's going on and what should be expected I put together a quick image:



The top section shows the string 'Microsoft' in its native form, converted to other formats followed by what its representation would be if that particular character was XOR'ed by each half of the two byte XOR key '3fa5' by themselves.  The bottom section again shows the same string but separated by 'a5' as shown when viewing the keylog file XOR'ed followed by what would be required in a YARA rule to match on this particular string as it's seen within the XOR'ed file (hope this makes sense).

When stuck or first starting off with something like this you can reference online tables or use online systems to see binary/decimal/hex conversions but it might be worth while figuring out how to do it programmatically in something you feel comfortable with - python, perl, bash, M$ Excel etc. to try and see what's going on.

Below is another copy of the same exact table shown above, but this time with two columns highlighted.  The top column helps show each character within the string 'Microsoft' as its value in hex once it's XOR'ed with the single byte key of '3f'.  The bottom column contains the same information, but has the second half of the XOR key 'a5' inserted in between each of the strings characters.



In other words? - Because XtremeRAT uses a two byte XOR key and has null bytes in between each character, the second part of the two byte XOR key 'a5' is always displayed.  Essentially, it becomes a one byte XOR key as each character is always XOR'ed with the first half of the XOR key '3f'.  So how do we compensate for this?  After generating the permutations for every two byte XOR key we just read each character one at a time from the string we supply it then XOR each of them with the first half of the two byte key and add the second half of the two byte key right after it as itself (represented in the bottom blue column above).

Once we do that, bingo! :



We first see what the new YARA rule for '3fa5' looks like (which as the second byte as itself 'a5') and first see that it doesn't match on a file that's XOR'ed normally with the two byte key '3fa5' and lastly that it now matches on a keylog file XOR'ed from XtremeRAT with the added null byte routine.

So how easy is it to code?  Pretty easy since the majority of it existed, just some slight modifications and you're good to go.  You just need to modify the permutations function to generate combos of two byte XOR keys:



and push them over to the xor routine of need:





and finally to a function to create the YARA rules:


Other than that, just import the required functions and supply them with the required data; so for the modified functions I created, I could just say:



and voila, game over.  This should hopefully have helped explain a little more on what XOR is, how to go about detecting it and another resource you can use in the future for trying to brute force what a possible XOR key is based on some common strings that might be present.  Since xortools is hosted on Google code I opted to put up a modified version on my github instead of just a patch. I'm not the original author of all the code, just a guy modifying as needed.

Tuesday, January 22, 2013

NoMoreXOR

This has been ported over to my GitHub site and is not longer being maintained here. For any issues, comments or updates head here.


Update 04/09/2013 - NoMoreXOR is now included in REMnux as of version 4.

Have you ever been faced with a file that was XOR'ed with a 256 byte key? While it may not be the most common length for an XOR key, it's still something that has popped up enough over the last few months to make it on my to-do list.  If you take a look at first the two links mentioned above you'll see they both include some in-house tool(s) which do some magic and provide you with the XOR key.  Even though they both state that at some point their tools will be released, that doesn't help me now.

Most of the tools I came across can handle single byte - four byte XOR keys no problem (xortool, xortools, XORBruteForcer, xorsearch etc.) but other than that I didn't notice any that would handle (or actually work) with a large XOR key besides for (okteta, converter and cryptam_unxor).

I noticed Cryptam's online document analysis tool had the ability to do this as well so I sent them a few questions on their process and received a quick, informative response which pointed me to a post on their site.  Within the post/email they said that they don't perform any bruteforcing on the XOR key but rather perform cryptanalysis and then brute force the ROL1-7 (if present).  As shown in the dispersion graphs they provide, they appear to essentially be looking for high frequencies of repetitive data then using whatever appears the most to test as the key(s).

So how do you know if the file is XOR'ed with a 256 byte key in the first place?  Well... you could always try to reverse it but you may also be lucky enough to have some YARA rules which have some pre-calculated rules to help aid in this situation.  A good start would be to look at MACB's xotrools (previously linked) and also consider what it is you might want to look for (i.e. - "This program cannot be run") and XOR it with some permutations.

Manual process



If we open that file within a hex editor and go to the offset flagged (0x25C8) we'll see what is supposedly "This program cannot be run" = 26 bytes :

If we take that original file and covert it to hex we'll essentially just get a big hex blob:





...but that hex blob helps to try and guess the XOR key:


From my initial tests, the XOR key has always been in the top returned results, but even if you're having some difficulties for whatever reason you can always modify the code to fit your needs - gotta love that.

So if we now try to unxor the original file with the first guessed XOR key (remember XOR is symmetric) hopefully we'll get the original content that was XOR'ed:





After the original file was unxored and scanned with YARA we see that it was flagged for having an embedded EXE within it (this rule can be found within MACB's capabilities.yara file) so it looks like it worked.


Now while all this hex may look like a bunch of garbage at times, the human eye is very good at recognizing patterns - and when you look more and more at things like this you'll start to recognize them.  Do you recall the YARA hit that triggered? It stated that the XOR key was incremented.  What this means is that each byte is being XOR'ed with the next byte in an incremental fashion until it wraps back around to the beginning.  That may be confusing the grasp at first so lets visualize it by breaking down the previously found 256 byte XOR key in its' respective order:


868788898a8b8c8d8e8f
909192939495969798999
a9b9c9d9e9f
a0a1a2a3a4a5a6a7a8a9
aaabacadaeaf
b0b1b2b3b4b5b6b7b8b9
babbbcbdbebf
c0c1c2c3c4c5c6c7c8c9
cacbcccdcecf
d0d1d2d3d4d5d6d7d8d9
dadbdcdddedfe
0e1e2e3e4e5e6e7e8e9
eaebecedeeef
f0f1f2f3f4f5f6f7f8f9
fafbfcfdfeff
000102030405060708090
a0b0c0d0e0f
10111213141516171819
1a1b1c1d1e1f
20212223242526272829
2a2b2c2d2e2f
30313233343536373839
3a3b3c3d3e3f
40414243444546474849
4a4b4c4d4e4f
50515253545556575859
5a5b5c5d5e5f
60616263646566676869
6a6b6c6d6e6f
70717273747576777879
7a7b7c7d7e7f
808182838485

As you see, it started with 86 and looped all the way around till it reached 85 - you should also notice the patterns on each line.  This is just an example of incremental/decremental XOR (not as commonly observed in my testing but useful to be aware of) but it's useful to know because it's quite easy to spot if you look at the original file in a hex editor again:






... and that's a pattern that was observed repeating ~56 times.


Automated process

So now we can kind of put together a process flow of what we want to do:
  1. Convert the original, XOR'ed file to hex
  2. Conduct some slight frequency analysis of the newly created hex file and look for the most common characters as well as the most commonly observed hex chunks.  
    1. The first part may help in determining if there's an embedded PE file (usually a lot of \x00's) or possibly help deduce if certain bytes should be skipped.  
    2. The latter essentially reads 512 bytes at a time, stores it and continues till the end of the file.  Once complete it does some simple checking to try and weave out meaningless possible keys then presents the top five most observed 512 bytes or characters in this sense  (i.e. - 512 characters = 1 possible 256 byte key(s))
  3. For each possible XOR key guessed from the previous step, XOR (the entire file for right now) the original file, save it to a new file and scan it with YARA.  
    1. I chose to perform YARA scans here to help determine the likelihood that the key used was correct - you may choose to implement something else such as just a check for an embedded PE file etc.  If there are YARA hits then I stop attempting the other possible XOR keys (if any other were still to be processed) and assume the previous XOR key was the correct one.
* If you stick with the YARA scanning, it will continue to process all of the possible key(s) it outlined as the top, in terms of frequency, so your YARA rules should include something that might be present in the original XOR'ed file.  If not, you might already have the correct XOR key but aren't aware.  Embedded exe's are a good start to look for since they're common - but remember if we XOR the entire file at once instead of a specific section that you might find the embedded content but that doesn't mean the original file will be readable afterwards (i.e - won't be a Word document anymore since it was XOR'ed) 


Let's try out that process flow in a more automated way (on a new file):


As you can see, it worked like a charm :)

As always, I'm sure there's a better way to code some of the stuff I did but hey, it works for me at the moment.  There's a to-do list of things that I want to further implement into this tool, some of which is already included in other tools.  I've been asked before how this tools will work with smaller XOR keys and that's up to you to test and tell me - I created this in order to tackle the problem solely of the 256 byte key files I was observing so I'd recommend using one of the earlier mentioned tools for that situation, at least for the time being.

Example To-do's:
  • ROL(1-7)/ROT(1-25) - either brute forcing or via YARA scans
  • Add ability to skip \x00 & other chosen bytes (ref)
  • more is outlined within the file....

Download

NoMoreXOR can be found on my github

Wednesday, September 19, 2012

SWF-ing away

This has been ported over to my GitHub site and is not longer being maintained here. For any issues, comments or updates head here.


:: Disclaimer - the intent of this post is for educational and research purposes only.  Don't be lame and use it to steal copyrighted material ::

There's been quite a bit of chatter lately with the recent discovery of the latest IE 0-day.  While reading through one of the other researchers posts I decided to take a deeper look into some of the files being used in these reported attacks.  The issue that some might be experiencing while trying to analyze the related flash files, as did I, is that they're encrypted with doSWF and therefore take a little more effort in order to get down to what we care about.  A quick search about how to go about decrypting this particular type of encryption led me two good articles (one | two).  

With the addition of another user posting a decompiled version of the ActionScript within the file I was looking at I decided to give a quick look into this referenced script and modify its 'decrypt' function to correspond with the information provided.  I thought it would be an easy task but later turned out to result in errors - Java is something I don't care to debug... The thought of having a script to aid in the future automation of decrypting such files would be helpful and might be re-visited but there's also a learning aspect to doing it manually.  With that being said I opted to go about it in a different approach (attaching OllyDbg to IE and dumping the SWF from memory) which would be repeatable in future analysis efforts even if the type of encryption used was different - which makes it more reliable/flexible in my eyes.  There will be some overlap here to previously linked posts but instead of just saying what was done I feel it's useful for others to see how to do it.

So after wiping off the dust from OllyDbg I was able to get the end results I was seeking from my analysis by performing the steps outlined below.  Note that while some of the steps and content below are specific to the file I set out to analyze, other steps can be applied to help analyze other situations you might encounter.

  1. This initial step can be bypassed depending on what your analyzing and goals are but for this particular situation I started off by commenting out the part of the initial landing page which was responsible for initializing the variables and just left the part in which loaded the SWF file:

  2. Open up IE (this could be another browser, again depending on what your analyzing)
  3. Open OllyDbg and attach the IE process (File > Attach).  If you have more than one instance of IE open, make sure you are attaching to the right one!
  4. Open exploit.htm in IE which will load the SWF file
  5. Locate the SWF loaded within IE's memory.  Go back to OllyDbg:
    View > Memory Map > right click > Search (Ctrl + B)
    The search criteria will be dependent on what you're looking to locate, and be mindful of little endian.  In this case I decided to search for "doSWF" which would be displayed here as "64 6F 53 57 46" in HEX.

     
  6. There may be multiple hits of the text you are searching for - each of which will pop up in its own 'Dump' window.  Take a look at the context of where the found text is located and continue (CTRL + L) until you get one that looks like it's right.

  7. Once you come across what looks to be your decrypted SWF file, within its Dump window; right click > Copy > Select All .  Now you might disagree with this approach and say why copy everything out instead of just what you're after but I'd rather copy it all out and worry about carving the exact SWF file later rather than manually calculating the correct SWF length and carving it out from OllyDbg (more on that latter).

    Once it's all highlighted; right click > Binary > Binary Copy


    Open a hex editor (HxD in this example) and paste the copied information we took from OllyDbg into a new HEX file; Edit > Paste write (CTRL + B )

    You can scroll down to see the 'goodies' which also help in showing it's not encrypted anymore since these are strings we were unable to previously see:
    • File header (46 57 53 | FWS)
    • iFrame reference and 'call' statement to blob

    (image was widened to show it all, not all hex columns are shown as a result)
  8. Since I copied everything over you can obviously see there's other junk there which we don't care about and will prohibit us from solely having the sought after SWF file.  As mentioned earlier, this can be solved by either manually carving it out based on Adobe's SWF specifications [PDF]:

    •   first 3 bytes = header
    •   next 1 byte = version(8 bit #, not ascii representation)
    •   next 4 bytes = total length including header (varies if compressed)

    • 46 57 53 = FWS
    • 09  = version 9
    • BD 18 00 00 = 18BD (little endianed HEX or 6333 (decimal)
  9.  (you can see these corresponding values to the carved SWF in an image below)

    ...or have a tool help you out.  While it's useful to know the specifications of what you're analyzing, having a tool to help you limit the chance of you creating an error is always nice so I opted use Alexs' kick-ass tool xxxswf to do the thinking and lifting for me.
       


    No matter how you go about it, once you've successfully carved it out you should now have just the unencrypted SWF you can continue your analysis:


  10. Open up the unencrypted SWF with your tool of choice for analysis.  If you're using Adobe's SWF Investigator then copy out the disassembled text by; SWF Dissasembler tab > click 'open with text editor'.

    Once you have the dissasembled code, scroll down until you see the blob being passed into the ByteArray and copy out what's inbetween the quotes:



  11. Take that copied out data and paste into a hex editor.  Since we see some 'eval' occurring and this blob being a variable in the ByteArray I'm going to say this looks be the shellcode so let's go ahead and pass this extracted code within a shellcode analyzer for ease (scDbg/libemu in this example).  When initially analyzed it detects that the data is XOR'ed with the key '0xE2' :

    In the output of the first run through libemu we noticed there was an error so by adding the '-d' switch and running through it we get the following:
  12. Well that's helpful - now we don't have to question if it is and/or search for the key since it's already provided to us.  This allows us to move on to reversing the XOR with a capable program (xor.exe in this example):


  13. A quick test of determining what the actual file is (shown in the second command above) after being reversed shows it's just 'data'.  Hm, it doesn't appear to be what I'd expect as a result of the shellcode or did something fail when performing the XOR?  If we view the strings of this data we see it displays a link to a file which could indicate a 2nd stage download:



You can continue your analysis as needed from here but for the purpose of this post hopefully this showed you a thing or two that you either previously weren't aware of or was causing a road block in your analysis efforts.  As always, there are many ways to accomplish the task at hand and if you have a more efficient way to accomplish what I've touched by all means pass it along.