Disassembler be like…

Disassemblers like IDA Pro are a great tool for static analysis. They give an overview of a program's code and functionality without running it. But what if the disassembler fails to disassemble a particular piece of code, or shows you the wrong instructions? Well, in such cases anti-disassembly techniques may be in use. Malware authors plant specially crafted code in a program so that disassemblers fail to do their job properly, which ultimately delays the analysis of the malicious code. Still, the question remains: how do we know whether these techniques are in play or not? As we go through the topic, all the questions will be answered. But before going further, let's learn something about disassemblers.

A disassembler uses one of two algorithms: linear or flow-oriented. The linear disassembly algorithm disassembles one instruction at a time, sequentially, until the end of the buffer. It cannot decide which instruction to disassemble next, and it cannot tell code from data, because both are stored as the same raw bytes. It therefore blindly disassembles everything, whether or not there is a logical flow of code, and produces false results. This is a limitation of the algorithm and an advantage for malware authors. The flow-oriented algorithm, on the other hand, follows the control flow and makes decisions in situations like conditional branching, where there are two choices: disassemble the true branch or the false branch first. Most flow-oriented disassemblers process (disassemble) the false branch first (remember this line), and IDA Pro uses the flow-oriented disassembly algorithm.
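To make the difference concrete, here is a toy Python sketch of both algorithms. It assumes a hypothetical opcode table covering only a handful of x86 opcodes (a real decoder handles prefixes, ModRM bytes and much more), and it does not follow call targets. The sample bytes jump over a single junk E8 byte: linear sweep decodes that byte as the start of a 5-byte call and swallows the real instructions, while the flow-oriented walk follows the jump and recovers them.

```python
# A deliberately tiny, hypothetical opcode table (a handful of x86 opcodes only).
# opcode -> (mnemonic, instruction length in bytes)
OPCODES = {
    0x90: ("nop", 1),
    0x6A: ("push imm8", 2),
    0xC3: ("retn", 1),
    0xE8: ("call rel32", 5),
    0xEB: ("jmp short", 2),
    0x74: ("jz short", 2),
    0x75: ("jnz short", 2),
}
CONDITIONAL = {0x74, 0x75}


def rel8_target(code: bytes, off: int) -> int:
    """Target of a 2-byte short jump: next instruction + signed 8-bit displacement."""
    disp = code[off + 1]
    if disp >= 0x80:
        disp -= 0x100
    return off + 2 + disp


def linear_sweep(code: bytes) -> list:
    """Decode every instruction back to back, never looking at control flow."""
    out, off = [], 0
    while off < len(code):
        name, size = OPCODES.get(code[off], ("db", 1))
        out.append((off, name))
        off += size
    return out


def flow_oriented(code: bytes, entry: int = 0) -> list:
    """Follow control flow; at a conditional jump, keep decoding the false branch first.
    Calls are treated as plain fall-through (their targets are not followed here)."""
    done = {}
    pending = [entry]
    while pending:
        off = pending.pop()
        while 0 <= off < len(code) and off not in done:
            name, size = OPCODES.get(code[off], ("db", 1))
            done[off] = name
            op = code[off]
            if op == 0xC3:          # retn: nothing falls through
                break
            if op == 0xEB:          # unconditional jump: only the target is code
                off = rel8_target(code, off)
                continue
            if op in CONDITIONAL:   # queue the true branch for later
                pending.append(rel8_target(code, off))
            off += size             # false branch (fall-through) is decoded now
    return sorted(done.items())


# jmp over one junk byte (E8), then push 0 / nop / nop / retn
code = bytes([0xEB, 0x01, 0xE8, 0x6A, 0x00, 0x90, 0x90, 0xC3])
print(linear_sweep(code))   # the junk E8 swallows push, nop, nop as a bogus call
print(flow_oriented(code))  # jumps over E8 and finds push, nop, nop, retn
```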



Focus on 00401515 and 00401517, which hold the instructions jz and jnz respectively. If you look closely, both instructions have the same target address (loc_401519+1), which means that whether the condition is satisfied or not, execution goes to that address. So what's the problem here? Together, jz and jnz form an unconditional jump to 0x00401519+1 = 0x0040151A, one byte past 0x00401519. The real next instruction therefore starts at 0x0040151A, with the byte 6A. But because the disassembler follows the flow-oriented algorithm, it disassembles the false branch of jnz first, i.e. the bytes immediately after it, starting with E8 at 00401519, which is the opcode of the call instruction.

In reality, the call instruction (opcode E8) never gets executed, because the jump target is 0040151A (loc_401519+1), which lies inside the call instruction IDA decoded. That is why the cross-reference appears in red.
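The arithmetic behind the jz/jnz trick can be checked with a short Python sketch. The byte values are reconstructed from the listing addresses (2-byte short jumps at 00401515 and 00401517, the rogue E8 at 00401519, 6A at 0040151A), so the displacements 03 and 01 are derived rather than copied from the sample. find_jz_jnz_same_target is a hypothetical helper that only looks for the jz/jnz pair, not every complementary conditional-jump pair.

```python
def short_jump_target(addr: int, rel8: int) -> int:
    """Target of a 2-byte short jump at `addr` with displacement byte `rel8`."""
    disp = rel8 - 0x100 if rel8 >= 0x80 else rel8
    return addr + 2 + disp


def find_jz_jnz_same_target(code: bytes, base: int) -> list:
    """Report back-to-back short jz/jnz (either order) that jump to the same place."""
    hits = []
    for i in range(len(code) - 3):
        if {code[i], code[i + 2]} == {0x74, 0x75}:
            first = short_jump_target(base + i, code[i + 1])
            second = short_jump_target(base + i + 2, code[i + 3])
            if first == second:
                hits.append((base + i, first))
    return hits


# Bytes derived from the listing: jz at 0x00401515, jnz at 0x00401517,
# rogue E8 at 0x00401519, real code (starting with 6A) at 0x0040151A.
BASE = 0x00401515
code = bytes([0x74, 0x03, 0x75, 0x01, 0xE8, 0x6A])

for addr, target in find_jz_jnz_same_target(code, BASE):
    print(f"{addr:#010x}: jz/jnz pair always lands on {target:#010x}")
    print(f"flow-oriented disassembly resumes at {addr + 4:#010x} (the rogue E8)")
```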


Press "D" on the byte E8 to convert it to data, then press "C" at 0040151A to convert the following bytes into instructions.

Here we see a conditional jz at 00401021. We already know which branch is going to be disassembled first, but look at the previous instruction at 0040101F: xor eax, eax (used to clear a register) is unusual right before a jz instruction. Why is it unusual? jz jumps to the specified location if the zero flag is set (ZF = 1). xor eax, eax clears eax and therefore always sets ZF = 1, so whatever value eax held before, ZF will be set and the jz is effectively unconditional. Again the disassembler processes the false branch first and produces a wrong instruction at 0x00401023, while the real target is 0x00401024 (loc_401023+1), one byte further, where the correct instruction begins with the byte 8B.
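A minimal Python sketch of why this jz is never really conditional: it models xor eax, eax and scans raw bytes for that instruction directly followed by a short jz. The encoding 33 C0 (one of the two encodings of xor eax, eax, the other being 31 C0) and the displacement 01 are assumptions derived from the listing addresses, and find_xor_jz is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF


def xor_same_reg(value: int):
    """Model `xor eax, eax`: returns (new eax, ZF). The result is always 0."""
    result = (value ^ value) & MASK32
    return result, result == 0


def find_xor_jz(code: bytes, base: int) -> list:
    """Find `xor eax, eax` (31 C0 or 33 C0) directly followed by a short jz (74 rel8).
    Returns (jz address, jz target) pairs."""
    hits = []
    for i in range(len(code) - 3):
        if code[i] in (0x31, 0x33) and code[i + 1] == 0xC0 and code[i + 2] == 0x74:
            rel8 = code[i + 3]
            disp = rel8 - 0x100 if rel8 >= 0x80 else rel8
            jz_addr = base + i + 2
            hits.append((jz_addr, jz_addr + 2 + disp))
    return hits


# Whatever eax held, ZF ends up set, so the jz is always taken.
for eax in (0, 1, 0x1234, 0xFFFFFFFF):
    assert xor_same_reg(eax) == (0, True)

# Bytes assumed from the listing: xor eax, eax at 0x0040101F, jz at 0x00401021.
code = bytes([0x33, 0xC0, 0x74, 0x01])
for jz_addr, target in find_xor_jz(code, 0x0040101F):
    print(f"jz at {jz_addr:#010x} is always taken -> {target:#010x}")
```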

The solution is the same as for the previous one: use "D" to convert the byte to data and "C" to convert the bytes into instructions.

Starting from 0x00401215, we have a four-byte sequence: EB FF C0 48. The first two bytes, EB and FF, form a two-byte jump instruction whose target (0x00401216) is the second byte of the jump instruction itself, FF. Bytes FF and C0 then combine to form inc eax, and the instruction after that (48) is dec eax. The inc/dec pair leaves eax unchanged, so in effect it behaves like a NOP (apart from flag updates). Because the byte FF belongs to two instructions at once, no single linear listing can show both correctly.
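The overlap can be traced with a small Python sketch that decodes only the three instruction forms involved (jmp short, inc eax, dec eax) and follows the bytes the way the CPU would. decode_at and execution_trace are hypothetical helpers, and the starting value of eax is arbitrary; the point is that the byte at 0x00401216 is consumed twice and eax ends where it started.

```python
BASE = 0x00401215
code = bytes([0xEB, 0xFF, 0xC0, 0x48])   # the four bytes from the listing


def decode_at(code: bytes, off: int):
    """Decode only the three instruction forms used by this trick.
    Returns (mnemonic, length, jump target offset or None)."""
    if code[off] == 0xEB:                       # jmp short rel8
        rel8 = code[off + 1]
        disp = rel8 - 0x100 if rel8 >= 0x80 else rel8
        return "jmp short", 2, off + 2 + disp
    if code[off] == 0xFF and code[off + 1] == 0xC0:
        return "inc eax", 2, None
    if code[off] == 0x48:
        return "dec eax", 1, None
    raise ValueError(f"unsupported byte {code[off]:#04x} at offset {off}")


def execution_trace(code: bytes, eax: int = 0):
    """Follow the bytes as the CPU would, tracking eax as a 32-bit value."""
    off, trace = 0, []
    while off < len(code):
        name, size, target = decode_at(code, off)
        trace.append((BASE + off, name))
        if name == "inc eax":
            eax = (eax + 1) & 0xFFFFFFFF
        elif name == "dec eax":
            eax = (eax - 1) & 0xFFFFFFFF
        off = target if target is not None else off + size
    return trace, eax


trace, eax = execution_trace(code, eax=0x1234)
for addr, name in trace:
    print(f"{addr:#010x}  {name}")
```

The jmp at 0x00401215 and the inc at 0x00401216 share the FF byte, which is exactly what a disassembler cannot display.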

It can be fixed in the same way as the previous examples, with the "C" and "D" keys.

The listing above shows the correct instructions. Another option is to NOP out these four bytes so that the listing starts from 0x00401219.
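If you would rather patch the bytes in a dumped image than in IDA, a Python sketch might look like this. nop_out and patch_jump_into_self are hypothetical helpers, and the image layout (filler CC bytes starting at 0x00401210) is made up; only the EB FF C0 48 sequence at 0x00401215 comes from the listing.

```python
NOP = 0x90


def nop_out(image: bytearray, base: int, addr: int, length: int) -> None:
    """Overwrite `length` bytes starting at virtual address `addr` with NOPs."""
    start = addr - base
    if start < 0 or start + length > len(image):
        raise ValueError("patch range is outside the image")
    image[start:start + length] = bytes([NOP]) * length


def patch_jump_into_self(image: bytearray, base: int) -> list:
    """Find every EB FF C0 48 sequence and NOP it out; return the patched addresses."""
    pattern = bytes.fromhex("EBFFC048")
    patched = []
    start = image.find(pattern)
    while start != -1:
        nop_out(image, base, base + start, len(pattern))
        patched.append(base + start)
        start = image.find(pattern, start + len(pattern))
    return patched


# Hypothetical image: five filler bytes (CC) at 0x00401210, the trick at 0x00401215, then retn.
BASE = 0x00401210
image = bytearray(b"\xCC" * 5 + bytes.fromhex("EBFFC048") + b"\xC3")
patched = patch_jump_into_self(image, BASE)
print([hex(a) for a in patched])
print(image.hex(" "))
```

Blind pattern patching can also hit legitimate bytes that happen to match, so each hit is worth checking in the disassembly before trusting the patched file.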

This technique focuses on the retn instruction, which can be misused to change the control flow of a program or, as in this case, as an anti-disassembly method. The call instruction transfers control within a program: when it executes, the address of the instruction immediately after it is pushed onto the stack, and the retn instruction later pops that value from the top of the stack into eip so that the program can continue from where it left off.
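The call/retn contract can be modelled in a few lines of Python. Stack32, call and retn are hypothetical helpers, and the addresses and initial esp are made up for illustration.

```python
class Stack32:
    """Minimal model of a 32-bit x86 stack: push moves esp down by 4."""

    def __init__(self, esp: int):
        self.esp = esp
        self.mem = {}

    def push(self, value: int) -> None:
        self.esp -= 4
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.mem.pop(self.esp)
        self.esp += 4
        return value


def call(stack: Stack32, call_addr: int, call_len: int, target: int) -> int:
    """call: push the address of the next instruction, return the new eip."""
    stack.push(call_addr + call_len)
    return target


def retn(stack: Stack32) -> int:
    """retn: pop whatever is on top of the stack into eip."""
    return stack.pop()


# Hypothetical addresses: a 5-byte call at 0x00401000 to a function at 0x00402000.
stack = Stack32(esp=0x0012FF80)
eip = call(stack, 0x00401000, 5, 0x00402000)
print(hex(eip), hex(stack.esp))   # inside the callee, return address on the stack
eip = retn(stack)
print(hex(eip), hex(stack.esp))   # back at the instruction after the call
```

Note that retn never checks where the value on top of the stack came from; it simply trusts it.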

In this scenario, the instruction at 0x004011C0, call $+5, simply calls the instruction immediately after it, but as a side effect the address 0x004011C5 is pushed onto the stack. When add [esp+4+var_4], 5 is executed, the value at the top of the stack becomes 0x004011CA. But how?

Answer: the culprit here is IDA Pro's notation. At 0x004011C0, var_4 is defined as the constant -4, so add [esp+4+(-4)], 5 is simply add [esp], 5. As we all know, esp points to the top of the stack, where 0x004011C5 is stored, so that value becomes 0x004011C5 + 5 = 0x004011CA. Now retn executes and pops the value from the top of the stack into eip, so the next instruction executed is the one at 0x004011CA. That code looks like a legitimate function, but because of the rogue retn, which IDA treats as the end of the function, it appears not to be part of any function.
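Replaying the three instructions in Python confirms the arithmetic. A list stands in for the stack; the only values taken from the listing are 0x004011C0, the 5-byte length of call $+5 and IDA's var_4 = -4. If add [esp], 5 is encoded as 83 04 24 05 (an assumption), it occupies 0x004011C5 to 0x004011C8, retn sits at 0x004011C9, and 0x004011CA is the byte right after the retn.

```python
MASK32 = 0xFFFFFFFF
stack = []                 # Python list used as the stack; stack[-1] is [esp]

CALL_ADDR = 0x004011C0
CALL_LEN = 5               # call $+5 is E8 00 00 00 00 (5 bytes)

# call $+5: push the return address, then "jump" to $+5, which is that same address
return_address = CALL_ADDR + CALL_LEN
stack.append(return_address)
eip = CALL_ADDR + CALL_LEN

# add [esp+4+var_4], 5 with IDA's var_4 = -4  ->  add [esp], 5
var_4 = -4
offset = 4 + var_4         # byte offset from esp; 0 means the top of the stack
assert offset == 0
stack[-1] = (stack[-1] + 5) & MASK32

# retn: pop the (modified) value into eip
# (assuming the encodings above, this is the byte right after retn)
eip = stack.pop()
print(f"retn transfers control to {eip:#010x}")
```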

Structured Exception Handling (SEH) relies on a chain of handlers: a linked list of functions that handle exceptions in a program. This chain can be misused to fool disassemblers. On 32-bit Windows, the FS segment register gives access to the Thread Environment Block (TEB). The first structure in the TEB is the Thread Information Block (TIB), and the first element of the TIB is a pointer to the SEH chain (fs:[0]). The chain works like a stack: the most recently installed handler sits at the head and is tried first.
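A rough Python model of the chain: each record mirrors EXCEPTION_REGISTRATION_RECORD (a Next pointer and a Handler address), and fs0 plays the role of fs:[0]. ExceptionRegistration, ThreadSEH and the older handler at 0x00401000 are hypothetical; a real chain ends with 0xFFFFFFFF rather than None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExceptionRegistration:
    """Python stand-in for EXCEPTION_REGISTRATION_RECORD { Next, Handler }."""
    next_record: Optional["ExceptionRegistration"]
    handler: int   # address of the handler routine


class ThreadSEH:
    """Models fs:[0], the head of the SEH chain stored in the TIB."""

    def __init__(self) -> None:
        self.fs0: Optional[ExceptionRegistration] = None

    def install(self, handler: int) -> None:
        # push handler / push dword ptr fs:[0] / mov fs:[0], esp
        # -> the new record points at the old head and becomes the new head
        self.fs0 = ExceptionRegistration(self.fs0, handler)

    def remove(self) -> None:
        # fs:[0] = fs:[0]->Next: unlink the head record
        if self.fs0 is not None:
            self.fs0 = self.fs0.next_record

    def handlers(self) -> list:
        """Handlers in the order they would be tried (most recent first)."""
        out, rec = [], self.fs0
        while rec is not None:
            out.append(rec.handler)
            rec = rec.next_record
        return out


seh = ThreadSEH()
seh.install(0x00401000)   # hypothetical handler installed earlier
seh.install(0x004014C0)   # the handler pushed at 0x00401497
print([hex(h) for h in seh.handlers()])   # newest first, like a stack
seh.remove()
print([hex(h) for h in seh.handlers()])
```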

In this case, the instructions from 00401497 to 004014A3 install an exception handler. The instruction at 00401497 pushes the address of the handler, 004014C0, onto the stack. This handler is triggered when an exception occurs, which happens when div ecx at 004014AC executes: ecx has just been zeroed by the previous instruction (xor ecx, ecx), so a divide-by-zero exception is raised and execution transfers to the handler at 004014C0. The disassembler fails to recognize this, so pressing "C" at 004014C0 to turn the bytes into code resolves the issue.

After correcting the representation, we can see that the handler code removes the exception handler from the SEH chain. The conclusion is that the instructions from 004014AE to 004014BF never get executed, and that the exception handler could be malicious code hidden by the disassembler's misrepresentation.
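The control flow around div ecx can be sketched in Python, with a ZeroDivisionError standing in for the CPU's divide error. This is a simplification: the real Windows dispatcher calls handlers starting from the head of the chain until one handles the exception, whereas this hypothetical run function jumps straight to the head. The 2-byte encoding F7 F1 for div ecx is assumed in order to place the next instruction at 0x004014AE, and 0x00401000 is a made-up older handler.

```python
DIV_ADDR = 0x004014AC      # div ecx
AFTER_DIV = DIV_ADDR + 2   # 0x004014AE, assuming the 2-byte encoding F7 F1
HANDLER = 0x004014C0       # pushed at 0x00401497


def run(seh_chain: list, ecx: int, edx_eax: int = 0x12345678) -> list:
    """Return the addresses control reaches. seh_chain[0] is the head (fs:[0]).
    Quotient overflow, which also faults on a real CPU, is ignored here."""
    reached = [DIV_ADDR]
    try:
        _quotient = edx_eax // ecx          # stands in for `div ecx`
        reached.append(AFTER_DIV)           # only reached if no exception occurs
    except ZeroDivisionError:
        # Simplified dispatch: go straight to the most recently installed handler.
        reached.append(seh_chain[0])
    return reached


seh_chain = [HANDLER, 0x00401000]   # head first; 0x00401000 is hypothetical
ecx = 0                             # xor ecx, ecx
print([hex(a) for a in run(seh_chain, ecx)])
```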

Stack frame: the collection of a function's local variables, the arguments passed to it, its return address and the previous function's base pointer. A function can easily be analyzed by extracting information from its stack frame, but a disassembler's stack-frame analysis can be defeated with nefarious tricks.
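For reference, a small Python sketch of where each part of a classic 32-bit frame ends up relative to ebp, assuming the common push ebp / mov ebp, esp / sub esp, N prologue with no saved registers in between. frame_layout is hypothetical; the arg_/var_ names imitate IDA's naming.

```python
def frame_layout(num_args: int, local_sizes: list) -> dict:
    """ebp-relative offsets in a classic 32-bit frame built by
    push ebp / mov ebp, esp / sub esp, N (IDA-style names)."""
    layout = {"saved ebp": 0, "return address": 4}
    for i in range(num_args):
        layout[f"arg_{4 * i:X}"] = 8 + 4 * i    # arguments sit above the return address
    off = 0
    for size in local_sizes:
        off -= size
        layout[f"var_{-off:X}"] = off           # locals sit below the saved ebp
    return layout


for name, off in sorted(frame_layout(2, [4, 4]).items(), key=lambda kv: -kv[1]):
    print(f"[ebp{off:+#x}]  {name}")
```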

The second column from the left shows the stack pointer value relative to the beginning of the function. At 00401549 there is a comparison: if esp is less than 1000h, execution jumps to 00401556, where 104h is added to esp; otherwise it continues at 00401551, where 4 is added to esp. IDA followed the less-than branch, so the stack pointer value becomes -F8 (0Ch - 104h = -F8h). That doesn't seem right: a negative value would mean the calling function's stack frame could be damaged, and IDA concludes that there are 62 arguments (F8h = 248 bytes, 248 / 4 = 62) pushed on the stack. That is not true, because in reality there are only 2 local variables and no arguments. The problem starts when esp is compared with 1000h at 00401549: this forces the disassembler to choose a branch based on the value of esp, and since add esp, 104h looks like a legitimate instruction, it makes the wrong choice and causes the misrepresentation. At run time a real stack pointer is never that low, so only the add esp, 4 path is ever taken.
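A quick Python sketch of the two stack-pointer results. The starting SP-column value of 0Ch is inferred from the -F8 figure above, and the run-time esp is a hypothetical, typical user-mode stack address. The model uses a plain comparison where jl is actually a signed compare, which gives the same answer for ordinary positive stack addresses.

```python
START_DEPTH = 0x0C       # SP-column value at the cmp, inferred from the listing
FAKE_ADD = 0x104         # add esp, 104h on the jl branch
REAL_ADD = 0x4           # add esp, 4 on the fall-through branch


def sp_column_after_add(depth: int, imm: int) -> int:
    """IDA-style SP column counts bytes pushed, so `add esp, imm` lowers it by imm."""
    return depth - imm


def branch_taken(esp: int) -> str:
    """cmp esp, 1000h / jl: which path the CPU really takes for a given esp."""
    return "jl branch" if esp < 0x1000 else "fall-through"


fake = sp_column_after_add(START_DEPTH, FAKE_ADD)
real = sp_column_after_add(START_DEPTH, REAL_ADD)
print(hex(fake), f"-> {-fake} bytes, read as {-fake // 4} dword arguments")
print(hex(real))
print(branch_taken(0x0012FF70))   # hypothetical, typical user-mode stack address
```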