Malcode Call API by Hash

Author: David Zimmer
Date: 05.12.10 - 6:18am

It's standard MO for shellcode and packers to look up all of the APIs they need by hash value, store the pointers in a table, and then reference this table for all their API needs.

Malicious code authors have taken a liking to this and have started to use this technique as their main way to call every single external API they access.

As an added headache for RE guys, they also don't bother caching any of the results in a pointer table. They prefer to just call the hashing routine fresh each time.

An example might look like this:

seg000:814EB7DC                 push    89379244h
seg000:814EB7E1                 call    Get_NT_DLL_Base
seg000:814EB7E6                 push    eax
seg000:814EB7E7                 call    GetApi_By_Hash
seg000:814EB7EC                 push    esi
seg000:814EB7ED                 call    eax

This is a pretty effective technique because there is no single pointer entry that can be renamed in IDA to make the code readable. Instead you have to search for every push [hash val] and then add a comment at each one noting which API it represents.

In a pointer table scenario, you wait for the table to get filled out, then you can just dump it in Olly and it will tell you which address aligns to which API all at once.

When the addresses aren't stored anywhere, you have to generate your own listing of all of the possible hash values for a given DLL. This takes much more work, and you have to do some custom analysis and programming. For user land stuff, I use my iDBG library to extract data at breakpoints. For kernel land, you have to use WinDbg.
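To make "a listing of all of the possible hash values" concrete, here is a minimal Python sketch of the offline version of that idea. It assumes the hash algorithm has already been recovered from the sample. The ROR-13 additive hash below is only a stand-in (it is common in public shellcode), not the algorithm from this rootkit, and the function names are hypothetical.

```python
def ror32(value, bits):
    """Rotate a 32-bit value right by `bits`."""
    value &= 0xFFFFFFFF
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def ror13_hash(name):
    """Stand-in hash (ROR-13 additive). NOT the algorithm from this sample."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h


def build_hash_table(export_names, hash_func=ror13_hash):
    """Map hash value -> export name, refusing to hide collisions."""
    table = {}
    for name in export_names:
        h = hash_func(name)
        if h in table and table[h] != name:
            raise ValueError("hash collision: %s / %s" % (table[h], name))
        table[h] = name
    return table


if __name__ == "__main__":
    # Illustrative subset; in practice the names come from the DLL's export table.
    names = ["CcCanIWrite", "CcCopyRead"]
    for h, name in sorted(build_hash_table(names).items()):
        print("%08X %s" % (h, name))
```

If the real algorithm is awkward to reimplement, the live extraction route described in this post is the fallback: let the malware's own routine hash every export and log the results.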

As an example here is a clip of a hash lookup function:

seg000:814EBC72 next_import:                             
seg000:814EBC72                 mov     eax, [ebp+arg_0]
seg000:814EBC75                 mov     eax, [edi+eax*4]
seg000:814EBC78                 add     eax, esi
seg000:814EBC7A                 push    eax             ; api export name
seg000:814EBC7B                 call    generate_hash
seg000:814EBC80                 cmp     eax, [ebp+arg4_hash_val] ;eax=its hash val
seg000:814EBC83                 jz      short match_found
seg000:814EBC85                 inc     [ebp+arg_0]
seg000:814EBC88                 mov     eax, [ebp+arg_0]
seg000:814EBC8B                 cmp     eax, [ebp+var_4]
seg000:814EBC8E                 jb      short next_import

And the WinDbg breakpoints to log the data I need:

bp 814ebc7a "da eax;g"
bp 814ebc80 "?eax;g"
bp 814ebc90 ".echo dumping complete"

This code is called in a loop that cycles through all of the exports of NTDLL. So to start logging you have to set a bp at the function start, the logging breakpoints mentioned above, and a breakpoint at each exit path of the function. You also have to edit arg4_hash_val so it never finds a match and the loop cycles through all of the exports. The bp at the function start is important because you want to make sure no other threads enter the mix and screw up your logging dump. To get a clean dump, I had to create it at a very early stage of system startup, when the rootkit was just initializing and before any user processes were running.

If that doesn't sound like a pain in the ass yet, I don't know what does!

But we're still not done. Now we have a long text file like:

805405c5  "CcCanIWrite"
Evaluate expression: 1600456038 = 5f650566
805405d1  "CcCopyRead"
Evaluate expression: 1163106347 = 4553982b

That has to be parsed into a clean format, and every name/hash pair has to be verified against the values actually observed in the code.
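A minimal Python sketch of that parsing step, written against the two-line log format shown above. The function name is hypothetical. It assumes each "da" name line is followed by its "?" line, joins consecutive name lines in case a long name wraps, and cross-checks the decimal and hex forms of every hash as a cheap sanity test.

```python
import re

NAME_RE = re.compile(r'^[0-9a-fA-F]{8}\s+"([^"]*)"')
EVAL_RE = re.compile(r'^Evaluate expression:\s*(-?\d+)\s*=\s*([0-9a-fA-F`]+)')


def parse_windbg_log(text):
    """Return a list of (hash_value, export_name) pairs from the logged output."""
    pairs = []
    pending = None  # name seen, hash line not yet seen
    for raw in text.splitlines():
        line = raw.strip()
        m = NAME_RE.match(line)
        if m:
            # Consecutive name lines are treated as one wrapped string.
            pending = (pending or "") + m.group(1)
            continue
        m = EVAL_RE.match(line)
        if m:
            if pending is None:
                raise ValueError("hash without a name: %r" % line)
            # Mask both forms to 32 bits so signed decimals and
            # backtick-separated hex still compare equal.
            dec = int(m.group(1)) & 0xFFFFFFFF
            hexval = int(m.group(2).replace("`", ""), 16) & 0xFFFFFFFF
            if dec != hexval:
                raise ValueError("decimal/hex mismatch: %r" % line)
            pairs.append((hexval, pending))
            pending = None
    if pending is not None:
        raise ValueError("name without a hash: %r" % pending)
    return pairs


if __name__ == "__main__":
    sample = (
        '805405c5  "CcCanIWrite"\n'
        'Evaluate expression: 1600456038 = 5f650566\n'
        '805405d1  "CcCopyRead"\n'
        'Evaluate expression: 1163106347 = 4553982b\n'
    )
    for h, name in parse_windbg_log(sample):
        print("%08X %s" % (h, name))
```

Raising on an unpaired line is deliberate: a dump interleaved with output from another thread should fail loudly rather than produce a quietly wrong table.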

Once that is done, we have to do a bulk search of the disassembly for each place one of these hash values is pushed onto the stack, then add a comment to each one with the human readable API name so the disasm is readable. (Again, there is no standard tool for this task. You have to write a custom script or plugin.)
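As a rough illustration of the bulk search and comment pass, here is a Python sketch that works on an exported text listing rather than inside IDA, so it makes no assumptions about IDA's scripting API. The function name is hypothetical, and it assumes hashes appear as immediate operands in IDA's usual "push 0XXXXXXXXh" form.

```python
import re

# Immediate push in IDA listing syntax, e.g. "push    5E35B3F4h"
PUSH_IMM_RE = re.compile(r'\bpush\s+([0-9A-Fa-f]+)h\b')


def annotate_listing(lines, hash_to_name):
    """Append '; <api name>' to every line that pushes a known hash value."""
    out = []
    for line in lines:
        code = line.split(";", 1)[0]  # ignore anything already in a comment
        m = PUSH_IMM_RE.search(code)
        if m:
            name = hash_to_name.get(int(m.group(1), 16))
            if name and name not in line:  # skip lines already annotated
                line = line.rstrip() + " ; " + name
        out.append(line)
    return out


if __name__ == "__main__":
    table = {0x5E35B3F4: "RtlInitUnicodeString"}
    listing = [
        "seg000:814EBF31                 push    5E35B3F4h",
        "seg000:814EB7EC                 push    esi",
    ]
    for line in annotate_listing(listing, table):
        print(line)
```

Inside IDA the same loop would set a comment at each matching address instead of rewriting text, but the matching logic is the same.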

Even after all of that, the disasm is still broken in regards to being readable for some things. One tool which can be very handy is IDA's "Chart References From" call browser. With this, you click on a function and you can see all the functions called within its call chain. When the malcode guys program in this way, all you see are the references to GenerateHash and not a chain of the actual APIs being called.

Some malcode takes it a step further: instead of a direct call eax in the main body of the function where it's used, the call is handled inside the hashing function. This makes even API logging and direct return address logging useless, because every single API accessed shows the exact same return address. (Unless you modify your API logging code to extract the next return address on the stack and log that instead of the current one. It can be done, but it's just another pita.)

Of course they know all of this, which is exactly why they do it. Cat and mouse.

So, if we have to deal with code like this, we have to develop tools to automate all of these steps, or at least make it more manageable to deal with in a semi-efficient manner.

So just to keep track of the score, calling all API by hash like this forces us to:

Analyze the hash lookup function
Extract live hash data
Write a custom tool to parse the data into a usable format
Write a custom tool to search the idb for push [hash] and add a comment with the API name
Write a custom tool to search call chains and somehow include these call eax API references in the graphing

Quick Update:

I ended up writing a quick plugin to handle this task. Generically it's a search form that lets you add bulk comments to all the search results in one click. For more specialized tasks and for automation, I made it scriptable and gave the scripts access to some powerful wrapped functions. So the script loads a text file of all the hash to name values, cycles through them one at a time, does a search, adds comments, then scans for the next call eax and adds an xref there so it's graphable. (The screenshot with the core logic from the vbs script is not reproduced here.) It worked out pretty spiffy, and all the specialized code is in the vbs script, which can easily be rewritten per task.
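The "find the hash, then scan for the next call eax" step can be sketched in Python against a text listing. This is not the vbs script from the plugin, only an illustration of the same idea. The function names, the 16-instruction scan window, and the rule of giving up at a ret are all assumptions.

```python
import re

LINE_RE = re.compile(r'^\s*\w+:([0-9A-Fa-f]{8})\s+(\S.*)$')
PUSH_IMM_RE = re.compile(r'^push\s+([0-9A-Fa-f]+)h\b')
CALL_EAX_RE = re.compile(r'^call\s+eax\b')


def parse_listing(lines):
    """Yield (address, instruction_text) for each listing line, comments stripped."""
    for line in lines:
        m = LINE_RE.match(line.split(";", 1)[0])
        if m:
            yield int(m.group(1), 16), m.group(2).strip()


def pair_hash_with_call(lines, hash_to_name, window=16):
    """For each known 'push <hash>', find the next 'call eax' within `window` lines.

    Returns (call_address, api_name) pairs: the places an xref would be added.
    """
    insns = list(parse_listing(lines))
    results = []
    for i, (_addr, text) in enumerate(insns):
        m = PUSH_IMM_RE.match(text)
        if not m:
            continue
        name = hash_to_name.get(int(m.group(1), 16))
        if name is None:
            continue
        for call_addr, nxt in insns[i + 1:i + 1 + window]:
            if CALL_EAX_RE.match(nxt):
                results.append((call_addr, name))
                break
            if nxt.startswith("ret"):
                break  # left the function without finding the call
    return results
```

Each returned pair is a candidate for an xref from the call eax to whatever placeholder name stands in for the API. The fixed window is a blunt heuristic, so pairs from code with unusual control flow deserve a manual look.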

Comments: (4)

On 05.12.10 - 7:37am Dave wrote:
Glad I asked Ilfak about the call eax type graphing.
No need to add comments, just add an xref and it will be taken into account when drawing the graph. Some pointers:

API: add_cref/add_dref
IDC: AddCodeXref/AddDataXref
Python: add_cref/add_dref
User interface: View, Subviews, Cross-references, Ins

Do not forget to use XREF_USER.

On 05.12.10 - 7:54am Dave wrote:
So in light of the conversation with Ilfak, and his tip on the call eax xref being graphable, he had another suggestion that would bring it together nicely. Instead of a bulk hash search/comment add (a generic tool), do a search for the hash value, then scan the disasm for the next call eax (or whatever technique they use), and then auto add an xref there to the proper API. More specialized code, but much more useful indeed. I have a generic bulk commenter; I think I will adapt a copy for the scan and xref and give it a shot.

On 05.12.10 - 8:34am Dave wrote:
One more random note. The AddCodeXref prototype is

void AddCodeXref(long From,long To,long flowtype);

Since the API doesn't have any address in the current database, you have to find a null pad, rename that offset to be the API name, then go back and do an AddCodeXref. You probably also want to add a comment with the name, because the xref doesn't show up in the disasm text, but it does show up when graphing the xrefs from that function, which is the primary thing. Example:

seg000:814F0B49 RtlInitUnicodeString db 0
seg000:814EBF31                 push    5E35B3F4h       ; nt!RtlInitUnicodeString
seg000:814EBF4A                 call    eax

AddCodeXref(0x814EBF4A, 0x814F0B49, XREF_USER | fl_CN);

On 01.10.11 - 6:10am Dave wrote:
Also check out the newer blog posts on this topic:

Api Hash Gen
Api Hash Gen Redux
