Author: David Zimmer
Date: 08.25.19 - 8:37am
Next, let's look at a pcode instruction that accepts a variable number of arguments. Let's start with my favorite pickup line.
Print #f, "hi", "you"
401881 FF0E 08001000 PrintFile

Print #f, "hi", "you", "smell"
401898 FF0E 09001400 PrintFile

Print #f, "hi", "you", "smell", "good"
4018B3 FF0E 0A001800 PrintFile

Print #f, "hi", 1, myCurrency, myDouble
401868 FF0E 07002000 PrintFile

With these variations we can see the arg bytes changing. If we read them as two little-endian words, the second word grows by 4 bytes every time we add a string to the call (0x10, 0x14, 0x18). So that is our stack cleanup or stack check value, as seen in the previous instruction. The last call, with a Currency and a Double, gives further confirmation, since each of those is 8 bytes.
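The two-word reading can be checked with a small Python sketch. It only splits the four operand bytes of each PrintFile instruction into two little-endian 16-bit words. The labels "pool index" and "stack bytes" come from the reasoning in this post, and the code does not verify them. The helper name split_words is hypothetical.

```python
import struct

# Operand bytes of each PrintFile (FF0E) instruction, copied from the listing above.
SAMPLES = {
    'Print #f, "hi", "you"': "08001000",
    'Print #f, "hi", "you", "smell"': "09001400",
    'Print #f, "hi", "you", "smell", "good"': "0A001800",
    'Print #f, "hi", 1, myCurrency, myDouble': "07002000",
}


def split_words(hex_args):
    """Read 4 operand bytes as two little-endian unsigned 16-bit words."""
    first, second = struct.unpack("<HH", bytes.fromhex(hex_args))
    return first, second


if __name__ == "__main__":
    for source, hex_args in SAMPLES.items():
        first, second = split_words(hex_args)
        # Interpretation from the text, not checked here:
        # word1 = const pool index, word2 = stack byte count.
        print(f"{source:42} word1=0x{first:02X} word2=0x{second:02X} ({second} bytes)")
```

The second word comes out as 16, 20 and 24 for two, three and four strings, and 32 for the mixed call. Compared with the two-string call, the Currency and Double account for the extra 16 bytes, assuming the integer 1 occupies the same 4-byte slot the string "you" did.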
The first word increments by one each time we add an arg, but it's not the arg count. Every string we add to a function gets an entry in the const pool. You can leave the print call the same, add a string constant above it, and see if that first word changes. If it does, then it's a pool index. That does happen in this case, so let's take a look at the pool entries using the const pool viewer form.
Each pool entry is an address that points to the following embedded data (in the same order as above):
02 00 08 88 00
03 00 08 08 88 00
04 00 08 08 08 88 00

See the pattern? 08 is VT_BSTR. Let me break the last Print call down for you. Note that 04 00 is the little-endian 2-byte word 0004, and the runtime actually does an AND 0x3f on every value. (I haven't bothered to look at what the remaining high bits of the final byte represent yet.)
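The descriptor rule can be written as a short Python sketch: a little-endian count word, then one type byte per argument, each ANDed with 0x3f. The type names come from the standard COM VARENUM values. The function name decode_descriptor is hypothetical. The trailing 00 is left undecoded because its role isn't established here. The leftover high bits are returned raw rather than interpreted, since their meaning is still unknown.

```python
import struct

# A few VARTYPE codes from the standard COM/OLE VARENUM enumeration.
VT_NAMES = {2: "VT_I2", 3: "VT_I4", 4: "VT_R4", 5: "VT_R8",
            6: "VT_CY", 7: "VT_DATE", 8: "VT_BSTR"}

TYPE_MASK = 0x3F  # the runtime ANDs each type byte with 0x3f


def decode_descriptor(hex_bytes):
    """Decode a PrintFile const pool entry: LE count word, then one type byte per arg.

    Returns (count, vartype names, leftover high bits per byte). High bits are
    reported raw because their meaning has not been worked out.
    """
    data = bytes.fromhex(hex_bytes)
    (count,) = struct.unpack_from("<H", data, 0)
    raw = data[2:2 + count]
    if len(raw) != count:
        raise ValueError("descriptor is shorter than its count word says")
    names = [VT_NAMES.get(b & TYPE_MASK, f"VT_0x{b & TYPE_MASK:02X}") for b in raw]
    high_bits = [b & ~TYPE_MASK & 0xFF for b in raw]  # bits the mask throws away
    return count, names, high_bits


if __name__ == "__main__":
    for entry in ("02 00 08 88 00", "03 00 08 08 88 00", "04 00 08 08 08 88 00",
                  "04 00 08 02 06 85 00"):
        print(decode_descriptor(entry))
```

In all four samples the only set high bit is 0x80 on the final type byte. That is an observation from these entries, not a confirmed rule.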
Print #f, "hi", 1, myCurrency, myDouble
04 00 08 02 06 85 00 - 4 entries: VT_BSTR, VT_I2, VT_CY, VT_R8 (0x85 AND 0x3f = 0x05)

There are other opcodes which take a variable number of bytes. FFreeAd, for example, can free multiple strings all in one opcode. That one uses the following format:
464576 29 [12 bytes] FFreeAd var_98 var_150 var_154
46457F 36 [24 bytes] FFreeVar var_AC var_CC var_EC var_10C var_12C var_14C
46458E 00 03 ErrNext loc_464591

464576 29 06 00 68 FF B0 FE AC FE 36
46457F 36 0C 00 54 FF 34 FF 14 FF F4 FE D4 FE B4 FE 00 03

This time the variable byte lengths are embedded right in the opcode byte stream itself, not in a referenced const pool entry. You can see the opcode byte, then a 2-byte little-endian length, then that many bytes holding the var_XX entries as 2-byte words (all negative ebp offsets, which is why their high bytes are FF or FE). At the end I also included the first opcode of the next instruction, which you can also see in the disassembly dump.
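Here is a Python sketch of the inline-length format, walking the two dumped instructions. The layout is assumed from the dump: opcode byte, 2-byte little-endian byte count, then signed 16-bit ebp offsets. The function name parse_free_list is hypothetical. The var_XX naming just mimics the disassembler output.

```python
import struct


def parse_free_list(code, offset=0):
    """Decode one FFreeAd/FFreeVar-style instruction starting at `offset`.

    Assumed layout (read off the dump): 1 opcode byte, a 2-byte little-endian
    byte count, then that many bytes of signed 16-bit ebp-relative offsets.
    Returns (opcode, var names, offset of the next instruction).
    """
    opcode = code[offset]
    (length,) = struct.unpack_from("<H", code, offset + 1)
    body = code[offset + 3:offset + 3 + length]
    if len(body) != length or length % 2:
        raise ValueError("truncated or odd-length operand list")
    # Each entry is a negative ebp offset, e.g. 68 FF -> 0xFF68 -> -0x98 -> var_98.
    names = [f"var_{-value:X}" for (value,) in struct.iter_unpack("<h", body)]
    return opcode, names, offset + 3 + length


if __name__ == "__main__":
    base = 0x464576
    # FFreeAd, FFreeVar, then the first bytes of the following ErrNext, from the dump.
    stream = bytes.fromhex(
        "29 06 00 68 FF B0 FE AC FE "
        "36 0C 00 54 FF 34 FF 14 FF F4 FE D4 FE B4 FE 00 03"
    )
    pos = 0
    for _ in range(2):
        opcode, names, next_pos = parse_free_list(stream, pos)
        print(f"{base + pos:X} {opcode:02X} {' '.join(names)}")
        pos = next_pos
    print(f"next instruction at {base + pos:X}")  # 46458E, matching ErrNext in the listing
```

The decoded names and the next-instruction addresses (46457F, then 46458E) match the disassembly. This supports reading the length as a byte count rather than an entry count.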
The reason I point out the different methods they used is so you can get insight into their heads and see the various techniques they were using when they designed their pcode implementation. The more we see, the faster the reversing gets.
These two different opcode implementations were probably coded by different people or at different times. I would assume MS developers had their own pcode disassembler and probably even a pcode debugger too.
Using the const pool for variable bytes is a bit more complex, but also cleaner. Also, in this case, since they are doing AND operations on the embedded const pool data, the data itself is more complex. Both hint at it being a later language feature. Freeing strings and variants would have been one of the first requirements; updating PrintFile to accept more complex datatypes would have come later as an extension.
Sometimes I wonder if anyone understands wtf I am talking about in these posts lol. I know it's a very, very small crowd who would understand, and an even smaller subset of those who would care, but it tickles my fancy so heeerree weeee arrreeeee.
I guess this is my CompSci 601 class in Virtual Machine Design.