Understanding UDTs


Author: David Zimmer
Site: http://sandsprite.com

Environment: Visual Basic 6


A lot of us use UDTs in our programming, but how are they really laid out in memory?

As my programming progressed, I went through a definite phase where I used UDTs a lot. They were a nice, organized way to clump a bunch of settings and variables together into a discrete package.

While UDTs do have their drawbacks, there are still a handful of programming tasks they are perfectly suited for. This article takes you under the covers to show just what UDTs are, how they are laid out in memory, and some handy manipulations you can do to get the most out of them.

For our first example, let's look at a simple UDT defined with 4 numeric elements.

Type test
	a As Long
	b As Long
	c As Long
	d As Integer
End Type
In VB, the Long data type takes up 4 bytes of memory and the Integer 2. This puts our total UDT size at 14 bytes of memory (3 x 4 + 2). We can confirm this with the following code:

Dim i As Integer, l As Long, myUdt As test
MsgBox "Integers are " & Len(i) & " bytes long"
MsgBox "Longs are    " & Len(l) & " bytes long"
MsgBox "My UDT w/ 3 longs and an int is: " & Len(myUdt) & " bytes long"
To really dig behind the scenes, we are going to need a way to look at the actual UDT held in memory. Unfortunately VB does not provide a memory dump window for us, so we will have to enlist some outside help. There are many programs out there that will let you read a process's memory, everything from game cheats to system debuggers. What I will use for this demo is the debugger that comes with Visual C++ 6. (You can also use WinDbg, which is free and can be downloaded from the MS site.)

First we create a simple VB project which loads up our test UDT with some values. After the UDT is loaded, we will display some of its properties.

Dim myUdt As test
With myUdt
	...    'load udt with values
End With
MsgBox "Address: " & Hex(VarPtr(myUdt)) & " ByteSize:" & Len(myUdt)
When the MessageBox fires, it will tell us the memory address that the UDT resides at, as well as the size in bytes of the structure.

The MessageBox also performs a second function for us. Since MessageBoxes are modal, they pause the execution of the code and give us time to attach the debugger and check out the memory location of the UDT.

To attach the debugger, first compile the exe, then manually start it up. You will see the MessageBox with the info we need. Now start up your debugger and attach to the process. In Visual C++ 6 this is done through the Build -> Start Debug -> Attach to Process menu.

After our debugger is attached to the program, we want to probe the memory location of the UDT. Make sure the memory window is open (View -> Debug Windows -> Memory), then enter the hex value of the address of the UDT.

Here is a screen shot of the code that loaded the UDT and the actual UDT in memory. (Code is included in sample project download)



In the screen above, I have highlighted the 14 bytes that make up the UDT created by the code. I have also color coded each member of the UDT based on the length of its variable type. (Remember that Longs are 4 bytes long, Integers 2, etc.)

One thing that looks kind of strange is the value of member a. In the code it was loaded with the value 258, however in memory we see it stored as 02 01 00 00. Is this right?

When numbers are stored in memory, they are stored in little endian format. This is really just a fancy way of saying that the least significant byte comes first, so the bytes read from right to left. 02 01 00 00 in memory is actually the number 00 00 01 02, which in hex is &H102 (258 decimal).
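
If you want to see the byte order without attaching a debugger, here is a minimal sketch that copies a Long into a byte array and prints the bytes in memory order. The procedure name ShowByteOrder is just made up for illustration, and the CopyMemory declare is the usual alias for the kernel32 RtlMoveMemory export.

```vb
Option Explicit

' Usual VB6 declare for the kernel32 RtlMoveMemory export
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (Destination As Any, Source As Any, ByVal Length As Long)

' Hypothetical helper: prints the 4 bytes of a Long in memory order
Sub ShowByteOrder()
    Dim n As Long
    Dim b(0 To 3) As Byte
    Dim i As Long
    Dim s As String

    n = 258                     ' &H102
    CopyMemory b(0), n, 4       ' copy the Long's 4 bytes exactly as they sit in memory

    For i = 0 To 3
        ' pad each byte to 2 hex digits
        s = s & Right$("0" & Hex$(b(i)), 2) & " "
    Next

    Debug.Print s               ' prints: 02 01 00 00
End Sub
```

The low byte (02) lands at the lowest address, which is exactly what the memory window shows for member a.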

Ok, so a simple UDT is just a block of sequential memory, with its numbers stored in a funny format...how does that help me?

Well, I could have just made you the 10,000 jackpot winner on Jeopardy, but in more immediate terms, this background knowledge can help us understand UDTs some more and let us do some kind of unconventional things with them.

Let's say we wanted to make a copy of a UDT. With the CopyMemory API we could literally clone the structure into another one.


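' CopyMemory must be declared at module level:
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As Long)
Dim udt1 As test, udt2 As test
CopyMemory udt2, udt1, Len(udt1)
Of course this is not that helpful. After all, we could just assign udt2 = udt1. But this example proves our point: since we know how the UDT is laid out in memory, we can use that to our advantage and manipulate it from the raw memory layout.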

Let's say we had another UDT structure whose first 4 elements had the exact same layout. Since these are both just blocks of memory of known size, we could actually load up this second UDT's first 4 members from our other UDT type!

That could be handy, handy to know anyway (we will get into just where later on :)
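
As a rough sketch of that idea, the code below copies the test type into a second, larger type that starts with the same 4 members. The type testEx, its extra member and the procedure name are all hypothetical names picked for this example.

```vb
Option Explicit

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (Destination As Any, Source As Any, ByVal Length As Long)

Private Type test
    a As Long
    b As Long
    c As Long
    d As Integer
End Type

' Hypothetical second type: same first 4 members, plus one more
Private Type testEx
    a As Long
    b As Long
    c As Long
    d As Integer
    extra As Long
End Type

Sub CopyCommonMembers()
    Dim src As test
    Dim dst As testEx

    src.a = 258: src.b = 2: src.c = 3: src.d = 4
    dst.extra = 99

    ' Copy only the 14 bytes the two layouts share
    CopyMemory dst, src, Len(src)

    ' a through d now match src, extra still holds 99
    Debug.Print dst.a, dst.b, dst.c, dst.d, dst.extra
End Sub
```

Note that the copy length comes from the smaller type, so nothing past member d in the destination gets touched.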

How about if we wanted to store a UDT as an array of bytes? Now that is something handy. (Actually, that is what drove me to this research.)

We know the size of our UDT, we know how many bytes it is and that they are all sequential. So let's store them into a byte array.
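
Here is a minimal sketch of one way to do it, assuming the all-numeric test type from above (which has no gaps between its members). The procedure name and the values loaded are just for illustration.

```vb
Option Explicit

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (Destination As Any, Source As Any, ByVal Length As Long)

Private Type test
    a As Long
    b As Long
    c As Long
    d As Integer
End Type

Sub UdtToBytes()
    Dim myUdt As test
    Dim restored As test
    Dim b() As Byte
    Dim i As Long

    myUdt.a = 258: myUdt.b = 2: myUdt.c = 3: myUdt.d = 4

    ' Base 1 array sized to the UDT (14 bytes for this type)
    ReDim b(1 To Len(myUdt))
    CopyMemory b(1), myUdt, Len(myUdt)

    ' Dump the bytes to the immediate window
    For i = 1 To UBound(b)
        Debug.Print i, Hex$(b(i))
    Next

    ' Reconstitute a UDT from the byte array
    CopyMemory restored, b(1), Len(restored)
    Debug.Print restored.a, restored.b, restored.c, restored.d
End Sub
```

This only holds for simple numeric UDTs like this one. For types with a mix of member sizes it is worth checking the size against what you actually see in the memory window before trusting Len().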






(Note: I used a base 1 byte array because the length returned from Len() is base 1. You could just as easily use a base 0 array, but you would have to subtract 1 from the Len(udt) return when you dimensioned your array.)
If you look at the values displayed in the immediate window, you can see that the byte array now holds the same contents as the UDT was shown to hold in memory above.

From here you can save the bytes in some alternative format, easily pass the byte array to other functions, store it in a database, reconstitute it with another call to CopyMemory, or even use your knowledge of the byte layout to perform functions on it by its byte values.

One use of these byte level manipulations is to get the low and high bytes of an Integer. Knowing that an Integer is composed of 2 bytes, can you think of how to put this all together to extract the high and low bytes from it?
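
One possible answer, sketched under the assumption that we overlay the Integer with a 2 byte UDT. The type IntBytes and the procedure name are hypothetical. Because of the little endian layout, the low byte comes first.

```vb
Option Explicit

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (Destination As Any, Source As Any, ByVal Length As Long)

' Hypothetical 2 byte UDT that mirrors an Integer's layout in memory
Private Type IntBytes
    lo As Byte      ' lowest address = low byte (little endian)
    hi As Byte
End Type

Sub SplitInteger()
    Dim n As Integer
    Dim parts As IntBytes

    n = &H1234
    CopyMemory parts, n, 2      ' overlay the Integer's 2 bytes onto the UDT

    Debug.Print "low: " & Hex$(parts.lo)    ' prints: low: 34
    Debug.Print "high: " & Hex$(parts.hi)   ' prints: high: 12
End Sub
```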





Ok, so that all looks pretty straightforward, doesn't it? But you may have noticed I have not included any examples with strings or arrays in the UDT. Does this same method hold true for more complex UDTs?

Unfortunately, no it doesn't :(

Let's try another experiment.

Type test
   a As Long
   b As String
End Type

Dim t As test
t.a = 1
t.b = "this is my string"
MsgBox Len(t)


This code tells us that the length of the test structure is 8! How can that be? Hmm, let's dig behind the scenes some more. Using the debugging and memory dumping techniques we went through at the top of the article, let's see what is happening to our structure that contains a string.



8 bytes is the same as 2 Longs. Looking at the memory dump window, we can see that the myString variable is actually a Long pointer to a string. If we then look up the memory address it points at, we find our string (confirmed by the MessageBox value of StrPtr(mystring)).

Because of the way UDTs that contain strings and arrays are handled, this does knock some of our previous bag of tricks out of play on these more complex setups. Then again, knowing this limitation and how it all works, we are still better off than we were. At least we know what we have to work with.

It's clear that our CopyMemory tricks cannot transfer the contents of the strings over. All they copy is the 4 byte pointer, and as soon as the UDT that owned the string goes out of scope, there goes our valid pointer to the string.

Is there any other way we can dump a more complex UDT type out and either save it as an array of bytes or reconstruct it from an array of bytes?

As luck would have it there is :) When I was browsing through some old C documentation I noticed a technique they were using to dump their structures straight to disk. This would allow you to save your configured object to a file with a single line of code. Curious to see if VB had implemented such a feature for its UDT handling, I whipped up the following tidbit of code.
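
As a rough sketch of what this looks like (not the original listing), the code below writes a UDT containing a string to a binary file with Put and reads it back with Get. The settings type, the file name udt.bin and the procedure name are all assumptions made up for this example.

```vb
Option Explicit

' Hypothetical UDT with a variable length string member
Private Type settings
    id As Long
    title As String
End Type

Sub SaveAndLoadUdt()
    Dim saved As settings
    Dim loaded As settings
    Dim f As Integer
    Dim path As String

    path = App.Path & "\udt.bin"    ' hypothetical file name

    saved.id = 1
    saved.title = "this is my string"

    ' Dump the whole structure, string data included, in one line
    f = FreeFile
    Open path For Binary As #f
    Put #f, , saved
    Close #f

    ' Reload it into a fresh variable of the same type
    f = FreeFile
    Open path For Binary As #f
    Get #f, , loaded
    Close #f

    Debug.Print loaded.id, loaded.title
End Sub
```

The same variable type has to be used for the Get as for the Put, since VB works out the file layout from the type definition.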





The code for this comes from an earlier paper I wrote. That paper's main focus was on describing how you can dump even complex UDTs to disk and then easily reload them with only a couple lines of code.

That is pretty handy, and a good thing to know. It is also interesting that VB's Put and Get commands were built to be pretty smart. We know that complex UDTs aren't stored in memory as one contiguous block, however the VB Put command is kind enough to pack the whole structure and its data into a new format for us so that it is complete when dumping it to disk.

If you look at the file dump and read the other article, you will notice that preceding each string stored in the file there is a length counter for how many bytes are in the string. This echoes the way VB stores its strings in memory: it uses an OLE type called a BSTR for all of them. BSTRs are Unicode strings prefixed by a Long (4 byte) length counter that holds the length of the string data in bytes.

This is how VB knows where the string ends. Some other languages hold strings as null terminated, that is, they read the string up until the first null character (byte 0). Since VB strings are Unicode, and every other byte is usually a null, that just wouldn't work for us :)

Two cool side effects of this are that our strings can contain embedded nulls without penalty, and that it is very fast for VB to tell us the length of the string because it only has to read this prefix counter value. Some other languages actually have to loop through each and every character in the string, incrementing a counter variable until they hit that null terminator, to find out how long a string is!

Now, if we look at the memory dump of the string above, we see the purple highlighted area. This is actually the string length variable we were just discussing. It is the 4 bytes just before the address returned by strptr(mystr).
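
To check this from code rather than from the memory window, here is a small sketch that reads the 4 bytes sitting just before the string data. The procedure name is hypothetical and the string is the same one used in the UDT example.

```vb
Option Explicit

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (Destination As Any, Source As Any, ByVal Length As Long)

Sub ShowBstrLength()
    Dim s As String
    Dim byteLen As Long

    s = "this is my string"     ' 17 characters

    ' Read the Long stored in the 4 bytes just before the string data
    CopyMemory byteLen, ByVal StrPtr(s) - 4, 4

    Debug.Print byteLen         ' 34 (length in bytes)
    Debug.Print LenB(s)         ' 34
    Debug.Print Len(s)          ' 17 (length in characters)
End Sub
```

The prefix counts bytes, not characters, which is why it comes out at twice what Len() reports for a Unicode string.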

Strings are just about always better off manipulated as strings, but with this little tidbit of trivia, at least now you know how you could locate the string in memory and determine its byte length from the raw UDT data.

Ok, I guess that is enough for now. Hopefully this paper was a good description of UDTs for you, and will give you some insight into some other tricks you can do with them, as well as the things you probably don't want to try to do with them!

Also I hope you got a good idea how to use the debugger to probe through and do your own research to figure out your own burning questions for yourself. Once you have someone walk you through it once, I think you are going to find that knowledge never goes out of scope and will end up building like a snowball rolling downhill.

Downloads

Download demo project - 22k

History

Date Posted: [July 18th 2003]