SRC: GITHUB (local copy) BINARY: RELEASES (local copy) (Looks like C++ 11/VS 2010 + QT 4.8 -dz)
It is 2019, yet there is still something to discover about Visual Basic 5/6. Realistically, the official compilers were discontinued a long time ago. The authors of malware cryptors or of open source projects (PDFStreamDumper) may disagree, but I think the platform is dead. Still, you can write programs for it, and they will most likely run normally under the latest Windows 10 builds.
This article began when I was asked «to identify the source of some strange information in VB6 binaries and to extract it reliably». The task was given to me by professor Igor Yurin of Saratov State University, where I was studying and working. He was the first to notice this anomaly.
(I was unable to find any other research on this topic, so if you know of any, contact me and I will add links here.)
The research itself was done about three years ago and remained unpublished because it was incomplete and of little practical use. Eventually it came down to either deleting it or publishing it, so I decided to summarize everything I have.
I have also organized some information about the internal structures of VB5/6 files. It is not essential for understanding the main topic, but I include it anyway. If you are not interested, read the next section and jump to the end (Vulnerability section).
Sorry for the grammar (please contact me on Twitter, @sysenter_eip, if something needs to be fixed).
While the VB5/6 compiler does its job, it uses memory from the process heap to build objects that are later saved into the output binary. This memory is not zeroed, so some objects can contain uninitialized values. These values can be extracted statically, and they can contain interesting data (such as source code and environment variables).
The compiled binary for Windows and its source code can be found HERE.
The code quality is low, but it works. I will probably make a console utility for Linux/Windows so that researchers with large malware collections can scan their files for anything interesting. (Once done, a link will appear HERE.)
First of all, we need to acknowledge important earlier work on this topic. The first is an old piece of research by Alex Ionescu [1]. The second good source of information is the source code of Semi-VBDecompiler [2].
I was also able to obtain some debug symbols for the virtual machine and the compiler. They appear to have been supplied with Visual Basic 6.0 Enterprise Edition.
Name | Version | Size (bytes) | MD5 |
---|---|---|---|
msvbvm60.dbg | 6.0.8268 | dbg: 2 826 996, dll: 1 409 024 | dbg: 8D4E57DC2A426CA2FB79BC1900F7B544, dll: 8D4E57DC2A426CA2FB79BC1900F7B544 |
vb6.dbg | 6.0.8176 | dbg: 3 747 408, exe: 1 880 064 | dbg: C84ADEFDE7428C6C31F1BF780DA87A4C, exe: 037C4B5B4CD2809DB33A09BBB37CC693 |
vba6.dbg | 6.0.8169 | dbg: 1 179 760, exe: 1 701 648 | dbg: 48552645E93A2FD65F8683147822AA5E, exe: C3D0CA107B96837748373563088B73E8 |
Table 1. Debug symbols
I was not able to make use of the vba6.dbg file (most likely it contains nothing useful), while the other two were quite helpful since they contain function names with prototypes.
In this article I refer to the names from the debug information where possible. If no symbols are present, I use the notation of the decompiler [2].
Most things described here are for Visual Basic 6, but they should also apply to VB5.
Where no symbols are present, I sometimes use the VA (Virtual Address) as if the file were loaded at its default ImageBase (no ASLR, which is more convenient).
All programs created by the VB5/6 compiler are 32-bit PE applications. No 64-bit compiler was ever created, since the platform was deprecated long before 64-bit became popular.
The first important difference from the usual PE files (unmanaged Visual C++, Delphi) is the import table. It usually contains only one library (MSVBVM60.DLL for VB6 or MSVBVM50.DLL for VB5), which is the runtime and virtual machine. This library contains the functions actively used by the VB program. It also initializes the environment and passes control to the program’s real entry point (think of it as Main) if the binary is Native. For P-Code, the runtime also interprets the bytecode.
Figure 1. Imports of a program compiled with MSVC++ (left) and with VB6 (right).
Every VB5/6 file has an EXEPROJECTINFO structure, which the runtime needs for initialization.
Locating this structure can be extremely hard in general, but if the executable has not been modified the task is simple.
For exe files, a pointer to this structure can be found at the EntryPoint, which always looks as shown in Figure 2.
Figure 2. EntryPoint of VB6 application. Highlighted bytes can be used as signature for locating EXEPROJECTINFO at EP.
Things are a little different for VB5/6 libraries (dll files). There are no pointers to this structure at the EntryPoint. However, such a library always has at least 4 exported functions: DllUnregisterServer, DllGetClassObject, DllRegisterServer, DllCanUnloadNow. Each of them contains a PUSH with the desired pointer.
Figure 3. This is actually a code template that the compiler uses while building libraries. Note that this template is contiguous and can be found with exactly the same layout in the output file. This means we can use it as a signature search pattern to locate EXEPROJECTINFO without parsing the export table.
The compiler also has templates for the EntryPoint of exe and dll files.
Figure 4. Template of the EntryPoint for exe and dll.
Nothing stops a user from obfuscating the EntryPoint, which makes static extraction of this structure very hard (the body of the structure itself may also be decrypted only at runtime). Consequently, the only generic way to locate it is to hook the runtime itself. That is too complicated, so we will rely on signature scanning alone (which works in most cases).
The next sections describe the structures that the compiler writes into VB executable files. Each has a lot of fields: some are completely useless, some are required to move deeper. I will give a short summary of the fields to highlight the most useful ones.
EXEPROJECTINFO (unofficially VBHeader) is the main structure of the file and the starting point: it contains links to other structures and some information related to the VB project (such as the user-specified project name and description). The generation code is located in vb6.exe :: WriteRubyExeData.
Figure 5. EXEPROJECTINFO
szVbMagic usually contains the magic signature «VB5!», which is 0x21354256 in hex. This marker can be used to detect the structure statically (the official VB5/6 compiler always fills this field), but the runtime never actually checks it, so in practice it can contain anything.
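A minimal sketch of the static detection described above, assuming the file is available as raw bytes. The helper name `find_vb_magic` and the sample buffer are illustrative and not taken from VBParser. Since the runtime ignores this field, an empty result does not prove that a file is not VB5/6.

```python
import struct
from typing import List

VB_MAGIC = b"VB5!"  # read as a little-endian DWORD these bytes give 0x21354256


def find_vb_magic(data: bytes) -> List[int]:
    """Return every file offset at which the «VB5!» marker occurs."""
    offsets = []
    pos = data.find(VB_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(VB_MAGIC, pos + 1)
    return offsets


# Synthetic buffer for illustration only, not a real executable.
sample = b"\x00" * 16 + VB_MAGIC + b"\x00" * 12
magic_dword = struct.unpack("<I", VB_MAGIC)[0]
```

Each returned offset is only a candidate for the start of EXEPROJECTINFO and still has to be validated by parsing the fields that follow.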
Another interesting field, wRuntimeBuild, contains the version identifier of the runtime used to build the file (it uniquely identifies the compiler’s vb6.exe). This makes it possible to identify the exact compiler version. For example, the value 9782 (decimal) corresponds to the latest Visual Basic 6 SP6.
Figure 6. Field wRuntimeBuild is set inside WriteRubyExeData.
szLangDll – byte array of size 14. If the first byte is not «*» (0x2a), the field contains the name of the language library (for example, «vb6ko.dll» or «VB6ES.DLL»). It can be used to identify the localization of the system.
szSecLangDll – another 14-byte array. This field contains either «*» or «~». It might contain something else, but no real samples were found.
lpProjectData – pointer to The Project Information structure; the original name is missing from the debug symbols (unofficially: tProjectInfo).
lpGuiTable – pointer to an array of EXEFORMINFO (tGuiTable) structures. The structures are stored one after another; their count is wFormCount.
lpExternalTable – pointer to the first EXEOCXINFO (tComponent) structure, which contains information about the OCX modules used by the program. The count is in wExternalCount. The next EXEOCXINFO structure is located at offset EXEOCXINFO::dwStructSize from the current one.
lpComRegisterData – pointer to the tagREGDATA structure, which describes the COM objects used.
bSZ* – this group of fields stores offsets to NUL-terminated strings (relative to the beginning of the described structure). The field names are self-describing. For example, bSZProjectExeName contains the name, without extension, of the file the compiler used to save the output. Be careful when parsing these strings: they use a single-byte encoding, so they can contain characters specific to the user’s locale (e.g. the cp1251 charset for a Russian locale).
Some of the bit fields can be used to restore the compiler’s flags. The sources at [2] will be helpful for that.
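The bSZ* offset fields described above can be read with a few lines of code. This is a sketch under two assumptions: the file offset of EXEPROJECTINFO is already known, and the caller guesses the code page (cp1251 is used as the default only because it is the example mentioned above). The function name `read_bsz_string` is hypothetical.

```python
def read_bsz_string(data: bytes, struct_offset: int, bsz_value: int,
                    encoding: str = "cp1251") -> str:
    """Read the NUL-terminated string located at struct_offset + bsz_value."""
    start = struct_offset + bsz_value
    if not 0 <= start < len(data):
        raise ValueError("string offset falls outside the file")
    end = data.find(b"\x00", start)
    if end == -1:
        end = len(data)  # unterminated string: read up to the end of the buffer
    # errors="replace" keeps unexpected bytes visible instead of raising
    return data[start:end].decode(encoding, errors="replace")


# Synthetic data: the structure starts at offset 4, the string sits 8 bytes into it.
blob = b"\xAA" * 4 + b"\x00" * 8 + "Проект1".encode("cp1251") + b"\x00tail"
name = read_bsz_string(blob, struct_offset=4, bsz_value=8)
```

Decoding the same bytes with the wrong code page yields unreadable text, so trying several encodings and comparing the results may itself hint at the build machine's locale.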
Filled at vba6.dll :: 0x0FB11783.
Figure 7. The Project Information
lpObjectTable – pointer to a table of program objects (The Object Table).
lpNativeCode – if this field != 0 the project is Native, otherwise it is P-Code.
lpExternalTable – pointer to an array of External API Descriptor structures, which describe the WinAPI functions used by the app.
The following fields are ignored by the runtime but always filled by the compiler.
dwVersion – contains 500 (0x1F4).
lpCodeStart – pointer to marker - DWORD, value 0xE9E9E9E9.
lpCodeEnd – pointer to marker - DWORD, value 0x9E9E9E9E.
wsPrimitivePath – originally part of wsPathInformation. It determines what data is stored in wsProjectPath. In all the files I checked, it contains either zeroes (\0), which makes the next field invalid, or the UNICODE string «*\A». The code that wrote values to this field was removed in the SP6 update.
wsProjectPath – full path to the *.vbp file, if wsPrimitivePath is valid.
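Because lpCodeStart and lpCodeEnd are always filled by the compiler but ignored by the runtime, checking the two markers is a cheap sanity test. The sketch below assumes both pointers have already been converted from VAs to file offsets; the function names are illustrative.

```python
import struct
from typing import Optional

CODE_START_MARKER = 0xE9E9E9E9  # DWORD that lpCodeStart points to
CODE_END_MARKER = 0x9E9E9E9E    # DWORD that lpCodeEnd points to


def read_dword(data: bytes, offset: int) -> Optional[int]:
    """Read a little-endian DWORD, or return None if it does not fit in the buffer."""
    if offset < 0 or offset + 4 > len(data):
        return None
    return struct.unpack_from("<I", data, offset)[0]


def check_code_markers(data: bytes, start_off: int, end_off: int) -> bool:
    """True if both markers hold the values written by the compiler."""
    return (read_dword(data, start_off) == CODE_START_MARKER
            and read_dword(data, end_off) == CODE_END_MARKER)


# Synthetic buffer: start marker at offset 0, end marker at offset 8.
image = b"\xE9" * 4 + b"\x00" * 4 + b"\x9E" * 4
```

A failed check does not make the file invalid for the runtime; it only suggests that the structure was parsed incorrectly or that the file did not come straight from the official compiler.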
By parsing this structure you can extract the compiled FORMS from an application. The lpFormBody field points to a blob of data whose size is stored in dwSizeOfFormBody. This data can be decompiled with the «vb3 binary to text» tool [2].
Filled inside vba6.exe :: WriteFormExeData.
Figure 8. EXEFORMINFO
Descriptor of the OCX files used. The purpose of the binary blobs contained in this structure is unknown.
You can see how it is processed in msvbvm60 :: CreateOcxDefFromExe.
Figure 9. EXEOCXINFO
All offsets are from the base of tagREGDATA.
tagREGDATA::bRegInfo – offset to tagRegInfo.
tagREGDATA::bSZ*** – offsets to strings.
tagRegInfo::bNextObject – offset to the next tagRegInfo; the last object has it zeroed.
tagRegInfo::fObjectType – one of the values from Table 2.
tagRegInfo::fIsDesigner – reserved on VB5.
tagRegInfo::bDesignerData – offset to The Designer Info, if tagRegInfo::fIsDesigner != 0. Reserved on VB5.
Value | Name | Description |
---|---|---|
0x2 | Designer | A Visual Basic Designer for an Add-In |
0x10 | Class Module | A Visual Basic Class |
0x20 | User Control | A Visual Basic Active X User Control (OCX) |
0x80 | User Document | A Visual Basic User Document |
Table 2. fObjectType values
The Designer Info::cbStructSize – size of some of the fields of The Designer Info (see Figure 12 for details).
Note that information about Addin Specific Data is missing from Figure 12. This structure was not required for this research, so I only determined the conditions under which it gets filled.
The window used to put values into this structure is shown in Figure 10 (the numbers show the order in which the windows are opened).
Figure 10. Addin Specific Data in GUI
After compilation, the contents of Addin Specific Data look as shown in Figure 11.
Figure 11. Addin Specific Data in raw
Figure 12. Schema
lpProjectInfo2 – pointer to The Secondary Project Information.
dwTotalObjects – count of objects in lpObjectArray.
dwCompiledObjects – value >= dwTotalObjects.
dwObjectsInUse – same value as dwTotalObjects.
lpObjectArray – pointer to an array of The Public Object Descriptor structures. Different sources use different fields as the size of this array. Most likely the correct one is dwTotalObjects; using dwCompiledObjects usually leads to errors.
Figure 13. The Object Table
Figure 14. The Secondary Project Information
szProjectDescription and szProjectHelpFile are pointers to strings with self-explanatory contents. These fields are often filled with unique information (different from that in EXEPROJECTINFO).
lpObjectInfo – pointer to The Object Info
lpszObjectName – pointer to C-string of the object name.
lpMethodNames – pointer to an array of method names. The count is stored in dwMethodCount.
lpPublicBytes – pointer to RESDESCTBL.
lpStaticBytes – pointer to RESDESCTBL (different from previous one, but with the same format).
fObjectType – bit field that describes the type of the object. For our purposes the important bit is bit 1 (HasOptInfo): if it is set, The Optional Object Info follows The Object Info.
Figure 15. The Public Object Descriptor
Figure 16. The Public Object Descriptor :: fObjectType values from [2]
Figure 17. The Object Info
Figure 18. The Method Info
Figure 19. The Private Object Descriptor
lpGuid – pointer to GUID of the current control.
lpszName – name of the control.
lpEventTable – pointer to the table of the control’s methods. A description can be found in the IDC script for IDA [3]; the code is simple but quite big (there are a lot of different event names).
Figure 20. The Control Info
Although VB5/6 files have no PE import entries besides the runtime library, any program can freely use whatever WinAPI functions it wants. This is implemented using another internal structure.
As mentioned before, External API Descriptor structures are present in the file. Their dwType field can take one of two values:
6 – import by GUID.
7 – import by library + function name (the most common one for PE files).
Figure 21. External API Descriptor
Used for «packed» storage of variables of different types. RESDESCTBL is a fixed-size header, followed by multiple variable-size RESDESC entries.
The VBParser source code contains a function that calculates the size of this structure. It was recovered from the runtime code.
From this function we can tell that the last 4 bits of the wTypeFlags field identify the type. Later we will refer to these numbers as Type 5 and Type 9.
The total block size is stored in RESDESCTBL::wTotalBytes. Use it to check that parsing is correct.
Figure 22. RESDESCTBL / RESDESC
If you have ever looked at VB files in a hex view, you may have noticed that they sometimes contain paths to files with *.olb, *.tlb, *.oca and other extensions.
These files contain information about external types used by the program.
.ocx - ActiveX Control
.oca - Extended type library/custom control cache file that goes along with a .ocx
.tlb - COM interface definition.
.olb - A Microsoft Object Library file that contains information referenced by Microsoft Office components.
We would like to detect all these paths without using regular expressions (text strings can have unexpected values, which we want to detect, too). As far as I know, at least two structures point to these paths. However, both have zero references leading to them (one of the fields, “ideData”, points to External Library Header, but this link is valid only during compilation).
Figure 23. GUID references
It turned out that this structure can be located indirectly. There is a GUID that is referenced both by the desired structure and by another one (The Control Info). This means we can collect all GUIDs from the control descriptions and then find every place in the program that references each of these GUIDs (the results will include, among others, pointers from the External Library Header structure).
This method works in most cases, but it can miss. The GUID referenced by the desired structure may be absent from the control list. In that case there will be links to it from the program code itself (as an argument for VB-specific functions).
While reversing VB5/6 programs we noticed that in some cases executable files contain fragments of file system paths. Something similar is known to happen with informative assert strings or PDB paths, but in this case we could not tell what exactly we were looking at. The lack of file format documentation and of public research forced us to dig into the compiler.
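The reference search described above can be sketched as a raw scan for 4-byte little-endian pointers. It assumes the VAs of the GUIDs have already been collected from The Control Info structures; the function names and sample values are illustrative only.

```python
import struct
from typing import Dict, Iterable, List


def find_dword_references(data: bytes, target_va: int) -> List[int]:
    """Return file offsets of every DWORD equal to target_va (alignment is not required)."""
    needle = struct.pack("<I", target_va)
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits


def references_to_guids(data: bytes, guid_vas: Iterable[int]) -> Dict[int, List[int]]:
    """Map each GUID VA to the file offsets that appear to point at it."""
    return {va: find_dword_references(data, va) for va in guid_vas}


# Synthetic buffer holding two pointers to a GUID placed at VA 0x00401000.
guid_va = 0x00401000
buf = b"\x00" * 3 + struct.pack("<I", guid_va) + b"\xFF" * 5 + struct.pack("<I", guid_va)
refs = references_to_guids(buf, [guid_va])
```

A byte-level scan like this can produce false positives, so every hit should be treated as a candidate structure and verified before its path field is trusted.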
Figure 24. One of the first discovered samples
First, we downloaded a bunch of source code and compiled the projects one by one, checking every generated file for the anomaly. We quickly got the first results, which revealed that not only paths can appear in a file, but also environment variables and even source code.
It became obvious that this trash is uninitialized memory which the compiler simply forgot to memset. There were a few possibilities for which memory was leaking (the stack and various kinds of heap allocation).
After some tests, I created a special library that hooks RtlAllocateHeap and fills all allocated memory with predictable values. I also wrote code that hooks every function prolog (push ebp; mov ebp, esp) in the suspected module to memset the stack at negative offsets in the same way. (quick test lib -dz)
In the end, 5 different leaks were found on the heap. I will describe each of them in the following sections. No stack leaks were found.
As mentioned before, this structure stores packed values. The format is quite complicated, so instead of writing a complete parser I made a mask for each of the possible cases to extract the leaked memory.
Type 9. According to the size calculation function, this structure occupies 4 (header) + 24 bytes, but usually only 6 bytes are used. The remaining 18 will most likely leak some memory.
HEADER
WORD wUnused1; // leak
WORD wUnknown; // usually 0xffff
DWORD lpSubResDscrTbl;
WORD wUnused2; // leak
WORD wUnused3; // leak
WORD wUnused4; // leak
WORD wUnused5; // leak
WORD wUnused6; // leak
WORD wUnused7; // leak
WORD wUnused8; // leak
WORD wUnused9; // leak
lpSubResDscrTbl – this value is actually a pointer to another RESDESCTBL. The structure is recursive, so it makes sense to go deeper in search of more leaks.
So we have the mask (1 letter = 1 byte, X – Leak, ? - Valid): X X ? ? ? ? ? ? X X X X X X X X X X X X X X X X
Figure 25. Grayed-out bytes are valid, the others are suspected leaks.
Type 5. Here we additionally require wTypeFlags == 0x0005. If this condition is satisfied, the following structure can be used:
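Applying such a mask takes only a few lines. The sketch below rebuilds the Type 9 mask from the field list above (2 leaked bytes, 6 valid bytes, 16 leaked bytes) and keeps only the bytes marked X. It assumes `body` holds the 24 bytes that follow the 4-byte header; the names are illustrative and not taken from VBParser.

```python
# 'X' marks a suspected leak byte, '?' marks a byte holding valid data.
TYPE9_MASK = "XX" + "??????" + "X" * 16  # covers the 24 bytes after the header


def extract_leak(body: bytes, mask: str) -> bytes:
    """Return only those bytes of body whose mask position is 'X'."""
    return bytes(b for b, m in zip(body, mask) if m == "X")


# Synthetic 24-byte body whose byte values equal their own offsets.
body = bytes(range(24))
leak = extract_leak(body, TYPE9_MASK)
```

The same helper works for any other mask written in this notation, as long as the mask is first trimmed or extended to the actual size of the structure.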
HEADER
WORD wUnused1; // leak
WORD wUnused2; // leak
RESDESCFLAGS wType2;
WORD wUnused3; // leak
WORD wUnused4; // leak
WORD wUnused5; // leak
BYTE SaBase1[16]; // SAFEARRAY
BYTE SaBase2[16]; // SAFEARRAY
Mask for structure with size < 38:
X X X X ? ? X X X X X X ? ? ? ? ? ? ? ? ? ? ? ? …
Mask for structure with size >= 38:
X X X X ? ? X X X X X X ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? X X
Note: we found a file that contains an unreferenced RESDESCTBL which includes a Type 9 leak. It is not possible to tell whether this is the result of file modification or whether the compiler can generate it by itself.
Pointers to this structure can be found in the fields The Public Object Descriptor::lpPublicBytes and The Public Object Descriptor::lpStaticBytes of each of the program’s objects (objects in the Visual Basic sense).
Figure 26. Type 5 leak
All COM object descriptions are built in a contiguous memory block that is allocated once, so all of these leaks should be counted as a single one.
Most leaks are located in the alignment bytes of structures and strings.
There are also some fields that are left unfilled depending on the object type.
Please refer to the source code in vbfile.cpp for details, as I left no detailed notes on the detection of this type.
First, determine whether the file was built as P-Code or Native.
If it is Native, all method pointers are leaks.
For P-Code, check each pointer to see whether the address it points to is present in the file. If it is, also check whether lpObjectInfo points to the parent (The Object Info).
Everything that does not satisfy the conditions above is a leak.
This buffer is allocated once, so it is essential to determine its boundaries. To achieve that, we crawl all the objects and record the lowest and highest possible addresses of each array of pointers to object methods. From these we get the global minimum and maximum address.
The next step is to detect valid pointers, which can be done by checking whether the address exists in the binary. All others can be marked as leaks.
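The two steps above can be sketched as follows. This assumes 4-byte pointers (the files are 32-bit) and approximates "the address exists in the binary" as "the value falls inside [ImageBase, ImageBase + SizeOfImage)"; a stricter check against individual sections is also possible. The function names and sample values are illustrative.

```python
from typing import Iterable, List, Tuple

POINTER_SIZE = 4  # VB5/6 binaries are 32-bit


def buffer_bounds(arrays: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """arrays holds (start_va, pointer_count) per object; returns (lowest, highest) VA."""
    lows, highs = [], []
    for start_va, count in arrays:
        lows.append(start_va)
        highs.append(start_va + POINTER_SIZE * count)
    return min(lows), max(highs)


def classify_dwords(values: Iterable[int], image_base: int,
                    image_size: int) -> Tuple[List[int], List[int]]:
    """Split DWORDs into (valid pointers, suspected leaks)."""
    valid, leaks = [], []
    for value in values:
        if image_base <= value < image_base + image_size:
            valid.append(value)
        else:
            leaks.append(value)
    return valid, leaks


# Synthetic method pointer arrays of three objects.
bounds = buffer_bounds([(0x401100, 3), (0x401000, 2), (0x401200, 1)])
valid, leaks = classify_dwords([0x401234, 0x6F725C3A], 0x400000, 0x10000)
```

In this example the second value decodes to printable ASCII rather than to an address inside the image, which is exactly the kind of DWORD that ends up in the leak list.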
This leak is 2 bytes at most, so it is completely useless. Still, it deserves a mention.
At build time, the Major field (WORD) of the Resource Directory structure remains unset, hence we have a 2-byte leak from the original buffer.
Fun fact: this structure is duplicated many times (once for each Resource Directory). This means that the leaked 2 bytes are the same in all such structures in a file, which can be used to detect tampering.
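The tampering check can be sketched by walking the resource tree and comparing the Major field of every directory. The layout used here is the standard PE resource directory (a 16-byte header followed by 8-byte entries, with the high bit of the second DWORD marking a subdirectory). The code assumes `rsrc` holds the raw bytes of the resource section; the function names are illustrative.

```python
import struct
from typing import List, Optional, Set

DIR_FMT = "<IIHHHH"  # Characteristics, TimeDateStamp, Major, Minor, named count, id count
DIR_SIZE = struct.calcsize(DIR_FMT)  # 16 bytes
ENTRY_SIZE = 8                       # name-or-id DWORD, offset DWORD
SUBDIR_FLAG = 0x80000000             # set when an entry points to another directory


def collect_major_values(rsrc: bytes, offset: int = 0,
                         seen: Optional[Set[int]] = None) -> List[int]:
    """Return the Major field of every directory reachable from offset."""
    if seen is None:
        seen = set()
    if offset in seen or offset + DIR_SIZE > len(rsrc):
        return []  # already visited, or the directory does not fit in the buffer
    seen.add(offset)
    _, _, major, _, named, ids = struct.unpack_from(DIR_FMT, rsrc, offset)
    majors = [major]
    for i in range(named + ids):
        entry_off = offset + DIR_SIZE + i * ENTRY_SIZE
        if entry_off + ENTRY_SIZE > len(rsrc):
            break
        _, target = struct.unpack_from("<II", rsrc, entry_off)
        if target & SUBDIR_FLAG:
            majors += collect_major_values(rsrc, target & 0x7FFFFFFF, seen)
    return majors


def major_field_consistent(rsrc: bytes) -> bool:
    """True if every directory carries the same (leaked) Major value."""
    return len(set(collect_major_values(rsrc))) <= 1


# Synthetic tree: a root directory with one entry pointing to a subdirectory at offset 24.
root = struct.pack(DIR_FMT, 0, 0, 0xBEEF, 0, 0, 1) + struct.pack("<II", 3, SUBDIR_FLAG | 24)
rsrc = root + struct.pack(DIR_FMT, 0, 0, 0xBEEF, 0, 0, 0)
```

An inconsistent result only indicates that something rewrote part of the resource tree; it says nothing about what was changed.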
A small remark about the specifics of VB5/6 files with regard to the PE file format.
As far as I can tell, VB5/6 is the only compiler that fills the “Timestamp” field in the Resource Directory. This can help determine the real compilation time, since the “Timestamp” in the FileHeader is the first thing to be tampered with.
Starting from some unknown version, the compiler includes a RICH signature in output files. Together with the wRuntimeBuild field of EXEPROJECTINFO, it may help detect file tampering and identify the exact version of the compiler environment used.
As a result of this research I created a GUI tool that extracts interesting information from VB5/6 files, including leaks and some interesting fields.
Figure 27. VBParser interface.
Let’s look at examples I was able to find in the wild.
Figure 28 shows the username (Administrator) and the computer name (WWWFOX-NET) of the machine used to build this sample.
Figure 28. Environment variables leak
Sometimes, when the PATH environment variable leaks, we can see which programs were installed. For example, Figure 29 shows that the user had GnuPG and OpenVPN on his system.
Figure 29. PATH leaked
The next example gives us information about installed devices. The PCI “VendorID-DeviceID” pair corresponds to a “USB host controller”.
Figure 30. PCI device info leaked
The next few samples had the system drive UUID leaked. It may even uniquely identify the computer used to build the file.
Figure 31. System drive UUID leaked #1
Figure 32. System drive UUID leaked #2
Another interesting one: it looks like someone was building the project in a virtual machine and had file sharing enabled.
Figure 33. Virtual Box path has leaked.
Sometimes we might be interested in the user’s locale. It is common to get leaked paths that contain non-Latin characters («Sik Kullanilan» = «Favorites» in Turkish).
Figure 34. Locale specific path
Even parts of the source code can be found (maybe comments, too?).
Figure 35. Form source code can be visible inside detected leak.
I hope this article will make you take another look at your personal collection of VB5/6 malware and check whether anything fun can be found there. Share your findings!