# TASK: Export all project functions to a SQL database

PREREQUISITE: the "vbdec Operator Briefing" must already be loaded in this
context (see ./../_claude_vbdec_ai_instructions.txt).

GOAL: enumerate every function (CCodeBody) in the open vbdec project and write
its attributes to a SQL table `functions`. Two CSV columns capture call edges:
  - `callees`   = INTERNAL project functions called directly (depth-1 xrefs from)
  - `api_calls` = EXTERNAL calls: Win32 APIs, rtc* runtime, COM, late-bound

## Procedure

1. Bind the ROT object: Set o = GetObject("vbdec.vbp")
2. Build a name/VA -> CCodeBody lookup over o.CodeObjects -> .Methods2
   (needed to resolve callees). Also record each CodeObject's .Name and .sType.
3. For each CodeObject, for each CCodeBody:
   a. Collect the column values (see schema).
   b. Scan .Disasm line by line for direct-call mnemonics:
        \w*VCallHresult\w*   |   ImpAdCall\w+   |   LateMemCall
      Classify each callee operand per the briefing:
        - operand matches Proc_([0-9A-Fa-f]+) -> CodeBodyForVA2(<hex>) -> INTERNAL
        - operand is Module.Name resolvable in the step-2 lookup -> INTERNAL
        - anything else (rtc*, kernel32.*, COM, late-bound) -> EXTERNAL
      For LateMemCall, label the external call `late:<quoted method name>`.
      For other externals, use the operand trimmed at the first '(' or '['.
      `callees`   = sorted, de-duplicated CSV of INTERNAL callee display names.
      `api_calls` = sorted, de-duplicated CSV of EXTERNAL call labels.
      `callee_count` / `api_count` = distinct counts of each.
   c. Emit one INSERT row.
4. Write a CREATE TABLE + all INSERTs to a portable .sql script.
5. Build the .db: use `sqlite3` CLI if on PATH; otherwise build it with Python
   stdlib  ->  sqlite3.connect(db).executescript(open(sql).read())
   (sqlite3 ships with Python; no CLI or driver install needed).

## Schema  (table: functions)

  va            INTEGER PRIMARY KEY   -- virtual address, decimal
  va_hex        TEXT                  -- e.g. '471F38'
  display_name  TEXT                  -- resolved, e.g. 'CPdfParser.ParseFile'
  full_name     TEXT                  -- raw, e.g. 'CPdfParser.Proc_471F38'
  module        TEXT                  -- owning CodeObject name (.Name)
  module_type   TEXT                  -- 'Form' | 'Class' | 'Module' | 'Usercontrol' (.sType)
  sub_name      TEXT                  -- raw proc name
  public_name   TEXT                  -- recovered public name, if any
  user_name     TEXT                  -- analyst-assigned name, if any
  file_offset   INTEGER
  raw_addr      INTEGER
  size          INTEGER               -- bytes
  asm_lines     INTEGER
  frame_size    INTEGER
  const_pool    INTEGER
  idx           INTEGER               -- .index
  is_event      INTEGER               -- 0/1
  is_patched    INTEGER               -- 0/1
  is_obfuscated INTEGER               -- 0/1  (.isObsfuscated)
  truncated     INTEGER               -- 0/1  (disasm incomplete)
  marked        INTEGER               -- 0/1
  byte_crc      TEXT
  callee_count  INTEGER
  callees       TEXT                  -- CSV of internal callee display_names; '' if none
  api_count     INTEGER
  api_calls     TEXT                  -- CSV of external call labels; '' if none

## Output

- {Desktop}\vbdec_functions.sql   (portable script - always written)
- {Desktop}\vbdec_functions.db    (SQLite DB)

Print a summary: total functions, count with callee_count = 0 (leaf functions),
and the DB/SQL paths.

## Constraints

- Single `cscript //nologo` pass builds the .sql; delete the temp .vbs after.
- SQL string escaping: double every single quote ( ' -> '' ).
- Store the VA as BOTH integer (decimal, for the PK) and hex text.
- `callees` = INTERNAL functions only; `api_calls` = EXTERNAL only; '' when none.
- Booleans as 0/1 integers.
- Flag (don't drop) any callee operand you could not classify.

## Bonus query ideas (run these after building)

  -- who uses a given API (substring match on the CSV)
  SELECT display_name FROM functions WHERE api_calls LIKE '%RtlMoveMemory%';
  -- who shells out / spawns processes
  SELECT display_name FROM functions
   WHERE api_calls LIKE '%Shell%' OR api_calls LIKE '%CreateProcess%';
  -- who reaches the network
  SELECT display_name FROM functions WHERE api_calls LIKE '%URLDownload%';
  -- recursion: functions that list themselves among their own callees
  SELECT display_name FROM functions
   WHERE ',' || callees || ',' LIKE '%,' || display_name || ',%';
  -- who calls a given internal function
  SELECT display_name FROM functions
   WHERE ',' || callees || ',' LIKE '%,CPdfParser.ParseFile,%';
  -- biggest functions / busiest hubs
  SELECT display_name, asm_lines FROM functions ORDER BY asm_lines DESC LIMIT 20;
  SELECT display_name, callee_count + api_count AS fanout
    FROM functions ORDER BY fanout DESC LIMIT 20;
  -- leaf / pure-plumbing functions
  SELECT count(*) FROM functions WHERE callee_count = 0 AND api_count = 0;

  -- CAUTION: LIKE '%substr%' over-matches. '%Reg%' also hits GetFo[reg]roundWindow
  -- and I[Reg]Exp. Anchor the needle: use '%RegRead%', '%.Reg%', or wrap the
  -- column as ',' || api_calls || ',' and match a full token.
