Dumping packed executables using minidumps

misc

15 Dec 2014

I’m using this method regularly for about a year now and found it to be very reliable against most common packers that unpack the binary to memory while being very convenient at the same time. Minidumps can be created programmatically using MS’ DbgHelp API or simply using a right-click on a process in the task-manager. In case a packer would ever decide to detect the minidump creation using, for example, a ring3 rootkit, you could still run the target executable in a VM and cause a bluescreen to obtain a crash-dump (a few additional steps would be required here). The obtained minidump can then be processed for further analysis using WinDBG.

Dumping

Create minidump using task-manager (Details -> Rightclick on target process -> Create Dump File)
Load dump into kd/WinDBG (in case you do not have it at hand already, obtain it here)
lmm <module name>
.writemem C:\my\cool\bin\path.dll <start-address> L?(<end-address> - <start-address>)

After successfully extracting the desired modules from the dump, you will have to open them in a PE-editor of your choice and set all SizeOfRawData entries in the section table to match the value of the corresponding VirtualSize field. This is required because many packers tend to set incorrect values for SizeOfRawData as it is rarely accessed after the loader is done with it’s work, however when remapping or loading it into software like IDA, incorrect values result in truncated/messed up sections if not fixed.

Fixing up the Executable

To perform some automation for more convenience, I wrote the following PyKD-script that extracts arbitrary modules from a dump:

py
from pykd import dbgCommand, module
import os
import sys
import struct
if len(sys.argv) < 2:
    print 'Usage: %s output_dir module1 [module2] [...]' % sys.argv[0]
    exit()
output_dir = sys.argv[1]
dump_mods = sys.argv[2:]
# dump modules
for cur_mod in dump_mods:
    print 'Dumping', cur_mod, '...'
    mod = module(cur_mod)
    dbgCommand('.writemem ' + os.path.join(output_dir, mod.name()) + ' '
        + format(mod.begin(), '08x') + ' L?(' + format(mod.end(), '08x') + ' - '
        + format(mod.begin(), '08x') + ')'
    )
    # fix PE file
    fullpath = os.path.join(output_dir, cur_mod)
    with open(fullpath, 'rb') as f:
        pe = bytearray(f.read())
    # 3c    offsetof(IMAGE_DOS_HEADER, e_lfanew) DWORD
    # f8    sizeof(IMAGE_NT_HEADERS) size_t
    # 6     offsetof(IMAGE_NT_HEADERS, FileHeader.NumberOfSections) WORD
    # 34    offsetof(IMAGE_NT_HEADERS, OptionalHeader.ImageBase) DWORD
    # a0    offsetof(IMAGE_NT_HEADERS, OptionalHeader.DD[<BASERELOC>]) DWORD
    # 8     sizeof(DATA_DIR) size_t
    # 0     offsetof(IMAGE_SECTION_HEADER, Name) CHAR[8]
    # 28    sizeof(IMAGE_SECTION_HEADER) size_t
    # 10    offsetof(IMAGE_SECTION_HEADER, SizeOfRawData) DWORD
    # 14    offsetof(IMAGE_SECTION_HEADER, PointerToRawData) DWORD
    # 8     offsetof(IMAGE_SECTION_HEADER, Misc.VirtualSize) DWORD
    # 0c    offsetof(IMAGE_SECTION_HEADER, VirtualAddress) DWORD
    lfanew = struct.unpack('<I', pe[0x3c:0x3c + 4])[0]
    # fix image base
    imagebase_ptr = lfanew + 0x34
    pe[imagebase_ptr:imagebase_ptr + 4] = bytearray(struct.pack('<I', mod.begin()))
    print 'New imagebase:', format(mod.begin(), '08x')
    # unlink reloc directory, if any
    pe[lfanew + 0xa0:lfanew + 0xa0 + 8] = '\x00' * 8
    num_sections = struct.unpack('<H', pe[lfanew + 6:lfanew + 6 + 2])[0]
    sec_ptr = lfanew + 0xf8
    print 'Sections:'
    for i in xrange(num_sections):
        try:
            print struct.unpack('8s', pe[sec_ptr:sec_ptr + 8])[0]
        except UnicodeDecodeError:
            # trash section, zero entry..
            pe[sec_ptr:sec_ptr + 0x28] = '\x00' * 0x28
            print 'Dropped trash section ...'
        else:
            print 'Fixing SizeOfRawData and PointerToRawData ...'
            # SizeOfRawData = VirtualSize
            pe[sec_ptr + 0x10:sec_ptr + 0x10 + 4] = pe[sec_ptr + 8: sec_ptr + 8 + 4]
            # PointerToRawData = VirtualAddress - ImageBase
            virtual_address = struct.unpack('<I', pe[sec_ptr + 0xc: sec_ptr + 0xc + 4])[0]
            pe[sec_ptr + 0x14:sec_ptr + 0x14 + 4] = struct.pack('<I', virtual_address)
        sec_ptr += 0x28
    with open(fullpath, 'wb') as f:
        f.write(pe)
print 'Done!'

Usage:

ps
> .load pykd.pyd
> !py C:\my\script\path\dump_mods.py <output-dir> <module1> [<module2> [...]]

This script also performs some other actions like dropping trash-sections generated by vicious packers and some other things to make loading dumps into IDA easier. However, most packers also wreck the IAT, so I wrote another script that runs in WinDBG and creates an IDC script to be launched in IDA, renaming the IAT entries to the correct corresponding API. You will have to locate the IAT’s boundaries manually using IDA.

py
import sys
import re
from pykd import ptrDWord, dbgCommand, disasm, SymbolException, findSymbol
if len(sys.argv) < 3:
    print 'Usage: %s IAT_start_addr IAT_end_addr [idc_batch_target_path]' \
          % (sys.argv[0])
    exit()
# http://stackoverflow.com/a/7771363/1075818
def int_overflow(val):
  if not -sys.maxint-1 <= val <= sys.maxint:
    val = (val + (sys.maxint + 1)) % (2 * (sys.maxint + 1)) - sys.maxint - 1
  return val
# findSymbolAndDisp seems to be glitchy
def get_symbol_and_offs(addr):
    raw = dbgCommand('ln 0x' + format(addr, '08X')).replace('`', '')
    if len(raw.strip()) == 0:
        raise SymbolException('unable to find symbol\'s name')
    m = re.search(
        r'\s*\(([a-f0-9]{8,16})\)\s+(.+?)\s*\|\s*\(([a-f0-9]{8,16})\)\s+(.+?)\s*\n',
        raw
    )
    assert m, 'parsing of "ln" response failed'
    def process_data(funcaddr, sym):
        name_m = re.search(r'(.*?)\+0x([a-f0-9]+)', sym)
        disp = 0
        if not name_m is None:
            sym = name_m.group(1)
            disp = int(name_m.group(2), 16)
        return sym, (int(funcaddr, 16) - addr) + disp
    symbols = (
        process_data(m.group(1), m.group(2)),
        process_data(m.group(3), m.group(4))
    )
    return min(symbols, key=lambda x: abs(x[1]))
def get_export(addr):
    sym = None
    try:
        sym = get_symbol_and_offs(addr)
    except SymbolException:
        pass
    if sym is None or sym[1] != 0:
        return None
    return sym[0]
def resolve_entry(addr):
    # check if procedure is exported already
    export = get_export(addr)
    if not export is None:
        return export
    d = disasm(addr)
    rel_jmp_target = None
    cur_inst = None
    while rel_jmp_target is None        \
            or rel_jmp_target < 1000    \
            or cur_inst == 'call':
        # TODO: fix op2
        m = re.match(
            r'(?P<addr>[0-9a-f]{8})\s+' +
            r'(?P<bytecode>[0-9a-f]+)\s+' +
            r'(?P<inst>\S+)\s*' +
            r'(?:' +
                r'(?P<op1>[^,]+)' +
                r'(?:\s*,\s*' +
                    r'(?P<op2>.+)' +
                r')?' +
            r')?\s*' +
            r'(?:(?:e|c|s|d|f|g)s:[0-9a-f]{4}.*)?', # strip annotations
            d.disasm()
        )
        if m is None:
            raise RuntimeError('couldn\'t dissect instruction')
        cur_inst = m.group('inst')
        bytecode = []
        raw_bytecode = m.group('bytecode')
        for i in range(0, len(raw_bytecode), 2):
            bytecode.append(int(raw_bytecode[i:i + 2], 16))
        if bytecode[0] in (0xE9, 0xE8):
            inst_addr = int(m.group('addr'), 16)
            rel_jmp_target = int(
                ''.join(format(cur, '02x') for cur in bytecode[-1:0:-1]),
                16
            )
            jmp_target = inst_addr + 5 + int_overflow(rel_jmp_target)
            # do not follow API calls
            if rel_jmp_target < 1000:
                d = disasm(jmp_target)
        else:
            rel_jmp_target = None
        if bytecode[0] in (0xC2, 0xC3):
            raise RuntimeError('function is inlined')
    try:
        return get_symbol_and_offs(jmp_target)[0]
    except SymbolException:
        raise RuntimeError('couldn\'t resolve symbol (%s)' % (
            findSymbol(jmp_target)
        ))
start_addr = int(sys.argv[1], 0)
end_addr = int(sys.argv[2], 0)
iat_syms = []
try:
    dbgCommand('.symopt-2')
    dbgCommand('!sym quiet')  # mute kd's stupid symbols errors
    while start_addr < end_addr:
        addr = ptrDWord(start_addr)
        if addr != 0:
            try:
                name = resolve_entry(addr)
            except RuntimeError as e:
                print 'Unable to unobfuscate function at %08X: %s' % (
                    addr, str(e)
                )
            else:
                if not name is None:
                    print '%08X %s' % (start_addr, name)
                    iat_syms.append((start_addr, name.split('!')[1]))
                else:
                    print 'Unable to unobfuscate function at %08X' % (addr)
        start_addr += 4
finally:
    dbgCommand('.symopt+2')
if len(sys.argv) > 3:
    with open(sys.argv[3], 'w') as f:
        f.write('#include <idc.idc>\nstatic main() {')
        f.write('\n'.join('MakeName(0x%08X, "%s");' % x for x in iat_syms))
        f.write('}')
print 'Leaving ..'

Example usage:

ps
> .load pykd.pyd
> !py C:\my\script\path\fix_iat.py <IAT-start-addr> <IAT-end-addr> C:\my\fancy\out\path\batch.idc

After that, simply batch the generated IDC script into IDA using “File -> Script File” from the menu bar. The script also performs a limited amount of code-flow tracing that defeats Themida’s API-redirects if they are not completely inlined in the redirect stub (what is this Themida feature called again?). However, this only works because Themida currently only generates unconditional branches in these stubs. It’s also possible to completely defeat the API redirects, however this is more complex and beyond the scope of this article.

Update 16.Dec.2014

Fixed dumps_mods.py to also work with WinDBG x86.

Update 12.Jan.2015

Updated dump_mods.py to be more generic.