Zydis v4.0 released!

20 Nov 2022

Today we are happy to announce the release of Zydis v4.0!

Simplified Disassembler API

We introduced new all-in-one functions for decoding and formatting an instruction in a single call, without the need to initialize a decoder or formatter up front. This simplifies using Zydis in cases where performance isn’t of utmost concern and you just want to quickly decode a few instructions.

c
// Decode and format instruction to human-readable assembly in one call.
ZydisDisassembledInstruction insn;
ZydisDisassembleIntel(
    /* machine_mode:    */ ZYDIS_MACHINE_MODE_LONG_64,
    /* runtime_address: */ 0,
    /* buffer:          */ "\x55",
    /* length:          */ 1,
    /* instruction:     */ &insn
);
assert(insn.info.mnemonic == ZYDIS_MNEMONIC_PUSH);
assert(strcmp(insn.text, "push rbp") == 0);

For those who prefer AT&T syntax, just replace Intel with ATT in the function name (ZydisDisassembleATT).

Encoding Instructions

Zydis now supports encoding instructions, allowing users to generate new code on the fly or to rewrite existing code with less effort than ever before.

Generate New Instructions

c
ZydisEncoderRequest req = 
{
    .machine_mode = ZYDIS_MACHINE_MODE_LONG_64,
    .mnemonic = ZYDIS_MNEMONIC_MOV,
    .operand_count = 2,
    .operands = 
    {
        {
            .type = ZYDIS_OPERAND_TYPE_REGISTER,
            .reg.value = ZYDIS_REGISTER_RAX,
        },
        {
            .type = ZYDIS_OPERAND_TYPE_IMMEDIATE,
            .imm.u = 0x1337,
        }
    }
};
ZyanU8 insn_bytes[ZYDIS_MAX_INSTRUCTION_LENGTH];
ZyanUSize insn_length = sizeof(insn_bytes);
ZydisEncoderEncodeInstruction(&req, insn_bytes, &insn_length);
assert(insn_length == 7);
assert(memcmp(insn_bytes, "\x48\xc7\xc0\x37\x13\x00\x00", 7) == 0);

This example code encodes a mov instruction from scratch, writing the instruction bytes into insn_bytes and placing the instruction length into insn_length.
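To see where the seven bytes in the assertion come from, the same encoding can be assembled by hand. The following is a minimal sketch, assuming the REX.W + C7 /0 form of mov (64-bit register, sign-extended 32-bit immediate); the helper encode_mov_r64_imm32 is hypothetical and not part of Zydis, whose encoder selects the encoding for you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of Zydis. Hand-assembles `mov r64, imm32`
 * using the REX.W + C7 /0 id form for the registers RAX..RDI, which have
 * the hardware indices 0..7. Returns the number of bytes written (always 7).
 * Note that the CPU sign-extends imm32 to 64 bits at execution time. */
static size_t encode_mov_r64_imm32(uint8_t reg_index, uint32_t imm, uint8_t out[7])
{
    assert(out != NULL);
    assert(reg_index < 8); /* R8..R15 would additionally need REX.B */

    out[0] = 0x48;                        /* REX prefix, W=1: 64-bit operand size */
    out[1] = 0xC7;                        /* opcode: MOV r/m64, imm32 */
    out[2] = (uint8_t)(0xC0 | reg_index); /* ModRM: mod=11, reg=0 (/0), rm=register */
    out[3] = (uint8_t)(imm & 0xFF);       /* imm32, stored little-endian */
    out[4] = (uint8_t)((imm >> 8) & 0xFF);
    out[5] = (uint8_t)((imm >> 16) & 0xFF);
    out[6] = (uint8_t)((imm >> 24) & 0xFF);
    return 7;
}
```

Calling the helper with register index 0 (RAX) and the immediate 0x1337 yields 48 C7 C0 37 13 00 00, the same bytes the example above checks for.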

Rewriting Code

c
// mov rcx, qword ptr ds:[0x1234]
ZyanU8 mov_bytes[] = "\x48\x8B\x0C\x25\x34\x12\x00\x00";
// Decode and print the original instruction.
ZydisDisassembledInstruction insn;
ZydisDisassembleIntel(ZYDIS_MACHINE_MODE_LONG_64, 0, mov_bytes, sizeof(mov_bytes), &insn);
assert(strcmp(insn.text, "mov rcx, [0x0000000000001234]") == 0);
// Convert the decoded instruction into an encoder request.
ZydisEncoderRequest req;
ZydisEncoderDecodedInstructionToEncoderRequest(
    &insn.info, insn.operands, insn.info.operand_count_visible, &req);
// Change a few things about it.
req.operands[0].reg.value = ZYDIS_REGISTER_RSI;
req.operands[1].mem.displacement += 0x10000;
// Encode the changed instruction.
ZyanU8 new_insn[ZYDIS_MAX_INSTRUCTION_LENGTH];
ZyanUSize new_insn_length = sizeof(new_insn);
ZydisEncoderEncodeInstruction(&req, new_insn, &new_insn_length);
// Disassemble and print new instruction again.
ZydisDisassembleIntel(ZYDIS_MACHINE_MODE_LONG_64, 0, new_insn, new_insn_length, &insn);
assert(strcmp(insn.text, "mov rsi, [0x0000000000011234]") == 0);

This example decodes a mov instruction, converts the decoded instruction into an encoder request, changes the register and displacement values, and finally encodes it back to machine code. The final disassembly step is included merely for demonstration purposes, showing that the instruction was indeed changed as requested.
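For comparison, here is a rough sketch of what the same rewrite looks like when done by hand on the raw bytes. It only handles the exact instruction shape from the example (REX.W, opcode 8B, ModRM with a SIB byte selecting an absolute disp32); the helper patch_mov_load_abs is hypothetical and not part of Zydis. The resulting bytes are one valid encoding of the changed instruction; this is not a claim that the Zydis encoder emits exactly these bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of Zydis. Patches an instruction of the exact
 * shape `mov r64, [disp32]` (REX.W, opcode 8B, ModRM mod=00 rm=100, SIB 0x25)
 * in place. Returns 1 on success and 0 if the bytes have a different shape. */
static int patch_mov_load_abs(uint8_t insn[8], uint8_t new_reg, uint32_t disp_delta)
{
    assert(insn != NULL);
    if (insn[0] != 0x48 || insn[1] != 0x8B) return 0; /* REX.W + MOV r64, r/m64 */
    if ((insn[2] & 0xC7) != 0x04) return 0;           /* mod=00, rm=100: SIB follows */
    if (insn[3] != 0x25) return 0;                    /* SIB: no index, no base, disp32 */
    if (new_reg > 7) return 0;                        /* R8..R15 would need REX.R */

    /* The destination register lives in ModRM bits 5..3. */
    insn[2] = (uint8_t)((insn[2] & 0xC7) | (new_reg << 3));

    /* disp32 is stored little-endian in the last four bytes. */
    uint32_t disp = (uint32_t)insn[4] | ((uint32_t)insn[5] << 8) |
                    ((uint32_t)insn[6] << 16) | ((uint32_t)insn[7] << 24);
    disp += disp_delta; /* wraps modulo 2^32, no overflow checking in this sketch */
    insn[4] = (uint8_t)(disp & 0xFF);
    insn[5] = (uint8_t)((disp >> 8) & 0xFF);
    insn[6] = (uint8_t)((disp >> 16) & 0xFF);
    insn[7] = (uint8_t)((disp >> 24) & 0xFF);
    return 1;
}
```

Every other addressing form, register class, or operand size would need its own special case here, which is the bookkeeping that the decode, modify, re-encode round trip takes off your hands.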

Split Operand Decoding

Zydis allows users to not only inspect explicit operands, but also implicit ones. Implicit operands are operands that are not printed in the human-readable assembly generated by the formatter, but are still inspected or changed by the CPU when the instruction is executed. A prominent example of an instruction with many implicit operands is pushad, which essentially pushes all general purpose registers onto the stack despite having zero explicit operands.

In v3, the ZydisDecodedInstruction structure contained a field

c
ZydisDecodedOperand operands[ZYDIS_MAX_OPERAND_COUNT];

where ZYDIS_MAX_OPERAND_COUNT was defined as 10, the worst-case assumption for the instruction with the maximum number of implicit operands. While this was convenient, it also caused significant avoidable bloat of the instruction structure, sometimes causing issues when it was allocated on the stack in environments where stack space is restricted (e.g. kernel threads). Oftentimes, users were not even interested in the visible operands, only wanting to inspect the mnemonic and instruction length in the majority of cases.

In Zydis v4, operand decoding is now optional:

c
ZyanU8 jmp_bytes[] = "\xE9\xAB\x00\x00\x00";
ZydisDecoder decoder;
ZydisDecoderInit(&decoder, ZYDIS_MACHINE_MODE_LONG_64, ZYDIS_STACK_WIDTH_64);
ZydisDecodedInstruction insn;
ZydisDecoderContext ctx;
ZydisDecoderDecodeInstruction(&decoder, &ctx, jmp_bytes, sizeof(jmp_bytes), &insn);
// Only decode operands if this is an instruction we're actually interested in.
if (insn.mnemonic == ZYDIS_MNEMONIC_JMP) 
{
    // Only decode visible operands.
    ZydisDecodedOperand operands[ZYDIS_MAX_OPERAND_COUNT_VISIBLE];
    ZydisDecoderDecodeOperands(&decoder, &ctx, &insn, operands, 
        ZYDIS_MAX_OPERAND_COUNT_VISIBLE);
        
    assert(operands[0].imm.value.s == 0xAB);
}

For users that are looking for a way to achieve something close to the previous behavior, the convenience function ZydisDecoderDecodeFull is offered:

c
ZyanU8 jmp_bytes[] = "\xE9\xAB\x00\x00\x00";
ZydisDecoder decoder;
ZydisDecoderInit(&decoder, ZYDIS_MACHINE_MODE_LONG_64, ZYDIS_STACK_WIDTH_64);
ZydisDecodedInstruction insn;
ZydisDecodedOperand operands[ZYDIS_MAX_OPERAND_COUNT];
ZydisDecoderDecodeFull(&decoder, jmp_bytes, sizeof(jmp_bytes), &insn, operands);
assert(insn.mnemonic == ZYDIS_MNEMONIC_JMP);
assert(operands[0].imm.value.s == 0xAB);
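The immediate 0xAB checked in both snippets is the rel32 field of the jmp, i.e. an offset relative to the end of the instruction rather than an absolute address. The sketch below shows, under that assumption, how the field is read from the raw bytes and turned into a branch target. The helper jmp_rel32_target is hypothetical; Zydis ships its own utility for this calculation (ZydisCalcAbsoluteAddress), which the sketch deliberately does not use so that it stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of Zydis. Computes the absolute target of a
 * 5-byte `jmp rel32` (opcode E9) located at runtime_address. */
static uint64_t jmp_rel32_target(const uint8_t insn[5], uint64_t runtime_address)
{
    assert(insn != NULL);
    assert(insn[0] == 0xE9); /* JMP rel32 */

    /* rel32 is stored little-endian directly after the opcode byte. */
    uint32_t raw = (uint32_t)insn[1] | ((uint32_t)insn[2] << 8) |
                   ((uint32_t)insn[3] << 16) | ((uint32_t)insn[4] << 24);

    /* Reinterpret as signed and widen, so backward jumps work as well. */
    int64_t rel = (int64_t)(int32_t)raw;

    /* The offset is relative to the address of the NEXT instruction. */
    return runtime_address + 5 + (uint64_t)rel;
}
```

For the bytes E9 AB 00 00 00 at runtime address 0, the target is 0xB0: the instruction length of 5 plus the offset 0xAB.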

Simplified Formatter API

The formatter API previously had an Ex variant of each function whose only difference was that it had an additional user_data argument. This resulted in unnecessary duplication and bloat of the public interface, so we decided to just add the user_data argument to the regular functions. Users that don’t wish to pass additional context to the formatter can simply pass NULL.
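The design choice can be illustrated with a small, generic sketch of the "single function with a user_data argument" pattern. All names below (print_hook, format_value, prefix_hook) are hypothetical and do not mirror the actual Zydis formatter signatures; the point is only that one entry point serves callers with and without extra context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical hook type, NOT a Zydis signature: receives the output buffer,
 * the value to print, and an opaque pointer supplied by the caller. */
typedef void (*print_hook)(char *buffer, size_t length, unsigned value, void *user_data);

/* Single entry point: user_data is always part of the signature, so no
 * separate "Ex" variant is needed. Falls back to plain hex without a hook. */
static void format_value(print_hook hook, char *buffer, size_t length,
                         unsigned value, void *user_data)
{
    assert(buffer != NULL && length > 0);
    if (hook != NULL)
        hook(buffer, length, value, user_data);
    else
        snprintf(buffer, length, "0x%X", value);
}

/* Example hook: treats user_data as an optional string prefix. */
static void prefix_hook(char *buffer, size_t length, unsigned value, void *user_data)
{
    const char *prefix = (user_data != NULL) ? (const char *)user_data : "";
    snprintf(buffer, length, "%s0x%X", prefix, value);
}
```

A caller without any context passes NULL as the last argument; a caller that needs context passes a pointer to it, and the hook decides how to interpret that pointer.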

Amalgamated Builds

We are now publishing amalgamated builds of the library for every version. These builds essentially combine all header files into a single Zydis.h and all source files into a single Zydis.c, making it very easy to build Zydis into your project by just copying these two files into it.

Amalgamated builds can also be created manually by running the assets/amalgamate.py script in the Zydis repository.

Porting Guide

Because v4 contains a range of breaking changes to the API, we offer a porting guide that explains the required changes to help make the migration process less painful.

Credits

A huge thanks goes to Mappa, who contributed pretty much the entire implementation of the instruction encoder!