The idea to write self-modifying code

21 July 2020 — Written by Eiffel
#C#code

My memories are not that clear, but I remember hearing that self-modifying code was forbidden because it was used to write viruses. On the one hand, this is not entirely false: the data, heap and stack parts of an executable cannot be executed. Indeed, the pages backing these regions are specially marked so that the CPU refuses to execute them. Note that, on x86_64, this was made possible by the No eXecute (NX) bit introduced by AMD with the K8.

On the other hand, Just In Time (JIT) compilation can be viewed as self-modifying code. So, how do JIT compilers generate code on the fly and execute it despite the NX bit? They know what they are doing and they use mmap! With this function, it is possible to create an anonymous mapping which can be executed.

Today, I will present a toy snippet that writes and executes self-modifying code. The idea is to first define the code to execute. Then, we allocate memory with mmap, write the opcodes of the instructions into it, and finally jump to that memory address so the CPU decodes and executes them.

Code generation

The generated code will just set rax to a given value, say 42! First, we need to define it:

#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <string.h>

// This prefix is needed to use MOV with 64-bit registers.
#define REX_W 0x48

// This opcode translates to: mov rax,
#define MOV_RAX 0xb8

// Immediate value is 42
#define IMM_VALUE_0 0x2a
#define IMM_VALUE_1 0x00
#define IMM_VALUE_2 0x00
#define IMM_VALUE_3 0x00
#define IMM_VALUE_4 0x00
#define IMM_VALUE_5 0x00
#define IMM_VALUE_6 0x00
#define IMM_VALUE_7 0x00
#define IMM_VALUE IMM_VALUE_0

/*
 * This opcode permits returning from function.
 * We need this otherwise we will receive a SEGFAULT.
 */
#define RET 0xc3

/*
 * The whole opcode is 11 bytes long:
 * one for REX_W,
 * one for MOV_RAX,
 * eight for IMM_VALUE since it is on 64 bits
 * and one for RET.
 */
#define NR_BYTES 11

The code value is "48b82a00000000000000c3". It actually contains two opcodes, so two instructions:

  1. The first opcode is "48b82a00000000000000". It corresponds to the instruction mov rax, 42, which stores the value 42 into register rax. The opcode of this instruction can be divided into 3 parts:

    1. The first part is "48", which corresponds to the REX.W prefix. Section 3.1.1.1 of the Intel documentation tells us that we need to set it to use "old" instructions, like mov, with 64-bit registers.
    2. The second one is "b8". Page 4-35 of the documentation shows that this opcode must be used with REX.W and a 64-bit immediate.
    3. The last part, "2a00000000000000", is the 64-bit immediate, which translates to the value 42 in little endian.
  2. The second one is "c3". As shown on page 4-553 of the documentation, it translates to the ret instruction, which returns from a function. This part is really important because we will call the code as a function. Without it, execution would not return and would run past the end of our code into memory it is not allowed to access, which leads to a SEGFAULT.

The whole code is 11 bytes long, one for REX_W, one for MOV_RAX, eight for value and one for RET.
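As an illustration of this layout, here is a minimal sketch of how the same 11 bytes could be produced for any 64-bit value. The helper name emit_mov_rax_ret and the constant MOV_RAX_RET_LEN are hypothetical and are not part of the snippet from this post; the sketch only fills a buffer and does not execute anything.

```c
#include <stddef.h>
#include <stdint.h>

/* "mov rax, imm64" takes 10 bytes and "ret" takes 1 byte. */
#define MOV_RAX_RET_LEN 11

/*
 * Hypothetical helper (not from the original post): encode
 * "mov rax, value; ret" into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t emit_mov_rax_ret(unsigned char *buf, size_t cap, uint64_t value)
{
	size_t i = 0;

	if (cap < MOV_RAX_RET_LEN)
		return 0;

	buf[i++] = 0x48; /* REX.W prefix: 64-bit operand size */
	buf[i++] = 0xb8; /* MOV rax, imm64 */

	/* Immediate, least significant byte first (little endian). */
	for (int shift = 0; shift < 64; shift += 8)
		buf[i++] = (unsigned char)(value >> shift);

	buf[i++] = 0xc3; /* RET */

	return i;
}
```

Calling emit_mov_rax_ret(buf, sizeof(buf), 42) on an 11-byte buffer should fill it with the bytes of "48b82a00000000000000c3" described above.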

Create the mapping

Now that we have the code, we need to allocate memory with mmap. Here comes the rest of the code:

/**
 * Store opcode into mmaped memory and execute it.
 * @return EXIT_SUCCESS if the opcode was executed, EXIT_FAILURE if there was a
 * problem.
 */
int main(void){
	int ret;

	char *addr;

	long rax;

	unsigned int i;

	ret = EXIT_SUCCESS;

	if((addr = mmap(NULL, NR_BYTES, PROT_READ | PROT_WRITE | PROT_EXEC,
		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)) == MAP_FAILED){
		perror("mmap");

		ret = EXIT_FAILURE;

		goto end;
	}

	// Set RAX to zero.
	zero_rax();

	rax = read_rax();

	// RAX must be 0.
	if(rax != 0){
		fprintf(stderr, "Before code execution, rax must be 0 but is %ld\n", rax);

		ret = EXIT_FAILURE;

		goto clean;
	}

	printf("Before code execution, rax is: %ld\n", rax);

	i = 0;

	// Store instructions into memory.
	addr[i++] = REX_W;
	addr[i++] = MOV_RAX;
	addr[i++] = IMM_VALUE_0;
	addr[i++] = IMM_VALUE_1;
	addr[i++] = IMM_VALUE_2;
	addr[i++] = IMM_VALUE_3;
	addr[i++] = IMM_VALUE_4;
	addr[i++] = IMM_VALUE_5;
	addr[i++] = IMM_VALUE_6;
	addr[i++] = IMM_VALUE_7;
	addr[i++] = RET;

	// Cast this address to a function pointer and call it.
	((void (*)(void)) addr)();

	// Read RAX just after our code executed.
	rax = read_rax();

	// RAX must be 42.
	if(rax != IMM_VALUE){
		fprintf(stderr, "After code execution, rax must be %d but is %ld\n",
						IMM_VALUE, rax);

		ret = EXIT_FAILURE;

		goto clean;
	}

	printf("After code execution, rax is: %ld\n", rax);

clean:
	if(munmap(addr, NR_BYTES)){
		perror("munmap");

		ret = EXIT_FAILURE;
	}

end:
	return ret;
}

To save some space, I skip the definitions of the functions zero_rax, which sets rax to 0, and read_rax, which gets the register value. These functions use the asm volatile syntax that I already covered in a previous post.

First, we call mmap with the MAP_ANONYMOUS flag and the PROT_EXEC protection flag to get an anonymous executable mapping. After that, we write the opcodes into this mapping. Once the code is written, we cast our mapping to a function pointer and call it! The code executes and, once the function returns thanks to the RET instruction, rax contains 42! Finally, we remove the mapping with munmap before exiting.
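Reading rax right after the call works here, but the compiler is free to reuse rax between the call and the read. A possible variant, sketched below, relies on the System V x86_64 calling convention, where rax carries the integer return value: the mapping is called through a function pointer that returns a long. This is only a sketch assuming x86_64 Linux; the name call_generated_code is hypothetical, and the bytes are copied with memcpy from an array rather than written one by one.

```c
/* Ask glibc to expose MAP_ANONYMOUS even in strict ISO C mode. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif

#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the original post): copy
 * "mov rax, 42; ret" into an anonymous executable mapping, call it and
 * return the value it leaves in rax.
 * Assumes x86_64 Linux with the System V calling convention.
 * Returns -1 if the mapping cannot be created.
 */
static long call_generated_code(void)
{
	static const unsigned char code[] = {
		0x48, 0xb8,             /* REX.W + MOV rax, imm64 */
		0x2a, 0x00, 0x00, 0x00, /* immediate 42, little endian */
		0x00, 0x00, 0x00, 0x00,
		0xc3                    /* RET */
	};
	long result;
	void *addr;

	addr = mmap(NULL, sizeof(code), PROT_READ | PROT_WRITE | PROT_EXEC,
		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return -1;

	/* Store the instructions into the mapping. */
	memcpy(addr, code, sizeof(code));

	/* rax holds the return value, so the call expression yields it. */
	result = ((long (*)(void)) addr)();

	munmap(addr, sizeof(code));

	return result;
}
```

On a system that allows mappings which are both writable and executable, call_generated_code() should return 42, without any inline assembly to read the register.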

Before I conclude, I would like to address some problems with my code. The first is how I write the code to the mapping. The way I did it is not spotless, but it was the first, and dirty, idea that crossed my mind... A better solution could be to use a variadic function, as done in this code. The other problem is the cast from a data pointer to a function pointer: this conversion is not defined by the C standard, but compilers implement it.

To conclude, it is possible to write self-modifying code if you know what you are doing. This is actually what JIT compilers do. If you want to go further, I advise you to check this post and this repository.