Introduction

Previously I explored how to create and inject a DLL, but that relied on LoadLibrary for the injection. To better understand what happens behind the scenes, I decided to implement the loading manually. This has an added advantage: a manually mapped DLL is never registered in the process’s loaded-module list, which makes it a stealthier method when combined with other evasion techniques.

The full implementation is available on GitHub

Thoughts

In this post I want to cover some of the things I had a hard time understanding, in a Q&A format. I think a solid understanding of manual mapping helps a lot with reverse engineering, malware analysis, and game hacking.

Why does architecture mismatch (64-bit injector → 32-bit target) produce garbage PE values?

This was one of my biggest debugging struggles and a practical lesson for anyone building this. I didn’t separate the architecture checks at the beginning.

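When you compile the injector as 64-bit, IMAGE_NT_HEADERS and IMAGE_OPTIONAL_HEADER default to their 64-bit variants (IMAGE_NT_HEADERS64, IMAGE_OPTIONAL_HEADER64). The 32-bit optional header has a different layout: it carries an extra BaseOfData field, and ImageBase and the stack and heap size fields are 4 bytes wide instead of 8. Reading a 32-bit PE file through the 64-bit structures therefore yields a wrong ImageBase and shifts the data directories, so every directory lookup returns garbage.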
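A minimal sketch of the check I should have started with: read the optional header's Magic field from the raw file buffer before choosing which structures to use. The function name pe_bitness is my own for illustration, the offsets are the standard PE ones, and the code assumes a little-endian host.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define PE_MAGIC_32 0x10b  /* IMAGE_NT_OPTIONAL_HDR32_MAGIC */
#define PE_MAGIC_64 0x20b  /* IMAGE_NT_OPTIONAL_HDR64_MAGIC */

/* Hypothetical helper: returns 32 or 64 for a PE image, 0 if the buffer is not one. */
int pe_bitness(const uint8_t *buf, size_t size)
{
    uint32_t e_lfanew;
    uint16_t magic;

    if (size < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;

    /* e_lfanew sits at offset 0x3C of the DOS header. */
    memcpy(&e_lfanew, buf + 0x3C, sizeof e_lfanew);

    /* Need the 4-byte signature, the 20-byte file header and 2 bytes of Magic. */
    if (e_lfanew > size || size - e_lfanew < 4 + 20 + 2)
        return 0;
    if (memcmp(buf + e_lfanew, "PE\0\0", 4) != 0)
        return 0;

    /* Magic is the first field of the optional header in both variants. */
    memcpy(&magic, buf + e_lfanew + 4 + 20, sizeof magic);
    if (magic == PE_MAGIC_32) return 32;
    if (magic == PE_MAGIC_64) return 64;
    return 0;
}
```

With the result in hand, the injector can cast to IMAGE_NT_HEADERS32 or IMAGE_NT_HEADERS64 explicitly instead of relying on whatever the build defaults to.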

What is OriginalFirstThunk vs FirstThunk and why do we need both?

One of the trickiest parts of the shellcode is walking the import table to resolve functions. Both fields belong to an import descriptor, and both point to a list of the functions the DLL needs from one imported module. The difference is:

OriginalFirstThunk: read only, contains the function names/ordinals. This is what you read to know which functions to look up.

FirstThunk: the actual IAT (Import Address Table). This is where you write the resolved function addresses.
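To make the read-one, write-the-other pattern concrete, here is a simplified sketch for 64-bit thunks. It is not the code from my shellcode: resolve_thunks and fake_resolve are hypothetical names, and fake_resolve stands in for GetProcAddress so the example runs without Windows.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define ORDINAL_FLAG64 0x8000000000000000ULL  /* same value as IMAGE_ORDINAL_FLAG64 */

/* Hypothetical stand-in for GetProcAddress: returns a made-up address. */
uint64_t fake_resolve(const char *name, uint16_t ordinal)
{
    return name ? 0x2000 + strlen(name) : 0x1000 + (uint64_t)ordinal;
}

typedef uint64_t (*resolve_fn)(const char *name, uint16_t ordinal);

/*
 * image  - base of the image; RVAs are offsets from it
 * lookup - array OriginalFirstThunk points to (read only)
 * iat    - array FirstThunk points to (receives the resolved addresses)
 * Returns the number of imports resolved.
 */
int resolve_thunks(const uint8_t *image, const uint64_t *lookup,
                   uint64_t *iat, resolve_fn resolve)
{
    int count = 0;
    for (; lookup[count] != 0; ++count) {   /* a zero entry ends the list */
        uint64_t entry = lookup[count];
        if (entry & ORDINAL_FLAG64) {
            /* Import by ordinal: the low 16 bits hold the ordinal. */
            iat[count] = resolve(NULL, (uint16_t)(entry & 0xFFFF));
        } else {
            /* Import by name: entry is an RVA to { uint16_t Hint; char Name[]; }. */
            const char *name = (const char *)(image + entry + 2);
            iat[count] = resolve(name, 0);
        }
    }
    return count;
}
```

The lookup array is never modified, which is why the names are still available after the IAT has been overwritten with addresses.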

Why do we need shellcode at all? — Why can’t the injector just fix imports and relocations itself before writing to the target?

The shellcode runs inside the target process, where it can resolve imports and call DllMain. Both operations must happen in-process: if the injector called LoadLibrary or DllMain itself, the modules would load and the code would run in the injector’s own address space, not the target’s. In regular DLL injection, LoadLibrary does all of this automatically inside the target.

Why does the shellcode need LoadLibraryA and GetProcAddress passed to it? — Why can’t it just call them directly?

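The shellcode has no import table of its own, so it cannot call any API by name. The injector passes in pointers to LoadLibraryA and GetProcAddress explicitly, and the shellcode uses them to load dependencies and resolve all other function addresses. The injector can take these two addresses from its own process because kernel32.dll is normally mapped at the same base address in every process of the same architecture until the next reboot.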
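A rough sketch of the idea, with simplified types so it compiles anywhere. The struct mapping_data, the function stub_resolve and the two fake_ functions are hypothetical names for illustration; the real signatures use HMODULE, LPCSTR, FARPROC and the WINAPI calling convention.

```c
#include <stddef.h>

/* Simplified stand-ins for the LoadLibraryA and GetProcAddress signatures. */
typedef void *(*load_library_fn)(const char *module_name);
typedef void *(*get_proc_address_fn)(void *module, const char *proc_name);

/* Hypothetical parameter block the injector writes into the target. */
typedef struct {
    load_library_fn     pLoadLibraryA;
    get_proc_address_fn pGetProcAddress;
    void               *pImageBase;
} mapping_data;

/* Loader stub: it only calls through its argument, so it needs no imports. */
void *stub_resolve(const mapping_data *data, const char *module, const char *proc)
{
    void *mod = data->pLoadLibraryA(module);
    if (mod == NULL)
        return NULL;
    return data->pGetProcAddress(mod, proc);
}

/* Fakes for demonstration only, so the sketch runs without Windows. */
static int fake_module;
void *fake_load_library(const char *name) { return name ? &fake_module : NULL; }
void *fake_get_proc(void *module, const char *proc) { (void)proc; return module; }
```

Everything the stub needs arrives through one pointer, which is what lets the same bytes run at whatever address they were written to in the target.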

Why do relocations need to be applied at all? — What actually breaks if you skip them?

Relocations are needed when the DLL cannot be loaded at its preferred base address. The compiler bakes absolute addresses into the binary assuming it loads at ImageBase. If it ends up anywhere else, those hardcoded addresses are wrong and the DLL will crash. The relocation table records every location that needs to be patched. The amount to add is the same for all of them: the difference (delta) between the actual base and ImageBase.
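The arithmetic behind a single patch is small enough to show on its own. This is an illustrative helper (rebase_address is a name I made up), not part of the loader:

```c
#include <stdint.h>

/* Rebases one absolute address that was stored assuming preferred_base. */
uint64_t rebase_address(uint64_t stored, uint64_t preferred_base, uint64_t actual_base)
{
    /* Unsigned wraparound keeps this correct when actual_base < preferred_base. */
    uint64_t delta = actual_base - preferred_base;
    return stored + delta;
}
```

For example, with a preferred base of 0x180000000 and an actual base of 0x7FF612340000, a stored pointer of 0x180001234 becomes 0x7FF612341234: the offset 0x1234 into the image is preserved and only the base changes.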

Why do we need RVA-to-file-offset conversion? — Why isn’t the RVA just usable directly in the file buffer?

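The section header, abridged from its winnt.h definition:

typedef struct _IMAGE_SECTION_HEADER {
    BYTE  Name[8];              // e.g. ".text", ".data"
    DWORD VirtualSize;          // size of the section once loaded in memory
    DWORD VirtualAddress;       // RVA of the section once loaded in memory
    DWORD SizeOfRawData;        // size of the section's data in the file
    DWORD PointerToRawData;     // file offset of the section's data
    // ... relocation and line-number fields ...
    DWORD Characteristics;      // flags: code, readable, writable, executable, ...
} IMAGE_SECTION_HEADER;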

The file buffer is not laid out the way the image is in memory. On disk each section starts at its PointerToRawData, while an RVA is measured against the in-memory layout, where the section starts at its VirtualAddress. The two usually differ, so an RVA cannot be used as an index into the file buffer.

The data directory gives us an RVA but doesn’t tell us which section contains it, so we loop through all section headers, comparing the RVA against each section’s VirtualAddress range until we find the one that owns it. In this case we pass in the relocation data directory’s RVA, which the loop finds inside the .reloc section. Once found, PointerToRawData tells us where the section starts in the file, but not where our specific data is within it. rva - VirtualAddress gives us the relative position, that is, how many bytes into the section our data is. That distance is the same whether you measure from VirtualAddress or PointerToRawData, so adding it to PointerToRawData gives the exact file offset.
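The lookup described above can be sketched like this. The section_info struct is a cut-down stand-in for IMAGE_SECTION_HEADER and rva_to_file_offset is a hypothetical name; returning 0 for "not found" and falling back to SizeOfRawData when VirtualSize is 0 are my own defensive assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Only the IMAGE_SECTION_HEADER fields the conversion needs. */
typedef struct {
    uint32_t VirtualSize;
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
} section_info;

/* Returns the file offset for rva, or 0 if no section contains it. */
uint32_t rva_to_file_offset(uint32_t rva, const section_info *sections, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        const section_info *s = &sections[i];
        /* Assumption: if VirtualSize is 0, use the raw size as the range. */
        uint32_t span = s->VirtualSize ? s->VirtualSize : s->SizeOfRawData;
        if (rva >= s->VirtualAddress && rva - s->VirtualAddress < span) {
            /* Same distance into the section, measured from the file start. */
            return s->PointerToRawData + (rva - s->VirtualAddress);
        }
    }
    return 0;
}
```

With a .reloc section at VirtualAddress 0x3000 and PointerToRawData 0x1200, the RVA 0x3010 maps to file offset 0x1210.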

Why do we process relocations before copying sections to the target process?

The .reloc section contains a list of locations inside other sections that hold hardcoded absolute addresses baked in by the compiler. Those addresses assume the DLL loads at its preferred ImageBase. If the memory allocated in the target process is at a different address, those hardcoded values are all wrong and the DLL will crash. We patch them in pSrcData first, while the image is still in the injector’s own memory, then copy the corrected data to the target. If we copied first, the wrong addresses would already be in the target process, and fixing them there would take extra remote reads and writes.

Why does blockVA + offset point into other sections and not .reloc itself?

Because .reloc is purely a map — it describes locations inside .text, .data, .rdata etc. where the compiler baked in absolute addresses. The patching happens at those locations, not inside .reloc. You read from .reloc to know what to fix, then write to wherever the entry points:

Reading:  pRelocData (inside .reloc)     ← where the map lives
Writing:  pSrcData + blockVA + offset    ← inside .text, .data etc.
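Putting the read side and the write side together, a block walk could look roughly like the sketch below. It is a simplified model rather than my actual loader: apply_relocations is a hypothetical name, the image buffer is assumed to be indexed by RVA, the host is assumed little-endian, and the caller is trusted to pass relocation data whose targets lie inside the image.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define REL_BASED_ABSOLUTE 0   /* padding entry, skipped */
#define REL_BASED_HIGHLOW  3   /* patch a 32-bit address */
#define REL_BASED_DIR64    10  /* patch a 64-bit address */

/*
 * image - buffer indexed by RVA (assumption for this sketch)
 * reloc - relocation data; reloc_size is its length in bytes
 * delta - actual base minus preferred ImageBase
 * Returns the number of addresses patched.
 */
int apply_relocations(uint8_t *image, const uint8_t *reloc, size_t reloc_size, uint64_t delta)
{
    int patched = 0;
    size_t pos = 0;

    while (pos + 8 <= reloc_size) {
        uint32_t block_va, block_size;
        memcpy(&block_va, reloc + pos, 4);        /* block's VirtualAddress */
        memcpy(&block_size, reloc + pos + 4, 4);  /* block's SizeOfBlock */
        if (block_size < 8 || block_size > reloc_size - pos)
            break;

        size_t entries = (block_size - 8) / 2;    /* 16-bit entries follow the header */
        for (size_t i = 0; i < entries; ++i) {
            uint16_t entry;
            memcpy(&entry, reloc + pos + 8 + i * 2, 2);
            uint16_t type = entry >> 12;          /* high 4 bits */
            uint16_t offset = entry & 0x0FFF;     /* low 12 bits */
            uint8_t *target = image + block_va + offset;

            if (type == REL_BASED_DIR64) {
                uint64_t value;
                memcpy(&value, target, 8);
                value += delta;
                memcpy(target, &value, 8);
                ++patched;
            } else if (type == REL_BASED_HIGHLOW) {
                uint32_t value;
                memcpy(&value, target, 4);
                value += (uint32_t)delta;
                memcpy(target, &value, 4);
                ++patched;
            }
            /* REL_BASED_ABSOLUTE and other types are left untouched here. */
        }
        pos += block_size;
    }
    return patched;
}
```

Note that reloc is only ever read and image is only ever written, which mirrors the Reading/Writing split above.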

Conclusion

I made sure to implement this manually (with a little co-pilot assistance) and understand what’s going on at each step. Although I still had questions when writing this article, working through them gave me a much better understanding of how the Windows loader works under the hood.
