Friday, September 16, 2011

Using APCs to inject your DLL

A few posts ago I discussed remote threads on Windows 7. That topic was aimed at the goal of injecting one's DLL into other processes.

There are several ways of injecting your code into another process:

  • Via SetWindowsHookEx
  • By using the AppInit_DLLs value in the registry
  • With CreateRemoteThread & NtCreateThreadEx
  • You could also do it from a driver by replacing the main thread's entry point with your shell code.

Those techniques are increasingly difficult to put into practice. They do work, but sometimes that's not enough, or a technique may not work on some OS versions. For instance, CreateRemoteThread is unreliable on Windows 7: applications and services run in different sessions, and the call tends to fail when the target process lives in another session.

Hooking NtCreateThread in a driver and injecting your DLL that way won't work on Windows 7 either, since NtCreateProcess no longer goes through NtCreateThread. Not to mention that some of this won't carry over to 64-bit Windows. For instance, NtCreateThreadEx() doesn't use the same structure in 32-bit and 64-bit builds. It took me half a day to figure out the proper way of injecting a DLL on Win64.

That time could have been saved if I had used a cleaner way of injecting my DLL.

Enter the APC.

In this article you can see how APCs are used in Windows (2000, XP, 7) so I won't go over it again.

The basic idea is that in order to inject our DLL, we will use an APC and queue it for the process. Quite obviously, this has to be done in a driver.

When the target program starts, our driver can be notified via a callback (see PsSetCreateProcessNotifyRoutine), and it can also be notified whenever a module loads (see PsSetLoadImageNotifyRoutine).

Each time the module-load callback is called, we check whether the module is NTDLL.DLL, since it is the first DLL that is automatically loaded for every process on the system.
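Here is a minimal sketch of that name check, written as portable C so it can be tried outside a driver. In the real callback the image name arrives as a counted UTF-16 string (a UNICODE_STRING); the sketch takes the raw buffer and its length in bytes instead. The names is_ntdll_image, fold_ascii and widen_ascii are hypothetical, and matching on the last path component is an assumption. Note that it would also match the 32-bit NTDLL that is mapped from SysWOW64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen an ASCII string into UTF-16 code units.
 * Returns the number of code units written (no terminator is added). */
static size_t widen_ascii(const char *src, uint16_t *dst, size_t cap)
{
    size_t n = 0;
    while (n < cap && src[n] != '\0') {
        dst[n] = (uint16_t)(unsigned char)src[n];
        n++;
    }
    return n;
}

/* Fold ASCII upper-case letters to lower case; other code units pass through. */
static uint16_t fold_ascii(uint16_t c)
{
    return (c >= 'A' && c <= 'Z') ? (uint16_t)(c + ('a' - 'A')) : c;
}

/* Returns 1 when the last path component of the counted UTF-16 string
 * (length in bytes, as in UNICODE_STRING.Length) is "ntdll.dll",
 * compared case-insensitively; returns 0 otherwise. */
static int is_ntdll_image(const uint16_t *name, uint16_t length_bytes)
{
    static const char target[] = "ntdll.dll";
    const size_t target_len = sizeof(target) - 1;
    size_t count, start, i;

    if (name == NULL)
        return 0;
    count = length_bytes / sizeof(uint16_t);
    if (count < target_len)
        return 0;
    start = count - target_len;
    /* The match must begin the string or come right after a backslash. */
    if (start != 0 && name[start - 1] != '\\')
        return 0;
    for (i = 0; i < target_len; i++) {
        if (fold_ascii(name[start + i]) != (uint16_t)target[i])
            return 0;
    }
    return 1;
}
```

In a driver you would pass FullImageName->Buffer and FullImageName->Length from the load-image callback, after checking that the name pointer is not NULL.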

Another reason to wait for NTDLL to be loaded is that we can then parse its PE headers and find the user-mode address of LdrLoadDll. You could resolve LoadLibraryA in a user-mode helper (GetProcAddress on kernel32.dll) and pass the address down to your driver, but that address is only known to be valid in the helper's own process, and with ASLR this could potentially cause problems.

So, in the callback, we wait for NTDLL to load and then we obtain the address of LdrLoadDll.

Here's the code to find out the address of a function in a given DLL.

/**
 * This function is like GetProcAddress().
 * ImageBase is the address of the mapped DLL
 * ImageSize is the size of the DLL
 * FunctionName is the API we are looking for
 *
 * Before looking through the PE headers, we probe and lock the image's pages
 * with an MDL, because the image may not be fully resident yet.
 */

// PtrFromRva is not shown in the original listing; this is assumed to be the
// usual helper that turns a base address plus an RVA into a pointer.
#ifndef PtrFromRva
#define PtrFromRva(base, rva) ((PVOID)((PUCHAR)(base) + (rva)))
#endif

PVOID GetProcAddress(PVOID ImageBase, DWORD ImageSize, const char* FunctionName)
{
    PVOID pFunc = NULL;
    PVOID LoadAddress;
    PIMAGE_DOS_HEADER DosHeader;
    PIMAGE_NT_HEADERS NtHeader;
    PIMAGE_DATA_DIRECTORY ExportDataDir;
    PIMAGE_EXPORT_DIRECTORY ExportDirectory;
    PULONG FunctionRvaArray;
    PUSHORT OrdinalsArray;
    PULONG NamesArray;
    ULONG Index;
    PMDL vMem;

    vMem = IoAllocateMdl(ImageBase, ImageSize, FALSE, FALSE, NULL);
    if (vMem == NULL)
    {
        DbgPrint("Unable to allocate MDL\n");
        return NULL;
    }

    __try {
        MmProbeAndLockPages(vMem, UserMode, IoReadAccess);
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        IoFreeMdl(vMem);
        DbgPrint("Unable to read memory\n");
        return NULL;
    }

    LoadAddress = MmGetMdlVirtualAddress(vMem);
    DosHeader = (PIMAGE_DOS_HEADER) LoadAddress;

    //
    // Peek into PE image to obtain exports. LoadAddress is a user-mode
    // address, so every access stays inside the __try block, and __leave is
    // used instead of return so that the MDL is always released below.
    //
    __try {
        if (IMAGE_DOS_SIGNATURE != DosHeader->e_magic)
            __leave;    // Not a PE image.

        NtHeader = (PIMAGE_NT_HEADERS) PtrFromRva(DosHeader, DosHeader->e_lfanew);
        if (IMAGE_NT_SIGNATURE != NtHeader->Signature)
            __leave;    // Unrecognized image format.

        ExportDataDir = &NtHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
        ExportDirectory = (PIMAGE_EXPORT_DIRECTORY) PtrFromRva(LoadAddress, ExportDataDir->VirtualAddress);

        if (ExportDirectory->AddressOfNames == 0 ||
            ExportDirectory->AddressOfFunctions == 0 ||
            ExportDirectory->AddressOfNameOrdinals == 0)
            __leave;    // This module does not have any exports.

        FunctionRvaArray = (PULONG) PtrFromRva(LoadAddress, ExportDirectory->AddressOfFunctions);
        OrdinalsArray = (PUSHORT) PtrFromRva(LoadAddress, ExportDirectory->AddressOfNameOrdinals);
        NamesArray = (PULONG) PtrFromRva(LoadAddress, ExportDirectory->AddressOfNames);

        for (Index = 0; Index < ExportDirectory->NumberOfNames; Index++)
        {
            //
            // The ordinal table holds zero-based indexes into the function
            // table, so there is no need to add and subtract the Base.
            //
            USHORT Ordinal = OrdinalsArray[Index];
            ULONG FuncRva;

            if (Ordinal >= ExportDirectory->NumberOfFunctions)
                continue;   // Malformed entry.

            FuncRva = FunctionRvaArray[Ordinal];

            if (FuncRva >= ExportDataDir->VirtualAddress &&
                FuncRva < ExportDataDir->VirtualAddress + ExportDataDir->Size)
                continue;   // It is a forwarder.

            //
            // It is an export. PtrFromRva keeps the arithmetic pointer-sized,
            // so the address is not truncated on 64-bit.
            //
            if (strcmp((const char*) PtrFromRva(LoadAddress, NamesArray[Index]), FunctionName) == 0)
            {
                pFunc = PtrFromRva(LoadAddress, FuncRva);
                break;
            }
        }
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        pFunc = NULL;
        DbgPrint("Unable to parse the export table\n");
    }

    MmUnlockPages(vMem);
    IoFreeMdl(vMem);

    return pFunc;
}

By now, your callback knows that NTDLL has been loaded and, with the help of the code above, we have obtained the address of LdrLoadDll.

Notice that the function above begins by locking the DLL's pages in memory through an MDL. The DLL is loaded but may or may not be fully resident in memory. Skipping this step can result in not finding the API we want, since the PE headers may not be in memory yet.
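The listing trusts every RVA it reads from the headers. A minimal sketch of a stricter variant is shown here in portable C, assuming you want to reject RVAs that fall outside the image before turning them into pointers. The names ptr_from_rva_checked and is_forwarder_rva are hypothetical; the forwarder test is the same range check the listing performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: translate an RVA into a pointer inside an image of
 * image_size bytes, making sure that `need` bytes starting at the RVA fit.
 * Returns NULL when the range would fall outside the image. */
static const void *ptr_from_rva_checked(const void *base, uint32_t image_size,
                                        uint32_t rva, uint32_t need)
{
    if (base == NULL)
        return NULL;
    /* Written this way so that rva + need can never overflow. */
    if (rva > image_size || need > image_size - rva)
        return NULL;
    return (const uint8_t *)base + rva;
}

/* An export whose RVA points back into the export directory is a forwarder
 * string, not code. Returns 1 for a forwarder, 0 otherwise. */
static int is_forwarder_rva(uint32_t func_rva, uint32_t dir_rva, uint32_t dir_size)
{
    return func_rva >= dir_rva && func_rva - dir_rva < dir_size;
}
```

With helpers like these, each table lookup in the export walk can bail out on a NULL result instead of relying only on the exception handler.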

Now, we need to prepare for the APC.
First, we need a dummy Kernel APC routine since it is required by the queuing call.

/**
  * This is dummy Kernel APC Routine
  *
  */
VOID KernelApcRoutine (
    IN PKAPC Apc,
    IN PKNORMAL_ROUTINE *NormalRoutine,
    IN PVOID *NormalContext,
    IN PVOID *SystemArgument1,
    IN PVOID *SystemArgument2)
{
    UNREFERENCED_PARAMETER( SystemArgument1 );
    UNREFERENCED_PARAMETER( SystemArgument2 );

    DbgPrint("User APC is being delivered - Apc: %p\n", Apc);
    if (PsIsThreadTerminating( PsGetCurrentThread() ))
    {
        *NormalRoutine = NULL;
    }

    ExFreePoolWithTag(Apc, DIRECT_KERNEL_ALLOC_TAG);
}


That code will be called when the APC is processed and it will delete the memory allocated for the APC object.

Next, we need a function to create the APC and queue it. First our new function needs to allocate some memory in the target process. Since the callback is called in the context of the target, we can simply use NtCurrentProcess() to specify what process the memory will be allocated into.

...
    ZwAllocateVirtualMemory(NtCurrentProcess(),
                            &context,
                            0,
                            &contextSize,
                            MEM_COMMIT, PAGE_READWRITE);

...

The context is a structure that you define yourself. You can store the address of LdrLoadDll inside of it, as well as the name of the DLL you want to inject.
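As a rough illustration, here is one possible shape for that context, written as portable C with stand-in types. Everything in it is an assumption: the names SKETCH_CONTEXT and sketch_context_init, the field order, and the 260-unit path capacity are all hypothetical. The one design point it tries to show is that the DLL path is stored inside the context itself, so the user-mode routine never follows a pointer into driver memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_PATH 260   /* assumed capacity, in UTF-16 code units */

/* Hypothetical context layout. Everything the user-mode routine needs lives
 * inside this one block, which is allocated in the target process. */
typedef struct SKETCH_CONTEXT {
    void    *LdrLoadDll;                 /* user-mode address found in NTDLL */
    void    *ModuleHandle;               /* receives the loaded module handle */
    uint16_t DllPath[SKETCH_MAX_PATH];   /* UTF-16 path, zero-terminated */
} SKETCH_CONTEXT;

/* Fill a context. Returns 0 on success, -1 if an argument is missing or the
 * path plus its terminator does not fit. */
static int sketch_context_init(SKETCH_CONTEXT *ctx, void *ldr_load_dll,
                               const uint16_t *path, size_t path_units)
{
    if (ctx == NULL || ldr_load_dll == NULL || path == NULL)
        return -1;
    if (path_units >= SKETCH_MAX_PATH)
        return -1;
    memset(ctx, 0, sizeof(*ctx));
    ctx->LdrLoadDll = ldr_load_dll;
    memcpy(ctx->DllPath, path, path_units * sizeof(uint16_t));
    ctx->DllPath[path_units] = 0;
    return 0;
}
```

In the driver, contextSize would then be sizeof of this structure, and the fill step would run on the memory returned by ZwAllocateVirtualMemory.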

Then you need to allocate memory for the APC object, which you can do using:

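    apc = ExAllocatePoolWithTag(NonPagedPool, sizeof(KAPC), DIRECT_KERNEL_ALLOC_TAG);

The tag has to be the same one that the kernel APC routine passes to ExFreePoolWithTag().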

Now, you must initialize the APC, which you do with the following call:

...
     KeInitializeApc(apc,
                     KeGetCurrentThread(),
                     OriginalApcEnvironment,
                     (PKERNEL_ROUTINE) KernelApcRoutine,
                     NULL,                                   // no rundown routine
                     (PKNORMAL_ROUTINE) InjectionShellCode,  // user mode routine
                     UserMode,
                     context);                               // passed to the user mode routine
...

The context is the one that we just allocated and which will be passed as a parameter to the user mode routine.

For the actual routine, you can go two ways. You can write some shell code in assembly and store the opcodes in a byte array, or you can write a function in your own source code. Either way, the code ends up copied into memory allocated in the target process, so if you write a function inside your source code, you will have to make sure that it does not call any Windows APIs.

The only thing your function should do before calling LdrLoadDll is create a UNICODE string manually (meaning, no call to RtlUnicodeStringInit()).

Once the APC is initialized, you can queue it using:

...
     KeInsertQueueApc( apc, NULL, NULL, 0);
...

Here's how LdrLoadDll is called:

...
     pfnLdrLoadDll(NULL,      // No search path
                   0,         // No flags
                   &pDLL,     // full path here as a unicode string
                   &handle);
...
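To make the shape of the user mode routine concrete, here is a minimal sketch in portable C with stand-in types, so it can be tried outside Windows. All SKETCH_ and sketch_ names are hypothetical, the context layout is an assumption, and the real LdrLoadDll uses the NTAPI calling convention and native types. The routine builds the counted string by hand and then makes a single indirect call, which is all it is allowed to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for UNICODE_STRING, so that the sketch builds anywhere. */
typedef struct SKETCH_UNICODE_STRING {
    uint16_t  Length;          /* bytes, terminator not counted */
    uint16_t  MaximumLength;   /* bytes, terminator counted */
    uint16_t *Buffer;
} SKETCH_UNICODE_STRING;

/* Same parameter order as the LdrLoadDll call shown above. */
typedef long (*SKETCH_LDR_LOAD_DLL)(uint16_t *search_path, uint32_t *flags,
                                    SKETCH_UNICODE_STRING *name, void **handle);

/* Hypothetical context: loader address, output handle, zero-terminated path. */
typedef struct SKETCH_APC_CONTEXT {
    SKETCH_LDR_LOAD_DLL LdrLoadDll;
    void               *ModuleHandle;
    uint16_t            DllPath[260];
} SKETCH_APC_CONTEXT;

/* The user mode routine: no library calls, only a loop and one indirect call. */
static void sketch_apc_routine(void *normal_context, void *arg1, void *arg2)
{
    SKETCH_APC_CONTEXT *ctx = (SKETCH_APC_CONTEXT *)normal_context;
    SKETCH_UNICODE_STRING name;
    uint16_t units = 0;

    (void)arg1;
    (void)arg2;

    /* Count code units by hand instead of calling a string-init API. */
    while (units < 259 && ctx->DllPath[units] != 0)
        units++;

    name.Length = (uint16_t)(units * 2);
    name.MaximumLength = (uint16_t)(name.Length + 2);
    name.Buffer = ctx->DllPath;

    ctx->LdrLoadDll(NULL, NULL, &name, &ctx->ModuleHandle);
}

/* Fake loader, only for exercising the routine outside Windows: it records
 * the length it was given and hands back the buffer address as the handle. */
static uint16_t sketch_seen_length;

static long sketch_fake_loader(uint16_t *search_path, uint32_t *flags,
                               SKETCH_UNICODE_STRING *name, void **handle)
{
    (void)search_path;
    (void)flags;
    sketch_seen_length = name->Length;
    *handle = name->Buffer;
    return 0;
}
```

Pointing ctx.LdrLoadDll at sketch_fake_loader and calling sketch_apc_routine(&ctx, NULL, NULL) exercises the same path that the APC dispatcher would take with the real loader address.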

Basically, what will happen is the following:

  1. Process is created
  2. Module Load Callback is called
  3. Callback checks if the module is NTDLL.DLL
  4. Callback retrieves the address of LdrLoadDll() from NTDLL.DLL
  5. Allocate the APC user mode routine context
  6. Allocate memory in the target process to hold the shellcode
  7. Allocate memory for the APC
  8. Initialize the APC (user mode routine points to the shell code)
  9. Queue the APC
  10. The rest of the modules get loaded and your APC gets processed
  11. The user mode routine loads the DLL using LdrLoadDll
  12. Main thread is created
  13. Process starts

Hopefully, all this should get you going.

Happy coding!

6 comments:

  1. That's a really interesting post. Could you share the source for that?

  2. I wrote the code while working at a previous company, so the code is theirs. I might have some personal code that I kept around and I'll look for it.
    There is a very interesting article that I read at the time and that helped me a lot: http://www.rsdn.ru/article/baseserv/InjectDll.xml

    If you don't speak Russian, you can translate via Google (although via Bing works a bit better). This should give you a rough idea and there is some code in there as well.

  3. Hey Manu, don't know if you're still active on this blog but I wanted to know: is there any alternative to resorting to shellcode for the NormalRoutine? Just adding a function in the code of your driver will obviously put it at a kernel address. I don't really get the point, in papers I've read, of how malware can "cleanly" inject a DLL from kernel mode using APCs. This is anything but clean. The Russian blog post you referenced uses an interesting technique, where he sets up his NormalRoutine in a DLL, exports it, and retrieves it, but this is also far from clean (it requires at least 2 external DLLs). Was just curious whether that or shellcode are the only options for the NormalRoutine.

  4. The shellcode is only necessary for 32bit processes. For 64bit processes, your APC routine can be written directly in your driver code:

    64bit routine:
    /* User mode routine
     * This is allocated in the process' memory and copied
     */
    void NTAPI ApcLoadDLL(LPLDR_CONTEXT ctx, PVOID SystemArgument1, PVOID SystemArgument2) {
        HANDLE Module = NULL;

        UNREFERENCED_PARAMETER(SystemArgument1);
        UNREFERENCED_PARAMETER(SystemArgument2);

        ctx->LdrLoadDll(NULL, 0, &ctx->dllPath, &Module);
    }

    32bit routine:

    UCHAR x86shellCode[] = {
    //"\xcc" // Break Point
    "\x55" // push ebp
    "\x8b\xec" // mov ebp, esp
    "\x8b\x45\x08" // mov eax, ULONG32 ptr [ebp+8]
    "\x83\xc0\x0c" // add eax, 0Ch
    "\x8b\xf4" // mov esi, esp
    "\x50" // push eax
    "\x8b\x4d\x08" // mov ecx, ULONG32 ptr [ebp+8]
    "\x83\xc1\x04" // add ecx, 4
    "\x51" // push ecx
    "\x6a\x00" // push 0
    "\x6a\x00" // push 0
    "\x8b\x55\x08" // mov edx, ULONG32 ptr [ebp+8]
    "\x8b\x42\x10" // mov eax, ULONG32 ptr [edx+10h]
    "\xff\xd0" // call eax Note: No need to clean the stack after the call
    "\x5d" // pop ebp
    "\xc3" // ret
    "\x90\x90\x90" // NOP
    };

    The code will run in the context of the process (in user mode obviously). In 64bit you can just copy the code verbatim (RtlCopyMemory() works well for that or just memcpy()).

    Since the driver is compiled in 64bit, you can't just use that for the 32bit normal routine: the copied function is 64bit machine code, and a 32bit (WOW64) process will interpret it incorrectly.

    Essentially, you have no choice but having shell code as the normal routine. I agree that it's not clean per se but CreateRemoteThread() works more or less the same way.

    Replies
    1. Thanks for your reply. I was doing it on an x64 system and my NormalRoutine looks almost identical to yours. I would also RtlCopyMemory it to an allocated region in the target process. But every time it is about to get executed, the real ntdll!LdrLoadDll would cause a memory access violation. Well, I made some adjustments and now LdrLoadDll runs, but the process doesn't start (I'm doing all of this in an ImageCallback notify routine). The callback looks like this:

      if (Pid) {
          if (ImageName->Buffer && ImageName->Length > 2) {
              UNICODE_STRING Ntdll = RTL_CONSTANT_STRING(L"\\SystemRoot\\System32\\ntdll.dll");
              if (RtlEqualUnicodeString(ImageName, &Ntdll, TRUE)) {
                  auto ProcessName = PsGetProcessImageFileName(IoGetCurrentProcess());
                  if (strcmp(ProcessName, "notepad.exe") != 0)
                      return;

                  DWORD Rva;
                  Status = FILEOPS::GetNtdllFnAddress("LdrLoadDll", &Rva);
                  if (!NT_SUCCESS(Status))
                      return;

                  PLdrContext UserContext {};
                  SIZE_T szLdrContext { sizeof(LdrContext) };
                  Status = ZwAllocateVirtualMemory(NtCurrentProcess(),
                                                   (PVOID*)&UserContext,
                                                   0,
                                                   &szLdrContext,
                                                   MEM_COMMIT,
                                                   0x40);
                  if (!NT_SUCCESS(Status))
                      return;

                  PBYTE InjectionShellcode {};
                  szLdrContext = 0x1000;
                  Status = ZwAllocateVirtualMemory(NtCurrentProcess(),
                                                   (PVOID*)&InjectionShellcode,
                                                   0,
                                                   &szLdrContext,
                                                   MEM_COMMIT,
                                                   0x40);
                  if (!NT_SUCCESS(Status))
                      return;
                  RtlCopyMemory(InjectionShellcode,
                                (PBYTE) LdrInjectDll,
                                0x500);

                  UserContext->Pid = Pid;
                  UserContext->NtdllBase = ImageInfo->ImageBase;
                  UserContext->LdrLoadDll = (_LdrLoadDll) ((ULONG_PTR) UserContext->NtdllBase + Rva);
                  DbgPrint("LdrLoadDll: 0x%p\n", UserContext->LdrLoadDll);

                  RtlInitUnicodeString(&UserContext->DllName,
                                       L"\\SystemRoot\\System32\\HelloDll.dll");

                  Status = InitializeAndQueueApc(KeGetCurrentThread(),
                                                 UserMode,
                                                 KernelCleanup,
                                                 (PKNORMAL_ROUTINE)InjectionShellcode,
                                                 nullptr,
                                                 UserContext);
              }
          }
      }

      Any ideas as to why the process won't load? I guess I'll have to trace from LdrLoadDll in ntdll fully to see what's causing the problem
