Debugging Tools for Windows

Debugging an Itanium Firmware Failure

If a computer with an Itanium processor crashes, the cause may be a firmware failure.

Use the following procedure to analyze this problem:

  1. Get a stack trace. If the stack trace stops partway through and cannot be unwound any further, that is a sign that a firmware failure may have occurred:

    kd> kb
    ...
    e0000165`993b2c50 e0000165`993b4840 e0000000`8306e8c0 ntkrnlmp!DbgBreakPointWithStatus+0x8
    e0000165`993b2c50 e0000165`993b4840 e0000000`8306ffe0 ntkrnlmp!KiBugCheckDebugBreak+0x40
    e0000165`993b2c90 e0000165`993b4718 e0000000`83070640 ntkrnlmp!KeBugCheck2+0x980
    e0000165`993b3800 e0000165`993b46e0 e0000000`83090310 ntkrnlmp!KeBugCheckEx+0x20
    e0000165`993b3800 e0000165`993b4680 e0000000`83087d40 ntkrnlmp!KiMemoryFault+0x3f0
    e0000165`993b3800 e0000165`993b4678 e0000165`985b63a4 ntkrnlmp!KiGenericExceptionHandler+0x290
    e0000165`993b3b40 e0000165`993b4648 e0000165`985b6360 0xe0000165`985b63a4
    e0000165`993b3b40 e0000165`993b4648 e0000165`985b6360 0xe0000165`985b6360

    DBGHELP: Can't find runtime function entry info for e0000165`985b6360, results might be unreliable!

    In this example, the stack trace failed at e0000165`985b6360, an address for which the debugger could not find runtime function entry information.

  2. Take a look at the bug check information:

    kd> .bugcheck 
    Bugcheck code 000000D1
    Arguments 60000165`97ec42d5 00000000`0000000f 00000000`00000001 e0000165`985b63a0

  3. Look up the bug check code parameters in the Bug Check Code Reference section. Pay particular attention to any parameters that specify the address of the fault. In this case, you have bug check code 0xD1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL), whose fourth parameter indicates the address that caused the failure.

    Notice that this address, 0xE0000165`985B63A0, is within a few bytes of the address at which the stack trace failed.

    Also notice that the DbgHelp error message reports missing runtime function entry information for e0000165`985b6360, an address in this same region.

  4. To further investigate this address, use the ln (List Nearest Symbols) command:

    kd> ln e0000165`985b63a0 

    In this example, the ln command has returned no nearby symbols.

  5. Use the !lmi extension to investigate the module containing this address:

    kd> !lmi e0000165`985b63a0 
    Loaded Module Info: [e0000165`985b63a0]
    e0000165985b63a0 is not a valid address.

    This address does not appear to be contained in a loaded module.

  6. Use the lm (List Loaded Modules) command to list all of the loaded modules, sorted by address. From the resulting display (not shown here), you see that this address is not present in a loaded module but falls between the following two modules:

    e0000165`9793c000 e0000165`979b2fe0   ks           (deferred)
    e0000165`98644000 e0000165`98693e60   mup          (deferred)

    By subtracting the end address of the ks module from the starting address of the mup module, you see that the gap between these two modules is 0xC91020 bytes, or about 12.6 MB, which is large.

  7. Determine if the memory contains valid instructions by unassembling the code with the u (Unassemble) command:

    kd> u e0000165`985b63a0 L3 
    e0000165`985b63a0        mov.m  ar.ccv=r0 ;;
    e0000165`985b63a4        cmpxchg1.acq ret1=[r36], ret0, ar.ccv
    e0000165`985b63a8        nop.i  0 ;;

    The address does contain valid instructions. In particular, it contains a memory dereference. Unassemble the instructions prior to this address:

    kd> ub e0000165`985b63a0 L30 
    ...
    e0000165`985b6340        alloc  r34=ar.pfs, 5, 0, 1, 0
    e0000165`985b6344        nop.f  0
    e0000165`985b6348        mov    r33=rp ;;
    ...

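    You have found a function entry point: the alloc instruction at e0000165`985b6340 sets up a new register stack frame, and the following mov saves the return pointer (rp).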
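
The address arithmetic in steps 3 through 6 can also be checked outside the debugger. The following Python sketch is only an illustration, not part of the debugging tools: the helper names (parse_addr, find_module, Module) are hypothetical, the module list contains only the two lines quoted in step 6, and the end address shown by lm is assumed to be exclusive.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class Module(NamedTuple):
    """One line of lm output (hypothetical helper type)."""
    start: int
    end: int   # assumed exclusive: first address past the module
    name: str


def parse_addr(text: str) -> int:
    """Convert a debugger-style address such as e0000165`985b63a0 to an int."""
    return int(text.replace("`", ""), 16)


def find_module(modules: Sequence[Module], addr: int) -> Optional[Module]:
    """Return the module containing addr, or None.

    modules must be sorted by start address, as in the lm display.
    """
    starts = [m.start for m in modules]
    i = bisect_right(starts, addr) - 1
    if i >= 0 and modules[i].start <= addr < modules[i].end:
        return modules[i]
    return None


# The two neighboring modules quoted in step 6.
modules = [
    Module(parse_addr("e0000165`9793c000"), parse_addr("e0000165`979b2fe0"), "ks"),
    Module(parse_addr("e0000165`98644000"), parse_addr("e0000165`98693e60"), "mup"),
]

fault = parse_addr("e0000165`985b63a0")        # fourth bug check parameter
first_bad_frame = parse_addr("e0000165`985b63a4")  # first unresolved frame in kb output

owner = find_module(modules, fault)
gap = modules[1].start - modules[0].end

print("owner of fault address:", owner)
print("gap between ks and mup: 0x%x bytes (%.1f MB)" % (gap, gap / 2**20))
print("fault address to unresolved frame:", first_bad_frame - fault, "bytes")
```

With the values from this example, the fault address has no owning module, it lies 4 bytes from the first unresolved stack frame, and the gap between ks and mup works out to 0xC91020 bytes.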

Thus, you see that valid instructions in kernel address space that do not belong to any loaded module were executing. This can be explained in one of the following ways:

  1. A kernel-mode module was loaded without appearing in the debugger's list of loaded modules.

  2. The instructions are firmware code.

Because the first option is extremely improbable, the likely cause is an Itanium firmware failure.
