How to Debug a Hard Fault on an Arm Cortex-M
Here are a few tips for identifying the cause of a hard fault and correcting it.
June 24, 2022
In my opinion, one of the worst, most-annoying faults to debug on an Arm Cortex-M microcontroller is a hard fault. If you are lucky, the hard fault appears after you’ve made some glaringly obvious mistake, and you can quickly undo it. I recently worked with a colleague who encountered a hard fault, but it was several commits deep, and I had no clue what could cause the fault. In this post, we’ll walk through the process I used to identify the cause and correct the hard fault.
An Imprecise Error Could Lead to a Hard Fault
When a hard fault occurs, embedded system developers have no choice but to dive into the depths of the microcontroller and examine the fault registers. The first register to examine on a deep dive is the Configurable Fault Status Register (CFSR). The CFSR is composed of three fault registers:
The MemManage Fault Status
The BusFault Status
The UsageFault Status
Together, these registers can help us start down the path to understanding why we have a fault.
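To make the layout concrete, here is a minimal sketch of how a raw CFSR value splits into those three fields. It assumes the Armv7-M layout (MemManage in bits 7:0, BusFault in bits 15:8, UsageFault in bits 31:16). The type and function names are hypothetical, and the helper takes a plain integer so it can be tried on a host machine; on target you would typically pass in the value read from SCB->CFSR.

```c
#include <stdint.h>

/* Hypothetical container for the three status fields packed into CFSR. */
typedef struct {
    uint8_t  mmfsr; /* MemManage Fault Status, CFSR bits [7:0]   */
    uint8_t  bfsr;  /* BusFault Status,        CFSR bits [15:8]  */
    uint16_t ufsr;  /* UsageFault Status,      CFSR bits [31:16] */
} cfsr_fields_t;

/* Split a raw 32-bit CFSR value into its three sub-registers. */
static cfsr_fields_t cfsr_split(uint32_t cfsr)
{
    cfsr_fields_t f;
    f.mmfsr = (uint8_t)(cfsr & 0xFFu);
    f.bfsr  = (uint8_t)((cfsr >> 8) & 0xFFu);
    f.ufsr  = (uint16_t)((cfsr >> 16) & 0xFFFFu);
    return f;
}
```

Seeing which of the three fields is non-zero is usually the quickest way to tell which kind of fault was escalated to the hard fault.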
Unfortunately, the values stored in these registers are not always conclusive or helpful, depending on the hard fault. For example, when I examined the value of the CFSR, I discovered it was set to 0x400. Arm Developer documents what each bit in the CFSR means. A value of 0x400 sets bit 10, the IMPRECISERR flag in the BusFault Status field. In other words, an imprecise error!
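The check for the 0x400 value above can be sketched in a few lines. The macro and function names here are my own, not CMSIS names; the only assumption taken from the architecture is that IMPRECISERR sits at bit 10 of the CFSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; IMPRECISERR is bit 2 of the BusFault Status field,
 * which places it at bit 10 of the combined CFSR. */
#define CFSR_IMPRECISERR_BIT (1u << 10)

/* Returns true when the CFSR value reports an imprecise data bus error. */
static bool cfsr_is_imprecise_bus_fault(uint32_t cfsr)
{
    return (cfsr & CFSR_IMPRECISERR_BIT) != 0u;
}
```

Calling cfsr_is_imprecise_bus_fault(0x400u) returns true, which matches the reading of the register described above.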
An imprecise error is an asynchronous bus fault: the processor reports it some time after the memory access that caused it, once execution has already moved on to later instructions. It ends up as a hard fault when the bus fault is escalated, for example because the BusFault handler is disabled or cannot run at its current priority. The problem with an imprecise fault is that you can’t trust that the other fault registers contain any direct or valuable information about the cause of the fault! That’s right, at this point, you’re in for reverting code or guessing and randomly trying different Band-Aids to try and fix the problem.
From Imprecise to Precise Errors
Thankfully, when you encounter an imprecise error causing your hard fault, all is not lost. The imprecise error may be caused by the CPU using an internal write buffer: a store is queued in the buffer, and the processor carries on with later instructions before the write actually completes. If the buffer is disabled, each write completes before the next instruction executes. The result will be that the imprecise error turns into a precise error, and all the other fault registers may help identify the fault.
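Once the fault is precise, the BusFault Address Register (BFAR) becomes useful. The sketch below is one way to check for that, assuming the Armv7-M bit positions for PRECISERR (CFSR bit 9) and BFARVALID (CFSR bit 15). The names are hypothetical, and the function takes plain integers so it can be exercised off-target; on hardware the inputs would typically be the values read from SCB->CFSR and SCB->BFAR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for two BusFault Status bits as they appear in CFSR. */
#define BUSFAULT_PRECISERR_BIT (1u << 9)   /* precise data bus error       */
#define BUSFAULT_BFARVALID_BIT (1u << 15)  /* BFAR holds the fault address */

/* Returns true and stores the faulting address in *addr when the CFSR
 * reports a precise bus fault with a valid BFAR. Otherwise returns false
 * and leaves *addr untouched. */
static bool precise_bus_fault_address(uint32_t cfsr, uint32_t bfar,
                                      uint32_t *addr)
{
    const uint32_t needed = BUSFAULT_PRECISERR_BIT | BUSFAULT_BFARVALID_BIT;

    if ((cfsr & needed) != needed) {
        return false;
    }
    *addr = bfar;
    return true;
}
```

With the imprecise value 0x400 the function returns false. With a precise fault such as 0x8200 it returns true and hands back the address of the access that faulted.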
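Disabling the buffer is straightforward. On cores that provide the bit, such as the Cortex-M3 and Cortex-M4, developers can disable the write buffer by setting DISDEFWBUF in the Auxiliary Control Register (ACTLR). Disabling the write buffer reduces performance, so treat it as a debugging aid rather than a permanent setting. The code to do this looks something like the following: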
SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk;