The curious case of the ghostly modalias

I was finishing my morning coffee at the Fedora ARM mystery department when a user report came into my attention: the tpm_tis_spi driver was not working on a board that had a TPM device connected through SPI.

There was no /dev/tpm0 character device present in the system, even when the driver was built as a module and the Device Tree (DT) passed to the kernel had a node with a "infineon,slb9670" compatible string.

Peter Robinson chimed in and mentioned that he had briefly looked at this case before. The problem, he explained, is that the module isn’t auto-loaded but that manually loading it make things to work.

At the beginning he thought that this was just a common issue of a driver not having module alias information. This would lead to kmod not knowing that the module has to be loaded, when the kernel reported a MODALIAS uevent as a consequence of the SPI device being registered.

But when checking the module to confirm that theory, he found that there were alias entries:

$ modinfo drivers/char/tpm/tpm_tis_spi.ko | grep alias
alias:          of:N*T*Cgoogle,cr50C*
alias:          of:N*T*Cgoogle,cr50
alias:          of:N*T*Ctcg,tpm_tis-spiC*
alias:          of:N*T*Ctcg,tpm_tis-spi
alias:          of:N*T*Cinfineon,slb9670C*
alias:          of:N*T*Cinfineon,slb9670
alias:          of:N*T*Cst,st33htpm-spiC*
alias:          of:N*T*Cst,st33htpm-spi
alias:          spi:cr50
alias:          spi:tpm_tis_spi
alias:          acpi*:SMO0768:*

Since the board uses DT to describe the hardware topology, the TPM device should had been registered by the Open Firmware (OF) subsystem. And should cause the kernel to report a "MODALIAS=of:NspiTCinfineon,slb9670", which should had matched the "of:N*T*Cinfineon,slb9670" module alias entry.

But when digging more on this issue, things started to get more strange. Looking at the uevent sysfs entry for this SPI device, he found that the kernel was not reporting an OF modalias but instead a legacy SPI modalias: "MODALIAS=spi:slb9670".

But how come? a user asked, the device is registered using DT, not platform code! Where is this modalias coming from? Is this legacy SPI device a ghost?

Peter said that he didn’t believe in paranormal events and that there should be a reasonable explanation. So armed with grep, he wanted to get to the bottom of this but got preempted by more urgent things to do.

Coincidentally, I had chased down that same ghost before many moons ago. And it’s indeed not a spirit from the board files dimension but only an incorrect behavior in the uevent logic of the SPI subsystem.

The reason is that the SPI uevent handler always reports a MODALIAS of the form "spi:foobar" even for devices that are registered through DT. This leads to the situation described above and it’s better explained by looking at the SPI subsystem code:

static int spi_uevent(struct device *dev, struct kobj_uevent_env *env)
{
	const struct spi_device		*spi = to_spi_device(dev);
	int rc;

	rc = acpi_device_uevent_modalias(dev, env);
	if (rc != -ENODEV)
		return rc;

	return add_uevent_var(env, "MODALIAS=%s%s", SPI_MODULE_PREFIX, spi->modalias);
}

Conversely, this is what the platform subsystem uevent handler does (which properly reports OF module aliases):

static int platform_uevent(struct device *dev, struct kobj_uevent_env *env)
{
	struct platform_device	*pdev = to_platform_device(dev);
	int rc;

	/* Some devices have extra OF data and an OF-style MODALIAS */
	rc = of_device_uevent_modalias(dev, env);
	if (rc != -ENODEV)
		return rc;

	rc = acpi_device_uevent_modalias(dev, env);
	if (rc != -ENODEV)
		return rc;

	add_uevent_var(env, "MODALIAS=%s%s", PLATFORM_MODULE_PREFIX,
			pdev->name);
	return 0;
}

Fixing the SPI core would be trivial, but the problem is that there are just too many drivers and Device Trees descriptions that are relying on the current behavior.

It should be possible to change the core, but first all these drivers and DTs have to be fixed. For example, the I2C subsystem had the same issue but has already been resolved.

A workaround then in the meantime could be to add to the legacy SPI device ID table all the entries that are found in the OF device ID table. That way, a platform using for example a DT node with compatible "infineon,slb9670" will match against an alias "spi:slb9670", that will be present in the module.

And that’s exactly what the proposed fix for the tpm_tis_spi driver does.

$ modinfo drivers/char/tpm/tpm_tis_spi.ko | grep alias
alias:          of:N*T*Cgoogle,cr50C*
alias:          of:N*T*Cgoogle,cr50
alias:          of:N*T*Ctcg,tpm_tis-spiC*
alias:          of:N*T*Ctcg,tpm_tis-spi
alias:          of:N*T*Cinfineon,slb9670C*
alias:          of:N*T*Cinfineon,slb9670
alias:          of:N*T*Cst,st33htpm-spiC*
alias:          of:N*T*Cst,st33htpm-spi
alias:          spi:cr50
alias:          spi:tpm_tis_spi
alias:          spi:slb9670
alias:          spi:st33htpm-spi
alias:          acpi*:SMO0768:*

Until the next mystery!

A lethal spurious interrupt

A big part of my work on Fedora/RHEL is to troubleshoot and do root cause analysis across the software stack. Because many of these projects are decades old, this usually feels like being stuck somewhere between being an archaeologist and a detective.

Many bugs are boring but some are interesting, either because the investigation made me learn something new or due to the amount of effort that was sunk into figuring out the problem. So I thought that it would be a nice experiment to share a little about the ones that are worth mentioning. This is the first of such posts, I may write more in the future if have time and remember to do it:

It was a dark and stormy night when Peter Robinson mentioned a crime to me, a Rockpro64 board was found to not boot when the CONFIG_PCIE_ROCKCHIP_HOST option was enabled. He also already had found the criminal, it was the CONFIG_DEBUG_SHIRQ option.

I have to admit that I only knew CONFIG_DEBUG_SHIRQ by name and that it was a debug option for shared interrupts, but didn’t even know what this option was about. So the first step was to read the help text of the Kconfig symbol to learn more on this option.

Enable this to generate a spurious interrupt just before a shared interrupt handler is deregistered (generating one when registering is currently disabled). Drivers need to handle this correctly. Some don’t and need to be caught.

This was the smoking gun: a spurious interrupt!

We now knew what was the weapon used but we still had questions, why would triggering an interrupt lead to the board being hung? The next step then was to figure out where exactly this was happening, it certainly would have to be somewhere in the driver’s IRQ handler code path.

By looking at the pcie-rockchip-host driver code, we see two IRQ handlers registered: rockchip_pcie_subsys_irq_handler() for the "pcie-sys" IRQ and rockchip_pcie_client_irq_handler() for the "pcie-client" IRQ.

Adding some debug printouts to both would show us that the issue was happening in the latter, when calling to rockchip_pcie_read(). This function just hangs indefinitely and never returns.

Peter wrote in the filled bug that this issue was reported before and there was even an RFC patch posted by a Rockchip engineer, who mentioned the assessment of the problem:

With CONFIG_DEBUG_SHIRQ enabled, the irq tear down routine
would still access the irq handler register as a shared irq.
Per the comment within the function of __free_irq, it says
"It’s a shared IRQ — the driver ought to be prepared for
an IRQ event to happen even now it’s being freed". However
when failing to probe the driver, it may disable the clock
for accessing the register and the following check for shared
irq state would call the irq handler which accesses the register
w/o the clk enabled. That will hang the system forever.

The proposed solution was to check in the rockchip_pcie_read() function if a rockchip->hclk_pcie clock was enabled before trying to access the PCIe registers’ address space. But that wasn’t accepted because it was solving the symptom and not the cause.

But it did confirm our findings, that the problem was an IRQ handler being called before it was expected and that the PCIe register access hangs due to a clock not being enabled.

With all of that information and reading once more the pcie-rockchip-host driver code, we could finally reconstruct the crime scene:

  1. "pcie-sys" IRQ is requested and its handler registered.
  2. "pcie-client" IRQ is requested and its handler registered.
  3. probe later fails due to readl_poll_timeout() returning a timeout.
  4. the "pcie-sys" IRQ handler is unregistered.
  5. CONFIG_DEBUG_SHIRQ triggers a spurious interrupt.
  6. "pcie-client" IRQ handler is called for this spurious interrupt.
  7. IRQ handler tries to read PCIE_CLIENT_INT_STATUS with clocks gated.
  8. the machine hangs because rockchip_pcie_read() call never returns.

The root cause of the problem then is that the IRQ handlers are registered too early, before all the required resources have been properly set up.

Our proposed solution then is to move all the IRQ initialization into a later stage of the probe function. That makes it safe for the IRQ handlers to be called as soon as they are registered.

Until the next mystery!