The curious case of the ghostly modalias

I was finishing my morning coffee at the Fedora ARM mystery department when a user report came into my attention: the tpm_tis_spi driver was not working on a board that had a TPM device connected through SPI.

There was no /dev/tpm0 character device present in the system, even when the driver was built as a module and the Device Tree (DT) passed to the kernel had a node with a "infineon,slb9670" compatible string.

Peter Robinson chimed in and mentioned that he had briefly looked at this case before. The problem, he explained, is that the module isn’t auto-loaded but that manually loading it make things to work.

At the beginning he thought that this was just a common issue of a driver not having module alias information. This would lead to kmod not knowing that the module has to be loaded, when the kernel reported a MODALIAS uevent as a consequence of the SPI device being registered.

But when checking the module to confirm that theory, he found that there were alias entries:

$ modinfo drivers/char/tpm/tpm_tis_spi.ko | grep alias
alias:          of:N*T*Cgoogle,cr50C*
alias:          of:N*T*Cgoogle,cr50
alias:          of:N*T*Ctcg,tpm_tis-spiC*
alias:          of:N*T*Ctcg,tpm_tis-spi
alias:          of:N*T*Cinfineon,slb9670C*
alias:          of:N*T*Cinfineon,slb9670
alias:          of:N*T*Cst,st33htpm-spiC*
alias:          of:N*T*Cst,st33htpm-spi
alias:          spi:cr50
alias:          spi:tpm_tis_spi
alias:          acpi*:SMO0768:*

Since the board uses DT to describe the hardware topology, the TPM device should had been registered by the Open Firmware (OF) subsystem. And should cause the kernel to report a "MODALIAS=of:NspiTCinfineon,slb9670", which should had matched the "of:N*T*Cinfineon,slb9670" module alias entry.

But when digging more on this issue, things started to get more strange. Looking at the uevent sysfs entry for this SPI device, he found that the kernel was not reporting an OF modalias but instead a legacy SPI modalias: "MODALIAS=spi:slb9670".

But how come? a user asked, the device is registered using DT, not platform code! Where is this modalias coming from? Is this legacy SPI device a ghost?

Peter said that he didn’t believe in paranormal events and that there should be a reasonable explanation. So armed with grep, he wanted to get to the bottom of this but got preempted by more urgent things to do.

Coincidentally, I had chased down that same ghost before many moons ago. And it’s indeed not a spirit from the board files dimension but only an incorrect behavior in the uevent logic of the SPI subsystem.

The reason is that the SPI uevent handler always reports a MODALIAS of the form "spi:foobar" even for devices that are registered through DT. This leads to the situation described above and it’s better explained by looking at the SPI subsystem code:

static int spi_uevent(struct device *dev, struct kobj_uevent_env *env)
{
	const struct spi_device		*spi = to_spi_device(dev);
	int rc;

	rc = acpi_device_uevent_modalias(dev, env);
	if (rc != -ENODEV)
		return rc;

	return add_uevent_var(env, "MODALIAS=%s%s", SPI_MODULE_PREFIX, spi->modalias);
}

Conversely, this is what the platform subsystem uevent handler does (which properly reports OF module aliases):

static int platform_uevent(struct device *dev, struct kobj_uevent_env *env)
{
	struct platform_device	*pdev = to_platform_device(dev);
	int rc;

	/* Some devices have extra OF data and an OF-style MODALIAS */
	rc = of_device_uevent_modalias(dev, env);
	if (rc != -ENODEV)
		return rc;

	rc = acpi_device_uevent_modalias(dev, env);
	if (rc != -ENODEV)
		return rc;

	add_uevent_var(env, "MODALIAS=%s%s", PLATFORM_MODULE_PREFIX,
			pdev->name);
	return 0;
}

Fixing the SPI core would be trivial, but the problem is that there are just too many drivers and Device Trees descriptions that are relying on the current behavior.

It should be possible to change the core, but first all these drivers and DTs have to be fixed. For example, the I2C subsystem had the same issue but has already been resolved.

A workaround then in the meantime could be to add to the legacy SPI device ID table all the entries that are found in the OF device ID table. That way, a platform using for example a DT node with compatible "infineon,slb9670" will match against an alias "spi:slb9670", that will be present in the module.

And that’s exactly what the proposed fix for the tpm_tis_spi driver does.

$ modinfo drivers/char/tpm/tpm_tis_spi.ko | grep alias
alias:          of:N*T*Cgoogle,cr50C*
alias:          of:N*T*Cgoogle,cr50
alias:          of:N*T*Ctcg,tpm_tis-spiC*
alias:          of:N*T*Ctcg,tpm_tis-spi
alias:          of:N*T*Cinfineon,slb9670C*
alias:          of:N*T*Cinfineon,slb9670
alias:          of:N*T*Cst,st33htpm-spiC*
alias:          of:N*T*Cst,st33htpm-spi
alias:          spi:cr50
alias:          spi:tpm_tis_spi
alias:          spi:slb9670
alias:          spi:st33htpm-spi
alias:          acpi*:SMO0768:*

Until the next mystery!

A lethal spurious interrupt

A big part of my work on Fedora/RHEL is to troubleshoot and do root cause analysis across the software stack. Because many of these projects are decades old, this usually feels like being stuck somewhere between being an archaeologist and a detective.

Many bugs are boring but some are interesting, either because the investigation made me learn something new or due to the amount of effort that was sunk into figuring out the problem. So I thought that it would be a nice experiment to share a little about the ones that are worth mentioning. This is the first of such posts, I may write more in the future if have time and remember to do it:

It was a dark and stormy night when Peter Robinson mentioned a crime to me, a Rockpro64 board was found to not boot when the CONFIG_PCIE_ROCKCHIP_HOST option was enabled. He also already had found the criminal, it was the CONFIG_DEBUG_SHIRQ option.

I have to admit that I only knew CONFIG_DEBUG_SHIRQ by name and that it was a debug option for shared interrupts, but didn’t even know what this option was about. So the first step was to read the help text of the Kconfig symbol to learn more on this option.

Enable this to generate a spurious interrupt just before a shared interrupt handler is deregistered (generating one when registering is currently disabled). Drivers need to handle this correctly. Some don’t and need to be caught.

This was the smoking gun: a spurious interrupt!

We now knew what was the weapon used but we still had questions, why would triggering an interrupt lead to the board being hung? The next step then was to figure out where exactly this was happening, it certainly would have to be somewhere in the driver’s IRQ handler code path.

By looking at the pcie-rockchip-host driver code, we see two IRQ handlers registered: rockchip_pcie_subsys_irq_handler() for the "pcie-sys" IRQ and rockchip_pcie_client_irq_handler() for the "pcie-client" IRQ.

Adding some debug printouts to both would show us that the issue was happening in the latter, when calling to rockchip_pcie_read(). This function just hangs indefinitely and never returns.

Peter wrote in the filled bug that this issue was reported before and there was even an RFC patch posted by a Rockchip engineer, who mentioned the assessment of the problem:

With CONFIG_DEBUG_SHIRQ enabled, the irq tear down routine
would still access the irq handler register as a shared irq.
Per the comment within the function of __free_irq, it says
"It’s a shared IRQ — the driver ought to be prepared for
an IRQ event to happen even now it’s being freed". However
when failing to probe the driver, it may disable the clock
for accessing the register and the following check for shared
irq state would call the irq handler which accesses the register
w/o the clk enabled. That will hang the system forever.

The proposed solution was to check in the rockchip_pcie_read() function if a rockchip->hclk_pcie clock was enabled before trying to access the PCIe registers’ address space. But that wasn’t accepted because it was solving the symptom and not the cause.

But it did confirm our findings, that the problem was an IRQ handler being called before it was expected and that the PCIe register access hangs due to a clock not being enabled.

With all of that information and reading once more the pcie-rockchip-host driver code, we could finally reconstruct the crime scene:

  1. "pcie-sys" IRQ is requested and its handler registered.
  2. "pcie-client" IRQ is requested and its handler registered.
  3. probe later fails due to readl_poll_timeout() returning a timeout.
  4. the "pcie-sys" IRQ handler is unregistered.
  5. CONFIG_DEBUG_SHIRQ triggers a spurious interrupt.
  6. "pcie-client" IRQ handler is called for this spurious interrupt.
  7. IRQ handler tries to read PCIE_CLIENT_INT_STATUS with clocks gated.
  8. the machine hangs because rockchip_pcie_read() call never returns.

The root cause of the problem then is that the IRQ handlers are registered too early, before all the required resources have been properly set up.

Our proposed solution then is to move all the IRQ initialization into a later stage of the probe function. That makes it safe for the IRQ handlers to be called as soon as they are registered.

Until the next mystery!

Collabora contributions to Linux kernel 4.1

Linux 4.1 was released last week and like previous kernel releases, this version again contains contributions made by Collabora engineers as a part of our current projects.

On this release, not only Collabora contributed several patches for different subsystems but also for the first time made it to LWN list of most active employers for a Linux kernel release and Tomeu Vizoso was listed as one of the most active developers by changed lines.

In total 68 patches were contributed to the 4.1 release. These were for:

  • Fixes and improvements to the DRM core atomic support.
  • Fix for Exynos DRM FIMD buffer size calculation.
  • More cleanup and fixes for Exynos DRM in preparation to finish porting the driver to support Atomic Mode Settings for v4.2
  • Add MMC/SDIO power sequencing to make the WiFi chip work on Snow and Peach Pit/Pi Chromebooks.
  • Various fixes and improvement for the ChromeOS Embeded Controller drivers.
  • Add default console serial port configuration for Exynos Chromebooks to avoid having to define a tty in the kernel command line.
  • Enable needed drivers in the Exynos default configuration file.
  • Fix error code propagation in the MMC power sequencing core.
  • Restore clocks needed during suspend for Exynos5 machines to prevent failing to resume.
  • Make the Mwifiex chip on Exynos Chromebooks to keep power during suspend to prevent the driver not allow the system to enter into a suspend state.
  • Fix output disable race on the Samsung PWM driver.
  • Split out touchpad initialisation logic on the Atmel MaxTouch driver.
  • Various enhancements for the Tegra ASoC driver.
  • Add a Device Tree to support the Nyan Blaze Chromebook and factor out common snippets with the Nyan Big Chromebook Device Tree.
  • Add trackpad, WiFi and GPIO restart support for the Nyan Chromebooks.
  • Add support for the Tegra124 Activity Monitor (ACTMON).
  • Fill EMC timings for Tegra Nyan Chromebooks.
  • Many fixes and improvements for the Tegra device frequency scaling driver.
  • Fix the Tegra DRM driver by resetting the SOR to a known state.
  • Enable many drivers in multi_v7 default configuration, that are needed by Tegra Chromebooks.
  • Fix the cros_ec keyboard driver to avoid loosing the key pressed during suspend to resume the system.
  • Fix USB not working on Tegra124 based boards.

Following is the complete list of patches merged in this kernel release:

Collabora contributions to Linux kernel 4.0

Linux 4.0 was released a couple of weeks ago and like previous kernel releases, this version again contains contributions made by Collabora engineers as a part of our current projects.

In total 59 patches were contributed to the 4.0 release. These were for:

  • Fix graphics DP and HDMI display for Exynos DRM.
  • More preparation work to add Atomic Mode Settings support to the Exynos DRM driver.
  • Add support for Power, Lid keys and built-in USB camera to Peach Pi/Pit and Snow Chromebooks.
  • Configure regulators operating modes on suspend for Peach Pi/Pit Chromebooks to reduce power consumption.
  • Add DISP1 power domain and related clocks to have proper display support on Exynos5420 machines.
  • Extend the MMC simple power sequencing provider to support a reference clock and more than one reset GPIO.
  • Fix various regressions in the common clock framework exposed by the per-user clock changes.
  • Fix S3C Real-Time-Clock that was not working on many Exynos SoCs.
  • Fix a bug in the regulator framework that tried to enable regulators that were already enabled.
  • Fix a reboot and poweroff hang on Exynos machines cause by a hang in the samsung serial driver.
  • Various fixes to the Samsung MFC driver.
  • Add support for the Exynos5422 Odroid XU3 board.
  • Enable needed Kconfig symbols on the exynos, omap2plus and multi_v7 defconfigs.
  • Add a devfreq driver for the Tegra Activity Monitor

Following is the complete list of patches merged in this kernel release:

Collabora contributions to Linux kernel 3.19

Linux 3.19 was released last week and this version again contains contributions made by Collabora engineers as a part of our current projects.

In total 60 patches were contributed to the 3.19 release. These were for:

  • Work in the Intel i915 DRM driver to support atomic plane updates.
  • Fixes and removal of unnecessary layers in the Exynos DRM driver as a preparation to add atomic mode settings support.
  • Add support to the regulator framework to define regulators initial and suspend operating modes.
  • Changes to the max77802 regulator driver to support regulator operating modes changes.
  • Fix the Real-Time-Clock in the Snow, Peach Pit and Pi Chromebooks.
  • Enable kernel config options to have display panel working on Exynos boards.
  • Allow the regulator drivers to be enabled and disabled when Exynos system enters and leave suspend states.
  • Many cleanup and fixes to the common clock framework and clock drivers as a preparation for the per-user clock API change.

Following is the complete list of patches merged in this kernel release:

Collabora contributions to the Linux kernel 3.18

Linux 3.18 was released this week and like in previous kernel releases, it contains contributions made by Collabora engineers as a part of the projects involving the Linux kernel.

In total 45 patches were contributed to the 3.18 release. These were for:

  • Cleanups for the Intel i915 DRM driver
  • Various cleanups for the max77686 clock, rtc and regulator drivers.
  • Adding max77802 clock, rtc and regulator drivers.
  • Fixing regmap DT endianess parsing logic.
  • Improving the power model in the Exynos5 Peach Pit and Pi Chromebooks DT.
  • Using the regulator_get_voltage() function to get the mmc OCR mask.
  • Adding max77802 PMIC, ISL29018 sensor and atmel touchpad for Peach boards DT.
  • Setting the correct clock rate for i2c7 in Exynos5 Peach Pit and Pi DT.
  • Enable atmel touchpad, cgroups, sbs battery and atmel touchpad in exynos defconfig.
  • Fixing variable initialization for different regulator drivers.
  • Fixing MFC v5 support in the s5p-mfc driver.
  • Making module autoloading to work for i2c cros-ec-tunnel and cros_ec_keyb drivers.
  • Explicitly configure USB dual role mode as host for Exynos boards.
  • Adding a PM_QOS_MEMORY_BANDWIDTH pm_qos class.
  • Enabling gcov-based kernel profiling for ARM

Following is the complete list of patches merged in this kernel release:

An update on Device Tree support for IGEP boards

Linux v3.13 has been released a couple of days ago and this is the first kernel version that has complete Device Tree support for IGEP boards.

Since the initial DT support that I talked before, the following peripherals were added:

Ethernet
NAND flash
– USB HOST and OTG
Wifi/BT combo (support added by Enric Balletbo)

The only remaining bit in DT is video since the DT bindings for the OMAP Display Sub-System (DSS) have not landed in mainline yet so display support is still provided using the OMAP platform data quirk infrastructure.

So, this make the IGEP the first OMAP3 device to complete the transition to Device Tree based booting and the board file could finally be removed.

The migration from board files to Device Tree booting was not trivial as I initially thought since OMAP GPMC and GPIO DT support was not mature enough so I had to add support for GPMC DT ethernet and fix some annoying bugs.

U-Boot backend support for OSTree

Recently I had to work with OSTree which I think is a very promising and interesting project.

The OSTree site advertises it as

a tool for managing bootable, immutable, versioned filesystem trees

and it uses a combination of chroot and hard links to provide atomic upgrades and rollback of filesystem trees. You can refer to the project’s online manual and README for information about the motivation behind OSTree and its implementation.

When updating the kernel and initial RAM disk images OSTree creates Boot Loader Specification entries. These are drop-in snippets that define a kernel and init ramdisk that have to be used and boot arguments that the bootloader has to pass to the kernel command line on boot.

Currently only gummiboot and GRUB have support for these boot loader snippets.

We wanted to use OSTree on embedded devices and since most boards use U-Boot as their boot loader I needed to figure out what was the best approach to add a U-Boot support for OSTree.

Obviously this would have been to add boot loader spec entries support to U-Boot but there are two issues with this approach:

a) U-Boot does not currently support multi-boot

Since U-Boot is designed for embedded it just boots a single default operating system while the boot loader specification was designed for multi-boot. You can drop any number of snippets under /boot/loader/entries and the boot loader should populate its boot menu.

b) Not every vendor allows you to replace the boot loader

Some vendors do not allow to replace the boot loader binaries (i.e: store it on a read-only memory), implements DRM that requires binaries to be signed or are using a very old U-Boot version which would require to backport this support.

So, the solution was to make OSTree to generate boot information instead that could be used by any already deployed U-Boot. The same approach is used on OSTree to support syslinux.

U-Boot allows to modify boards default boot commands by reading and executing a bootscript file (boot.scr) or importing a plain text file (uEnv.txt) that contains environment variables that could parameterize its default boot command or a bootscript.

So, on deploy or upgrade OSTree reads the Boot Loader Specification snippets files and generates a uEnv.txt file that contains the booting information in a format that could be understood by U-Boot.

The uEnv.txt env var file is created in the path /boot/loader.${bootversion}/uEnv.txt. Also, a /boot/uEnv.txt symbolic link to loader/uEnv.txt is created so U-Boot can always import the file from a fixed path.

Since U-Boot does not support a menu to list a set of Operative Systems, the most recent boot loader entry from the list is used.

To boot an OSTree using the generated uEnv.txt file, a board has to parameterize its default boot command using the following variables defined by OSTree:

    ${kernel_image}:  path to the Linux kernel image
    ${ramdisk_image}: path to the initial ramdisk image
    ${bootargs}:      parameters passed to the kernel command line

Alternatively, for boards that don’t support this scheme, a bootscript that overrides the default boot command can be used.

An example of such a bootscript could be:

    setenv scriptaddr 0x40008000
    setenv kernel_addr 0x40007000
    setenv ramdisk_addr 0x42000000
    load mmc 0:1 ${scriptaddr} uEnv.txt
    env import -t ${scriptaddr} ${filesize}
    load mmc 0:1 ${kernel_addr} ${kernel_image}
    load mmc 0:1 ${ramdisk_addr} ${ramdisk_image}
    bootz ${kernel_addr} ${ramdisk_addr}

For more information you can refer to the commit that added U-Boot as backend and OStree test for U-Boot.

I would like to thank my employer Collabora for sponsoring this work.

mmap support for Raspberry Pi bcm2835 ALSA driver

Because I work on an awesome company I spent last week improving the Raspberry Pi ALSA driver.

A long standing issue was that the driver did not support memory-mapped I/O mode for audio stream transfers.

ALSA supports two transfers methods for PCM playback: Read/Write transfer where samples are written to the device using standard read and write functions and Direct Read/Write transfers where samples can be written directly to a mapped memory area and the driver is signaled once this has been done.

ALSA provides an API for both cases and each application using ALSA to access the audio device can choose which API to use. But since mmap was not supported by the bcm2835 driver, applications using the mmap API did not work on the Raspberry Pi.

To bypass hardware limitation such as the lack of mmap support, ALSA provides a set of PCM plug-ins that can be used to extend functionality and features of PCM devices.

These plug-ins are used by defining a custom /etc/asound.conf or .asoundrc and one of them is the mmap emulation plug-in (mmap_emul) which emulates mmap access using a set of read/write transfers. This plug-in was needed on the Raspberry Pi to allow applications using ALSA mmap API to work. But of course it introduces some latency and requires a custom configuration to enable it.

So, the best approach was to just add mmap support to the ALSA driver and get rid of the mmap emulation plug-in. Normally this is straightforward since the kernel allows memory areas to be mapped to user-space address space so applications can have direct access to device memory.

This was not the case on the Raspberry Pi since the PCM samples are processed by its VideoCore IVI GPU. The ARM11 CPU communicates with the VideoCore co-processor using a message passing interface (vchiq) which means that the CPU is not able to directly address the audio device hardware buffers and audio samples have to be sent to the device using vchiq messages.

Since hardware buffers can’t be directly mapped to user-space memory, an intermediate buffer is needed. This intermediate buffer is mapped to user-space so applications can store the audio samples there. Once user-space has finished writing the PCM samples they are pushed to VideoCore as vchiq messages.

Fortunately, the kernel ALSA subsystem provides a PCM indirect API which are a set of helper functions and a data structure to manage the intermediate and hardware buffers. It is really helpful so you don’t have to write all the buffer management.

So, after figuring out all of this I wrote a patch to add mmap support to the Raspberry Pi ALSA driver and send a pull request that also contained other improvements to the driver.

Since it has been merged on the Raspberry Pi kernel, this feature will be available on the next raspbian release but if you want to use it now you can do the following:

$ git clone git://github.com/raspberrypi/tools.git
$ git clone git://github.com/raspberrypi/linux.git
$ cd linux
$ git checkoub -b rpi-3.6.y origin/rpi-3.6.y
$ export ARCH=arm
$ export CROSS_COMPILE=../tools/arm-bcm2708/arm-bcm2708-linux-gnueabi/bin/arm-bcm2708-linux-gnueabi-
$ make bcmrpi_defconfig
$ make prepare modules_prepare
$ make M=sound/arm
$ sudo cp sound/arm/snd-bcm2835.ko /media/rootfs/lib/modules/3.6.11+/kernel/sound/arm/
$ sudo rm /media/rootfs/etc/asound.conf

The last step is very important since using both a mmap capable ALSA driver and the PCM mmap emulation plug-in does not work.

I want to thank my employer Collabora for allowing me to work on this: