Every once in a while I have to explain these concepts to someone, so I thought they would be worth writing about.
Device drivers and devices registration
The Linux kernel documentation covers quite well how device drivers and devices are registered and how the two are bound. In summary, drivers and devices are registered independently and each specifies its bus type. The Linux kernel device model then uses that information to bind drivers with devices of the same bus type.
Drivers are registered using the driver_register() function, which is usually called from either the driver’s module_init() function or platform code.
Devices are registered using the device_register() function, which is usually called by subsystems that parse a list of devices from some hardware topology description, by an enumerable bus, or by platform code that hardcodes the devices to be registered.
Drivers and device matching (binding)
When a driver is registered for a given bus, the list of devices registered for that bus is iterated to find a match.
In the same manner, when a device is registered, the list of drivers registered for the same bus is iterated to find a match.
That way, the order in which drivers and devices are registered does not matter. A device will be bound to its driver regardless of which one was registered first.
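To make the two-way lookup concrete, here is a minimal user-space sketch of the idea. It is not kernel code: the toy_* types and functions are hypothetical stand-ins invented for this example, and the match rule (equal names) is a deliberate simplification of what real buses do.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Hypothetical, simplified stand-ins for the kernel's driver and device. */
struct toy_driver {
	const char *name;                 /* name the driver matches against */
};

struct toy_device {
	const char *name;
	const struct toy_driver *driver;  /* NULL until bound */
};

struct toy_bus {
	struct toy_driver *drivers[MAX_ENTRIES];
	struct toy_device *devices[MAX_ENTRIES];
	size_t n_drivers, n_devices;
};

/* Toy match rule: names must be equal. Real buses use ID tables. */
static int toy_match(const struct toy_device *dev, const struct toy_driver *drv)
{
	return strcmp(dev->name, drv->name) == 0;
}

/* Registering a driver walks the devices already on the bus. */
int toy_driver_register(struct toy_bus *bus, struct toy_driver *drv)
{
	if (bus->n_drivers == MAX_ENTRIES)
		return -1;
	bus->drivers[bus->n_drivers++] = drv;

	for (size_t i = 0; i < bus->n_devices; i++) {
		struct toy_device *dev = bus->devices[i];

		if (!dev->driver && toy_match(dev, drv))
			dev->driver = drv;
	}
	return 0;
}

/* Registering a device walks the drivers already on the bus. */
int toy_device_register(struct toy_bus *bus, struct toy_device *dev)
{
	if (bus->n_devices == MAX_ENTRIES)
		return -1;
	bus->devices[bus->n_devices++] = dev;

	for (size_t i = 0; i < bus->n_drivers && !dev->driver; i++) {
		if (toy_match(dev, bus->drivers[i]))
			dev->driver = bus->drivers[i];
	}
	return 0;
}
```

Calling toy_driver_register() before toy_device_register(), or the other way around, leaves the device bound to the same driver in both cases, which is the property described above.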
Drivers’ probe callback
If a match is found, the driver’s probe callback is executed. This handler contains the driver-specific logic to bind the driver to a device, along with any setup needed for it. The probe function returns 0 if the driver was bound to the device successfully, or a negative errno code if the driver was not able to bind to the device.
Probe deferral
A special errno code, -EPROBE_DEFER, is used to indicate that the bind failed because the driver could not yet obtain all the resources needed by the device. When this happens, the device is put on a deferred probe list and the probe is retried at a later time.
That later time is when another driver probes successfully. When this happens, the deferred probe list is iterated again and each device on it is retried against its matched driver.
If the newly probed driver provides a resource that was missing for drivers whose probe was deferred, then their probe will succeed this time and their bound devices will be removed from the deferred list.
If all required resources are provided at some point, then all drivers should probe correctly and the deferred list should become empty.
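The retry loop is easier to follow with a small simulation. The sketch below is a user-space model, not the kernel implementation: the toy_* names are hypothetical, resources are modelled as bits in a mask, and EPROBE_DEFER is redefined locally because the kernel's value (517) is not exported to user space.

```c
#include <stddef.h>

/* Kernel-internal errno value, redefined here for the simulation. */
#define TOY_EPROBE_DEFER 517
#define MAX_DEFERRED 8

struct toy_dev {
	const char *name;
	unsigned needs;     /* bitmask of resources the driver must acquire */
	unsigned provides;  /* bitmask of resources it offers once probed */
	int bound;
};

struct toy_core {
	unsigned available;                      /* resources provided so far */
	struct toy_dev *deferred[MAX_DEFERRED];  /* deferred probe list */
	size_t n_deferred;
};

/* Probe succeeds only if every needed resource is already available. */
static int toy_probe(struct toy_core *core, struct toy_dev *dev)
{
	if (dev->needs & ~core->available)
		return -TOY_EPROBE_DEFER;
	core->available |= dev->provides;
	dev->bound = 1;
	return 0;
}

/* Retry the deferred list, repeating for as long as some probe succeeds. */
static void toy_retry_deferred(struct toy_core *core)
{
	int progress = 1;

	while (progress) {
		size_t kept = 0;

		progress = 0;
		for (size_t i = 0; i < core->n_deferred; i++) {
			struct toy_dev *dev = core->deferred[i];

			if (toy_probe(core, dev) == 0)
				progress = 1;               /* drop it from the list */
			else
				core->deferred[kept++] = dev;
		}
		core->n_deferred = kept;
	}
}

/* Probe a new device; defer it on failure, retry the list on success. */
int toy_device_add(struct toy_core *core, struct toy_dev *dev)
{
	int ret = toy_probe(core, dev);

	if (ret == -TOY_EPROBE_DEFER) {
		if (core->n_deferred == MAX_DEFERRED)
			return -1;
		core->deferred[core->n_deferred++] = dev;
		return ret;
	}
	toy_retry_deferred(core);
	return ret;
}
```

If a panel that needs a regulator and a clock is added first, then a clock that needs the regulator, and finally the regulator itself, the first two calls return -517 and the third one empties the deferred list in two passes. The repeated passes are also a reminder of why the approach is simple but not efficient.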
It is a simple and elegant (albeit inefficient) solution to the fact that the order in which drivers and devices are registered is non-deterministic. As a result, a driver has no way to know whether a resource will never be available or whether the driver that provides it simply has not probed yet.
But even if the kernel probed the drivers in a deterministic order (e.g., by using device dependency information), a driver would still have no way to know whether, for example, the missing resource would be provided by a driver that was built as a kernel module and would be loaded much later by user-space, or even manually by an operator.
Module device tables
Each driver provides information about which devices it can be matched against, and usually this information is provided on a per-firmware basis. For example, a driver that supports devices registered using both ACPI and Device Tree hardware descriptions will contain separate ID tables for the ACPI and OpenFirmware (OF) devices that it can match.
To illustrate this, drivers/input/touchscreen/hideep.c has the following device ID tables:
static const struct i2c_device_id hideep_i2c_id[] = {
	{ HIDEEP_I2C_NAME, 0 },
	{ }
};
MODULE_DEVICE_TABLE(i2c, hideep_i2c_id);

#ifdef CONFIG_ACPI
static const struct acpi_device_id hideep_acpi_id[] = {
	{ "HIDP0001", 0 },
	{ }
};
MODULE_DEVICE_TABLE(acpi, hideep_acpi_id);
#endif

#ifdef CONFIG_OF
static const struct of_device_id hideep_match_table[] = {
	{ .compatible = "hideep,hideep-ts" },
	{ }
};
MODULE_DEVICE_TABLE(of, hideep_match_table);
#endif
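Each of these tables ends with an empty entry that acts as a sentinel, so matching code can walk the table without knowing its length. The following user-space sketch shows that pattern under simplifying assumptions: struct toy_of_device_id and toy_of_match() are hypothetical and only compare the compatible string, which is less than what the kernel's own OF matching takes into account.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down analogue of struct of_device_id. */
struct toy_of_device_id {
	char compatible[128];
	const void *data;
};

static const struct toy_of_device_id toy_match_table[] = {
	{ .compatible = "hideep,hideep-ts" },
	{ .compatible = "" }  /* sentinel; kernel code usually writes { } */
};

/* Walk the table until the sentinel; return the matching entry or NULL. */
const struct toy_of_device_id *
toy_of_match(const struct toy_of_device_id *table, const char *compatible)
{
	for (; table->compatible[0] != '\0'; table++) {
		if (strcmp(table->compatible, compatible) == 0)
			return table;
	}
	return NULL;
}
```

Returning the matching entry rather than a boolean mirrors how drivers commonly hang per-device data off the table entry, which is what the data field stands in for here.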
Module aliases
If the information defined in these tables is exported using the MODULE_DEVICE_TABLE() macro, then it will be present in the driver's kernel module as alias entries in the module information.
For example, one can check the aliases for a given module using the modinfo command:
$ modinfo drivers/input/touchscreen/hideep.ko | grep alias
alias: i2c:hideep_ts
alias: acpi*:HIDP0001:*
alias: of:N*T*Chideep,hideep-tsC*
alias: of:N*T*Chideep,hideep-ts
Listed here are the legacy I2C platform, ACPI and OF aliases exported by MODULE_DEVICE_TABLE(i2c, hideep_i2c_id), MODULE_DEVICE_TABLE(acpi, hideep_acpi_id) and MODULE_DEVICE_TABLE(of, hideep_match_table) respectively.
Module autoloading
The module alias information is only used by user-space; the kernel uses the actual device tables to match the driver with the registered devices. In fact, MODULE_DEVICE_TABLE() is a no-op if the driver is built into the kernel image rather than built as a module.
The way this works is that when a device is registered for a bus type, the struct bus_type.uevent callback is executed and the bus reports a uevent so that udev can take action. The uevent contains key-value pairs, and one of them is the device MODALIAS.
For example, on my laptop, when the PCI bus is enumerated and my GPU is registered, the following uevent MODALIAS is sent (as shown by udevadm monitor -p):
KERNEL[189823.929341] add /devices/pci0000:00/0000:00:02.0 (pci)
ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:02.0
SUBSYSTEM=pci
...
MODALIAS=pci:v00008086d00003EA0sv000017AAsd00002292bc03sc00i00
...
This information is then used by udev and passed to kmod to load the module if needed. It will do something like:
$ modprobe pci:v00008086d00003EA0sv000017AAsd00002292bc03sc00i00
This works because mod{probe,info} can take a module alias instead of the module name. For example, the following shows which module matches this alias:
$ modinfo pci:v00008086d00003EA0sv000017AAsd00002292bc03sc00i00 | grep ^name
name: i915
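The wildcards in the aliases are what make this lookup possible: the MODALIAS reported for a device is a concrete string, while the aliases stored in modules are patterns. As a rough illustration only, the sketch below resolves a MODALIAS with fnmatch(3), assuming plain shell-style glob semantics. The real lookup goes through the alias index that depmod generates, the toy_* names are hypothetical, and the PCI pattern is an assumed example shaped like a PCI alias rather than one copied from the i915 module.

```c
#include <fnmatch.h>
#include <stddef.h>

struct toy_alias {
	const char *pattern;  /* alias as it would appear in modinfo output */
	const char *module;   /* module that declared it */
};

static const struct toy_alias toy_aliases[] = {
	{ "i2c:hideep_ts",              "hideep" },
	{ "acpi*:HIDP0001:*",           "hideep" },
	{ "of:N*T*Chideep,hideep-tsC*", "hideep" },
	{ "of:N*T*Chideep,hideep-ts",   "hideep" },
	/* Assumed pattern for illustration, not taken from a real module. */
	{ "pci:v00008086d00003EA0sv*sd*bc03sc*i*", "i915" },
};

/* Return the first module whose alias pattern matches, or NULL. */
const char *toy_resolve_alias(const char *modalias)
{
	size_t n = sizeof(toy_aliases) / sizeof(toy_aliases[0]);

	for (size_t i = 0; i < n; i++) {
		if (fnmatch(toy_aliases[i].pattern, modalias, 0) == 0)
			return toy_aliases[i].module;
	}
	return NULL;
}
```

With this table, the MODALIAS shown above resolves to i915 because every wildcard absorbs one of the variable fields (subsystem vendor, subsystem device, subclass and interface), while the vendor, device and base class have to match literally.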
This information is also present in sysfs, for example:
$ cat /sys/devices/pci0000\:00/0000\:00\:02.0/uevent
DRIVER=i915
PCI_CLASS=30000
PCI_ID=8086:3EA0
PCI_SUBSYS_ID=17AA:2292
PCI_SLOT_NAME=0000:00:02.0
MODALIAS=pci:v00008086d00003EA0sv000017AAsd00002292bc03sc00i00
$ cat /sys/devices/pci0000\:00/0000\:00\:02.0/modalias
pci:v00008086d00003EA0sv000017AAsd00002292bc03sc00i00
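Since a uevent is just a list of KEY=VALUE lines, pulling the MODALIAS out of it takes very little code. Here is a small user-space sketch of such a parser; toy_uevent_get() is a hypothetical helper written for this post, not an API from udev or the kernel, and it assumes the buffer has already been read into memory as a NUL-terminated string.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of `key` from a newline-separated KEY=VALUE buffer into
 * `out`. Returns 0 on success, or -1 if the key is absent or `out` is too
 * small to hold the value and its terminating NUL.
 */
int toy_uevent_get(const char *buf, const char *key, char *out, size_t outlen)
{
	size_t keylen = strlen(key);

	while (*buf) {
		const char *eol = strchr(buf, '\n');
		size_t linelen = eol ? (size_t)(eol - buf) : strlen(buf);

		/* The line must be "<key>=..." with an exact key match. */
		if (linelen > keylen && strncmp(buf, key, keylen) == 0 &&
		    buf[keylen] == '=') {
			size_t vallen = linelen - keylen - 1;

			if (vallen >= outlen)
				return -1;
			memcpy(out, buf + keylen + 1, vallen);
			out[vallen] = '\0';
			return 0;
		}

		/* Move to the start of the next line, if any. */
		buf += linelen;
		if (*buf == '\n')
			buf++;
	}
	return -1;
}
```

Fed the contents of the uevent file shown above, asking for MODALIAS yields the same string that the modalias attribute prints, while asking for a key prefix such as PCI finds nothing because the key has to be followed directly by the equals sign.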
In theory, drivers should only define device ID tables for the firmware interfaces that they support. That is, a driver that supports devices registered through, say, ACPI should only need a struct acpi_device_id table. The same table should also be used both to match the driver with a device and to report the MODALIAS information. For example, if a device was registered through OF, only the struct of_device_id table should be used for both matching and module alias reporting.
But in practice things are more complicated and there are exceptions in some subsystems, although that is a topic for another time since this post has already gotten too long.
If you are curious about the possible pitfalls though, I wrote about a bug I chased some time ago in Fedora, where the cause was a driver not reporting the MODALIAS that one would expect.
Happy hacking!