I have written before on how to troubleshoot drivers’ deferred probe issues in Linux. And while having a /sys/kernel/debug/devices_deferred debugfs entry is quite useful to know the list of devices whose drivers failed to probe due to being deferred [0], there are also some debug command line parameters that can help with the troubleshooting.
But first, an explanation of what these parameters do:
Devices require different resources in order to be operative. For example, a display controller could need a set of clocks, power domains and regulators to be enabled.
Ideally, if these are managed by Linux, they should be declared in the hardware description that is handed to the kernel (e.g. Device Tree Blobs), but sometimes this isn’t the case. Devices may then be working just because the firmware that booted Linux left these required resources enabled.
But of course this has the disadvantage that Linux can’t manage those resources, and can’t for example disable a clock or power domain when it is not used.
It is common though to add support for a platform incrementally, so at the beginning it may rely on the setup made by the firmware (e.g. some required clocks left enabled and Linux not knowing about them). Later, support for these clocks could be added, and the Common Clock Framework (CCF) will then be aware of them.
But it could be that the Device Tree was not updated to define the dependency between a device and the newly introduced clocks it requires. If that is the case, the clocks may appear to be unused by the system and the CCF rightfully decides to disable them [1].
When that happens, this message is printed in the kernel log buffer:
[ 5.189930] clk: Disabling unused clocks
This will of course make the device stop working, because one of its required clocks has been gated.
To prevent this, there is a clk_ignore_unused debug command line parameter that stops the CCF from gating unused clocks. When it is used, the unused clocks won’t be disabled and the following message is printed instead:
[ 5.200758] clk: Not disabling unused clocks
The same applies to power domains and regulators: their frameworks disable unused ones, and the pd_ignore_unused and regulator_ignore_unused command line parameters can be used to prevent this. When using them, the following messages are printed in the kernel log:
[ 5.186559] PM: genpd: Not disabling unused power domains
[ 35.298779] regulator: Not disabling unused regulators
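As a quick way to check what happened on a given boot, the messages quoted above can be searched for in a saved kernel log. The following is only a minimal sketch: the function name and the "kept"/"disabled" labels are made up for illustration, and it only matches the exact messages quoted in this post (the wording could differ between kernel versions).

```python
import re

# Only the messages quoted in this post are matched. The "Disabling" variants
# for power domains and regulators are not quoted here, so they are left out.
PATTERNS = {
    "clk": re.compile(r"clk: (Not disabling|Disabling) unused clocks"),
    "pd": re.compile(r"PM: genpd: (Not disabling) unused power domains"),
    "regulator": re.compile(r"regulator: (Not disabling) unused regulators"),
}


def unused_resource_status(log_text):
    """Map each framework found in the log to "kept" or "disabled".

    Hypothetical helper: log_text is kernel log output captured as a string,
    for example the saved output of dmesg.
    """
    status = {}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                kept = match.group(1).startswith("Not")
                status[name] = "kept" if kept else "disabled"
    return status


sample_log = """\
[ 5.186559] PM: genpd: Not disabling unused power domains
[ 5.189930] clk: Disabling unused clocks
"""
print(unused_resource_status(sample_log))
```

A framework that is missing from the result simply means that none of the quoted messages for it were found in the log text that was passed in.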
The {clk,pd}_ignore_unused parameters have been present in Linux for a long time, but we didn’t have one for regulators until recently.
Before, to prevent an unused regulator from being disabled by the regulator framework, it had to be marked as always-on in the Device Tree by using the regulator-always-on property. But this requires modifying the Device Trees, which is cumbersome when just troubleshooting driver regressions.
So after talking with Mark Brown (the regulator framework maintainer), I proposed adding a regulator_ignore_unused debug parameter, which landed in Linux kernel v6.8.
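When comparing a working boot against a broken one, it can help to confirm which of the three parameters the running kernel was actually booted with. The sketch below reads the command line from /proc/cmdline and reports each parameter. The helper names are hypothetical, and it assumes the parameters appear as bare words exactly as spelled in this post.

```python
DEBUG_PARAMS = (
    "clk_ignore_unused",
    "pd_ignore_unused",
    "regulator_ignore_unused",
)


def ignore_unused_params(cmdline):
    """Return which of the *_ignore_unused parameters are present.

    Assumption: each parameter is passed as a bare word with no "=value"
    suffix, which is how they are written in this post.
    """
    tokens = cmdline.split()
    return {param: param in tokens for param in DEBUG_PARAMS}


def read_kernel_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as cmdline_file:
        return cmdline_file.read()


# Example with a made-up command line; on a real system, pass
# read_kernel_cmdline() instead.
example = "console=ttyS0,115200 clk_ignore_unused pd_ignore_unused"
for param, present in ignore_unused_params(example).items():
    print(f"{param}: {'set' if present else 'not set'}")
```

Note that regulator_ignore_unused is only honored by kernels that include it (v6.8 or later, as mentioned above), so seeing it in the command line of an older kernel does not mean it had any effect.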
I hope these debug parameters can be useful when facing regressions in drivers.
Happy hacking!
[0] Read this older post if you want to learn more about driver and device registration, matching and probe deferral.
[1] Brian Masney pointed out to me that the list of clocks being disabled can be shown by using the tp_printk trace_event=clk:clk_disable cmdline params, as explained in the kernel docs.