PMIC ADC and Thermals

In this short post, I’ll share what I learned setting up the ADC (analog-to-digital converter, used for measuring temp and voltage) and the Thermal Monitor (which notifies the kernel when stuff gets hot).

Setup

From what I can gather, this applies to QCom SoCs/PMICs, so keep that in mind before you fry your non-QCom device.

  • SoCs come with a few companion PMICs (Power Management ICs) on the board. In the case of sm4250 (sm6115), these are the pm6125 and pmi632;
  • PMICs mainly handle the power needs of the device (i.e. they have a bunch of regulators);
  • most have additional features like PON (power on), GPIOs, ADC for voltage/temp sensors and TM (temperature monitors);
  • in addition, they might handle the fuel gauge (how full the battery is) and battery charging (supplying the optimum voltage/current to charge the battery quickly and safely).

Sub-Components

Power On (PON)

Restarting the device when it is stuck normally requires holding one or two buttons (like Power+VolumeUp or Power+VolumeDown). For this to work when the kernel is a goner, these keys need to be handled on another chip (i.e. not on the main CPU). And what better place than the PMIC, which can also cut power to the device?

The OS also needs to know when these keys are pressed, because they are used for more mundane things like putting the device to sleep and changing the volume. So the job of the PON driver (qcom,pm8916-pon in our case, with the keys described as its child nodes) is to create input devices (i.e. something like a keyboard), so these key presses can reach the kernel.

There are plenty of examples in mainline code. Normally the PMIC has the power key (pwrkey) and one other key (resin). The PMIC itself doesn’t really know which button is wired to resin, so the mapping between resin and a keycode happens in the device DTS, not in the PMIC DTSI.

Analog-to-Digital Converter (ADC)

So the PMIC handles power, and with great power comes great responsibility not to fry stuff. It’s only natural for the PMIC to offer temperature/voltage/power sensing as part of the package. The idea is that if critical components get too hot, the device can shut down immediately. (Well, normally the PMIC notifies the kernel with an interrupt and gives it some time to clean up, but when the time’s up, the lights go out.)

There are several different interfaces that deal with this in a modern QCom PMIC. Let’s have a look at them in order.

ADC5 IIO

The sensing part can be accessed directly to answer questions like “what is the temperature/voltage/power over there?”. In our case, the compatible string is qcom,spmi-adc5.

Every single item that is measured is dubbed a channel, and channels are defined in both the driver and DT. Among other things, the following can be configured for each channel:

  • channel id (reg) – identifies the channel (used to match driver and DT channels). Some minor options can be tweaked from DT, but some can’t (like the scaling function). So if you need a different scaling function for a channel already present in mainline, you have to add a new compatible string and copy over the existing channels… ugh;
  • pre-scaling (numerator, denominator) – the fraction the input signal is multiplied by before it reaches the ADC (e.g. a 1/3 divider); the driver multiplies the reading by the inverse to recover the real value;
  • scaling – non-linear scaling (several options);
  • calibration – how to calibrate the sensor;
  • hardware settle time – how long to wait between requesting a measurement and actually taking it.

The good news is that the mainline and downstream drivers are pretty close. You might have to add a channel or two to the driver, but copying over with minor modifications should do the trick.

When all is said and done, the driver exposes an IIO (Industrial I/O) interface. Interested parties can read files like /sys/bus/iio/devices/*/in_*_input, which is basically driver-processed sensor data.

ADC-TM5

The same channels that can be read by the above-mentioned driver can also be used for TM (Temperature Monitoring). The basic idea is to set a lower and an upper bound (trip points) and get a notification (interrupt) from the PMIC once a trip point is crossed (violated). Letting a dedicated IC monitor the temps consumes less power than having the CPU poll them, and it’s also more reliable.

The job of this IC/driver (compatible qcom,spmi-adc-tm5) is to integrate the temperature sensor with the kernel’s thermal zone APIs. To do that, it has to be able to a) read the temperature and b) set trip points (see the thermal zones section for details). Reading is handled by the IIO IC/driver, and trip points are handled by this one.

The TM chip can monitor a number of signals, but normally far fewer than the IIO IC measures (in my case 4 vs 7). So choose wisely which signals to handle, or just copy what downstream does, since they probably chose well. Each thermal signal is defined in DT as a child node of the device, referencing the source IIO channel via io-channels.

Something that keeps me up at night is that some properties that can be configured on the IIO IC can also be configured on the TM IC. So you can end up with TM tripping a point, while reading the temp via IIO shows no violation of that trip. Well, it could also be timing related 🙂

Thermal Alarm

Whereas the ADC-TM5 is mostly for shits and giggles (it can down-clock stuff and run fans), there is another IC (compatible qcom,spmi-temp-alarm) tasked with monitoring a single critical temperature: the die temp. I guess it’s a great pun, because it monitors the die temperature so the device can die gracefully. This IC has a lot more options: multiple trip points, a configurable amount of time for the OS to clean up before pulling the plug, etc.

Interestingly, the driver supports an io-channels input, so thermal zones that reference this alarm can also read the actual temperature. There is also a hacked-up fallback that figures out the temp without relying on IIO, probably for older chips. My guess is that this is the oldest interface.

Thermal Zones and Cooling Devices

This is just scratching the surface; if you want to scratch harder, check the official docs or the DT bindings.

So here are the participants:

  • sensor – something that measures temperature;
  • trip point – a particular temperature at which you want something to happen (turn a fan on, down-clock the CPU, emergency shutdown);
  • cooling device – you guessed it. However, CPUs are also considered cooling devices; I guess this is where down-clocking (and similar techniques) comes into play;
  • cooling maps – a mapping between a trip point and a cooling device + level, e.g. if you reach 60 °C (trip point), put the fan at level 2;
  • thermal zone – a way of tying together a sensor, a set of trip points and a cooling map.

Things you might find useful:

  • Trip point types: active, passive, hot, critical:
    • active: start a fan to cool things down (the first step, if there is a fan, of course);
    • passive: bring down the computing power (i.e. down-clocking, idle injection etc.);
    • hot: sends a notification to the thermal driver;
    • critical: triggers a system shutdown;
  • Checking trips. Some sensors support interrupt-driven trip checking (e.g. the ADC-TM5 shown above), others do not. For the ones that offer interrupts, polling-delay can be 0; otherwise, a non-zero value specifies how often (in ms) the kernel should check for trip violations;
  • Using IIO as a sensor. The kernel has this beautiful API for things that generate a stream of data (IIO), but unfortunately there is no clean way to say “use this IIO stream as the temperature sensor for a thermal zone”. Downstream, some clever engineer tweaked the adc-tm5 driver (naming it adc-tm5-iio) to support sensor registration with the thermal zone, without the interrupt-driven trip checks (which is what adc-tm5 actually does). Maybe less cleverly, they made it look like an actual sub-device (with a made-up register), which had me scratching my head…