Building stock kernel

The next step in mainlining is to build the original/stock/downstream kernel. We’ll use it as a reference point and to extract additional details when the need arises. The stock kernel’s source is provided by the manufacturer, so building it should be relatively easy.

Overview

This shouldn’t be complicated. Here is the battle plan:

  • find and download manufacturer provided kernel
  • extract kconfig from running android
  • attempt to build kernel with given kconfig
  • fix all compiler errors (not warnings)
  • smoke test the kernel to make sure it’s good

Build setup

We already went through finding the manufacturer’s kernel, and extracting the kconfig is simply done with `adb pull /proc/config.gz config.gz && zcat config.gz > kconfig`.

I want to remind you to update your device to the latest OTA update (well, unless it fixes some exploit that you want to use), or at least make sure not to run the bone-stock firmware the device came with, because that tends to be buggy. Read the roadblocks section if you want to have a laugh.

Another thing to consider is which compiler to use. There are two options: gcc and clang. As with most things in life, some choices are already made for you. In our case, the stock kconfig indicates the use of clang:

# kconfig from running android
CONFIG_CC_IS_CLANG=y
CONFIG_CLANG_VERSION=80016
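A side note on that number: as far as I know, the kernel’s build scripts encode the compiler version as major*10000 + minor*100 + patch, so it can be decoded with a bit of shell arithmetic. This is only a sketch, and decode_cc_version is a name I made up for illustration (it is not part of the kernel tree).

```shell
# decode_cc_version N: turn an encoded compiler version
# (major*10000 + minor*100 + patch) into dotted form.
# Hypothetical helper, not a kernel script.
decode_cc_version() {
    v=$1
    echo "$((v / 10000)).$((v / 100 % 100)).$((v % 100))"
}

decode_cc_version 80016   # prints 8.0.16
```

So the stock kernel was built with clang 8.0.16.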

This does not mean that you can’t use gcc to build the kernel, but doing so will likely involve fixing more issues. Here are the basic steps:

cd path/to/kernel
# make an output directory, it will hold all
# compilation products
mkdir .output
# get the kconfig from the running kernel
cp path/to/kconfig .output/.config
# fix any kconfig inconsistencies
ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
    make \
    -C $PWD O=$PWD/.output \
    CC=clang olddefconfig
# compare path/to/kconfig with .output/.config to
# ensure it's not changed too much
meld path/to/kconfig .output/.config
# now build the kernel
ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
    make \
    -C $PWD O=$PWD/.output \
    CC=clang -j8

# enjoy your kernel responsibly
% ls .output/arch/arm64/boot
dts  Image  Image.gz

## building with gcc
ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
    make \
    -C $PWD O=$PWD/.output \
    CC=aarch64-linux-gnu-gcc HOSTCC=gcc

If the above seems straightforward, you can jump to the next section. If you’re reaching for a strong drink, don’t despair: I’ll explain it in more detail.

make output directory

This is actually a feature of the kernel’s build system (Kbuild) rather than of make itself, so you can use it in any project that builds with Kbuild. The idea is to separate the code from all the build artifacts (object files, generated code, executables etc). To tell the build to use an output directory, specify O=path/to/output_dir. In my case I also specify -C path/to/source (a standard make option), but if you’re already there, it’s redundant. The beauty is that you can maintain separate output directories for different purposes (like switching the compiler) and come back to any of them later. It also keeps the repo clean: you don’t get object files next to source files.
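To make the "one output directory per purpose" idea concrete, here is a small wrapper sketch. kmake and the .output-NAME naming are my own conventions for this example, and the MAKE override is only there so the command can be inspected (MAKE=echo) without building anything.

```shell
# kmake NAME [make args...]: run the kernel build with a separate
# output directory per NAME (for example one per compiler).
# Hypothetical helper; run it from the top of the kernel source tree.
kmake() {
    name=$1
    shift
    "${MAKE:-make}" -C "$PWD" O="$PWD/.output-$name" \
        ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- "$@"
}

# Usage (not executed here):
#   kmake clang CC=clang olddefconfig
#   kmake gcc CC=aarch64-linux-gnu-gcc HOSTCC=gcc olddefconfig
```

Each output directory needs its own .config before the first olddefconfig, exactly as in the steps above.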

fix kconfig inconsistencies

Normally you’d configure your kernel by running make menuconfig, browsing through all the options and selecting the ones you need. In our case all the right options are already picked: we have the options the kernel in use was compiled with. Unfortunately you can’t really trust the manufacturers (or, in fact, anybody) that the code they provide and the code running on the phone are exactly the same. Maybe they put a backdoor in, maybe they got drunk on release day and forgot to update the repo. This is one of the reasons people insist on being able to compile the kernel themselves: to be (more) sure that the code that runs is the one in the repo.

In any case, if you have a kconfig from somewhere and want to make sure the kernel is completely happy with it (kernel configuration has a lot of requirements: some options depend on others, and new mandatory options are added all the time), you have to run make olddefconfig. What this does is tell the build system, “hey, I have this config which was just fine last week (or last year), I want it updated”. It basically checks all requirements, assigns default values to new options, and deletes options it doesn’t know about.

As you might have guessed, this process might make zero changes, or it might make a ton of changes. It depends. So, especially in our scenario, where we have no idea whether the code and the kconfig are actually married together, it is nice to verify what olddefconfig did. I suggest you diff the original and the modified version. meld is a nice graphical diff tool, but any diff tool will do. If you forget the ARCH=arm64, for example, you’ll see a lot of changes 🙂 Try it!
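If you prefer the terminal, a rough equivalent of the comparison can be sketched with sort and diff. kconfig_changes is a made-up helper name; it ignores comments and ordering and only lists option lines that exist on one side but not the other. The kernel tree also ships scripts/diffconfig, which does a similar job.

```shell
# kconfig_changes OLD NEW: list option lines that differ between two
# kernel configs. "<" lines come from OLD, ">" lines from NEW.
# Hypothetical helper for illustration.
kconfig_changes() {
    a=$(mktemp)
    b=$(mktemp)
    # keep "CONFIG_X=..." and "# CONFIG_X is not set" lines only
    grep -E '^(CONFIG_|# CONFIG_)' "$1" | sort > "$a"
    grep -E '^(CONFIG_|# CONFIG_)' "$2" | sort > "$b"
    diff "$a" "$b" | grep '^[<>]'
    rm -f "$a" "$b"
}

# Usage (not executed here):
#   kconfig_changes path/to/kconfig .output/.config
```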

make line

There’s not much left to explain, but here it is: the ARCH env variable controls the target architecture (i.e. where the kernel will run). If you’re wondering what the accepted values are (like arm64 vs aarch64), just look under the arch/ folder in your linux repo.

CROSS_COMPILE is another rather important variable. Unless you have an Apple M1 laptop (or a Pinebook Pro), you’re likely still on a rusty old x86_64, and normally your compiler only knows how to compile for x86_64 targets (i.e. for your machine). To remedy this you have to install a cross compiler. Search your distribution for available packages; on mine the package was called aarch64-linux-gnu-gcc. clang actually has cross compiling built-in, but it still needs the CROSS_COMPILE variable for the linker.

CC specifies the C compiler. When you compile with gcc, the name of the cross compiler is the rather wordy aarch64-linux-gnu-gcc, whereas for clang it’s just … clang (because it can basically cross compile to everything, by design). When your CC differs from your host CC (i.e. the compiler that builds code for the machine you’re compiling on), you have to specify HOSTCC.

And last, but not least, there is -j8. This tells make to run 8 jobs in parallel (mostly compiling sources). The right setting depends on your machine and how many CPU cores it has. You can just use the number of CPUs on your system as a starting point.
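If you don’t want to hardcode the number, something like the following sketch picks it up from the system. It assumes nproc (from coreutils) is available and falls back to a single job otherwise; njobs is just a variable name for this example.

```shell
# One make job per CPU; fall back to 1 if nproc is not available.
njobs=$(nproc 2>/dev/null || echo 1)
echo "would run: make -j$njobs"
```

In the build command above you would then write -j"$njobs" instead of -j8.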

Testing the kernel

If your mantra is “If it compiles, ship it!”, then you’ll become an excellent vendor kernel developer. I have slightly higher standards for this blog, so I wanted to make sure it also runs. That is why this post is delayed by 4 days.

The most trouble-free way to test the kernel (and believe me, you want to start here) is to get the boot.img from your vendor OTA update and repackage it to use your kernel. There are 3 tools you can use for the job: abootimg, mkbootimg-osm0sis, and mkbootimg (Android). In short, abootimg is a simpler tool that lets you extract, create, and replace stuff in a boot img, mkbootimg-osm0sis is a fork of the official Android tool (used in postmarketOS), and then there is the official Android tool itself.

% mkdir boot-dir
% # this is the android tool
% unpack_bootimg --boot_img path/to/boot.img --out boot-dir > boot-dir/info
% # check what happened
% ls boot-dir
dtb  info  kernel  ramdisk
% # overwrite the stock kernel with the freshly built one
% cp path/to/kernel/.output/arch/arm64/boot/Image.gz boot-dir/kernel
% cd boot-dir
% # now the fun part
% mkbootimg --base 0x0 --kernel kernel --ramdisk ramdisk \
  --dtb dtb --cmdline 'androidboot.hardware=qcom androidboot.console=ttyMSM0 androidboot.memcg=1 lpm_levels.sleep_disabled=1 video=vfb:640x400,bpp=32,memsize=3072000 msm_rtb.filter=0x237 service_locator.enable=1 swiotlb=2048 loop.max_part=7 buildvariant=user' \
  --kernel_offset  0x8000 --ramdisk_offset  0x1000000 \
  --second_offset 0xf00000 --dtb_offset  0x1f00000 \
  --os_version  10.0.0 --os_patch_level  2020-12 \
  --header_version 2 --pagesize 4096 -o ../boot-new.img

Holy smokes! Well, you can read more in the roadblocks section, but suffice it to say that you have to pass all the addresses and other bits that the unpack command above printed (and that we helpfully redirected into the info file).
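One detail worth spelling out: mkbootimg takes offsets relative to --base, so with --base 0x0 the offsets are simply the absolute load addresses. To avoid copying values by hand, a tiny lookup helper can pull them out of the info file. This is a sketch that assumes the file consists of "label: value" lines; info_value is a made-up name, and the exact labels depend on your unpack_bootimg version, so take them from your own boot-dir/info.

```shell
# info_value FILE LABEL: print whatever follows "LABEL: " in FILE.
# Assumes "label: value" lines; LABEL is used as a regex prefix.
# Hypothetical helper for illustration.
info_value() {
    sed -n "s/^$2: //p" "$1"
}

# Usage (not executed here; use a label that appears in your info file):
#   info_value boot-dir/info 'page size'
```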

Now you can just reboot into fastboot, and flash boot-new.img via:

% fastboot flash boot path/to/boot-new.img

If you don’t like to move fast and break things, especially since they are your things, you might want to use the A/B functionality of your device. In short, newer devices might have two copies of the important partitions, so you can install an update while the system is running and roll back to the old version if the new one is not kosher. As you might have guessed, we’re more interested in the rollback functionality.

There are two slots called A and B, and you can switch between the two. Each slot has its own boot, system, and a variety of other partitions. We’ll be focusing on the boot partition here.

% # list all fastboot variables
% fastboot getvar all
% # for my device, the current slot is in current-slot
% fastboot getvar current-slot
a
% # change the active slot
% fastboot set_active b
Setting current slot to 'b'                        OKAY [  0.043s]
Finished. Total time: 0.045s
% fastboot flash boot path/to/boot-new.img

Now, after your new kernel fails to boot, you can go to fastboot (on the OnePlus Nord N100 you hold power+volup and pray), switch the active slot, and get back to whatever you had before.
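The rollback itself is just "activate the other slot". As a sketch, a tiny helper can compute which slot that is; other_slot is a name I made up, and the slot letter is passed in by hand rather than parsed from fastboot output.

```shell
# other_slot SLOT: print the slot that is not SLOT (a <-> b).
# Hypothetical helper for illustration.
other_slot() {
    case $1 in
        a) echo b ;;
        b) echo a ;;
        *) echo "unknown slot: $1" >&2; return 1 ;;
    esac
}

# Usage (not executed here), if the broken kernel went to slot b:
#   fastboot set_active "$(other_slot b)"
```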

Roadblocks

I’m sure you’ve all been waiting for this section. So let’s dive in:

Building

I first tried to build the kernel using the original kconfig (the one that was on the phone when it arrived, before the update). I quickly realized (after diffing the kconfig after olddefconfig) that there are a bunch of OPPO related options, while 1) my phone is not an Oppo, and 2) its kernel source does not contain those options. So I tracked down the Oppo repository of a phone with the same SoC as the N100, and I realized there was a bunch of code in one repo that was basically search-replaced in the other, with oppo being replaced with oneplus. Then I read online that the two companies have basically the same R&D team; you can think of them as one doing badge-engineered versions of the other, or vice versa.

So I started copying code from the Oppo repo, and fixing this and that. After another hour of fixing, I realized some somewhat important bits of the charging infrastructure were not lining up properly. I started to consider what would happen if my brand new 30W fast charging phone decided to use this power to make … fire?

Then a great idea hit me: update my phone and hope there is no more Oppo messing around in the internal organs of my OnePlus. So I did, and I was immediately relieved when I saw the Oppo config options were gone and replaced by OnePlus ones.

So I stashed my changes, moved the .output folder to .output-here-are-dragons, and started anew. When I was almost done fixing the same issues again, I stumbled upon a weird error:

WARNING: EXPORT symbol "gsi_write_channel_scratch" [vmlinux] version generation failed, symbol will not be versioned.
aarch64-linux-gnu-ld: warning: -z norelro ignored
aarch64-linux-gnu-ld: drivers/platform/msm/gsi/gsi.o: relocation R_AARCH64_ABS32 against `__crc_gsi_write_channel_scratch' can not be used when making a shared object
drivers/platform/msm/gsi/gsi.o:(.data+0x0): dangerous relocation: unsupported relocation
drivers/platform/msm/gsi/gsi.o:(.data+0x28): dangerous relocation: unsupported relocation
drivers/platform/msm/gsi/gsi.o:(.data+0x50): dangerous relocation: unsupported relocation
make[1]: *** [/home/iskren/src/pmos/android_kernel_samsung_msm8974/Makefile:1124: vmlinux] Error 1

I searched around, didn’t find much, then I searched in the postmarketOS porting channel specifically and found this link, following some LKML discussion about genksyms issues and fixes. I was really baffled for a while, but after I read it a few times, I was inspired to tweak the code around the error slightly to see what happens. So I made a dummy exported function, put it in various places before the error, and figured out that something was wrong with the previous function. You can read slightly more details in my SO answer here, and you can upvote it 😀 (this is the equivalent of smash that subscribe button).

In the meantime, I switched to clang, because I was hoping the issue would go away, but it didn’t … so I had to actually fix it. If you read the error carefully you’ll see that the linker (aarch64-….-ld) is complaining, so changing the compiler won’t help. Eh, hindsight is 20/20!

patching boot.img

First some background. boot.img contains the kernel and initfs, and in one form or another, the dtb (there is also a second stage, not sure what that is, and it’s not relevant for this explanation). The accepted practice some time ago was to append the dtb to the end of the kernel, and build the kernel with the relevant option, so in the end there was only kernel-dtb and initfs in boot.img.

The way these are strung together is with some ad-hoc binary format, where there is a header, which lists where all the relevant sections (kernel, initfs, second stage) start, and their sizes.
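To peek at that header without any of the bootimg tools, you can read the fields directly. The sketch below assumes the classic layout from AOSP’s bootimg.h (an 8-byte ANDROID! magic followed by little-endian 32-bit fields; the header version field at offset 40 is only meaningful for version 1 and later). read_le32 and bootimg_header are helper names I made up.

```shell
# read_le32 FILE OFFSET: print the unsigned little-endian 32-bit
# integer stored at byte OFFSET of FILE. Hypothetical helper.
read_le32() {
    # od prints the four bytes as decimal numbers, lowest address first
    bytes=$(od -A n -t u1 -j "$2" -N 4 "$1")
    set -- $bytes
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# bootimg_header FILE: dump a few fields of the boot image header.
bootimg_header() {
    printf 'magic:          %s\n'   "$(head -c 8 "$1")"
    printf 'kernel size:    %d\n'   "$(read_le32 "$1" 8)"
    printf 'kernel addr:    0x%x\n' "$(read_le32 "$1" 12)"
    printf 'ramdisk size:   %d\n'   "$(read_le32 "$1" 16)"
    printf 'ramdisk addr:   0x%x\n' "$(read_le32 "$1" 20)"
    printf 'second size:    %d\n'   "$(read_le32 "$1" 24)"
    printf 'second addr:    0x%x\n' "$(read_le32 "$1" 28)"
    printf 'page size:      %d\n'   "$(read_le32 "$1" 36)"
    printf 'header version: %d\n'   "$(read_le32 "$1" 40)"
}

# Usage (not executed here):
#   bootimg_header path/to/boot.img
```

Running it on both the stock and the repackaged image gives a quick way to check that no address changed along the way.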

If you look into the abootimg --help output, you’ll see there is no mention of dtb, because the dtb was supposed to be appended to the kernel. I guess that changed at some point, because both the osm0sis and stock Android tools accept a separate dtb image, and apparently the boot.img header supports that (at least if it’s a new enough version).

I wanted to be really careful with building this first boot.img containing my newly built kernel, so I decided to inspect the built image (by unpacking it once again) and compare the information with the original boot.img. This is how I became certain that abootimg does not handle dtb properly. Another thing I figured out was that if you pass a base address for the second stage, but no second stage, the Android mkbootimg script just ignores it, but the stock boot.img clearly had a second stage offset (with size 0, but still). This might sound paranoid, but I’ve seen bootloaders do weird stuff based on magic numbers, so I really wanted to get this right. Thankfully the osm0sis version of the mkbootimg tool was happy to stick whatever value I passed on the cmdline into the header, even if the second stage was empty.

So I ended up wanting to use abootimg (because you just say abootimg -u current-boot.img -k new-kernel), but ended up using android’s unpack_bootimg (because it has cleaner output) and osm0sis mkbootimg (because it let me put the second stage offset). Then I verified using both android and osm0sis unpackers, to make sure no values were changed.

At least I learned how to use the bootimg tools, which will come in handy in the upcoming steps.

I haven’t tried to boot an image where the second stage offset is 0. All good stories end with a bit of mystery.

pmos with downstream kernel

This is what I initially tried to do, but after it became apparent that it wouldn’t come easy, I switched to booting stock Android with a modified (recompiled) kernel. So far it looks like the kernel starts booting (it is not rejected by the bootloader) but hangs. The next post will likely cover more of that, so stay tuned 🙂