
The journey through time and (disk)space

In this post I'm continuing the journey that started with my blog post about using GitHub Actions as my main CI provider (in case you missed it, see here): how to do a full yocto/poky build with just ~14GB of disk space.

Disk space is cheap nowadays and is typically used without much thought, which easily leads to a disk usage of 50GB and more for a yocto/poky build.

So what are the options, when disk space becomes precious?

Constraints, constraints, constraints

While GitHub Actions is free for open source projects (like mine), it is highly limited in terms of resources.
Currently (2020/04/11) you get (source):
  • 2-core CPU
  • 7 GB of RAM
  • 14 GB of SSD disk space
  • maximum of 6 hours per pipeline
That's not much when you think about doing a poky/yocto build on it, so every byte actually counts.

Constant pain

As the involved layers are constantly growing, one has to care about every byte that can be saved, without giving up the overall aim of doing a from-scratch build of yocto/poky.

The really big chunks on your HDD are
  • intermediate files (like object files) created while building
  • downloaded sources
  • native tools (as they are not packaged like all the other tools)

Let's start with the simple things

rm_work

The first tool of choice is the rm_work class.

Pros:
  • removes all intermediate files from a recipe's workspace
  • is a huge disk space saver
Cons:
  • isn't really compatible with testimage.bbclass if you use it to 'smoke' test your distribution
To use it, insert

INHERIT += "rm_work"

into your local.conf.
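In a CI pipeline this kind of tweak usually lives in a setup script. A minimal, idempotent sketch (assuming you are inside the build directory that oe-init-build-env creates; the mkdir/touch lines are just guards so the snippet runs standalone):

```shell
# Append the rm_work include to local.conf only if it is not already there,
# so re-running the CI setup step does not add duplicate lines.
mkdir -p conf
touch conf/local.conf
grep -q '^INHERIT += "rm_work"' conf/local.conf || \
  echo 'INHERIT += "rm_work"' >> conf/local.conf
```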

rm_work_and_downloads

This class does the same but also deletes the downloads of that recipe.
To use it, insert

INHERIT += "rm_work_and_downloads"

into your local.conf.
I was curious about it and did some tests. Unfortunately it turned out that using it requires more disk space than just using 'rm_work': each recipe gets its own download folder, so downloads shared between recipes (clang and llvm are an example) are actually fetched more than once, leading to higher disk usage.

I decided to stick with just using 'rm_work'.

But this is still not enough 

Enabling rm_work brought huge relief in terms of disk usage, but it wasn't enough to finish a build before running out of disk.

Disabling disk monitoring

By default bitbake monitors the available disk space and automatically terminates the build if it falls below a threshold. As every byte is precious, let's disable this behavior; it isn't useful when building in throwaway VMs anyway.

BB_DISKMON_DIRS = ""

inserted into the local.conf did the trick. Now one can build using ALL available disk space.
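For reference, the monitoring entries this disables look roughly like the following (taken from a typical local.conf.sample; exact thresholds vary by release). Each entry reads action,directory,free-space-threshold,free-inode-threshold:

```
BB_DISKMON_DIRS ??= "\
    STOPTASKS,${TMPDIR},1G,100K \
    STOPTASKS,${DL_DIR},1G,100K \
    STOPTASKS,${SSTATE_DIR},1G,100K \
    ABORT,${TMPDIR},100M,1K \
    ABORT,${DL_DIR},100M,1K \
    ABORT,${SSTATE_DIR},100M,1K"
```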

Shrinking the size of the downloads

As every download is stored on the local disk, shrinking these was an additional task I had to deal with.
Most of the downloads bitbake does in a typical build are clones of git repositories. As we all know, cloning a repository does not only fetch the current set of files (which you are mainly interested in) but the full history of the repo plus all the change sets, which turns out to be quite heavy, especially for repositories with many changes.

The solution is called a shallow clone: cloning only the requested revision, without the history and all the change information.
Luckily poky provides support for that out of the box, just add

BB_GIT_SHALLOW = "1"
BB_GIT_SHALLOW_DEPTH = "1"

into the local.conf.
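Outside of bitbake, the effect is easy to demonstrate with plain git. The following throwaway sketch builds a small repository and compares a full clone against a depth-1 clone (all paths are temp directories; nothing here touches a real build):

```shell
# A shallow clone (--depth 1) carries a single commit instead of the full history.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/origin"
for i in 1 2 3 4 5; do
  echo "$i" > "$tmp/origin/file"
  git -C "$tmp/origin" add file
  git -C "$tmp/origin" -c user.email=ci@example.com -c user.name=ci commit -qm "commit $i"
done
git clone -q "$tmp/origin" "$tmp/full"
# the file:// URL forces a real transport, so --depth is honored
git clone -q --depth 1 "file://$tmp/origin" "$tmp/shallow"
echo "full:    $(git -C "$tmp/full" rev-list --count HEAD) commits"
echo "shallow: $(git -C "$tmp/shallow" rev-list --count HEAD) commits"
```

This should report 5 commits for the full clone but only 1 for the shallow one; for real-world repositories the saved history is where the disk space goes.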

Removing layer history 

As the layers themselves are mostly git clones, removing the git history from them turns out to be a big disk space saver.
Before building, just run in a shell

rm -rf <layerdir>/.git

The only downside is that you cannot use automatic versioning based on git hashes in your software.
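To cover several layers at once, a small find-based sweep works too. LAYERBASE is a made-up name here for the directory holding the layer checkouts; adapt it to your setup:

```shell
# Strip the git history from every layer checkout below LAYERBASE
# (mkdir is only a guard so the snippet runs standalone).
LAYERBASE="${LAYERBASE:-layers}"
mkdir -p "$LAYERBASE"
find "$LAYERBASE" -maxdepth 2 -type d -name ".git" -exec rm -rf {} +
```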

Disabling non-essential features 

In the default configuration of bitbake (the one you get by just running 'oe-init-build-env') some features are activated that are normally useful but cost disk space, so let's disable them.

sed -i "s/buildstats//g" conf/local.conf
for instance, removes the automatic include of the 'buildstats' class.

echo 'PACKAGE_CLASSES = "package_rpm"' >> conf/local.conf
enforces usage of rpm packages, which are much smaller than e.g. deb packages.

And still it isn't enough

As poky uses sstate caches for every successfully built component, you can delete all fragments of a component (except the sstate cache) and in the following steps only the cache is used.

With this in mind I came up with the idea to split the actual build into 2 parts

Splitting the build


Phase 1
  • build 'linux-yocto'
  • remove all fragments not needed anymore afterwards
Phase 2
  • build the image
As we can stop somewhere in the middle of the build process, it is easy to remove things that aren't needed anymore (like log files, downloads, and so on).
I ended up with the following shell script blocks

source poky/oe-init-build-env
# extract the final TMPDIR and DL_DIR values from bitbake's variable dump
eval $(bitbake busybox -e | grep "^TMPDIR=")
eval $(bitbake busybox -e | grep "^DL_DIR=")
bitbake linux-yocto
# downloads and per-task temp/log directories are no longer needed
[ -n "${DL_DIR}" ] && rm -rf "${DL_DIR}"/*
find "${TMPDIR}" -type d -name "temp" -exec rm -rf {} \; || true
for the kernel part, and afterwards simply calling
bitbake my-image
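The eval-on-grep lines work because 'bitbake <recipe> -e' dumps the final variable values as shell-compatible VAR="value" lines, so grep can pick one out and eval turns it into a shell variable. A hedged illustration with a stand-in function (the paths are made up):

```shell
# 'fake_bitbake_e' stands in for 'bitbake busybox -e' so the pattern can be
# shown without a yocto checkout.
fake_bitbake_e() {
  printf '%s\n' 'TMPDIR="/build/tmp"' 'DL_DIR="/build/downloads"'
}
# grep selects the assignment, eval executes it in the current shell
eval "$(fake_bitbake_e | grep '^TMPDIR=')"
echo "$TMPDIR"   # prints /build/tmp
```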
That almost did it; just one more thing.
One of the biggest native tools in the workspace (besides glibc and the kernel) is actually qemu, which is used extensively throughout the build process.

Limiting qemu impact

Normally qemu is built for every architecture known to qemu, leading to a huge package.
Fortunately the qemu recipe offers a variable to specify the architectures to build, so setting

QEMU_TARGETS = "arm aarch64 i386 x86_64"
in the local.conf did the trick. Since then I'm able to do a from-scratch build of poky, meta-openembedded and my layer meta-sca with just ~14GB of disk space available.

Conclusion

Poky offers some really interesting options to limit disk space usage, but at the outer edges it sometimes takes tricks and brute force to make things work.
In an ideal world one could split the build even further, lowering the required disk space even more; you only have to be sure in which order the components depend on each other.

Update 2020/04/17

As it happens, I was forced to split the pipeline into 3 steps (glibc, kernel, rootfs), as otherwise the disk usage limit was hit once again. In combination with the previously taken measures this is really a no-brainer...
