
The journey through time and (disk)space

In this post I'm continuing the journey that started with my blog post about using GitHub Actions as my main CI provider (in case you missed it, see here): how to do a full yocto/poky build with just ~14GB of disk space.

Disk space is cheap nowadays and is typically used without much thought, which easily leads to a disk usage of 50GB and more for a yocto/poky build.

So what are the options, when disk space becomes precious?

Constraints, constraints, constraints

While GitHub Actions is free for open source projects (like mine), it is highly limited in terms of resources.
Currently (2020/04/11) you get (source):
  • 2-core CPU
  • 7 GB of RAM
  • 14 GB of SSD disk space
  • maximum of 6 hours per pipeline
That's not much when you think about doing a poky/yocto build on it, so every byte actually counts.

Constant pain

As the involved layers are constantly growing, one has to care about every byte that can be saved, without giving up the overall aim of doing a from-scratch build of yocto/poky.

The really big chunks on your HDD are
  • intermediate files (like object files) created while building
  • downloaded sources
  • native tools (as they are not packaged like all the other tools)

Let's start with the simple things

rm_work

The first tool of choice is the rm_work class.

Pros:
  • removes all intermediate files from a recipe's workspace
  • is a huge disk space saver
Cons:
  • isn't really compatible with testimage.bbclass if you use it to 'smoke' test your distribution
To use it, insert

INHERIT += "rm_work"

into your local.conf.
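In a CI pipeline this kind of tweak usually lives in a setup script. A minimal, idempotent sketch (assuming you are inside the build directory that oe-init-build-env creates; the mkdir/touch lines are just guards so the snippet runs standalone):

```shell
# Append the rm_work include to local.conf only if it is not already there,
# so re-running the CI setup step does not add duplicate lines.
mkdir -p conf
touch conf/local.conf
grep -q '^INHERIT += "rm_work"' conf/local.conf || \
  echo 'INHERIT += "rm_work"' >> conf/local.conf
```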

rm_work_and_downloads

This class does the same but also deletes the downloads of that recipe.
To use it, insert

INHERIT += "rm_work_and_downloads"

into your local.conf.
I was curious about it and did some tests. Unfortunately it turned out that using it requires more disk space than just using 'rm_work': each recipe gets its own download folder, so downloads shared between recipes (clang and llvm are an example) are actually fetched more than once, leading to higher disk usage.

I decided to stick with just using 'rm_work'.

But this is still not enough 

Enabling rm_work brought huge relief in terms of disk usage, but it wasn't enough to finish a build before running out of disk.

Disabling disk monitoring

By default bitbake monitors the available disk space and automatically terminates the build if it falls below a threshold. As every byte is precious, let's disable this behavior; it isn't useful when building in throwaway VMs anyway.

BB_DISKMON_DIRS = ""

inserted into the local.conf did the trick. Now one can build using ALL available disk space.
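For reference, the monitoring entries this disables look roughly like the following (taken from a typical local.conf.sample; exact thresholds vary by release). Each entry reads action,directory,free-space-threshold,free-inode-threshold:

```
BB_DISKMON_DIRS ??= "\
    STOPTASKS,${TMPDIR},1G,100K \
    STOPTASKS,${DL_DIR},1G,100K \
    STOPTASKS,${SSTATE_DIR},1G,100K \
    ABORT,${TMPDIR},100M,1K \
    ABORT,${DL_DIR},100M,1K \
    ABORT,${SSTATE_DIR},100M,1K"
```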

Shrinking the size of the downloads

As every download is stored on the local disk, shrinking these was an additional task I had to deal with.
Most of the downloads bitbake does in a typical build are clones of git repositories. As we all know, cloning a repository does not only fetch the current set of files (which you are mainly interested in) but the full history of the repo plus all the change sets, which turns out to be quite heavy, especially for repositories with many changes.

The solution is called a shallow clone: cloning only the requested revision, without the history and all the change information.
Luckily poky provides support for that out of the box, just add

BB_GIT_SHALLOW = "1"
BB_GIT_SHALLOW_DEPTH = "1"

into the local.conf.
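Outside of bitbake, the effect is easy to demonstrate with plain git. The following throwaway sketch builds a small repository and compares a full clone against a depth-1 clone (all paths are temp directories; nothing here touches a real build):

```shell
# A shallow clone (--depth 1) carries a single commit instead of the full history.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/origin"
for i in 1 2 3 4 5; do
  echo "$i" > "$tmp/origin/file"
  git -C "$tmp/origin" add file
  git -C "$tmp/origin" -c user.email=ci@example.com -c user.name=ci commit -qm "commit $i"
done
git clone -q "$tmp/origin" "$tmp/full"
# the file:// URL forces a real transport, so --depth is honored
git clone -q --depth 1 "file://$tmp/origin" "$tmp/shallow"
echo "full:    $(git -C "$tmp/full" rev-list --count HEAD) commits"
echo "shallow: $(git -C "$tmp/shallow" rev-list --count HEAD) commits"
```

This should report 5 commits for the full clone but only 1 for the shallow one; for real-world repositories the saved history is where the disk space goes.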

Removing layer history 

As the layers themselves are mostly git clones, removing the git history from them turns out to be a big disk space saver.
Before building, just run in a shell

rm -rf <layerdir>/.git

The only downside is that you cannot use automatic versioning based on git hashes in your software.
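To cover several layers at once, a small find-based sweep works too. LAYERBASE is a made-up name here for the directory holding the layer checkouts; adapt it to your setup:

```shell
# Strip the git history from every layer checkout below LAYERBASE
# (mkdir is only a guard so the snippet runs standalone).
LAYERBASE="${LAYERBASE:-layers}"
mkdir -p "$LAYERBASE"
find "$LAYERBASE" -maxdepth 2 -type d -name ".git" -exec rm -rf {} +
```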

Disabling non-essential features 

In the default configuration of bitbake (the one you get by just running 'oe-init-build-env') some features are activated that are normally useful but cost disk space, so let's disable them.

sed -i "s/buildstats//g" conf/local.conf
for instance, removes the automatic include of the 'buildstats' class.

echo 'PACKAGE_CLASSES = "package_rpm"' >> conf/local.conf
enforces usage of rpm packages, which are much smaller than e.g. deb packages.

And still it isn't enough

As poky uses sstate caches for every successfully built component, you can delete all fragments of a component (except the sstate cache) and in the following steps only the cache is used.

With this in mind I came up with the idea to split the actual build into 2 parts

Splitting the build


Phase 1
  • build 'linux-yocto'
  • remove all fragments not needed anymore afterwards
Phase 2
  • build the image
As we can stop somewhere in the middle of the build process, it is easy to remove things that aren't needed anymore (like log files, downloads, and so on).
I ended up with the following shell script blocks

source poky/oe-init-build-env
# extract the final TMPDIR and DL_DIR values from bitbake's variable dump
eval $(bitbake busybox -e | grep "^TMPDIR=")
eval $(bitbake busybox -e | grep "^DL_DIR=")
bitbake linux-yocto
# downloads and per-task temp/log directories are no longer needed
[ -n "${DL_DIR}" ] && rm -rf "${DL_DIR}"/*
find "${TMPDIR}" -type d -name "temp" -exec rm -rf {} \; || true
for the kernel part, and afterwards simply calling
bitbake my-image
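The eval-on-grep lines work because 'bitbake <recipe> -e' dumps the final variable values as shell-compatible VAR="value" lines, so grep can pick one out and eval turns it into a shell variable. A hedged illustration with a stand-in function (the paths are made up):

```shell
# 'fake_bitbake_e' stands in for 'bitbake busybox -e' so the pattern can be
# shown without a yocto checkout.
fake_bitbake_e() {
  printf '%s\n' 'TMPDIR="/build/tmp"' 'DL_DIR="/build/downloads"'
}
# grep selects the assignment, eval executes it in the current shell
eval "$(fake_bitbake_e | grep '^TMPDIR=')"
echo "$TMPDIR"   # prints /build/tmp
```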
That almost did it; just one more thing.
One of the biggest native tools in the workspace (besides glibc and the kernel) is actually qemu, which is used extensively throughout the build process.

Limiting qemu impact

Normally qemu is built for every architecture known to qemu, leading to a huge package.
Fortunately the qemu recipe offers a variable to specify the architectures to build, so setting

QEMU_TARGETS = "arm aarch64 i386 x86_64"
in the local.conf did the trick. Since then I'm able to do a from-scratch build of poky, meta-openembedded and my layer meta-sca with just ~14GB of disk space available.

Conclusion

Poky offers some really interesting options to limit disk space usage, but at the outer edges it sometimes takes tricks and brute force to make things work.
In an ideal world one could split the build even further, lowering the required disk space even more; you only have to be sure in which order the components depend on each other.

Update 2020/04/17

As it happens, I was forced to split the pipeline into 3 steps (glibc, kernel, rootfs), as otherwise the disk usage limit was hit once again. In combination with the previously taken measures this is really a no-brainer...
