
Making go not a no-go

Anyone who has dealt with container engines has come across go - a wonderful language that was built to provide a sane approach to what C++ originally intended to do.
The language itself is pretty straightforward and upstream poky support has been available for ages...

In the go world one would just run

 go get github.com/foo/bar
 go build github.com/foo/bar

and magically the go ecosystem would pull all the needed sources and build them into an executable.

This is where the issues start...

In the OpenEmbedded world, one would have
  • one provider (aka recipe) for each dependency
  • each recipe comes with a (remote) artifact (e.g. tarball, git repo, and so on) which can be archived (so one can build the same software at a later point in time without any online connectivity)
  • dedicated license information
All this information is pretty useful when working in an environment (aka company) that has restrictions, such as
  • reproducible builds
  • license compliance
  • security compliance (for instance, no unpatched CVEs)
but when using go, all of that is only present for the very top-level repository.

But what is so different about go?

Internally go uses a file called go.mod - a typical example looks like this
 module cloud.google.com/go/firestore

 go 1.11

 require (
	cloud.google.com/go v0.81.0
	github.com/golang/protobuf v1.5.2
	github.com/google/go-cmp v0.5.5
	github.com/googleapis/gax-go/v2 v2.0.5
	google.golang.org/api v0.44.0
	google.golang.org/genproto v0.0.0-20210415145412-64678f1ae2d5
	google.golang.org/grpc v1.37.0
 )

First of all a module name, next the minimum required go version, and then a larger block of dependencies...
cloud.google.com/go v0.81.0

This basically says: we need the code of the repository cloud.google.com/go at tagged version v0.81.0.
So the go compiler suite will pull these sources and unpack them into the build workspace without any further action required from you.
Which also implies that this operation can pull whatever sources, from whatever host, in whatever version :-(
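To make this a bit more tangible, you can ask the go tooling what it would actually fetch for such a requirement - a hedged example run from inside the module directory; the output is trimmed and the cache path and hash shown here are just illustrative placeholders:

 # go mod download -json cloud.google.com/go@v0.81.0

 {
 	"Path": "cloud.google.com/go",
 	"Version": "v0.81.0",
 	"Zip": "/root/go/pkg/mod/cache/download/cloud.google.com/go/@v/v0.81.0.zip",
 	"Sum": "h1:..."
 }

Note that this already requires network access at build time - exactly the kind of thing bitbake is supposed to have under control.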

go and unknown license information

As you may already imagine, dependencies have dependencies themselves, so a lot of code will be pulled just to make your top-level repository compile.

None of the metadata of these dependencies, or of the dependencies of the dependencies, is known to bitbake at all.

Just imagine the following (and yes, that is a slightly adapted real-world example)
  • you are not allowed to use GPL-3.0 in your product
  • you are using a go module which is licensed MIT
so everything looks good at first glance - but if you look into the dependency chain
  • github.com/foo/bar [MIT], pulls
  • github.com/bar/baz [Apache-2.0], which pulls
  • github.com/some/other [GPL-3.0]
As there is no real way to determine whether code of github.com/some/other actually ends up in the executable of github.com/foo/bar, one has to consider the license information of both github.com/foo/bar and github.com/bar/baz to be wrong - both have to be treated as GPL-3.0, which is exactly what you can't use...

Without in-depth analysis that would remain completely hidden from you.
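At least the sheer size of the transitive dependency set can be made visible with standard go tooling, run from within the top-level repository - the module names below are the hypothetical ones from the example above and the versions are placeholders:

 # go mod graph

 github.com/foo/bar github.com/bar/baz@v1.2.0
 github.com/bar/baz@v1.2.0 github.com/some/other@v0.3.1
 ...

The graph tells you what gets pulled, but still nothing about the licenses of those modules - that part you would have to dig out yourself.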

go and non-reproducible builds

As the dependencies are only stored in the go.mod file of the repository you're trying to build, there is no way to have the complete set of needed sources before starting the actual compile run.

One could manually create recipes for all dependencies and inject them with the help of DEPENDS, **but** this is not how the go community likes to play this game...

Another real world example
  • golang.org/x/tools requires golang.org/x/net
  • golang.org/x/net requires golang.org/x/text
  • golang.org/x/text requires golang.org/x/tools
I mean we all love circular dependencies, right? :-(
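If you want to see such a cycle with your own eyes, grepping the requirement graph of one of the involved modules is a quick (if rough) way to do it - the versions below are placeholders, the actual pseudo-versions will differ:

 # go mod graph | grep -E 'golang.org/x/(tools|net|text)'

 golang.org/x/tools golang.org/x/net@v0.0.0-...
 golang.org/x/net@v0.0.0-... golang.org/x/text@v0.3.3
 golang.org/x/text@v0.3.3 golang.org/x/tools@v0.0.0-...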

So creating a recipe per dependency is off the table...

go and the "missing" security

As we all know, it usually is best to use the latest and greatest of open source when it comes to bugfixes and patched security issues - again something that the go community decided to hide behind some magic super daemon that does all the magic for you.

If we have a look at https://golang.org/ref/mod#minimal-version-selection it is, in the end, hard to predict which version of a module is actually pulled for compilation, as
  • a module might have withdrawn a version in the meantime
  • a module replaces an interface with a forked version
All of that usually happens behind the scenes... but rest assured, in 9 out of 10 cases it will not be the latest and greatest version - and as we are (mostly) building for an embedded target, we actually want to know all these tiny, nifty details.
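You can inspect the selection after the fact: go list in module mode prints the versions that were actually chosen and, with -u, whether newer ones exist - but again this needs network access at build time and is nothing bitbake knows about. Output trimmed, the bracketed upgrade versions are illustrative:

 # go list -m -u all

 cloud.google.com/go v0.81.0 [v0.82.0]
 github.com/golang/protobuf v1.5.2
 google.golang.org/grpc v1.37.0 [v1.38.0]
 ...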

Which leaves us with the only possible conclusion: the go ecosystem isn't very well suited to be used with OpenEmbedded.

But what if I told you it could...

First of all we need archivable sources - this is where I came across https://proxy.golang.org, which provides an API to query versions, artifacts and much more.

For instance we can get the available versions
 # wget https://proxy.golang.org/cloud.google.com/go/@v/list

 v0.26.0
 v0.36.0
 v0.15.0
 v0.69.1
 v0.45.0
 v0.55.0
 v0.46.3
 v0.68.0
 v0.50.0
 v0.7.0
 v0.37.4
 v0.40.0
 v0.37.2
 v0.3.0
 v0.37.0
 v0.33.1
 v0.52.0
 v0.8.0
 v0.43.0
 v0.6.0
 v0.33.0
 v0.10.0
 ...

or the latest version

 # wget https://proxy.golang.org/cloud.google.com/go/@latest

 {"Version":"v0.81.0","Time":"2021-04-02T19:10:02Z"}

and we can even download a zip file containing the corresponding sources

 # wget https://proxy.golang.org/cloud.google.com/go/@v/v0.81.0.zip

So we can put a check mark on "archivable sources" and even on "using the latest and greatest" (as all the needed information is covered by the API).
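The proxy speaks the standard GOPROXY protocol, so besides the version list, the latest version and the zip there are also per-version endpoints for the metadata and for the go.mod file - the latter is what makes it possible to walk the dependency tree without cloning a single repository (output trimmed):

 # wget https://proxy.golang.org/cloud.google.com/go/@v/v0.81.0.info

 {"Version":"v0.81.0","Time":"2021-04-02T19:10:02Z"}

 # wget https://proxy.golang.org/cloud.google.com/go/@v/v0.81.0.mod

 module cloud.google.com/go

 require (
 	...
 )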

That leaves us with avoiding any circular dependencies - that was the trickiest part, but I managed to code a script, which
  • analyses any go.mod file
  • double checks the needed dependencies - by running

     go list -f '{{ join .Imports " " }}' ./...
    

  • extracts the found license information (special thanks to the wonderful scancode-toolkit)
  • dumps all this information into a bitbake recipe (including the needed dependencies) - a rough sketch of this pipeline is shown below
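Put together, the per-module part of such a pipeline could look roughly like the following plain-shell sketch (lines starting with # are comments here, not a prompt). gen-recipe.sh is a purely hypothetical helper standing in for the actual recipe generation, and the scancode call is just one possible way to drive the scancode-toolkit:

 # fetch the pinned sources from the proxy
 wget https://proxy.golang.org/golang.org/x/tools/@v/v0.1.0.zip -O golang-org-x-tools-0.1.0.zip
 unzip -q golang-org-x-tools-0.1.0.zip -d work/

 # cross-check which packages the module really imports
 (cd work/golang.org/x/tools@v0.1.0 && go list -f '{{ join .Imports " " }}' ./...)

 # let the scancode-toolkit detect the licenses of the unpacked sources
 scancode --license --json-pp licenses.json work/golang.org/x/tools@v0.1.0

 # render the bb/inc pair from the collected information (hypothetical helper)
 ./gen-recipe.sh licenses.json work/golang.org/x/tools@v0.1.0/go.mod > golang.org-x-tools-sources.inc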

But wait, didn't you write earlier that this causes circular dependencies...

You're absolutely right - I had to use a little trick to make it work.

Actually, every go module ends up as a bb file of its own, for instance
 SUMMARY = "go.mod: golang.org/x/tools"
 HOMEPAGE = "https://pkg.go.dev/golang.org/x/tools"

 # License is determined by the modules included and will be therefore computed
 LICENSE = "${@' & '.join(sorted(set(x for x in (d.getVar('GOSRC_LICENSE') or '').split(' ') if x)))}"
 
 # inject the needed sources
 require golang.org-x-tools-sources.inc
 
 GO_IMPORT = "golang.org/x/tools"
 
 inherit gosrc


which contains just some very generic metadata, but no actual artifact reference - that will be provided by a corresponding include file
 SRC_URI += "https://proxy.golang.org/golang.org/x/tools/@v/v0.1.0.zip;srcoutput=golang.org/x/tools;srcinput=golang.org/x/tools@v0.1.0;downloadfilename=golang-org-x-tools-0.1.0.zip;name=golang-org-x-tools"
 SRC_URI[golang-org-x-tools.sha256sum] = "bb7d50a844ccfbe67a8d51ce04404bddc8cdc46eaf3fe82d84806d61fffc22dd"
 
 GOSRC_LICENSE += "\
    BSD-3-Clause \
 "
 LIC_FILES_CHKSUM += "\
    file://src/golang.org/x/tools/LICENSE;md5=5d4950ecb7b26d2c5e4e7b4e0dd74707 \
    file://src/golang.org/x/tools/cmd/getgo/LICENSE;md5=4ac66f7dea41d8d116cb7fb28aeff2ab \
 "

 GOSRC_INCLUDEGUARD += "golang.org-x-tools-sources.inc"
 
 require ${@bb.utils.contains('GOSRC_INCLUDEGUARD', 'github.com-yuin-goldmark-sources.inc', '', 'github.com-yuin-goldmark-sources.inc', d)}
 require ${@bb.utils.contains('GOSRC_INCLUDEGUARD', 'golang.org-x-mod-sources.inc', '', 'golang.org-x-mod-sources.inc', d)}
 require ${@bb.utils.contains('GOSRC_INCLUDEGUARD', 'golang.org-x-net-sources.inc', '', 'golang.org-x-net-sources.inc', d)}
 require ${@bb.utils.contains('GOSRC_INCLUDEGUARD', 'golang.org-x-sync-sources.inc', '', 'golang.org-x-sync-sources.inc', d)}
 require ${@bb.utils.contains('GOSRC_INCLUDEGUARD', 'golang.org-x-sys-sources.inc', '', 'golang.org-x-sys-sources.inc', d)}
 require ${@bb.utils.contains('GOSRC_INCLUDEGUARD', 'golang.org-x-xerrors-sources.inc', '', 'golang.org-x-xerrors-sources.inc', d)}

That one provides the actual sources being pulled and the correct license information.
Also the include file sets the needed dependencies required to build (which are include files of their own).
To avoid pulling the same source more than once into a workspace, I used something very common in the C world: header guard macros - each include file appends its own name to GOSRC_INCLUDEGUARD, and the bb.utils.contains expressions only require another include file if its name is not already listed there, so each source is included exactly once (which breaks the vicious circle of circular dependencies).

Assembling

What happens now if any of these bb recipes gets built...
  • the top-level bb file, let's call it foo_1.0.bb, includes foo.inc
  • foo.inc sets the main sources fetched as a zip file from proxy.golang.org
  • furthermore it includes bar.inc and baz.inc
  • bar.inc and baz.inc are not versioned, so they will pull the latest available sources defined via a recipe in the workspace
  • after all source zip files have been fetched, the internal bbclass (gosrc) extracts every source zip file into the right place
  • the go compiler finds all of the required files and builds an executable that can be used without any of the dependencies (more or less a statically linked executable, ready to be shipped)
In the end you have an executable compiled from reproducible sources using the latest and greatest versions, and additionally we have all the needed compliance information... mission accomplished!
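Assuming the hypothetical top-level recipe name foo_1.0.bb from the list above, kicking off the whole chain is then just an ordinary recipe build:

 # bitbake foo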

The actual script can be found here - feel free to use and adjust it to your needs 
