
Transparency is key

What's the story?

I guess most of you are familiar with the concept of merge or pull requests, so I won't go into details on these - but in case you missed it, here is a brief description of what GitLab thinks it is.

There are basically two sides involved in a merge request (or MR as the cool kids would say ;-))
  • the one who actually provides the code change, let's call them devs
  • the one who maintains the repository where the change should be applied
Both sides might have different objectives.

The dev wants the code she/he has already written, (hopefully) tested and streamlined to become part of the upstream repo - otherwise she/he could have skipped all that work and enjoyed life (something we should all do from time to time).
The maintainer usually lacks some implementation details and is more keen on the contributed code following the project style, being regression tested and covered by unit tests, so she/he can be sure that the fresh contribution doesn't break stuff that was working before.

This is where worlds sometimes collide - as each side has a different focus.

One thing that avoids tension is transparency throughout the whole process of an MR.
And true transparency is based on facts - but as we all know, gathering facts can be hard work.

So why not automate the low-hanging fruit of fact gathering, so everybody can focus on the tricky things?

Doing it with GitLab


As GitLab is becoming more and more the standard in bigger (corporate) projects, I'm going to describe how you could enhance your MR workflow using some automation.

GitLab has a fully featured API for automation, so at least that's something we don't have to invent here.

The "typical" GitLab workflow


Typically GitLab dispatches events internally for every step that happens on the server - and so it does for merge requests.
These events can be distributed via so-called "webhooks" to any other computer on the network.
That's where we will start.
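
To make the starting point a bit more concrete, here is a minimal sketch of how the receiving side could pick the interesting bits out of such a webhook request. The field names (`object_kind`, `project`, `object_attributes`) follow GitLab's merge request event payload; the class `MergeRequestEvent`, the function `parse_merge_request_event` and the trimmed-down sample payload are just made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MergeRequestEvent:
    """The few fields a bot needs to find the MR again later on."""
    project_id: int
    mr_iid: int
    source_branch: str
    action: str


def parse_merge_request_event(body: bytes) -> Optional[MergeRequestEvent]:
    """Return the MR details from a webhook body, or None for other events."""
    payload = json.loads(body)
    # GitLab names the event type in "object_kind"; ignore everything else.
    if payload.get("object_kind") != "merge_request":
        return None
    attrs = payload["object_attributes"]
    return MergeRequestEvent(
        project_id=payload["project"]["id"],
        mr_iid=attrs["iid"],
        source_branch=attrs["source_branch"],
        action=attrs.get("action", ""),
    )


# Trimmed-down sample payload (assumption); real events carry many more fields.
example = json.dumps({
    "object_kind": "merge_request",
    "project": {"id": 42},
    "object_attributes": {
        "iid": 7,
        "source_branch": "feature/foo",
        "action": "open",
    },
}).encode()

event = parse_merge_request_event(example)
print(event)
```

Everything that is not a merge request event simply yields `None`, so the web server can acknowledge it and move on.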

Getting the scope


One thing that is important in an MR (at least for me) is that each change should pass the established quality threshold.

Limiting the same


You could now scan a whole (maybe extremely big) project with all the tools you can imagine and just dump all the findings on the poor user - but I don't think that's a good way, as it's pretty unfair to the user: most of the findings don't have anything to do with the user's changes. Also, every involved party simply drowns in information and will most likely miss the important bits.

Especially if you think of the very specialized tools that can be used to gather facts about the quality of a code snippet, it might not be the best idea to pick the biggest possible cannon, if you know what I mean.

Getting an idea...


So I had an idea: what if you could scan the changes of an MR with just the tools I currently think are appropriate (and that might change over time), on just the files changed and (most important) on just the files needed!?
The findings of this automation should not be kept secret; they should be available to the one who opened the merge request and to the maintainer (basically to anyone who has access to the project).

That led to the idea of having a bot which scans the changes and posts the important findings back as public comments on the MR.

As I wanted the solution to be as flexible as possible, I decided to just wrap the tools actually used and use regular expressions to turn their output into a common format, which is then posted back.
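
As a rough sketch of what "common format" could mean: each tool gets its own regular expression with named groups, and every matching output line becomes the same kind of record. The `<file>:<line>:<severity>:<message>` output format, the `Finding` class and `parse_tool_output` are assumptions for this example and not necessarily what any real tool or the bot itself uses.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    """Tool-independent record for a single reported issue."""
    path: str
    line: int
    severity: str
    message: str


# Assumed output format of a hypothetical tool: <file>:<line>:<severity>:<message>
LINE_PATTERN = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?P<severity>\w+):(?P<message>.+)$"
)


def parse_tool_output(output: str, pattern: re.Pattern = LINE_PATTERN) -> List[Finding]:
    """Turn raw tool output into a list of findings, skipping unrelated lines."""
    findings = []
    for raw in output.splitlines():
        match = pattern.match(raw.strip())
        if match is None:
            continue  # banners, progress output and the like
        findings.append(Finding(
            path=match.group("path"),
            line=int(match.group("line")),
            severity=match.group("severity"),
            message=match.group("message"),
        ))
    return findings


sample = """\
Scanning 2 files...
recipes/foo/foo_1.0.bb:3:warning:variable SUMMARY should be set
recipes/foo/foo_1.0.bb:12:error:trailing whitespace
"""

findings = parse_tool_output(sample)
for finding in findings:
    print(finding)
```

Supporting another tool then only means supplying another pattern with the same group names; everything after the parsing step stays untouched.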

So enough with the theory... off to the lab.

While researching a proper way to do it, I came across this article.
After reading it through, the rest was fairly easy.

  • I needed a web server which accepts the web requests from GitLab - that would be gidgetlab.
  • Then I needed to put each request into a queue, as processing usually takes some time and I don't want the response to GitLab to be pending for too long (otherwise we might run into a timeout).
  • Now, as request and response are more or less independent operations, I needed a way to access GitLab with just the project ID and the merge request ID - this is where python-gitlab comes into play.
  • From here on it's fairly simple
    • check out the code of the branch that should be merged
    • run a bunch of tools on it
    • use some regex on the output
    • map it to the changes done by the MR
    • post the remaining items as comments
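
The queue part of the list above could be sketched like this, using nothing but the standard library. `handle_webhook`, `worker` and the `processed` list are invented for the example; a real bot would check out the code and run the tools at the spot marked as placeholder.

```python
import queue
import threading

jobs: "queue.Queue" = queue.Queue()
processed = []  # stands in for "comments posted" in this demo


def handle_webhook(project_id: int, mr_iid: int) -> int:
    """Called by the web server: enqueue the job and answer right away."""
    jobs.put((project_id, mr_iid))
    return 200  # HTTP status sent back to GitLab without waiting


def worker() -> None:
    """Process queued merge requests one after another."""
    while True:
        job = jobs.get()
        try:
            if job is None:  # sentinel value: shut down
                return
            project_id, mr_iid = job
            # Placeholder for the slow part: checkout, run tools, post comments.
            processed.append((project_id, mr_iid))
        finally:
            jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

status = handle_webhook(42, 7)
jobs.join()     # only for the demo: wait until the job has been handled
jobs.put(None)  # ask the worker to stop
thread.join()
print(status, processed)
```

The web request returns as soon as the job is queued, no matter how long the tools take afterwards.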
To be honest, there are a few more special cases to consider, but see for yourself - nittymcpick, as I called the tool, is freely available on GitHub.
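
The "map it to the changes done by the MR" step is probably the least obvious one, so here is a small sketch of the idea: collect the line numbers a unified diff adds and drop every finding that doesn't sit on one of them. It assumes one diff per file that starts at the hunk header, and plain `(path, line, message)` tuples for the findings; `added_lines` and `relevant` are invented names, not the actual implementation of the bot.

```python
import re
from typing import Dict, List, Set, Tuple

# Matches hunk headers like "@@ -1,3 +1,4 @@" and captures the new start line.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")

FindingTuple = Tuple[str, int, str]  # (path, line, message)


def added_lines(diff: str) -> Set[int]:
    """Line numbers (in the new file) that a single-file unified diff adds."""
    result: Set[int] = set()
    new_line = 0
    in_hunk = False
    for raw in diff.splitlines():
        header = HUNK_HEADER.match(raw)
        if header:
            new_line = int(header.group(1))
            in_hunk = True
        elif not in_hunk:
            continue  # anything before the first hunk header
        elif raw.startswith("+"):
            result.add(new_line)
            new_line += 1
        elif raw.startswith("-") or raw.startswith("\\"):
            continue  # removed lines and "\ No newline..." don't exist in the new file
        else:
            new_line += 1  # unchanged context line
    return result


def relevant(findings: List[FindingTuple],
             changed: Dict[str, Set[int]]) -> List[FindingTuple]:
    """Keep only findings located on lines touched by the MR."""
    return [f for f in findings if f[1] in changed.get(f[0], set())]


diff = "\n".join([
    "@@ -1,3 +1,4 @@",
    ' SUMMARY = "foo"',
    '-LICENSE = "MIT"',
    '+LICENSE = "Apache-2.0"',
    '+HOMEPAGE = "https://example.com"',
    " inherit cmake",
])

path = "recipes/foo/foo_1.0.bb"
changed = {path: added_lines(diff)}
all_findings = [
    (path, 1, "finding on an untouched line"),
    (path, 3, "finding on a line added by the MR"),
]
kept = relevant(all_findings, changed)
print(changed, kept)
```

Findings on untouched lines are dropped on purpose, so the comments only ever talk about what the MR actually changed.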

Each bot instance can easily be run as a Docker container (the image is also available on Docker Hub) - if you like, you can have hundreds of different bots commenting on each MR in a project.

Wait, what does this all have to do with YOCTO?


As this is a blog mostly about YOCTO, I also created a specific container for linting bitbake recipes - called nittymcpick-oelint.
It uses oelint-adv to lint bitbake recipes, which are otherwise hard to lint.

The result could look like this:

[Screenshot: bot comments on a merge request]

For me that's the most transparent way of informing all involved parties that there is something to talk about. As the bot itself is highly configurable, you can set the right level of pickiness for your project's needs (the bot in the screenshot obviously is very picky :-))

Conclusion/Future ideas


For me this excursion into GitLab automation was interesting and fun at the same time, and I had a few more ideas in the meantime.
  • If you're using GitLab in combination with Jenkins, it would be nice if the actual failure or any newly occurring warning from the build were posted to the MR.
    Otherwise getting this information requires a lot of manual steps, which, as we are all humans, tend to be skipped
  • On a paid version of GitLab this bot could also approve an MR in a multi-stage approval process
  • Plenty of other use cases I haven't thought about yet...

Any thoughts?


First of all, thanks for reading. If you have any suggestions, comments or opposing thoughts - feel free to drop them as a comment on this blog post or catch up with me on GitHub, LinkedIn or at your local pub (whatever suits you best).
