
Size really matters

Let that title settle... and now let's get to the more serious issue behind it :-).

The issue

When you use bitbake layers in a cloud-based setup, you usually clone them on the fly, which means a full clone of each repository - and that can be highly expensive (just look at the size of the linux kernel git repo, for instance).
As cloud-based setups mostly don't offer a good way to cache those resources, unless you invent something yourself or pay for it, every bit counts.
Not only in terms of time, but also in terms of the resulting cost.

The meta-sca layer I maintain has grown quite a lot over time, so the repository became very, very large.
That is partly because I made the mistake in the past of putting large blobs (in this case tarballs) into the repository.
I learned that lesson, but I cannot undo it - as we all know, every published git revision should stay untouched for all eternity. This is mainly due to the linked-list nature of git: if I change one commit at the bottom of the history, I alter every commit that relies on it.
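
You can see that chaining yourself: a commit object stores the hash of its parent commit, so rewriting any ancestor necessarily changes every hash that follows (a sketch; the hashes here are placeholders):

    $ git cat-file -p HEAD
    tree <hash of the tracked directory content>
    parent <hash of the previous commit>
    author ...

Change the parent and the commit's own hash changes - and with it the hash of every later commit.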

So, bottom line - once published, a revision will stay as it is forever.

Size matters

Which brings us to the question: how can we reduce the size of a git clone for the setup mentioned above, without altering the history of the git repo?

There is basically just one well-known option: shallow clones.
A shallow clone fetches the repository at a given revision without most of its history.
Basically those are like the tarball or zip downloads you know from GitHub or GitLab - plain copies of the repository at the point in time of the given revision.
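
With plain git, such a shallow clone of a single release could look like this (a minimal sketch; the tag name is just a placeholder):

    # fetch only the commit behind one release tag, without history
    git clone --depth 1 --branch <release-tag> \
        https://github.com/priv-kweihmann/meta-sca.git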

This concept is pretty well established and quite heavily used in various setups.

...or does technique?

BUT there is an alternative way - for instance, for all the tools and workflows that don't support shallow clones.

Luckily I use tags (aka releases or versions) in my repository.

After various people asked me how to reduce the size of meta-sca, I came up with the following approach:

  • check out each release of meta-sca
  • create a diff between consecutive releases and apply it as a single commit to a different repository

This strips all the "noise" between releases and builds up a fairly small repo one can use for the cloud-based setup mentioned above.

And now enter the nerdy stuff...

And as I like to automate stuff, I wrote a script that does exactly that.
The script loops over all branches and tags and extracts the diffs between releases.
In addition it removes everything that isn't needed in a "minified" environment, such as extensive documentation, CI scripts, testing scripts and so on.
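
The actual script is a bit longer, but a minimal sketch of the core idea could look like the following - the paths, the pruned directories, and the use of sort -V for tag ordering are all assumptions for illustration:

    #!/bin/sh
    # Sketch: squash each release of SRC into one commit in DST.
    # Assumes DST is an already initialized, empty git repository.
    SRC=meta-sca
    DST=meta-sca-minified

    for tag in $(git -C "$SRC" tag --list | sort -V); do
        # start from a clean tree so files removed upstream don't linger
        find "$DST" -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf {} +

        # materialize the tagged tree into the minified repo
        git -C "$SRC" archive "$tag" | tar -xf - -C "$DST"

        # drop what isn't needed in a "minified" environment
        rm -rf "$DST/docs" "$DST/.github" "$DST/tests"

        # one squashed commit per release, re-tagged with the same name
        git -C "$DST" add -A
        git -C "$DST" commit -m "release $tag"
        git -C "$DST" tag "$tag"
    done

As each release becomes exactly one commit, any blob that never appears in a release tree simply never enters the new repository.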

The result is astonishing.


As I encourage everyone to use release versions only anyway, it's pretty clear what to choose in your CI/CD cloud setup to save some money and time: the minified repo, pinned to a release tag.
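
For instance, a CI job could fetch it with a shallow clone on top (again a sketch; the tag name is a placeholder):

    # tiny repo + shallow clone = minimal download per CI run
    git clone --depth 1 --branch <release-tag> \
        https://github.com/priv-kweihmann/meta-sca-minified.git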

You can find the resulting repo at https://github.com/priv-kweihmann/meta-sca-minified.
And let me know how it works for you!
