During building and publishing, werf creates sets of Docker layers but does not delete them. As a result, the stages storage and images repo grow steadily and consume more and more space. What is more, any interrupted build process leaves stale images behind. When a git branch or a git tag gets deleted, the set of stages built for the corresponding image remains in the images repo and stages storage. So it is necessary to clean up the images repo and stages storage periodically. Otherwise, stale images will pile up.
werf has a built-in efficient multi-level image cleaning system. It supports the following cleaning methods:
Cleaning by policies
The cleaning by policies method allows you to organize the regular automatic cleanup of stale images. It implies regular gradual cleaning according to cleaning policies. This is the safest method of cleaning because it does not affect your production environment.
The cleaning by policies method involves the following steps:
- Cleaning up the images repo deletes stale images in the images repo according to the cleaning policies.
- Cleaning up stages storage synchronizes the stages storage with the images repo.
These steps are combined in the single top-level command cleanup.
An images repo is the primary source of information about current and stale images. Therefore, it is essential to clean up the images repo first and only then proceed to the stages storage.
Cleaning up the images repo
werf can automate the cleaning of the images repo. It works according to special rules called cleanup policies. These policies determine which images will be deleted while leaving all others intact.
Git history-based cleanup algorithm
The end-image is the result of a building process. It can be associated with an arbitrary number of Docker tags. The end-image is linked to the werf internal identifier aka the stages digest.
The cleanup algorithm is based on the fact that the stages storage has information about commits related to publishing tags associated with a specific digest of image stages (and it does not matter whether an image in the Docker registry was added, modified, or stayed the same). This information includes bundles of commit + digest for a specific
image in the
After building and publishing for some commit, the end-image may stay unchanged. Even so, werf adds a record to the stages storage stating that this digest of image stages was published for that commit.
This ensures that the digest of image stages (of an arbitrary number of associated Docker tags) relates to the git history. Also, this opens up the possibility to effectively clean up outdated images based on the git state and chosen policies. The algorithm scans the git history, selects relevant images, and deletes those that do not fall under any policy. At the same time, the tags used in Kubernetes are ignored.
Let’s review the basic steps of the cleanup algorithm:
- Extracting the data required for a cleanup from the stages storage.
- Obtaining manifests for all tags.
- Preparing a list of items to clean up:
  - tags used in Kubernetes are ignored.
- Preparing the data for scanning:
  - tags grouped by the digest of image stages (1);
  - commits grouped by the digest of image stages (2);
  - a set of git tags and git branches, as well as the rules and crawl depth for scanning each reference based on user policies (3).
- Searching for commits (2) using the git history (3). The result is digests of image stages for which no associated commits were found during scanning (4).
- Deleting tags for digests of image stages (4).
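The selection step can be illustrated with a minimal Python sketch. It is not werf's actual implementation: the function name, the argument names, and the plain dict/set data model are assumptions made for illustration only. It groups tags by digest (1), looks up the commits recorded for each digest (2), compares them with the commits found while scanning the git history (3), and returns the tags of digests with no matching commit (4).

```python
from collections import defaultdict


def select_tags_to_delete(tag_digests, digest_commits, scanned_commits, tags_in_kubernetes):
    """Return a sorted list of tags whose digest has no commit in the scanned history.

    All parameters are hypothetical stand-ins for data werf keeps internally:
      tag_digests        -- dict: Docker tag -> digest of image stages
      digest_commits     -- dict: digest -> set of commits it was published for
      scanned_commits    -- set of commits found while scanning git references
      tags_in_kubernetes -- set of tags that must be ignored
    """
    # (1) Group tags by digest, ignoring tags used in Kubernetes.
    tags_by_digest = defaultdict(set)
    for tag, digest in tag_digests.items():
        if tag in tags_in_kubernetes:
            continue
        tags_by_digest[digest].add(tag)

    # (4) A digest is stale when none of its commits (2) was found
    # during the scan (3). A digest with no recorded commits is also stale.
    to_delete = []
    for digest, tags in tags_by_digest.items():
        commits = digest_commits.get(digest, set())
        if commits.isdisjoint(scanned_commits):
            to_delete.extend(tags)
    return sorted(to_delete)
```

Note that in this sketch a digest with no recorded commits at all is treated as stale, since an empty set shares no commit with the scanned history.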
The user can specify images that will not be deleted during a cleanup using keepPolicies cleanup policies. If there is no configuration provided in the werf.yaml, werf will use the default policy set.
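To make the idea of a policy concrete, here is a deliberately simplified Python sketch. It does not reproduce the keepPolicies schema of werf.yaml: the `(pattern, depth)` tuple format, the function name, and the history model are all hypothetical. It only shows how a rule that pairs a reference pattern with a crawl depth could determine which commits get scanned.

```python
import re


def commits_to_scan(ref_history, policies):
    """Collect the commits that the policies ask to keep.

    ref_history -- dict: git reference name -> list of commits, newest first
    policies    -- list of (pattern, depth) tuples; a hypothetical,
                   simplified stand-in for real cleanup policies
    """
    selected = set()
    for ref, commits in ref_history.items():
        for pattern, depth in policies:
            # A reference is covered when its whole name matches the pattern.
            if re.fullmatch(pattern, ref):
                # Keep only the newest `depth` commits of that reference.
                selected.update(commits[:depth])
    return selected
```

A reference that matches no pattern contributes no commits, so images published only from it are not protected by any policy in this model.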
It is worth noting that the algorithm scans the local state of the git repository. Therefore, it is essential to keep all git branches and git tags up-to-date. You can use the --git-history-synchronization flag to synchronize the git state (it is enabled by default when running in CI systems).
Keeping the data in the stages storage to use when performing a cleanup
werf saves supplementary data to the stages storage to optimize its operation and solve some specific cases. This data includes meta-images with bundles consisting of a digest of image stages and a commit that was used for publishing. It also contains names of images that were ever built.
Information about commits is the only source of truth for the algorithm, so werf deletes any tags that lack such information.
When performing an automatic cleanup, the werf cleanup command is executed either on a schedule or manually. To avoid deleting the active cache when adding/deleting images in the werf.yaml in neighboring git branches, you can add the name of the image being built to the stages storage during the build. The user can edit the so-called set of managed images using werf managed-images ls|add|rm commands.
The image always remains in the images repo as long as the Kubernetes object that uses the image exists.
werf scans the following kinds of objects in the Kubernetes cluster:
The functionality can be disabled via the flag
Connecting to Kubernetes
werf uses the kube configuration file ~/.kube/config to learn about Kubernetes clusters and ways to connect to them. werf connects to all Kubernetes clusters defined in all contexts of the kubectl configuration to gather information about the images that are in use.
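Gathering the images in use can be pictured with the following Python sketch. It makes no calls to a real cluster and is not how werf queries Kubernetes: the function name and the input format (pod specs already fetched and parsed into dicts, grouped by context name) are assumptions. The `containers`, `initContainers`, and `image` keys follow the standard Kubernetes pod spec.

```python
def images_in_use(pod_specs_by_context):
    """Return the set of image references found in all contexts.

    pod_specs_by_context -- dict: kube context name -> list of pod spec dicts
                            (hypothetical input, already fetched and parsed)
    """
    used = set()
    for pod_specs in pod_specs_by_context.values():
        for pod_spec in pod_specs:
            # Both regular and init containers can reference an image.
            for key in ("containers", "initContainers"):
                for container in pod_spec.get(key, []):
                    used.add(container["image"])
    return used
```

The result is a union across all contexts, so an image referenced in any one cluster is enough to protect it from deletion.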
Cleaning up stages storage
Executing a stages storage cleanup command is necessary to synchronize the state of stages storage with the images repo. During this step, werf deletes stages that do not relate to images currently present in the images repo.
If the images cleanup command (the first step of cleaning by policies) is skipped, then the stages storage cleanup will not have any effect.
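The synchronization can be thought of as a set difference, as in this minimal Python sketch. The function name and the data model (stage identifiers as strings, each image mapped to the set of stages it uses) are assumptions for illustration and do not reflect how werf stores this information.

```python
def stale_stages(stages_in_storage, image_stages):
    """Return a sorted list of stages not used by any image in the images repo.

    stages_in_storage -- set of stage identifiers present in the stages storage
    image_stages      -- dict: image in the images repo -> set of stages it uses
    (both structures are hypothetical)
    """
    # Union of the stages referenced by every remaining image.
    referenced = set().union(*image_stages.values())
    return sorted(stages_in_storage - referenced)
```

In this model, a stage becomes removable only after the last image referencing it has left the images repo, which is why the images repo must be cleaned up first.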
The manual cleaning approach assumes one-step cleaning with the complete removal of images from the stages storage or images repo. This method does not check whether an image is used by Kubernetes. Manual cleaning is not recommended for scheduled usage (use cleaning by policies instead). In general, it is best suited for forceful image removal.
The manual cleaning approach includes the following options:
- The purge images repo command deletes images of the current project in the images repo.
- The purge stages storage command deletes stages of the current project in the stages storage.
These steps are combined in a single top-level command purge.
You can clean up the host machine with the following commands: