During building and publishing, werf creates sets of docker layers, however, it does not delete them. As a result, stages storage and images repo are growing steadily and consuming more and more space. What is more, any interrupted build process leaves stalled images. When a git branch or a git tag gets deleted, a set of stages that were built for this image remains in the images repo and stages storage. So, it is necessary to clean up the images repo and stages storage periodically. Otherwise, stale images will pile up.

werf has a built-in efficient multi-level image cleaning system. It supports the following cleaning methods:

  1. Cleaning by policies
  2. Manual cleaning
  3. Host cleaning

Cleaning by policies

The cleaning by policies method allows you to organize the regular automatic cleanup of stuck images. It implies regular gradual cleaning according to cleaning policies. This is the safest method of cleaning because it does not affect your production environment.

The cleaning by policies method involves the following steps:

  1. Cleaning up the images repo deletes stale images in the images repo according to the cleaning policies.
  2. Cleaning up stages storage synchronizes the stages storage with the images repo.

These steps are combined in the single top-level command cleanup.

An images repo is the primary source of information about current and stale images. Therefore, it is essential to clean up the images repo first and only then proceed to the stages storage.

Cleaning up the images repo

werf can automate the cleaning of the images repo. It works according to special rules called cleanup policies. These policies determine which images will be deleted while leaving all others intact.

Tagging scheme-based cleanup algorithm

In versions prior to v1.2, the cleanup policy is based on tagging schemes. The git history-based cleanup algorithm is going to be the only one available starting with version v1.2. You can use the --git-history-based-cleanup flag to forcefully enable the new algorithm.

Cleanup policies

  • by branches:
    • Each new commit updates an image for the corresponding git branch (in other words, there is the only one docker tag for each published git branch).
    • werf deletes an image from the images repo if the corresponding git branch does not exist. The image is never removed as long as the corresponding git branch exists.
    • The policy covers images tagged by werf with the --tag-git-branch option.
  • by commits:
    • werf deletes an image from the images repo if the corresponding git commit does not exist.
    • For the remaining images, the following policies apply:
      • git-commit-strategy-expiry-days. Keep published images in the images repo for the specified maximum number of days since the image was published. The republished image will be kept for the specified maximum number of days since the new publication date. No days limit is set by default; -1 disables the limit. The value can be specified by --git-commit-strategy-expiry-days option or $WERF_GIT_COMMIT_STRATEGY_EXPIRY_DAYS.
      • git-commit-strategy-limit. Keep the specified max number of published images in the images repo. No limit is set by default; -1 disables the limit. Value can be specified by --git-commit-strategy-limit or $WERF_GIT_COMMIT_STRATEGY_LIMIT.
    • The policy covers images tagged by werf with the --tag-git-commit flag.
  • by tags:
    • werf deletes an image from the images repo when the corresponding git tag does not exist.
    • For the remaining images, the following policies apply:
      • git-tag-strategy-expiry-days. Keep published images in the images repo for the specified maximum number of days since the image was published. The republished image will be kept for the specified maximum number of days since the new publication date. No days limit is set by default; -1 disables the limit. Value can be specified by --git-tag-strategy-expiry-days option or $WERF_GIT_TAG_STRATEGY_EXPIRY_DAYS.
      • git-tag-strategy-limit. Keep the specified max number of published images in the images repo. No limit is set by default; -1 disables the limit. Value can be specified by --git-tag-strategy-limit or $WERF_GIT_TAG_STRATEGY_LIMIT.
    • The policy covers images tagged by werf with --tag-git-tag flag.

Please note that the cleanup affects only images built and published by werf with one of the following arguments: --tag-git-branch, --tag-git-tag or --tag-git-commit. All other images in the images repo stay intact.

Git history-based cleanup algorithm

In versions prior to v1.2, the cleanup policy is based on tagging schemes. The git history-based cleanup algorithm is going to be the only one available starting with version v1.2. You can use the --git-history-based-cleanup flag to forcefully enable the new algorithm.

The end-image is the result of a building process. It can be associated with an arbitrary number of Docker tags. The end-image is linked to the werf internal identifier aka the stages signature.

The cleanup algorithm is based on the fact that the stages storage has information about commits related to publishing tags associated with a specific signature of image stages (and it does not matter whether an image in the Docker registry was added, modified, or stayed the same). This information includes bundles of commit + signature for a specific image in the werf.yaml.

Following the results of building and publishing of some commit, the end-image may stay unchanged. However, information about the publishing of a signature of image stages because of that commit will be added to the stages storage.

This ensures that the signature of image stages (of an arbitrary number of associated Docker tags) relates to the git history. Also, this opens up the possibility to effectively clean up outdated images based on the git state and chosen policies. The algorithm scans the git history, selects relevant images, and deletes those that do not fall under any policy. At the same time, the tags used in Kubernetes are ignored.

Let’s review the basic steps of the cleanup algorithm:

Custom policies

The user can specify images that will not be deleted during a cleanup using keepPolicies cleanup policies. If there is no configuration provided in the werf.yaml, werf will use the default policy set.

It is worth noting that the algorithm scans the local state of the git repository. Therefore, it is essential to keep all git branches and git tags up-to-date. You can use the --git-history-synchronization flag to synchronize the git state (it is enabled by default when running in CI systems).

Keeping the data in the stages storage to use when performing a cleanup

werf saves supplementary data to the stages storage to optimize its operation and solve some specific cases. This data includes meta-images with bundles consisting of a signature of image stages and a commit that was used for publishing. It also contains names of images that were ever built.

Information about commits is the only source of truth for the algorithm, so tags lacking such information are processed separately. Tags assigned by versions lower than v1.1.20 are ignored (and they are deleted starting with version v1.2; you can force this behavior by using the --git-history-based-cleanup-v1.2 flag).

When performing an automatic cleanup, the werf cleanup command is executed either on a schedule or manually. To avoid deleting the active cache when adding/deleting images in the werf.yaml in neighboring git branches, you can add the name of the image being built to the stages storage during the build. The user can edit the so-called set of managed images using werf managed-images ls|add|rm commands.

Whitelisting images

The image always remains in the images repo as long as the Kubernetes object that uses the image exists. werf scans the following kinds of objects in the Kubernetes cluster: pod, deployment, replicaset, statefulset, daemonset, job, cronjob, replicationcontroller.

The functionality can be disabled via the flag --without-kube.

Connecting to Kubernetes

werf uses the kube configuration file ~/.kube/config to learn about Kubernetes clusters and ways to connect to them. werf connects to all Kubernetes clusters defined in all contexts of the kubectl configuration to gather information about the images that are in use.

Cleaning up stages storage

Executing a stages storage cleanup command is necessary to synchronize the state of stages storage with the images repo. During this step, werf deletes stages that do not relate to images currently present in the images repo.

If the images cleanup command, — the first step of cleaning by policies, — is skipped, then the stages storage cleanup will not have any effect.

Manual cleaning

The manual cleaning approach assumes one-step cleaning with the complete removal of images from the stages storage or images repo. This method does not check whether the image is used by Kubernetes or not. Manual cleaning is not recommended for the scheduled usage (use cleaning by policies instead). In general, it is best suited for forceful image removal.

The manual cleaning approach includes the following options:

These steps are combined in a single top-level command purge.

Host cleaning

You can clean up the host machine with the following commands: