When running, werf saves data both to the container registry and on the host.

In the container registry, werf deletes outdated images according to the cleanup policies, keeping any image that Kubernetes still uses. On the host, data falls into two categories: the cache (temporary data generated by werf that is no longer needed) and local Docker stages (which werf creates when it is used without a container registry).

There are dedicated commands in werf for cleaning up the container registry and the host. In the case of the container registry, werf only cleans up data related to the specific project, while the host cleanup covers all projects at once.

| | Single project | All projects |
|---|---|---|
| Cleaning up outdated data in the container registry | werf cleanup --repo REPO | - |
| Complete cleanup of the container registry | werf purge --repo REPO | - |
| Cleaning up outdated data on the host | werf host cleanup --project-name PROJECT* | werf host cleanup |
| Complete cleanup of the host | werf host purge --project-name PROJECT* | werf host purge |

\* indicates partial functionality.

It is worth noting that:

  • You can safely clean up the outdated data at any time, manually or automatically, with no risk of losing critical data that are used in production.
  • Moreover, werf can automatically clean up the outdated data on the host as part of any werf command’s regular operation.
  • werf never runs the complete cleanup automatically, because it may delete data used in production. Run the complete cleanup manually, and only if you understand the consequences.

Cleaning up the container registry

Cleaning up outdated data

The werf cleanup command is designed to run on a schedule. werf performs the (safe) cleanup according to the specified cleanup policies.

The algorithm automatically selects the images to delete. It consists of the following steps:

  • Pulling the necessary data from the container registry.
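  • Preparing a list of images to keep. werf leaves intact:
    • images used in Kubernetes;
    • images selected by the user-defined policies when scanning the git history.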
  • Deleting all the remaining images.
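
The selection steps above can be sketched in a few lines of Python. This is a minimal illustration, not werf's implementation: the function name `select_images_to_delete` and the use of plain strings as image references are assumptions made for the example.

```python
def select_images_to_delete(registry_images, used_in_kubernetes, kept_by_policies):
    """Return the images that a cleanup pass would remove.

    Hypothetical model: every argument is an iterable of image references.
    """
    # The keep list is the union of everything that must stay intact.
    keep = set(used_in_kubernetes) | set(kept_by_policies)
    # Anything in the registry that is not on the keep list is deleted.
    return sorted(set(registry_images) - keep)


registry = ["app:a1", "app:b2", "app:c3", "app:d4"]
to_delete = select_images_to_delete(
    registry,
    used_in_kubernetes=["app:a1"],
    kept_by_policies=["app:c3"],
)
print(to_delete)
```

The point of the sketch is that deletion is defined by exclusion: an image is removed only if no rule claims it.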

Images in Kubernetes

werf connects to all Kubernetes clusters described in all configuration contexts of kubectl. It then collects image names for the following object types: pod, deployment, replicaset, statefulset, daemonset, job, cronjob, replicationcontroller.

The user can configure werf’s behavior using the following parameters (and related environment variables):

  • --kube-config, --kube-config-base64 set the kubectl configuration (by default, the user-defined configuration at ~/.kube/config is used);
  • --kube-context scans a specific context only;
  • --scan-context-namespace-only scans only the namespace linked to a specific context (by default, all namespaces are scanned);
  • --without-kube disables Kubernetes scanning.

As long as any object in a Kubernetes cluster uses an image, werf will not delete that image from the container registry during the cleanup under any circumstances.
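
To illustrate how image names can be gathered from the object types listed above, here is a hedged Python sketch that works on manifests already loaded as dictionaries. The helpers `pod_spec` and `collect_images` are hypothetical. werf's own scanner talks to the Kubernetes API, which this sketch omits. The nesting of the pod spec follows the standard Kubernetes object layout.

```python
def pod_spec(obj):
    """Locate the pod spec inside a Kubernetes object given as a dict."""
    kind = obj.get("kind", "").lower()
    spec = obj.get("spec", {})
    if kind == "pod":
        return spec
    if kind == "cronjob":
        # A CronJob nests the pod template inside a job template.
        job_spec = spec.get("jobTemplate", {}).get("spec", {})
        return job_spec.get("template", {}).get("spec", {})
    # Deployment, ReplicaSet, StatefulSet, DaemonSet, Job, ReplicationController.
    return spec.get("template", {}).get("spec", {})


def collect_images(objects):
    """Return the set of image references used by the given objects."""
    images = set()
    for obj in objects:
        spec = pod_spec(obj)
        for key in ("initContainers", "containers"):
            for container in spec.get(key, []):
                images.add(container["image"])
    return images


# Example manifests; the registry host and tags are made up.
objects = [
    {"kind": "Pod", "spec": {"containers": [{"image": "registry.example.com/app:a1"}]}},
    {"kind": "Deployment", "spec": {"template": {"spec": {
        "containers": [{"image": "registry.example.com/app:b2"}]}}}},
    {"kind": "CronJob", "spec": {"jobTemplate": {"spec": {"template": {"spec": {
        "initContainers": [{"image": "registry.example.com/migrate:c3"}],
        "containers": [{"image": "registry.example.com/app:b2"}]}}}}}},
]
images = collect_images(objects)
print(sorted(images))
```

Every image in the resulting set would be added to the keep list before any deletion takes place.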

Scanning the git history

The cleanup algorithm relies on the container registry storing information about the commits each build is based on, regardless of whether the build added a new image or changed an existing one. For each build, werf saves the commit, the stage digest, and the image name to the registry (for each image defined in werf.yaml).

When a new commit triggers the build process, werf records in the registry that the stage digest corresponds to that commit, even if the resulting image does not change.

This approach keeps the stage digest firmly tied to the git history and makes it possible to clean up outdated images effectively based on the git state and the selected policies. The algorithm scans the git history and selects the relevant images.

Information about commits is the only source of truth for the algorithm, so images lacking such information will be deleted.
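
The commit-to-digest bookkeeping described above can be modelled as follows. This is an illustrative sketch only: the record layout and the names `BuildRecord` and `split_by_git_history` are assumptions, not werf's storage format. The sketch also assumes that a record pointing at a commit missing from the repository counts as missing information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildRecord:
    """One registry entry linking an image name and commit to a stage digest."""
    image_name: str
    commit: str
    stage_digest: str


def split_by_git_history(stage_digests, records, known_commits):
    """Split stage digests into (candidates, orphans).

    Candidates are tied to at least one commit present in the repository and
    go on to policy evaluation. Orphans have no usable commit information.
    """
    linked = {r.stage_digest for r in records if r.commit in known_commits}
    candidates = sorted(d for d in stage_digests if d in linked)
    orphans = sorted(d for d in stage_digests if d not in linked)
    return candidates, orphans


records = [
    BuildRecord("backend", "c1", "d1"),
    BuildRecord("backend", "c2", "d1"),    # new commit, unchanged image
    BuildRecord("backend", "gone", "d2"),  # commit no longer in the repository
]
candidates, orphans = split_by_git_history(
    ["d1", "d2", "d3"], records, known_commits={"c1", "c2"}
)
print(candidates, orphans)
```

Note that one digest may be linked to several commits, which is what happens when a commit does not change the resulting image.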

User-defined policies

The user can specify which images must be kept during a cleanup using cleanup policies (keepPolicies). If werf.yaml contains no such configuration, werf uses the default policy set.

It is worth noting that the algorithm scans the local state of the git repository. Therefore, it is essential to keep all git branches and git tags up-to-date. By default, werf performs synchronization automatically (you can change its behavior using the gitWorktree.allowFetchOriginBranchesAndTags directive in werf.yaml).
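
As a rough illustration of what a keep policy does, the Python sketch below keeps the most recent builds of every branch whose name matches a pattern. The policy shape (a branch pattern plus a "last N" limit), the function name `images_kept_by_policy`, and the input layout are simplifying assumptions for the example. The actual keepPolicies syntax in werf.yaml is richer and is not reproduced here.

```python
import re


def images_kept_by_policy(builds_by_branch, branch_pattern, last):
    """Return the stage digests that a simplified policy keeps.

    builds_by_branch maps a branch name to its stage digests, oldest first.
    """
    kept = set()
    for branch, digests in builds_by_branch.items():
        if re.fullmatch(branch_pattern, branch) and last > 0:
            # Keep only the `last` most recent digests of a matching branch.
            kept.update(digests[-last:])
    return kept


builds = {
    "main": ["d1", "d2", "d3"],
    "feature/x": ["d4", "d5"],
    "old": ["d6"],
}
kept = images_kept_by_policy(builds, r"main|feature/.*", last=2)
print(sorted(kept))
```

Because the input comes from the local repository state, a stale branch list would change the result, which is why the branches and tags need to be up to date.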

Aspects of cleaning up the images that are being built

During the cleanup, werf applies user-defined policies to the set of images for each image defined in werf.yaml. The cleanup must respect all the images in use. However, the set of images defined on the main branch of the Git repository may not cover all the relevant images (for example, a feature branch may add or remove images).

werf adds the name of the image being built to the container registry to avoid deleting images and stages in use. Such an approach frees werf from the strict set of image names defined in werf.yaml and forces it to take into account all the images ever used.

The werf managed-images ls|add|rm family of commands allows the user to edit the so-called managed images set and explicitly delete images that are no longer needed and can be removed entirely.

Complete cleanup

The werf purge command deletes all of the project’s images from the container registry, regardless of whether they are used in a Kubernetes cluster.

This command runs within the specific project and requires access to the project’s Git repository that contains werf.yaml.

Cleaning up the host

Cleaning up outdated data

The werf host cleanup command cleans up old, unused, outdated data and reduces the cache size for all projects on the host. It uses the space occupied and user settings as a guide.

The algorithm automatically decides which data to delete. It consists of the following steps:

  • Evaluating the space used on the volume where the local docker server data are located.
  • If the space used exceeds the threshold, werf calculates how much space must be freed to bring the percentage of occupied space back below the threshold with an extra 5% in reserve. By default, the threshold is 70% of the volume space and the reserve is 5% (you can configure both using the relevant parameters).
  • Next, the algorithm deletes the least recently used (LRU) data until the percentage of occupied space drops to that level.
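
The threshold arithmetic and the LRU pass can be sketched as follows. This is a hedged illustration rather than werf's code: the function names and the entry layout are hypothetical, and the sketch assumes that "an extra 5%" means freeing space until usage reaches the threshold minus the reserve (65% with the defaults).

```python
def bytes_to_free(total_bytes, used_bytes, threshold_pct=70, margin_pct=5):
    """Return how many bytes to free, or 0 if usage is within the threshold."""
    if used_bytes * 100 <= total_bytes * threshold_pct:
        return 0
    # Aim below the threshold by the reserve, so the next run has headroom.
    target_used = total_bytes * (threshold_pct - margin_pct) // 100
    return used_bytes - target_used


def pick_lru(entries, need_bytes):
    """Pick least recently used entries until need_bytes would be freed.

    entries is a list of (name, size_bytes, last_used_timestamp) tuples.
    """
    picked, freed = [], 0
    for name, size, _last_used in sorted(entries, key=lambda e: e[2]):
        if freed >= need_bytes:
            break
        picked.append(name)
        freed += size
    return picked


need = bytes_to_free(total_bytes=1000, used_bytes=800)
victims = pick_lru([("a", 100, 3), ("b", 100, 1), ("c", 100, 2)], need)
print(need, victims)
```

With a 1000-byte volume that is 80% full, the sketch targets 650 bytes used, so it asks for 150 bytes and removes the two oldest entries.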

What data can be deleted:

  • Git archives in the local werf cache: ~/.werf/local_cache/git_archives.
  • Git patches in the local werf cache: ~/.werf/local_cache/git_patches.
  • Git repositories in the local werf cache: ~/.werf/local_cache/git_repos.
  • Git worktrees in the local werf cache: ~/.werf/local_cache/git_worktrees.
  • All docker images that were built by werf v1.2 and exist on the local docker server.
  • Docker images that were built by werf v1.1 and are stored in --stages-storage=REPO.
    • The algorithm cannot delete images created by werf v1.1 and stored in --stages-storage=:local, since this is the primary storage for stages that may be used in production and other environments.

Note that the algorithm of the werf host cleanup command separately processes the volume where the local werf cache is stored (~/.werf/local_cache) and the volume where the local docker server data are stored (usually at /var/lib/docker). If werf cannot find the directory where the data of the local docker server are stored, you can specify the appropriate path explicitly via the --docker-server-storage-path=/var/lib/docker parameter (or via the WERF_DOCKER_SERVER_STORAGE_PATH environment variable).
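
To see how the two volumes might be measured independently, here is a small Python sketch based on the standard library's `shutil.disk_usage`. It is only an approximation of the idea. The helper `volume_usage_percent` is hypothetical, and the two paths are the ones mentioned above, which may not exist on a given host.

```python
import os
import shutil


def volume_usage_percent(path):
    """Return the used share (0-100) of the volume that contains path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used * 100 / usage.total


# Each location is checked on its own, since they may sit on different volumes.
for label, path in (
    ("werf local cache", os.path.expanduser("~/.werf/local_cache")),
    ("docker server storage", "/var/lib/docker"),
):
    try:
        print(f"{label}: {volume_usage_percent(path):.1f}% used")
    except OSError:
        print(f"{label}: cannot inspect {path}")
```

If both paths live on the same volume, the two percentages are simply equal.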

By default, werf automatically cleans up outdated host data as part of any werf command’s regular operation, so you do not need to invoke werf host cleanup manually or via cron. You can disable this automatic cleanup using the --disable-auto-host-cleanup parameter (or the WERF_DISABLE_AUTO_HOST_CLEANUP environment variable). In that case, we recommend adding the werf host cleanup command to the list of cron jobs, for example:

# /etc/cron.d/werf-host-cleanup
SHELL=/bin/bash
*/30 * * * * gitlab-runner source ~/.profile ; source $(trdl use werf 1.2 ea) ; werf host cleanup

By default, without additional parameters, the werf host cleanup command cleans up the data of all projects on the host. If invoked with the --project-name PROJECT parameter, the command cleans up only images on the local docker server. Support for this mode is partial.

Complete cleanup

The werf host purge command has two running modes: it can delete all the data of a single project or clean up all projects en masse.

CAUTION! By default, if no additional parameters are specified, werf host purge completely removes all traces of werf on the host: images, stages, cache, and other data (service folders, temporary files) for all projects. This command provides the maximum level of cleaning.

If the --project-name PROJECT parameter is set, the command will delete images present on the local docker server related to the PROJECT. In this mode, the command is partially functional: werf will not delete images on the local docker server associated with the remote image storage in the container registry (e.g., local images left from running werf converge --repo REPO). You can use the werf host cleanup command (that cleans up all the outdated host data) to clean up these images.