Overview

The number of images can grow rapidly, taking up more space in the container registry and thus leading to a significant increase in costs. To control the growth and keep it at an acceptable level, werf offers its cleanup approach. It takes into account the images used in Kubernetes as well as their relevance based on the Git history when deciding which images to delete.

The werf cleanup command is designed to run on a schedule. werf performs the (safe) cleanup according to the cleanup policies.

Most likely, the default cleanup policies will cover all your needs, and no additional configuration will be necessary.

Note that the cleanup does not free up space occupied by images in the container registry. werf only removes tags for irrelevant images (manifests). You will have to run the container registry garbage collector periodically to clean up the associated data.

The issue of cleaning up images in the container registry and our approach to addressing it are covered in detail in the article “The problem of ‘smart’ cleanup of container images and addressing it in werf”.

Automating the container registry cleanup

Perform the following steps to automate the removal of irrelevant images from the container registry:

  • Schedule werf cleanup to run periodically so that it removes tags that are no longer relevant from the container registry.
  • Schedule the container registry garbage collector to run at regular intervals so that it frees up the space in the container registry.
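
As an illustration of the first step, a scheduled CI job can invoke werf cleanup. The fragment below is a minimal sketch for GitLab CI, not a setup prescribed by werf. The job name, the stage name, and the WERF_REPO variable holding the repository address are assumptions, and the runner is assumed to have werf installed and to be already authenticated in the container registry.

```yaml
# .gitlab-ci.yml fragment (job and stage names are hypothetical)
stages:
  - cleanup

Cleanup:
  stage: cleanup
  script:
    # Assumes werf is installed on the runner and registry credentials are configured.
    - werf cleanup --repo "$WERF_REPO"
  rules:
    # Run this job only in pipelines started by a pipeline schedule.
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
```

The schedule itself (for example, once a night) would be defined in the CI system. How the second step is scheduled depends on the container registry in use.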

Ignoring images that Kubernetes uses

werf connects to all Kubernetes clusters described in all configuration contexts of kubectl. It then collects image names for the following object types: pod, deployment, replicaset, statefulset, daemonset, job, cronjob, replicationcontroller.

The user can configure werf’s behavior using the following parameters (and related environment variables):

  • --kube-config, --kube-config-base64 specify the kubectl configuration (by default, the user-defined configuration at ~/.kube/config is used);
  • --kube-context limits scanning to a specific context;
  • --scan-context-namespace-only scans the namespace linked to a specific context (by default, all namespaces are scanned).

You can disable Kubernetes scanning using the following directive in werf.yaml:

cleanup:
  disableKubernetesBasedPolicy: true

As long as any object in a Kubernetes cluster uses an image, werf will not delete that image from the container registry during the cleanup under any circumstances.

Ignoring freshly built images

When cleaning up, werf ignores images built during a specified time period (the default is 2 hours). If necessary, the period can be adjusted or the policy can be disabled altogether using the following directives in werf.yaml:

cleanup:
  disableBuiltWithinLastNHoursPolicy: false
  keepImagesBuiltWithinLastNHours: 2

Configuring Git history-based cleanup policies

The cleanup configuration consists of a set of policies called keepPolicies, which select relevant images based on the Git history. During a cleanup, images that do not meet the criteria of any policy are deleted.

Each policy consists of two parts:

  • references defines the set of references (git tags or git branches) to scan.
  • imagesPerReference defines the limit on the number of images for each reference in the set.

Each policy must be associated with a set of git tags (tag) or git branches (branch). You can specify either an exact reference name or a group of references using Go regular expression syntax.

tag: v1.1.1  # or /^v.*$/
branch: main # or /^(main|production)$/

When scanning, werf searches for the provided set of git branches in the origin remote references, but in the configuration, the origin/ prefix is omitted in branch names.
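
To show where the tag and branch keys sit inside a policy, here is a minimal sketch of two policies. The patterns are illustrative assumptions, not recommended values.

```yaml
cleanup:
  keepPolicies:
  # Tags that look like semantic versions, e.g. v1.2.3.
  - references:
      tag: /^v\d+\.\d+\.\d+$/
  # The remote branches origin/main and origin/production,
  # written without the origin/ prefix.
  - references:
      branch: /^(main|production)$/
```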

You can limit the set of references on the basis of the date when the git tag was created or the activity in the git branch. The limit group of parameters allows the user to define flexible and efficient policies for various workflows.

- references:
    branch: /^features\/.*/
    limit:
      last: 10
      in: 168h
      operator: And

In the example above, werf selects at most the 10 most recent branches whose names start with the features/ prefix and that have shown activity during the last week.

  • The last parameter selects the last n references from the set defined in branch/tag. By default, there is no limit (-1).
  • The in parameter (see the documentation to learn more) selects git tags that were created during the specified period, or git branches that were active within that period. It can also be applied to a specific set of branches or tags.
  • The operator parameter determines which references the policy yields: those that satisfy both conditions or those that satisfy either of them (And is set by default).
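
For contrast with the And example above, the following sketch selects references that satisfy either condition. It assumes that Or is the operator value for "either of them"; the tag pattern and the numbers are illustrative.

```yaml
cleanup:
  keepPolicies:
  - references:
      tag: /^v.*$/
      limit:
        last: 5      # the 5 most recently created matching tags
        in: 720h     # tags created within the last 30 days
        operator: Or # assumed value: a tag is selected if either condition holds
```

With these settings, an old tag would still be selected if it is among the 5 most recent ones, and a sixth or seventh tag would still be selected if it was created within the last 30 days.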

When scanning references, the number of images is not limited by default. However, you can configure this behavior using the imagesPerReference set of parameters:

imagesPerReference:
  last: int
  in: string

  • The last parameter specifies the number of images for each reference. By default, there is one image (1).
  • The in parameter (refer to the documentation to learn more) defines the period for which to search for images.

In the case of git tags, werf checks the HEAD commit only, so a value of last greater than 1 makes no sense and is invalid.
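
As a sketch of how imagesPerReference fits into a complete policy, the fragment below limits the images kept for a group of branches. The branch pattern and the numbers are illustrative assumptions. The operator key is used here in the same way as in the limit group.

```yaml
cleanup:
  keepPolicies:
  - references:
      branch: /^release-.*$/
    imagesPerReference:
      last: 3       # at most 3 images per matching branch
      in: 72h       # search for images within the last 72 hours
      operator: And # both conditions must hold
```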

Policy priority for a specific reference

When describing a group of policies, move from the general to the particular: the imagesPerReference settings for a specific reference are taken from the last policy that the reference matches.

- references:
    branch: /.*/
  imagesPerReference:
    last: 1
- references:
    branch: master
  imagesPerReference:
    last: 5

In the above example, the master reference matches both policies. Thus, when werf scans this branch, the last parameter will be equal to 5.

Default policies

If there are no custom cleanup policies defined in werf.yaml, werf uses default policies configured as follows:

cleanup:
  keepPolicies:
  - references:
      tag: /.*/
      limit:
        last: 10
  - references:
      branch: /.*/
      limit:
        last: 10
        in: 168h
        operator: And
    imagesPerReference:
      last: 2
      in: 168h
      operator: And
  - references:  
      branch: /^(main|master|staging|production)$/
    imagesPerReference:
      last: 10

Let us examine each policy individually:

  1. Keep an image for each of the last 10 tags (by date of creation).
  2. Keep no more than two images published over the past week, for no more than 10 branches active over the past week.
  3. Keep the 10 latest images for the main, master, staging, and production branches.

Disabling policies

If Git history-based cleanup is not needed, you can disable it in werf.yaml as follows:

cleanup:
  disableGitHistoryBasedPolicy: true
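
All of the directives described in the sections above live under the same cleanup key in werf.yaml. As a rough sketch, a configuration that keeps every policy type enabled, extends the window for freshly built images, and replaces the default keepPolicies with a single custom policy might look like this. The specific values are illustrative assumptions.

```yaml
cleanup:
  # Keep images used in Kubernetes (scanning stays enabled).
  disableKubernetesBasedPolicy: false
  # Keep images built during the last 6 hours instead of the default 2.
  disableBuiltWithinLastNHoursPolicy: false
  keepImagesBuiltWithinLastNHours: 6
  # Keep Git history-based cleanup enabled with one custom policy.
  disableGitHistoryBasedPolicy: false
  keepPolicies:
  - references:
      branch: /^(main|master)$/
    imagesPerReference:
      last: 5
```

Because custom policies are defined here, the default policies no longer apply.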

Features of working with different container registries

By default, werf uses the Docker Registry API for deleting tags. The user must be authenticated and have a sufficient set of permissions. If the Docker Registry API isn’t supported and tags are deleted using the native API, then some additional container registry-specific actions are required on the user’s part.

   
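Container registry           Tag deletion
AWS ECR                      *ok
Azure CR                     *ok
Default                      ok
Docker Hub                   *ok
GCR                          ok
GitHub Packages              *ok
GitLab Registry              *ok
Harbor                       ok
JFrog Artifactory            ok
Nexus                        ok
Quay                         ok
Yandex container registry    ok
Selectel CRaaS               ok

* Additional registry-specific actions are required; see the corresponding section below.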

werf tries to automatically detect the type of container registry using the repository address provided (via the --repo option). The user can explicitly specify the container registry using the --repo-container-registry option or via the WERF_REPO_CONTAINER_REGISTRY environment variable.

AWS ECR

werf deletes tags using the AWS SDK. Therefore, before performing a cleanup, the user must do one of the following:

Azure CR

werf deletes tags using the Azure CLI. Therefore, before performing a cleanup, the user must do the following:

  • Install the Azure CLI (az).
  • Perform authorization (az login).

The user must be assigned one of the following roles: Owner, Contributor, or AcrDelete (learn more about Azure CR roles and permissions).

Docker Hub

werf uses the Docker Hub API to delete tags, so you need to set either the token or the username/password pair to clean up the container registry.

You can use the following script to get a token:

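HUB_USERNAME=username
HUB_PASSWORD=password
# Build the JSON payload with jq so that special characters in the credentials are escaped correctly.
HUB_PAYLOAD=$(jq -n --arg username "$HUB_USERNAME" --arg password "$HUB_PASSWORD" '{username: $username, password: $password}')
HUB_TOKEN=$(curl -s -H "Content-Type: application/json" -X POST -d "$HUB_PAYLOAD" https://hub.docker.com/v2/users/login/ | jq -r .token)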

A personal access token cannot be used as the token here, since resources can only be deleted using the user’s primary credentials.

You can use the following options (or their respective environment variables) to set the said parameters:

  • --repo-docker-hub-token or
  • --repo-docker-hub-username and --repo-docker-hub-password.

GitHub Packages

When organizing CI/CD pipelines in GitHub Actions, we recommend using our set of actions, which handles most of the challenges for you.

werf uses the GitHub API to delete tags, so you need to set the token with the appropriate scopes (read:packages, delete:packages) to clean up the container registry.

You can use the --repo-github-token option or the corresponding environment variable to define the token.

GitLab Registry

werf uses the GitLab container registry API or Docker Registry API (depending on the GitLab version) to delete tags.

The privileges of the temporary CI job token ($CI_JOB_TOKEN) are not sufficient to delete tags. Therefore, the user must create a dedicated token in the Access Token section, select api in the Scope section, and ensure that the Maintainer or Owner role is assigned before using the token for authorization.

Container registry’s garbage collector

Note that during the cleanup, werf only removes tags from the images (manifests) to be deleted. The container registry garbage collector (GC) is responsible for the actual deletion.

While the garbage collector is running, the container registry must be set to read-only mode or turned off completely. Otherwise, there is a good chance that the garbage collector will not take into account the images published during the procedure and may corrupt them.

You can read more about the garbage collector and how to use it in the documentation for the garbage collector you are using. Here are links for some of the most popular ones: