Overview
The number of images can grow rapidly, taking up more space in the container registry and thus leading to a significant increase in costs. To control the growth and keep it at an acceptable level, werf offers its cleanup approach. It takes into account the images used in Kubernetes as well as their relevance based on the Git history when deciding which images to delete.
The werf cleanup command is designed to run on a schedule. werf performs the (safe) cleanup according to the cleanup policies.
Most likely, the default cleanup policies will cover all your needs, and no additional configuration will be necessary.
Note that the cleanup does not free up space occupied by images in the container registry. werf only removes tags for irrelevant images (manifests). You will have to run the container registry garbage collector periodically to clean up the associated data.
The issue of cleaning up images in the container registry and our approach to addressing it are covered in detail in the article The problem of “smart” cleanup of container images and addressing it in werf.
Automating the container registry cleanup
Perform the following steps to automate the removal of irrelevant images from the container registry:
- Set werf cleanup to run periodically to remove the no-longer-relevant tags from the container registry.
- Set the container registry garbage collector to run at regular intervals to free up space in the container registry.
Ignoring images that Kubernetes uses
werf connects to all Kubernetes clusters described in all configuration contexts of kubectl. It then collects image names for the following object types: pod, deployment, replicaset, statefulset, daemonset, job, cronjob, replicationcontroller.
The user can configure werf’s behavior using the following parameters (and related environment variables):
- --kube-config, --kube-config-base64 specify the kubectl configuration (by default, the user-defined configuration at ~/.kube/config is used);
- --kube-context scans a specific context;
- --scan-context-namespace-only scans only the namespace linked to a specific context (by default, all namespaces are scanned).
You can disable Kubernetes scanning using the following directive in werf.yaml:
cleanup:
  disableKubernetesBasedPolicy: true
As long as some object in the Kubernetes cluster uses an image, werf will never delete this image from the container registry. In other words, if you run some object in a Kubernetes cluster, werf will not delete its related images under any circumstances during the cleanup.
Ignoring freshly built images
When cleaning up, werf ignores images built during a specified time period (the default is 2 hours). If necessary, the period can be adjusted or the policy can be disabled altogether using the following directives in werf.yaml:
cleanup:
  disableBuiltWithinLastNHoursPolicy: false
  keepImagesBuiltWithinLastNHours: 2
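The effect of these two directives can be sketched as a simple time-window check. This is only an illustration, not werf's implementation: the function name `is_freshly_built` and its parameters are hypothetical, and treating an image built exactly at the boundary as still protected is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_freshly_built(built_at, now, keep_hours=2, policy_disabled=False):
    """Return True if the image is too new to be considered for deletion.

    keep_hours mirrors keepImagesBuiltWithinLastNHours and policy_disabled
    mirrors disableBuiltWithinLastNHoursPolicy (hypothetical parameter names).
    """
    if policy_disabled:
        return False
    # Assumption: the boundary is inclusive.
    return now - built_at <= timedelta(hours=keep_hours)


now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(is_freshly_built(now - timedelta(minutes=30), now))  # True
print(is_freshly_built(now - timedelta(hours=5), now))     # False
```

With the policy disabled, the check never protects an image, so only the other policies decide its fate.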
Configuring Git history-based cleanup policies
The cleanup configuration consists of a set of policies called keepPolicies. They are used to select relevant images using the git history. Thus, during a cleanup, images not meeting the criteria of any policy are deleted.
Each policy consists of two parts:
- references defines a set of references, git tags, or git branches to perform scanning on.
- imagesPerReference defines the limit on the number of images for each reference contained in the set.
Each policy must be associated with a set of git tags (tag) or git branches (branch). You can specify a specific reference name or a specific group using Go regular expression syntax.
tag: v1.1.1 # or /^v.*$/
branch: main # or /^(main|production)$/
When scanning, werf searches for the provided set of git branches in the origin remote references, but in the configuration, the origin/ prefix is omitted in branch names.
You can limit the set of references on the basis of the date when the git tag was created or the activity in the git branch. The limit group of parameters allows the user to define flexible and efficient policies for various workflows.
- references:
    branch: /^features\/.*/
    limit:
      last: 10
      in: 168h
      operator: And
In the example above, werf selects no more than 10 latest branches that have the features/ prefix in the name and have shown any activity during the last week.
- The last parameter allows you to select the last n references from the set defined in branch/tag. By default, there is no limit (-1).
- The in parameter (see the documentation to learn more) allows you to select git tags that were created during the specified period, or git branches with activity within the period. It can also be used for a specific set of branch/tag.
- The operator parameter determines which references result from the policy: those that satisfy both conditions (And, the default) or those that satisfy either of them (Or).
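The way the three limit parameters combine can be sketched as follows. This is one reading of the semantics described above and not werf's actual code: `Ref`, `select_refs`, and the `within` argument (standing in for the in parameter, since `in` is a reserved word in Python) are hypothetical names, and ignoring a condition that is not set is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Ref:
    name: str
    last_activity: datetime  # tag creation date or latest branch activity


def select_refs(refs, now, last=-1, within=None, operator="And"):
    """Pick references according to the last / in / operator limits."""
    newest_first = sorted(refs, key=lambda r: r.last_activity, reverse=True)
    conditions = []
    if last >= 0:
        # "last": the n most recently active references
        conditions.append(set(newest_first[:last]))
    if within is not None:
        # "in": references created or active within the period
        conditions.append({r for r in refs if now - r.last_activity <= within})
    if not conditions:
        return newest_first  # no limit configured
    if operator == "And":
        chosen = set.intersection(*conditions)
    elif operator == "Or":
        chosen = set.union(*conditions)
    else:
        raise ValueError(f"unknown operator: {operator}")
    return [r for r in newest_first if r in chosen]


now = datetime(2024, 1, 15)
refs = [
    Ref("features/a", now - timedelta(days=1)),
    Ref("features/b", now - timedelta(days=3)),
    Ref("features/c", now - timedelta(days=10)),
]
week = timedelta(hours=168)
print([r.name for r in select_refs(refs, now, last=10, within=week)])
# ['features/a', 'features/b']
print([r.name for r in select_refs(refs, now, last=1, within=week, operator="Or")])
# ['features/a', 'features/b']
```

In the first call, And keeps only the branches that are both among the 10 latest and active during the last week; in the second, Or keeps a branch if it satisfies either condition.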
When scanning references, the number of images is not limited by default. However, you can configure this behavior using the imagesPerReference set of parameters:
imagesPerReference:
  last: int
  in: string
- The last parameter specifies the number of images for each reference. By default, there is one image (1).
- The in parameter (refer to the documentation to learn more) defines the period for which to search for images.
In the case of git tags, werf checks the HEAD commit only; a last value greater than 1 does not make any sense and is invalid.
Policy priority for a specific reference
When describing a group of policies, you have to move from the general to the particular. In other words, imagesPerReference for a specific reference is taken from the last policy in the list that the reference matches:
- references:
    branch: /.*/
  imagesPerReference:
    last: 1
- references:
    branch: master
  imagesPerReference:
    last: 5
In the above example, the master reference matches both policies. Thus, when scanning the branch, the last parameter will equal 5.
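The "last matching policy wins" rule can be sketched in a few lines. The helper names `ref_matches` and `images_per_reference_last` are hypothetical, policies are reduced to (pattern, last) pairs for brevity, and Python's re module stands in for Go regular expressions, which is adequate for simple patterns like these but is not an exact equivalent.

```python
import re


def ref_matches(pattern, name):
    """Match a branch name against an exact name or a /regex/ pattern."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        # Assumption: Python's re is used here in place of Go's regexp.
        return re.search(pattern[1:-1], name) is not None
    return pattern == name


def images_per_reference_last(policies, branch):
    """Return the last value of the final policy the branch matches."""
    result = None
    for pattern, last in policies:
        if ref_matches(pattern, branch):
            result = last  # later policies override earlier ones
    return result


# The two policies from the example above, in the same order.
policies = [("/.*/", 1), ("master", 5)]
print(images_per_reference_last(policies, "master"))     # 5
print(images_per_reference_last(policies, "feature/x"))  # 1
```

Swapping the order of the two policies would make the general /.*/ rule override the master-specific one, which is why policies go from the general to the particular.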
Default policies
If there are no custom cleanup policies defined in werf.yaml, werf uses default policies configured as follows:
cleanup:
  keepPolicies:
  - references:
      tag: /.*/
      limit:
        last: 10
  - references:
      branch: /.*/
      limit:
        last: 10
        in: 168h
        operator: And
    imagesPerReference:
      last: 2
      in: 168h
      operator: And
  - references:
      branch: /^(main|master|staging|production)$/
    imagesPerReference:
      last: 10
Let us examine each policy individually:
- Keep an image for the last 10 tags (by date of creation).
- Keep no more than two images published over the past week, for no more than 10 branches active over the past week.
- Keep the 10 latest images for the main, master, staging, and production branches.
Disabling policies
If Git history-based cleanup is not needed, you can disable it in werf.yaml as follows:
cleanup:
  disableGitHistoryBasedPolicy: true
Features of working with different container registries
By default, werf uses the Docker Registry API for deleting tags. The user must be authenticated and have a sufficient set of permissions. If the Docker Registry API isn’t supported and tags are deleted using the native API, then some additional container registry-specific actions are required on the user’s part.
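Container registry | Tag deletion |
AWS ECR | *ok |
Azure CR | *ok |
Default | ok |
Docker Hub | *ok |
GCR | ok |
GitHub Packages | *ok |
GitLab Registry | *ok |
Harbor | ok |
JFrog Artifactory | ok |
Nexus | ok |
Quay | ok |
Yandex container registry | ok |
Selectel CRaaS | ok |
Registries marked *ok require the additional actions described in the corresponding sections below.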
werf tries to automatically detect the type of container registry using the repository address provided (via the --repo option). The user can explicitly specify the container registry using the --repo-container-registry option or via the WERF_REPO_CONTAINER_REGISTRY environment variable.
AWS ECR
werf deletes tags using the AWS SDK. Therefore, before performing a cleanup, the user must do one of the following:
- Install and configure the AWS CLI (aws configure), or
- Set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
Azure CR
werf deletes tags using the Azure CLI. Therefore, before performing a cleanup, the user must do the following:
- Install the Azure CLI (az).
- Perform authorization (az login).
The user must be assigned to one of the following roles: Owner, Contributor, or AcrDelete (learn more about Azure CR roles and permissions).
Docker Hub
werf uses the Docker Hub API to delete tags, so you need to set either the token or the username/password pair to clean up the container registry.
You can use the following script to get a token:
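HUB_USERNAME=username
HUB_PASSWORD=password
# Build the JSON body with jq so that special characters in the credentials are escaped correctly.
HUB_TOKEN=$(jq -n --arg u "$HUB_USERNAME" --arg p "$HUB_PASSWORD" '{username: $u, password: $p}' | curl -s -H "Content-Type: application/json" -X POST -d @- https://hub.docker.com/v2/users/login/ | jq -r .token)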
You can’t use the personal access token as a token since the deletion of resources is only possible using the user’s primary credentials.
You can use the following options (or their respective environment variables) to set these parameters: --repo-docker-hub-token, or --repo-docker-hub-username and --repo-docker-hub-password.
GitHub Packages
When organizing CI/CD pipelines in GitHub Actions, we recommend using our set of actions, which handles most of the challenges for you.
werf uses the GitHub API to delete tags, so you need to set the token with the appropriate scopes (read:packages, delete:packages) to clean up the container registry.
You can use the --repo-github-token option or the corresponding environment variable to define the token.
GitLab Registry
werf uses the GitLab container registry API or Docker Registry API (depending on the GitLab version) to delete tags.
The privileges of the temporary CI job token ($CI_JOB_TOKEN) are not sufficient to delete tags. Therefore, the user must create a dedicated token in the Access Token section, select api in the Scope section, and ensure the role of Maintainer or Owner is assigned before using it for authorization.
Container registry’s garbage collector
Note that during the cleanup, werf only removes tags from the images (manifests) to be deleted. The container registry garbage collector (GC) is responsible for the actual deletion.
While the garbage collector is running, the container registry must be set to read-only mode or turned off completely. Otherwise, there is a good chance that the garbage collector will not take into account the images published during the procedure and may corrupt them.
You can read more about the garbage collector and how to use it in the documentation for the garbage collector you are using. Here are links for some of the most popular ones: