We propose dividing the assembly process into steps. Every step corresponds to an intermediate image (like layers in Docker) with specific functions and assignments. In werf, we call every such step a stage, so the final image consists of a set of built stages. All stages are kept in a stages storage. You can think of it as the build cache of an application; strictly speaking, though, it is not a cache but a part of the build context.
Stages
Stages are steps in the assembly process. They act as building blocks for constructing images. A stage is built from a logically grouped set of config instructions. It takes into account the assembly conditions and rules. Each stage relates to a single Docker image.
The werf assembly process involves a sequential build of stages using the stage conveyor. A stage conveyor is an ordered sequence of conditions and rules for carrying out stages. werf uses different stage conveyors to assemble various types of images depending on their configuration.
The user only needs to write a correct configuration: werf performs the rest of the work with stages.
For each stage at every build, werf calculates a unique stage identifier called the stage signature. Each stage is assembled in an assembly container that is based on the previous stage, and is then saved in the stages storage. The stage signature is used for tagging the stage in the stages storage (the signature is a part of the image tag). werf does not build stages that already exist in the stages storage (similar to caching in Docker, yet more complex).
The stage signature is calculated as the checksum of:
- checksum of stage dependencies;
- previous stage signature;
- git commit-id related to the previous stage (if the previous stage is git-related).
The signature of a stage represents the content of the stage and depends on the git history that led to this content. There may be multiple built images for a single signature. Stages for different git branches can have the same signature, but werf prevents the cache built for one git branch from being reused in an unrelated branch; see the stage selection algorithm.
A stage that has no stage dependencies is skipped, which means that the stage conveyor can be reduced to several stages or even to a single from stage.
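As a rough illustration of the checksum chain described above, the sketch below combines the three inputs into a single digest. The hash function (SHA3-224, chosen because it yields 56 hex characters like the example tags later on this page) and the way the inputs are joined are assumptions made for the example; `stage_signature` is a hypothetical helper, not werf's implementation.

```python
import hashlib
from typing import Optional


def stage_signature(dependencies_checksum: str,
                    previous_signature: str,
                    previous_git_commit: Optional[str] = None) -> str:
    """Hypothetical helper: checksum of the three inputs listed above."""
    parts = [dependencies_checksum, previous_signature]
    # The commit id only participates when the previous stage is git-related.
    if previous_git_commit is not None:
        parts.append(previous_git_commit)
    # Assumption: SHA3-224 over newline-joined parts; the real encoding may differ.
    return hashlib.sha3_224("\n".join(parts).encode("utf-8")).hexdigest()


# A chain of stages: each signature depends on the previous one.
from_sig = stage_signature("deps-of-from", "")
install_sig = stage_signature("deps-of-install", from_sig)
changed_sig = stage_signature("deps-of-install-changed", from_sig)
```

Because every signature includes the previous one, changing the dependencies of an early stage changes the signatures of all stages after it, while unchanged leading stages keep their signatures and can be reused.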
Stage dependencies
Stage dependency is a piece of data that affects the stage signature. Stage dependency may be represented by:
- some file from a git repo with its contents;
- instructions to build the stage defined in the werf.yaml;
- the arbitrary string specified by the user in the werf.yaml;
- and so on.
Most stage dependencies are specified in the werf.yaml, others relate to a runtime.
The tables below illustrate the stage dependencies of a Dockerfile image, a Stapel image, and a Stapel artifact.
Each row describes dependencies for a certain stage.
The left column contains a short description of the dependencies; the right column lists the related werf.yaml directives and contains relevant references for more information.
stage dockerfile
image: <image name... || ~>
dockerfile: <relative path>
context: <relative path>
target: <docker stage name>
args:
  <build arg name>: <value>
addHost:
- <host:ip>
Stages storage
Stages storage contains the stages of the project. Stages can be stored in the Docker Repo or locally on a host machine.
Most commands use stages and require a reference to a specific stages storage, defined by the --stages-storage option or the WERF_STAGES_STORAGE environment variable.
There are 2 types of stages storage:
- Local stages storage. Uses the local docker server runtime to store stages as docker images. Local stages storage is selected by the param --stages-storage=:local. This was the only supported choice for stages storage prior to version v1.1.10.
- Remote stages storage. Uses a docker registry to store images. Remote stages storage is selected by the param --stages-storage=DOCKER_REPO_DOMAIN, for example --stages-storage=registry.mycompany.com/web/frontend/stages. NOTE: Each project should specify a unique docker repo domain that is used only by this project.
Stages are named differently depending on whether local or remote stages storage is used.
When a docker registry is used as the stages storage for the project, there is also a cache of local docker images on each host where werf is running. This cache is cleared by werf itself or can be freely removed by other tools (such as docker rmi).
It is recommended to use a docker registry as the stages storage; werf uses this mode with CI/CD systems by default.
Host requirements to use remote stages storage:
- Connection to docker registry.
- Connection to the Kubernetes cluster (used to synchronize multiple build/publish/deploy processes running from different machines, see more info below).
Note that all werf commands that need access to the stages should specify the same stages storage. So if it is a local stages storage, all commands should run on the same host. It does not matter on which host a werf command runs as long as the same remote stages storage is used for commands such as build, publish, cleanup, deploy, etc.
Stage naming
Stages in the local stages storage are named using the following schema: werf-stages-storage/PROJECT_NAME:SIGNATURE-TIMESTAMP_MILLISEC. For example:
werf-stages-storage/myproject 9f3a82975136d66d04ebcb9ce90b14428077099417b6c170e2ef2fef-1589786063772 274bd7e41dd9 16 seconds ago 65.4MB
werf-stages-storage/myproject 7a29ff1ba40e2f601d1f9ead88214d4429835c43a0efd440e052e068-1589786061907 e455d998a06e 18 seconds ago 65.4MB
werf-stages-storage/myproject 878f70c2034f41558e2e13f9d4e7d3c6127cdbee515812a44fef61b6-1589786056879 771f2c139561 23 seconds ago 65.4MB
werf-stages-storage/myproject 5e4cb0dcd255ac2963ec0905df3c8c8a9be64bbdfa57467aabeaeb91-1589786050923 699770c600e6 29 seconds ago 65.4MB
werf-stages-storage/myproject 14df0fe44a98f492b7b085055f6bc82ffc7a4fb55cd97d30331f0a93-1589786048987 54d5e60e052e 31 seconds ago 64.2MB
Stages in the remote stages storage are named using the following schema: DOCKER_REPO_ADDRESS:SIGNATURE-TIMESTAMP_MILLISEC. For example:
localhost:5000/myproject-stages d4bf3e71015d1e757a8481536eeabda98f51f1891d68b539cc50753a-1589714365467 7c834f0ff026 20 hours ago 66.7MB
localhost:5000/myproject-stages e6073b8f03231e122fa3b7d3294ff69a5060c332c4395e7d0b3231e3-1589714362300 2fc39536332d 20 hours ago 66.7MB
localhost:5000/myproject-stages 20dcf519ff499da126ada17dbc1e09f98dd1d9aecb85a7fd917ccc96-1589714359522 f9815cec0867 20 hours ago 65.4MB
localhost:5000/myproject-stages 1dbdae9cc1c9d5d8d3721e32be5ed5542199def38ff6e28270581cdc-1589714352200 6a37070d1b46 20 hours ago 65.4MB
localhost:5000/myproject-stages f88cb5a1c353a8aed65d7ad797859b39d357b49a802a671d881bd3b6-1589714347985 5295f82d8796 20 hours ago 65.4MB
localhost:5000/myproject-stages 796e905d0cc975e718b3f8b3ea0199ea4d52668ecc12c4dbf85a136d-1589714344546 a02ec3540da5 20 hours ago 64.2MB
The signature of a stage represents the content of the stage and depends on the git history that led to this content.
TIMESTAMP_MILLISEC is generated during the stage saving procedure, after the stage has been built. The timestamp is guaranteed to be unique within the specified stages storage.
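To make the SIGNATURE-TIMESTAMP_MILLISEC schema concrete, here is a small sketch that splits a tag taken from the listing above into its two parts. `StageTag` and `parse_stage_tag` are hypothetical names introduced for this example; the sketch assumes the timestamp is a count of milliseconds since the Unix epoch, which the sample values suggest.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class StageTag(NamedTuple):
    signature: str
    timestamp_millisec: int


def parse_stage_tag(tag: str) -> StageTag:
    """Split a SIGNATURE-TIMESTAMP_MILLISEC tag into its two parts."""
    # Split on the last dash: the timestamp is the trailing numeric part.
    signature, _, timestamp = tag.rpartition("-")
    if not signature or not timestamp.isdigit():
        raise ValueError(f"not a stage tag: {tag!r}")
    return StageTag(signature, int(timestamp))


tag = parse_stage_tag(
    "9f3a82975136d66d04ebcb9ce90b14428077099417b6c170e2ef2fef-1589786063772"
)
# Assumption: milliseconds since the Unix epoch.
saved_at = datetime.fromtimestamp(tag.timestamp_millisec / 1000, tz=timezone.utc)
```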
Stage selection
The werf stage selection algorithm is based on git commit ancestry detection:
- Calculate a stage signature for some stage.
- There may be multiple stages in the stages storage with this signature, so select all stages that match the signature.
- If the current stage is related to git (git-archive, user stage with git patch, git cache or git latest patch), then select only those stages that are related to a commit that is an ancestor of the current git commit.
- Select the oldest stage by TIMESTAMP_MILLISEC from the remaining stages.
There may be multiple built images for a single signature. Stages for different git branches can have the same signature, but werf prevents the cache built for one git branch from being reused in a different branch.
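The selection steps above can be sketched as follows. This is a simplified model under stated assumptions, not werf's code: `Stage`, `select_stage`, and the `is_ancestor` callback are hypothetical names, and the toy ancestry check treats a commit as its own ancestor (as `git merge-base --is-ancestor` does).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Stage:
    signature: str
    timestamp_millisec: int
    commit: Optional[str] = None  # commit the stage was built from, if git-related


def select_stage(stages: Iterable[Stage],
                 signature: str,
                 current_commit: str,
                 git_related: bool,
                 is_ancestor: Callable[[str, str], bool]) -> Optional[Stage]:
    # Keep every stage stored under the target signature.
    candidates = [s for s in stages if s.signature == signature]
    # For git-related stages keep only those built from an ancestor commit.
    if git_related:
        candidates = [s for s in candidates
                      if s.commit is not None and is_ancestor(s.commit, current_commit)]
    # The oldest remaining stage wins; None means a new stage must be built.
    return min(candidates, key=lambda s: s.timestamp_millisec, default=None)


# Toy linear history a -> b -> c, plus a commit x from an unrelated branch.
history = ["a", "b", "c"]


def toy_is_ancestor(ancestor: str, commit: str) -> bool:
    return (ancestor in history and commit in history
            and history.index(ancestor) <= history.index(commit))


stored = [
    Stage("sig1", 300, commit="b"),
    Stage("sig1", 100, commit="x"),  # oldest, but built in another branch
    Stage("sig1", 200, commit="a"),
]
chosen = select_stage(stored, "sig1", "c", True, toy_is_ancestor)
```

In this toy data the stage built from commit `x` is the oldest, yet it is filtered out by the ancestry check, so the stage built from commit `a` is chosen.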
Stage building and saving
If no suitable stage has been found by the target signature during stage selection, werf starts building a new image for the stage.
Note that multiple processes (on a single host or on multiple hosts) may start building the same stage at the same time. werf uses optimistic locking when saving a newly built image into the stages storage: when a new stage has been built, werf locks the stages storage and saves the newly built stage image into the stages storage only if no suitable stage exists there yet. The newly saved image will have a guaranteed unique identifier, TIMESTAMP_MILLISEC. If an already existing stage has been found in the stages storage, werf discards the newly built image and uses the existing one as a cache.
In other words: the first process which finishes the build (the fastest one) will have a chance to save newly built stage into the stages storage. The slow build process will not block faster processes from saving build results and building next stages.
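The following sketch models this optimistic scheme with an in-memory storage and two threads. It is a simplification under stated assumptions: `StagesStorage` and `save_if_absent` are hypothetical, and "suitable stage" is reduced to "a stage with the same signature", ignoring the git ancestry check.

```python
import threading
import time
from typing import Dict, List, Tuple


class StagesStorage:
    """In-memory stand-in for a stages storage (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stages: Dict[str, Tuple[int, str]] = {}

    def save_if_absent(self, signature: str, image: str) -> Tuple[str, bool]:
        """Store `image` unless a stage with this signature already exists.

        Returns the image to use and whether the new image was saved.
        """
        # The lock covers only the short save step, never the build itself,
        # so a slow build cannot block a faster one.
        with self._lock:
            existing = self._stages.get(signature)
            if existing is not None:
                return existing[1], False  # discard ours, reuse the stored one
            self._stages[signature] = (int(time.time() * 1000), image)
            return image, True


storage = StagesStorage()
results: List[Tuple[str, str, bool]] = []


def build_and_save(worker: str, build_seconds: float) -> None:
    time.sleep(build_seconds)  # the "build" runs without any lock held
    results.append((worker, *storage.save_if_absent("sig1", f"image-by-{worker}")))


threads = [threading.Thread(target=build_and_save, args=("slow", 0.2)),
           threading.Thread(target=build_and_save, args=("fast", 0.01))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both workers build the stage, but only the one that finishes first saves its image; the slower worker finds the existing stage and reuses it.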
To select stages and save new ones into the stages storage werf uses synchronization service components to coordinate multiple werf processes and store stages cache needed for werf builder.
Image stages signature
The stages signature of an image is a signature that represents the content of the image and depends on the history of git commits that led to this content.
The stages signature is calculated similarly to the regular stage signature, as the checksum of:
- stage signature of the last non-empty image stage;
- git commit-id related to the last non-empty image stage (if this last stage is git-related).
This signature is used in content-based tagging and to import files from artifacts or images (the stages signature of an artifact or image affects the imports stage signature of the target image).
Images
Image is a ready-to-use Docker image corresponding to a specific application state and tagging strategy.
As mentioned above, stages are steps in the assembly process. They act as building blocks for constructing images. Unlike images, stages are not intended for direct use. The main difference between images and stages is in their cleaning policies, which stem from the stored meta-information. The process of cleaning up the stages storage is based only on the related images in the images repo.
werf creates images using the stages storage. Currently, images can only be created during the publishing process and saved in the images repo.
Images should be defined in the werf configuration file werf.yaml.
To publish new images into the images repo werf uses synchronization service components to coordinate multiple werf processes. Only a single werf process can perform publishing of the same image at a time.
Synchronization: locks and stages storage cache
Synchronization is a group of werf service components that coordinate multiple werf processes when selecting and saving stages into the stages storage and when publishing images into the images repo. There are 2 such synchronization components:
- Stages storage cache is an internal werf cache that significantly improves the performance of werf invocations when stages already exist in the stages storage. The stages storage cache contains the mapping of stages existing in the stages storage by signature (in other words, this cache contains the precalculated result of the stage selection by signature algorithm). This cache should be coherent with the stages storage itself, and werf automatically resets the cache when it detects an inconsistency between the stages storage cache and the stages storage.
- Lock manager. Locks are needed to organize correct publishing of new stages into the stages storage, publishing of images into the images repo, and concurrent deploy processes that use the same release name.
All commands that require the stages storage (--stages-storage) and images repo (--images-repo) params also use the address of the synchronization service components, which is defined by the --synchronization option or the WERF_SYNCHRONIZATION=... environment variable.
There are 3 types of synchronization components:
- Local. Selected by the --synchronization=:local param.
  - Local stages storage cache is stored in the ~/.werf/shared_context/storage/stages_storage_cache/1/PROJECT_NAME/SIGNATURE files by default; each file contains a mapping of images existing in the stages storage by some signature.
  - Local lock manager uses OS file-locks in ~/.werf/service/locks as the implementation of locks.
- Kubernetes. Selected by the --synchronization=kubernetes://NAMESPACE[:CONTEXT][@(base64:CONFIG_DATA)|CONFIG_PATH] param.
  - Kubernetes stages storage cache is stored in the specified NAMESPACE in a ConfigMap named by project: cm/PROJECT_NAME.
  - Kubernetes lock manager uses the ConfigMap named by project, cm/PROJECT_NAME (the same as the stages storage cache), to store distributed locks in the annotations. The Lockgate library is used as the implementation of distributed locks using kubernetes resource annotations.
- Http. Selected by the --synchronization=http[s]://DOMAIN param.
  - There is a public instance of the synchronization server available at https://synchronization.werf.io.
  - A custom http synchronization server can be run with the werf synchronization command.
werf uses --synchronization=:local (local stages storage cache and local lock manager) by default when local stages storage is used (--stages-storage=:local).
werf uses --synchronization=https://synchronization.werf.io (http stages storage cache and http lock manager) by default when a docker registry is used as the stages storage.
The user may force an arbitrary non-default address of the synchronization service components, if needed, using an explicit --synchronization=:local|(kubernetes://NAMESPACE[:CONTEXT][@(base64:CONFIG_DATA)|CONFIG_PATH])|(http[s]://DOMAIN) param.
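The default rules and the three address forms can be summarized in a short sketch. Both functions are hypothetical helpers written for this page; they only mirror the rules stated above and do not parse the optional CONTEXT or config parts of a kubernetes:// address.

```python
def default_synchronization(stages_storage: str) -> str:
    """Pick the default --synchronization value for a --stages-storage value."""
    if stages_storage == ":local":
        return ":local"
    # Any other value is treated as a docker registry address.
    return "https://synchronization.werf.io"


def synchronization_kind(address: str) -> str:
    """Classify a --synchronization address into one of the three types."""
    if address == ":local":
        return "local"
    if address.startswith("kubernetes://"):
        return "kubernetes"
    if address.startswith(("http://", "https://")):
        return "http"
    raise ValueError(f"unsupported synchronization address: {address!r}")


local_default = default_synchronization(":local")
remote_default = default_synchronization("registry.mycompany.com/web/frontend/stages")
```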
NOTE: Multiple werf processes working with the same project should use the same stages storage and synchronization.
Working with stages
Sync command
werf stages sync --from=:local|REPO --to=:local|REPO
- The command copies only the stages that are missing in the destination, from one stages-storage to another.
- The command copies multiple stages in parallel.
- The result of the command is idempotent: sync can be called multiple times, interrupted, then called again, and the result will be the same. Stages that are already synced will not be synced again on subsequent sync calls.
- There are delete options, --remove-source and --cleanup-local-cache, which control whether werf deletes synced stages from the source stages-storage and whether werf cleans up the localhost from temporary docker images created during the sync process.
- This command can be used to download the project stages-storage to the localhost for development purposes, as well as for backup and migration purposes.
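The "copy only the difference, in parallel, idempotently" behaviour can be modelled with plain sets. This is a sketch under stated assumptions: `sync_stages` is a hypothetical function, stages are represented by their tag strings, and adding to a set stands in for transferring one image.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Set


def sync_stages(source: Set[str], destination: Set[str], workers: int = 4) -> Set[str]:
    """Copy stages missing in `destination`; return the set that was copied."""
    missing = source - destination  # only the difference is transferred

    def copy(stage: str) -> str:
        destination.add(stage)  # stand-in for pulling and pushing one image
        return stage

    # Several stages are copied concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return set(pool.map(copy, missing))


local = {"sig1-100", "sig2-200", "sig3-300"}
repo = {"sig1-100"}
first_run = sync_stages(local, repo)
second_run = sync_stages(local, repo)  # nothing is left to copy
```

The second call copies nothing because the destination already holds every stage, which is what makes repeated or interrupted runs safe.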
Switch-from-local command
werf stages switch-from-local --to=REPO
- The command automatically syncs existing stages from the :local stages storage to the specified REPO.
- The command blocks the project from being used with the :local stages-storage.
  - This means that after werf stages switch-from-local is done, any werf command that specifies the :local stages-storage for the project will fail, which prevents build results from being stored in and used from different stages-storages.
  - Note that the project is blocked only after all existing stages have been synced.
See the switching to distributed mode article for guided steps.
Further reading
Learn more about the build process of stapel and Dockerfile builders.