Running assembly instructions

shell:
  beforeInstall:
  - <bash command>
  install:
  - <bash command>
  beforeSetup:
  - <bash command>
  setup:
  - <bash command>
  cacheVersion: <arbitrary string>
  beforeInstallCacheVersion: <arbitrary string>
  installCacheVersion: <arbitrary string>
  beforeSetupCacheVersion: <arbitrary string>
  setupCacheVersion: <arbitrary string>

ansible:
  beforeInstall:
  - <task>
  install:
  - <task>
  beforeSetup:
  - <task>
  setup:
  - <task>
  cacheVersion: <arbitrary string>
  beforeInstallCacheVersion: <arbitrary string>
  installCacheVersion: <arbitrary string>
  beforeSetupCacheVersion: <arbitrary string>
  setupCacheVersion: <arbitrary string>

Running assembly instructions with git

What are user stages?

A user stage is a stage that contains assembly instructions from the config. Currently, there are two kinds of assembly instructions: shell and ansible. werf provides four user stages and executes them in the following order: beforeInstall, install, beforeSetup, and setup. Executing the assembly instructions of a stage produces a separate docker layer for that stage.

Motivation behind stages

Opinionated build structure

The user stages pattern is based on an analysis of build instructions for real-life applications. It turns out that dividing assembly instructions into four user stages is sufficient for most applications. Grouping decreases the size of layers and speeds up the image building process.

Framework for a build process

The user stages pattern defines the structure of the build process and thus sets boundaries for a developer. It compares favorably with Dockerfile’s unstructured instructions because the developer has a good understanding of which instructions belong in each stage.

Run an assembly instruction on git changes

The execution of a user stage can depend on changes to files in a git repository. werf supports local and remote git repositories. A user stage can depend on changes in several repositories. Different changes in a repository can lead to a rebuild of different user stages.

More build tools: shell, ansible, …

Shell is a familiar and well-known build tool. Ansible is newer, and you have to spend some time studying it.

If you need a prototype as soon as possible, then the shell is your choice — it works like a RUN instruction in Dockerfile. The configuration of Ansible is declarative, mostly idempotent, and well suited for long-term maintenance.

The stage execution is isolated in code, so implementing support for another tool is not so difficult.

Using user stages

werf provides four user stages where assembly instructions can be defined. werf does not impose any restrictions on assembly instructions. You can specify the same variety of instructions as for the RUN instruction in Dockerfile. At the same time, the categorization of assembly instructions is based on our experience with real applications. So, the following actions are enough for building the vast majority of applications:

install system packages
install system dependencies
install application dependencies
setup system applications
setup application

What is the best strategy to execute them? You might think the best way is to execute them one by one while caching interim results. On the other hand, it is better not to mix instructions for these actions because they depend on different files. The user stages pattern suggests the following strategy:

use the beforeInstall user stage for installing system packages
use the install user stage to install system and application dependencies
use the beforeSetup user stage to configure system parameters and install an application
use the setup user stage to configure an application

beforeInstall

This stage executes instructions that must run before an application is installed. It is best suited for system applications that rarely change but take a long time to install. At this stage, you can also configure system parameters that rarely change, such as setting a locale or a timezone, adding groups and users, etc. For example, you can install language distributions and build tools such as PHP and composer, java and gradle, and so on.

In practice, all these components rarely change, and the beforeInstall stage caches them for an extended period.

install

This stage is for installing an application. It is best suited for installing application dependencies and configuring some basic settings.

Instructions at this stage have access to the application source code, so you can install application dependencies using build tools (like composer, gradle, npm, etc.) that require a manifest file (e.g., pom.xml, Gruntfile) to work. A best practice is to make this stage dependent on changes in that manifest file.

beforeSetup

At this stage, you can prepare an application before its settings are tuned. It supports all kinds of compiling tasks: creating jars, creating executable files and dynamic libraries, creating web assets, uglification, and encryption. This stage is often made dependent on changes in the source code.

setup

This stage is intended for configuring application settings. The corresponding set of actions includes copying some profiles into /etc, copying configuration files to already-known locations, and creating a file containing the application version. These actions should not be time-consuming since they will likely be executed on every commit.
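Taken together, the four stages described above could be laid out as in the following sketch for a hypothetical Node.js application. The image name, base image, paths, package names, and the npm build script are all assumptions made for illustration; only the directive names come from this page.

```yaml
# Hypothetical werf.yaml fragment: names, paths and commands are assumptions.
image: app
from: ubuntu:18.04
git:
- add: /
  to: /app
  stageDependencies:
    install:
    - package.json        # manifest file drives the install stage
    beforeSetup:
    - src/**/*            # source code drives the beforeSetup stage
    setup:
    - config/**/*         # configuration files drive the setup stage
shell:
  beforeInstall:
  # rarely changing, time-consuming system packages
  - apt-get update
  - apt-get install -y nodejs npm
  install:
  # application dependencies resolved from the manifest file
  - cd /app && npm install
  beforeSetup:
  # compile tasks that depend on the source code (assumes a "build" script exists)
  - cd /app && npm run build
  setup:
  # cheap configuration steps that may run on every commit
  - cp /app/config/app.conf /etc/app.conf
```

The split keeps the slow package installation cached while frequent source or configuration changes only rebuild the later, cheaper stages.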

Custom strategy

Once again, no limitations are imposed on assembly instructions. The previous definitions of user stages are just suggestions arising from our experience with real-world applications. You can even use a single user stage or define your own strategy for grouping assembly instructions and still benefit from caching and git dependencies.

Syntax

There are two top-level builder directives for assembly instructions that are mutually exclusive: shell and ansible. You can build an image either via shell instructions or via their ansible counterparts.

The builder directive includes four directives that define assembly instructions for each user stage:

beforeInstall
install
beforeSetup
setup

Builder directives can also contain cacheVersion directives that, in essence, are user-defined parts of user-stage signatures. The detailed information is available in the CacheVersion section.

Shell

Here is the syntax for user stages containing shell assembly instructions:

shell:
  beforeInstall:
  - <bash_command 1>
  - <bash_command 2>
  ...
  - <bash_command N>
  install:
  - bash command
  ...
  beforeSetup:
  - bash command
  ...
  setup:
  - bash command
  ...
  cacheVersion: <version>
  beforeInstallCacheVersion: <version>
  installCacheVersion: <version>
  beforeSetupCacheVersion: <version>
  setupCacheVersion: <version>

Shell assembly instructions are made up of arrays. Each array consists of bash commands for the related user stage. Commands for each stage are executed as a single RUN instruction in Dockerfile. Thus, werf creates one layer for each user stage.

werf provides a distribution-agnostic bash binary, so you do not need a bash binary in the base image.

beforeInstall:
- apt-get update
- apt-get install -y build-essential g++ libcurl4

werf performs user stage commands as follows:

generate a temporary script on the host machine:

  #!/.werf/stapel/embedded/bin/bash -e

  apt-get update
  apt-get install -y build-essential g++ libcurl4

mount it into the corresponding user stage assembly container as /.werf/shell/script.sh, and
run the script.

The bash binary is stored in a stapel volume. You can find additional information about the concept in the blog post [RU] (dappdeps was later renamed to stapel; nevertheless, the principle is the same).
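Because all commands of a stage are written into one script that runs with bash -e, two practical consequences follow. The sketch below illustrates them; the /app path and the npm commands are assumptions, and the behavior described in the comments is inferred from the generated script shown above rather than from a separate specification.

```yaml
shell:
  install:
  # Both commands end up in the same generated script, so the directory
  # change made by cd is still in effect when npm runs.
  - cd /app
  - npm ci
  # The script runs with bash -e: if npm ci fails, the script stops there
  # and the following command is not executed.
  - echo "dependencies installed"
```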

Ansible

Here is the syntax for user stages containing ansible assembly instructions:

ansible:
  beforeInstall:
  - <ansible task 1>
  - <ansible task 2>
  ...
  - <ansible task N>
  install:
  - ansible task
  ...
  beforeSetup:
  - ansible task
  ...
  setup:
  - ansible task
  ...
  cacheVersion: <version>
  beforeInstallCacheVersion: <version>
  installCacheVersion: <version>
  beforeSetupCacheVersion: <version>
  setupCacheVersion: <version>

Ansible config and stage playbook

Ansible assembly instructions for a user stage are a set of ansible tasks. To run these tasks via the ansible-playbook command, werf mounts the directory structure presented below into the user stage assembly container:

/.werf/ansible-workdir
├── ansible.cfg
├── hosts
└── playbook.yml

ansible.cfg contains settings for ansible:

use local transport (transport = local)
use werf’s stdout_callback method for better logging (stdout_callback = werf)
turn on the force_color mode (force_color = 1)
use sudo for privilege escalation (to avoid using become in ansible tasks)

hosts is an inventory file that contains the localhost as well as some ansible_* settings, e.g., the path to python in stapel.

playbook.yml is a playbook with all tasks from the specific user stage. Here is an example of werf.yaml that includes the install stage:

ansible:
  install:
  - debug: msg='Start install'
  - file: path=/etc mode=0777
  - copy:
      src: /bin/sh
      dest: /bin/sh.orig
  - apk:
      name: curl
      update_cache: yes
  ...

werf would generate the following playbook.yml for the install stage:

- hosts: all
  gather_facts: 'no'
  tasks:
  - debug: msg='Start install'  \
  - file: path=/etc mode=0777   |
  - copy:                        > these lines are copied from werf.yaml
      src: /bin/sh              |
      dest: /bin/sh.orig        |
  - apk:                        |
      name: curl                |
      update_cache: yes         /
  ...

werf plays the playbook of the user stage in the user stage assembly container via the ansible-playbook command:

$ export ANSIBLE_CONFIG="/.werf/ansible-workdir/ansible.cfg"
$ ansible-playbook /.werf/ansible-workdir/playbook.yml

ansible and python binaries/libraries are stored in a stapel volume. You can find more information about this concept in this blog post [RU] (dappdeps was later renamed to stapel; nevertheless, the principle is the same).

Supported modules

One of the ideas at the core of werf is idempotent builds. werf must generate the very same image every time if there are no changes. This is achieved by calculating a signature for each stage. However, some ansible modules are non-idempotent, meaning they produce different results even if the input parameters are the same. For such modules, werf is unable to correctly calculate a signature in order to determine the need to rebuild stages. Because of that, werf currently supports a limited list of modules:

Command modules: command, shell, raw, script.
Crypto modules: openssl_certificate, and others.
Files modules: acl, archive, copy, stat, tempfile, and others.
Net Tools Modules: get_url, slurp, uri.
Packaging/Language modules: composer, gem, npm, pip, and others.
Packaging/OS modules: apt, apk, yum, and others.
System modules: user, group, getent, locale_gen, timezone, cron, and others.
Utilities modules: assert, debug, set_fact, wait_for.

Using a module that is not in this list in the werf config leads to an error and a failed build. Feel free to report an issue if some module should be enabled.
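As an illustration, the sketch below uses only modules from the list above. The package names, the app user and group, and the /app path are assumptions, and the fragment presumes an Alpine-based image because it relies on the apk module.

```yaml
ansible:
  beforeInstall:
  - name: Install system packages          # Packaging/OS module
    apk:
      name:
      - nodejs
      - npm
      update_cache: yes
  - name: Create the application group     # System module
    group:
      name: app
  - name: Create the application user      # System module
    user:
      name: app
      group: app
  install:
  - name: Install application dependencies # Packaging/Language module
    npm:
      path: /app
  - debug: msg='Install finished'          # Utilities module
```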

Copying files

Git mappings are the preferred way of copying files into an image. werf cannot detect changes to files referred to by the copy module. Currently, the only way to copy an external file into an image with ansible is to use the .Files.Get method of Go templates. This method returns the contents of the file as a string. Thus, the contents become a part of the user stage signature, and file changes lead to the rebuild of the user stage.

Here is an example of copying nginx.conf into an image:

ansible:
  install:
  - copy:
      content: |
{{ .Files.Get "/conf/etc/nginx.conf" | indent 8 }}
      dest: /etc/nginx/nginx.conf

werf renders that snippet as a go template and then transforms it into the following playbook.yml:

- hosts: all
  gather_facts: 'no'
  tasks:
  - copy:
      content: |
        http {
          sendfile on;
          tcp_nopush on;
          tcp_nodelay on;
          ...

Jinja templates

Ansible supports Jinja templates in playbooks. However, Go templates and Jinja templates have the same delimiters: {{ and }}. Thus, you have to escape Jinja templates to use them. There are two possible solutions: you can escape {{ delimiters only or escape the whole Jinja expression.

Let’s take a look at the example. Say, you have the following ansible task:

- copy:
    src: "{{item}}"
    dest: /etc/nginx
  with_items:
  - /app/conf/etc/nginx.conf
  - /app/conf/etc/server.conf

In this case, the Jinja expression {{item}} must be escaped:

# escape {{ only
src: "{{"{{"}} item }}"

# escape the whole expression
src: "{{`{{item}}`}}"

Ansible complications

Only the raw and command modules support live stdout output. Other modules display the contents of the stdout and stderr streams after execution.
The apt module hangs the build process in some debian and ubuntu versions. The derived images are affected as well (issue #645).

Dependencies of user stages

werf features the ability to define dependencies for rebuilding the stage. As described in the stages reference, stages are built one by one, and the signature is calculated for each stage. Signatures have various dependencies. When dependencies change, the stage signature changes as well. As a result, werf rebuilds this stage and all the subsequent stages.

You can use these dependencies to shape the rebuilding process of user stages. Signatures of user stages (and, therefore, the rebuilding process) depend on:

changes in assembly instructions
changes in cacheVersion directives
changes in the git repository
changes in files being imported from artifacts

The first three dependencies are described below in more detail.

Dependency on changes in assembly instructions

The signature of the user stage depends on the rendered text of assembly instructions. Changes in assembly instructions for the user stage lead to the rebuilding of this stage. Say, you have the following shell-based assembly instructions:

shell:
  beforeInstall:
  - echo "Commands on the Before Install stage"
  install:
  - echo "Commands on the Install stage"
  beforeSetup:
  - echo "Commands on the Before Setup stage"
  setup:
  - echo "Commands on the Setup stage"

On the first build of this image, instructions for all four user stages will be executed. There is no git mapping in this config, so assembly instructions will never be executed on subsequent builds since signatures of user stages will be the same, and the build cache will remain valid.

Let us change assembly instructions for the install user stage:

shell:
  beforeInstall:
  - echo "Commands on the Before Install stage"
  install:
  - echo "Commands on the Install stage"
  - echo "Installing ..."
  beforeSetup:
  - echo "Commands on the Before Setup stage"
  setup:
  - echo "Commands on the Setup stage"

The signature of the install stage has changed, so werf build executes assembly instructions in the install stage and instructions defined in subsequent stages, i.e., beforeSetup and setup.

The stage signature may also change due to the use of environment variables and Go templates and that can lead to unforeseen rebuilds. For example:

shell:
  beforeInstall:
  - echo "Commands on the Before Install stage for {{ env "CI_COMMIT_SHA" }}"
  install:
  - echo "Commands on the Install stage"
  ...

The first build will calculate the signature of the beforeInstall stage from the following rendered instruction:

echo "Commands on the Before Install stage for 0a8463e2ed7e7f1aa015f55a8e8730752206311b"

The signature of the beforeInstall stage will change with each subsequent commit:

echo "Commands on the Before Install stage for 36e907f8b6a639bd99b4ea812dae7a290e84df27"

In other words, the contents of assembly instructions will change with each subsequent commit because of the CI_COMMIT_SHA variable. Thus, such a configuration leads to the rebuild of the beforeInstall user stage on every commit.
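One way to limit the effect, sketched below, is to keep commit-specific values out of the early stages and use them only in setup, which is expected to run on every commit anyway. The /version.txt path is an assumption chosen for illustration.

```yaml
shell:
  beforeInstall:
  # No commit-specific values here, so this expensive stage stays cached.
  - echo "Commands on the Before Install stage"
  install:
  - echo "Commands on the Install stage"
  setup:
  # The rendered text changes on every commit, so the rebuild starts at setup
  # and the earlier user stages keep their signatures.
  - echo "{{ env "CI_COMMIT_SHA" }}" > /version.txt
```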

Dependency on changes in the git repo

The git mapping reference states that there are gitArchive and gitLatestPatch stages. gitArchive runs after the beforeInstall user stage, and gitLatestPatch runs after the setup user stage if there are changes in the local git repository. Thus, in order to execute assembly instructions using the latest version of the source code, you can initiate the rebuilding of the beforeInstall stage (by changing cacheVersion or its instructions).

install, beforeSetup, and setup user stages also depend on changes in the git repository. In this case, a git patch is applied at the beginning of the user stage, and assembly instructions are executed using the latest version of the source code.

During the process of building an image, the source code is updated only at one of the stages; all subsequent stages are based on this stage and thus use the updated files. The source files contained in the git repository are added during the first build at the gitArchive stage. All subsequent builds update source files during the gitCache or gitLatestPatch stages, or during one of the following user stages: install, beforeSetup, setup.

This stage is shown in the Calculation signature phase diagram (figure: git files actualized on specific stage).

You can specify the dependency of the user stage on changes in the git repository via the git.stageDependencies parameter. It has the following syntax:

git:
- ...
  stageDependencies:
    install:
    - <mask 1>
    ...
    - <mask N>
    beforeSetup:
    - <mask>
    ...
    setup:
    - <mask>

The git.stageDependencies parameter has 3 keys: install, beforeSetup and setup. Each key defines an array of masks for a single user stage. The user stage will be rebuilt if there are changes in the git repository that match one of the masks defined for the user stage.

For each user stage, werf creates a list of matching files and calculates a checksum over the attributes and contents of each file. This checksum is a part of the stage signature. Thus, the signature changes in response to any changes in the repository, such as changing the attributes of a file, changing its contents, adding a new matching file, deleting a matching file, etc.

git.stageDependencies masks work jointly with git.includePaths and git.excludePaths masks. werf considers a file suitable only if it matches the includePaths filter, does not match the excludePaths filter, and matches one of the stageDependencies masks.
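The interplay of the three filters can be sketched as follows. All paths and file names here are hypothetical, and the comments describe the expected outcome under the rules stated above.

```yaml
git:
- add: /
  to: /app
  includePaths:
  - src
  - package.json
  excludePaths:
  - src/docs
  stageDependencies:
    install:
    - package.json      # a change to package.json rebuilds install and later stages
    beforeSetup:
    - src/**/*.js       # a change to src/lib/util.js rebuilds beforeSetup and setup
# Under these assumptions:
#   src/docs/example.js is excluded by excludePaths, so changing it triggers nothing;
#   README.md is not covered by includePaths, so changing it triggers nothing.
```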

stageDependencies masks work similarly to includePaths and excludePaths filters. The mask defines a template for files and paths and may contain the following glob patterns:

* — matches any file. This pattern includes . and excludes /
** — matches directories recursively or files expansively
? — matches any single character. It is equivalent to /.{1}/ in regexp
[set] — matches any character within the set. It behaves exactly like character sets in regexp, including set negation ([^a-z])
\ — escapes the next metacharacter

A mask that starts with * is treated as an alias by the YAML parser. Thus, masks that start with * or ** must be enclosed in quotation marks:

# * at the beginning of mask, so use double quotation marks
- "*.rb"
# single quotation marks also work
- '**/*'
# no star at the beginning, no quotation marks are needed
- src/**/*.js

werf finds out whether files have been changed in the git repository by calculating checksums. It applies the following algorithm for the user stage and for each mask:

create a list of all files at the add path and apply the excludePaths and includePaths filters;
compare the path of each file in the list to the mask using glob patterns;
if a directory matches the mask, then all contents of this directory are considered matching recursively;
calculate the checksum of attributes and contents of all matching files.

These checksums are calculated at the beginning of the build process, before any stage container is run.
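The matching rules above can be illustrated with a few masks. This is a sketch with hypothetical file and directory names; the comments restate the pattern rules from this section rather than verified output.

```yaml
git:
- add: /src
  to: /app
  stageDependencies:
    install:
    - Gemfile*          # no leading star, so no quotation marks are needed
    beforeSetup:
    - lib               # if lib is a directory, all files under it match recursively
    - "*.rb"            # files directly at the add path only, because * excludes /
    setup:
    - config/**/*.yml   # ** descends into nested directories
```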

Example:

image: app
git:
- add: /src
  to: /app
  stageDependencies:
    beforeSetup:
    - "*"
shell:
  install:
  - echo "install stage"
  beforeSetup:
  - echo "beforeSetup stage"
  setup:
  - echo "setup stage"

The git mapping configuration in the above werf.yaml requires werf to transfer the contents of the /src directory of the local git repository to the /app directory of the image. During the first build, files are cached at the gitArchive stage, and assembly instructions for install, beforeSetup, and setup are executed. During the builds triggered by subsequent commits that do not change the contents of the /src directory, werf does not execute assembly instructions. If a commit changes files in the /src directory, the checksums of files matching the mask change. As a result, werf applies the git patch and rebuilds all the existing stages beginning with beforeSetup, namely beforeSetup and setup. The git patch is applied once, during the beforeSetup stage.

Dependency on the CacheVersion value

There are situations when a user wants to rebuild all or just one user stage. This can be accomplished by changing cacheVersion or <user stage name>CacheVersion values.

The signature of the install user stage depends on the value of the installCacheVersion parameter. To rebuild the install user stage (and subsequent stages), you need to change the value of the installCacheVersion parameter.

Note that cacheVersion and beforeInstallCacheVersion directives have the same effect. Changing them triggers the rebuild of the beforeInstall stage and all subsequent stages.
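The following sketch shows how the directives could be combined. The version values are arbitrary strings chosen for illustration, and the echo commands are placeholders.

```yaml
shell:
  # Changing this value rebuilds beforeInstall and all subsequent stages.
  cacheVersion: "1"
  # Changing this value rebuilds install and all subsequent stages.
  installCacheVersion: "2"
  # Changing this value rebuilds setup and all subsequent stages.
  setupCacheVersion: "2019-06-01"
  beforeInstall:
  - echo "Commands on the Before Install stage"
  install:
  - echo "Commands on the Install stage"
  setup:
  - echo "Commands on the Setup stage"
```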

Example. Universal image for multiple applications

An image containing shared system packages can be defined in a separate werf.yaml file. You can use the cacheVersion value for rebuilding this image to refresh packages’ versions.

image: ~
from: ubuntu:latest
shell:
  beforeInstallCacheVersion: 2
  beforeInstall:
  - apt update
  - apt install ...

You can use this image as a base for multiple applications if images from hub.docker.com do not quite suit your needs.

Example of using external dependencies

You can use CacheVersion directives jointly with Go templates to define a dependency of the user stage on files outside of the git tree.

image: ~
from: ubuntu:latest
shell:
  installCacheVersion: {{.Files.Get "some-library-latest.tar.gz" | sha256sum}}
  install:
  - tar zxf some-library-latest.tar.gz
  - <build application>

A build script can download the some-library-latest.tar.gz archive and then execute the werf build command. Any changes to the file trigger the rebuild of the install user stage and all the subsequent stages.