Skip to content

Latest commit

 

History

History
589 lines (508 loc) · 33.5 KB

README.adoc

File metadata and controls

589 lines (508 loc) · 33.5 KB

Roles Good Practices for Ansible

Note
this section has been rewritten, using OASIS metastandards repository as a starting place. If you have anything to add or review, please comment.

Role design considerations

Basic design

Details
Explanations

Design roles focused on the functionality provided, not the software implementation.

Rationale

Try to design roles focused on the functionality, not on the software implementation behind it. This will help abstracting differences between different providers, and help the user to focus on the functionality, not on technical details.

Examples

For an example, designing a role to implement an NTP configuration on a server would be a role. The role internally would have the logic to decide whether to use ntpd, chronyd, and the ntp site configurations. However, when the underlying implementations become too divergent, for example implementing an email server with postfix or sendmail, then separate roles are encouraged.

Design roles to accomplish a specific, guaranteed outcome and limit the scope of the role to that outcome. This will help abstracting differences between different providers (see above), and help the user to focus on the functionality, not on technical details.

  • Design the interface focused on the functionality, not on the software implementation behind it.

    Details
    Explanations

    Limit the consumer’s need to understand specific implementation details about a collection, the role names within, and the file names. Presenting the collection as a low-code/no-code "automation application" provides the developer flexibility as the content grows and matures, and limits change the consumer may have to make in a later version.

    Examples
    • In the first example, the mycollection.run role has been designed to be an entry-point for the execution of multiple roles. mycollection.run provides a standard interface for the user, without exposing the details of the underlying automations. The implementation details of thing_1 and thing_2 may be changed by the developer without impacting the user as long as the interface of mycollection.run does not change.

    Do this:
    - hosts: all
      gather_facts: false
      tasks:
      - name: Perform several actions
        include_role: mycollection.run
        vars:
          actions:
          - name: thing_1
            vars:
              thing_1_var_1: 1
              thing_1_var_2: 2
          - name: thing_2
            vars:
              thing_2_var_1: 1
              thing_2_var_2: 2
    • In this example, the user must maintain awareness of the structure of the thing_1 and thing_2 roles, and the order of operations necessary for these roles to be used together. If the implementation of these roles is changed, the user will need to modify their playbook. .Don’t do this:

    - hosts: all
      gather_facts: false
      tasks:
      - name: Do thing_1
        include_role:
          name: mycollection.thing_1
          tasks_from: thing_1.yaml
          vars:
            thing_1_var_1: 1
            thing_1_var_2: 2
      - name: Do thing_2
        include_role:
          name: mycollection.thing_2
          tasks_from: thing_2.yaml
          vars:
            thing_2_var_1: 1
            thing_2_var_2: 2
  • Place content common to multiple roles in a single reusable role, "common" is a typical name for this role when roles are packaged in a collection. Author loosely coupled, hierarchical content.

    Details
    Explanations

    Roles that have hard dependencies on external roles or variables have limited flexibility and increased risk that changes to the dependency will result in unexpected behavior or failures. Coupling describes the degree of dependency between roles and variables that need to act in coordination. Hierarchical content is an architectural approach to designing your content where individual roles have parent-child relationships in an overall tree structure

Role Structure

Details
Explanations

New roles should be initiated in line, with the skeleton directory, which has standard boilerplate code for a Galaxy-compatible Ansible role and some enforcement around these standards

Rationale

A consistent file tree structure will help drive consistency and reusability across the entire environment.

Role Distribution

Details
Explanations

Use semantic versioning for Git release tags. Use 0.y.z before the role is declared stable (interface-wise).

Rationale

There are some restrictions for Ansible Galaxy and Automation Hub. The versioning must be in strict X.Y.Z[ab][W] format, where X, Y, and Z are integers.

Naming parameters

Details
  • All defaults and all arguments to a role should have a name that begins with the role name to help avoid collision with other names. Avoid names like packages in favor of a name like foo_packages.

    Rationale

    Ansible has no namespaces, doing so reduces the potential for conflicts and makes clear what role a given variable belongs to.)

  • Same argument applies for modules provided in the roles, they also need a $ROLENAME_ prefix: foo_module. While they are usually implementation details and not intended for direct use in playbooks, the unfortunate fact is that importing a role makes them available to the rest of the playbook and therefore creates opportunities for name collisions.

  • Moreover, internal variables (those that are not expected to be set by users) are to be prefixed by two underscores: __foo_variable.

    Rationale

    role variables, registered variables, custom facts are usually intended to be local to the role, but in reality are not local to the role - as such a concept does not exist, and pollute the global namespace. Using the name of the role reduces the potential for name conflicts and using the underscores clearly marks the variables as internals and not part of the common interface. The two underscores convention has prior art in some popular roles like geerlingguy.ansible-role-apache). This includes variables set by set_fact and register, because they persist in the namespace after the role has finished!

  • Prefix all tags within a role with the role name or, alternatively, a "unique enough" but descriptive prefix.

  • Do not use dashes in role names. This will cause issues with collections.

Providers

Details

When there are multiple implementations of the same functionality, we call them “providers”. A role supporting multiple providers should have an input variable called $ROLENAME_provider. If this variable is not defined, the role should detect the currently running provider on the system, and respect it.

Rationale

users can be surprised if the role changes the provider if they are running one already. If there is no provider currently running, the role should select one according to the OS version.

Example

on RHEL 7, chrony should be selected as the provider of time synchronization, unless there is ntpd already running on the system, or user requests it specifically. Chrony should be chosen on RHEL 8 as well, because it is the only provider available.

The role should set a variable or custom fact called $ROLENAME_provider_os_default to the appropriate default value for the given OS version.

Rationale

users may want to set all their managed systems to a consistent state, regardless of the provider that has been used previously. Setting $ROLENAME_provider would achieve it, but is suboptimal, because it requires selecting the appropriate value by the user, and if the user has multiple system versions managed by a single playbook, a common value supported by all of them may not even exist. Moreover, after a major upgrade of their systems, it may force the users to change their playbooks to change their $ROLENAME_provider setting, if the previous value is not supported anymore. Exporting $ROLENAME_provider_os_default allows the users to set $ROLENAME_provider: "{{ $ROLENAME_provider_os_default }}" (thanks to the lazy variable evaluation in Ansible) and thus get a consistent setting for all the systems of the given OS version without having to decide what the actual value is - the decision is delegated to the role).

Distributions and Versions

Details
Explanations

Avoid testing for distribution and version in tasks. Rather add a variable file to "vars/" for each supported distribution and version with the variables that need to change according to the distribution and version.

Rationale

This way it is easy to add support to a new distribution by simply dropping a new file in to "vars/", see below Supporting multiple distributions and versions. See also Vars vs Defaults which mandates "Avoid embedding large lists or 'magic values' directly into the playbook." Since distribution-specific values are kind of "magic values", it applies to them. The same logic applies for providers: a role can load a provider-specific variable file, include a provider-specific task file, or both, as needed. Consider making paths to templates internal variables if you need different templates for different distributions.

Package roles in an Ansible collection to simplify distribution and consumption

Details
Rationale

Packaging roles as a collection allows you to distribute many roles in a single cohesive unit of re-usable automation. Inside a collection, you can share custom plugins across all roles in the collection instead of duplicating them in each role’s library/ directory. Collections give your roles a namespace, which removes the potential for naming collisions when developing new roles.

Example

See the Ansible documentation on migrating roles to collections for details.

Check Mode

Details
  • The role should work in check mode, meaning that first of all, they should not fail check mode, and they should also not report changes when there are no changes to be done. If it is not possible to support it, please state the fact and provide justification in the documentation. This applies to the first run of the role.

  • Reporting changes properly is related to the other requirement: idempotency. Roles should not perform changes when applied a second time to the same system with the same parameters, and it should not report that changes have been done if they have not been done. Due to this, using command: is problematic, as it always reports changes. Therefore, override the result by using changed_when:

  • Concerning check mode, one usual obstacle to supporting it are registered variables. If there is a task which registers a variable and this task does not get executed (e.g. because it is a command: or another task which is not properly idempotent), the variable will not get registered and further accesses to it will fail (or worse, use the previous value, if the role has been applied before in the play, because variables are global and there is no way to unregister them). To fix, either use a properly idempotent module to obtain the information (e.g. instead of using command: cat to read file into a registered variable, use slurp and apply .content|b64decode to the result like here), or apply proper check_mode: and changed_when: attributes to the task. more_info.

  • Another problem are commands that you need to execute to make changes. In check mode, you need to test for changes without actually applying them. If the command has some kind of "--dry-run" flag to enable executing without making actual changes, use it in check_mode (use the variable ansible_check_mode to determine whether we are in check mode). But you then need to set changed_when: according to the command status or output to indicate changes. See (https://github.com/linux-system-roles/selinux/pull/38/files#diff-2444ad0870f91f17ca6c2a5e96b26823L101) for an example.

  • Another problem is using commands that get installed during the install phase, which is skipped in check mode. This will make check mode fail if the role has not been executed before (and the packages are not there), but does the right thing if check mode is executed after normal mode.

  • To view reasoning for supporting why check mode in first execution may not be worthwhile: see here. If this is to be supported, see hhaniel’s proposal, which seems to properly guard even against such cases.

Idempotency

Details
Explanations

Reporting changes properly is related to the other requirement: idempotency. Roles should not perform changes when applied a second time to the same system with the same parameters, and it should not report that changes have been done if they have not been done. Due to this, using command: is problematic, as it always reports changes. Therefore, override the result by using changed_when:

Rationale

Additional automation or other integrations, such as with external ticketing systems, should rely on the idempotence of the ansible role to report changes accurately

Supporting multiple distributions and versions

Details
Use Cases
  • The role developer needs to be able to set role variables to different values depending on the OS platform and version. For example, if the name of a service is different between EL8 and EL9, or a config file location is different.

  • The role developer needs to handle the case where the user specifies gather_facts: false in the playbook.

  • The role developer needs to access the platform specific vars in role integration tests without making a copy.

Note
The recommended solution below requires at least some ansible_facts to be defined, and so relies on gathering some facts. If you just want to ensure the user always uses gather_facts: true, and do not want to handle this in the role, then the role documentation should state that gather_facts: true or setup: is required in order to use the role, and the role should use fail: with a descriptive error message if the necessary facts are not defined.

If it is desirable to use roles that require facts, but fact gathering is expensive, consider using a cache plugin List of Cache Plugins, and also consider running a periodic job on the controller to refresh the cache.

Platform specific variables

Details
Explanations

You normally use vars/main.yml (automatically included) to set variables used by your role. If some variables need to be parameterized according to distribution and version (name of packages, configuration file paths, names of services), use this in the beginning of your tasks/main.yml:

Examples
- name: Ensure ansible_facts used by role
  setup:
    gather_subset: min
  when: not ansible_facts.keys() | list |
    intersect(__rolename_required_facts) == __rolename_required_facts

- name: Set platform/version specific variables
  include_vars: "{{ __rolename_vars_file }}"
  loop:
    - "{{ ansible_facts['os_family'] }}.yml"
    - "{{ ansible_facts['distribution'] }}.yml"
    - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_major_version'] }}.yml"
    - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version'] }}.yml"
  vars:
    __rolename_vars_file: "{{ role_path }}/vars/{{ item }}"
  when: __rolename_vars_file is file
  • Add this as the first task in tasks/main.yml:

    - name: Set platform/version specific variables
      include_tasks: tasks/set_vars.yml
  • Add files to vars/ for the required OS platforms and versions.

The files in the loop are in order from least specific to most specific:

  • os_family covers a group of closely related platforms (e.g. RedHat covers RHEL, CentOS, Fedora)

  • distribution (e.g. Fedora) is more specific than os_family

  • distribution_distribution_major_version (e.g. RedHat_8) is more specific than distribution

  • distribution_distribution_version (e.g. RedHat_8.3) is the most specific

See Commonly Used Facts for an explanation of the facts and their common values.

Each file in the loop list will allow you to add or override variables to specialize the values for platform and/or version. Using the when: item is file test means that you do not have to provide all of the vars/ files, only the ones you need. For example, if every platform except Fedora uses srv_name for the service name, you can define myrole_service: srv_name in vars/main.yml then define myrole_service: srv2_name in vars/Fedora.yml. In cases where this would lead to duplicate vars files for similar distributions (e.g. CentOS 7 and RHEL 7), use symlinks to avoid the duplication.

Note
With this setup, files can be loaded twice. For example, on Fedora, the distribution_major_version is the same as distribution_version so the file vars/Fedora_31.yml will be loaded twice if you are managing a Fedora 31 host. If distribution is RedHat then os_family will also be RedHat, and vars/RedHat.yml will be loaded twice. This is usually not a problem - you will be replacing the variable with the same value, and the performance hit is negligible. If this is a problem, construct the file list as a list variable, and filter the variable passed to loop using the unique filter (which preserves the order):
- name: Set vars file list
  set_fact:
    __rolename_vars_file_list:
      - "{{ ansible_facts['os_family'] }}.yml"
      - "{{ ansible_facts['distribution'] }}.yml"
      - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_major_version'] }}.yml"
      - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version'] }}.yml"

- name: Set platform/version specific variables
  include_vars: "{{ __rolename_vars_file }}"
  loop: "{{ __rolename_vars_file_list | unique | list }}"
  vars:
    __rolename_vars_file: "{{ role_path }}/vars/{{ item }}"
  when: __rolename_vars_file is file

Or define your __rolename_vars_file_list in your vars/main.yml.

The task Ensure ansible_facts used by role handles the case where the user specifies gather_facts: false in the playbook. It gathers only the facts required by the role. The role developer may need to add additional facts to the list, and use a different gather_subset. See Setup Module for more information. Gathering facts can be expensive, so gather only the facts required by the role.

Using a separate task file for tasks/set_vars.yml allows role integration tests to access the internal variables. For example, if the role developer wants to pre-populate a VM with the packages used by the role, the following tasks can be used:

- hosts: all
  tasks:
    - name: Set platform/version specific variables
      include_role:
        name: my.fqcn.rolename
        tasks_from: set_vars.yml
        public: true

    - name: Install test packages
      package:
        name: "{{ __rolename_packages }}"
        state: present

In this way, the role developer does not have to copy and maintain a separate list of role packages.

Platform specific tasks

Details

Platform specific tasks, however, are different. You probably want to perform platform specific tasks once, for the most specific match. In that case, use lookup('first_found') with the file list in order of most specific to least specific, including a "default":

- name: Perform platform/version specific tasks
  include_tasks: "{{ lookup('first_found', __rolename_ff_params) }}"
  vars:
    __rolename_ff_params:
      files:
        - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version'] }}.yml"
        - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_major_version'] }}.yml"
        - "{{ ansible_facts['distribution'] }}.yml"
        - "{{ ansible_facts['os_family'] }}.yml"
        - "default.yml"
      paths:
        - "{{ role_path }}/tasks/setup"

Then you would provide tasks/setup/default.yml to do the generic setup, and e.g. tasks/setup/Fedora.yml to do the Fedora specific setup. The tasks/setup/default.yml is required in order to use lookup('first_found'), which will give an error if no file is found.

If you want to have the "use first file found" semantics, but do not want to have to provide a default file, add skip: true:

- name: Perform platform/version specific tasks
  include_tasks: "{{ lookup('first_found', __rolename_ff_params) }}"
  vars:
    __rolename_ff_params:
      files:
        - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version'] }}.yml"
        - "{{ ansible_facts['os_family'] }}.yml"
      paths:
        - "{{ role_path }}/tasks/setup"
      skip: true

NOTE:

  • Use include_tasks or include_vars with lookup('first_found') instead of with_first_found. loop is not needed - the include forms take a string or a list directly.

  • Always specify the explicit, absolute path to the files to be included, using {{ role_path }}/vars or {{ role_path }}/tasks, when using these idioms. See below "Ansible Best Practices" for more information.

  • Use the ansible_facts['name'] bracket notation rather than the ansible_facts.name or ansible_name form. For example, use ansible_facts['distribution'] instead of ansible_distribution or ansible.distribution. The ansible_name form relies on fact injection, which can break if there is already a fact of that name. Also, the bracket notation is what is used in Ansible documentation such as Commonly Used Facts and Operating System and Distribution Variance.

Supporting multiple providers

Details

Use a task file per provider and include it from the main task file, like this example from storage:

- name: include the appropriate provider tasks
  include_tasks: "main_{{ storage_provider }}.yml"

The same process should be used for variables (not defaults, as defaults can not be loaded according to a variable). You should guarantee that a file exists for each provider supported, or use an explicit, absolute path using role_path. See below "Ansible Best Practices" for more information.

Generating files from templates

Details
  • Add {{ ansible_managed | comment }} at the top of the template file file to indicate that the file is managed by Ansible roles, while making sure that multi-line values are properly commented. For more information, see Adding comments to files.

  • When commenting, don’t include anything like "Last modified: {{ date }}". This would change the file at every application of the role, even if it doesn’t need to be changed for other reasons, and thus break proper change reporting.

  • Use standard module parameters for backups, keep it on unconditionally (backup: true), until there is a user request to have it configurable.

  • Make prominently clear in the HOWTO (at the top) what settings/configuration files are replaced by the role instead of just modified.

  • Use {{ role_path }}/subdir/ as the filename prefix when including files if the name has a variable in it.

    Rationale

    your role may be included by another role, and if you specify a relative path, the file could be found in the including role. For example, if you have something like include_vars: "{{ ansible_facts['distribution'] }}.yml" and you do not provide every possible vars/{{ ansible_facts['distribution'] }}.yml in your role, Ansible will look in the including role for this file. Instead, to ensure that only your role will be referenced, use include_vars: "{{role_path}}/vars/{{ ansible_facts['distribution'] }}.yml". Same with other file based includes such as include_tasks. See Ansible Developer Guide » Ansible architecture » The Ansible Search Path for more information.

Vars vs Defaults

Details
  • Avoid embedding large lists or "magic values" directly into the playbook. Such static lists should be placed into the vars/main.yml file and named appropriately

  • Every argument accepted from outside of the role should be given a default value in defaults/main.yml. This allows a single place for users to look to see what inputs are expected. Document these variables in the role’s README.md file copiously

  • Use the defaults/main.yml file in order to avoid use of the default Jinja2 filter within a playbook. Using the default filter is fine for optional keys on a dictionary, but the variable itself should be defined in defaults/main.yml so that it can have documentation written about it there and so that all arguments can easily be located and identified.

  • Don’t define defaults in defaults/main.yml if there is no meaningful default. It is better to have the role fail if the variable isn’t defined than have it do something dangerously wrong. Still do add the variable to defaults/main.yml but commented out, so that there is one single source of input variables.

  • Avoid giving default values in vars/main.yml as such values are very high in the precedence order and are difficult for users and consumers of a role to override.

  • As an example, if a role requires a large number of packages to install, but could also accept a list of additional packages, then the required packages should be placed in vars/main.yml with a name such as foo_packages, and the extra packages should be passed in a variable named foo_extra_packages, which should default to an empty array in defaults/main.yml and be documented as such.

Documentation conventions

Details
  • Use fully qualified role names in examples, like: linux-system-roles.$ROLENAME (with the Galaxy prefix).

  • Use RFC 5737, 7042 and 3849 addresses in examples.

Create a meaningful README file for every role

Details
Rationale

The documentation is essential for the success of the content. Place the README file in the root directory of the role. The README file exists to introduce the user to the purpose of the role and any important information on how to use it, such as credentials that are required.

At minimum include:

  • Example playbooks that allow users to understand how the developed content works are also part of the documentation.

  • The inbound and outbound role argument specifications

  • List of user-facing capabilities within the role

  • The unit of automation the role provides

  • The outcome of the role

  • The roll-back capabilities of the role

  • Designation of the role as idempotent (True/False)

  • Designation of the role as atomic if applicable (True/False)

Don’t use host group names or at least make them a parameter

Details
Explanations

It is relatively common to use (inventory) group names in roles:

  • either to loop through the hosts in the group, generally in a cluster context

  • or to validate that a host is in a specific group

    Instead, store the host name(s) in a (list) variable, or at least make the group name a parameter of your role. You can always set the variable at group level to avoid repetitions.

Rationale

Groups are a feature of the data in your inventory, meaning that you mingle data with code when you use those groups in your code. Rely on the inventory-parsing process to provide your code with the variables it needs instead of enforcing a specific structure of the inventory. Not all inventory sources are flexible enough to provide exactly the expected group name. Even more importantly, in a cluster context for example, if the group name is fixed, you can’t describe (and hence automate) more than one cluster in each inventory. You can’t possibly have multiple groups with the same name in the same inventory. On the other hand, variables can have any kind of value for each host, so that you can have as many clusters as you want.

Examples

Assuming we have the following inventory (not according to recommended practices for sake of simplicity):

An inventory with two clusters
link:dont_use_groups/inventory[role=include]

We can then use one of the following three approaches in our role (here as playbook, again for sake of simplicity):

A playbook showing how to loop through a group
link:dont_use_groups/playbook.yml[role=include]

The first approach is probably best to create a cluster configuration file listing all cluster’s hosts. The other approaches are good to make sure each action is performed only once, but this comes at the price of many skips. The second one fails if the first host isn’t reachable (which might be what you’d want anyway), and the last one has the best chance to be executed once and only once, even if some hosts aren’t available.

Tip
the variable cluster_group_name could have a default group name value in your role, of course properly documented, for simple use cases.

Overall, it is best to avoid this kind of constructs if the use case permits, as they are clumsy.

Prefix task names in sub-tasks files of roles

Details
Explanation

It is a common practice to have tasks/main.yml file including other tasks files, which we’ll call sub-tasks files. Make sure that the tasks' names in these sub-tasks files are prefixed with a shortcut reminding of the sub-tasks file’s name.

Rationale

Especially in a complex role with multiple (sub-)tasks file, it becomes difficult to understand which task belongs to which file. Adding a prefix, in combination with the role’s name automatically added by Ansible, makes it a lot easier to follow and troubleshoot a role play.

Examples

In a role with one tasks/main.yml task file, including tasks/sub.yml, the tasks in this last file would be named as follows:

A prefixed task in a sub-tasks file
- name: sub | Some task description
  mytask: [...]

The log output will then look something like TASK [myrole : sub | Some task description] **, which makes it very clear where the task is coming from.

Tip
with a verbosity of 2 or more, ansible-playbook will show the full path to the task file, but this generally means that you need to restart the play in a higher verbosity to get the information you could have had readily available.

Argument Validation

Details
Explanation

Starting from ansible version 2.11, an option is available to activate argument validation for roles by utilizing an argument specification. When this specification is established, a task is introduced at the onset of role execution to validate the parameters provided for the role according to the defined specification. If the parameters do not pass the validation, the role execution will terminate.

Rationale

Argument validation significantly contributes to the stability and reliability of the automation. It also makes the playbook using the role fail fast instead of failing later when an incorrect variable is utilized. By ensuring roles receive accurate input data and mitigating common issues, we can enhance the effectiveness of the Ansible playbooks using the roles.

Examples

The specification is defined in the meta/argument_specs.yml. For more details on how to write the specification, refer to https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_reuse_roles.html#specification-format.

Argument Specification file that validates the arguments provided to the role.
argument_specs:
  main:
    short_description: Role description.
    options:
      string_arg1:
        description: string argument description.
        type: "str"
        default: "x"
        choices: ["x", "y"]
      dict_arg1:
        description: dict argument description.
        type: dict
        required: True
        options:
          key1:
            description: key1 description.
            type: int
          key2:
            description: key2 description.
            type: str
          key3:
            description: key3 description.
            type: dict

References

Details

Links that contain additional standardization information that provide context, inspiration or contrast to the standards described above.