openQA starter guide

Introduction

openQA is an automated test tool that makes it possible to test the whole installation process of an operating system. It uses virtual machines to reproduce the process, check the output (both serial console and screen) in every step and send the necessary keystrokes and commands to proceed to the next. openQA can check whether the system can be installed, whether it works properly in 'live' mode, whether applications work or whether the system responds as expected to different installation options and commands.

Even more importantly, openQA can run several combinations of tests for every revision of the operating system, reporting the errors detected for each combination of hardware configuration, installation options and variant of the operating system.

openQA is free software released under the GPLv2 license. The source code and documentation are hosted in the os-autoinst organization on GitHub.

This document describes the general operation and usage of openQA. The main goal is to provide a general overview of the tool, with all the information needed to become a happy user.

For a quick start, if you already have an openQA instance available, you can refer directly to the section Cloning existing jobs - openqa-clone-job to trigger a new test based on an already existing job. For a quick installation, refer directly to openQA quick bootstrap.

For the installation of openQA in general, see the Installation Guide. As a user of an existing instance, see the Users Guide. More advanced topics can be found in other documents. All documents are also available in the official repository.

Architecture

Although the project as a whole is referred to as openQA, there are in fact several components that are hosted in separate repositories as shown in the following figure.

Figure 1. openQA architecture

The heart of the test engine is a standalone application called 'os-autoinst' (blue). In each execution, this application creates a virtual machine and uses it to run a set of test scripts (red). 'os-autoinst' generates a video, screenshots and a JSON file with detailed results.

'openQA' (green) on the other hand provides a web based user interface and infrastructure to run 'os-autoinst' in a distributed way. The web interface also provides a JSON based REST-like API for external scripting and for use by the worker program. Workers fetch data and input files from openQA for os-autoinst to run the tests. A host system can run several workers. The openQA web application takes care of distributing test jobs among workers. Web application and workers don’t have to run on the same machine but can be connected via network instead.

Note that the diagram shown above is simplified. There exists a more sophisticated version which is more complete and detailed. (The diagram can be edited via its underlying GraphML file.)

Basic concepts

Glossary

The following terms are used within the context of openQA:

test modules

an individual test case in a single Perl module file, e.g. "sshxterm". If not further specified, a test module is denoted by its "short name", equivalent to the name of the file that contains the test definition. The "full name" is composed of the test group (TBC), which itself is formed by the top-folder of the test module file, and the short name, e.g. "x11-sshxterm" (for x11/sshxterm.pm)

test suite

a collection of test modules, e.g. "textmode". All test modules within one test suite are run serially

job

one run of individual test cases in a row denoted by a unique number for one instance of openQA, e.g. one installation with subsequent testing of applications within gnome

test run

equivalent to job

test result

the result of one job, e.g. "passed" with the details of each individual test module

test step

the execution of one test module within a job

distri

a test distribution but also sometimes referring to a product (CAUTION: ambiguous, historically a "GNU/Linux distribution"), composed of multiple test modules in a folder structure that compose test suites, e.g. "opensuse" (test distribution, short for "os-autoinst-distri-opensuse")

product

the main "system under test" (SUT), e.g. "openSUSE", also called "Medium Types" in the web interface of openQA

job group

equivalent to product, used in context of the webUI

version

one version of a product, don’t confuse with builds, e.g. "Tumbleweed"

flavor

a specific variant of a product to distinguish differing variants, e.g. "DVD"

arch

an architecture variant of a product, e.g. "x86_64"

machine

additional variant of machine, e.g. used for "64bit", "uefi", etc.

scenario

A composition of <distri>-<version>-<flavor>-<arch>-<test_suite>@<machine>, e.g. "openSUSE-Tumbleweed-DVD-x86_64-gnome@64bit", nicknamed koala

build

Different versions of a product as tested, can be considered a "sub-version" of version, e.g. "Build1234"; CAUTION: ambiguity: either with the prefix "Build" included or not
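
The scenario naming scheme from the glossary can be sketched in a few lines of Python. This is only an illustration: the Scenario class and its name() method are hypothetical helpers and not part of openQA.

```python
from typing import NamedTuple


class Scenario(NamedTuple):
    """Hypothetical container for the fields that make up a scenario."""
    distri: str
    version: str
    flavor: str
    arch: str
    test_suite: str
    machine: str

    def name(self) -> str:
        # Join the fields in the order given in the glossary:
        # <distri>-<version>-<flavor>-<arch>-<test_suite>@<machine>
        return (f"{self.distri}-{self.version}-{self.flavor}-"
                f"{self.arch}-{self.test_suite}@{self.machine}")


scenario = Scenario("openSUSE", "Tumbleweed", "DVD", "x86_64", "gnome", "64bit")
print(scenario.name())
```

With the glossary's example values this reproduces the scenario name quoted above.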

Jobs

One of the most important features of openQA is that it can be used to test several combinations of actions and configurations. For every one of those combinations, the system creates a virtual machine, performs certain steps and returns an overall result. Every one of those executions is called a 'job'. Every job is labeled with a numeric identifier and has several associated 'settings' that will drive its behavior.

A job goes through several states:

  • scheduled Initial state for recently created jobs. Queued for future execution.

  • running In progress.

  • cancelled The job was explicitly cancelled by the user or was replaced by a clone (see below).

  • done Execution finished.

Jobs in state 'done' have typically gone through a whole sequence of steps (called 'testmodules') each one with its own result. But in addition to those partial results, a finished job also provides an overall result from the following list.

  • none For jobs that have not reached the 'done' state.

  • passed No critical check failed during the process. It does not necessarily mean that all testmodules were successful or that no single assertion failed.

  • failed At least one assertion considered to be critical was not satisfied at some point.

  • softfailed At least one known, non-critical issue has been found. This can mean that workaround needles are in place, that a soft failure has been recorded explicitly via record_soft_failure (from os-autoinst), or that a job failure has been ignored explicitly via a job label.

  • timeout_exceeded The job was aborted because MAX_JOB_TIME was exceeded, which defaults to two hours.

  • skipped Dependencies failed so the job was not started.

  • obsoleted The job was superseded by scheduling a new product.

  • parallel_failed/parallel_restarted The job could not continue because a job which is supposed to run in parallel failed or was restarted.

  • user_cancelled/user_restarted The job was cancelled/restarted by the user.

  • incomplete The test execution failed due to an unexpected error, e.g. the network connection to the worker was lost.
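
As a rough illustration of how the states and results listed above relate, the following Python sketch checks a state/result pair. The names STATES, RESULTS and check_job are hypothetical, and the single consistency rule is only one reading of the text (a job in state 'done' carries an overall result other than 'none'), not openQA's actual implementation.

```python
STATES = ("scheduled", "running", "cancelled", "done")

RESULTS = (
    "none", "passed", "failed", "softfailed", "timeout_exceeded",
    "skipped", "obsoleted", "parallel_failed", "parallel_restarted",
    "user_cancelled", "user_restarted", "incomplete",
)


def check_job(state: str, result: str) -> None:
    """Raise ValueError for values the lists above do not mention."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state!r}")
    if result not in RESULTS:
        raise ValueError(f"unknown result: {result!r}")
    # Assumption: 'none' is reserved for jobs that have not reached 'done'.
    if state == "done" and result == "none":
        raise ValueError("a job in state 'done' should carry an overall result")


check_job("done", "softfailed")    # accepted
check_job("scheduled", "none")     # accepted
```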

Sometimes, the reason for a failure is not an error in the tested operating system itself, but an outdated test or a problem in the execution of the job for some external reason. In those situations, it makes sense to re-run a given job from the beginning once the problem is fixed or the tests have been updated. This is done by means of 'cloning'. Every job can be superseded by a clone which is scheduled to run with exactly the same settings as the original job. If the original job is not yet in 'done' state, it is cancelled immediately. From that point in time, the clone becomes the current version and the original job is considered outdated (and can be filtered in the listing) but its information and results (if any) are kept for future reference.
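
The cloning behaviour described above can be modelled with a small sketch. The Job class, its clone_id field and the clone_job function are assumptions made for illustration; openQA's real data model is richer.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    """Hypothetical, minimal job record."""
    id: int
    settings: Dict[str, str]
    state: str = "scheduled"
    clone_id: Optional[int] = None  # set once the job has been superseded


def clone_job(original: Job, new_id: int) -> Job:
    """Create a clone with the same settings and mark the original outdated."""
    clone = Job(id=new_id, settings=dict(original.settings))
    if original.state != "done":
        # An unfinished original is cancelled immediately.
        original.state = "cancelled"
    # The original and its results are kept, it merely points at its clone.
    original.clone_id = new_id
    return clone
```

A job that is still running ends up 'cancelled' after cloning, while a job that is already 'done' keeps its state and results.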

Needles

One of the main mechanisms for openQA to know the state of the virtual machine is checking the presence of some elements in the machine’s 'screen'. This is performed using fuzzy image matching between the screen and the so called 'needles'. A needle specifies both the elements to search for and a list of tags used to decide which needles should be used at any moment.

A needle consists of a full screenshot in PNG format and a JSON file with the same name (e.g. foo.png and foo.json) containing the associated data, like which areas inside the full screenshot are relevant or the mentioned list of tags.

{
   "area" : [
      {
         "xpos" : INTEGER,
         "ypos" : INTEGER,
         "width" : INTEGER,
         "height" : INTEGER,
         "type" : ( "match" | "ocr" | "exclude" ),
         "match" : INTEGER, // 0-100. similarity percentage
      },
      ...
   ],
   "tags" : [
      STRING, ...
   ]
}

Areas

There are three kinds of areas:

  • Regular areas define relevant parts of the screenshot. Those must match with at least the specified similarity percentage. Regular areas are displayed as green boxes in the needle editor and as green or red frames in the needle view (green for matching areas, red for non-matching ones).

  • OCR areas also define relevant parts of the screenshot. However, an OCR algorithm is used for matching. In the needle editor, OCR areas are displayed as orange boxes. To turn a regular area into an OCR area within the needle editor, double click the area in question twice. Note that such needles are only rarely used.

  • Exclude areas can be used to ignore parts of the reference picture. In the needle editor, exclude areas are displayed as red boxes. To turn a regular area into an exclude area within the needle editor, double click the area in question. In the needle view, exclude areas are displayed as gray boxes.
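
To make the needle format more concrete, here is a minimal Python sketch that checks a needle's JSON text against the structure shown above. The function check_needle is hypothetical. Treating "match" as optional and requiring at least one tag are assumptions, not rules taken from openQA.

```python
import json

AREA_TYPES = {"match", "ocr", "exclude"}
REQUIRED_KEYS = ("xpos", "ypos", "width", "height", "type")


def check_needle(text: str) -> list:
    """Return a list of problems found in a needle's JSON text."""
    data = json.loads(text)
    problems = []
    for index, area in enumerate(data.get("area", [])):
        for key in REQUIRED_KEYS:
            if key not in area:
                problems.append(f"area {index}: missing {key!r}")
        area_type = area.get("type")
        if area_type is not None and area_type not in AREA_TYPES:
            problems.append(f"area {index}: unknown type {area_type!r}")
        # "match" is the similarity percentage, expected between 0 and 100.
        match = area.get("match")
        if match is not None and not 0 <= match <= 100:
            problems.append(f"area {index}: match {match} outside 0-100")
    if not data.get("tags"):
        problems.append("needle has no tags")  # assumption: tags are needed
    return problems
```

A call such as check_needle(open("foo.json").read()) would return an empty list for a needle that follows the structure above.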

Access management

Some actions in openQA require special privileges. openQA provides authentication through OpenID. By default, openQA is configured to use the openSUSE OpenID provider, but it can easily be configured to use any other valid provider. Every time a new user logs into an instance, a new user profile is created. That profile only contains the OpenID identity and two flags used for access control:

  • operator Means that the user is able to manage jobs, performing actions like creating new jobs, cancelling them, etc.

  • admin Means that the user is able to manage users (granting or revoking operator and admin rights) as well as job templates and other related information (see the corresponding section).

Many of the operations in an openQA instance are not performed through the web interface but using the REST-like API. The most obvious examples are the workers and the scripts that fetch new versions of the operating system and schedule the corresponding tests. Those clients must be authorized by an operator using an API key with an associated shared secret.

For that purpose, users with the operator flag have access in the web interface to a page that allows them to manage as many API keys as they need. For every key, a secret is automatically generated. The user can then configure the workers or any other client application to use any pair of API key and secret they own. Any client of the REST-like API using one of those API keys is considered to be acting on behalf of the associated user. So the API key not only has to be correct and valid (not expired), it also has to belong to a user with operator rights.

For more insights about authentication, authorization and the technical details of the openQA security model, refer to the detailed blog post about the subject by the openQA development team.

Job groups

A job can belong to a job group. Those job groups are displayed on the index page when there are recent test results in these job groups and in the Job Groups menu on the navigation bar. From there the job group overview pages can be accessed. Besides the test results the job group overview pages provide a description about the job group and allow commenting.

Job groups have properties. These properties are mostly cleanup related. The configuration can be done in the operators menu for job groups.

It is also possible to put job groups into categories. The nested groups will then inherit properties from the category. The categories are meant to combine job groups with common builds so test results for the same build can be shown together on the index page.

Cleanup

Important
openQA automatically deletes data that it considers "old" based on different settings. For example job data is deleted from old jobs by the gru task.

The following cleanup settings can be configured at job group level:

size limit

Limits size of assets

keep logs for

Specifies how long logs of a non-important job are retained after it finished

keep important logs for

How long logs of an important job are retained after it finished

keep results for

Specifies how long results of a non-important job are retained after it finished

keep important results for

How long results of an important job are retained after it finished

The defaults for those values are defined in lib/OpenQA/Schema/JobGroupDefaults.pm.

NOTE Deletion of job results includes deletion of logs and will cause the job to be completely removed from the database.

NOTE Jobs which do not belong to a job group are currently not affected by the mentioned cleanup properties.
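
The distinction between important and non-important jobs in the retention settings above can be sketched as follows. This is only an illustration: the function retention_expired is hypothetical, and expressing the retention periods in days is an assumption, since the text above does not state the unit. Special values (such as a setting meaning "keep forever") are not handled.

```python
from datetime import datetime, timedelta


def retention_expired(finished: datetime, now: datetime, important: bool,
                      keep_days: int, keep_important_days: int) -> bool:
    """True once the retention period that applies to the job has passed."""
    # Important jobs use the "keep important ... for" value,
    # all other jobs use the regular "keep ... for" value.
    days = keep_important_days if important else keep_days
    return now - finished > timedelta(days=days)


now = datetime(2020, 1, 31)
finished = datetime(2020, 1, 1)
print(retention_expired(finished, now, important=False,
                        keep_days=10, keep_important_days=60))
print(retention_expired(finished, now, important=True,
                        keep_days=10, keep_important_days=60))
```

The same helper could be applied separately to the log settings and to the result settings.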

Using the client script

Just as the worker uses an API key and secret, every user of the client script must do the same. The API key and secret created previously can be reused, or a new pair can be created in the web UI.

The personal configuration should be stored in a file ~/.config/openqa/client.conf in the same format as previously described for the client.conf, i.e. sections for each machine, e.g. localhost.
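
As an illustration of the client.conf layout (one section per host, each with a key and a secret), the following Python sketch reads the credentials for one host. The function read_credentials is a hypothetical helper and not part of the openQA client. The key and secret values are the placeholder values used elsewhere in this documentation.

```python
import configparser


def read_credentials(conf_text: str, host: str) -> tuple:
    """Return (key, secret) from the section named after the host."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser[host]  # raises KeyError if the host has no section
    return section["key"], section["secret"]


example = """
[localhost]
key = 1234567890ABCDEF
secret = 1234567890ABCDEF
"""
print(read_credentials(example, "localhost"))
```

For a real file, the text could be read from ~/.config/openqa/client.conf, for example via pathlib.Path.home().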

Testing openSUSE or Fedora

An easy way to start using openQA is to start testing openSUSE or Fedora, as they have everything set up and prepared to ease the initial deployment. If you want to dig deeper, you can configure the whole of openQA manually from scratch, but this document should help you get started faster.

Getting tests

First you need to get the actual tests. You can get the openSUSE tests and needles (the expected results) from GitHub. They belong in the /var/lib/openqa/tests/opensuse directory. To make it easier, you can just run:

/usr/share/openqa/script/fetchneedles

This downloads the tests to the correct location and sets the correct permissions as well.

Fedora’s tests are also in git. To use them, you may do:

cd /var/lib/openqa/share/tests
mkdir fedora
cd fedora
# clone into the current (empty) directory so that ./templates is found below
git clone https://pagure.io/fedora-qa/os-autoinst-distri-fedora.git .
./templates --clean
cd ..
chown -R geekotest fedora/

Getting openQA configuration

To get everything configured to actually run the tests, there are plenty of options to set in the admin interface. If you plan to test openSUSE Factory, using tests mentioned in the previous section, the easiest way to get started is the following command:

/var/lib/openqa/share/tests/opensuse/products/opensuse/templates [--apikey API_KEY] [--apisecret API_SECRET]

This will load some default settings that were used at some point in time in the openSUSE production openQA. Therefore, they should work reasonably well with the openSUSE tests and needles. This script uses /usr/share/openqa/script/load_templates; consider reading its help page (--help) for documentation on possible extra arguments.

For Fedora, similarly, you can call:

/var/lib/openqa/share/tests/fedora/templates [--apikey API_KEY] [--apisecret API_SECRET]

Some Fedora tests require special hard disk images to be present in /var/lib/openqa/share/factory/hdd/fixed. The createhdds.py script in the createhdds repository can be used to create these. See the documentation in that repo for more information.

Adding a new ISO to test

To start testing a new ISO put it in /var/lib/openqa/share/factory/iso and call the following commands:

# Run the first test
openqa-client isos post \
         ISO=openSUSE-Factory-NET-x86_64-Build0053-Media.iso \
         DISTRI=opensuse \
         VERSION=Factory \
         FLAVOR=NET \
         ARCH=x86_64 \
         BUILD=0053

If your openQA is not running on port 80 on 'localhost', you can add the option --host=http://otherhost:9526 to specify a different port or host.

Warning
Use only the ISO filename in the 'client' command. You must place the file in /var/lib/openqa/share/factory/iso. You cannot place the file elsewhere and specify its path in the command. However, openQA also supports a remote-download feature of assets from trusted domains.
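
When tests are scheduled from a script, the command line can be assembled from a dictionary of settings. The Python sketch below is only an illustration: isos_post_command is a hypothetical helper, and placing --host before the isos post arguments is an assumption about option order. The check on the ISO value mirrors the warning above.

```python
import shlex


def isos_post_command(settings: dict, host: str = "") -> list:
    """Build an 'openqa-client isos post' argument list from settings."""
    if "/" in settings.get("ISO", ""):
        # Only the bare filename is accepted, not a path.
        raise ValueError("ISO must be a bare filename, not a path")
    command = ["openqa-client"]
    if host:
        command.append(f"--host={host}")
    command += ["isos", "post"]
    command += [f"{key}={value}" for key, value in settings.items()]
    return command


settings = {
    "ISO": "openSUSE-Factory-NET-x86_64-Build0053-Media.iso",
    "DISTRI": "opensuse",
    "VERSION": "Factory",
    "FLAVOR": "NET",
    "ARCH": "x86_64",
    "BUILD": "0053",
}
# shlex.join (Python 3.8+) quotes the arguments for display in a shell.
print(shlex.join(isos_post_command(settings, host="http://otherhost:9526")))
```

The returned list can be passed to subprocess.run on a machine where openqa-client is installed.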

For Fedora, a sample run might be:

# Run the first test
openqa-client isos post \
         ISO=Fedora-Everything-boot-x86_64-Rawhide-20160308.n.0.iso \
         DISTRI=fedora \
         VERSION=Rawhide \
         FLAVOR=Everything-boot-iso \
         ARCH=x86_64 \
         BUILD=Rawhide-20160308.n.0

More details on triggering tests can also be found in the Users Guide.

Pitfalls

Take a look at Documented Pitfalls.

openQA installation guide

Introduction

openQA is an automated test tool that makes it possible to test the whole installation process of an operating system. It is free software released under the GPLv2 license. The source code and documentation are hosted in the os-autoinst organization on GitHub.

This document provides the information needed to install and set up the tool, as well as information useful for everyday administration of the system. It is assumed that the reader is already familiar with the concepts of openQA and has already read the Getting Started Guide, also available at the official repository.

Continue with "openQA quick bootstrap" to get a simple, ready-to-use installation, useful for a single-user setup. Otherwise, continue with the more advanced section "Custom installation - Repositories and procedure".

openQA quick bootstrap

To quickly get a working openQA installation, you can use openQA-bootstrap.

Directly on your machine

This should work on openSUSE Leap and openSUSE Tumbleweed and will set up openQA on your machine.

zypper in openQA-bootstrap
/usr/share/openqa/script/openqa-bootstrap

If you happen to be using an old Leap 15.0 system which does not already have the openQA-bootstrap RPM in the repo you can simply download the openqa-bootstrap script - it will do the rest for you:

# get root
curl -s https://raw.githubusercontent.com/os-autoinst/openQA/master/script/openqa-bootstrap | bash -x

For a quick start, openQA-bootstrap can immediately clone an existing job if you supply openqa-clone-job parameters directly:

/usr/share/openqa/script/openqa-bootstrap --from openqa.opensuse.org 12345 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/kontact

The above command will bootstrap an openQA installation and immediately afterwards start a local test job clone from a test job from a remote instance with optional, overridden parameters. More information about openqa-clone-job can be found in Cloning existing jobs - openqa-clone-job.

openQA in a container

NOTE This method is not available on openSUSE Leap older than version 15.1.

You can also set up a systemd-nspawn container with openQA using the following commands. No application may be listening on port 80 yet, because the container shares the host system’s network stack.

zypper in openQA-bootstrap
/usr/share/openqa/script/openqa-bootstrap-container

systemd-run -tM openqa1 /bin/bash # start a shell in the container

Custom installation - Repositories and procedure

Keep in mind that there can be disruptive changes between openQA versions. You need to be sure that the webui and the worker that you are using have the same version number or, at least, are compatible.

For example, the packages distributed with older versions of openSUSE Leap are not compatible with the version on Tumbleweed. And the package distributed with Tumbleweed may not be compatible with the version in the development package.

Official repositories

The easiest way to install openQA is from distribution packages.

  • For openSUSE, packages are available for Leap and Tumbleweed

  • For Fedora, packages are available in the official repositories for Fedora 23 and later.

Development version repository

You can find the development version of openQA in OBS in the devel:openQA repository.

To add the development repository to your system, you can use these commands.

# openSUSE Tumbleweed
zypper ar -p 95 -f 'http://download.opensuse.org/repositories/devel:openQA/openSUSE_Tumbleweed' devel_openQA

# openSUSE Leap
zypper ar -p 95 -f 'http://download.opensuse.org/repositories/devel:openQA/openSUSE_Leap_$releasever' devel_openQA
zypper ar -p 90 -f 'http://download.opensuse.org/repositories/devel:openQA:Leap:$releasever/openSUSE_Leap_$releasever' devel_openQA_Leap

Note
If you installed openQA from the official repository first, you may need to change the vendor of the dependencies.

# openSUSE Tumbleweed and Leap
zypper dup --from devel_openQA --allow-vendor-change

# openSUSE Leap
zypper dup --from devel_openQA_Leap --allow-vendor-change

Installation

You can install the packages using these commands.

# openSUSE Leap 42.3+
zypper in openQA


# Fedora 23+
dnf install openqa openqa-httpd

Basic configuration

For a local instance setup you can simply execute the script:

/usr/share/openqa/script/configure-web-proxy

This will automatically set up a local Apache HTTP proxy. Read on for more detailed setup instructions.

Apache proxy

It is required to run openQA behind an HTTP proxy (Apache, nginx, etc.). See the openqa.conf.template config file in /etc/apache2/vhosts.d (openSUSE) or /etc/httpd/conf.d (Fedora). To make everything work correctly on openSUSE, you need to enable the 'headers', 'proxy', 'proxy_http', 'proxy_wstunnel' and 'rewrite' modules using the command 'a2enmod'. This is not necessary on Fedora.

# openSUSE Only
# You can check what modules are enabled by using 'a2enmod -l'
a2enmod headers
a2enmod proxy
a2enmod proxy_http
a2enmod proxy_wstunnel
a2enmod rewrite

For a basic setup, you can copy openqa.conf.template to openqa.conf and modify the ServerName setting if required. This will direct all HTTP traffic to openQA.

cp /etc/apache2/vhosts.d/openqa.conf.template /etc/apache2/vhosts.d/openqa.conf

TLS/SSL

By default openQA expects to be run with HTTPS. The openqa-ssl.conf.template Apache config file is available as a base for creating the Apache config; you can copy it to openqa-ssl.conf and uncomment any lines you like, then ensure a key and certificate are installed to the appropriate location (depending on distribution and whether you uncommented the lines for key and cert location in the config file). On openSUSE, you should also add SSL to the APACHE_SERVER_FLAGS so it looks like this in /etc/sysconfig/apache2:

APACHE_SERVER_FLAGS="SSL"

If you don’t have a TLS/SSL certificate for your host you must turn HTTPS off. You can do that in /etc/openqa/openqa.ini:

[openid]
httpsonly = 0

Database

Since version 4.5.1512500474.437cc1c7 of openQA, PostgreSQL is used as the database.

To configure access to the database in openQA, edit /etc/openqa/database.ini and change the settings in the [production] section.

The format of the dsn value depends on the database type and is documented for PostgreSQL at DBD::Pg.

Example for connecting to local PostgreSQL database

[production]
dsn = dbi:Pg:dbname=openqa

Example for connecting to remote PostgreSQL database

[production]
dsn = dbi:Pg:dbname=openqa;host=db.example.org
user = openqa
password = somepassword
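
To show how the parts of such a dsn value fit together, the following Python sketch splits a DSN into its driver and attributes. It covers only the simple "dbi:Driver:key=value;key=value" form used in the examples above, and parse_dsn is a hypothetical helper. openQA itself passes the value on to Perl's DBI.

```python
def parse_dsn(dsn: str) -> dict:
    """Split a simple 'dbi:Driver:key=value;...' DSN into a dictionary."""
    scheme, driver, attributes = dsn.split(":", 2)
    if scheme != "dbi":
        raise ValueError(f"not a DBI data source name: {dsn!r}")
    result = {"driver": driver}
    # Attributes are separated by ';' and written as key=value pairs.
    for pair in filter(None, attributes.split(";")):
        key, _, value = pair.partition("=")
        result[key] = value
    return result


print(parse_dsn("dbi:Pg:dbname=openqa;host=db.example.org"))
```

For the remote example this yields the driver "Pg", the database name "openqa" and the host "db.example.org".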

For older versions of openQA, you can migrate from SQLite to PostgreSQL as described in DB migration from SQLite to PostgreSQL.

User authentication

openQA supports three different authentication methods: OpenID (default), iChain and Fake. See the auth section in /etc/openqa/openqa.ini.

[auth]
# method name is case sensitive!
method = OpenID|iChain|Fake

Regardless of the method used, the first user that logs in (if there is no admin yet) will automatically get administrator rights!
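
Because the method name is case sensitive, a deployment script might validate the setting before restarting the web UI. The following Python sketch is an illustration only: auth_method is a hypothetical helper, and falling back to OpenID when the setting is absent follows the "default" remark above rather than openQA's source code.

```python
import configparser

AUTH_METHODS = ("OpenID", "iChain", "Fake")


def auth_method(ini_text: str) -> str:
    """Return the configured authentication method, checking its spelling."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    method = parser.get("auth", "method", fallback="OpenID")
    if method not in AUTH_METHODS:  # exact, case-sensitive comparison
        raise ValueError(f"unsupported auth method: {method!r}")
    return method


print(auth_method("[auth]\nmethod = Fake\n"))
```

A value such as "fake" is rejected by this check, since only the exact spellings are accepted.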

OpenID

By default, openQA uses OpenID with opensuse.org as the OpenID provider. The OpenID method has its own openid section in /etc/openqa/openqa.ini:

[openid]
## base url for openid provider
provider = https://www.opensuse.org/openid/user/
## enforce redirect back to https
httpsonly = 1

openQA supports OpenID only up to version 2.0. The newer OpenID Connect and OAuth are currently not supported.

iChain

Use this method only if you use an iChain (NetIQ Access Manager) proxy on your hosting server.

Fake

For development purposes only! Fake authentication bypasses any authentication and automatically allows any login request as 'Demo user' with administrator privileges and without a password. To ease worker testing, an API key and secret are created (or updated) with a validity of one day during login. You can then use the following as /etc/openqa/client.conf:

[localhost]
key = 1234567890ABCDEF
secret = 1234567890ABCDEF

If you switch authentication method from Fake to any other, review your API keys! You may be vulnerable for up to a day until Fake API key expires.

Run the web UI

To start openQA and enable it to run on each boot, call:

systemctl enable --now postgresql
systemctl enable --now openqa-webui
systemctl enable --now openqa-scheduler
# openSUSE
systemctl restart apache2
# Fedora
# for now this is necessary to allow Apache to connect to openQA
setsebool -P httpd_can_network_connect 1
systemctl restart httpd

The openQA web UI should now be available at http://localhost/. To start openQA without enabling it permanently, use systemctl start instead.

Run workers

Workers are processes running virtual machines to perform the actual testing. They are distributed as a separate package and can be installed on multiple machines but still using only one WebUI.

# openSUSE
zypper in openQA-worker
# Fedora
dnf install openqa-worker

To allow workers to access your instance, you need to log into openQA as operator and create a pair of API key and secret. Once you are logged in, open the user menu in the top right corner and follow the link 'manage API keys'. Click the 'create' button to generate a key and secret. There is also a script for creating an admin user and an API key and secret pair non-interactively, /usr/share/openqa/script/create_admin, which can be useful for scripted deployments of openQA. Copy and paste the key and secret into /etc/openqa/client.conf on the machine(s) where the worker is installed. Make sure to put them in a section reflecting your web server URL. In the simplest case, your client.conf may look like this:

[localhost]
key = 1234567890ABCDEF
secret = 1234567890ABCDEF

To start the workers you can use the provided systemd files via systemctl start openqa-worker@1. This will start worker number one. You can start as many workers as you dare; you just need to supply a different 'worker id' (the number after @).

You can also run workers manually from command line.

install -d -m 0755 -o _openqa-worker /var/lib/openqa/pool/X
sudo -u _openqa-worker /usr/share/openqa/script/worker --instance X

This will run a worker manually showing you debug output. If you haven’t installed 'os-autoinst' from packages make sure to pass --isotovideo option to point to the checkout dir where isotovideo is, not to /usr/lib! Otherwise it will have trouble finding its perl modules.

Where to now?

From this point on, you can refer to the Getting Started guide to fetch the test cases and possibly take a look at the Test Developer Guide.

Advanced configuration

Setting up git support

Editing needles from the web UI can optionally commit new or changed needles automatically to git. To do so, you need to enable git support by setting

[global]
scm = git

in /etc/openqa/openqa.ini. Once you do so and restart the web interface, openQA will automatically commit new needles to the git repository.

You may want to add some description to automatic commits coming from the web UI. You can do so by setting your configuration in the repository (/var/lib/os-autoinst/needles/.git/config) to some reasonable defaults such as:

[user]
	email = whatever@example.com
	name = openQA web UI

To enable automatic pushing of the repo as well, you need to add the following to your openqa.ini:

[scm git]
do_push = yes

Depending on your setup, you might need to generate and propagate ssh keys for user 'geekotest' to be able to push.

It might also be useful to rebase first. To enable that, add the remote to get the latest updates from and the branch to rebase against to your openqa.ini:

[scm git]
update_remote = origin
update_branch = origin/master

Referer settings to auto-mark important jobs

Automatic cleanup of old results (see GRU jobs) can sometimes render important tests useless, for example when a bug report links to an openQA job which no longer exists. A job can be manually marked as important to prevent quick cleanup. Alternatively, a referer can be set so that when a job is accessed from a particular web page (for example Bugzilla), the job is automatically labeled as linked and treated as important.

The list of recognized referers is a space-separated list configured in /etc/openqa/openqa.ini:

[global]
recognized_referers = bugzilla.suse.com bugzilla.opensuse.org
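
The idea behind the setting can be sketched in Python as follows. How openQA actually compares the Referer header with the configured list is not described here, so the exact host comparison below is an assumption, and is_recognized_referer is a hypothetical helper.

```python
from urllib.parse import urlsplit


def is_recognized_referer(referer: str, recognized_referers: str) -> bool:
    """Check whether the host of a Referer URL is in the configured list."""
    host = urlsplit(referer).hostname
    # The setting is a space-separated list of host names.
    return host is not None and host in recognized_referers.split()


configured = "bugzilla.suse.com bugzilla.opensuse.org"
print(is_recognized_referer("https://bugzilla.suse.com/show_bug.cgi?id=1", configured))
print(is_recognized_referer("https://example.com/", configured))
```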

Worker settings

Default behavior for all workers is to use the 'Qemu' backend and connect to 'http://localhost'. If you want to change some of those options, you can do so in /etc/openqa/workers.ini. For example to point the workers to the FQDN of your host (needed if test cases need to access files of the host) use the following setting:

[global]
HOST = http://openqa.example.com

Once you have workers running, they should show up in the admin section of openQA in the workers section as 'idle'. At that point, you have your own instance of openQA up and running, and all that is left is to set up some tests.

Configuring remote workers

There are some additional requirements to get a remote worker running. The first is to ensure shared storage between the openQA web UI and the workers. The directory /var/lib/openqa/share contains all required data and should be shared with read-write access across all nodes present in the openQA cluster. This step is intentionally left to the system administrator, who can choose the shared storage that fits their specific needs.

Example of an NFS configuration: the NFS server is the host where the openQA web UI is running. Content of /etc/exports:

/var/lib/openqa/share *(fsid=0,rw,no_root_squash,sync,no_subtree_check)

The NFS clients are the hosts where the openQA workers are running. Run the following command:

mount -t nfs openQA-webUI-host:/var/lib/openqa/share /var/lib/openqa/share

Configuring AMQP message emission

You can configure openQA to send events (new comments, tests finished, …) to an AMQP message bus. The messages consist of a topic and a body. The body contains JSON-encoded information about the event. See amqp_infra.md for more information about the server and the message topic format. There you will also find instructions on how to configure the AMQP server.

To let openQA send messages to an AMQP message bus, first make sure that the perl-Mojo-RabbitMQ-Client RPM is installed. Then you will need to configure amqp in /etc/openqa/openqa.ini:

# Configuration for AMQP plugin
[amqp]
heartbeat_timeout = 60
reconnect_timeout = 5
# guest/guest is the default anonymous user/pass for RabbitMQ
url = amqp://guest:guest@localhost:5672/
exchange = pubsub
topic_prefix = suse

For a TLS connection use amqps:// and port 5671.
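
The change from a plain to a TLS connection URL can be illustrated with a short Python sketch. The function to_tls_url is a hypothetical helper that always sets port 5671. It is meant for simple host names such as the one in the example above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_tls_url(url: str) -> str:
    """Rewrite an amqp:// URL to amqps:// on port 5671."""
    parts = urlsplit(url)
    if parts.scheme != "amqp":
        raise ValueError(f"expected an amqp:// URL, got {url!r}")
    # Keep the user:password part (if any) and swap only the port.
    userinfo, at, _hostport = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}{parts.hostname}:5671"
    return urlunsplit(("amqps", netloc, parts.path, parts.query, parts.fragment))


print(to_tls_url("amqp://guest:guest@localhost:5672/"))
```

The resulting value would then be used for the url setting in the [amqp] section.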

Configuring worker to use more than one openQA server

When there are multiple openQA web interfaces (openQA instances) available a worker can be configured to register and accept jobs from all of them.

Requirements:

  • /etc/openqa/client.conf must contain API keys and secrets to all instances

  • Shared storage from all instances must be properly mounted

In the /etc/openqa/workers.ini enter space-separated instance hosts and optionally configure where the shared storage is mounted. Example:

[global]
HOST = openqa.opensuse.org openqa.fedoraproject.org

[openqa.opensuse.org]
SHARE_DIRECTORY = /var/lib/openqa/opensuse

[openqa.fedoraproject.org]
SHARE_DIRECTORY = /var/lib/openqa/fedora
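Because the host names in HOST and the per-host section names have to match, a small consistency check can help. The following Python sketch is only an illustration: the function check_hosts is hypothetical, and it only knows about the layout shown above ([global] plus one optional section per host).

```python
import configparser

# Sample configuration following the example above.
SAMPLE = """
[global]
HOST = openqa.opensuse.org openqa.fedoraproject.org

[openqa.opensuse.org]
SHARE_DIRECTORY = /var/lib/openqa/opensuse

[openqa.fedoraproject.org]
SHARE_DIRECTORY = /var/lib/openqa/fedora
"""


def check_hosts(ini_text: str):
    """Return (share_dirs, unused_sections) for a workers.ini-style text."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    hosts = config["global"]["HOST"].split()  # space-separated instance hosts
    # SHARE_DIRECTORY is optional, so a missing section or key maps to None.
    share_dirs = {
        host: config.get(host, "SHARE_DIRECTORY", fallback=None) for host in hosts
    }
    # Sections that match no entry in HOST are probably typos.
    unused = [s for s in config.sections() if s != "global" and s not in hosts]
    return share_dirs, unused


share_dirs, unused = check_hosts(SAMPLE)
print(share_dirs)
print("sections without a matching HOST entry:", unused)
```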

Configuring SHARE_DIRECTORY is not a hard requirement. The worker tries the following directories, in this order, before registering with an openQA instance:

  1. SHARE_DIRECTORY

  2. /var/lib/openqa/$instance_host

  3. /var/lib/openqa/share

  4. /var/lib/openqa

  5. fail if none of above is available
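The lookup order above can be sketched in Python as follows. This is not the worker's actual implementation. The function name find_share_directory and the injectable is_dir check are assumptions made for illustration.

```python
import os
from typing import Callable, Optional


def find_share_directory(
    instance_host: str,
    share_directory: Optional[str] = None,
    is_dir: Callable[[str], bool] = os.path.isdir,
) -> str:
    """Return the first existing candidate directory, in the documented order."""
    candidates = []
    if share_directory:  # 1. explicitly configured SHARE_DIRECTORY
        candidates.append(share_directory)
    candidates += [
        f"/var/lib/openqa/{instance_host}",  # 2. per-instance directory
        "/var/lib/openqa/share",             # 3. default share directory
        "/var/lib/openqa",                   # 4. project directory
    ]
    for path in candidates:
        if is_dir(path):
            return path
    # 5. fail if none of the above is available
    raise FileNotFoundError(f"no usable share directory for {instance_host}")
```

Passing a custom is_dir function makes it possible to try the logic without touching the real file system, for example `find_share_directory("openqa.opensuse.org", is_dir=lambda p: p == "/var/lib/openqa/share")`.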

Once the worker has registered with an openQA instance, it checks for available jobs and starts accepting websocket commands. The worker accepts jobs as they come in; there is currently no support for priorities or any other ordering. It is possible to mix a local openQA instance with remote instances or to use only remote instances.

Asset Caching

If your network is slow or needles take a long time to load, consider enabling caching on your remote workers. To enable caching, /var/lib/openqa/cache must exist and the '_openqa-worker' user must have permission to read everything under this path. If you install openQA through the repositories, this directory is created for you.

Start and enable the Cache Service:

systemctl enable --now openqa-worker-cacheservice

Enable and start the Cache Worker:

systemctl enable --now openqa-worker-cacheservice-minion

Then configure the cache in /etc/openqa/workers.ini:

[global]
HOST=http://webui
CACHEDIRECTORY = $cache_location
CACHELIMIT = 50 # GB, default is 50.
CACHEWORKERS = 5 # Number of parallel cache minion workers, defaults to 5

[http://webui]
TESTPOOLSERVER = rsync://yourlocation/tests

Set up and run an rsync server daemon on the HOST machine. /etc/rsyncd.conf should contain:

gid = users
read only = true
use chroot = true
transfer logging = true
log format = %h %o %f %l %b
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
slp refresh = 300
use slp = false

#[Example]
#	path = /home/Example
#	comment = An Example
#	auth users = user
#	secrets file = /etc/rsyncd.secrets

[tests]
path = /var/lib/openqa/share/tests
comment = OpenQA Test Distributions

Then enable and start the rsync daemon:

systemctl enable --now rsyncd

This will allow the workers to download the assets from the web UI and use them locally. If TESTPOOLSERVER is set, tests and needles will also be cached by the worker.

Auditing - tracking openQA changes

The auditing plugin enables openQA administrators to keep an overview of what is happening on the system. The plugin records which event was triggered, by whom and when, and what the request looked like. Actions performed by openQA workers are tracked under the user whose API keys the workers are using.

The audit log is directly accessible from the Admin menu.

Auditing is enabled by default. It can be disabled with a global configuration option in /etc/openqa/openqa.ini:

[global]
audit_enabled = 0

The audit section of /etc/openqa/openqa.ini allows excluding some events from logging using a space-separated blacklist:

[audit]
blacklist = job_grab job_done

The audit/storage_duration section of /etc/openqa/openqa.ini allows setting the retention period, in days, for different audit event types:

[audit/storage_duration]
startup = 10
jobgroup = 365
jobtemplate = 365
table = 365
iso = 60
user = 60
asset = 30
needle = 30
other = 15

In this example events of the type startup would be cleaned up after 10 days, events related to job groups after 365 days and so on. Events which do not fall into one of these categories would be cleaned after 15 days. By default, cleanup is disabled.
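To make the retention example more concrete, here is a minimal Python sketch. It assumes that an event belongs to a category when its name starts with the category name followed by an underscore (or equals it, as for startup). That mapping is an assumption for illustration, not a statement about openQA's internal implementation. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Retention periods in days, copied from the example above.
STORAGE_DURATION = {
    "startup": 10,
    "jobgroup": 365,
    "jobtemplate": 365,
    "table": 365,
    "iso": 60,
    "user": 60,
    "asset": 30,
    "needle": 30,
    "other": 15,
}


def retention_days(event: str) -> int:
    """Map an event name to its retention period (assumed prefix matching)."""
    category = event.split("_", 1)[0]
    return STORAGE_DURATION.get(category, STORAGE_DURATION["other"])


def is_expired(event: str, created: datetime, now: datetime) -> bool:
    """True if the event is older than the retention period of its category."""
    return now - created > timedelta(days=retention_days(event))


now = datetime(2020, 1, 31)
print(retention_days("jobgroup_create"))                   # job group event
print(retention_days("job_done"))                          # falls back to "other"
print(is_expired("startup", datetime(2020, 1, 1), now))    # 30 days old
```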

Use systemctl enable --now openqa-enqueue-audit-event-cleanup.timer to schedule the cleanup automatically every day. It is also possible to trigger the cleanup manually by invoking /usr/share/openqa/script/openqa minion job -e limit_audit_events.

List of events tracked by the auditing plugin

  • Assets:

    • asset_register asset_delete

  • Workers:

    • worker_register command_enqueue

  • Jobs:

    • iso_create iso_delete iso_cancel

    • jobtemplate_create jobtemplate_delete

    • job_create job_grab job_delete job_update_result job_done jobs_restart job_restart job_cancel job_duplicate

    • jobgroup_create jobgroup_connect

  • Tables:

    • table_create table_update table_delete

  • Users:

    • user_new_comment user_update_comment user_delete_comment user_login

  • Needles:

    • needle_delete needle_modify

Some of these events are very common and may clutter the audit database. For this reason, the job_grab and job_done events are blacklisted by default.

Note
Upgrading openQA does not automatically update /etc/openqa/openqa.ini. Review your configuration after upgrade.

Filesystem layout

Tests, needles, assets, results and working directories (a.k.a. "pool directories") are located in certain subdirectories within /var/lib/openqa. This directory is configurable (see Customize base directory). Here we assume the default is in place.

Note that the sub directories within /var/lib/openqa must be accessible by the user that runs the openQA web UI (by default 'geekotest') or by the user that runs the worker/isotovideo (by default '_openqa-worker').

These are the most important sub directories within /var/lib/openqa:

  • db contains the web UI’s database lockfile

  • images is where the web UI stores test screenshots and thumbnails

  • testresults is where the web UI stores test logs and test-generated assets

  • webui is where the web UI stores miscellaneous files

  • pool contains working directories of the workers/isotovideo

  • share contains directories shared between the web UI and (remote) workers, can be owned by root

  • share/factory contains test assets and temp directory, can be owned by root but sysadmin must create subdirs

  • share/factory/iso and share/factory/iso/fixed contain ISOs for tests

  • share/factory/hdd and share/factory/hdd/fixed contain hard disk images for tests

  • share/factory/repo and share/factory/repo/fixed contain repositories for tests

  • share/factory/other and share/factory/other/fixed contain miscellaneous test assets (e.g. kernels and initrds)

  • share/factory/tmp is used as a temporary directory (openQA will create it if it owns share/factory)

  • share/tests contains the tests themselves

Each of the asset directories (factory/iso, factory/hdd, factory/repo and factory/other) may contain a fixed/ subdirectory, and assets of the same type may be placed in that directory. Placing an asset in the fixed/ subdirectory indicates that it should not be deleted to save space: the GRU task which removes old assets when the size of all assets for a given job group is above a specified size will ignore assets in the fixed/ subdirectories.
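The effect of the fixed/ subdirectory on cleanup can be sketched as follows. This is a simplified model, not the actual GRU task. It assumes that the oldest assets are removed first and that fixed assets do not count towards the size limit. Both are assumptions, and the Asset type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Asset:
    path: str       # relative to share/factory, e.g. "iso/fixed/a.iso"
    size: int       # size in arbitrary units
    age_days: int   # age of the asset


def is_fixed(asset: Asset) -> bool:
    """An asset is fixed if it lives in the fixed/ subdirectory of its type."""
    parts = asset.path.split("/")
    return len(parts) >= 3 and parts[1] == "fixed"


def assets_to_remove(assets: List[Asset], size_limit: int) -> List[Asset]:
    """Pick the oldest non-fixed assets until the total size fits the limit."""
    removable = sorted(
        (a for a in assets if not is_fixed(a)),
        key=lambda a: a.age_days,
        reverse=True,  # oldest first (assumption)
    )
    total = sum(a.size for a in removable)
    removed = []
    for asset in removable:
        if total <= size_limit:
            break
        removed.append(asset)
        total -= asset.size
    return removed


assets = [
    Asset("iso/old.iso", 4, 30),
    Asset("iso/fixed/keep.iso", 4, 90),
    Asset("iso/new.iso", 4, 1),
]
print([a.path for a in assets_to_remove(assets, size_limit=5)])
```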

/var/lib/openqa also contains several symlinks, which are necessary because various things have moved around over the course of openQA’s development. All the symlinks can be owned by root:

  • script (symlink to /usr/share/openqa/script/)

  • tests (symlink to share/tests)

  • factory (symlink to share/factory)

It is always best to use the canonical locations, not the compatibility symlinks - so run scripts from /usr/share/openqa/script, not /var/lib/openqa/script.

You only need the asset directories for the asset types you will actually use, e.g. if none of your tests refer to openQA-stored repositories, you will need no factory/repo directory. The distribution packages may not create all asset directories, so make sure the ones you need are created if necessary. Packages will likewise usually not contain any tests; you must create your own tests, or use existing tests for some distribution or other piece of software.

The worker needs to own /var/lib/openqa/pool/$INSTANCE, e.g.

  • /var/lib/openqa/pool/1

  • /var/lib/openqa/pool/2

  • …​ - add more if you have more worker instances

You can also give the whole pool directory to the _openqa-worker user and let the workers create their own instance directories.

Terms and variables for certain directories used by openQA and isotovideo

  • the "base directory"

    • by default /var/lib

    • configurable via environment variable OPENQA_BASEDIR

    • referred to as $basedir within openQA

  • the "project directory"

    • defined as $basedir/openqa, by default /var/lib/openqa

    • referred to as $prjdir within openQA

  • the "share directory": contains directories shared between web UI and (remote) workers

    • defined as $prjdir/share, by default /var/lib/openqa/share

    • referred to as $sharedir within openQA

  • the "test case directory": contains a test distribution

    • by default $sharedir/tests/$distri or $sharedir/tests/$distri-$version

    • configurable via the test variable CASEDIR (see backend variables documentation)

    • this default is provided by openQA; when starting isotovideo manually the CASEDIR variable must be initialized by hand

    • might contain the sub directory lib for placing Perl modules used by the tests

  • the "product directory": contains the test schedule (main.pm) for a certain product within a test distribution

    • by default identical to the "test case directory"

    • usually a directory products/$distri within the "test case directory"

    • configurable via the test variable PRODUCTDIR (see backend variables documentation)

  • the "needles directory": contains reference images for a certain product within a test distribution

    • by default $PRODUCTDIR/needles

    • configurable via the test variable NEEDLES_DIR (see backend variables documentation)
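The default locations listed above can be summarized in a short Python sketch. It only covers the defaults and ignores the CASEDIR, PRODUCTDIR and NEEDLES_DIR overrides. The rule for choosing between $distri and $distri-$version (prefer the versioned directory if it exists) is an assumption, and the function name is hypothetical.

```python
import os
from typing import Callable, Dict, Mapping, Optional


def openqa_directories(
    distri: str,
    version: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    exists: Callable[[str], bool] = os.path.isdir,
) -> Dict[str, str]:
    """Compute the default directories described above."""
    env = os.environ if env is None else env
    basedir = env.get("OPENQA_BASEDIR", "/var/lib")
    prjdir = f"{basedir}/openqa"
    sharedir = f"{prjdir}/share"
    casedir = f"{sharedir}/tests/{distri}"
    # Assumption: the versioned directory wins if it is present.
    if version and exists(f"{casedir}-{version}"):
        casedir = f"{casedir}-{version}"
    productdir = casedir  # by default identical to the test case directory
    return {
        "basedir": basedir,
        "prjdir": prjdir,
        "sharedir": sharedir,
        "casedir": casedir,
        "productdir": productdir,
        "needles_dir": f"{productdir}/needles",
    }


for name, path in openqa_directories("opensuse", env={}, exists=lambda p: False).items():
    print(f"{name}: {path}")
```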

Further notes

  • Setting the test variables only influences os-autoinst. The web UI, on the other hand, always relies on the directory structure described above. For the exact details of how these paths are computed by the web UI, have a look at lib/OpenQA/Utils.pm.

  • When the worker cache is enabled, parts of the usual "share directory" are located in the specified cache directory on the worker host.

Troubleshooting

Tests fail quickly

Check the log files in /var/lib/openqa/testresults

KVM doesn’t work

  • make sure you have a machine with kvm support

  • make sure kvm_intel or kvm_amd modules are loaded

  • make sure you do have virtualization enabled in BIOS

  • make sure the '_openqa-worker' user can access /dev/kvm

  • make sure you are not already running other hypervisors such as VirtualBox

  • when running inside a vm make sure nested virtualization is enabled (pass nested=1 to your kvm module)

openid login times out

www.opensuse.org’s openid provider may have trouble with IPv6. openQA shows a message like this:

no_identity_server: Could not determine ID provider from URL.

To avoid that switch off IPv6 or add a special route that prevents the system from trying to use IPv6 with www.opensuse.org:

ip -6 r a to unreachable 2620:113:8044:66:130:57:66:6/128

openQA users guide

Introduction

This document provides additional information on using the web interface and the REST API, as well as administration information. Administrators are recommended to read the Installation Guide first to understand the structure of the components and the configuration of an installed instance.

Using job templates to automate jobs creation

The problem

When testing an operating system, especially when doing continuous testing, there is always a certain combination of jobs, each one with its own settings, that needs to be run for every revision. Those combinations can be different for different 'flavors' of the same revision, like running a different set of jobs for each architecture or for the Full and the Lite versions. This combinatorial problem can go one step further if openQA is being used for different kinds of tests, like running some simple pre-integration tests for some snapshots combined with more comprehensive post-integration tests for release candidates.

This section describes how an instance of openQA can be configured using the options in the admin area to automatically create all the required jobs for each revision of your operating system that needs to be tested. If you are starting from scratch, you should probably go through the following order:

  1. Define machines in 'Machines' menu

  2. Define medium types (products) you have in 'Medium types' menu

  3. Specify various collections of tests you want to run in the 'Test suites' menu

  4. Define job groups in 'Job groups' menu for groups of tests

  5. Select individual 'Job groups' and decide what combinations make sense and need to be tested

Machines, mediums, test suites and job templates can all set various configuration variables. The so called job templates within the job groups define how the test suites, mediums and machines should be combined in various ways to produce individual 'jobs'. All the variables from the test suite, medium, machine and job template are combined and made available to the actual test code run by the 'job', along with variables specified as part of the job creation request. Certain variables also influence openQA’s and/or os-autoinst’s own behavior in terms of how it configures the environment for the job. Variables that influence os-autoinst’s behavior are documented in the file doc/backend_vars.asciidoc in the os-autoinst repository.

In openQA we can parameterize a test to describe for what product it will run and for what kind of machines it will be executed. For example, a test suite kde can be run for any product that has the KDE software stack installed, like openSUSE-DVD-x86_64 and openSUSE-NET-i586, and can be tested in different x86-64 and i586 machines like 64bit, 64bit_USBBoot, 32bit. In this example we could have the following test scenarios considering that the “x86_64” flavor is not compatible with the 32bit machine:

  • openSUSE-DVD-x86_64-kde-64bit

  • openSUSE-DVD-x86_64-kde-64bit_USBBoot

  • openSUSE-NET-i586-kde-64bit

  • openSUSE-NET-i586-kde-64bit_USBBoot

  • openSUSE-NET-i586-kde-32bit

For every test scenario we need to configure a different instance of the test backend, for example os-autoinst, with a different set of parameters.
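The scenario list above is the product of media and machines, filtered by compatibility. A minimal Python sketch, assuming a simple table of which architectures each machine can run (this table is made up for the example and is not how openQA stores machines):

```python
from itertools import product

# (distri, flavor, arch) of the media from the example above.
MEDIA = [("openSUSE", "DVD", "x86_64"), ("openSUSE", "NET", "i586")]

# Hypothetical compatibility table: architectures each machine can run.
MACHINES = {
    "64bit": {"x86_64", "i586"},
    "64bit_USBBoot": {"x86_64", "i586"},
    "32bit": {"i586"},
}


def scenarios(test_suite: str):
    """Combine every medium with every compatible machine."""
    names = []
    for (distri, flavor, arch), (machine, archs) in product(MEDIA, MACHINES.items()):
        if arch in archs:
            names.append(f"{distri}-{flavor}-{arch}-{test_suite}-{machine}")
    return names


for name in scenarios("kde"):
    print(name)
```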

Machines

You need to have at least one machine set up to be able to run any tests. Those machines represent virtual machine types that you want to test. To make tests actually happen, you have to have an 'openQA worker' connected that can fulfill those specifications.

  • Name. User defined string - only needed for operator to identify the machine configuration.

  • Backend. What backend should be used for this machine. Recommended value is qemu as it is the most tested one, but other options (such as kvm2usb or vbox) are also possible.

  • Variables. Most machine variables influence os-autoinst’s behavior in terms of how the test machine is set up. A few important examples:

    • QEMUCPU can be 'qemu32' or 'qemu64' and specifies the architecture of the virtual CPU.

    • QEMUCPUS is an integer that specifies the number of cores you wish for.

    • LAPTOP if set to 1, QEMU will create a laptop profile.

    • USBBOOT when set to 1, the image will be loaded through an emulated USB stick.

Medium Types (products)

A medium type (product) in openQA is a simple description without any concrete meaning. It basically consists of a name and a set of variables that define or characterize this product in os-autoinst.

Some example variables used by openSUSE are:

  • ISO_MAXSIZE contains the maximum size of the product. There is a test that checks that the current size of the product is less than or equal to this variable.

  • DVD if it is set to 1, this indicates that the medium is a DVD.

  • LIVECD if it is set to 1, this indicates that the medium is a live image (can be a CD or USB)

  • GNOME this variable, if it is set to 1, indicates that it is a GNOME only distribution.

  • PROMO marks the promotional product.

  • RESCUECD is set to 1 for rescue CD images.

Test Suites

A test suite consists of a name and a set of test variables that are used inside this particular test together with an optional description. The test variables can be used to parameterize the actual test code and influence the behaviour according to the settings.

Some sample variables used by openSUSE are:

  • BTRFS if set, the file system will be BtrFS.

  • DESKTOP possible values are 'kde' 'gnome' 'lxde' 'xfce' or 'textmode'. Used to indicate the desktop selected by the user during the test.

  • DOCRUN used for documentation tests.

  • DUALBOOT dual boot testing, needs HDD_1 and HDDVERSION.

  • ENCRYPT encrypt the home directory via YaST.

  • HDDVERSION used together with HDD_1 to set the operating system previously installed on the hard disk.

  • INSTALLONLY only basic installation.

  • INSTLANG installation language. Actually used only in documentation tests.

  • LIVETEST the test is on a live medium, do not install the distribution.

  • LVM select LVM volume manager.

  • NICEVIDEO used for rendering a result video for use in show rooms, skipping ugly and boring tests.

  • NOAUTOLOGIN unmark autologin in YaST

  • NUMDISKS total number of disks in QEMU.

  • REBOOTAFTERINSTALL if set to 1, will reboot after the installation.

  • SCREENSHOTINTERVAL used with NICEVIDEO to improve the video quality.

  • SPLITUSR a YaST configuration option.

  • TOGGLEHOME a YaST configuration option.

  • UPGRADE upgrade testing, need HDD_1 and HDDVERSION.

  • VIDEOMODE if the value is 'text', the installation will be done in text mode.

Some of the variables usually set in test suites that influence openQA and/or os-autoinst’s own behavior are:

  • HDDMODEL variable to set the HDD hardware model

  • HDDSIZEGB hard disk size in GB. Used together with BtrFS variable

  • HDD_1 path for the pre-created hard disk

  • RAIDLEVEL RAID configuration variable

  • QEMUVGA parameter to declare the video hardware configuration in QEMU

Job Groups

The job groups are the place where the actual test scenarios are defined by the selection of the medium type, the test suite and machine together with a priority.

The scheduler uses the priority to choose the next job. If multiple jobs are scheduled and their requirements for running are fulfilled, the ones with a lower priority value are triggered first. The id is the second sorting key: of two jobs with fulfilled requirements and the same priority, the one with the lower id is triggered first.
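The ordering rule can be expressed as a sort key of (priority, id). The following Python sketch uses plain dictionaries as a stand-in for jobs. The field names are assumptions for the example.

```python
def next_job(runnable_jobs):
    """Pick the job with the lowest priority value, then the lowest id."""
    return min(runnable_jobs, key=lambda job: (job["priority"], job["id"]))


jobs = [
    {"id": 12, "priority": 50},
    {"id": 10, "priority": 60},
    {"id": 11, "priority": 50},
]
print(next_job(jobs))
```

Here job 11 is chosen: it shares the lowest priority value with job 12 but has the lower id.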

Job groups themselves can be created over the web UI as well as the REST API. Job groups can optionally be nested into categories. The display order of job groups and categories can be configured by drag-and-drop in the web UI.

The scenario definitions within the job groups can be created and configured by different means:

  • A simple web UI wizard which is automatically shown for job groups when a new medium is added to the job group.

  • An intuitive table within the web UI for adding additional test scenarios to existing media including the possibility to configure the priority values.

  • The scripts openqa-load-templates and openqa-dump-templates to quickly dump and load the configuration from custom plain-text dump format files using the REST API.

  • Using declarative schedule definitions in the YAML format using REST API routes or an online-editor within the web UI including a syntax checker.

Variable expansion

Any variable defined in Test Suite, Machine, Product or Job Template table can refer to another variable using this syntax: %NAME%. When the test job is created, the string will be substituted with the value of the specified variable at that time.

For example this variable defined for Test Suite:

PUBLISH_HDD_1 = %DISTRI%-%VERSION%-%ARCH%-%DESKTOP%.qcow2

may be expanded to this job variable:

PUBLISH_HDD_1 = opensuse-13.1-i586-kde.qcow2
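The substitution can be sketched in a few lines of Python. This is an illustration only: it performs a single pass and leaves references to unknown variables untouched, which are assumptions rather than documented openQA behavior.

```python
import re


def expand(value: str, settings: dict) -> str:
    """Replace every %NAME% with the value of NAME from settings."""
    return re.sub(
        r"%(\w+)%",
        lambda match: str(settings.get(match.group(1), match.group(0))),
        value,
    )


settings = {"DISTRI": "opensuse", "VERSION": "13.1", "ARCH": "i586", "DESKTOP": "kde"}
print(expand("%DISTRI%-%VERSION%-%ARCH%-%DESKTOP%.qcow2", settings))
```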

Variable precedence

It’s possible to define the same variable in multiple places that would all be used for a single job - for instance, you may have a variable defined in both a test suite and a product that appear in the same job template. The precedence order for variables is as follows (from lowest to highest):

  • Product

  • Machine

  • Test suite

  • Job template

  • API POST query parameters

That is, variable values set as part of the API request that triggers the jobs will 'win' over values set at any of the other locations.

If you need to override this precedence - for example, you want the value set in one particular test suite to take precedence over a setting of the same value from the API request - you can add a leading + to the variable name. For instance, if you set +VARIABLE = foo in a test suite, and passed VARIABLE=bar in the API request, the test suite setting would 'win' and the value would be foo.

If the same variable is set with a + prefix in multiple places, the same precedence order described above will apply to those settings.

Note that the WORKER_CLASS variable is not overridden in the way described above. Instead multiple occurrences are combined.
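The precedence rules, including the + prefix, can be modeled as two passes over the layers. The Python sketch below is a simplified model, not openQA's implementation. In particular, joining multiple WORKER_CLASS values with a comma is an assumption, since the text above only says that they are combined.

```python
def merge_settings(*layers: dict) -> dict:
    """Merge layers given from lowest to highest precedence.

    Expected order: product, machine, test suite, job template, API parameters.
    """
    plain, forced, worker_classes = {}, {}, []
    for layer in layers:
        for key, value in layer.items():
            is_forced = key.startswith("+")
            name = key[1:] if is_forced else key
            if name == "WORKER_CLASS":
                worker_classes.append(value)  # combined, never overridden
            elif is_forced:
                forced[name] = value          # later layers win among + keys
            else:
                plain[name] = value           # later layers win among plain keys
    merged = {**plain, **forced}              # any + key beats every plain key
    if worker_classes:
        merged["WORKER_CLASS"] = ",".join(worker_classes)  # assumed separator
    return merged


product_vars = {"DESKTOP": "gnome"}
machine_vars = {"WORKER_CLASS": "qemu_x86_64"}
suite_vars = {"+VARIABLE": "foo", "DESKTOP": "kde"}
template_vars = {}
api_vars = {"VARIABLE": "bar"}
print(merge_settings(product_vars, machine_vars, suite_vars, template_vars, api_vars))
```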

Use of the web interface

In general the web UI should be intuitive or self-explanatory. Look out for the little blue help icons and click them for detailed help on specific sections.

Some pages use queries to select what should be shown. The query parameters are generated on clickable links, for example starting from the index page or the group overview page clicking on single builds. On the query pages there can be UI elements to control the parameters, for example to look for more older builds or only show failed jobs or other settings. Additionally, the query parameters can be tweaked by hand if you want to provide a link to specific views.

Description of test suites

Test suites can be described using API commands or the admin table for any operator using the web UI.

test suite description edit field
Figure 2. Entering a test suite description in the admin table using the web interface:

If a description is defined, the name of the test suite on the tests overview page shows up as a link. Clicking the link will show the description in a popup. The same syntax as for comments can be used, that is Markdown with custom extensions such as shortened links to ticket systems.

test suite description popup
Figure 3. popover in test overview with content as configured in the test suites database:

/tests/overview - Customizable test overview page

The overview page is configurable by the filter box. Also, some additional query parameters can be provided which can be considered advanced or experimental. For example specifying no build will resolve the latest build which matches the other parameters specified. Specifying no group will show all jobs from all matching job groups. Also specifying multiple groups works, see the following example.

test overview page showing multiple groups
Figure 4. The openQA test overview page showing multiple groups at once. The URL query parameters specify the groupid parameter two times to resolve both the "opensuse" and "opensuse test" group.

Specifying multiple groups with no build will yield the latest build of the first group. This can be useful to have a static URL for bookmarking.

Review badges

Based on comments in the individual job results for each build a certificate icon is shown on the group overview page as well as the index page to indicate that every failure has been reviewed, e.g. a bug reference or a test issue reason is stated:

Review badges

Meaning of the different colors

  • The green icon shows up when there is no work to be done.

  • No icon is shown if at least one failure still needs to be reviewed.

  • The black icon is shown if all review work has been done.

(To simplify, checking for false-negatives is not considered here.)

Show bug or label icon on overview if labeled gh#550

  • Show bug icon with URL if mentioned in test comments

  • Show bug or label icon on overview if labeled

For bug references, write <bugtracker_shortname>#<bug_nr> in a comment, e.g. "bsc#1234". For generic labels, use label:<keyword> where <keyword> can consist of any valid characters up to the next whitespace, e.g. "false_positive". The keywords are not defined within openQA itself. A valid list of keywords should be agreed upon within each project or environment of one openQA instance.

Example of a generic label
Figure 5. Example for a generic label
Example of a bug label
Figure 6. Example for bug label

Related issue: #10212

Hint: You can also write (or copy-paste) full links to bugs and issues. The links are automatically changed to the shortlinks (e.g. https://progress.opensuse.org/issues/11110 turns into poo#11110). Related issue: poo#11110

GitHub pull requests and issues can also be linked using the generic format <marker>[#<project/repo>]#<id>, e.g. gh#os-autoinst/openQA#1234 (see gh#973).
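To show how the comment formats described in this section fit together, here is a Python sketch that extracts bug references and labels from a comment. The regular expressions are an approximation written for this example and are not taken from openQA. The function names are hypothetical.

```python
import re

# <marker>[#<project/repo>]#<id>, e.g. bsc#1234 or gh#os-autoinst/openQA#1234
BUGREF = re.compile(r"(?<![\w#/])([A-Za-z_]+)#(?:([\w.-]+/[\w.-]+)#)?(\d+)\b")

# label:<keyword>, where the keyword runs up to the next whitespace
LABEL = re.compile(r"\blabel:(\S+)")


def find_bugrefs(comment: str):
    """Return (marker, repository or None, id) for each reference found."""
    return [(m.group(1), m.group(2), m.group(3)) for m in BUGREF.finditer(comment)]


def find_labels(comment: str):
    """Return all generic label keywords found in the comment."""
    return LABEL.findall(comment)


comment = "Fails as in bsc#1234, see also gh#os-autoinst/openQA#1234 label:false_positive"
print(find_bugrefs(comment))
print(find_labels(comment))
```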

All issue references are stored within the internal database of openQA. The status can be updated using the /bugs API route for example using external tools.

Example for visualization of closed issue references
Figure 7. Example for visualization of closed issue references. Upside down icons in red visualize closed issues.

Distinguish product and test issues bugref gh#708

“progress.opensuse.org” is used to track test issues, bugzilla for product issues, at least for SUSE/openSUSE. openQA bugrefs distinguish this and show corresponding icons

Different icons for product and test issues

Build tagging

Tag builds with special comments on group overview

Individual builds can be tagged using comments on the group overview page. As builds by themselves do not own any data, the job group is used to store this information. A tag has a build ID to link it to a build. It also has a type and an optional description. The type can later be used to distinguish between kinds of tags.

The generic format for tags is

tag:<build_id>:<type>[:<description>], e.g. tag:1234:important:Beta1.

The more recent tag always wins.
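The tag format and the "more recent tag wins" rule can be sketched in Python as follows. The regular expression is an approximation for this example: it assumes that build ID and type contain neither colons nor whitespace and that the description runs up to the next whitespace.

```python
import re

# tag:<build_id>:<type>[:<description>]
TAG = re.compile(r"\btag:([^:\s]+):([^:\s]+)(?::(\S+))?")


def effective_tags(comments):
    """Map build -> (type, description); comments are ordered oldest first."""
    tags = {}
    for text in comments:
        for match in TAG.finditer(text):
            build, tag_type, description = match.groups()
            tags[build] = (tag_type, description)  # a later tag replaces an earlier one
    return tags


comments = [
    "tag:1234:important:Alpha3",
    "unrelated comment",
    "tag:1234:important:Beta1",
    "tag:1235:important",
]
print(effective_tags(comments))
```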

A 'tag' icon is shown next to tagged builds together with the description on the group_overview page. The index page does not show tags by default to prevent a potential performance regression. Tags can be enabled on the index page using the corresponding option in the filter form at the bottom of the page.

Example of a tag comment and corresponding tagged build

Keeping important builds

As builds can now be tagged we come up with the convention that the 'important' type - the only one for now - is used to tag every job that corresponds to a build as 'important' and keep the logs for these jobs longer so that we can always refer to the attached data, e.g. for milestone builds, final releases, jobs for which long-lasting bug reports exist, etc.

Filtering test results and builds

At the top of the test results overview page is a form which allows filtering tests by result, architecture and TODO-status.

Filter form

There is also a similar form at the bottom of the index page which allows filtering builds by group and customizing the limits.

Highlighting job dependencies in 'All tests' table

When hovering over the branch icon after the test name children of the job will be highlighted blue and parents red. So far this only works for jobs displayed on the same page of the table.

highlighted child jobs

Show previous results in test results page gh#538

On a test result page there is a tab “Next & previous results” showing the results of test runs in the same scenario. This shows next and previous builds as well as test runs in the same build. This way you can easily check and compare earlier results, including any comments, labels and bug references (see next section). This helps to answer questions like “Is this a new issue?”, “Is it reproducible?”, “Has it been seen before?” and “What does the history look like?”.

Querying the database for former test runs of the same scenario is a rather costly operation which we do not want to do for multiple test results at once but only for each individual test result (1:1 relation). This is why this is done in each individual test result and not for a complete build.

Related issue: #10212

Screenshot of the feature:

Next and previous job results

Use the link after the scenario name in the tab “Next & previous results” to always find the latest job in a scenario.

Link to latest in scenario

Add 'latest' query route gh#815

The latest route always refers to the most recent job for the specified scenario. It serves two purposes:

  • Keeping the same link during test development. Without it, whoever retriggers a test has to update the URL every time. With a static URL, even the browser can be instructed to reload the page automatically.

  • Linking to the most recent execution within one scenario, e.g. to respond faster to the standard question in bug reports “does this bug still happen?”

Examples:

  • tests/latest?distri=opensuse&version=13.1&flavor=DVD&arch=x86_64&test=kde&machine=64bit

  • tests/latest?flavor=DVD&arch=x86_64&test=kde

  • tests/latest?test=foobar - this searches for the most recent job using test_suite 'foobar' covering all distri, version, flavor, arch, machines. To be more specific, add the other query entries.
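Such links are easy to generate from a script. A minimal Python sketch, assuming an instance reachable at http://openqa (a placeholder host) and a hypothetical helper latest_url:

```python
from urllib.parse import urlencode


def latest_url(base: str, **scenario: str) -> str:
    """Build a tests/latest link from the given scenario parameters."""
    return f"{base.rstrip('/')}/tests/latest?{urlencode(scenario)}"


print(latest_url("http://openqa", flavor="DVD", arch="x86_64", test="kde"))
```

The fewer parameters are given, the broader the search, as in the tests/latest?test=foobar example above.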

Allow group overview query by result gh#531

This allows showing only failed builds, for example. It could be used as in http://lists.opensuse.org/opensuse-factory/2016-02/msg00018.html for “known defects”.

Example: Add query parameters like …&result=failed&arch=x86_64 to show only failed for the single architecture selected.

Add web UI controls to select more builds in group_overview gh#804

The query parameter 'limit_builds' allows showing more than the default 10 builds on demand. As for configuring previous results, the web UI offers selections to reload the same page with a higher number of builds. For this, the limit of days is increased to show more builds, which are still limited by the selected number.

Example screenshot:

Select different limit for number of displayed builds

More query parameters for configuring last builds gh#575

By using advanced query parameters in the URLs you can configure the search for builds. Higher numbers would yield more complex database queries but can be selected for special investigation use cases with the advanced query parameters, e.g. if one wants to get an overview of a longer history. This applies to both the index dashboard and group overview page.

The following example shows builds up to three weeks old instead of the default two weeks, and up to 20 builds instead of the default 10, on the group overview page:

http://openqa/group_overview/1?time_limit_days=21&limit_builds=20

Web UI controls to filter only tagged or all builds gh#807

Using the query parameter 'only_tagged=[0|1]' the list can be filtered, e.g. to show only tagged (important) builds.

Example screenshot:

Show only tagged or all builds

Related issue: #11052

Carry over bugrefs from previous jobs in same scenario if still failing gh#564

It is possible to label all failing tests, but this is tedious for a human user because many failures keep having the same cause until it gets fixed. It helps if a label is preserved for a build that is still failing. This idea is inspired by https://wiki.jenkins-ci.org/display/JENKINS/Claim+plugin and has been activated for bugrefs.

Bugrefs are not carried over across passes: after a job has passed, a subsequent failure is assumed to have a different cause.
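The carry-over rule can be sketched as a walk through the history of a scenario. The Python example below is a simplified model. How far back openQA actually looks is not stated here, so searching the whole failing streak is an assumption. The job representation is made up for the example.

```python
def carried_over_bugrefs(previous_jobs):
    """Return the bugrefs to carry over to a new failing job.

    previous_jobs lists earlier jobs of the same scenario, newest first,
    as dicts with the keys 'result' and 'bugrefs'.
    """
    for job in previous_jobs:
        if job["result"] == "passed":
            return []  # a pass in between: assume a different cause
        if job["bugrefs"]:
            return list(job["bugrefs"])  # most recent labeled failure
    return []


history = [
    {"result": "failed", "bugrefs": []},
    {"result": "failed", "bugrefs": ["bsc#1234"]},
    {"result": "passed", "bugrefs": []},
    {"result": "failed", "bugrefs": ["bsc#1000"]},
]
print(carried_over_bugrefs(history))
```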

Related issue: #10212

Pinning comments as group description

This is possible by adding the keyword pinned-description anywhere in a comment on the group overview page. The comment will then be shown at the top of the group overview page. This only works for comments written by an operator or admin.

Developer mode

The developer mode allows you to:

  • Create or update needles from assert_screen mismatches ("re-needling")

  • Pause the test execution (at a certain module) for manual investigation of the SUT

It can be accessed via the "Live View" tab of a running test. Only registered users can take control of a test. Basic instructions and buttons providing further information about the different options are already contained on the web page itself, so that information is not repeated here. This section explains the overall workflow instead.

In case the developer mode is not working on your instance, try to follow the steps for debugging the developer mode under 'Pitfalls'.

Workflow for creating or updating needles

  1. If new needles should be created, add the corresponding assert_screen calls to your test.

  2. Start the test with the assert_screen calls which are supposed to fail.

  3. Select "assert_screen timeout" under "Pause on screen mismatch" and confirm.

  4. Wait until the test has paused. There is a button to skip the current timeout to speed this up.

  5. A button for accessing the needle editor should appear. This may take a few seconds because the screenshots created so far need to be uploaded from the worker to the web UI. It is also possible to go back to the "Details" tab to create a new needle from any previous screenshot/match available.

  6. After creating the new needle, click the resume button to test whether it worked.

Steps 4. to 6. can be repeated for further needles without restarting the test.

Job group editor gh#2111

Scenarios are defined as part of a job group. The Edit job group button exposes the editor.

YAML job templates editor

Settings can be specified as a key/value pair for each scenario. There is no equivalent in the table view so you need to migrate groups to use this feature.

Any settings specified on test suites, machines or products are also used and can still be modified independently. However, the YAML document should be updated before renaming or deleting test suites, products or machines used by it, otherwise that would create an inconsistent state.

Job groups can be updated through the YAML editor or the YAML-related REST API routes.

Deprecated: Table-based (pre-migration)

In old versions openQA had a table-based UI for defining job templates, listed in a table per medium. Machines can be added by selecting the architecture column and picking a machine from the list. Remove scenarios by removing all of their machines. Add new scenarios via the blue Plus icon at the top of the table. Changes to the priority are applied immediately.

If job groups still exist showing the old mode, the Edit YAML button can be used to reveal the YAML editor and migrate a group. After saving for the first time, the group can only be configured in YAML. The table view will not be shown anymore.

Note that making a backup before migrating groups may be a good idea, for example using openqa-dump-templates.

To migrate an old job group using the API the current schedule can be retrieved in YAML format and sent back to save as a complete YAML document. For example for all job groups in the old format:

for i in $(ssh openqa.example.com "sudo -u geekotest psql --no-align --tuples-only --command=\"select id from job_groups where template is null order by id;\" openqa") ; do
    curl -s http://openqa.example.com/api/v1/job_templates_scheduling/$i | openqa-client --json-output --host http://openqa.example.com job_templates_scheduling/$i post --form schema=JobTemplates-01.yaml template="$(cat -)"
done

Note that in some cases you might run into errors where old test suites or products have invalid names which the old editor did not enforce:

Product names may not contain : or @ characters. Something like Server-DVD-Staging:A would require replacing the : with e.g. a -.

Test suites may not contain : or @ characters. A test suite such as ext4_uefi@staging would have been allowed previously. The use of the @ as a suffix could be replaced with a - or if it is used for variants of the same test suite with different settings, settings can be specified in YAML directly.

More generally the regular expression [A-Za-z0-9._*-]+ could be used to check if a name is allowed for a product or test suite.
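
Before migrating, existing names can be checked against this expression. The following is a minimal Python sketch under the assumption that the whole name has to match; the helper names is_valid_name and suggest_name are hypothetical and not part of openQA.

```python
import re

# The regular expression quoted in the text above
NAME_PATTERN = re.compile(r"[A-Za-z0-9._*-]+")


def is_valid_name(name):
    """Return True if the whole name consists of allowed characters only."""
    return NAME_PATTERN.fullmatch(name) is not None


def suggest_name(name):
    """Replace ':' and '@' with '-', one of the fixes suggested in the text."""
    return name.replace(":", "-").replace("@", "-")


for name in ("Server-DVD-Staging:A", "ext4_uefi@staging", "textmode"):
    print(name, is_valid_name(name), suggest_name(name))
```

Where the @ suffix denotes a variant of a test suite with different settings, specifying those settings in YAML is the alternative to renaming.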

Configuring job groups via YAML documents

A new job group starts out empty, which in YAML means that the two mandatory sections are present but contain nothing. This is what can be seen when editing a completely new group, and it is also the state to revert to before deleting a job group that is no longer useful:

products: {}
scenarios: {}

A job group is comprised of up to three main sections. products defines one or more mediums to run the scenarios in the group. At least one needs to be specified to be able to run tests. Going by an example of openSUSE 15.1 the name, distri, flavor and version could be written like so. Note that the version is a string in single quotes.

products:
  opensuse-15.1-DVD-Updates-x86_64:
    distri: opensuse
    flavor: DVD-Updates
    version: '15.1'

To complete the job group at least one scenario has to be added. A scenario is a combination of a test suite, a machine and an architecture. Scenarios must also be unique across job groups - trying to add the same scenario to multiple job groups is an error. For example, textmode and gnome could be defined like so:

scenarios:
  x86_64:
    opensuse-15.1-DVD-Updates-x86_64:
    - textmode
    - gnome:
      machine: uefi
      priority: 70
      settings:
        QEMUVGA: cirrus

Now there are two scenarios for x86_64, one by giving just the name of the test suite and another which has a machine, priority and settings. Both are allowed. However since at least one scenario relies on defaults those need to be specified once in their own section:

defaults:
  x86_64:
    machine: 64bit
    priority: 50

The defaults section is only required whenever a scenario is not completely defined in-place. When it is used, the available parameters are identical to those for a single scenario. For instance the example could be amended to use settings and run every test suite for that architecture on several machines by default.

defaults:
  x86_64:
    machine: [64bit, 32bit]
    priority: 50
    settings:
      FOO: 1

Defaults are always overwritten by explicit parameters on scenarios. Furthermore, all settings can be specified in YAML. Using this together with custom job template names, variants of a scenario can even be specified when they would normally be considered duplicates:

scenarios:
  x86_64:
    opensuse-15.1-DVD-Updates-x86_64:
    - textmode
    - gnome:
      machine: uefi
      priority: 70
      settings:
        QEMUVGA: cirrus
    - gnome_staging:
      testsuite: gnome
      machine: [32bit, 64bit-staging]
      settings:
        FOO: 2
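
The precedence of explicit scenario parameters over defaults can be modelled in a few lines of Python. This is a sketch only, not openQA's implementation: the function name effective_scenario is hypothetical, and the key-by-key merging of the nested settings is an assumption that the text above does not confirm.

```python
def effective_scenario(defaults, scenario):
    """Combine architecture defaults with the explicit parameters of one scenario."""
    merged = dict(defaults)
    # Explicit parameters on the scenario always win over defaults
    merged.update(scenario)
    # Assumption: settings are combined key by key, explicit values winning
    settings = {**defaults.get("settings", {}), **scenario.get("settings", {})}
    if settings:
        merged["settings"] = settings
    return merged


defaults = {"machine": ["64bit", "32bit"], "priority": 50, "settings": {"FOO": 1}}
gnome = {"machine": "uefi", "priority": 70, "settings": {"QEMUVGA": "cirrus"}}
gnome_staging = {
    "testsuite": "gnome",
    "machine": ["32bit", "64bit-staging"],
    "settings": {"FOO": 2},
}

print(effective_scenario(defaults, {}))  # textmode: relies on defaults only
print(effective_scenario(defaults, gnome))
print(effective_scenario(defaults, gnome_staging))
```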

Even more flexibility can be achieved by using aliases in YAML, or in other words re-using a scenario by reference, such as to run the same scenarios in two different mediums. & is used to define an anchor, while * is the alias referencing the anchor:

products:
  opensuse-15.1-DVD-Updates-x86_64:
    distri: opensuse
    flavor: DVD-Updates
    version: '15.1'
  opensuse-15.2-GNOME-Live-x86_64:
    distri: opensuse
    flavor: GNOME-Live
    version: '15.2'
scenarios:
  x86_64:
    opensuse-15.1-DVD-Updates-x86_64:
    - textmode
    - gnome: &gnome
      machine: uefi
      priority: 70
      settings:
        QEMUVGA: cirrus
    - gnome_staging: &gnome_staging
      testsuite: gnome
      machine: [32bit, 64bit-staging]
      settings:
        FOO: 2
    opensuse-15.2-GNOME-Live-x86_64:
    - textmode
    - gnome: *gnome
    - gnome_staging: *gnome_staging

Use of the REST API

openQA includes a client script which, depending on the distribution, is packaged independently, so you can interface with an existing openQA instance without installing the full package. Call openqa-client --help for help.

Basics are described in the Getting Started guide.

Triggering tests

Tests can be triggered in multiple ways: using openqa-clone-job, jobs post or isos post, as well as by retriggering existing jobs or whole media over the web UI.

Cloning existing jobs - openqa-clone-job

If one wants to recreate an existing job from any publicly available openQA instance, the script openqa-clone-job can be used to copy the necessary settings and assets to another instance and schedule the test. For the test to be executed, matching resources must be available, for example a worker with a matching WORKER_CLASS must be registered. More details on openqa-clone-job can be found in Writing Tests.

Spawning single new jobs - jobs post

Single jobs can be spawned using the jobs post API route. All necessary settings on a job must be supplied in the API request. The "openQA client" has examples for this.

Spawning multiple jobs based on templates - isos post

The most common way of spawning jobs on production instances is using the isos post API route. Jobs are triggered by template matching, based on previously defined settings for media, job groups, machines and test suites. The Getting Started guide already mentioned examples. In addition to the necessary template matching parameters, more parameters can be specified which are forwarded to all triggered jobs. There are also special parameters which only influence the way the triggering itself is done. These parameters all start with a leading underscore but are set as request parameters in the same way as the other parameters.

The following scheduling parameters exist:
_OBSOLETE

Obsolete jobs in older builds with the same DISTRI and VERSION (by default, jobs are not obsoleted). With this option, jobs which are currently pending, for example scheduled or running, are cancelled when a new medium is triggered.

_DEPRIORITIZEBUILD

Setting this switch to '1' will deprioritize the unfinished jobs of old builds, and it will obsolete the jobs once the configurable limit of priority is reached.

_DEPRIORITIZE_LIMIT

The configurable limit of priority up to which jobs should be deprioritized. Needs _DEPRIORITIZEBUILD. Default 100.

_ONLY_OBSOLETE_SAME_BUILD

Only obsolete (or deprioritize) jobs for the same BUILD. This is useful for cases where a new build appearing does not necessarily mean existing jobs for earlier builds with the same DISTRI and VERSION are no longer interesting, but you still want to be able to re-submit jobs for a build and have existing jobs for the exact same build obsoleted. Needs _OBSOLETE.

_SKIP_CHAINED_DEPS

Do not schedule parent test suites which are specified in START_AFTER_TEST or START_DIRECTLY_AFTER_TEST.

_GROUP

Job templates not matching the given group name are ignored. Does not affect obsoletion behavior.

_GROUP_ID

Same as _GROUP but allows specifying the group directly by ID.

_PRIORITY

Sets the priority for the new jobs (which otherwise defaults to the priority of the job template).

Example for _DEPRIORITIZEBUILD and _DEPRIORITIZE_LIMIT:

openqa-client isos post ISO=my_iso.iso DISTRI=my_distri FLAVOR=sweet \
         ARCH=my_arch VERSION=42 BUILD=1234 \
         _DEPRIORITIZEBUILD=1 _DEPRIORITIZE_LIMIT=120
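
When such calls are generated by a script, the dependencies between the scheduling parameters can be checked before anything is sent. The following is a minimal Python sketch; the helper isos_post_command is hypothetical, and the two checks merely restate the "Needs ..." remarks from the parameter list above. It only builds the argument list and does not contact an openQA instance.

```python
def isos_post_command(params):
    """Return the argv list for `openqa-client isos post` after basic checks."""
    # "_DEPRIORITIZE_LIMIT ... Needs _DEPRIORITIZEBUILD"
    if "_DEPRIORITIZE_LIMIT" in params and "_DEPRIORITIZEBUILD" not in params:
        raise ValueError("_DEPRIORITIZE_LIMIT needs _DEPRIORITIZEBUILD")
    # "_ONLY_OBSOLETE_SAME_BUILD ... Needs _OBSOLETE"
    if "_ONLY_OBSOLETE_SAME_BUILD" in params and "_OBSOLETE" not in params:
        raise ValueError("_ONLY_OBSOLETE_SAME_BUILD needs _OBSOLETE")
    return ["openqa-client", "isos", "post"] + [f"{k}={v}" for k, v in params.items()]


command = isos_post_command({
    "ISO": "my_iso.iso", "DISTRI": "my_distri", "FLAVOR": "sweet",
    "ARCH": "my_arch", "VERSION": 42, "BUILD": 1234,
    "_DEPRIORITIZEBUILD": 1, "_DEPRIORITIZE_LIMIT": 120,
})
print(" ".join(command))
```

The resulting list can be passed to subprocess.run unchanged, which avoids shell quoting issues.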

Job template YAML

Job groups can be queried via the experimental REST API:

api/v1/experimental/job_templates_scheduling

The GET request will get the YAML for one or multiple groups while a POST request conversely updates the YAML for a particular group.

Two scripts using these routes can be used to import and export YAML templates:

openqa-dump-templates --json --group test > test.json
openqa-load-templates test.json

Asset handling

Multiple parameters exist to reference "assets" to be used by tests. "Assets" are essentially content that is stored by the openQA web-UI and provided to the workers; when sending jobs to os-autoinst on the workers, openQA adjusts the parameter values to refer to an absolute path where the worker will be able to access the content. Things that are typically assets include the ISOs and other images that are tested, for example.

Some assets can also be produced by a job, sent back to the web-UI, and used by a later job (see explanation of 'storing' and 'publishing' assets, below). Assets can also be seen in the web-UI and downloaded directly (though there is a configuration option to hide some or all asset types from public view in the web-UI).

The parameters treated as assets are as follows. Where you see e.g. ISO_n, that means ISO_1, ISO_2 etc. will all be treated as assets.

  • ISO (type iso)

  • ISO_n (type iso)

  • HDD_n (type hdd)

  • UEFI_PFLASH_VARS (type hdd) (in some cases, see below)

  • REPO_n (type repo)

  • ASSET_n (type other)

  • KERNEL (type other)

  • INITRD (type other)

The values of the above parameters are expected to be the name of a file - or, in the case of REPO_n, a directory - that exists under the path /var/lib/openqa/share/factory on the openQA web-UI. That path has subdirectories for each of the asset types, and the file or directory must be in the correct subdirectory, so e.g. the file for an asset HDD_1 must be under /var/lib/openqa/share/factory/hdd. You may create a subdirectory called fixed for any asset type and place assets there (e.g. in /var/lib/openqa/share/factory/hdd/fixed for hdd-type assets): this exempts them from the automatic cleanup described under 'Asset cleanup' below. Non-fixed assets are always subject to the cleanup.

UEFI_PFLASH_VARS is a special case: whether it is treated as an asset depends on the value. If the value looks like an absolute path (starts with /), it will not be treated as an asset (and so the value should be an absolute path for a file which exists on the relevant worker system(s)). Otherwise, it is treated as an hdd-type asset. This allows tests to use a stock base image (like the ones provided by edk2) for a simple case, but also allows a job to upload its image on completion - including any changes made to the UEFI variables during the execution of the job - for use by a child job which needs to inherit those changes.
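
The mapping from parameter names to asset types and directories described above can be summarised in a short Python sketch. It is illustrative only and not openQA code; the helper names asset_type and asset_path are hypothetical.

```python
import re
from pathlib import PurePosixPath

FACTORY = PurePosixPath("/var/lib/openqa/share/factory")

# Parameter name patterns and the asset type they map to, as listed above
ASSET_PATTERNS = [
    (re.compile(r"ISO(_\d+)?"), "iso"),
    (re.compile(r"HDD_\d+"), "hdd"),
    (re.compile(r"REPO_\d+"), "repo"),
    (re.compile(r"ASSET_\d+|KERNEL|INITRD"), "other"),
]


def asset_type(param, value):
    """Return the asset type for a job parameter, or None if it is not an asset."""
    if param == "UEFI_PFLASH_VARS":
        # Absolute paths refer to a file on the worker and are not assets
        return None if value.startswith("/") else "hdd"
    for pattern, kind in ASSET_PATTERNS:
        if pattern.fullmatch(param):
            return kind
    return None


def asset_path(param, value, fixed=False):
    """Return the expected location of the asset on the web-UI host."""
    kind = asset_type(param, value)
    if kind is None:
        return None
    directory = FACTORY / kind
    if fixed:
        directory = directory / "fixed"
    return str(directory / value)


print(asset_path("HDD_1", "disk.qcow2"))
print(asset_path("UEFI_PFLASH_VARS", "/usr/share/qemu/ovmf.bin"))
```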

You can also use special suffixes to the basic parameter forms to access some special handling for assets.

The following suffixes exist:
_URL

Before starting the jobs, openQA tries to download the asset into the relevant asset directory of the openQA web-UI, provided the URL belongs to one of the trusted domains specified in /etc/openqa/openqa.ini. For example, ISO_1_URL=http://trusted.com/foo.iso would, if trusted.com is set as a trusted domain, cause openQA to download the file foo.iso to /var/lib/openqa/share/factory/iso and set ISO_1=foo.iso. If you set both ISO_1 and ISO_1_URL, the file pointed to by ISO_1_URL will be downloaded and renamed to the name set as ISO_1.

_DECOMPRESS_URL

Specify a compressed asset to be downloaded that will be uncompressed by openQA. For example, ISO_1_DECOMPRESS_URL=http://host/foo2.iso.xz will download the file foo2.iso.xz, uncompress it to foo2.iso, store it in /var/lib/openqa/share/factory/iso and set ISO_1=foo2.iso. Again, you can also set ISO_1 to change the name the file will be downloaded and uncompressed as.
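
The way the asset name follows from the two suffixes can be sketched in Python. The helper asset_name_from_url is hypothetical and only models the naming rules above; deriving the uncompressed name by dropping the last file extension is an assumption, and no download or trusted-domain check is performed.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def asset_name_from_url(url, explicit_name=None, decompress=False):
    """Return the file name an asset downloaded from `url` ends up with."""
    if explicit_name:
        # Setting e.g. ISO_1 alongside ISO_1_URL overrides the derived name
        return explicit_name
    name = PurePosixPath(urlparse(url).path).name
    if decompress:
        # Assumption: the uncompressed name is the name without its last extension
        name = PurePosixPath(name).stem
    return name


print(asset_name_from_url("http://trusted.com/foo.iso"))
print(asset_name_from_url("http://host/foo2.iso.xz", decompress=True))
```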

Assets may be shared between the web-UI and the workers by having them literally use a shared filesystem (this used to be the only option), or by having the workers download them from the server when needed and cache them locally. See 'Asset Caching' in the Installing guide for more on this.

HDD_n assets can be 'stored' or 'published' by a job, and UEFI_PFLASH_VARS assets can be 'published'. These both mean that if the job completes successfully, the resulting state of those disk assets will be sent back to the web-UI and made available as an hdd-type asset. To 'store' an asset, you can specify e.g. STORE_HDD_1. To 'publish' it, you can specify e.g. PUBLISH_HDD_1 or PUBLISH_PFLASH_VARS. If you specify PUBLISH_HDD_1=updated.qcow2, the HDD_1 disk image as it exists at the end of the test will be uploaded back to the web-UI and stored under the name updated.qcow2; any other job can then specify HDD_1=updated.qcow2 to use this published image as its HDD_1. To force publishing assets even in case of a failed job one can try the FORCE_PUBLISH_HDD_ variable.

The difference between 'storing' and 'publishing' is that when 'storing' an asset, it will be altered in some way (currently, by prepending the job ID to the filename) to associate it with the particular job that produced it. That means that many jobs can 'store' an asset under "the same name" without conflicting. Of course, that would seem to make it hard for other jobs to use the 'stored' image - but for "chained" jobs, the reverse operation is done transparently. This all means that a 'parent' job template can specify STORE_HDD_1=somename.qcow2 and its 'child' job template(s) can specify HDD_1=somename.qcow2, and everything will work, without multiple runs of the same jobs overwriting the asset. For more on "chained" jobs, see 'Job dependencies' in the Writing Tests guide.
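
The effect of 'storing' can be illustrated with a tiny Python sketch. The function name stored_asset_name is hypothetical, and the exact format used here (job ID zero-padded to eight digits, joined with a hyphen) is an assumption; the text above only states that the job ID is prepended to the filename.

```python
def stored_asset_name(job_id, name):
    """Return the private file name under which a job 'stores' an asset."""
    # Assumed format: zero-padded job ID, a hyphen, then the requested name
    return f"{job_id:08d}-{name}"


# Two runs of the same parent job store "the same name" without conflict
print(stored_asset_name(42, "somename.qcow2"))
print(stored_asset_name(43, "somename.qcow2"))
```

A chained child that specifies HDD_1=somename.qcow2 is transparently pointed at the file stored by its own parent job.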

When using this mechanism you will often also want to use the 'Variable expansion' mechanism described in the Getting Started guide.

Asset cleanup

For more information on assets, see 'Asset handling' above.

Assets like ISO files consume a huge amount of disk space. Therefore openQA removes assets automatically according to configurable limits.

This section provides an overall description of the cleanup strategy and how to configure the limits. Cleanup-related parameters for the REST API can be found in the 'Asset handling' section under 'Use of the REST API'.

Cleanup strategy

openQA frequently checks whether assets need to be removed according to the configured limits.

To find out whether an asset should be removed, openQA determines by which job groups the asset is used. If at least one job within a certain job group is using an asset, the asset is considered to be used by that group.

So an asset can be accounted to multiple groups. The assets table which is accessible via the admin menu shows these groups for each asset and also the latest job.

If the size limit for assets by a certain group is exceeded, openQA will remove assets accounted to that group:

  • Assets belonging to old jobs are preferred.

  • Assets belonging to jobs which are still scheduled or running are not considered.

  • Assets which are also accounted to another group that has still space left are not considered.

Assets which do not belong to any group are removed after a configurable duration unless the files are still being updated. Keep in mind that this behavior is also enabled on local instances and affects all cloned jobs (unless cloned into a job group).

'Fixed' assets - those placed in the fixed subdirectory of the relevant asset directory - are counted against the group size limit, but are never cleaned up. This is intended for things like base disk images which must always be available for a test to work.
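
The selection rules for a group that exceeds its limit can be modelled roughly as follows. This Python sketch is a simplified illustration and not openQA's actual cleanup code: the Asset fields and function names are hypothetical, sizes use arbitrary units, and "has still space left" is interpreted as the other group's usage being below its limit.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    size: int
    groups: set          # job groups the asset is accounted to
    latest_job: int      # ID of the latest job using the asset
    pending: bool = False  # used by a scheduled or running job
    fixed: bool = False    # placed in the 'fixed' subdirectory


def group_usage(assets, group):
    """Total size of all assets accounted to a group (fixed assets included)."""
    return sum(a.size for a in assets if group in a.groups)


def assets_to_remove(assets, limits, group):
    """Pick assets to delete so that `group` gets back under its size limit."""
    remaining = list(assets)
    removed = []
    # Assets belonging to old jobs are preferred
    for asset in sorted(assets, key=lambda a: a.latest_job):
        if group_usage(remaining, group) <= limits[group]:
            break
        if group not in asset.groups or asset.fixed or asset.pending:
            continue
        others = asset.groups - {group}
        # Skip assets also accounted to another group that has space left
        if any(group_usage(remaining, g) < limits[g] for g in others):
            continue
        remaining.remove(asset)
        removed.append(asset.name)
    return removed


assets = [
    Asset("old.iso", 4, {1}, latest_job=10),
    Asset("shared.iso", 4, {1, 2}, latest_job=11),
    Asset("base.qcow2", 2, {1}, latest_job=5, fixed=True),
    Asset("running.iso", 4, {1}, latest_job=12, pending=True),
    Asset("new.iso", 4, {1}, latest_job=13),
]
print(assets_to_remove(assets, {1: 12, 2: 100}, group=1))
```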

Configure limit for assets within groups

To configure the maximum size for the assets of a group, open 'Job groups' in the operators menu and select a group. The size limit for assets can be configured under 'Edit job group properties'. It also shows the size of assets which belong to that group and not to any other group.

Job groups inherit the size limit from their parent group unless the limit is set explicitly. The default size limit for groups can be adjusted in the default_group_limits section of the openQA config file.

Configure limit for groupless assets

Assets not belonging to jobs within a group are deleted automatically after a certain number of days. That duration can be adjusted by setting untracked_assets_storage_duration in the misc_limits section of the openQA config to the desired number of days.

In less trivial cases where a common limit is not enough or certain assets need more fine-grained control, patterns based on the filename can be used. The patterns are interpreted as Perl regular expressions and if a pattern matches the basename of an asset the specified duration in days will be used. In simple cases the pattern is just a match on a word.

Consider the following examples to specify custom limits that would match assets with the names testrepo-latest and openSUSE-12.3-x86_64.iso.

[assets/storage_duration]
latest = 30
openSUSE.+x86_64 = 10

Note that modifications to the file will count against the limit, so if an asset was updated within the timespan it will not be removed.
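
How the patterns select a duration can be tried out with a small Python sketch. Python's re module stands in for Perl regular expressions here, which is equivalent for these simple patterns; the function name storage_duration, the default of 14 days and the first-match-wins behaviour are assumptions made for illustration.

```python
import posixpath
import re


def storage_duration(asset_path, patterns, default_days):
    """Return the number of days a groupless asset is kept."""
    basename = posixpath.basename(asset_path)
    for pattern, days in patterns.items():
        # Assumption: the first matching pattern decides
        if re.search(pattern, basename):
            return days
    return default_days


patterns = {"latest": 30, "openSUSE.+x86_64": 10}

for asset in ("repo/testrepo-latest", "iso/openSUSE-12.3-x86_64.iso", "iso/other.iso"):
    print(asset, storage_duration(asset, patterns, default_days=14))
```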

CLI interface

Besides the daemon argument to run the actual web service, the openQA startup script /usr/share/openqa/script/openqa supports further arguments.

For a full list of those commands, just invoke /usr/share/openqa/script/openqa -h. This also works for sub-commands (e.g. /usr/share/openqa/script/openqa minion -h, /usr/share/openqa/script/openqa minion job -h).

Note that prefork is only supported for the main web service but not for other services like the live view handler.

Where to now?

For test developers it is recommended to continue with the Test Developer Guide.

openQA tests developer guide

Introduction

openQA is an automated test tool that makes it possible to test the whole installation process of an operating system. It’s free software released under the GPLv2 license. The source code and documentation are hosted in the os-autoinst organization on GitHub.

This document provides the information needed to start developing new tests for openQA or to improve the existing ones. It’s assumed that the reader is already familiar with openQA and has already read the Starter Guide, available at the official repository.

Basic

This section explains the basic layout of openQA tests and the API available in tests. openQA tests are written in the Perl programming language. Some basic but no in-depth knowledge of Perl is needed. This document assumes that the reader is already familiar with Perl.

API

os-autoinst provides the API for the tests using the os-autoinst backend. The published documentation is available at http://open.qa/api/testapi/.

How to write tests

openQA tests need to implement at least the run subroutine to contain the actual test code and the test needs to be loaded in the distribution’s main.pm.

The test_flags subroutine specifies what should happen when the execution of the current test module is finished, depending on the result. For example, it can specify whether the following test modules should be skipped if the current one failed, or whether a snapshot of the SUT should be created to roll back to. See the example below.

There are several callbacks defined:

  • post_fail_hook is called to upload log files or determine the state of the machine

  • pre_run_hook is called before the run function - mainly useful for a whole group of tests

  • post_run_hook is run after successful run function - mainly useful for a whole group of tests

The following example is a basic test that assumes some live image that boots into the desktop when pressing enter at the boot loader:

use base "basetest";
use strict;
use testapi;

sub run {
    # wait for bootloader to appear
    # with a timeout explicitly lower than the default because
    # the bootloader screen will timeout itself
    assert_screen "bootloader", 15;

    # press enter to boot right away
    send_key "ret";

    # wait for the desktop to appear
    assert_screen "desktop", 300;
}

sub test_flags {
    # 'fatal'           - abort whole test suite if this fails (and set overall state 'failed')
    # 'ignore_failure'  - if this module fails, it will not affect the overall result at all
    # 'milestone'       - after this test succeeds, update 'lastgood'
    # 'no_rollback'     - don't roll back to 'lastgood' snapshot if this fails
    # 'always_rollback' - roll back to 'lastgood' snapshot even if test was successful (supported on QEMU backend only)
    return { fatal => 1 };
}

1;

Test Case Examples

Example: Console test that installs software from remote repository via zypper command
sub run {
    # change to root
    become_root;

    # output zypper repos to the serial
    script_run "zypper lr -d > /dev/$serialdev";

    # install xdelta and check that the installation was successful
    assert_script_run 'zypper --gpg-auto-import-keys -n in xdelta';

    # additionally write a custom string to serial port for later checking
    script_run "echo 'xdelta_installed' > /dev/$serialdev";

    # detecting whether 'xdelta_installed' appears in the serial within 200 seconds
    die "we could not see expected output" unless wait_serial "xdelta_installed", 200;

    # capture a screenshot and compare with needle 'test-zypper_in'
    assert_screen 'test-zypper_in';
}
Example: Typical X11 test testing kate
sub run {
    # make sure kate was installed
    # if not ensure_installed will try to install it
    ensure_installed 'kate';

    # start kate
    x11_start_program 'kate';

    # check that kate execution succeeded
    assert_screen 'kate-welcome_window';

    # close kate's welcome window and wait for the window to disappear before
    # continuing
    wait_screen_change { send_key 'alt-c' };

    # typing a string in the editor window of kate
    type_string "If you can see this text kate is working.\n";

    # check the result
    assert_screen 'kate-text_shown';

    # quit kate
    send_key 'ctrl-q';

    # make sure kate was closed
    assert_screen 'desktop';
}

Variables

Test case behavior can be controlled via variables. Some basic variables like DISTRI, VERSION, ARCH are always set. Others like DESKTOP are defined by the 'Test suites' in the openQA web UI. Check the existing tests at os-autoinst-distri-opensuse on GitHub for examples.

Variables are accessible via the get_var and check_var functions.

Advanced test features

Capturing kernel exceptions and/or any other exceptions from the serial console

Soft and hard failures can be triggered on demand by regular expressions that are matched against the serial output after the test has been executed. If it does not make sense to continue the test run even though the current test module does not have the fatal flag, use fatal as the serial failure type; subsequent test modules are then not executed if such a failure is detected. To use this functionality the test developer needs to define the patterns to look for in the serial output, either in main.pm or in the test itself. Any pattern change made in a test is also reflected in the following tests.

The patterns defined in the main.pm will be valid for all the tests.

To simplify the review of test results, automatic comment carryover also works when a job fails with the same message (as defined for the pattern) as the previous job, even if the test suites have failed in different test modules.

Example: Defining serial exception capture in the main.pm
$testapi::distri->set_expected_serial_failures([
        {type => 'soft', message  => 'known issue',  pattern => quotemeta 'Error'},
        {type => 'hard', message  => 'broken build', pattern => qr/exception/},
        {type => 'fatal', message => 'critical issue build', pattern => qr/kernel oops/},
    ]
);
Example: Defining serial exception capture in the test
sub run {
    my ($self) = @_;
    $self->{serial_failures} = [
        {type => 'soft', message  => 'known issue',  pattern => quotemeta 'Error'},
        {type => 'hard', message  => 'broken build', pattern => qr/exception/},
        {type => 'fatal', message => 'critical issue build', pattern => qr/kernel oops/},
    ];
    ...
}
Example: Adding serial exception capture in the test
sub run {
    my ($self) = @_;
    push @{$self->{serial_failures}}, {type => 'soft', message => 'known issue',  pattern => quotemeta 'Error'};
    ...
}
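
To get a feeling for how such patterns behave, the matching can be modelled outside of os-autoinst. The following Python sketch is an illustration only: the function names check_serial and should_stop are hypothetical, re.escape plays the role of Perl's quotemeta, and the real evaluation is done by os-autoinst itself.

```python
import re

# Python counterparts of the Perl patterns from the examples above
SERIAL_FAILURES = [
    {"type": "soft", "message": "known issue", "pattern": re.escape("Error")},
    {"type": "hard", "message": "broken build", "pattern": r"exception"},
    {"type": "fatal", "message": "critical issue build", "pattern": r"kernel oops"},
]


def check_serial(serial_output, failures=SERIAL_FAILURES):
    """Return (type, message) for every pattern found in the serial output."""
    return [
        (failure["type"], failure["message"])
        for failure in failures
        if re.search(failure["pattern"], serial_output)
    ]


def should_stop(matches):
    """A 'fatal' match means subsequent test modules are not executed."""
    return any(kind == "fatal" for kind, _ in matches)


matches = check_serial("boot ok\nError: disk full\nkernel oops at 0xdead\n")
print(matches, should_stop(matches))
```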

Assigning jobs to workers

By default, any worker can get any job with the matching architecture.

This behavior can be changed by setting the job variable WORKER_CLASS. Jobs with this variable set (typically via machines or test suites configuration) are assigned only to workers which have the same variable in their configuration file.

For example, the following configuration ensures that jobs with WORKER_CLASS=desktop can be assigned only to worker instances 1 and 2.

File: workers.ini
[1]
WORKER_CLASS = desktop

[2]
WORKER_CLASS = desktop

[3]
# WORKER_CLASS is not set
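
The assignment rule can be reproduced with a short Python sketch that reads the same configuration. It is a simplified model, not the openQA scheduler: the function name eligible_workers is hypothetical, the class is compared as an exact string, and architecture matching is left out.

```python
import configparser

WORKERS_INI = """\
[1]
WORKER_CLASS = desktop

[2]
WORKER_CLASS = desktop

[3]
# WORKER_CLASS is not set
"""


def eligible_workers(ini_text, job_worker_class=None):
    """Return the worker instances that may pick up a job."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key names such as WORKER_CLASS case-sensitive
    config.read_string(ini_text)
    if job_worker_class is None:
        # Without WORKER_CLASS on the job, any worker can get it
        return config.sections()
    return [
        section for section in config.sections()
        if config[section].get("WORKER_CLASS") == job_worker_class
    ]


print(eligible_workers(WORKERS_INI, "desktop"))
print(eligible_workers(WORKERS_INI))
```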

Writing multi-machine tests

Scenarios requiring more than one system under test (SUT), like High Availability testing, are covered as multi-machine tests (MM tests) in this section.

openQA approaches multi-machine testing by assigning dependencies between individual jobs. This means the following:

  • Everything needed for MM tests must be running as a test job (or you are on your own). Even support infrastructure (custom DHCP, NFS, etc. if required), which in principle is not part of the actual testing, must have a defined test suite so a test job can be created.

  • The openQA scheduler makes sure tests are started as a group and in the right order, cancelled as a group if some dependencies are violated and cloned as a group if requested.

  • openQA does not synchronize individual steps of the tests.

  • openQA provides a locking server for basic synchronization of tests (e.g. wait until services are ready for failover), but the correct usage of locks is the test designer's job (beware of deadlocks).

In short, writing multi-machine tests adds a few more layers of complexity:

  1. documenting the dependencies and order between individual tests

  2. synchronization between individual tests

  3. actual technical realization (i.e. custom networking)

Job dependencies

There are different dependency types (see subsequent section). Additionally, dependencies can be machine-specific (see "Inter-machine dependencies" section).

Dependency types

There are 3 types of dependencies: CHAINED, DIRECTLY_CHAINED and PARALLEL

Chained dependencies

CHAINED and DIRECTLY_CHAINED describe cases where one test suite depends on another and both are run sequentially, e.g. the KDE test suite runs only after the Installation test suite has finished successfully and is cancelled if the Installation test suite fails.

The difference between CHAINED and DIRECTLY_CHAINED dependencies is that DIRECTLY_CHAINED means the tests must run directly after another on the same worker slot. This can be useful to test efficiently on bare metal SUTs and other self-provisioning environments.

To define a CHAINED dependency add the variable START_AFTER_TEST with the name(s) of the test suite(s) after which the selected test suite is supposed to run. Use a comma-separated list for multiple test suite dependencies, e.g. START_AFTER_TEST="kde,dhcp-server".

To define a DIRECTLY_CHAINED dependency add the variable START_DIRECTLY_AFTER_TEST. It works in the same way as for CHAINED dependencies. Mismatching worker classes between jobs to run in direct sequence on the same worker are considered an error.

Parallel dependencies

PARALLEL describes multi-machine tests. These are test suites scheduled to run at the same time and managed as a group. On top of that, PARALLEL also describes test suite dependencies, where some test suites (children) run in parallel with other test suites (parents) only while the parents are running.

To define a PARALLEL dependency, use the PARALLEL_WITH variable with the name(s) of the test suite(s) acting as parent suite(s) of the selected test suite. In other words, PARALLEL_WITH describes "I need this test suite to be running during my run". Use a comma-separated list for multiple test suite dependencies (e.g. PARALLEL_WITH="web-server,dhcp-server"). Keep in mind that the parent job must be running until all children finish. Otherwise the scheduler will cancel child jobs once the parent is done.

Job dependencies are only resolved when using the iso controller to create new jobs from job templates. Posting individual jobs manually won’t work.

Inter-machine dependencies

Those dependencies make it possible to create job dependencies between tests which are supposed to run on different machines.

To use them, append the machine name to each dependent test suite, separated by an @ sign. If a machine is not explicitly defined, the variable MACHINE of the current job is used for the dependent test suite.

Example 1:

START_AFTER_TEST="kde@64bit-1G,dhcp-server@64bit-8G"

Example 2:

PARALLEL_WITH="web-server@ipmi-fly,dhcp-server@ipmi-bee,http-server"
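
The way such a value breaks down into test suite and machine pairs can be sketched in Python. The function name parse_dependencies is hypothetical and the machine ipmi-ant used for the current job is made up for illustration; the fallback to the current job's MACHINE follows the rule stated above.

```python
def parse_dependencies(value, current_machine):
    """Split a dependency variable into (test suite, machine) pairs."""
    dependencies = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        suite, separator, machine = entry.partition("@")
        # Without an explicit machine, the MACHINE of the current job is used
        dependencies.append((suite, machine if separator else current_machine))
    return dependencies


print(parse_dependencies("kde@64bit-1G,dhcp-server@64bit-8G", "64bit"))
print(parse_dependencies("web-server@ipmi-fly,dhcp-server@ipmi-bee,http-server", "ipmi-ant"))
```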

Then, in job templates, add the test suite(s) and all of their dependent test suite(s). Remember to add the machines which have been explicitly defined in a variable for each dependent test suite. Check out the following example sections to get a better understanding.

openQA tries to handle things sensibly when jobs with relations like this either fail, or are manually cancelled or restarted. When a chained or parallel parent fails or is cancelled, all children will be cancelled; if the parent is restarted, all children are also restarted. When a parallel child is restarted, the parent and any other children will also be restarted. When a chained child is restarted, the parent is not restarted; this will usually be fine, but be aware that if an asset uploaded by the chained parent has been cleaned up, the child may fail immediately. To deal with this case, just restart the parent.

By default, when a parallel child fails or is cancelled, the parent and all other children are also cancelled. This behaviour is intended for closely-related clusters of jobs, e.g. high availability tests, where it’s sensible to assume the entire test is invalid if any of its components fails. A special variable can be used to change this behaviour. Setting a parallel parent job’s PARALLEL_CANCEL_WHOLE_CLUSTER to any truth-y value (e.g. 1 or 'true') changes this so that, if one of its children fails or is cancelled but the parent has other pending or active children, the parent and the other children will not be cancelled. This behaviour makes more sense if the parent is providing services to the various children but the children themselves are not closely related and a failure of one does not imply that the tests run by the other children and the parent are invalid.

Examples
Specify machine explicitly

Assume there is a test suite A which is supposed to run on machine 64bit-8G. Additionally, test suite B is supposed to run on machine 64bit-1G. That means test suite B needs the variable START_AFTER_TEST=A@64bit-8G. This results in the following dependency:

A@64bit-8G --> B@64bit-1G

Implicitly inherit machines from parent

Assume test suite A is supposed to run on the machines 64bit and ppc. Additionally, test suite B is supposed to run on both of these machines as well. This can be achieved by simply adding the variable START_AFTER_TEST=A to test suite B (omitting the machine altogether). openQA takes the best matches. This results in the following dependencies:

A@64bit --> B@64bit
A@ppc --> B@ppc

Conflicting machines prevent inheritance from parent

Assume test suite A is supposed to run on machine 64bit-8G. Additionally, test suite B is supposed to run on machine 64bit-1G.

Adding the variable START_AFTER_TEST=A to test suite B will not work. That means openQA will not create a job dependency and instead shows an error message. So it is required to explicitly define the variable as START_AFTER_TEST=A@64bit-8G in that case.

Consider a different example: Assume test suite A is supposed to run on the machines ppc, 64bit and s390x. Additionally, there are three test suites: B on ppc-1G, C on ppc-2G and D on ppc64le.

Adding the variable PARALLEL_WITH=A@ppc to the test suites B, C and D will result in the following dependencies:

            A@ppc
              ^
           /  |  \
         /    |    \
B@ppc-1G  C@ppc-2G  D@ppc64le

openQA will also show errors that test suite A is not necessary on the machines 64bit and s390x.

Implicitly creating a dependency on same machine

Assume the value of the variable START_AFTER_TEST or PARALLEL_WITH only contains a test suite name but no machine (e.g. START_AFTER_TEST=A,B or PARALLEL_WITH=A,B).

In this case openQA will create job dependencies that are scheduled on the same machine if all test suites are placed on the same machine.

Notes regarding directly chained dependencies

Having multiple jobs with START_DIRECTLY_AFTER_TEST pointing to the same parent job is possible, e.g.:

   --> B --> C
 /
A
 \
   --> D --> E

Of course, only one of B and D can actually be started directly after A. However, the use of START_DIRECTLY_AFTER_TEST still makes sure that no completely different job is executed in between and that all of these jobs are executed on the same worker.

The directly chained sub-trees are executed in alphabetical order. So the above tree would result in the following execution order: A, B, C, D, E.

If A fails, none of the other jobs are executed. If B fails, C is not executed, but D and E are. The assumption is that the average error case does not leave the system in a completely broken state and that any required cleanup is done in the post fail hook.
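The ordering and failure rules for the tree above can be modelled in a few lines of standalone Perl. This is a simplified sketch, not the scheduler's implementation; the %children hash and the function execution_order are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical model of the tree above: each job maps to its directly chained children.
my %children = (A => ['B', 'D'], B => ['C'], C => [], D => ['E'], E => []);

# Walk the tree depth-first, visiting sub-trees in alphabetical order.
# Jobs listed in %$failing are treated as failed, so their sub-tree is skipped.
sub execution_order {
    my ($job, $failing) = @_;
    my @order = ($job);
    return @order if $failing->{$job};
    push @order, execution_order($_, $failing) for sort @{$children{$job}};
    return @order;
}

print join(', ', execution_order('A', {})), "\n";          # no failures
print join(', ', execution_order('A', {B => 1})), "\n";    # B fails, so C is skipped
```

The first call lists all five jobs in the order A, B, C, D, E; in the second call the failed B is still listed (it was attempted) but C is not.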

Directly chained dependencies and regularly chained dependencies can be mixed. This makes it possible to create a dependency tree which contains multiple directly chained sub-trees. Be aware that these sub-trees might be executed on different workers and, depending on the tree, even be executed in parallel.

Worker requirements

CHAINED and DIRECTLY_CHAINED dependencies require only one worker. PARALLEL dependencies on the other hand require as many free workers as jobs are present in the parallel cluster.

Examples
Listing 1. CHAINED - i.e. test basic functionality before going advanced - requires 1 worker
A --> B --> C

Define test suite A,
then define B with variable START_AFTER_TEST=A and then define C with START_AFTER_TEST=B

-or-

Define test suite A, B
and then define C with START_AFTER_TEST=A,B
In this case however the start order of A and B is not specified.
But C will start only after A and B are successfully done.
Listing 2. PARALLEL basic High-Availability
A
^
B

Define test suite A
and then define B with variable PARALLEL_WITH=A.
A in this case is the parent test suite of B and must be running throughout B's run.
Listing 3. PARALLEL with multiple parents - i.e. complex support requirements for one test - requires 4 workers
A B C
\ | /
  ^
  D

Define test suites A,B,C
and then define D with PARALLEL_WITH=A,B,C.
A,B,C run in parallel and are parent test suites for D; all of them must run until D finishes.
Listing 4. PARALLEL with one parent - i.e. running independent tests against one server - requires at least 2 workers
   A
   ^
  /|\
 B C D

Define test suite A
and then define B,C,D with PARALLEL_WITH=A
A is the parent test suite for B, C, D (all can run in parallel).
Children B, C, D can run and finish anytime, but A must run until all of B, C and D finish.

Test synchronization and locking API

openQA provides a locking API. To use it in your test files, import the lockapi package (use lockapi;). It provides the following functions: mutex_create, mutex_lock, mutex_unlock and mutex_wait.

Each of these functions takes the name of the mutex lock as first parameter. The name must not contain the "-" character. Mutex locks are associated with the caller’s job.

mutex_lock tries to lock the mutex for the caller’s job. The mutex_lock call blocks if the mutex does not exist or has been locked by a different job.

mutex_unlock tries to unlock the mutex. If the mutex is locked by a different job, mutex_unlock call blocks until the lock becomes available. If the mutex does not exist the call returns immediately without doing anything.

mutex_wait is a combination of mutex_lock and mutex_unlock. It displays more information about mutex state (time spent waiting, location of the lock). Use it if you need to wait for a specific action from single place (e.g. that Apache is running on the master node).

mutex_create creates a new mutex which is initially unlocked. If the mutex already exists the call returns immediately without doing anything.

Mutexes are addressed by their name. Each cluster of parallel jobs (defined via PARALLEL_WITH dependencies) has its own namespace. That means concurrently running jobs in different parallel job clusters use distinct mutexes (even if the same names are used).
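The effect of per-cluster namespaces can be pictured with a toy in-memory model. It runs as plain Perl without openQA and is not how lockapi is implemented; the names model_mutex_create and model_mutex_try_lock, the cluster names and the job IDs are all hypothetical. Where the real mutex_lock would block, the model simply returns 0.

```perl
use strict;
use warnings;

# Toy in-memory model, not the openQA implementation. Locks are stored per
# cluster, so the same name in two clusters refers to two different mutexes.
my %locks;    # $locks{$cluster}{$name} = ID of the job holding the lock, 0 if free

sub model_mutex_create {
    my ($cluster, $name) = @_;
    die "mutex name must not contain '-'\n" if index($name, '-') >= 0;
    $locks{$cluster}{$name} //= 0;    # creating an existing mutex changes nothing
    return 1;
}

# Returns 1 if the lock was taken and 0 where the real mutex_lock would block.
sub model_mutex_try_lock {
    my ($cluster, $name, $job_id) = @_;
    my $mutexes = $locks{$cluster} // {};
    return 0 unless exists $mutexes->{$name};                         # not created yet
    return 0 if $mutexes->{$name} && $mutexes->{$name} != $job_id;    # held by another job
    $mutexes->{$name} = $job_id;
    return 1;
}

model_mutex_create('cluster_1', 'ftp_service_ready');
print model_mutex_try_lock('cluster_1', 'ftp_service_ready', 11), "\n";    # job 11 takes the lock
print model_mutex_try_lock('cluster_1', 'ftp_service_ready', 12), "\n";    # job 12 would block
print model_mutex_try_lock('cluster_2', 'ftp_service_ready', 21), "\n";    # no such mutex in cluster_2
```

Job 21 cannot take the lock even though the name matches, because the mutex was only created in the namespace of the other cluster.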

The mmapi package provides wait_for_children which the parent can use to wait for the children to complete.

use lockapi;
use mmapi;

# On parent job
sub run {
    # ftp service started automatically on boot
    assert_screen 'login', 300;

    # unlock by creating the lock
    mutex_create 'ftp_service_ready';

    # wait until all children finish
    wait_for_children;
}

# On child we wait for ftp server to be ready
sub run {
    # wait until ftp service is ready
    # performs mutex lock & unlock internally
    mutex_wait 'ftp_service_ready';

    # connect to ftp and start downloading
    script_run 'ftp parent.job.ip';
    script_run 'get random_file';
}

# Mutexes can also be used to guarantee exclusive access to a resource
# Example on child when only one job should access ftp at a time
sub run {
    # wait until ftp service is ready
    mutex_lock 'ftp_service_ready';

    # Perform operation with exclusive access
    script_run 'ftp parent.job.ip';
    script_run 'put only_i_am_here';
    script_run 'bye';

    # Allow other jobs to connect afterwards
    mutex_unlock 'ftp_service_ready';
}

Sometimes it is useful to wait for a certain action from the child or sibling job rather than the parent. In this case the child or sibling will create a mutex and any cluster job can lock/unlock it.

The child can however die at any time. To prevent parent deadlock in this situation, it is required to pass the mutex owner’s job ID as a second parameter to mutex_lock and mutex_wait. The mutex owner is the job that creates the mutex. If a child job with a given ID has already finished, mutex_lock calls die. The job ID is also required when unlocking such a mutex.

Example of mmapi: Parent job waits until the child reaches a given point
use lockapi;
use mmapi;

sub run {
    my $children = get_children();

    # let's suppose there is only one child
    my $child_id = (keys %$children)[0];

    # this blocks until the lock is available and then does nothing
    mutex_wait('child_reached_given_point', $child_id);

    # continue with the test
}

Mutexes are a way to wait for specific events from a single job. When we need multiple jobs to reach a certain state we need to use barriers.

To create a barrier, call barrier_create with the parameters name and count. The name serves as an ID (same as with mutexes). The count parameter specifies the number of jobs that need to call barrier_wait to unlock the barrier.

There is an optional barrier_wait parameter called check_dead_job. When used, it will kill all jobs waiting in barrier_wait if one of the cluster jobs dies. This prevents jobs from waiting for a state that will never be reached and eventually dying on job timeout. It should be set only on one of the barrier_wait calls.

An example would be one master and three worker jobs and you want to do initial setup in the three worker jobs before starting main actions. In such a case you might use check_dead_job to avoid useless actions when one of the worker jobs dies.
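The counting behaviour of a barrier can be sketched as a toy model in standalone Perl. It only counts arrivals; there is no blocking and no handling of dead jobs, and the function names model_barrier_create and model_barrier_arrive are hypothetical, not part of lockapi.

```perl
use strict;
use warnings;

# Toy model of the counting behaviour only (no blocking, no job states).
my %barriers;

sub model_barrier_create {
    my ($name, $count) = @_;
    $barriers{$name} = {count => $count, arrived => 0};
    return 1;
}

# Registers one more job at the barrier and returns 1 once enough jobs have
# arrived to release it, 0 while the real barrier_wait would still block.
sub model_barrier_arrive {
    my ($name) = @_;
    my $barrier = $barriers{$name} or die "unknown barrier $name\n";
    $barrier->{arrived}++;
    return $barrier->{arrived} >= $barrier->{count} ? 1 : 0;
}

model_barrier_create('NODES_CONFIGURED', 4);
my @released = map { model_barrier_arrive('NODES_CONFIGURED') } 1 .. 4;
print "@released\n";    # only the fourth arrival releases the barrier
```

With one master and three worker jobs the count is 4, so the barrier opens only once the last of the four jobs has arrived.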

Example of barriers: Check for dead jobs while waiting for barrier
use lockapi;

# In main.pm
barrier_create('NODES_CONFIGURED', 4);

# On master job
sub run {
    assert_screen 'login', 300;

    # Master is ready, waiting while workers are configured (check_dead_job is optional)
    barrier_wait {name => "NODES_CONFIGURED", check_dead_job => 1};

    # When 4 jobs called barrier_wait they are all unblocked
    script_run 'create_cluster';
    script_run 'test_cluster';

    # Notify all nodes that we are finished
    mutex_create 'CLUSTER_CREATED';
    wait_for_children;
}

# On 3 worker jobs
sub run {
    assert_screen 'login', 300;

    # do initial worker setup
    script_run 'zypper in HA';
    script_run 'echo IP > /etc/HA/node_setup';

    # Join the group of jobs waiting for each other
    barrier_wait 'NODES_CONFIGURED';

    # Don't finish until cluster is created & tested
    mutex_wait 'CLUSTER_CREATED';
}

Getting information about parents and children

Example of mmapi: Getting info about parents / children
use base "basetest";
use strict;
use testapi;
use mmapi;

sub run {
    # returns a hash ref containing (id => state) for all children
    my $children = get_children();

    for my $job_id (keys %$children) {
      print "$job_id is cancelled\n" if $children->{$job_id} eq 'cancelled';
    }

    # returns an array ref with parent ids, all parents are in running state (see Job dependencies above)
    my $parents = get_parents();

    # let's suppose there is only one parent
    my $parent_id = $parents->[0];

    # any job id can be queried for details with get_job_info()
    # it returns a hash ref containing these keys:
    #   name priority state result worker_id
    #   t_started t_finished test
    #   group_id group settings
    my $parent_info = get_job_info($parent_id);

    # it is possible to query variables set by openqa frontend,
    # this does not work for variables set by backend or by the job at runtime
    my $parent_name = $parent_info->{settings}->{NAME};
    my $parent_desktop = $parent_info->{settings}->{DESKTOP};
    # !!! this does not work, VNC is set by backend !!!
    # my $parent_vnc = $parent_info->{settings}->{VNC};
}

Support Server based tests

The idea is to have a dedicated "helper server" to allow advanced network based testing.

The support server takes advantage of the basic parallel setup as described in the previous section, with the support server being the parent test 'A' and the test needing it being the child test 'B'. This ensures that the test 'B' always has the support server available.

Preparing the supportserver

The support server image is created by calling a special test, based on the autoyast test:

/usr/share/openqa/script/client jobs post DISTRI=opensuse VERSION=13.2 \
    ISO=openSUSE-13.2-DVD-x86_64.iso  ARCH=x86_64 FLAVOR=Server-DVD \
    TEST=supportserver_generator MACHINE=64bit DESKTOP=textmode  INSTALLONLY=1 \
    AUTOYAST=supportserver/autoyast_supportserver.xml SUPPORT_SERVER_GENERATOR=1 \
    PUBLISH_HDD_1=supportserver.qcow2

This produces a QEMU image 'supportserver.qcow2' that contains the supportserver. The 'autoyast_supportserver.xml' should define the correct user and password, as well as packages and the common configuration.

The more specific role the supportserver should take is then selected when the server is run in the actual test scenario.

Using the supportserver

In the Test suites, the supportserver is defined by setting:

HDD_1=supportserver.qcow2
SUPPORT_SERVER=1
SUPPORT_SERVER_ROLES=pxe,qemuproxy
WORKER_CLASS=server,qemu_autoyast_tap_64

where the SUPPORT_SERVER_ROLES defines the specific role (see code in 'tests/support_server/setup.pm' for available roles and their definition), and HDD_1 variable must be the name of the supportserver image as defined via PUBLISH_HDD_1 variable during supportserver generation. If the support server is based on older SUSE versions (opensuse 11.x, SLE11SP4..) it may also be needed to add HDDMODEL=virtio-blk. In case of QEMU backend, one can also use BOOTFROM=c, for faster boot directly from the HDD_1 image.

Then for the 'child' test using this supportserver, the following additional variable must be set: PARALLEL_WITH=supportserver-pxe-tftp where 'supportserver-pxe-tftp' is the name given to the supportserver in the test suites screen. Once the tests are defined, they can be added to openQA in the usual way:

/usr/share/openqa/script/client isos post DISTRI=opensuse VERSION=13.2 \
        ISO=openSUSE-13.2-DVD-x86_64.iso ARCH=x86_64 FLAVOR=Server-DVD

where the DISTRI, VERSION, FLAVOR and ARCH correspond to the job group containing the tests. Note that the networking is provided by tap devices, so both jobs should run on machines defined by (apart from others) having NICTYPE=tap, WORKER_CLASS=qemu_autoyast_tap_64.

Example of Support Server: a simple tftp test

Let’s assume that we want to test tftp client operation. For this, we set up the supportserver as a tftp server:

HDD_1=supportserver.qcow2
SUPPORT_SERVER=1
SUPPORT_SERVER_ROLES=dhcp,tftp
WORKER_CLASS=server,qemu_autoyast_tap_64

with the test suite name supportserver-opensuse-tftp.

The actual test 'child' job will then have to set PARALLEL_WITH=supportserver-opensuse-tftp, and also other variables according to the test requirements. For convenience, we have also started a dhcp server on the supportserver, but even without it, the network could be set up manually by assigning a free IP address (e.g. 10.0.2.15) on the system of the test job.

Example of Support Server: The code in the *.pm module doing the actual tftp test could then look something like the example below
use strict;
use base 'basetest';
use testapi;

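sub run {
  # Assumption: the supportserver address is passed in via a job setting;
  # the setting name SERVER_IP is hypothetical, adjust it to your setup
  my $server_ip = get_var('SERVER_IP');
  my $script="set -e -x\n";
  $script.="echo test >test.txt\n";
  $script.="time tftp ".$server_ip." -c put test.txt test2.txt\n";
  $script.="time tftp ".$server_ip." -c get test2.txt\n";
  $script.="diff -u test.txt test2.txt\n";
  script_output($script);
}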

assuming of course, that the tested machine was already set up with necessary infrastructure for tftp, e.g. network was set up, tftp rpm installed and tftp service started, etc. All of this could be conveniently achieved using the autoyast installation, as shown in the next section.

Example of Support Server: autoyast based tftp test

Here we will use autoyast to set up the system of the test job and the os-autoinst autoyast testing infrastructure. For the supportserver, this means using a proxy to access QEMU provided data, for downloading the autoyast profile and the tftp verify script:

HDD_1=supportserver.qcow2
SUPPORT_SERVER=1
SUPPORT_SERVER_ROLES=pxe,qemuproxy
WORKER_CLASS=server,qemu_autoyast_tap_64

The actual test 'child' job will then be defined as:

AUTOYAST=autoyast_opensuse/opensuse_autoyast_tftp.xml
AUTOYAST_VERIFY=autoyast_opensuse/opensuse_autoyast_tftp.sh
DESKTOP=textmode
INSTALLONLY=1
PARALLEL_WITH=supportserver-opensuse-tftp

again assuming the support server’s name is supportserver-opensuse-tftp. Note that the pxe role already contains the tftp and dhcp server roles, since they are needed for the pxe boot to work.

Example of Support Server: The tftp test defined in the autoyast_opensuse/opensuse_autoyast_tftp.sh file could be something like:
set -e -x
echo test >test.txt
time tftp #SERVER_URL# -c put test.txt test2.txt
time tftp #SERVER_URL# -c get test2.txt
diff -u test.txt test2.txt && echo "AUTOYAST OK"

and the rest is done automatically, using already prepared test modules in tests/autoyast subdirectory.

Using text consoles and the serial terminal

Typically the OS you are testing will boot into a graphical shell, e.g. the GNOME desktop environment. This is fine if you wish to test a program with a GUI, but in many situations you will need to enter commands into a textual shell (e.g. Bash), TTY, text terminal, command prompt, TUI etc.

openQA has two basic methods for interacting with a text shell. The first uses the same input and output methods as when interacting with a GUI, plus a serial port for getting raw text output from the SUT. This is primarily implemented with VNC and so I will refer to it as the VNC text console.

The serial port device which is used with the VNC text console is the default virtual serial port device in QEMU (i.e. the device configured with the -serial command line option). I will refer to this as the "default serial port". openQA currently only uses this serial port for one way communication from the SUT to the host.

The second method uses another serial port for both input and output. The SUT attaches a TTY to the serial port which os-autoinst logs into. All communication is therefore text based, similar to if you SSH’d into a remote machine. This is called the serial terminal console (or the virtio console, see implementation section for details).

The VNC text console is very slow and expensive relative to the serial terminal console, but allows you to continue using assert_screen and is more widely supported. Below is an example of how to use the VNC text console.

To access a text based console or TTY, you can do something like the following.

use 5.018;
use warnings;
use base 'opensusebasetest';
use testapi;
use utils;

sub run {
    wait_boot;  # Utility function defined by the SUSE distribution
    select_console 'root-console';
}

1;

This will select a text TTY and login as the root user (if necessary). Now that we are on a text console it is possible to run scripts and observe their output either as raw text or on the video feed.

Note that root-console is defined by the distribution, so on different distributions or operating systems this can vary. There are also many utility functions that wrap select_console, so check your distribution’s utility library before using it directly.

Running a script: Using the assert_script_run and script_output commands
assert_script_run('cd /proc');
my $cpuinfo = script_output('cat cpuinfo');
if($cpuinfo =~ m/avx2/) {
    # Do something which needs avx2
}
else {
    # Do some workaround
}

This returns the contents of the SUT’s /proc/cpuinfo file to the test script and then searches it for the term 'avx2' using a regex.

The script_run and script_output are high level commands which use type_string and wait_serial underneath. Sometimes you may wish to use lower level commands which give you more control, but be warned that it may also make your code less portable.

The command wait_serial watches the SUT’s serial port for text output and matches it against a regex. type_string sends a string to the SUT like it was typed in by the user over VNC.

Using a serial terminal

Important
You need a QEMU version >= 2.6.1 and to set the VIRTIO_CONSOLE variable to 1 to use this with the QEMU backend.

Usually openQA controls the system under test using VNC. This allows the use of both graphical and text based consoles. Key presses are sent individually as VNC commands and output is returned in the form of screen images and text output from the SUT’s default serial port.

Sending key presses over VNC is very slow, so for tests which send a lot of text commands it is much faster to use a serial port for both sending shell commands and receiving program output.

Communicating entirely using text also means that you no longer have to worry about your needles being invalidated due to a font change or similar. It is also much cheaper to transfer text and test it against regular expressions than encode images from a VNC feed and test them against sample images (needles).

On the other hand you can no longer use assert_screen or take a screen shot because the text is never rendered as an image. A lot of programs will also send ANSI escape sequences which will appear as raw text to the test script instead of being interpreted by a terminal emulator which then renders the text.

select_console('root-virtio-terminal');  # Selects a virtio based serial terminal

The above code will cause type_string and wait_serial to write and read from a virtio serial port. A distribution specific call back will be made which allows os-autoinst to log into a serial terminal session running on the SUT. Once select_console returns you should be logged into a TTY as root.

Note
For os-autoinst-distri-opensuse tests, the preferred way is to use the wrapper select_serial_terminal(), which handles all backends, instead of calling select_console('root-virtio-terminal') directly:
# Selects a virtio based serial terminal if available or fallback to the best suitable console
# for the current backend.
select_serial_terminal();

If you are struggling to visualise what is happening, imagine SSH-ing into a remote machine as root, you can then type in commands and read the results as if you were sat at that computer. What we are doing is much simpler than using an SSH connection (it is more like using GNU screen with a serial port), but the end result looks quite similar.

As mentioned above, changing input and output to a serial terminal has the effect of changing where wait_serial reads output from. On a QEMU VM wait_serial usually reads from the default serial port which is also where the kernel log is usually output to.

When switching to a virtio based serial terminal, wait_serial will then read from a virtio serial port instead. However the default serial port still exists and can receive output. Some utility library functions are hard coded to redirect output to the default serial port and expect that wait_serial will be able to read it. Usually it is not too difficult to fix the utility function, you just need to remove some redirection from the relevant shell command.

Another common problem is that some library or utility function tries to take a screen shot. The hard part is finding what takes the screen shot, but then it is just a simple case of checking is_serial_terminal and not taking the screen shot if we are on a serial terminal console.

Distributions usually wrap select_console, so instead of using it directly, you can use something like the following, which is from the openSUSE test suite.

if (select_serial_terminal()) {
        # Do something which only works, or is necessary, on a serial terminal
}

This selects the virtio based serial terminal console if possible. If it is available then it returns true. It is also possible to check if the current console is a serial terminal by calling is_serial_terminal.

Once you have selected a serial terminal, the video feed will disappear from the live view; however, at the bottom of the live screen there is a separate text feed. After the test has finished you can view the serial log(s) in the assets tab. You will probably have two serial logs: serial0.txt, which is written from the default serial port, and serial_terminal.txt.

Now that you are on a serial terminal console everything will start to go a lot faster. So much faster in fact that race conditions become a big issue. Generally these can be avoided by using the higher level functions such as script_run and script_output.

It is rarely necessary to use the lower level functions, however it helps to recognise problems caused by race conditions at the lower level, so please read the following section regardless.

So if you do need to use type_string and wait_serial directly then try to use the following pattern:

1) Wait for the terminal prompt to appear.
2) Send your command.
3) Wait for your command text to be echoed by the shell (if applicable).
4) Send enter.
5) Wait for your command output (if applicable).

To illustrate, here is a snippet from the LTP test runner which uses the lower level commands to achieve a little bit more control. I have numbered the lines which correspond to the steps above.

my $fin_msg    = "### TEST $test->{name} COMPLETE >>> ";
my $cmd_text   = qq($test->{command}; echo "$fin_msg\$?");
my $klog_stamp = "echo 'OpenQA::run_ltp.pm: Starting $test->{name}' > /dev/$serialdev";

# More variables and other stuff

if (is_serial_terminal) {
        script_run($klog_stamp);
        wait_serial(serial_term_prompt(), undef, 0, no_regex => 1);    # Step 1
        type_string($cmd_text);                                        # Step 2
        wait_serial($cmd_text, undef, 0, no_regex => 1);               # Step 3
        type_string("\n");                                             # Step 4
} else {
        # Non-serial-terminal console code (e.g. the VNC console)
}
my $test_log = wait_serial(qr/$fin_msg\d+/, $timeout, 0, record_output => 1); #Step 5

The first wait_serial (Step 1) ensures that the shell prompt has appeared. If we do not wait for the shell prompt then it is possible that we can send input to whatever command was run before. In this case that command would be 'echo' which is used by script_run to print a 'finished' message.

It is possible that echo was able to print the finish message, but was then suspended by the OS before it could exit. In which case the test script is able to race ahead and start sending input to echo which was intended for the shell. Waiting for the shell prompt stops this from happening.

INFO: It appears that echo does not read STDIN in this case, and so the input will stay inside STDIN’s buffer and be read by the shell (Bash). Unfortunately this results in the input being displayed twice: once by the terminal’s echo (explained later) and once by Bash. Depending on your configuration the behavior could be completely different.

The function serial_term_prompt is a distribution specific function which returns the characters previously set as the shell prompt (e.g. export PS1="# ", see the bash(1) or dash(1) man pages). If you are adapting a new distribution to use the serial terminal console, then we recommend setting a simple shell prompt and keeping track of it with utility functions.

The no_regex argument tells wait_serial to use simple string matching instead of regular expressions, see the implementation section for more details. The other arguments are the timeout (undef means we use the default) and a boolean which inverts the result of wait_serial. These are explained in the os-autoinst/testapi.pm documentation.

Then the test script enters our command with type_string (Step 2) and waits for the command’s text to be echoed back by the system under test. Terminals usually echo back the characters sent to them so that the user can see what they have typed.

However this can be disabled (see the stty(1) man page) or possibly even unimplemented on your terminal. So this step may not be applicable, but it provides some error checking so you should think carefully before disabling echo deliberately.

We then consume the echo text (Step 3) before sending enter, to both check that the correct text was received and also to separate it from the command output. It also ensures that the text has been fully processed before sending the newline character which will cause the shell to change state.

It is worth reminding oneself that we are sending and receiving data extremely quickly on an interface usually limited by human typing speed. So any string which results in a significant state change should be treated as a potential source of race conditions.

Finally we send the newline character and wait for our custom finish message. record_output is set to ensure all the output from the SUT is saved (see the next section for more info).

What we do not do at this point, is wait for the shell prompt to appear. That would consume the prompt character breaking the next call to script_run.

We choose to wait for the prompt just before sending a command, rather than after it, so that Step 5 can be deferred to a later time. In theory this allows the test script to perform some other work while the SUT is busy.

Sending new lines and continuation characters

The following command will time out: script_run("echo \"1\n2\""). The reason is that script_run will call wait_serial("echo \"1\n2\"") to check that the command was entered successfully and echoed back (see above for an explanation of serial terminal echo; note that the echo shell command has not been executed yet). However the shell will translate the newline characters into a newline character plus '>', so we will get something similar to the following output.

echo "1
> 2"

The '>' is unexpected and will cause the match to fail. One way to fix this is simply to do echo -e \"1\\n2\". In this case Perl will not replace \n with a newline character, instead it will be passed to echo which will do the substitution instead (note the '-e' switch for echo).
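The difference between the two forms can be checked in plain Perl without any openQA code. The sketch below only inspects the strings that would be handed to script_run; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $interpolated = "echo \"1\n2\"";        # Perl replaces \n with a real newline
my $escaped      = "echo -e \"1\\n2\"";    # Perl keeps a backslash followed by n

# tr/\n// without a replacement list only counts the matching characters
my $newlines_interpolated = ($interpolated =~ tr/\n//);
my $newlines_escaped      = ($escaped      =~ tr/\n//);

print "interpolated: $newlines_interpolated newline character(s)\n";
print "escaped:      $newlines_escaped newline character(s)\n";
print "escaped string as sent to the shell: $escaped\n";
```

Only the first string contains a real newline character, which is what makes the shell print its continuation prompt.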

In general you should be aware that Perl, the guest kernel and the shell may transform whatever character sequence you enter. Transformations can be spotted by comparing the input string with what wait_serial actually finds.

Sending signals - ctrl-c and ctrl-d

On a VNC based console you simply use send_key as follows.

send_key('ctrl-c');

This usually (see termios(3)) has the effect of sending SIGINT to whatever command is running. Most commands terminate upon receiving this signal (see signal(7)).

On a serial terminal console the send_key command is not implemented (see implementation section). So instead the following can be done to achieve the same effect.

type_string('', terminate_with => 'ETX');

The ETX ASCII code means End of Text and usually results in SIGINT being raised. In fact pressing ctrl-c may just be translated into ETX, so you might consider this a more direct method. Also you can use 'EOT' to do the same thing as pressing ctrl-d.

You also have the option of using Perl’s control character escape sequences in the first argument to type_string. So you can also send ETX with:

type_string("\cC");

The terminate_with parameter just exists to display intention. It is also possible to send any character using the hex code like '\x0f' which may have the effect of pressing the magic SysRq key if you are lucky.
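The relation between the ctrl key combinations and the characters named above follows the usual ASCII convention. As a hedged illustration in Python (the helper control_char is hypothetical and not part of the test API), the control character is the letter's code with bit 0x40 cleared:

```python
def control_char(letter: str) -> str:
    """Return the ASCII control character conventionally produced by ctrl-<letter>."""
    return chr(ord(letter.upper()) ^ 0x40)


ETX = control_char("c")  # the character Perl writes as "\cC"
EOT = control_char("d")  # the character Perl writes as "\cD"

print(hex(ord(ETX)), hex(ord(EOT)))  # 0x3 0x4
print(control_char("o") == "\x0f")   # True
```

Whether the SUT turns ETX into SIGINT still depends on its terminal configuration, as noted above.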

The virtio serial terminal implementation

The os-autoinst package supports several types of 'consoles' of which the virtio serial terminal is one. The majority of code for this console is located in consoles/virtio_terminal.pm and consoles/serial_screen.pm (also used by the svirt serial console). However there is also related code in backends/qemu.pm and distribution.pm.

You may find it useful to read the documentation in virtio_terminal.pm and serial_screen.pm if you need to perform some special action on a terminal such as triggering a signal or simulating the SysRq key. There are also some console specific arguments to wait_serial and type_string such as record_output.

The virtio 'screen' essentially reads data from a socket created by QEMU into a ring buffer and scans it after every read with a regular expression. The ring buffer is large enough to hold anything you are likely to want to match against, but not so large as to cause performance issues. Usually the contents of this ring buffer, up to the end of the match, are returned by wait_serial. This means earlier output will be overwritten once the ring buffer’s length is exceeded. However you can pass record_output which saves the output to a separate unlimited buffer and returns that instead.

Like record_output, the no_regex argument is a console specific argument supported by the serial terminal console. It may or may not have some performance benefits, but more importantly it allows you to easily match arbitrary strings which may contain regex escape sequences. To be clear, no_regex hints that wait_serial should just treat its input as a plain string and use the Perl library function index to search for a match in the ring buffer.
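The behaviour of the ring buffer, record_output and no_regex can be illustrated with a toy model. This is a simplified sketch in Python under stated assumptions, not the code in serial_screen.pm: the class SerialScreenSketch, its methods and the tiny capacity are all hypothetical, and details such as timeouts and consuming matched data are left out.

```python
import re


class SerialScreenSketch:
    """Toy model of a bounded ring buffer that is scanned for a match."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.ring = ""    # bounded: the oldest characters are dropped first
        self.record = ""  # unbounded copy, returned when record_output is set

    def read(self, chunk: str) -> None:
        """Append newly received data, keeping only the last `capacity` characters."""
        self.record += chunk
        self.ring = (self.ring + chunk)[-self.capacity:]

    def match(self, pattern: str, record_output: bool = False, no_regex: bool = False):
        """Return the buffered output up to the end of the match, or None."""
        if no_regex:
            start = self.ring.find(pattern)  # plain substring search
            end = start + len(pattern) if start >= 0 else -1
        else:
            found = re.search(pattern, self.ring)
            end = found.end() if found else -1
        if end < 0:
            return None
        if record_output:
            dropped = len(self.record) - len(self.ring)
            return self.record[:dropped + end]
        return self.ring[:end]


screen = SerialScreenSketch(capacity=8)
screen.read("0123456789")
screen.read("OK\n")
print(screen.match("OK"))                      # 56789OK
print(screen.match("OK", record_output=True))  # 0123456789OK
```

With no_regex the pattern "a.b" only matches the literal text "a.b", whereas as a regular expression it would also match "axb".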

The send_key function is not implemented for the serial terminal console because the openQA console implementation would need to map key actions like ctrl-c to a character and then send that character. This may mislead some people into thinking they are actually sending ctrl-c to the SUT and also requires openQA to choose what character ctrl-c represents, which varies across terminal configurations.

Very little of the code (perhaps none) is specific to a virtio based serial terminal and can be reused with a physical serial port, SSH socket, IPMI or some other text based interface. It is called the virtio console because the current implementation just uses a virtio serial device in QEMU (and it could easily be converted to an emulated port), but it otherwise has nothing to do with the virtio standard and so you should avoid using the name 'virtio console' unless specifically referring to the QEMU virtio implementation.

As mentioned previously, ANSI escape sequences can be a pain. So we try to avoid them by informing the shell that it is running on a 'dumb' terminal (see the SUSE distribution’s serial terminal utility library). Some programs ignore this, but piping their output into tee is usually enough to stop them outputting non-printable characters.

Test Development tricks

Trigger new tests by modifying settings from existing test runs

To trigger new tests with custom settings the command line client openqa-client can be used. To trigger new tests relying on all settings from existing test runs but modifying specific settings the openqa-clone-job script can be used. Within the openQA repository the script is located at /usr/share/openqa/script/. This tool can be used to create a new job that adds, removes or changes settings.

openqa-clone-job --from localhost --host localhost 42 FOO=bar BAZ=

If you do not want a cloned job to start up in the same job group as the job you cloned from, e.g. to not pollute build results, the job group can be overwritten, too, using the special variable _GROUP. Add the quoted group name, e.g.:

openqa-clone-job --from localhost 42 _GROUP="openSUSE Tumbleweed"

The special group value 0 means that the group connection will be separated and the job will not appear as a job in any job group, e.g.:

openqa-clone-job --from localhost 42 _GROUP=0

Backend variables for faster test execution

The os-autoinst backend offers multiple test variables which are helpful for test development. For example:

  • Set _EXIT_AFTER_SCHEDULE=1 if you only want to evaluate the test schedule before the test modules are executed

  • Use _SKIP_POST_FAIL_HOOKS=1 to prevent lengthy post_fail_hook execution in case of expected and known test failures, for example when you need to create needles anyway

Using snapshots to speed up development of tests

To lower turn-around times during test development based on virtual machines, the QEMU backend provides a feature that allows a job to start from a snapshot.

Depending on the use case, there are two options to help:

  • Create and preserve snapshots for every test module run (MAKETESTSNAPSHOTS)

    • Offers more flexibility as the test can be resumed almost at any point. However disk space requirements are high (expect more than 30GB for one job)

    • This mode is useful for fixing non-fatal issues in tests and debugging SUT as more than just the snapshot of the last failed module is saved.

  • Create a snapshot after every successful test module while always overwriting the existing snapshot to preserve only the latest (TESTDEBUG)

    • Allows skipping to just before the start of the first failed test module, which can be limiting, but preserves disk space in comparison to MAKETESTSNAPSHOTS.

    • This mode is useful for iterative test development

In both modes there is no need to modify tests (i.e. adding the milestone test flag, as the behaviour is implied). In the latter mode every test module is also considered fatal. This means the job is aborted after the first failed test module.

Enable snapshots for each module

  • Run the worker with --no-cleanup parameter. This will preserve the hard disks after test runs.

  • Set MAKETESTSNAPSHOTS=1 on a job. This will make openQA save a snapshot for every test module run. One way to do that is by cloning an existing job and adding the setting:

openqa-clone-job --from https://openqa.opensuse.org  --host localhost 24 MAKETESTSNAPSHOTS=1
  • Create a job again, this time setting the SKIPTO variable to the snapshot you need. Again, openqa-clone-job comes in handy here:

openqa-clone-job --from https://openqa.opensuse.org  --host localhost 24 SKIPTO=consoletest-yast2_i
  • Use qemu-img snapshot -l something.img to find out what snapshots are in the image. Snapshots are named "test module category"-"test module name" (e.g. installation-start_install).

Storing only the last successful snapshot

  • Run the worker with --no-cleanup parameter. This will preserve the hard disks after test runs.

  • Set TESTDEBUG=1 on a job. This will make openQA save a snapshot after each successful test module run. Snapshots are overwritten. The snapshot is named lastgood in all cases.

openqa-clone-job --from https://openqa.opensuse.org  --host localhost 24 TESTDEBUG=1
  • Create a job again, this time setting the SKIPTO variable to the test module which failed in the previous run. Make sure the new job will also have TESTDEBUG=1 set. This can be ensured by the use of the clone_job script on the clone source job or by specifying the variable explicitly:

openqa-clone-job --from https://openqa.opensuse.org  --host localhost 24 TESTDEBUG=1 SKIPTO=consoletest-yast2_i

Defining a custom test schedule or custom test modules

Normally the test schedule, that is which test modules should be executed and in which order, is prescribed by the main.pm file within the test distribution. Additionally it is possible to exclude certain test modules from execution using the os-autoinst test variables INCLUDE_MODULES and EXCLUDE_MODULES as well as to define a custom schedule using the test variable SCHEDULE. Test modules can also be defined and overridden on-the-fly using a downloadable asset.

EXCLUDE_MODULES

If a job has the following schedule:

  • boot/boot_to_desktop

  • console/systemd_testsuite

  • console/docker

The module console/docker can be excluded with:

openqa-clone-job --from https://openqa.opensuse.org --host https://openqa.opensuse.org 24 EXCLUDE_MODULES=docker

The schedule would be:

  • boot/boot_to_desktop

  • console/systemd_testsuite

Note
Excluding modules that are not scheduled does not raise an error.

INCLUDE_MODULES

If a job has the following schedule:

  • boot/boot_to_desktop

  • console/systemd_testsuite

  • console/docker

The module console/docker can be left out by including only the other two modules:

openqa-clone-job --from https://openqa.opensuse.org --host https://openqa.opensuse.org 24 INCLUDE_MODULES=boot_to_desktop,systemd_testsuite

The schedule would be:

  • boot/boot_to_desktop

  • console/systemd_testsuite

Note
Including modules that are not scheduled does not raise an error, but they are not scheduled.
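The effect of both variables on the example schedule can be modelled as follows. This is a hedged sketch in Python, not the os-autoinst implementation: the function filter_schedule is hypothetical, it matches modules by their bare name only (as in the examples above), and applying exclusion after inclusion when both are given is an assumption of the sketch.

```python
def module_name(fullname: str) -> str:
    """Strip the category, e.g. 'console/docker' -> 'docker'."""
    return fullname.rsplit("/", 1)[-1]


def filter_schedule(schedule, include="", exclude=""):
    """Return the modules that remain scheduled for the given variable values."""
    included = {name for name in include.split(",") if name}
    excluded = {name for name in exclude.split(",") if name}
    remaining = []
    for fullname in schedule:
        name = module_name(fullname)
        if included and name not in included:
            continue  # INCLUDE_MODULES is set and does not list this module
        if name in excluded:
            continue  # EXCLUDE_MODULES lists this module
        remaining.append(fullname)
    return remaining


schedule = ["boot/boot_to_desktop", "console/systemd_testsuite", "console/docker"]

print(filter_schedule(schedule, exclude="docker"))
print(filter_schedule(schedule, include="boot_to_desktop,systemd_testsuite"))
```

Both calls print the same two remaining modules, boot/boot_to_desktop and console/systemd_testsuite. Names that do not occur in the schedule are silently ignored, matching the notes above.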

SCHEDULE

Additionally it is possible to define a custom schedule using the test variable SCHEDULE.

openqa-clone-job --from https://openqa.opensuse.org --host https://openqa.opensuse.org 24 SCHEDULE=tests/boot/boot_to_desktop,tests/console/consoletest_setup
Note
Any existing test module within CASEDIR can be scheduled.

SCHEDULE + ASSET_<NR>_URL

Test modules can be defined and overridden on-the-fly using a downloadable asset (combining ASSET_<NR>_URL and SCHEDULE).

For example one can schedule a job on a production instance with a custom schedule consisting of two modules from the provided test distribution plus one test module which is defined dynamically and downloaded as an asset from an external trusted download domain:

openqa-clone-job --from https://openqa.opensuse.org --host https://openqa.opensuse.org 24 SCHEDULE=tests/boot/boot_to_desktop,tests/console/consoletest_setup,foo,bar ASSET_1_URL=https://example.org/my/test/bar.pm  ASSET_2_URL=https://example.org/my/test/foo.pm
Note
The asset number doesn’t affect the schedule order.
The test modules foo.pm and bar.pm will be downloaded into the root of the pool directory where tests and assets are used by isotovideo. For this reason, to schedule them, no path is needed.

A valid test module format looks like this:

use base 'consoletest';
use strict;
use testapi;

sub run {
    select_console 'root-console';
    assert_script_run 'foo';
}

sub post_run_hook {}
1;

For example this can be used in bug investigations or for trying out new test modules which are hard to test locally. https://github.com/os-autoinst/os-autoinst/blob/master/doc/backend_vars.asciidoc describes the SCHEDULE parameter in detail as well as the others. The section "Asset handling" in the Users Guide describes how downloadable assets can be specified. It is important to note that the specified asset is only downloaded once. New versions must be supplied as new, unambiguous download target file names.

Triggering tests based on any remote Git refspec or open GitHub pull request

openQA also supports triggering tests using test code from an open pull request on GitHub or any branch or Git refspec. That means that code changes that are not yet available on a production instance of openQA can be tested safely to ensure they work as expected before merging them into a production repository and branch. This works by setting the CASEDIR parameter of os-autoinst to a valid Git repository path including an optional branch/refspec specifier. See https://github.com/os-autoinst/os-autoinst/blob/master/doc/backend_vars.asciidoc for details.

A helper script openqa-clone-custom-git-refspec is available for convenience that supports some combinations.

To clone one job within a remote instance based on an open github pull request the following syntax can be used:

openqa-clone-custom-git-refspec $GITHUB_PR_URL $OPENQA_TEST_URL

For example:

openqa-clone-custom-git-refspec https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6649 https://openqa.opensuse.org/tests/839191

Keep in mind that if PRODUCTDIR is overwritten it might not relate to the state of the specified git refspec. For example the asset caching functionality will override the PRODUCTDIR variable. The above method can still be used in this case in the common case that the test schedule does not need to be changed or in case a custom schedule is defined by SCHEDULE.

openQA test harness result processing

Introduction

From time to time, a test developer might want to use openQA to execute a test suite from a different test harness than openQA, but still use openQA to set up test scenarios and prepare the environment for a test suite run. For this case openQA has the ability to process logs from external harnesses and display the results integrated within the job results of the web UI.

One could say that a test harness is supported if its output is compatible with one of the available parser formats, such as LTP, xUnit or JUnit. This can easily be extended to include more formats, such as RSpec or TAP.

The requirements to use this functionality are quite simple:

  • The test harness must produce output in a format compatible with one of the supported parser formats.

  • The test results can be uploaded via testapi::parse_extra_log within an openQA test.

  • The test results can also be uploaded via the web API endpoint.

openQA will store these results in its own internal format for easier presentation, but still will allow the original file to be downloaded.

Usage

A test developer who wishes to use the functional interface can call testapi::parse_extra_log with the location of the generated file after the testing tool has finished its execution.

openQA test distribution

From within a common openQA test distribution, a developer can use parse_extra_log to upload a text file that contains a supported test output:

script_run('prove --verbose --formatter=TAP::Formatter::JUnit t/28-logging.t > junit-logging.xml');
parse_extra_log('XUnit','junit-logging.xml');

Available parser formats

Current parser formats:

  • OpenQA::Parser::Format::TAP

  • OpenQA::Parser::Format::JUnit

  • OpenQA::Parser::Format::LTP

  • OpenQA::Parser::Format::XUnit

Extending the parser

OOP Interface

The parser is a base class that acts as a serializer/deserializer for the elements inside of it. It can be extended so that new formats can be added easily.

The base class exposes 4 Mojo::Collections according to what openQA requires to map the results correctly; 1 extra collection is provided for arbitrary data that can be exposed. The collections represent, respectively, test results, test definitions and test output.

Structured data

In structured data mode, elements of the collections are objects. They can be of any type, although subclasses or objects of type OpenQA::Parser::Result are preferred.

One thing to keep in mind is that deeply nested objects that need to be parsed, like a hash of hashes or an array of hashes, need to subclass OpenQA::Parser::Result or OpenQA::Parser::Results respectively.

As an example, JUnit format can be parsed this way:

use OpenQA::Parser::Format::JUnit;

my $parser = OpenQA::Parser::Format::JUnit->new->load("file.xml");

# Now we can access the parsed tests as seen by openQA:

$parser->tests->each(sub {

   my $test = shift;
   print "Test name: ".$test->name;

});

my @all = $parser->tests->each;
my @tests = $parser->tests->search(name => qr/1_running_upstream_tests/);
my $first = $parser->tests->search(name => qr/1_running_upstream_tests/)->first();

my $binary_data = $parser->serialize();

# Now, we can also store $binary_data and retrieve it later.

my $new_parser_from_binary = OpenQA::Parser::Format::JUnit->new->deserialize($binary_data);

# thus this works as expected:
$new_parser_from_binary->tests->each( sub {

   my $test = shift;
   print "Test name: ".$test->name;

});

# We can also serialize all to JSON

my $json_serialization = $parser->to_json;

# save it and access it later

my $from_json = OpenQA::Parser::Format::JUnit->from_json($json_serialization);

openQA internal test result storage

It is important to know that openQA’s internal mapping for test results operates almost entirely on the filesystem, leaving only the test modules to be registered in the database. This leads to the following relation: a test module’s name is used to create a file with details (details-$testmodule.json) that contains a reference to step details, which is a collection of references to files, using a field "text" as tie-in and expecting a filename.

openQA pitfalls

Needle editing

  • If a new needle is created based on a failed test, the new needle will not be listed in old tests. However, when opening the needle editor, a warning about the new needle will be shown and it can be selected as base.

  • If an existing needle is updated with a new image or different areas, the old test will display the new needle which might be confusing.

  • If a needle is deleted, old tests may display an error when viewing them in the web UI.

403 messages when using scripts

  • If you come across messages displaying ERROR: 403 - Forbidden, make sure that the correct API key is present in the client.conf file.

  • If you are using a hostname other than localhost, pass --host foo to the script.

  • If you are using the fake authentication method and the message also says "api key expired", you can simply log out and log in again in the web UI and the expiration will be updated automatically.

Mixed production and development environment

There are a few things to take into account when running a development version and a packaged version of openQA:

If the setup for the development scenario involves sharing /var/lib/openqa, it would be wise to have a shared group openqa, that will have write and execute permissions over said directory, so that geekotest user and the normal development user can share the environment without problems.

This approach will lead to a problem when the openqa package is updated, since the directory permissions will be changed again. This is nothing that chmod -R g+rwx /var/lib/openqa/ and chgrp -R openqa /var/lib/openqa cannot fix.

Performance impact

openQA workers can cause high I/O load, especially when creating VM snapshots. The impact therefore gets more severe when MAKETESTSNAPSHOTS is enabled. This should not impact the stability of openQA jobs but can increase job execution time. If you run jobs on a machine where the responsiveness of other services matters, for example your desktop machine, consider patching the IOSchedulingPriority of a worker's service file as described in the systemd documentation, for example set IOSchedulingPriority=7 for the lowest priority. If that is not available, you can try to execute the worker processes with ionice to reduce the risk of your system becoming significantly impacted by snapshot creation.

Loading VM snapshots can also have an impact on SUT behavior, as the execution of the first step after loading a snapshot might be delayed. This can lead to problems if the executed tests do not foresee an appropriate timeout margin.

DB migration from SQLite to PostgreSQL

As a first step to start using PostgreSQL, please configure the PostgreSQL database according to the PostgreSQL setup guide.

To migrate API keys run the following commands:

  • Export data from the SQLite database:

sqlite3 db.sqlite -csv -separator ',' 'select * from api_keys;' > apikeys.csv

Note: The SQLite database file is located in /var/lib/openqa/db by default.

  • Import data into PostgreSQL:

# openqa is the PostgreSQL database name and apikeys.csv is the API keys export file
psql -U postgres -d openqa -c "copy api_keys from 'apikeys.csv' with (format csv);"

In case you need to migrate job groups or test suites, use the dump_templates and load_templates scripts accordingly.

Steps to debug developer mode setup

This is basically a checklist to go through in case the developer mode is broken in your setup:

  1. Be sure to have everything up to date. That includes relevant packages on the machine hosting the web UI and on the worker.

  2. Check whether the web browser can reach the livehandler daemon. Go to a running test and open the live view. Then open the JavaScript console of the web browser. If it contains messages like Received message via ws proxy: ... the livehandler daemon can be reached. Otherwise, try the following sub-steps:

    1. The installation guide has been updated to cover the developer mode. In case you installed your instance before the developer mode was introduced, make sure that the Apache module rewrite is enabled (via a2enmod rewrite). Also be sure the vhost configuration looks like the one found in the openQA Git repository (especially the part for the reverse proxies).

    2. Check whether openqa-livehandler.service is running. It is supposed to be run on the same machine as the web UI and should actually be started automatically as a dependency of openqa-webui.service.

  3. Check whether the livehandler can reach the os-autoinst command server. Go to a running test and open the live view. Then open the JavaScript console of the web browser. If it contains messages like Received message via ws proxy: {...,"type":"info","what":"cmdsrvmsg"} the os-autoinst command server can be reached. Otherwise there should be at least a message like Received message via ws proxy: {"what":"connecting to os-autoinst command server at ws:\/\/hostname:20053\/xhB84lUuPlMfhDEF\/ws",...} which contains the URL the livehandler is attempting to query. In this case try the following sub-steps:

    1. If the hostname is wrong, add WORKER_HOSTNAME = correcthostname to workers.ini. The worker should then tell the web UI that it is reachable via correcthostname resulting in a correct URL for the os-autoinst command server.

    2. It might also be the case that the firewall is blocking the HTTP/websocket connection on the required port. The required port is QEMUPORT plus 1. By default, QEMUPORT is $worker_instance_number * 10 + 20002.
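When checking firewall rules it helps to know the exact port for a given worker instance. The following Python sketch computes it from the default stated above (worker instance number times 10, plus 20002, plus 1 for the command server); the function names are hypothetical and a custom QEMUPORT setting would change the result.

```python
def qemu_port(worker_instance: int) -> int:
    """Default QEMUPORT for a worker instance."""
    return worker_instance * 10 + 20002


def command_server_port(worker_instance: int) -> int:
    """Port the livehandler connects to: QEMUPORT plus 1."""
    return qemu_port(worker_instance) + 1


for instance in (1, 2, 5):
    print(instance, qemu_port(instance), command_server_port(instance))
# 1 20012 20013
# 2 20022 20023
# 5 20052 20053
```

Port 20053 in the example URL above therefore corresponds to worker instance 5 with default settings.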

Networking in openQA

Important
This overview is valid only when using the QEMU backend!

The networking type used is controlled by the NICTYPE variable. If unset or empty NICTYPE defaults to user, i.e. QEMU user networking which requires no further configuration.

For more advanced setups or tests that require multiple jobs to be in the same network the TAP or VDE based modes can be used.

QEMU User Networking

With QEMU user networking each job gets its own isolated network with TCP and UDP routed to the outside. DHCP is provided by QEMU. The MAC address of the machine can be controlled with the NICMAC variable. If not set, it is 52:54:00:12:34:56.

TAP Based Network

os-autoinst can connect QEMU to TAP devices of the host system to leverage advanced network setups provided by the host by setting NICTYPE=tap.

The TAP device to use can be configured with the TAPDEV variable. If not defined, it is automatically set to "tap" + ($worker_instance - 1), i.e. worker1 uses tap0, worker2 uses tap1 and so on.

For multiple networks per job (see NETWORKS variable), the following numbering scheme is used:

worker1: tap0 tap64 tap128 ...
worker2: tap1 tap65 tap129 ...
worker3: tap2 tap66 tap130 ...
...

The MAC address of each virtual NIC is controlled by the NICMAC variable or automatically computed from $worker_id if not set.
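The TAP device numbering shown above can be written as a small formula. This Python sketch is an illustration only: the function tap_device is hypothetical, and the stride of 64 between the networks of one job is inferred from the table above rather than taken from the os-autoinst source.

```python
def tap_device(worker_instance: int, network_index: int = 0) -> str:
    """TAP device for the given worker instance and (zero-based) network of a job."""
    return f"tap{(worker_instance - 1) + 64 * network_index}"


for worker in (1, 2, 3):
    print(f"worker{worker}:", *(tap_device(worker, n) for n in range(3)))
# worker1: tap0 tap64 tap128
# worker2: tap1 tap65 tap129
# worker3: tap2 tap66 tap130
```

This makes it easy to check which TAP interfaces must exist on the host for a given number of worker instances and networks.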

In TAP mode the system administrator is expected to configure the network, required internet access, etc. on the host manually.

VDE Based Network

Virtual Distributed Ethernet provides a software switch that runs in user space. It allows connecting several QEMU instances without affecting the system’s network configuration.

The openQA workers need a vde_switch instance running. The workers reconfigure the switch as needed by the job.

Basic, Single Machine Tests

To start with a basic configuration like QEMU user mode networking, create a machine with the following settings:

  • VDE_SOCKETDIR=/run/openqa

  • NICTYPE=vde

  • NICVLAN=0

Start the switch and user mode networking:

systemctl enable --now openqa-vde_switch
systemctl enable --now openqa-slirpvde

With this setting all jobs on the same host would be in the same network and share the same SLIRP instance.

Multi Machine Tests Setup

This section describes one way of setting up the openQA environment to run tests that require a network connection between several machines (e.g. client-server tests).

The example configuration is applicable to openSUSE and uses Open vSwitch as the virtual switch, firewalld (or SuSEfirewall2 for older versions) for NAT and wicked as the network manager. Keep in mind that a firewall is not strictly necessary for operation. Operation without a firewall is not covered in all necessary details in this documentation.

Note
Another way to set up the environment with iptables and firewalld is described on the Fedora wiki.

Set Up Open vSwitch

Compared to VDE setup, Open vSwitch is slightly more complicated to configure, but provides a more robust and scalable network.

  • Install and Run Open vSwitch:

zypper in openvswitch
systemctl enable --now openvswitch
  • Install and configure os-autoinst-openvswitch.service:

Note
os-autoinst-openvswitch.service is a support service that sets the vlan number of Open vSwitch ports based on the NICVLAN variable. This separates the groups of tests from each other. The NICVLAN variable is dynamically assigned by the openQA scheduler. Install, start and enable the service:
zypper in os-autoinst-openvswitch
systemctl enable --now os-autoinst-openvswitch

The service os-autoinst-openvswitch.service uses br0 bridge by default. As it might be used by KVM already it is suggested to configure br1 instead:

# /etc/sysconfig/os-autoinst-openvswitch
OS_AUTOINST_USE_BRIDGE=br1
  • Create the virtual bridge br1:

ovs-vsctl add-br br1

Configure Virtual Interfaces

  • Add a tap interface for every multi-machine worker instance:

Note
Create as many interfaces as needed for a test. The instructions are provided for three interfaces tap0, tap1, tap2 to be used by worker@1, worker@2, worker@3 worker instances. The TAP interfaces have to be owned by the _openqa-worker user for the openQA worker instances to be able to access them.

To create tap interfaces automatically on startup, add appropriate configuration files to the /etc/sysconfig/network/ directory. Files have to be named as ifcfg-tap<N>, replacing <N> with the number for the interface, such as 0, 1, 2 (e.g. ifcfg-tap0, ifcfg-tap1):

# /etc/sysconfig/network/ifcfg-tap0
BOOTPROTO='none'
IPADDR=''
NETMASK=''
PREFIXLEN=''
STARTMODE='auto'
TUNNEL='tap'
TUNNEL_SET_GROUP='nogroup'
TUNNEL_SET_OWNER='_openqa-worker'

Symlinks can be used to reference the same configuration file for each tap interface.

  • Add the bridge config with all tap devices that should be connected to it. The file has to be located in the /etc/sysconfig/network/ directory. File name is ifcfg-br<N>, where <N> is the id of the bridge (e.g. 1):

# /etc/sysconfig/network/ifcfg-br1
BOOTPROTO='static'
IPADDR='10.0.2.2/15'
STARTMODE='auto'
OVS_BRIDGE='yes'
OVS_BRIDGE_PORT_DEVICE_1='tap0'
OVS_BRIDGE_PORT_DEVICE_2='tap1'
OVS_BRIDGE_PORT_DEVICE_3='tap2'
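To see which address range the IPADDR value of the bridge covers, the prefix can be expanded with Python's standard ipaddress module. This is only a sanity-check sketch; which of these addresses the SUTs actually use depends on the test setup and is not shown here.

```python
import ipaddress

# The address and prefix length from ifcfg-br1 above.
bridge = ipaddress.ip_interface("10.0.2.2/15")

print(bridge.network)                # 10.0.0.0/15
print(bridge.network.num_addresses)  # 131072

# Addresses from 10.0.0.0 up to 10.1.255.255 are on the bridge's network.
print(ipaddress.ip_address("10.1.255.254") in bridge.network)  # True
print(ipaddress.ip_address("10.2.0.1") in bridge.network)      # False
```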

Configure NAT with firewalld

To configure NAT with firewalld assign the bridge interface to the internal zone and the interface with access to the network to the external zone:

firewall-cmd --zone=external --add-interface=eth0
firewall-cmd --zone=internal --add-interface=br1

To enable the virtual machines used by openQA to fully access the external network masquerading needs to be enabled on all involved zones:

firewall-cmd --zone=external --add-masquerade
firewall-cmd --zone=internal --add-masquerade

IP forwarding is enabled automatically if masquerading is enabled:

grep 1 /proc/sys/net/ipv4/ip_forward
1

In case the interface is in a trusted network it is possible to accept connections by default by changing the zone target:

firewall-cmd --zone=external --set-target=ACCEPT

Alternatively, you can assign the interface to the trusted zone. Make sure to enable masquerading for the trusted zone as well in this case.

If you are happy with the changes make them persistent:

firewall-cmd --runtime-to-permanent

If you do not currently have the firewalld service running, you can instead use the firewall-cmd-offline command for the configuration. In this case start the firewall and enable the service to run on system startup:

systemctl enable --now firewalld

Also, the firewall-config GUI tool for firewalld can be used for configuration.

For older versions of openSUSE/SLE: Configure NAT with SuSEfirewall2

The IP 10.0.2.2 can also serve as a gateway to access the outside network. For this, NAT between br1 and eth0 must be configured with SuSEfirewall2 or iptables:

# /etc/sysconfig/SuSEfirewall2
FW_DEV_INT="br1"
FW_ROUTE="yes"
FW_MASQUERADE="yes"

Start SuSEfirewall2 and enable the service to start on system startup:

systemctl enable --now SuSEfirewall2

Configure openQA Worker Instances

  • Allow worker instances to run multi-machine jobs:

# /etc/openqa/workers.ini
[global]
WORKER_CLASS = qemu_x86_64,tap
Note
The number of tap devices should correspond to the number of the running worker instances. For example, if you have set up 3 tap devices, the same number of worker instances should be configured.
  • Enable worker instances to be started on system boot:

systemctl enable openqa-worker@1
systemctl enable openqa-worker@2
systemctl enable openqa-worker@3

Grant CAP_NET_ADMIN Capabilities to QEMU

To let QEMU create TAP devices on demand, the CAP_NET_ADMIN capability must be set on the QEMU binary:

zypper in libcap-progs
setcap CAP_NET_ADMIN=ep /usr/bin/qemu-system-x86_64

Configure network interfaces

  • Check the configuration for the eth0 interface:

Important
Ensure that the eth0 interface is configured in /etc/sysconfig/network/ifcfg-eth0. Otherwise, wicked will not be able to bring up the interface on start and the host will lose its network connection:
# /etc/sysconfig/network/ifcfg-eth0
BOOTPROTO='dhcp'
BROADCAST=''
ETHTOOL_OPTIONS=''
IPADDR=''
MTU=''
NAME=''
NETMASK=''
REMOTE_IPADDR=''
STARTMODE='auto'
DHCLIENT_SET_DEFAULT_ROUTE='yes'
  • Pros of wicked over NetworkManager:

    • Proper IPv6 support

    • openvswitch/vlan/bonding/bridge support - wicked can manage your advanced configuration transparently without the need of extra tools

    • Backwards compatible with ifup scripts

  • Check the network service currently being used:

systemctl show -p Id network.service

If the result is different from Id=wicked.service (e.g. NetworkManager.service), stop the network service:

systemctl disable --now network.service
  • Then switch to wicked and start the service:

systemctl enable --force wicked
systemctl start wicked
  • Bring up the br1 interface:

wicked ifup br1
  • Reboot

Note
It is also possible to switch the network configuration using YaST.

Debugging Open vSwitch Configuration

Boot sequence with wicked (version 0.6.23 and newer):

  1. openvswitch (as above)

  2. wicked - creates the bridge br1 and tap devices, adds tap devices to the bridge,

  3. SuSEfirewall

  4. os-autoinst-openvswitch - installs openflow rules, handles vlan assignment

The configuration and operation can be checked with the following commands:

ovs-vsctl show # shows the bridge br1, the tap devices are assigned to it
ovs-ofctl dump-flows br1 # shows the rules installed by os-autoinst-openvswitch in table=0

When everything is ok and the machines are able to communicate, ovs-vsctl show should print something like the following:

Bridge "br0"
    Port "br0"
        Interface "br0"
            type: internal
    Port "tap0"
        Interface "tap0"
    Port "tap1"
        tag: 1
        Interface "tap1"
    Port "tap2"
        tag: 1
        Interface "tap2"
  ovs_version: "2.11.1"
Note
Notice the tag numbers are assigned to tap1 and tap2. They should have the same number.
Note
If the balance of the tap devices is wrong in workers.ini, the tag cannot be assigned and the communication will be broken.
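
To compare the tags at a glance, the output of ovs-vsctl show can be condensed to one line per tap device. This is only a sketch: the function name list_tap_tags is hypothetical and the parsing assumes the output layout shown in the sample above.

```shell
# Sketch: list tap ports and their VLAN tags from `ovs-vsctl show` output (stdin).
# Ports without a "tag:" line are printed with "-" instead of a tag.
list_tap_tags() {
    awk '
        function emit() { if (port != "") print port, (tag == "" ? "-" : tag) }
        $1 == "Port" {
            emit()                      # print the previous tap port, if any
            port = ""; tag = ""
            name = $2; gsub(/"/, "", name)
            if (name ~ /^tap/) port = name
        }
        $1 == "tag:" && port != "" { tag = $2 }
        END { emit() }
    '
}

# Usage on a real host:
#   ovs-vsctl show | list_tap_tags
```

For the sample output above this prints "tap0 -", "tap1 1" and "tap2 1", which makes it easy to see that tap1 and tap2 share the same tag.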

Check the flow of packets over the network:

  • packets from tapX to br1 create additional rules in table=1

  • packets from br1 to tapX increase packet counts in table=1

  • empty output indicates a problem with os-autoinst-openvswitch service

  • zero packet count or missing rules in table=1 indicate problem with tap devices

iptables -L -v

As long as the SUT has access to the external network, there should be a non-zero packet count in the forward chain between br1 and the external interface.
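
The table=1 checks listed above can be summarised with a small filter over the output of ovs-ofctl dump-flows br1. This is a sketch under the assumption that every flow line contains table=N and n_packets=N fields separated by commas; the function name summarize_table1_flows is hypothetical.

```shell
# Sketch: count rules in table=1 and how many of them have seen no packets.
# Reads `ovs-ofctl dump-flows br1` output on stdin.
summarize_table1_flows() {
    awk '
        /table=1,/ {
            rules++
            if ($0 ~ /n_packets=0,/) idle++
        }
        END {
            printf "table1_rules=%d zero_packet_rules=%d\n", rules, idle
        }
    '
}

# Usage on a real host:
#   ovs-ofctl dump-flows br1 | summarize_table1_flows
```

A result of table1_rules=0 corresponds to the "missing rules in table=1" case, and a high zero_packet_rules value hints at a problem with the tap devices.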

GRE Tunnels

By default all multi-machine workers have to be on a single physical machine. You can join multiple physical machines and their ovs bridges together with a GRE tunnel.

If the workers with TAP capability are spread across multiple hosts, the network must be connected. See Open vSwitch documentation for details.

Create a gre_tunnel_preup script (change the remote_ip value correspondingly on both hosts):

# /etc/wicked/scripts/gre_tunnel_preup.sh
#!/bin/sh
action="$1"
bridge="$2"
ovs-vsctl --may-exist add-port "$bridge" gre1 -- set interface gre1 type=gre options:remote_ip=<IP address of other host>

And call it by PRE_UP_SCRIPT="wicked:gre_tunnel_preup.sh" entry:

# /etc/sysconfig/network/ifcfg-br1
<..>
PRE_UP_SCRIPT="wicked:gre_tunnel_preup.sh"

Allow GRE in firewall:

# /etc/sysconfig/SuSEfirewall2
FW_SERVICES_EXT_IP="GRE"
FW_SERVICES_EXT_TCP="1723"
Note
When using GRE tunnels keep in mind that virtual machines inside the ovs bridges have to use MTU=1458 for their physical interfaces (eth0, eth1). If you are using support_server/setup.pm the MTU will be set automatically to that value on support_server itself and it does MTU advertisement for DHCP clients as well.

openQA developer guide

Introduction

openQA is an automated test tool that makes it possible to test the whole installation process of an operating system. It’s free software released under the GPLv2 license. The source code and documentation are hosted in the os-autoinst organization on GitHub.

This document provides the information needed to start contributing to the openQA development improving the tool, fixing bugs and implementing new features. For information about writing or improving openQA tests, refer to the Tests Developer Guide. In both documents it’s assumed that the reader is already familiar with openQA and has already read the Starter Guide. All those documents are available at the official repository.

Development guidelines

As mentioned, the central point of development is the os-autoinst organization on GitHub where several repositories can be found.

As in most projects hosted on GitHub, pull requests are always welcome and are the right way to contribute improvements and fixes.

Rules for commits

  • Every commit is checked in CI as soon as you create a pull request, but you should run the tidy script locally, i.e. before every commit call:

./script/tidy

to ensure your Perl code changes are consistent with the style rules.

  • You may also run local tests on your machine or in your own development environment to verify everything works as expected. Call:

make test

for style checks, unit and integration tests.

To execute a single test, one can tweak the test execution with the variables in the Makefile or use prove after pointing to a local test database in the environment variable TEST_PG. Also, if you set a custom base directory, be sure to unset it when running tests.

Example:

TEST_PG='DBI:Pg:dbname=openqa_test;host=/dev/shm/tpg' OPENQA_BASEDIR= prove -v t/14-grutasks.t

When tweaking the tests as above, test initialization can be sped up by starting PostgreSQL with t/test_postgresql instead of using the system service, e.g.:

t/test_postgresql /dev/shm/tpg

To check the coverage by individual test files easily call e.g.

env CHECKSTYLE=0 PROVE_ARGS=t/24-worker-engine.t make coverage

and take a look into the generated coverage HTML report in cover_db/coverage.html.

We use annotations in some places to mark "uncoverable" code such as this:

# uncoverable subroutine

See the Devel::Cover documentation for details: https://metacpan.org/pod/Devel::Cover

  • For git commit messages use the rules stated on How to Write a Git Commit Message as a reference

  • Every pull request is reviewed in a peer review to give feedback on possible implications and how we can help each other to improve

If this is too much hassle for you, feel free to provide incomplete pull requests for consideration or create an issue with a code change proposal.

Getting involved into development

Developers willing to get really involved in the development of openQA, or people interested in following the always-changing roadmap, should take a look at the openQAv3 project in openSUSE’s project management tool. This Redmine instance is used to coordinate the main development effort, organizing the existing issues (bugs and desired features) into 'target versions'.

Currently developers meet in IRC channel #opensuse-factory and in a weekly jangouts call of the core developer team.

In addition to the target versions representing development sprints, there is another version that is always open. 'Future improvements' groups features that are on the developers' and users' wish list but that have little chance of being addressed in the short term, either because the return on investment is not worth it or because they are out of the current scope of the development. Developers looking for a place to start contributing are encouraged to simply go to that list and assign any open issue to themselves.

The openQA and os-autoinst repositories also include test suites aimed at preventing bugs and regressions in the software. codecov is configured in the repositories to encourage contributors to raise the test coverage with every commit and pull request. New features and bug fixes are expected to be backed with the corresponding tests.

Technologies

Everything in openQA, from os-autoinst to the web frontend and from the tests to the support scripts is written in Perl. So having some basic knowledge about that language is really desirable in order to understand and develop openQA. Of course, in addition to bare Perl, several libraries and additional tools are required. The easiest way to install all needed dependencies is using the available os-autoinst and openQA packages, as described in the Installation Guide.

In the case of os-autoinst, only a few CPAN modules are required, basically Carp::Always, Data::Dump, JSON and YAML. On the other hand, several external tools are needed including QEMU, Tesseract and OptiPNG. Last but not least, the OpenCV library is the core of the openQA image matching mechanism, so it must be available on the system.

The openQA package is built on top of Mojolicious, an excellent Perl framework for web development that will be extremely familiar to developers coming from other modern web frameworks like Sinatra and that has nice and comprehensive documentation available at its home page.

In addition to Mojolicious and its dependencies, several other CPAN modules are required by the openQA package. For a full list of hard dependencies, see the file cpanfile at the root of the openQA repository.

openQA relies on PostgreSQL to store the information. It used to support SQLite, but that is no longer possible.

As stated in the previous section, every feature implemented in both packages should be backed by proper tests. Test::More is used to implement those tests. As usual, tests are located under the /t/ directory. In the openQA package, one of the tests consists of a call to Perltidy to ensure that the contributed code follows the most common Perl style conventions.

Starting the webserver from local Git checkout

  • To start the webserver for development, use script/openqa daemon.

  • The other daemons (mentioned in the architecture diagram) are started in the same way, e.g. script/openqa-scheduler daemon.

  • openQA will pull the required assets on the first run.

  • openQA uses SASS. Under openSUSE, installing rubygem(sass) should be sufficient.

  • It is also useful to start openQA with morbo which allows applying changes without restarting the server: morbo -m development -w assets -w lib -w templates -l http://localhost:9526 script/openqa daemon

  • In case you have problems with broken rendering of the web page it can help to delete the asset cache and let the webserver regenerate it on first startup. For this delete the subdirectories .sass-cache/, assets/cache/ and assets/assetpack.db. Make sure to look for error messages on startup of the webserver and to force the refresh of the web page in your browser.

Handling of dependencies

  • Add 3rd party JavaScript and CSS files to assets/assetpack.def. When restarting the web server the new/updated files are pulled automatically. Also take care to update the asset cache for the openSUSE RPM package.

  • Other dependencies need to be added to openQA.spec or os-autoinst.spec.

  • Perl dependencies need to be added additionally to cpanfile.

  • To easily get all necessary dependencies on openSUSE you can install the package openQA-devel. In other cases one can rely on the cpanfile and read out the dependencies from the spec file for the rest.

Remarks

  • New dependencies are only available in the Docker container which is used to run CI tests after the PR adding these dependencies has been merged. Besides, the build of that container must not be broken (see build results on OBS).

  • The os-autoinst repository uses the same container as the openQA repository which is made using docker/travis_test/Dockerfile within the openQA repository.

Update asset cache for openSUSE RPM package

  1. Clone the repository (or a branch of it if you do not have the rights to push directly) locally, e.g. osc co devel:openQA/openQA.

  2. Run bash update-cache.sh inside the repository folder. Check the log to make sure no download errors occurred.

  3. Do a sanity check on the generated cache.txz. It usually should not be smaller than before, it should contain the newly added sources and it must not contain any empty files.
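
Two of the sanity checks from step 3 (no empty files, not smaller than before) can be automated. The following is a sketch only: the function name check_asset_cache is hypothetical, and it assumes a tar implementation that detects the archive compression automatically when extracting. Whether the newly added sources are present still has to be checked by hand.

```shell
# Sketch: sanity-check a regenerated asset cache archive.
# Usage: check_asset_cache NEW_ARCHIVE [OLD_ARCHIVE]
# Returns 0 if no problem was found, 1 on a finding, 2 if extraction failed.
check_asset_cache() {
    new=$1
    old=${2:-}
    status=0
    tmpdir=$(mktemp -d) || return 2
    # Extract into a scratch directory to inspect the contents
    if ! tar -xf "$new" -C "$tmpdir"; then
        rm -rf "$tmpdir"
        return 2
    fi
    # The archive must not contain any empty files
    empty=$(cd "$tmpdir" && find . -type f -size 0)
    rm -rf "$tmpdir"
    if [ -n "$empty" ]; then
        echo "empty files in $new:"
        echo "$empty"
        status=1
    fi
    # The new archive should usually not be smaller than the previous one
    if [ -n "$old" ]; then
        new_size=$(($(wc -c < "$new")))
        old_size=$(($(wc -c < "$old")))
        if [ "$new_size" -lt "$old_size" ]; then
            echo "$new ($new_size bytes) is smaller than $old ($old_size bytes)"
            status=1
        fi
    fi
    return "$status"
}

# Usage, assuming a copy of the previous archive was kept as cache.txz.old:
#   check_asset_cache cache.txz cache.txz.old
```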

  4. Add an entry to the changes file using osc vc openQA.changes.

  5. osc ci -m 'Update asset cache'

Managing the database

During the development process there are cases in which the database schema needs to be changed. There are some steps that have to be followed so that new database instances and upgrades include those changes.

When is it required to update the database schema?

After modifying files in lib/OpenQA/Schema/Result. However, not all changes require a schema update. Adding just another method or altering/adding functions like has_many does not require an update. Adding new columns, or modifying or removing existing ones, does require following the steps described below.

How to update the database schema

  1. First, you need to increase the database version number in the $VERSION variable in the lib/OpenQA/Schema.pm file. Note that it’s recommended to notify the other developers before doing so, to synchronize in case there are more developers wanting to increase the version number at the same time.

  2. Then you need to generate the deployment files for new installations. This is done by running ./script/initdb --prepare_init.

  3. Afterwards you need to generate the deployment files for existing installations. This is done by running ./script/upgradedb --prepare_upgrade. After doing so, the directories dbicdh/$ENGINE/deploy/<new version> and dbicdh/$ENGINE/upgrade/<prev version>-<new version> for PostgreSQL should have been created with some SQL files inside containing the statements to initialize the schema and to upgrade from one version to the next in the corresponding database engine.

  4. Migration scripts to upgrade from previous versions can be added under dbicdh/_common/upgrade. Create a <prev_version>-<new_version> directory and put some files there with DBIx commands for the migration. For examples just have a look at the migrations which are already there.

The above steps are only for preparing the required SQL statements, but do not actually alter the database. Before doing so, it is recommended to backup your database to be able to downgrade again if something goes wrong or you just need to continue working on another branch. To do so, the following command can be used to create a copy:

createdb -O ownername -T originaldb newdb

To actually create or update the database (after creating a backup as described), you should run either ./script/initdb --init_database or ./script/upgradedb --upgrade_database. This is also required when the changes are installed in a production server.
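
The commands from this section can be collected into two helper functions, shown here as a sketch. The function names and the DRY_RUN variable are hypothetical; only the commands themselves are taken from the steps above. The functions are meant to be called from the root of an openQA checkout, after the $VERSION bump in lib/OpenQA/Schema.pm has been done by hand.

```shell
# Sketch: wrap the schema update commands; DRY_RUN=1 only prints them.
run_step() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Generate the deployment files for new and for existing installations
prepare_schema_update() {
    run_step ./script/initdb --prepare_init &&
    run_step ./script/upgradedb --prepare_upgrade
}

# Back up the database, then apply the upgrade
# $1: database owner, $2: existing database, $3: name of the backup copy
apply_schema_update() {
    run_step createdb -O "$1" -T "$2" "$3" &&
    run_step ./script/upgradedb --upgrade_database
}

# Usage (owner and database names are placeholders):
#   DRY_RUN=1
#   prepare_schema_update
#   apply_schema_update ownername originaldb newdb
```

Running with DRY_RUN=1 first is a cheap way to review the exact commands before the database is touched.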

How to add fixtures to the database

Note: This section is not about the fixtures for the testsuite. Those are located under t/fixtures.

Note: This section might not be relevant anymore. At least there are currently none of the mentioned directories with files containing SQL statements present.

Fixtures (initial data stored in tables at installation time) are stored in files in the dbicdh/_common/deploy/_any/<version> and dbicdh/_common/upgrade/<prev_version>-<next_version> directories.

You can create as many files as you want in each directory. These files contain SQL statements that will be executed when initializing or upgrading a database. Note that those files (and directories) have to be created manually.

Executed SQL statements can be traced by setting the DBIC_TRACE environment variable.

export DBIC_TRACE=1

How to setup PostgreSQL to test locally with production data

  1. Install PostgreSQL - under openSUSE the following packages are required: postgresql-server postgresql-init

  2. Start the server: systemctl start postgresql

  3. The following steps need to be done by the user postgres: su - postgres

  4. Create user: createuser your_username where your_username must be the same as the UNIX user you start your local openQA instance with.

  5. Create database: createdb -O your_username openqa

  6. The next steps must be done by the user you start your local openQA instance with.

  7. Import dump: pg_restore -c -d openqa path/to/dump

  8. Configure openQA to use PostgreSQL as described in the section Database of the installation guide. User name and password are not required.

How to overwrite config files

It can be necessary during development to change the config files in etc/. For example you have to edit etc/openqa/database.ini to use another database. Or to increase the log level it’s useful to set the loglevel to debug in etc/openqa/openqa.ini.

To keep these changes out of your Git workflow, copy the config files to a new directory and set OPENQA_CONFIG in your shell setup files.

cp -ar etc/openqa etc/mine
export OPENQA_CONFIG=$PWD/etc/mine

Note that OPENQA_CONFIG points to the directory containing openqa.ini, database.ini, client.conf and workers.ini.
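
Before starting openQA with a custom OPENQA_CONFIG, it can help to verify that the directory really contains the four files named above. This is a small sketch; the function name missing_config_files is hypothetical.

```shell
# Sketch: print the expected config files that are missing from a directory.
# $1: directory that OPENQA_CONFIG points to
missing_config_files() {
    dir=$1
    for f in openqa.ini database.ini client.conf workers.ini; do
        [ -e "$dir/$f" ] || echo "$f"
    done
}

# Usage:
#   missing_config_files "$OPENQA_CONFIG"
```

Empty output means all four files are present.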

Adding new authentication module

openQA comes with three authentication modules: OpenID, iChain and Fake (see User authentication).

All authentication modules reside in the lib/OpenQA/WebAPI/Auth directory. When openQA starts, the method setting in the [auth] section of /etc/openqa/openqa.ini is read and, according to its value (or the default, OpenID), openQA tries to require OpenQA::WebAPI::Auth::$method. If this succeeds, the module for the given method is imported; otherwise openQA exits with an error.

Each authentication module is expected to export auth_login and auth_logout functions. In case of request-response mechanism (as in OpenID), auth_response is imported on demand.

Currently there is no login page because all implemented methods use either a 3rd party page or none.

An authentication module is expected to return a hash:


my %res = (
    # error = 1 signals an auth error, 0 means success
    error => 0,
    # where to redirect the user
    redirect => '',
);

An authentication module is expected to create or update the user entry in the openQA database after user validation. See the included modules for inspiration.

Customize base directory

It is possible to customize the openQA base directory (which is for instance used to store test results) by setting the environment variable OPENQA_BASEDIR. The default value is /var/lib. Be sure to clear that variable when running unit tests locally (see next section).

Running tests of openQA itself

Besides simply running the testsuite, it is also possible to use containers. Using containers, tests are executed in the same environment as on CircleCI. This makes it possible to reproduce issues specific to that environment.

Run tests without container

Be sure to install all required dependencies. The package openQA-devel will provide them.

If the package is not available the dependencies can also be found in the file openQA.spec in the openQA repository. In this case also the package perl-Selenium-Remote-Driver is required to run UI tests. You also need to install chromedriver and either chrome or chromium for the UI tests.

To execute the testsuite use make test. This will also initialize a temporary PostgreSQL database used for testing. To do this step manually run t/test_postgresql /dev/shm/tpg to initialize a temporary PostgreSQL database and export the environment variable as instructed by that script. It is also possible to run a particular test, for example prove t/api/01-workers.t.

To watch the execution of the UI tests, set the environment variable NOT_HEADLESS.

Run tests with Docker

To run tests in Docker, make sure that Docker is installed and the Docker daemon is running. To launch the test suite, first pull the Docker image:

docker pull registry.opensuse.org/devel/openqa/containers/openqa_dev:latest

This Docker image is provided by the OBS repository https://build.opensuse.org/package/show/devel:openQA/openqa_dev and based on the Dockerfile within the docker/travis_test sub directory of the openQA repository.

Build the image using Makefile target:

make docker-test-build

Note that the image created by that target is called openqa:latest while the raw container pulled from OBS is called openqa_dev:latest.

Launch the tests using Makefile target:

make launch-docker-to-run-tests-within

Run tests by invoking Docker manually, e.g.:

docker run -v OPENQA_LOCAL_CODE:/opt/openqa -e VAR1=1 -e VAR2=1 openqa:latest make run-tests-within-container

Replace OPENQA_LOCAL_CODE with the location where you have the openQA code.

The command line to run tests manually reveals that the Makefile target run-tests-within-container is used to run the tests inside the container. It does some preparations to be able to run the full stack test within Docker and considers a few environment variables defining our test matrix:

CHECKSTYLE=1

FULLSTACK=0 UITESTS=0

FULLSTACK=0 UITESTS=1

FULLSTACK=1

SCHEDULER_FULLSTACK=1

DEVELOPER_FULLSTACK=1

GH_PUBLISH=true

So by replacing VAR1 and VAR2 with those values one can trigger the different tests of the matrix.
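
To avoid typing the docker invocation for every combination, the command line can be assembled by a helper. This is a sketch: the function name matrix_command is hypothetical, and it only prints the command so it can be reviewed before it is run.

```shell
# Sketch: print the docker command for one entry of the test matrix.
# $1: path to the local openQA checkout; remaining arguments: NAME=VALUE pairs
matrix_command() {
    code_dir=$1
    shift
    env_args=""
    for pair in "$@"; do
        env_args="$env_args -e $pair"
    done
    echo "docker run -v $code_dir:/opt/openqa$env_args openqa:latest make run-tests-within-container"
}

# Usage (the checkout path is a placeholder):
#   matrix_command /path/to/openQA FULLSTACK=0 UITESTS=1
#   eval "$(matrix_command /path/to/openQA CHECKSTYLE=1)"
```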

Of course it is also possible to run (specific) tests directly via prove instead of using the Makefile targets.

Tips

Commands passed to docker run will be executed after the initialization script (which does database creation and so on). So if there is the need to run an interactive session after it just do:

docker run -it -v OPENQA_LOCAL_CODE:/opt/openqa openqa:latest bash

Of course you can also use make run-tests-within-container \; bash to run the tests first and then open a shell for further investigation.

There is also the possibility to change the initialization scripts with the --entrypoint switch. This allows us to go into an interactive session without any initialization script run:

docker run -it --entrypoint /bin/bash -v OPENQA_LOCAL_CODE:/opt/openqa registry.opensuse.org/devel/openqa/containers/openqa_dev

In case there is the need to follow what is happening in the currently running container (the execution will terminate the session):

docker exec -ti $(docker ps | awk '!/CONTAINER/{print $1}') /bin/bash

Running UI tests in non-headless mode is also possible, e.g.:

xhost +local:root
docker run --rm -ti --name openqa-testsuite -v /tmp/.X11-unix:/tmp/.X11-unix:rw -e DISPLAY="$DISPLAY" -e NOT_HEADLESS=1 openqa:latest prove -v t/ui/14-dashboard.t
xhost -local:root

It is also possible to use a custom os-autoinst checkout using the following arguments:

docker run … -e CUSTOM_OS_AUTOINST=1 -v /path/to/your/os-autoinst:/opt/os-autoinst make run-tests-within-container

By default, configure and make are still executed (so a clean checkout is expected). If your checkout is already prepared to use, set CUSTOM_OS_AUTOINST_SKIP_BUILD to prevent this. Be aware that the build produced outside of the container might not work inside the container if both environments provide different, incompatible library versions (e.g. OpenCV).

It is also important to mention that your local repositories will be copied into the container. This can take very long if those are big, e.g. when the openQA repo contains a lot of profiling data because you enabled Mojolicious::Plugin::NYTProf.

In general, if starting the tests via Docker seems to hang, it is a good idea to inspect the process tree to see which command is currently executed.

Logging behavior

Logs are redirected to a logfile when running tests within the CI. The output can therefore not be asserted using Test::Output. This can be worked around by temporarily assigning a different Mojo::Log object to the application. To test locally under the same condition set the environment variable OPENQA_LOGFILE.

Note that redirecting the logs to a logfile only works for tests which run OpenQA::Setup::setup_log. In other tests the log is just printed to the standard output. This makes use of Test::Output simple but it should be taken care that the test output is not cluttered by log messages which can be quite irritating.

openQA circleci workflow

Goal

Provide a way to run tests with a pre-approved list of dependencies, both in CI and locally.

Dependency artefacts

  • dependencies.txt: list of dependencies to test against.

  • autoinst.sha: contains the SHA of the os-autoinst commit used for integration testing. When the value is empty, the tests run against the latest master.

Managing and troubleshooting dependencies

dependencies.txt and autoinst.sha are meant to represent those dependencies which change often. In the normal workflow these files are generated automatically by a dedicated Bot, then go through CI in a PR, and are then reviewed and accepted by a human. So, in the normal workflow it is guaranteed that everyone always works on a list of correct and approved dependencies (unless they explicitly tell CI to use custom dependencies).

The Bot tracks dependencies only in master branch by default, but this may be extended in circleci config file. The Bot uses .circleci/build_dependencies.sh script to detect any changes. This script can be used manually as well. Alternatively just add newly introduced dependencies into dependencies.txt, so CI will run tests with them.

Occasionally it may be a challenge to work with dependencies.txt (e.g. a package version is not available anymore). In such a case you can either try to rebuild dependencies.txt using .circleci/build_dependencies.sh or just remove all entries and put only openQA-devel into it. The script .circleci/build_dependencies.sh can also be modified when major changes are performed, e.g. a different OS version or packages from a forked OBS project.

Run tests locally using docker

One way is to build the image using the build_local_docker.sh script, start a container and then use the same commands one would use to test locally.

# Optionally pull recent base image, otherwise it may be outdated
docker pull registry.opensuse.org/devel/openqa/ci/containers/base:latest
.circleci/build_local_docker.sh # will create image based on content of dependencies.txt and autoinst.sha
docker run -it --rm -v $(pwd):/opt/testing_area localtest bash -c 'eval "$(t/test_postgresql | grep TEST_PG=)" && PERL5LIB=lib prove -v t/ui/25*'

Alternatively, start the container, execute commands in it and then stop it:

docker run --rm --name t1 -v $(pwd):/opt/testing_area localtest tail -f /dev/null & sleep 1
docker exec -it t1 bash -c 'eval "$(t/test_postgresql | grep TEST_PG=)" && PERL5LIB=lib prove -v t/ui/25-developer_mode.t'
docker stop -t 0 t1

Run tests using circleci tool

After installing the circleci tool, the following commands will be available. They build the container and use the committed changes from the current local branch:

circleci local execute --job test1
circleci local execute --job testui
circleci local execute --job testfullstack
circleci local execute --job testdeveloperfullstack

Changing config.yml

Command to verify the YAML with the circleci tool:

circleci config process .circleci/config.yml

Building Plugins

Not all code needs to be included in openQA itself. openQA also supports the use of 3rd party plugins that follow the standards for plugins used by the Mojolicious web framework. These can be distributed as normal CPAN modules and installed as such alongside openQA.

Plugins are a good choice especially for extensions to the UI and HTTP API, but also for notification systems listening to various events inside the web server.

If your plugin was named OpenQA::WebAPI::Plugin::Hello, you would install it in one of the include directories of the Perl used to run openQA, and then configure it in openqa.ini. The plugins setting in the global section will tell openQA what plugins to load.

# Tell openQA to load the plugin
[global]
plugins = Hello

# Plugin specific configuration (optional)
[hello_plugin]
some = value

The plugin specific configuration is optional, but if defined would be available in $app->config->{hello_plugin}.

To extend the UI or HTTP API there are various named routes already defined that will take care of authentication for your plugin. You just attach the plugin routes to them and only authenticated requests will get through.

package OpenQA::WebAPI::Plugin::Hello;
use Mojo::Base 'Mojolicious::Plugin';

sub register {
    my ($self, $app, $config) = @_;

    # Only operators may use our plugin
    my $ensure_operator = $app->routes->find('ensure_operator');
    my $plugin_prefix = $ensure_operator->any('/hello_plugin');

    # Plain text response (under "/admin/hello_plugin/")
    $plugin_prefix->get('/' => sub {
      my $c = shift;
      $c->render(text => 'Hello openQA!');
    })->name('hello_plugin_index');

    # Add a link to the UI menu
    $app->config->{plugin_links}{operator}{'Hello'} = 'hello_plugin_index';
}

1;

The plugin_links configuration setting can be modified by plugins to add links to the operator and admin sections of the openQA UI menu. Route names or fully qualified URLs can be used as link targets. If your plugin uses templates, you should reuse the bootstrap layout provided by openQA. This will ensure a consistent look, and make the UI menu available everywhere.

% layout 'bootstrap';
% title 'Hello openQA!';
<div>
  <h2>Hello openQA!</h2>
</div>

For UI plugins there are two named authentication routes defined:

  1. ensure_operator: under /admin/, only allows logged in users with operator privileges

  2. ensure_admin: under /admin/, only allows logged in users with admin privileges

And for HTTP API plugins there are four named authentication routes defined:

  1. api_public: under /api/v1/, allows access to everyone

  2. api_ensure_user: under /api/v1/, only allows authenticated users

  3. api_ensure_operator: under /api/v1/, only allows authenticated users with operator privileges

  4. api_ensure_admin: under /api/v1/, only allows authenticated users with admin privileges

To generate a minimal installable plugin with a CPAN distribution directory structure you can use the Mojolicious tools. It can be packaged just like any other Perl module from CPAN.

$ mojo generate plugin -f OpenQA::WebAPI::Plugin::Hello
...
$ cd OpenQA-WebAPI-Plugin-Hello/
$ perl Makefile.PL
...
$ make test
...

And if you need code examples, there are some plugins included with openQA.

openQA branding

You can alter the appearance of the openQA web UI to some extent through the 'branding' mechanism. The 'branding' configuration setting in the 'global' section of /etc/openqa/openqa.ini specifies the branding to use. It defaults to 'openSUSE', and openQA also includes the 'plain' branding, which is - as its name suggests - plain and generic.

To create your own branding for openQA, you can create a subdirectory of /usr/share/openqa/templates/branding (or wherever openQA is installed). The subdirectory’s name will be the name of your branding. You can copy the files from branding/openSUSE or branding/plain to use as starting points, and adjust as necessary.

Web UI template

openQA uses the Mojolicious framework’s templating system; the branding files are included into the openQA templates at various points. To see where each branding file is actually included, you can search through the files in the templates tree for the text include_branding. Anywhere that helper is called, the branding file with the matching name is being included.

The branding files themselves are Mojolicious 'Embedded Perl' templates just like the main template files. You can read the Mojolicious Documentation for help with the format.