Checks Specification
A language for declaring best practices to be adhered to on target SAP infrastructures.
Introduction
This Specification aims to give users a simple way to declare what we (the Trento Team) often refer to as “Checks”.
Checks Execution
Checks Execution is the process that determines whether the best practices defined in the Checks Specifications are being followed on a target infrastructure.
Requesting an Execution
An Execution can be requested by providing Wanda with the following information:
- an execution identifier
- an execution Group identifier
- the Checks Selection for the targets (a list of checks to be executed on the targets)
When the Execution starts running, its current state is stored in the Database and the targets are notified - via the message broker - about Facts to be gathered.
Then the Execution waits for the Facts Gathering to complete.
Facts Gathering
After an Execution Request the targets are notified about the facts they need to gather.
Whenever a target has gathered all the needed facts for an Execution, it notifies Wanda - via the message broker - about the Gathered Facts.
Expectation Evaluation
Expectation Evaluation is the process of evaluating the Expectations using the received Gathered Facts to obtain the result of a check.
This will only happen once Gathered Facts are received from all the targets.
After the result has been determined, the currently running Execution transitions to completed and its new state is tracked in the Database.
At this point the Execution is considered Completed and interested parties are notified about the Execution Completion.
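The lifecycle described above (request, facts gathering, completion once every target has reported) can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `Execution` class, its fields, and `receive_facts` are hypothetical names, and the database and message broker are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Execution:
    """Hypothetical in-memory model of an Execution; not Wanda's real data structure."""

    execution_id: str
    group_id: str
    targets: set[str]
    state: str = "running"
    gathered: dict[str, dict] = field(default_factory=dict)

    def receive_facts(self, target: str, facts: dict) -> bool:
        """Record the facts of one target; complete once every target has reported."""
        if target not in self.targets:
            raise ValueError(f"unknown target: {target}")
        self.gathered[target] = facts
        if set(self.gathered) == self.targets:
            # This is the point where Expectation Evaluation would run.
            self.state = "completed"
        return self.state == "completed"
```

With two targets, the execution stays in `running` after the first target reports and moves to `completed` only after the second one does.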
Checks Results
Once an execution is completed, the checks results give feedback on which aspects of a target infrastructure adhere to the best practices and which don’t.
Possible results:
- passing: everything ok
- warning: best practice not followed, should fix
- critical: best practice not followed, must fix
See also Check Severity.
Anatomy of a Check
A Check declaration comes in the form of a YAML file, and all the Checks together build up the Checks Catalog.
Here’s an example:
id: "156F64"
name: Corosync `token` timeout
group: Corosync
description: Corosync `token` timeout is set to the correct value
remediation: |
  ## Abstract
  The value of the Corosync `token` timeout is not set as recommended.
  ## Remediation
  Adjust the corosync `token` timeout as recommended...
severity: warning
metadata:
  target_type: cluster
  cluster_type: hana_scale_up
  hana_scenario: performance_optimized
  provider: [azure, nutanix, kvm, vmware]
facts:
  - name: corosync_token_timeout
    gatherer: corosync.conf
    argument: totem.token
customization_disabled: true
values:
  - name: expected_token_timeout
    customization_disabled: true
    default: 5000
    conditions:
      - value: 30000
        when: env.provider == "azure" || env.provider == "aws"
      - value: 20000
        when: env.provider == "gcp"
expectations:
  - name: token_timeout
    expect: facts.corosync_token_timeout == values.expected_token_timeout
Filename Convention
Note that a Check’s filename MUST be in the form <check_id>.yaml (e.g. 156F64.yaml).
Structure
The top-level properties of a Check definition YAML file are listed below.
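Key | Required/Not Required | Details |
---|---|---|
`id` | required | See id |
`name` | required | See name |
`group` | required | See group |
`description` | required | See description |
`remediation` | required | See remediation |
`severity` | not required | See severity |
`metadata` | not required | See metadata |
`facts` | required | See Facts |
`customization_disabled` | not required | See Disable Customization |
`values` | not required | See Values |
`expectations` | required | See Expectations |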
id
Uniquely identifies a Check in the Catalog. The value must be a hexadecimal number formatted as a quoted string.
For example:
id: "156F64"
id: "845CC9"
id: "B089BE"
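The id rule and the filename convention can be checked mechanically. The sketch below is only an illustration: the helper names are hypothetical, and the pattern assumes "one or more hexadecimal digits" because the text does not state a length or letter case. The type check matters because a YAML loader would parse an unquoted all-digit id as a number rather than a string.

```python
import re

# Assumption: an id is one or more hexadecimal digits; length and case are not specified.
HEX_ID = re.compile(r"[0-9A-Fa-f]+")


def is_valid_check_id(value: object) -> bool:
    """True when value is a string made only of hexadecimal digits."""
    return isinstance(value, str) and HEX_ID.fullmatch(value) is not None


def expected_filename(check_id: str) -> str:
    """Filename required by the <check_id>.yaml convention."""
    return f"{check_id}.yaml"
```

For instance, `is_valid_check_id("156F64")` holds, while the integer `156` (what an unquoted numeric id would load as) is rejected.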
name
A string, preferably one line, representing the name of the Check being declared.
For example:
name: Corosync `token` timeout
name: Corosync `consensus` timeout
name: SBD Startmode
group
A string, preferably one line, representing the group to which the Check being declared belongs.
Example:
group: Corosync
group: Pacemaker
group: SBD
description
A text providing a description for the Check being declared.
can be a one-liner
description: Some plain description
can be a multiline text
description: |
  Some plain multiline
  description that carries a lot
  of information
format is markdown
description: |
  A `description` is a **markdown**
remediation
A text providing a comprehensive description of the remediation to apply for the Check being declared.
It has the same properties as the description:
- can be a one-liner (it usually is not)
- can be a multiline text (it usually is)
- format is markdown
Example:
remediation: |
  ## Abstract
  The value of the Corosync `token` timeout is not set as recommended.
  ## Remediation
  Adjust the corosync `token` timeout as recommended on the best
  ...
  2. Reload the corosync configuration:
  ...
severity
A string determining the severity of the Check being declared, so that the appropriate result is reported when the check does not pass.
Allowed values: warning, critical
Default: if no severity is provided, the system defaults to critical
Example:
Reports a warning when the Check expectations do not pass:
severity: warning
Reports a critical when the Check expectations do not pass:
severity: critical
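How severity turns an unmet expectation into a result can be sketched as a small function. It is an illustration only: `check_result` is a hypothetical name, and the sketch covers checks whose result is driven by the severity field (the `expect_enum` type, described later, ignores it).

```python
from typing import Optional

ALLOWED_SEVERITIES = ("warning", "critical")


def check_result(expectations_passed: bool, severity: Optional[str] = None) -> str:
    """Map the outcome of a Check's expectations to passing, warning or critical."""
    if expectations_passed:
        return "passing"
    # A missing severity falls back to the documented default.
    effective = "critical" if severity is None else severity
    if effective not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {effective}")
    return effective
```

A failing check declared with `severity: warning` reports `warning`, and one without a severity reports `critical`.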
metadata
A key-value map that enriches the Check being declared by providing extra information about when to consider it applicable in a specific env.
- keys must be non-empty strings (`foo`, `bar`, `foo_bar`, `qux1`)
- values can be any of the following types: `string`, `number`, `boolean`, `string[]` (list of strings)
- `target_type` is a required key of the `metadata` map. Its value is a `string`.
Example:
metadata:
  target_type: example_target
  foo: bar
  bar: 42
  baz: true
  qux: [foo, bar, baz]
Metadata is used when:
- querying checks from the catalog
- loading relevant checks for an execution (when requesting an execution to start either via the REST API or via a message through the message broker)
How does the matching work?
For each metadata key-value pair, the system checks whether a matching key is present in the current context (catalog or execution env) and, if so, whether the value matches the one declared in the check.
For a check to be considered applicable, every metadata key that is also present in the env must have a matching value.
Any extra key in the env not having a corresponding one in the check metadata is ignored.
Notes:
- a string in the env (e.g. `env.qux` being `baz`) can match either a plain string, as in `qux: baz`, or a string contained in a list, as in `qux: [foo, bar, baz]`
- an empty env always matches any metadata
- an empty metadata always matches any env
Matching example
let env = #{
  foo: "bar",
  qux: "baz"
}
metadata:
  foo: bar
  bar: 42
  baz: true
  qux: baz
Not matching example
let env = #{
  foo: "bar",
  qux: "baz",
  baz: false
}
metadata:
  foo: bar
  bar: 42
  baz: true
  qux: [foo, bar, baz]
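The matching rules and the two examples above can be reproduced with a short sketch. It is a reading of the rules as described here, not Wanda's actual implementation: the function names are hypothetical, and plain Python equality stands in for whatever comparison the engine really uses.

```python
def value_matches(declared, actual) -> bool:
    """A declared list matches when it contains the env value; anything else must be equal."""
    if isinstance(declared, list):
        return actual in declared
    return declared == actual


def metadata_matches(metadata: dict, env: dict) -> bool:
    """Metadata keys missing from env are skipped; keys present in env must match."""
    return all(
        value_matches(declared, env[key])
        for key, declared in metadata.items()
        if key in env
    )
```

With the env and metadata of the matching example the function returns `True`; with those of the non-matching example it returns `False`, because `baz` is `false` in the env and `true` in the metadata.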
Facts, Values, Expectations
See main sections Facts, Values, Expectations
Facts
Facts are the core data on which the engine evaluates the state of the target infrastructure. Examples include (but are not limited to) installed packages, cluster state, and configuration files content.
The process of determining the value of a declared fact during Check execution is referred to as Facts Gathering and it is the responsibility of the Gatherers. Gatherers could be seen as functions that have a name and accept argument(s).
That said, a fact declaration contains:
- the fact name
- the gatherer used to retrieve the fact
- the argument(s) to be provided to the gatherer
Note:
- many facts can be declared
- all the declared facts are registered in the `facts` namespaced evaluation scope
facts:
  - name: <fact_name>
    gatherer: <gatherer_name>
    argument: <gatherer_argument>
  - name: <another_fact_name>
    gatherer: <another_gatherer_name>
    argument: <another_gatherer_argument>
The following example declares a fact named `corosync_token_timeout`, retrievable via the built-in `corosync.conf` gatherer, to which the argument `totem.token` is provided:
facts:
  - name: corosync_token_timeout
    gatherer: corosync.conf
    argument: totem.token
  # other facts maybe
Finally, gathered facts are used in the Check’s Expectations to determine whether the conditions expected by the best practice are met.
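The idea of gatherers as named functions that accept an argument can be illustrated with a toy registry. Everything here is hypothetical apart from the gatherer name `corosync.conf` and the argument `totem.token` taken from the example above: the real gatherers run on the targets, whereas this sketch looks values up in a hard-coded dictionary.

```python
from typing import Callable

# Hypothetical stand-in for a target's configuration; real gatherers read from the host.
FAKE_COROSYNC_CONF = {"totem.token": 30000}

# Registry of gatherers by name; the lookup body is a toy, not the real gatherer.
GATHERERS: dict[str, Callable[[str], object]] = {
    "corosync.conf": lambda argument: FAKE_COROSYNC_CONF.get(argument),
}


def gather_facts(declarations: list[dict]) -> dict:
    """Call each declared gatherer with its argument and collect results by fact name."""
    return {
        decl["name"]: GATHERERS[decl["gatherer"]](decl["argument"])
        for decl in declarations
    }
```

The resulting dictionary plays the role of the `facts` evaluation scope: each declared fact name maps to whatever its gatherer returned.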
Disable Customization
Users can modify a check’s expected values to accommodate specific system and environmental configurations.
By default, built-in checks are customizable. The `customization_disabled` flag provides a way to disable customizability when needed.
To disable customization for a check, add the following to its specification:
customization_disabled: true
Opting out of customizability at the root of a check’s specification makes all the values of the given check non-customizable.
Notes:
- Setting `customization_disabled: false` has no real effect, as a check is customizable by default
- The `customization_disabled` flag can also be applied to specific values
Values
Values are named variables that may evaluate differently based on the execution context and are used with Facts for Contextual Expectations Evaluation.
Hardcoded Values
Direct usage of a simple hardcoded value
expectations:
  - name: awesome_expectation
    expect: facts.awesome_fact == "wanda"
Constant Values
Define a Value with only the `default` specified (omitting `conditions`) for values that stay constant regardless of the context.
values:
  - name: awesome_constant_value
    default: "wanda"
expectations:
  - name: awesome_expectation
    expect: facts.awesome_fact == values.awesome_constant_value
Contextual Values
This is needed because the same check might expect facts to be treated differently based on the context.
Let’s clarify with an example:
A Check might define a fact named `awesome_fact` which is expected to be different given the `color` of the execution:
- it has to be `cat` when the `color` in the execution context is `red`
- it has to be `dog` when the `color` in the execution context is `blue`
- it has to be `rabbit` in all other cases, regardless of the execution context
So we define a named variable `awesome_expectation` that resolves to `cat|dog|rabbit` when the proper conditions are met, allowing us to have an expectation like this:
expect: facts.awesome_fact == values.awesome_expectation
A Value declaration contains:
- the value name
- the default value
- a list of conditions that determine the value given the context (optional, see constant values)
values:
  - name: <value_name>
    default: <default_value>
    conditions:
      - value: <value_on_condition_a>
        when: <expression_a>
      - value: <value_on_condition_b>
        when: <expression_b>
It could read as: the value named `<value_name>` resolves to
- `<value_on_condition_a>` when `<expression_a>` is true
- `<value_on_condition_b>` when `<expression_b>` is true
- `<default_value>` in all other cases
Example:
Check 156F64 (Corosync token timeout is set to expected value) defines a fact `corosync_token_timeout` which is expected to be different given the platform (aws/azure/gcp), so we define a named variable `expected_token_timeout` resolving to the appropriate value.
`expected_token_timeout` resolves to:
- `30000` when `azure`/`aws` are detected
- `20000` on `gcp`
- `5000` in all other cases (e.g. bare metal, VMs…)
values:
  - name: expected_token_timeout
    default: 5000
    conditions:
      - value: 30000
        when: env.provider == "azure" || env.provider == "aws"
      - value: 20000
        when: env.provider == "gcp"
expectations:
  - name: corosync_token_timeout_is_correct
    expect: facts.corosync_token_timeout == values.expected_token_timeout
Note that `conditions` is a cascading chain of contextual inspections that determines the resolved value:
- there may be many conditions
- the first condition that passes determines the value; the following ones are not evaluated
- the `when` entry Expression has access to the gathered facts and env evaluation scopes
All the resolved declared values are registered in the `values` namespaced evaluation scope.
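The cascading resolution can be sketched in a few lines. This is a simplified illustration, not the engine's code: `resolve_value` is a hypothetical name, and each `when` is written as a Python callable receiving `env` and `facts`, whereas in a real Check it is an expression string.

```python
from typing import Any


def resolve_value(declaration: dict, env: dict, facts: dict) -> Any:
    """Return the value of the first condition whose `when` holds, else the default."""
    for condition in declaration.get("conditions", []):
        if condition["when"](env, facts):
            return condition["value"]  # later conditions are not evaluated
    return declaration["default"]


# The expected_token_timeout value from the example above, with callables as conditions.
expected_token_timeout = {
    "name": "expected_token_timeout",
    "default": 5000,
    "conditions": [
        {"value": 30000, "when": lambda env, facts: env.get("provider") in ("azure", "aws")},
        {"value": 20000, "when": lambda env, facts: env.get("provider") == "gcp"},
    ],
}
```

Resolving with `env = {"provider": "gcp"}` yields `20000`, while any provider not covered by a condition falls through to the default of `5000`. A declaration without `conditions` behaves as a constant value.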
Customizable Values
A check’s expected values are customizable by default. For finer control than the check-level opt-out, customizability can also be disabled on a per-value basis.
values:
  - name: non_customizable_check_value
    customization_disabled: true
    default: 5000
Setting `customization_disabled: true` for a specific value prevents the modification of its default value.
Expectations
Expectations are assertions on the state of a target infrastructure for a given scenario. By using facts and values they determine whether a check passes or not.
An Expectation declaration contains:
- the expectation name
- the expectation expression itself, with access to gathered facts and resolved values
- an optional failure message
- an optional warning message, only available in `expect_enum` expectations
expectations:
  - name: <expectation_name>
    expect: <expectation_expression>
  - name: <another_expectation_name>
    expect: <another_expectation_expression>
    failure_message: <something_went_wrong>
  - name: <yet_another_expectation_name>
    expect_same: <yet_another_expectation_expression>
Extra considerations:
- there can be many expectations for a single Check
- an expectation can be one of three types: `expect`, `expect_same` or `expect_enum`
- a Check passes when all the expectations are satisfied
Example
expectations:
  - name: token_timeout
    expect: facts.corosync_token_timeout == values.expected_token_timeout
  - name: awesome_expectation
    expect: facts.awesome_fact == values.awesome_expected_value
In the previous example a Check passes (is successful) if all expectations are met, meaning that
facts.corosync_token_timeout == values.expected_token_timeout AND facts.awesome_fact == values.awesome_expected_value
expect
This type of expectation is satisfied when, after facts gathering, the expression is `true` for all the targets involved in the current execution.
Execution Scenario:
- 2 targets [`A`, `B`]
- selected Checks [`corosync_check`]
- some environment (context)
facts:
  - name: corosync_token_timeout
    gatherer: corosync.conf
    argument: totem.token
values: ...
expectations:
  - name: corosync_token_timeout_is_correct
    expect: facts.corosync_token_timeout == values.expected_token_timeout
Considering the previous scenario what happens is that:
- the fact `corosync_token_timeout` is gathered on all targets (`A` and `B` in this case)
- the expectation expression gets executed against the `corosync_token_timeout` fact gathered on every target:
  - targetA.corosync_token_timeout == values.expected_token_timeout
  - targetB.corosync_token_timeout == values.expected_token_timeout
- every evaluation has to be `true`
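The per-target evaluation of an `expect` expectation can be sketched as follows. The function name and the shape of the inputs (facts keyed by target, the expression as a Python callable) are assumptions made for illustration only.

```python
from typing import Callable


def evaluate_expect(
    expression: Callable[[dict, dict], bool],
    facts_by_target: dict[str, dict],
    values: dict,
) -> bool:
    """Satisfied only when the expression is true on every target."""
    return all(expression(facts, values) for facts in facts_by_target.values())
```

If target `A` gathered the expected timeout but target `B` did not, the expectation as a whole is not satisfied.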
expect_same
This type of expectation is satisfied when, after facts gathering, the expression’s return value is the same for all the targets involved in the current execution, regardless of the value itself.
Execution Scenario:
- 3 targets [`A`, `B`, `C`]
- selected Checks [`some_check`]
- some environment (context)
expectations:
  - name: awesome_expectation
    expect_same: facts.awesome_fact
Considering the previous scenario what happens is that:
- the fact `awesome_fact` is gathered on all targets (`A`, `B` and `C` in this case)
- the expectation expression gets executed for every target involved:
  - targetA.facts.awesome_fact
  - targetB.facts.awesome_fact
  - targetC.facts.awesome_fact
- the expression results have to be the same for every target:
  - targetA.facts.awesome_fact == targetB.facts.awesome_fact == targetC.facts.awesome_fact
Example:
RPM version must be the same on all the targets, regardless of what version it is
facts:
  - name: installed_rpm_version
    gatherer: package_version
    argument: rpm
expectations:
  - name: installed_rpm_version_must_be_the_same_on_all_targets
    expect_same: facts.installed_rpm_version
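An `expect_same` evaluation compares the expression results across targets rather than against a value. A minimal sketch, assuming facts keyed by target and the expression written as a Python callable (both hypothetical shapes); the version strings in the usage note are made up for illustration:

```python
from typing import Any, Callable


def evaluate_expect_same(
    expression: Callable[[dict], Any],
    facts_by_target: dict[str, dict],
) -> bool:
    """Satisfied when the expression returns the same value on every target."""
    results = [expression(facts) for facts in facts_by_target.values()]
    return all(result == results[0] for result in results)
```

For the RPM example, three targets all reporting `"4.14"` satisfy the expectation, and so would three targets all reporting `"4.15"`; a mix of the two does not.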
expect_enum
This type of expectation is satisfied when, after facts gathering, the expression returns `passing`, `warning` or `critical`. If no value is returned, the result defaults to `critical`. The final result of this expectation is the aggregation of the expectation evaluations gathered on all the involved targets.
The aggregation returns:
- `passing` if the evaluations of all the targets are `passing`
- `warning` if any of the evaluations is `warning` and there is no `critical` result
- `critical` if any of the evaluations is `critical`
In this expectation type the severity field of the check is ignored.
Execution Scenario:
- 2 targets [`A`, `B`]
- selected Checks [`sbd_check`]
- some environment (context)
facts:
  - name: sbd_devices
    gatherer: sbd_config@v1
    argument: SBD_DEVICE
values: ...
expectations:
  - name: multiple_sbd_devices_configured
    expect_enum: |
      if facts.sbd_devices > values.passing_sbd_devices_count {
        "passing"
      } else if facts.sbd_devices == values.warning_sbd_devices_count {
        "warning"
      } else {
        "critical"
      }
  - name: multiple_sbd_devices_configured_simple
    expect_enum: |
      if facts.sbd_devices > values.passing_sbd_devices_count {
        "passing"
      } else if facts.sbd_devices == values.warning_sbd_devices_count {
        "warning"
      }
Considering the previous scenario what happens is that:
- the fact `sbd_devices` is gathered on all targets (`A` and `B` in this case)
- the expectation expression gets executed against the `sbd_devices` fact gathered on every target
- the evaluated value is exactly what the expression returns. If no value is returned, `critical` is used, as in the second expectation example
- the evaluation results of all the targets are aggregated to compose the final expectation result
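The aggregation rules for `expect_enum` amount to "worst result wins". A short sketch, assuming the per-target results arrive as a list and that `None` models an expression that returned nothing (both are illustration choices, not the engine's actual data model):

```python
from typing import Optional


def aggregate_expect_enum(evaluations: list[Optional[str]]) -> str:
    """Combine per-target results; a missing result counts as critical."""
    normalized = ["critical" if result is None else result for result in evaluations]
    if "critical" in normalized:
        return "critical"
    if "warning" in normalized:
        return "warning"
    return "passing"
```

Two targets returning `passing` and `warning` aggregate to `warning`; as soon as one target returns `critical`, or returns nothing, the aggregate is `critical`.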
failure_message
An optional failure message can be declared for every expectation.
For an `expect` expectation, the failure message can interpolate `facts` and `values` present in the check definition to provide more meaningful insights:
expectations:
  - name: awesome_expectation
    expect: values.awesome_constant_value == facts.awesome_fact
    failure_message: The expectation did not match ${values.awesome_constant_value}
The outcome of the interpolation is available in `ExpectationEvaluation` inside the API response.
For an `expect_same` expectation, the failure message has to be a plain string:
expectations:
  - name: awesome_expectation
    expect_same: facts.awesome_fact
    failure_message: Boom!
This plain string is available in `ExpectationResult` inside the API response.
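To make the interpolation of `expect` failure messages concrete, here is a deliberately simplified sketch. It only handles placeholders of the form `${facts.<name>}` and `${values.<name>}`; this is an assumption for illustration, and the real engine may accept richer expressions inside `${...}`. The `interpolate` helper is hypothetical.

```python
import re

# Simplification: only ${facts.<name>} and ${values.<name>} placeholders are recognized.
PLACEHOLDER = re.compile(r"\$\{(facts|values)\.(\w+)\}")


def interpolate(message: str, facts: dict, values: dict) -> str:
    """Replace each recognized placeholder with the matching facts/values entry."""
    scopes = {"facts": facts, "values": values}
    return PLACEHOLDER.sub(lambda m: str(scopes[m.group(1)][m.group(2)]), message)
```

Applied to the failure message from the example above with `awesome_constant_value` resolved to `wanda`, the result is `The expectation did not match wanda`.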
warning_message
An optional warning message that works exactly like the failure message above. This field is only available for `expect_enum` expectations, and it is interpolated when the expectation outcome is `warning`.
expectations:
  - name: awesome_expectation
    expect_enum: |
      if values.passing_value == facts.awesome_fact {
        "passing"
      } else if values.warning_value == facts.awesome_fact {
        "warning"
      }
    failure_message: Critical!
    warning_message: Warning!
The outcome of the interpolation is available in `ExpectationEvaluation` inside the API response, in the `failure_message` field.
Expression Language
Different parts of the Check declaration require an expression to be evaluated.
Determining what a value resolves to during execution: the `when: <expression>` part of a Value’s condition
values:
  - name: expected_token_timeout
    default: 5000
    conditions:
      - value: 30000
        when: env.provider == "azure" || env.provider == "aws"
      - value: 20000
        when: env.provider == "gcp"
Defining the Expectation of a Check: `expect|expect_same: <expression>`
expectations:
  - name: token_timeout
    expect: facts.corosync_token_timeout == values.expected_token_timeout
Evaluation Scope
Every expression has access to an evaluation scope, which exposes the information needed to run the expression.
Scopes are namespaced and access to items in the scope is name based.
env
`env` is a map of information about the context of the running execution; it is set by the system on each execution/check compilation.
Examples of entries in the scope. What is actually available during the execution depends on the scenario. Find the updated values in the reference column link.
name | Type | Reference | Applicable |
---|---|---|---|
|
one of |
No enum available |
All |
|
one of |
All |
|
|
one of |
|
|
|
one of |
|
|
|
one of |
|
|
|
one of |
|
|
|
one of |
|
|
|
one of |
|
facts
`facts` is the map of the gathered facts, so the scope varies based on which facts have been declared in the respective section; they are accessible in other sections by fact name.
facts:
  - name: an_interesting_fact
    gatherer: <some_gatherer>
    argument: <some_argument>
  - name: another_interesting_fact
    gatherer: <another_gatherer_name>
    argument: <another_gatherer_argument>
Available entries in scope, the value is what has been gathered on the targets:
Name |
---|
`facts.an_interesting_fact` |
`facts.another_interesting_fact` |
values
`values` is the map of resolved variable names defined in the respective section
values:
  - name: expected_token_timeout
    default: 5000
    conditions:
      - value: 30000
        when: env.provider == "azure" || env.provider == "aws"
      - value: 20000
        when: env.provider == "gcp"
  - name: another_variable_value
    default: "blue"
    conditions:
      - value: "red"
        when: env.should_be_red == true
Available entries in scope:
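Name | Resolved to |
---|---|
`values.expected_token_timeout` | `5000`, `30000` or `20000` |
`values.another_variable_value` | `"blue"` or `"red"` |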
Best practices and conventions
To have a standardized format for writing checks, follow these best practices and conventions as much as possible:
- The `id` field must be wrapped in double quotes to avoid any ambiguity, as this field must be a string.
- The remaining `name`, `description`, `group`, and `remediation` fields must not be wrapped in quotes, as they are always text-based values.
- Take advantage of markdown tags in the `name`, `description`, and `remediation` fields to make the text easy and compelling to read.
- The `name` field of `facts`, `values`, and `expectations` must follow `snake_case` format.
  For example:
  facts:
    - name: some_fact
  values:
    - name: expected_some_fact
  expectations:
    - name: some_expectation
- Use 2 spaces to indent multiline expectation expressions.
- Naming hardcoded values in the `values` section with the `default` field is encouraged instead of putting hardcoded values in the expectation expression itself. This gives some meaning to the expected value and improves potential interaction with the Wanda API.
  So this:
  expectations:
    - name: some_expectation
      expect: facts.foo == 30
  would be:
  values:
    - name: expected_foo
      default: 30
  expectations:
    - name: some_expectation
      expect: facts.foo == values.expected_foo
- If the gathered fact is compared to a value, using `value` and `expected_value` names for facts and values respectively is recommended, as it improves the meaning of the comparison.
  For example:
  facts:
    - name: some_fact
  values:
    - name: expected_some_fact
- Avoid adding prefixes such as `facts` or `values` to the entries of these sections, as they already use this as a namespace. For example, the following should be avoided, as the `facts` prefix would be redundant in the expectation expression:
  facts:
    - name: facts_some_fact
- If the implemented expectation expression contains any kind of `&&` to combine multiple operations, consider adding them as individual expectations, as the final result is the combination of all of them.
  So this:
  expectations:
    - name: some_expectation
      expect: facts.foo == values.expected_foo && facts.bar == values.expected_bar
  would be:
  expectations:
    - name: foo_expectation
      expect: facts.foo == values.expected_foo
    - name: bar_expectation
      expect: facts.bar == values.expected_bar
- Pipe the expression language functions vertically in order to provide a better visual output of the code.
  So this:
  expectations:
    - name: some_expectation
      expect: facts.foo.find(|item| item.id == "super").properties.find(|prop| prop.name == "good").value
  would be:
  expectations:
    - name: some_expectation
      expect: |
        facts.foo
          .find(|item| item.id == "super").properties
          .find(|prop| prop.name == "good").value
  Keep in mind that some functions such as `sort` and `drain` run in-place modifications, so they cannot be piped.