Operations

Operations in Trento follow five principles:

  1. Permissions: Only users with operation:all permissions can perform operations.

  2. Contextual UI paths: The path to request an operation depends on the target. To enable maintenance on an entire cluster, use the Operation button in the cluster details view. To enable maintenance on a specific node or resource, use the options menu in the corresponding row.

  3. Single operation concurrency: Trento permits only one operation at a time on a target. While an operation runs, all other operations for that target (host, cluster, SAP system, or SAP HANA database) are disabled.

  4. Safety policies: Internal policies prevent users from executing operations that violate established best practices, even if the user has permission to do so. For example, if a user attempts to stop an SAP HANA database managed by a cluster, Trento forbids the operation and displays a reason. Each operation has specific internal policies that are listed in the use cases below.

  5. Target heartbeat: The execution of an operation depends on the heartbeat status of the target host(s). If the heartbeat is not active, the operation is not allowed.

    1. Single-host resource operations: The target host must have an active heartbeat. This applies to operations on resources such as hosts, cluster nodes, and SAP instances.

    2. Multi-host resource operations: At least one constituent target host must have an active heartbeat. This applies to operations on resources such as clusters, databases, and SAP systems.
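
The heartbeat rule can be summarized as a small predicate. The following is a minimal sketch, not Trento's implementation: the function name and the resource identifiers are hypothetical and chosen only to mirror the two cases described above.

```python
from typing import Iterable

# Resource groupings taken from the two heartbeat cases; identifiers are illustrative.
SINGLE_HOST_RESOURCES = {"host", "cluster_node", "sap_instance"}
MULTI_HOST_RESOURCES = {"cluster", "database", "sap_system"}


def heartbeat_allows_operation(resource_type: str, heartbeats: Iterable[bool]) -> bool:
    """Return True if the heartbeat rule permits an operation on the target.

    `heartbeats` holds one flag per target host (True means active).
    """
    flags = list(heartbeats)
    if resource_type in SINGLE_HOST_RESOURCES:
        # The single target host must have an active heartbeat.
        return len(flags) == 1 and flags[0]
    if resource_type in MULTI_HOST_RESOURCES:
        # At least one constituent host must have an active heartbeat.
        return any(flags)
    raise ValueError(f"unknown resource type: {resource_type}")
```

For example, a cluster with one unreachable node and one reachable node still passes the check, while a host with an inactive heartbeat does not.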

Internal policies prevent users from making basic operational mistakes, but they cannot guarantee that an operation will not damage the environment. The user is ultimately responsible for performing operations.

Every operation request generates an entry in the activity log with a specific activity type. After a successful operation request, Trento attempts to perform the operation. The activity log records the completion of the operation, regardless of success, with the activity type Operation Completed. This entry correlates to the request entry (see Activity Log for details). If an operation fails, the Operation Completed entry provides additional troubleshooting data: the errors field provides detailed reasons for each affected target and the failed_step field identifies where the operation failed. In such cases, Trento restores the original state by rolling back the operation.
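
To illustrate how the troubleshooting data might be consumed, the sketch below summarizes a failed Operation Completed entry. Only the field names errors and failed_step come from the description above. The overall shape of the entry (a dictionary in which errors maps each target to a reason) and all sample values are assumptions made for the example.

```python
from typing import Any, Dict, List


def summarize_failure(entry: Dict[str, Any]) -> List[str]:
    """Turn a failed Operation Completed entry into readable lines."""
    lines: List[str] = []
    failed_step = entry.get("failed_step")
    if failed_step is not None:
        lines.append(f"failed at step: {failed_step}")
    # Assumption: `errors` maps each affected target to its failure reason.
    for target, reason in entry.get("errors", {}).items():
        lines.append(f"{target}: {reason}")
    return lines


# Hypothetical entry; the step name, host name, and message are invented.
example_entry = {
    "failed_step": "example_step",
    "errors": {"example-host": "example reason"},
}

for line in summarize_failure(example_entry):
    print(line)
```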

To prevent endless execution when an agent is unresponsive, an internal timeout constrains every operation. When the timeout is reached, the operation execution resets.

Host operations

For any registered host, the details view provides the following operations:

  • Apply saptune Solution: Enabled when an SAP workload is discovered on the host, but no saptune solution is applied. Trento restricts the available solutions based on the SAP workload type. If the SAP workload is an SAP HANA instance, the choice is between HANA and S4HANA-DBSERVER. If the SAP workload is an application instance, the choice is between NETWEAVER and S4HANA-APPSERVER.

    • Activity type: Host Operation Requested

    • Internal policies: The SAP workload must be stopped.

    • Internal timeout: 5m

  • Change saptune Solution: Enabled when an SAP workload is discovered on the host and a saptune solution is already applied. As with the apply operation, Trento restricts the available solutions based on the SAP workload type.

    • Activity type: Host Operation Requested

    • Internal policies: The SAP workload must be stopped.

    • Internal timeout: 5m

  • Reboot Host: Schedules a reboot of the target host within one minute of the request.

    • Activity type: Host Operation Requested

    • Internal policies:

      • If an SAP workload is discovered on the host, it must be stopped.

      • If the host is a cluster node, the Pacemaker service must be disabled at boot and stopped (the node must be offline).

    • Internal timeout: 5m
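
The host rules above can be expressed as a lookup plus a policy check. This is a minimal sketch under stated assumptions: the workload keys ("hana", "application"), the function names, and the violation messages are hypothetical, and the code does not reflect how Trento implements these checks.

```python
from typing import List

# Solutions offered per workload type, as described for the saptune operations.
# S4HANA-DBSERVER is saptune's solution name for an S/4HANA database server.
SOLUTIONS_BY_WORKLOAD = {
    "hana": ["HANA", "S4HANA-DBSERVER"],
    "application": ["NETWEAVER", "S4HANA-APPSERVER"],
}


def available_solutions(workload_type: str) -> List[str]:
    """Return the saptune solutions selectable for a workload type."""
    return list(SOLUTIONS_BY_WORKLOAD.get(workload_type, []))


def reboot_violations(
    workload_running: bool,
    is_cluster_node: bool,
    pacemaker_enabled_at_boot: bool,
    pacemaker_running: bool,
) -> List[str]:
    """List the Reboot Host policies that the current host state violates."""
    violations: List[str] = []
    if workload_running:
        violations.append("SAP workload must be stopped")
    # The Pacemaker conditions only apply when the host is a cluster node.
    if is_cluster_node and pacemaker_enabled_at_boot:
        violations.append("Pacemaker must be disabled at boot")
    if is_cluster_node and pacemaker_running:
        violations.append("Pacemaker must be stopped")
    return violations
```

An empty list from reboot_violations means that no policy blocks the reboot in this model.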

Cluster operations

When a user requests a cluster operation, Trento checks the cluster state by running crmadmin -qS NODE, where NODE is the DC node. If the command returns any state other than IDLE, the operation fails. This restriction prevents users from executing operations while the cluster is in transition.
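
A comparable check can be reproduced manually. The sketch below wraps the same crmadmin call and is an illustration only: the function names are hypothetical, and the exact text that crmadmin prints in quiet mode is an assumption, so the code accepts either IDLE or S_IDLE and reads both output streams.

```python
import subprocess
from typing import Callable

# Assumption: quiet mode prints the bare state name, with or without the S_ prefix.
IDLE_STATES = {"IDLE", "S_IDLE"}


def run_crmadmin(dc_node: str) -> str:
    """Run `crmadmin -qS NODE` and return everything it printed."""
    result = subprocess.run(
        ["crmadmin", "-qS", dc_node],
        capture_output=True,
        text=True,
        check=False,
    )
    # Both streams are joined so the check does not depend on which one carries the state.
    return f"{result.stdout}\n{result.stderr}"


def cluster_is_idle(dc_node: str, runner: Callable[[str], str] = run_crmadmin) -> bool:
    """Return True if the reported controller state is idle."""
    tokens = runner(dc_node).split()
    return any(token in IDLE_STATES for token in tokens)
```

The runner parameter allows the parsing logic to be exercised without a cluster, for example cluster_is_idle("node1", runner=lambda node: "S_IDLE").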

For any registered cluster, the details view provides the following operations:

  • At cluster level:

    • Cluster maintenance: Enabled when at least one node in the cluster is online. It turns maintenance on or off in the cluster, depending on the current status.

      • Activity type: Cluster Operation Requested

      • Internal policies: Not applicable.

      • Internal timeout: 5m

    • Refresh resources: Enabled when at least one node in the cluster is online. It refreshes all the resources in the cluster.

      • Activity type: Cluster Operation Requested

      • Internal policies: Not applicable.

      • Internal timeout: 5m

  • At node level:

    • Node maintenance: Enabled when the node is online. It turns maintenance on or off in the node, depending on its current status.

      • Activity type: Operation Requested on a cluster host

      • Internal policies: Not applicable.

      • Internal timeout: 5m

    • Enable Pacemaker at boot: Enabled when the service is disabled at boot in the corresponding host.

      • Activity type: Operation Requested on a cluster host

      • Internal policies: Not applicable.

      • Internal timeout: 5m

    • Disable Pacemaker at boot: Enabled when the service is enabled at boot in the corresponding host.

      • Activity type: Operation Requested on a cluster host

      • Internal policies: Not applicable.

      • Internal timeout: 5m

    • Set node online in cluster: Enabled when the node is offline.

      • Activity type: Operation Requested on a cluster host

      • Internal policies: In an SAP HANA cluster, if the node is managing a secondary instance, all the nodes managing primary instances must be online.

      • Internal timeout: 5m

    • Set node offline in cluster: Enabled when the node is online.

      • Activity type: Operation Requested on a cluster host

      • Internal policies: In an SAP HANA cluster, if the node is managing a primary instance, all the nodes managing secondary instances must be offline.

      • Internal timeout: 5m

  • At resource level:

    • Resource maintenance: Enabled when at least one node in the cluster is online. It turns maintenance on or off in the resource, depending on its current status.

      • Activity type: Cluster Operation Requested

      • Internal policies: Not applicable.

      • Internal timeout: 5m

    • Refresh resource: Enabled when at least one node in the cluster is online. It refreshes the resource.

      • Activity type: Cluster Operation Requested

      • Internal policies: Not applicable.

      • Internal timeout: 5m
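
The two node-level policies for SAP HANA clusters (set node online, set node offline) are mirror images of each other. The following minimal sketch makes that symmetry explicit. The Node type, its role values, and the function names are hypothetical and serve only to restate the policies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    role: str  # "primary" or "secondary"; illustrative values
    online: bool


def can_set_online(target: Node, others: List[Node]) -> bool:
    """A node managing a secondary instance needs every primary node online."""
    if target.role != "secondary":
        return True
    return all(node.online for node in others if node.role == "primary")


def can_set_offline(target: Node, others: List[Node]) -> bool:
    """A node managing a primary instance needs every secondary node offline."""
    if target.role != "primary":
        return True
    return all(not node.online for node in others if node.role == "secondary")
```

In this model, a primary node can only go offline last and a secondary node can only come online after the primary nodes are back.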

SAP HANA operations

For any registered SAP HANA database, the details view provides the following operations at the top:

  • Start database: Enabled when the database, or any database site in a HANA replication setup, is stopped. It calls sapcontrol with the function StartSystem. In a HANA replication setup, it starts the entire database layer, calling the sapcontrol function in each database site in ascending order of tier number (lowest to highest).

    • Options:

      • Timeout: Establishes the time in minutes that Trento waits for the database to start before initiating a rollback.

    • Activity type: Database Operation Requested

    • Internal policies:

      • If the database is managed by a Pacemaker cluster, the corresponding multistate resource or the cluster itself must be in maintenance mode.

    • Internal timeout: 12 hours

  • Stop database: Enabled when the database, or any database site in a HANA replication setup, is started. It calls sapcontrol with the function StopSystem. In a HANA replication setup, it stops the entire database layer, calling the sapcontrol function in each database site in descending order of tier number (highest to lowest).

    • Options:

      • Timeout: Establishes the time in minutes that Trento waits for the database to stop before initiating a rollback.

    • Activity type: Database Operation Requested

    • Internal policies:

      • If the database is managed by a Pacemaker cluster, the corresponding multistate resource or the cluster itself must be in maintenance mode.

      • If there is an application layer (SAP system) on top of the database, all application server instances must be stopped.

    • Internal timeout: 12 hours
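
The site ordering in a HANA replication setup amounts to a sort by tier number: ascending for a start, descending for a stop. The sketch below shows the idea. The function name and the site names in the example are hypothetical.

```python
from typing import Dict, List


def site_order(tiers: Dict[str, int], action: str) -> List[str]:
    """Return site names in the order the sapcontrol function is called.

    `tiers` maps each database site to its tier number.
    """
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    # Start walks the tiers upwards; stop walks them downwards.
    return sorted(tiers, key=lambda site: tiers[site], reverse=(action == "stop"))


# Hypothetical three-tier setup.
example_tiers = {"site_b": 2, "site_a": 1, "site_c": 3}
print(site_order(example_tiers, "start"))
print(site_order(example_tiers, "stop"))
```

Starting from the lowest tier ensures that each replication source is up before the site that replicates from it, and stopping in reverse keeps the same dependency intact.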

For any registered SAP HANA database that is part of a HANA replication system, the details view provides the following operations in the different layout sections:

  • Start database: Enabled when the database site is stopped. It calls sapcontrol with the function StartSystem.

    • Options:

      • Timeout: Establishes the time in minutes that Trento waits for the database to start before initiating a rollback.

    • Activity type: Database Operation Requested

    • Internal policies:

      • If the database is managed by a Pacemaker cluster, the corresponding multistate resource or the cluster itself must be in maintenance mode.

      • If the database site is a secondary one, the database site being replicated must be started.

    • Internal timeout: 12 hours

  • Stop database: Enabled when the database site is started. It calls sapcontrol with the function StopSystem.

    • Options:

      • Timeout: Establishes the time in minutes that Trento waits for the database to stop before initiating a rollback.

    • Activity type: Database Operation Requested

    • Internal policies:

      • If the database is managed by a Pacemaker cluster, the corresponding multistate resource or the cluster itself must be in maintenance mode.

      • If the database site is being replicated (by a secondary or disaster recovery site), the replicating database site must be stopped.

      • If the database site is a primary one and there is an application layer (SAP system) on top of the database, all application server instances must be stopped.

    • Internal timeout: 12 hours
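
The replication-related policies for single database sites can be modeled as dependency checks between sites. The following is a minimal sketch under stated assumptions: the Site type, the replicates_from field, and the messages are hypothetical, and the maintenance-mode policy for cluster-managed databases is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Site:
    name: str
    running: bool
    replicates_from: Optional[str] = None  # None marks a primary site


def start_site_violations(site: Site, sites: Dict[str, Site]) -> List[str]:
    """A secondary site needs the site it replicates from to be started."""
    violations: List[str] = []
    source = site.replicates_from
    if source is not None and not sites[source].running:
        violations.append(f"{source} must be started first")
    return violations


def stop_site_violations(
    site: Site, sites: Dict[str, Site], app_instances_running: bool
) -> List[str]:
    """A site can stop only after its replicating sites and, for a primary, the application layer."""
    violations = [
        f"{other.name} must be stopped first"
        for other in sites.values()
        if other.replicates_from == site.name and other.running
    ]
    if site.replicates_from is None and app_instances_running:
        violations.append("application server instances must be stopped")
    return violations
```

An empty list means that none of the modeled policies blocks the operation.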

SAP operations

For any registered SAP system, the details view provides the following operations:

  • Start system: Enabled when at least one instance of the system is stopped. It calls sapcontrol with the function StartSystem and the parameter selected as Instance Type:

    • Options:

      • Instance Type: The available values are:

        • All instances: Select to start all the instances in the system.

        • ABAP: Select to start the instances with ABAP work processes.

        • J2EE: Select to start the instances with J2EE work processes.

        • ASCS/SCS: Select to start the instance with a message server and an enqueue server.

        • ENQREP: Select to start the instance with an enqueue replication server.

      • Timeout: Establishes the time in minutes that Trento waits for the SAP system to start before initiating a rollback.

    • Activity type: SAP System Operation Requested

    • Internal policies: If any of the instances included in the Instance Type selection is managed by a cluster, the corresponding resource or the cluster itself must be in maintenance mode.

    • Internal timeout: 1h

  • Stop system: Enabled when at least one instance of the system is started. It calls sapcontrol with the function StopSystem and the parameter selected as Instance Type:

    • Options:

      • Instance Type: The available values are:

        • All instances: Select to stop all the instances in the system.

        • ABAP: Select to stop the instances with ABAP work processes.

        • J2EE: Select to stop the instances with J2EE work processes.

        • ASCS/SCS: Select to stop the instance with a message server and an enqueue server.

        • ENQREP: Select to stop the instance with an enqueue replication server.

      • Timeout: Establishes the time in minutes that Trento waits for the SAP system to stop before initiating a rollback.

    • Activity type: SAP System Operation Requested

    • Internal policies: If any of the instances included in the Instance Type selection is managed by a cluster, the corresponding resource or the cluster itself must be in maintenance mode.

    • Internal timeout: 1h

  • Start instance: Enabled when the instance is stopped. It calls sapcontrol with the corresponding instance number and function Start in the host where the instance was discovered.

    • Activity type: Application Instance Operation Requested

    • Internal policies:

      • If the instance is managed by a cluster, the corresponding resource or the cluster itself must be in maintenance mode.

      • In the case of an application server instance, the ASCS/SCS instance and the database must be running.

      • In the case of an ERS instance, the ASCS/SCS instance must be running.

    • Internal timeout: 5m

  • Stop instance: Enabled when the instance is started. It calls sapcontrol with the corresponding instance number and function Stop in the host where the instance was discovered.

    • Activity type: Application Instance Operation Requested

    • Internal policies:

      • If the instance is managed by a cluster, the corresponding resource or the cluster itself must be in maintenance mode.

      • In the case of an ASCS instance, the application server instances and, if it exists, the ERS instance must be stopped.

    • Internal timeout: 5m
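
For orientation, the sapcontrol calls behind these operations can be sketched as argument lists. This is an illustration, not Trento's implementation: the helper names are hypothetical, and the mapping from the Instance Type labels to sapcontrol parameters is an assumption based on the values that the StartSystem and StopSystem functions accept. The instance number used for system-wide calls is left as a parameter because the text above does not specify it.

```python
from typing import List

# Assumed mapping from the Instance Type labels to sapcontrol parameters.
INSTANCE_TYPE_PARAMS = {
    "All instances": "ALL",
    "ABAP": "ABAP",
    "J2EE": "J2EE",
    "ASCS/SCS": "SCS",
    "ENQREP": "ENQREP",
}


def system_command(function: str, instance_number: str, instance_type: str) -> List[str]:
    """Build the argument list for a system-wide start or stop."""
    if function not in ("StartSystem", "StopSystem"):
        raise ValueError(f"unsupported function: {function}")
    parameter = INSTANCE_TYPE_PARAMS[instance_type]
    return ["sapcontrol", "-nr", instance_number, "-function", function, parameter]


def instance_command(function: str, instance_number: str) -> List[str]:
    """Build the argument list for starting or stopping a single instance."""
    if function not in ("Start", "Stop"):
        raise ValueError(f"unsupported function: {function}")
    return ["sapcontrol", "-nr", instance_number, "-function", function]


print(" ".join(system_command("StartSystem", "00", "All instances")))
print(" ".join(instance_command("Stop", "10")))
```

The instance numbers 00 and 10 in the usage lines are placeholders.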