SUSE AI Integration

This document describes how to use the Trento MCP Server with SUSE AI, including configuration and deployment instructions.

Getting Started with SUSE AI

SUSE AI is a platform that allows you to deploy and manage AI models and applications in a Kubernetes environment. It provides tools for model management, deployment, and integration with various AI frameworks.

In this guide, we focus on deploying the Trento MCP Server using SUSE AI, specifically with Ollama and Open Web UI. Always refer to the official SUSE AI documentation, in particular the SUSE AI deployment guide, for the most accurate and up-to-date information.

Prerequisites

This guide assumes:

  • A Kubernetes cluster set up and running (with an ingress controller and cert-manager installed).

  • Access to the SUSE Application Collection.

When running Ollama models, this guide assumes you have one of the following:

  • A Kubernetes cluster with sufficient resources (GPU, memory, etc.).

  • A cloud provider and enough permissions to deploy Ollama models remotely.

Limitations

Deploying the entire SUSE AI stack requires substantial resources, especially for running Ollama models. Alternatively, you can deploy only the Open Web UI and connect it to an Ollama instance running elsewhere. This is the approach we take here: an on-premises Open Web UI with Ollama running on a remote server (e.g., Google Cloud).

Getting the artifacts from the SUSE Application Collection

You need access to the SUSE Application Collection and the proper entitlements to download the required artifacts. Always refer to the SUSE Application Collection documentation for the latest instructions on authentication and access; this guide assumes you already have the necessary credentials.

NOTE: To run the entire stack, you can also use the SUSE AI Deployer Helm Chart.

  1. Log in to the SUSE Application Collection:

# Replace REPLACE_WITH_YOUR_USERNAME and REPLACE_WITH_YOUR_PASSWORD with your actual credentials
helm registry login dp.apps.rancher.io/charts -u <REPLACE_WITH_YOUR_USERNAME@apps.rancher.io> -p REPLACE_WITH_YOUR_PASSWORD
  2. Create a namespace for SUSE AI:

kubectl create namespace suse-ai
  3. Create a Kubernetes Pull Secret (application-collection) for the SUSE Application Collection:

# Replace REPLACE_WITH_YOUR_USERNAME and REPLACE_WITH_YOUR_PASSWORD with your actual credentials
kubectl create -n suse-ai secret docker-registry application-collection --docker-server=dp.apps.rancher.io --docker-username=<REPLACE_WITH_YOUR_USERNAME@apps.rancher.io> --docker-password=REPLACE_WITH_YOUR_PASSWORD

Install Open Web UI

This section describes how to install the Open Web UI, which provides a user-friendly interface for interacting with AI models.

  1. Create a values.openwebui.yaml file with the values for Open Web UI, for example:
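
A minimal sketch of what this file might contain is shown below, assuming an ingress-based setup and the pull secret created earlier; the exact keys depend on the chart version, so verify them with helm show values oci://dp.apps.rancher.io/charts/open-webui.

# Illustrative values only; verify the keys against the chart's values.yaml
global:
  imagePullSecrets:
    - application-collection   # pull secret created earlier
ollama:
  enabled: false               # skip the bundled Ollama; a remote instance is used instead
ingress:
  enabled: true
  host: suse-ai.example.com    # placeholder hostname, adjust to your domain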

  2. Install the Open Web UI using Helm and the values file (values.openwebui.yaml) created above. Note that cert-manager must be properly installed and an ingress controller is required; if you don’t have one, you might need to adjust the Kubernetes Services instead.

helm -n suse-ai upgrade --install open-webui oci://dp.apps.rancher.io/charts/open-webui -f values.openwebui.yaml
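
Optionally, you can check that the release came up. Assuming the chart names the deployment after the release (the standard Helm convention), something like:

# Wait for the Open Web UI deployment to become ready (name may differ per chart)
kubectl -n suse-ai rollout status deployment/open-webui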

Deploying a model with Ollama remotely

If your infrastructure does not have enough resources to run Ollama models, you can deploy them on a remote server (e.g., Google Cloud Run).

This section describes how to deploy the Qwen3:8b model using Ollama on Google Cloud Run. This guide assumes you have a Google Cloud account and the necessary permissions to deploy applications on Google Cloud Run.

Always refer to the official Google documentation for the most accurate and up-to-date information.

  1. Create a Dockerfile for the Ollama model, for example:
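
The following is a sketch based on the pattern from Google’s Ollama-on-Cloud-Run tutorial: the model weights are pulled at build time so they ship inside the container image. Verify the details against the current Google documentation.

FROM ollama/ollama:latest
# Cloud Run expects the service to listen on port 8080 on all interfaces
ENV OLLAMA_HOST 0.0.0.0:8080
# Store the model weights inside the container image
ENV OLLAMA_MODELS /models
# Reduce logging verbosity
ENV OLLAMA_DEBUG false
# Keep the model loaded in memory between requests
ENV OLLAMA_KEEP_ALIVE -1
# Pull the model at build time so it is baked into the image
RUN ollama serve & sleep 5 && ollama pull qwen3:8b
ENTRYPOINT ["ollama", "serve"]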

  2. In the same directory as the Dockerfile, run the following command to build and deploy the container image:

gcloud run deploy qwen3-8b --source . \
  --concurrency 4 --cpu 8 \
  --set-env-vars OLLAMA_NUM_PARALLEL=4 \
  --gpu 1 --gpu-type nvidia-l4 \
  --max-instances 1 --memory 32Gi \
  --no-allow-unauthenticated --no-cpu-throttling \
  --no-gpu-zonal-redundancy --timeout=600

To make requests, you need to either allow unauthenticated requests or use a service account with the necessary permissions. Follow the Google Cloud Run documentation for more information on how to set up service accounts and permissions.

As a result, you will have a Google Cloud Run service running the Qwen3:8b model, which can be accessed via the URL provided by Google Cloud Run. It might look like https://qwen3-8b-<project-id>.<region>.run.app.
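
Since the service was deployed with --no-allow-unauthenticated, requests must carry an identity token. A quick way to test the endpoint from a machine with gcloud configured is to list the available models via Ollama’s /api/tags endpoint:

# Authenticated test request against the remote Ollama instance
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" https://qwen3-8b-<project-id>.<region>.run.app/api/tags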

Adding the remote model to Open Web UI

Once you have the Ollama model running on Google Cloud Run, you can add it to the Open Web UI for easy access.

  1. Go to the Open Web UI settings page (e.g., suse-ai.example.com/admin/settings) and navigate to the “Connections” section.

  2. Add a new Ollama connection with the URL of your deployed model (e.g., https://qwen3-8b-<project-id>.<region>.run.app).

  3. Navigate to the “Models” section, click on “Manage Models”, select the Ollama connection you just created, and pull the Qwen3:8b model.

Deploying the Trento MCP Server

To deploy the Trento MCP Server with SUSE AI, you can use the provided Helm chart. This section describes how to deploy it with the necessary configuration.

  1. Create a values.mcpo.yaml file with the values for the Trento MCP Server, for example:
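
The actual keys are defined by the chart (inspect ./helm/trento-mcp-server/values.yaml); the snippet below is only an illustrative sketch, and every key in it is an assumption to adapt to the real chart values.

# Illustrative sketch only; the real keys live in ./helm/trento-mcp-server/values.yaml
imagePullSecrets:
  - name: application-collection   # hypothetical key: reuse the pull secret created earlier
trento:
  url: https://trento.example.com  # hypothetical key: URL of your Trento installation
  apiKey: REPLACE_WITH_TRENTO_API_KEY  # hypothetical key: Trento API credentials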

  2. Install the Trento MCP Server using Helm and the values file (values.mcpo.yaml) created above:

helm -n suse-ai upgrade --install trento-for-suse-ai ./helm/trento-mcp-server --values values.mcpo.yaml
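
Assuming the chart sets the standard Helm labels, you can look up the internal Service that backs the URL used in the next section:

# Find the Service created by the release; its cluster-local FQDN is used below
kubectl -n suse-ai get svc -l app.kubernetes.io/instance=trento-for-suse-ai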

Adding the Trento MCP Server to Open Web UI

The Trento MCP Server can be added to the Open Web UI in two ways: as a user-defined connection or as a pre-configured connection for the model. The former requires a publicly accessible URL (the HTTP call happens in the user’s browser), while the latter can use a Kubernetes service (the HTTP call happens in the Open Web UI backend). In this guide, we use the pre-configured connection.

  1. Go to the Open Web UI settings page (e.g., suse-ai.example.com/admin/settings) and navigate to the “Tools” section.

  2. Click on “Add connection” and add the internal FQDN URL of the Kubernetes service (e.g., http://trento-for-suse-ai-trento-mcp-server-mcpo.suse-ai.svc.cluster.local/trento).

  3. Navigate to the “Models” section, click on “Manage Models”, and select the Trento MCP Server tool for the model. This enables the Trento MCP Server to be used with the model by any user in the Open Web UI.

  4. Optionally, you can tweak the model configuration. For instance:

    1. System prompt: /no_think You are focused on solving issues in SAP Systems. You will use the tools. Trento has several endpoints to discover SAP Systems, including HANA clusters and information about the hosts. Refer to the tools whenever possible. Never switch to Chinese, always English.

    2. Context Length (Ollama): 8000