NVIDIA UFM Integration Plugin for Netris Controller
Overview
The Netris-UFM plugin provides seamless integration between Netris Controller and NVIDIA UFM (Unified Fabric Manager) for AI infrastructures with hybrid InfiniBand and Ethernet networks. This integration allows infrastructure operators to define compute multi-tenancy in a single place through Netris, significantly simplifying management across both network types.
Key Benefits
Unified Management Interface: Define tenant isolation by simply listing servers in a server-cluster object
Automated Provisioning: Automatically configure both Ethernet (via Netris) and InfiniBand (via UFM) networks
Simplified Operations: Eliminate the need to manage SwitchPorts, VLANs, VRFs on Ethernet and GUIDs, PKeys, SHARP groups on InfiniBand separately
Architecture
The Netris-UFM plugin acts as the integration layer between Netris Controller and NVIDIA UFM:
Netris Controller: Orchestrates the Ethernet switches and provides the primary user interface
NVIDIA UFM: Manages the InfiniBand switches and provides specialized InfiniBand functionality
Netris-UFM Plugin: Synchronizes configurations between both systems
When you define a server-cluster in Netris, the plugin automatically:
Discovers InfiniBand port GUIDs from UFM
Creates and manages appropriate PKeys in UFM
Sets up SHARP reservations for high-performance operations
Prerequisites
Before installing the Netris-UFM plugin, ensure:
A functioning Netris Controller environment
A properly configured NVIDIA UFM installation with limited membership enabled (see below)
Network connectivity between both systems
Appropriate access credentials for both platforms
UFM Configuration Requirements
To enable the UFM PKey REST API functionality required by the Netris-UFM integration, you must configure UFM to use limited membership by default.
Configure gv.cfg
Edit the UFM configuration file /opt/ufm/files/conf/gv.cfg and add or modify the following section:
[MngNetwork]
default_membership = limited
Restart UFM Service
After making this configuration change, restart the UFM enterprise service:
systemctl restart ufm-enterprise
Important
This UFM configuration change is essential for the Netris-UFM integration to function properly. With limited membership as the default, servers on the subnet can communicate with each other only when they are full members of a non-default partition managed by the Netris-UFM plugin.
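To double-check the edit to gv.cfg, the setting can be read back from the file. This is a minimal sketch, not part of UFM or Netris tooling: the function name `ufm_default_membership` is hypothetical, and it assumes gv.cfg is a plain INI file with `key = value` lines.

```shell
# Print the value of default_membership from the [MngNetwork] section of an
# INI-style file. Prints nothing if the section or key is absent.
# Assumption: the file uses "[Section]" headers and "key = value" lines.
ufm_default_membership() {
    awk -F'=' '
        # A section header switches the "inside [MngNetwork]" flag on or off.
        /^[ \t]*\[/ { in_section = ($0 ~ /^[ \t]*\[MngNetwork\]/); next }
        in_section && $1 ~ /^[ \t]*default_membership[ \t]*$/ {
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim surrounding blanks
            print value
        }
    ' "$1"
}

# Usage:
#   ufm_default_membership /opt/ufm/files/conf/gv.cfg
```

If the command prints anything other than `limited`, the section above has not taken effect in the file.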
Installation
Option 1: Deploy within an existing Netris Controller Kubernetes cluster
This option is recommended if you already have a Netris Controller running in a Kubernetes environment.
Download the Kubernetes deployment YAML file:
wget https://get.netris.io/netris-controller-ufm.yaml
Edit the YAML file to update the secret values based on your environment:
apiVersion: v1
kind: Secret
metadata:
  name: netris-controller-nvidia-ufm-agent-envs
  namespace: netris-controller
type: Opaque
stringData:
  NETRIS_CONTROLLER_ADDR: "https://netris.example.com"
  NETRIS_CONTROLLER_LOGIN: "netris"
  NETRIS_CONTROLLER_PASSWORD: "newNet0ps"
  NETRIS_VERIFY_SSL: "true"
  NETRIS_SITE_NAME: "Site"
  UFM_ADDR: "https://ufm.example.com"
  UFM_LOGIN: "admin"
  UFM_PASSWORD: "123456"
  UFM_VERIFY_SSL: "false"
  UFM_ID: "ufm-lab"
  UFM_PKEY_RANGE: "100-7ffe"
Apply the configuration to your Kubernetes cluster:
kubectl apply -f netris-controller-ufm.yaml
Option 2: Deploy as a standalone Docker container
This option is ideal for environments without Kubernetes or when you want to deploy on a separate host.
Create an environment file (e.g., env) with the following content:
NETRIS_CONTROLLER_ADDR="https://netris.example.com"
NETRIS_CONTROLLER_LOGIN="netris"
NETRIS_CONTROLLER_PASSWORD="newNet0ps"
NETRIS_VERIFY_SSL="true"
NETRIS_SITE_NAME="Site"
UFM_ADDR="https://ufm.example.com"
UFM_LOGIN="admin"
UFM_PASSWORD="123456"
UFM_VERIFY_SSL="false"
UFM_ID="ufm-lab"
UFM_PKEY_RANGE="100-7ffe"
LOG_LEVEL="info"
Run the Docker container:
docker run -d \
  --env-file=env \
  --name=netris-ufm \
  --entrypoint "/app/servicebin" \
  netrisai/bare-metal-netris-ufm-agent:0.3.0
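Before starting the container, it may be worth checking that the environment file defines every variable from the sample. The sketch below is illustrative only: the function name `missing_env_vars` is hypothetical, and treating every sample variable except LOG_LEVEL as required is an assumption (LOG_LEVEL is skipped because the parameter tables list a default for it).

```shell
# Variables set in the sample env file. Assumption: all of them are required.
REQUIRED_VARS="NETRIS_CONTROLLER_ADDR NETRIS_CONTROLLER_LOGIN NETRIS_CONTROLLER_PASSWORD NETRIS_VERIFY_SSL NETRIS_SITE_NAME UFM_ADDR UFM_LOGIN UFM_PASSWORD UFM_VERIFY_SSL UFM_ID UFM_PKEY_RANGE"

# Print the name of every required variable that has no NAME=... line
# in the file given as the first argument.
missing_env_vars() {
    for name in $REQUIRED_VARS; do
        grep -q "^${name}=" "$1" || echo "$name"
    done
}

# Usage:
#   missing_env_vars env
```

An empty result means every variable from the sample is present; it does not validate the values themselves.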
Configuration Parameters
Netris Controller Configuration
Parameter | Description | Example |
---|---|---|
NETRIS_CONTROLLER_ADDR | The URL of your Netris Controller | https://netris.example.com |
NETRIS_CONTROLLER_LOGIN | Username for authenticating with Netris Controller | netris |
NETRIS_CONTROLLER_PASSWORD | Password for authenticating with Netris Controller | newNet0ps |
NETRIS_VERIFY_SSL | Whether to verify SSL certificates when connecting to Netris Controller | true or false |
NETRIS_SITE_NAME | The name of the site in Netris Controller to manage | Datacenter-1 |
NVIDIA UFM Configuration
Parameter | Description | Example |
---|---|---|
UFM_ADDR | The URL of your NVIDIA UFM server | https://ufm.example.com |
UFM_LOGIN | Username for authenticating with UFM | admin |
UFM_PASSWORD | Password for authenticating with UFM | 123456 |
UFM_VERIFY_SSL | Whether to verify SSL certificates when connecting to UFM | true or false |
UFM_ID | Unique identifier for this UFM instance | ufm-lab |
UFM_PKEY_RANGE | Range of PKey IDs that can be allocated to clusters, in hexadecimal format | 100-7ffe |
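Because UFM_PKEY_RANGE is written in hexadecimal, the number of PKeys it provides is not obvious at a glance. The following sketch computes it; the function name `pkey_range_size` is hypothetical, and it assumes both ends of the range are inclusive.

```shell
# Print how many PKey IDs a hexadecimal range such as "100-7ffe" contains.
# Assumption: both bounds are inclusive.
pkey_range_size() {
    lo=${1%-*}   # text before the dash
    hi=${1#*-}   # text after the dash
    echo $(( 0x$hi - 0x$lo + 1 ))
}

# Usage:
#   pkey_range_size 100-7ffe
```

For the example range, 0x100 is 256 and 0x7ffe is 32766, so the range holds 32511 IDs under the inclusive assumption.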
Agent Configuration
Parameter | Description | Default | Example |
---|---|---|---|
LOG_LEVEL | Logging level for the agent | info | info or debug |
RECONCILE_INTERVAL | Interval in seconds between reconciliation operations | 10 | 10 |
Usage Guide
After successfully installing and configuring the Netris-UFM agent, follow these steps to set up and use the integration:
1. Server Configuration in Netris
The first step is to create servers in the Netris Controller inventory that match exactly with the servers in UFM:
In Netris Controller, navigate to Network → Topology → +Add.
Create servers with identical names as they appear in UFM (this is crucial for proper GUID mapping)
Once created, the Netris-UFM agent will automatically sync the InfiniBand GUIDs from UFM into Netris
Important
Server names must match exactly between UFM and Netris Controller for the integration to work properly.
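Since the match is exact, differences in case or a stray suffix are easy to miss by eye. One way to spot them is to compare two name lists, as in the sketch below. How the lists are exported from each system is not covered here; the function name `name_mismatches` and the input files (one server name per line) are hypothetical.

```shell
# Print names that appear in only one of two files (one name per line).
# Unindented lines exist only in the first file; tab-indented lines exist
# only in the second file. Names present in both files are not printed.
name_mismatches() {
    a=$(mktemp) b=$(mktemp)
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage (hypothetical file names):
#   name_mismatches netris_servers.txt ufm_servers.txt
```

The comparison is case-sensitive, so a pair such as GPU-03 and gpu-03 is reported as two separate mismatches.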
2. Create a Server Cluster Template
Next, create a Server Cluster Template.
Navigate to Services → Server Cluster Template.
Click Add to create a new template
Configure the template using JSON, with a separate section for each network fabric. Use the InfiniBand Fabric Example as a starting point.
3. Create Server Clusters
After setting up the template, create server clusters as described in Creating Server Cluster.
4. Verification
Once the server cluster is created:
The Netris-UFM agent will automatically:
Identify the InfiniBand GUIDs associated with the servers in the cluster
Provision appropriate PKeys in UFM
Create necessary SHARP reservations if applicable
Verify the configuration:
Check the Netris Controller UI for successful cluster creation
Examine the UFM UI to confirm PKey assignments
Test connectivity between servers in the cluster via InfiniBand
5. Monitoring Integration Status
To monitor the status of the integration:
Check the Netris-UFM agent logs (as described in the Monitoring section)
Verify the synchronization state:
# For Kubernetes
kubectl logs -f deployment/netris-controller-nvidia-ufm-agent -n netris-controller
# For Docker
docker logs -f netris-ufm
Functional Workflow
Discovery Phase:
Plugin connects to both Netris Controller and NVIDIA UFM
InfiniBand port GUIDs are discovered from UFM and stored in Netris inventory
Cluster Creation:
When a server cluster is created or modified in Netris Controller
Plugin identifies affected servers and their InfiniBand GUIDs
Appropriate PKeys are automatically provisioned in UFM
SHARP Integration:
For high-performance network operations, SHARP reservations are created
These correspond to the server clusters defined in Netris
Continuous Reconciliation:
Plugin periodically synchronizes between Netris and UFM
Ensures consistency between Ethernet and InfiniBand configurations
Reconciliation interval is configurable (default: 10 seconds)
Monitoring and Troubleshooting
Viewing Logs
For Kubernetes deployment:
kubectl logs -f deployment/netris-controller-nvidia-ufm-agent -n netris-controller
For Docker container:
docker logs -f netris-ufm
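When the log is long, filtering for likely problem lines can speed up a first pass. The sketch below is a generic filter, not something the agent provides: the function name `log_problems` is hypothetical, and the keyword list is a guess rather than the agent's documented log vocabulary.

```shell
# Read log lines on stdin and keep only those that mention a likely problem.
# Assumption: the keywords below are a guess at what failures look like.
log_problems() {
    grep -iE 'error|fail|timeout|unauthorized'
}

# Usage (last 10 minutes of logs):
#   kubectl logs --since=10m deployment/netris-controller-nvidia-ufm-agent -n netris-controller | log_problems
#   docker logs --since 10m netris-ufm 2>&1 | log_problems
```

The `2>&1` in the Docker form matters because `docker logs` replays the container's stderr stream on stderr, which a plain pipe would not capture.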
Common Issues and Solutions
Connection Issues to Netris Controller or UFM
Symptoms:
Log messages indicating connection timeouts or authentication failures
Missing data in Netris inventory
Solutions:
Verify network connectivity between the plugin and both systems:
ping netris.example.com
ping ufm.example.com
Check credentials in the configuration:
Verify username/password combinations for both systems
Ensure API permissions are sufficient
Verify SSL certificate settings:
If using self-signed certificates, set NETRIS_VERIFY_SSL/UFM_VERIFY_SSL to “false”
For production, use valid certificates and set verification to “true”
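A successful ping only shows that the host answers ICMP. Because both addresses in the configuration are HTTPS URLs, an HTTPS request from the plugin host is a closer check. The following is a minimal sketch using curl; the function names `url_host` and `https_status` are hypothetical, and the status code a healthy Netris Controller or UFM returns at its base URL is not specified here.

```shell
# Print the host part of a URL such as https://ufm.example.com:443/path.
url_host() {
    rest=${1#*://}      # drop the scheme
    rest=${rest%%/*}    # drop any path
    echo "${rest%%:*}"  # drop any port
}

# Print the HTTP status code returned by a URL; curl prints 000 when no
# HTTP response was received (for example on a timeout or TLS failure).
# Pass "false" as the second argument to skip certificate verification,
# mirroring NETRIS_VERIFY_SSL / UFM_VERIFY_SSL.
https_status() {
    if [ "$2" = "false" ]; then
        curl -sk -o /dev/null -w '%{http_code}' --max-time 5 "$1"
    else
        curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
    fi
}

# Usage:
#   ping -c 3 "$(url_host https://ufm.example.com)"
#   https_status https://ufm.example.com false
```

A result of 000 with verification on but a real status code with verification off points at the certificate rather than the network.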
PKey Assignment Issues
Symptoms:
Server clusters don’t have proper isolation in InfiniBand
Errors about PKey allocation failures in logs
Solutions:
Ensure the UFM_PKEY_RANGE has sufficient available IDs:
Check current PKey usage in UFM
Adjust the range if needed
Verify server naming consistency:
Server names must match exactly between Netris and UFM
Check for any server name discrepancies
Examine the PKey allocation process in debug logs:
# For Kubernetes
kubectl logs -f deployment/netris-controller-nvidia-ufm-agent -n netris-controller | grep "PKey"
# For Docker
docker logs -f netris-ufm | grep "PKey"
Synchronization Delays
Symptoms:
Changes in Netris don’t appear quickly in UFM
Inconsistent behavior after making configuration changes
Solutions:
Adjust the RECONCILE_INTERVAL to a shorter time period for faster synchronization
Check for high CPU or memory usage on the plugin host
Verify network latency between the plugin and both systems
Restart the plugin service if synchronization issues persist:
kubectl rollout restart deployment/netris-controller-nvidia-ufm-agent -n netris-controller
or
docker restart netris-ufm
Version Compatibility
Netris Controller Version | NVIDIA UFM Version | Plugin Version |
---|---|---|
4.4.1+ | 6.15.4+ | 0.3.0+ |
Getting Started Guide
Quick Setup Example
Install the plugin using the Kubernetes or Docker method above
Verify the plugin is running properly:
# For Kubernetes
kubectl get pods -n netris-controller | grep ufm
# For Docker
docker ps | grep netris-ufm
Create a Server Cluster Template in Netris Controller UI or API
Create Server Cluster with the servers that have InfiniBand connections
Verify PKey assignments in UFM:
Check the UFM UI for PKey assignments
Verify servers in the cluster can communicate via InfiniBand
Additional Resources
You are welcome to join our Slack channel to get additional support from our engineers and community.