Red Hat OpenShift AI Intro/Workbenches
In this series of posts, I am going to cover the main features of OpenShift AI (referred to as RHOAI).
The idea behind these articles is to help someone new to data science, machine learning, AI, MLOps (Machine Learning Operations), and related topics by providing just enough information for you to come up with new and better ways to use RHOAI based on this brief understanding.
Another key point is that I am designing these tutorials to run on minimal hardware: either a small GPU or even CPU only. The cluster I am using is a Single Node OpenShift (SNO) cluster (version 4.18). I recommend a minimum of 16 vCPUs and 64 GB of RAM, though this may run in a smaller footprint as well.
I want to make this usable for someone in their home lab, so the concepts you learn will be demonstrated through smaller-scale implementations.
Layout of RHOAI Dashboard
Definitions/Concepts
I will use this opportunity to provide some definitions. We will learn more about these concepts throughout the series.
Projects: The RHOAI project is tied to an OpenShift project (namespace), but not every OpenShift project contains data science resources. You can think of an OpenShift project as an organizational unit, which may represent a group of developers, a specific product, or a particular outcome.
Workbench: The workbench is a Jupyter Notebook-based development environment. When creating a workbench, you can choose from several variants, typically based on the desired Python version, pre-installed Python packages, and specific drivers for particular GPU types or hardware architectures.
Pipelines: Pipelines are a way to run multiple steps in a data science workflow. Each step can be its own Jupyter notebook. For example, a typical workflow might start with cleaning the data, followed by generating a prediction. A pipeline ties these notebooks together, ensuring they run in sequence automatically.
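Data science pipelines in RHOAI are built on Kubeflow Pipelines, and later in this series we will create ours from the numbered notebooks in the demo repo rather than writing SDK code. Still, a minimal sketch with the Kubeflow Pipelines (kfp) SDK illustrates the "steps in sequence" idea; the component and pipeline names here are purely illustrative:

```python
# Illustrative only: two placeholder steps chained into one pipeline.
from kfp import dsl, compiler

@dsl.component
def clean_data() -> str:
    # Stand-in for a data-cleaning step (e.g. the logic of one notebook)
    return "cleaned"

@dsl.component
def predict(data: str) -> str:
    # Stand-in for a prediction step that consumes the previous output
    return f"prediction based on {data}"

@dsl.pipeline(name="example-pipeline")
def example_pipeline():
    cleaned = clean_data()
    predict(data=cleaned.output)  # runs after clean_data, in sequence

if __name__ == "__main__":
    # Compile to a YAML definition that a pipeline server could run
    compiler.Compiler().compile(example_pipeline, "example_pipeline.yaml")
```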
Models: Models are tools that are created by analyzing data. They learn patterns from the data and can then be used to make predictions or decisions. Once a model is built, you can use it by sending it data—often through an API—and it will return a prediction or result. Models are usually saved as files in storage systems, such as an S3 bucket or similar data store. These storage locations are accessed through connections, which will be explained shortly.
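For example, a deployed model is usually called over HTTP. The snippet below is a generic sketch of such a call; the URL and the request/response shapes are placeholders, and the real format depends on the model server used, which we will cover in a later post:

```python
import requests

# Hypothetical inference endpoint exposed by a model server; the actual
# URL and payload format depend on how the model is deployed.
url = "https://my-model-route.example.com/v2/models/my-model/infer"
payload = {
    "inputs": [
        {"name": "input-0", "shape": [1, 3], "datatype": "FP32", "data": [1.0, 2.0, 3.0]}
    ]
}

response = requests.post(url, json=payload, timeout=30)
print(response.json())  # the prediction comes back as JSON
```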
Cluster Storage: Cluster storage uses a persistent volume claim (PVC) to store your Jupyter notebooks and any associated data, ensuring your work is saved and available even if the notebook server restarts.
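In practice, anything you write under the workbench's home directory lands on that PVC. A quick, optional way to see this from inside a workbench (the file name is arbitrary):

```python
from pathlib import Path

# Files written under the home directory are backed by the cluster-storage
# PVC, so they survive a workbench (notebook server) restart.
note = Path.home() / "persistence-test.txt"
note.write_text("this file should still exist after the workbench restarts\n")
print(note, "exists:", note.exists())
```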
Connections: Connections allow you to link your project to external storage systems and services. Typically, models are saved using S3-compatible storage through a connection, but connections can also be used for other purposes, such as:
- OCI-compliant registries to integrate with container registries.
- S3-compatible object storage for saving models and other files.
- URIs to connect to various data sources.
More details on how to use connections will be explained later.
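As a preview, when an S3-compatible connection is attached to a workbench, its details are typically exposed to the notebook as environment variables (names such as AWS_S3_ENDPOINT and AWS_S3_BUCKET; confirm them in your own workbench). A rough sketch of listing a bucket with boto3, assuming those variables are set and boto3 is installed:

```python
import os
import boto3

# These environment variable names are what an S3-compatible data connection
# usually injects into the workbench; verify them in your own environment.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

bucket = os.environ["AWS_S3_BUCKET"]
for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
    print(obj["Key"])  # e.g. saved model artifacts
```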
Permissions: Permissions in RHOAI follow the RBAC (role-based access control) model. Users and groups can be assigned specific roles—either as contributors (regular users with standard access) or admins (super-users with full access). More details on permission management will be covered later.
Workbenches
This is where data scientists and/or developers will develop code. The images included by default are all based on Python.
Here is a list of the default images that are included.
Minimal Python 2025.1: Jupyter notebook image with a minimal dependency set to start experimenting with the Jupyter environment.
CUDA 2025.1: Jupyter notebook image with GPU (NVIDIA) support and a minimal dependency set to start experimenting with the Jupyter environment.
TensorFlow 2025.1: Jupyter notebook image with TensorFlow libraries and dependencies to start experimenting with advanced AI/ML notebooks.
code-server 2025.1: code-server workbench allows users to code, build, and collaborate on projects directly from the web.
ROCm-Pytorch 2025.1: Jupyter ROCm (AMD) optimized PyTorch notebook image for ODH notebooks.
Standard Data Science 2025.1: Jupyter notebook image with a set of data science libraries that advanced AI/ML notebooks use as a base image, providing a standard set of libraries available in all notebooks.
PyTorch 2025.1: Jupyter notebook image with PyTorch libraries and dependencies to start experimenting with advanced AI/ML notebooks.
TrustyAI: Jupyter TrustyAI notebook integrates the TrustyAI Explainability Toolkit into the Jupyter environment.
ROCm 2025.1: Jupyter ROCm notebook image for ODH notebooks.
ROCm-TensorFlow 2025.1: Jupyter ROCm optimized TensorFlow notebook image for ODH notebooks.
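If you are not sure which libraries a given image ships with, you can check from inside a running workbench. A small, illustrative snippet (the package list below is just a sample; adjust it to the image you chose):

```python
from importlib.metadata import version, PackageNotFoundError

# Check which common data science packages this image provides and at
# what versions; absent packages simply report "not installed".
for pkg in ("pandas", "numpy", "scikit-learn", "torch", "tensorflow"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```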
Starting a Workbench
This workbench example relates to some of the work we will do later with model serving and pipelines in RHOAI. In this example, you perform the first step of a pipeline: loading and printing a dataframe.
- From within one of your projects, click on the Workbenches tab and then "Create Workbench".
I had a workbench called test already defined in my environment. You won't need this.
- Fill in the following values:
Name: Event Frequency Analysis
Description: Event Frequency Analysis
Image Selection: Standard Data Science
Version Selection: 2025.1
Deployment Size: Small (2 CPUs, 8 GiB memory)
Leave everything else at its default, including the 20 GB PVC
Click "Create Workbench"
- Once the workbench is running, click on the "Event Frequency Analysis" link.
This will open the launcher for the Jupyter notebook/workbench. Select Python 3.11.
Now, the workbench/notebook will appear.
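Optionally, you can confirm that the kernel matches the Python version you selected by running this in a cell:

```python
import sys

# Confirm the notebook kernel matches the Python version you selected
print(sys.version)
```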
- On the left-hand side is an icon (third one down) called Git.
Click on this.
Choose "Clone a Repository"
For the URI, input:
https://github.com/kcalliga/rhoai-demos
Click Clone.
- On the left-hand side of the screen, you will see the Untitled.ipynb that you started and a directory called "rhoai-demos". Double-click on the "rhoai-demos" folder and then select the "1_load_event.ipynb" file.
- Go to the top cell in this workbench and click the "Play" button.
Alternatively, you can hit "Shift-Enter".
There will be no output after this cell runs.
- Now, go to the cell that contains only the word "df". Hit "Shift-Enter" or the play button to run it. You will see the output of the dataframe, which took the input JSON file of OpenShift event data and formatted it into tabular form.
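Under the hood, that notebook is doing something along these lines. This is a simplified sketch, not the exact contents of 1_load_event.ipynb, and the file name events.json is a placeholder for the JSON event export in the repo:

```python
import json
import pandas as pd

# Load raw OpenShift event data (exported as JSON) and flatten the nested
# "items" list into a tabular DataFrame; the exact structure depends on
# how the events were exported.
with open("events.json") as f:
    events = json.load(f)

df = pd.json_normalize(events["items"])
df  # in a notebook, the last expression renders the DataFrame as a table
```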
That's all for this part of the article. As you may have noticed, the Git repo you cloned has a number prefixed to each notebook. Later, we will take each of these numbered Jupyter notebooks and create a pipeline (a sequence of steps).