Oracle Kubernetes

Connect Valohai with Oracle Kubernetes Engine (OKE) for ML workloads

This guide walks through creating an OKE cluster, configuring local access with the OCI CLI, and installing Valohai workers on the cluster with Helm.

Prerequisites

Tools:

  • OCI CLI (oci)

  • kubectl

  • Helm

Oracle Cloud:

  • Oracle Cloud account

  • Permissions to create and manage OKE clusters
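The later steps in this guide call the oci, kubectl, and helm commands. A minimal sketch of a pre-flight check, assuming a POSIX shell; the function name missing_tools is hypothetical and not part of any Valohai or Oracle tooling:

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# Check the three CLIs used in this guide.
missing=$(missing_tools oci kubectl helm)
if [ -n "$missing" ]; then
    echo "Install these before continuing:" $missing
else
    echo "oci, kubectl, and helm are all available."
fi
```

The check only confirms that the commands exist; it does not verify versions or credentials.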

Step 1: Create the OKE Cluster

Set Up the Cluster

1. Navigate to cluster management

Log in and navigate to https://cloud.oracle.com/containers/clusters

2. Create cluster

Click Create cluster and select Quick Create.

3. Configure the cluster

Official setup guide: Oracle OKE Cluster Creation

Configuration:

  • Name: Give your cluster a name

  • Endpoint: Select Public Endpoint (unless in an air-gapped environment)

  • Worker type: Select Managed (if utilizing an autoscaler)

  • Worker visibility: Select Private Workers

  • Resources: Pick the resources you wish to allocate, including number of nodes

4. Review and create

Proceed with the Review section and click Create.

Step 2: Configure Local Access

Set Up OCI CLI

Create the .oci directory and configure CLI access:

oci setup config

This command prompts for a config file location, your user OCID, your tenancy OCID, and a region, and offers to generate an API signing key pair. Refer to Oracle's documentation on finding OCIDs.

Add Public Key to Oracle

1. Display the public key

cat ~/.oci/oci_api_key_public.pem

2. Copy the public key

Copy the output of the command.

3. Add to API Keys

Add the public key to the API Keys associated with your Oracle profile.

Refer to Oracle's documentation on API signing keys.

Create kubeconfig File

Create the kubeconfig file with cluster and endpoint information:

oci ce cluster create-kubeconfig \
  --cluster-id <CLUSTER-OCID> \
  --file $HOME/.kube/_ociconfig \
  --kube-endpoint PUBLIC_ENDPOINT \
  --profile DEFAULT

Parameters:

  • --cluster-id <CLUSTER-OCID> - The OCID of the cluster created in Step 1

  • --file $HOME/.kube/_ociconfig - Path where the kubeconfig file is written

  • --kube-endpoint PUBLIC_ENDPOINT - Generates config for a public endpoint

  • --profile DEFAULT - Specifies the profile to use when interacting with Oracle Cloud

Authenticate with Oracle CLI

Authenticate before proceeding with kubectl commands:

oci session authenticate
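Once authenticated, a quick connectivity check can confirm that the kubeconfig works before installing anything. This is a sketch that assumes the kubeconfig was written to the --file path used above; exporting KUBECONFIG is optional and simply avoids repeating --kubeconfig on every kubectl call:

```shell
#!/bin/sh
# Assumed path: the --file value passed to create-kubeconfig above.
export KUBECONFIG="$HOME/.kube/_ociconfig"

if command -v kubectl >/dev/null 2>&1; then
    # List the worker nodes; give up after 10 seconds if the endpoint is unreachable.
    kubectl get nodes --request-timeout=10s \
        || echo "kubectl could not reach the cluster; re-check the steps above." >&2
else
    echo "kubectl is not installed." >&2
fi
```

If the nodes created in Step 1 are listed, local access is working.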

Step 3: Install Kubernetes Workers

Install Valohai workers using Helm.

Install with Helm

helm upgrade --install \
    -n valohai-workers \
    --create-namespace \
    valohai-workers \
    valohai/valohai-workers \
    -f ~/custom-values.yaml \
    --kubeconfig /Users/<REPLACE>/.kube/_ociconfig

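Replace <REPLACE> with your username. The /Users/ prefix is the macOS home directory layout; on Linux, home directories are usually under /home/, so $HOME/.kube/_ociconfig is the more portable way to write the path.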
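The same install can be wrapped so that the paths come from $HOME rather than a hard-coded username, and so the command can be previewed before it runs. This is a sketch only: the function name install_workers and the variables KUBECONFIG_FILE and VALUES_FILE are hypothetical, and it assumes the valohai/valohai-workers chart is already available to your Helm client.

```shell
#!/bin/sh
# Hypothetical variables; the defaults match the paths used in this guide.
KUBECONFIG_FILE="${KUBECONFIG_FILE:-$HOME/.kube/_ociconfig}"
VALUES_FILE="${VALUES_FILE:-$HOME/custom-values.yaml}"

# Any arguments are used as a command prefix:
# pass `echo` to preview the command, or nothing to run it.
install_workers() {
    "$@" helm upgrade --install \
        -n valohai-workers \
        --create-namespace \
        valohai-workers \
        valohai/valohai-workers \
        -f "$VALUES_FILE" \
        --kubeconfig "$KUBECONFIG_FILE"
}

# Preview only; nothing is installed by this line.
install_workers echo

# To perform the installation, call the function without a prefix:
# install_workers
```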

Note: Reach out to the Valohai team at [email protected] to get your custom-values.yaml file.

Custom Values File

The custom-values.yaml file contains:

siteName: SITE_NAME
imagePullCredentials:
  email: EMAIL
  username: USERNAME
  password: PASSWORD
cleaner:
  sentryDsn: SENTRY_URL

These values will be provided by Valohai.

Step 4: Complete Setup

Send Information to Valohai

Securely send the output from the Helm command to Valohai support at [email protected].

This gives Valohai access to the workers' namespace on the cluster, which is the last step needed for Valohai to schedule work on Oracle Kubernetes Engine.

Information Needed

The Helm output should include:

  • Namespace details

  • Service account information

  • Cluster access credentials
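Because this output can contain credentials, it may be worth capturing it in a file that only your user can read before sending it. A minimal sketch; the helper name save_private and the output filename are hypothetical:

```shell
#!/bin/sh
# Write stdin to the given file with owner-only read/write permissions.
save_private() {
    ( umask 077; cat > "$1" ) && chmod 600 "$1"
}

# Usage (hypothetical filename), appended to the Helm command from Step 3:
#   helm upgrade --install ... 2>&1 | save_private "$HOME/valohai-helm-output.txt"
```

This only protects the file on your machine; the transfer itself still needs a secure channel.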

Step 5: Verify the Setup

After Valohai confirms the environment is configured:

1. Log in to app.valohai.com

  • Check that Oracle Kubernetes environments appear in your organization

2. Run a test execution

  • Create a test project

  • Run a simple execution

  • Verify it runs on your OKE cluster

3. Check results

  • Verify outputs are saved correctly

  • Check execution logs

Troubleshooting

Cannot authenticate with OCI CLI

Verify OCI configuration:

cat ~/.oci/config

Check that all OCIDs and paths are correct.
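As a rough aid for that check, the sketch below lists any of the usual API-key profile entries (user, fingerprint, tenancy, region, key_file) that do not appear in the config file. The function name missing_oci_keys is hypothetical, and the check does not validate the values themselves; profiles created by session authentication may legitimately use a different set of keys.

```shell
#!/bin/sh
# Print each expected key that is missing from an OCI CLI config file.
missing_oci_keys() {
    config_file=$1
    for key in user fingerprint tenancy region key_file; do
        grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$config_file" 2>/dev/null \
            || printf '%s\n' "$key"
    done
    return 0
}

# No output means every expected key is present somewhere in the file.
missing_oci_keys "$HOME/.oci/config"
```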

Test authentication:

oci iam user get --user-id <YOUR-USER-OCID>

kubeconfig not working

Verify kubeconfig path:

echo $KUBECONFIG

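If you exported KUBECONFIG, it should point to /Users/<your-username>/.kube/_ociconfig. The commands in this guide pass the path explicitly with --kubeconfig, so an empty value here is not an error by itself.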

Test connection:

kubectl get nodes --kubeconfig /Users/<your-username>/.kube/_ociconfig

Helm installation fails

Check namespace:

kubectl get namespaces --kubeconfig /Users/<your-username>/.kube/_ociconfig

Verify Helm can access cluster:

helm list -n valohai-workers --kubeconfig /Users/<your-username>/.kube/_ociconfig

Check custom-values.yaml:

  • Ensure all values are properly set

  • Verify no syntax errors in YAML
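One quick way to catch values that were never filled in is to search the file for the placeholder names shown in the template above. This is a sketch, assuming the file lives at ~/custom-values.yaml; the function name unfilled_placeholders is hypothetical, and the check does not replace a real YAML validator.

```shell
#!/bin/sh
# Print (with line numbers) every line whose value is still a template placeholder.
# Returns non-zero when no placeholders are found.
unfilled_placeholders() {
    grep -nE ':[[:space:]]*(SITE_NAME|EMAIL|USERNAME|PASSWORD|SENTRY_URL)[[:space:]]*$' "$1"
}

VALUES_FILE="${VALUES_FILE:-$HOME/custom-values.yaml}"
if [ ! -r "$VALUES_FILE" ]; then
    echo "Cannot read $VALUES_FILE" >&2
elif unfilled_placeholders "$VALUES_FILE"; then
    echo "The lines above still contain placeholder values." >&2
else
    echo "No template placeholders found in $VALUES_FILE."
fi
```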

Pods not starting

Check pod status:

kubectl get pods -n valohai-workers --kubeconfig /Users/<your-username>/.kube/_ociconfig

Check pod logs:

kubectl logs <pod-name> -n valohai-workers --kubeconfig /Users/<your-username>/.kube/_ociconfig

Common issues:

  • Image pull errors (check credentials)

  • Insufficient resources (check node capacity)

  • Network policies blocking traffic
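To tell these causes apart, the pod's events are usually more informative than its logs, since image pull and scheduling failures happen before the container starts. The sketch below defines a small wrapper so the namespace and kubeconfig flags do not have to be repeated; the function name kw and the KUBECTL and KUBECONFIG_FILE variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical variables; override KUBECTL to substitute another binary.
KUBECTL="${KUBECTL:-kubectl}"
KUBECONFIG_FILE="${KUBECONFIG_FILE:-$HOME/.kube/_ociconfig}"

# Run kubectl against the valohai-workers namespace with the OKE kubeconfig.
kw() {
    "$KUBECTL" "$@" -n valohai-workers --kubeconfig "$KUBECONFIG_FILE"
}

# Usage:
#   kw describe pod <pod-name>                # events: image pulls, scheduling
#   kw get events --sort-by=.lastTimestamp    # recent namespace events, oldest first
#   kw logs <pod-name> --previous             # logs from the last crashed container
```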

Getting Help

Valohai Support: [email protected]

Include in support requests:

  • Oracle Cloud region

  • Cluster OCID

  • kubectl version

  • OCI CLI version

  • Helm output or error messages

  • Pod logs if available
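The tool versions in this list can be gathered in one pass. A minimal sketch, assuming a POSIX shell; the function names tool_version and collect_support_info are hypothetical:

```shell
#!/bin/sh
# Print the first line of a tool's version output, or a note if it is not installed.
tool_version() {
    name=$1
    shift
    if command -v "$name" >/dev/null 2>&1; then
        printf '%s: %s\n' "$name" "$("$name" "$@" 2>&1 | head -n 1)"
    else
        printf '%s: not installed\n' "$name"
    fi
}

# Collect the client-side versions requested in support tickets.
collect_support_info() {
    tool_version kubectl version --client
    tool_version oci --version
    tool_version helm version --short
}

collect_support_info
```

The region and cluster OCID still need to be copied from the Oracle Cloud console or your OCI configuration.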
