TensorFlow/Keras

TensorFlow and Keras provide a callback system that makes metric logging clean and automatic. Create a custom callback to log metrics at the end of each epoch without cluttering your training code.


Quick Example

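```python
import tensorflow as tf
import valohai


class ValohaiMetricsCallback(tf.keras.callbacks.Callback):
    """Log Keras epoch metrics to Valohai metadata."""

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        with valohai.metadata.logger() as logger:
            # Keras epochs are 0-indexed; log them 1-indexed
            logger.log("epoch", epoch + 1)
            for name in ("accuracy", "loss", "val_accuracy", "val_loss"):
                # Skip metrics that are absent (e.g. val_* without validation data);
                # float() turns NumPy/tensor scalars into plain Python floats
                if name in logs:
                    logger.log(name, float(logs[name]))


# Use the callback (model, train_dataset and val_dataset are defined elsewhere;
# the model must be compiled with metrics=["accuracy"])
model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=10,
    callbacks=[ValohaiMetricsCallback()],
)
```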

Why Use Callbacks?

Keras callbacks run at specific points during training. They let you:

  • Access all training metrics automatically

  • Keep metric logging separate from model code

  • Reuse the same callback across projects


Complete Working Example

Here's a full training script with Valohai integration:
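The script below is a minimal sketch, not the exact script from the example repository. It trains a small placeholder classifier on synthetic data and prints one JSON object per epoch to standard output, so it runs without valohai-utils. The names JsonMetricsCallback and build_model, and the --epochs/--batch_size parameters, are illustrative assumptions; Valohai passes parameters as command-line flags when your step's command includes {parameters}.

```python
import argparse
import json

import numpy as np
import tensorflow as tf


class JsonMetricsCallback(tf.keras.callbacks.Callback):
    """Print one JSON object per epoch so Valohai can collect it as metadata."""

    def on_epoch_end(self, epoch, logs=None):
        record = {"epoch": epoch + 1}
        # float() turns NumPy/tensor scalars into JSON-serializable Python floats
        record.update({name: float(value) for name, value in (logs or {}).items()})
        print(json.dumps(record), flush=True)


def build_model(num_features, num_classes):
    """A small placeholder classifier; replace with your own architecture."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(num_features,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def main(argv=None):
    # Hypothetical parameters; unknown extra flags are ignored
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--batch_size", type=int, default=32)
    args, _ = parser.parse_known_args(argv)

    # Synthetic placeholder data: swap in your real dataset loading here
    rng = np.random.default_rng(0)
    x = rng.normal(size=(512, 20)).astype("float32")
    y = (x[:, 0] > 0).astype("int64")

    model = build_model(num_features=20, num_classes=2)
    history = model.fit(
        x,
        y,
        validation_split=0.2,  # holds out data so val_* metrics are produced
        epochs=args.epochs,
        batch_size=args.batch_size,
        callbacks=[JsonMetricsCallback()],
        verbose=0,
    )
    return model, history


if __name__ == "__main__":
    main()
```

If you prefer valohai-utils, swap JsonMetricsCallback for the ValohaiMetricsCallback from the quick example; the rest of the script stays the same.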


valohai.yaml Configuration

Make sure to change the input data and environment to match your own values.


Logging Without valohai-utils

You can also log metrics without valohai-utils: Valohai collects any JSON object printed to standard output on its own line as metadata.
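A minimal sketch of that approach, assuming one JSON object per line on standard output. log_metrics is a hypothetical helper name; call it from your callback's on_epoch_end(self, epoch, logs).

```python
import json


def log_metrics(epoch, logs):
    """Print Keras epoch logs as a single-line JSON object (hypothetical helper)."""
    record = {"epoch": epoch + 1}  # Keras epochs are 0-indexed
    for name, value in (logs or {}).items():
        record[name] = float(value)  # NumPy/tensor scalars -> Python float
    # flush so the line reaches the execution log promptly
    print(json.dumps(record), flush=True)
    return record
```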


Logging Learning Rate

Track learning rate changes during training:
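A sketch of a callback that reads the optimizer's current learning rate at the end of each epoch and prints it as JSON; LearningRateLogger is a hypothetical name. Because some Keras versions return a LearningRateSchedule object rather than a number from optimizer.learning_rate, the callback evaluates the schedule at the current step in that case.

```python
import json

import tensorflow as tf


class LearningRateLogger(tf.keras.callbacks.Callback):
    """Log the optimizer's current learning rate once per epoch (hypothetical name)."""

    def on_epoch_end(self, epoch, logs=None):
        optimizer = self.model.optimizer
        lr = optimizer.learning_rate
        # Evaluate a schedule at the current training step to get a number
        if isinstance(lr, tf.keras.optimizers.schedules.LearningRateSchedule):
            lr = lr(optimizer.iterations)
        # Works for tf.Variable, Keras variables and tensors alike
        value = float(tf.convert_to_tensor(lr).numpy())
        print(json.dumps({"epoch": epoch + 1, "learning_rate": value}), flush=True)
```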


Combining Multiple Callbacks

Use multiple callbacks together:
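One possible combination, sketched below: the built-in EarlyStopping and ReduceLROnPlateau callbacks plus a JSON logging callback. Keras calls callbacks in the order you list them and typically hands them the same logs dictionary, so placing the logger last lets it record anything earlier callbacks add (recent Keras versions, for example, have ReduceLROnPlateau add the current learning rate to logs). JsonMetricsCallback and make_callbacks are illustrative names.

```python
import json

import tensorflow as tf


class JsonMetricsCallback(tf.keras.callbacks.Callback):
    """Print all epoch metrics as one JSON line (illustrative name)."""

    def on_epoch_end(self, epoch, logs=None):
        record = {"epoch": epoch + 1}
        record.update({name: float(value) for name, value in (logs or {}).items()})
        print(json.dumps(record), flush=True)


def make_callbacks(patience=3):
    return [
        # Stop when val_loss stops improving and keep the best weights
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=patience, restore_best_weights=True
        ),
        # Halve the learning rate after two epochs without improvement
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
        # Listed last so it sees entries that earlier callbacks add to logs
        JsonMetricsCallback(),
    ]
```

Pass the list to training as model.fit(..., validation_data=..., callbacks=make_callbacks()); both built-in callbacks monitor val_loss, so they need validation data.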


Logging Custom Metrics

Add your own computed metrics:
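A sketch that derives extra metrics from values Keras has already computed. It assumes the model was compiled with metrics named precision and recall (for example tf.keras.metrics.Precision(name="precision")), so the validation values appear as val_precision and val_recall; custom_metrics is a hypothetical helper. Merge its result into whatever you log in on_epoch_end.

```python
def custom_metrics(logs):
    """Derive extra metrics from Keras logs (hypothetical helper)."""
    logs = logs or {}
    extra = {}
    if "loss" in logs and "val_loss" in logs:
        # A growing positive gap can indicate overfitting
        extra["generalization_gap"] = float(logs["val_loss"]) - float(logs["loss"])
    precision, recall = logs.get("val_precision"), logs.get("val_recall")
    if precision is not None and recall is not None:
        precision, recall = float(precision), float(recall)
        total = precision + recall
        # F1 is the harmonic mean of precision and recall
        extra["val_f1"] = 2 * precision * recall / total if total > 0 else 0.0
    return extra
```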


Using LambdaCallback (Shorter Syntax)

For simple logging, use LambdaCallback:
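A sketch of JSON logging as a single LambdaCallback, which passes (epoch, logs) to its on_epoch_end function. It prints JSON rather than using valohai-utils, because a with block does not fit inside a lambda; json_logger is an illustrative name.

```python
import json

import tensorflow as tf

# LambdaCallback wraps plain functions; on_epoch_end receives (epoch, logs)
json_logger = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: print(
        json.dumps({"epoch": epoch + 1, **{k: float(v) for k, v in (logs or {}).items()}}),
        flush=True,
    )
)
```

Use it like any other callback: model.fit(..., callbacks=[json_logger]).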


Logging Per-Batch Metrics (Advanced)

For very long epochs, you might want to log progress mid-epoch:

Use sparingly: Logging every batch creates a lot of data. Only use for debugging or very long epochs.
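A sketch of a callback that logs training metrics every N batches through on_train_batch_end; BatchMetricsLogger and every_n_batches are hypothetical names. In TensorFlow 2.x Keras the batch-level values are running averages over the current epoch so far rather than single-batch values, and validation metrics are not available at batch level.

```python
import json

import tensorflow as tf


class BatchMetricsLogger(tf.keras.callbacks.Callback):
    """Log training metrics every `every_n_batches` batches (hypothetical name)."""

    def __init__(self, every_n_batches=100):
        super().__init__()
        self.every_n_batches = every_n_batches
        self._epoch = 0

    def on_epoch_begin(self, epoch, logs=None):
        self._epoch = epoch

    def on_train_batch_end(self, batch, logs=None):
        # `batch` restarts at 0 every epoch
        if (batch + 1) % self.every_n_batches != 0:
            return
        record = {"epoch": self._epoch + 1, "batch": batch + 1}
        # Batch logs may hold tensors; float() handles scalars of any kind
        record.update({k: float(v) for k, v in (logs or {}).items()})
        print(json.dumps(record), flush=True)
```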


Best Practices

Always Convert to Python Types

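Depending on the Keras version, values in logs can be NumPy scalars or tensors instead of Python floats, and the json module cannot serialize some of them (np.float32, for example). Convert them to Python types before logging: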
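A sketch of a conversion helper; to_python and logs_to_python are hypothetical names. It relies on duck typing: .numpy() for eager tensors and variables, then .item() for NumPy scalars, so plain Python numbers pass through unchanged and integers stay integers.

```python
def to_python(value):
    """Convert a NumPy or TensorFlow scalar into a plain Python number."""
    if hasattr(value, "numpy"):  # eager tf.Tensor or tf.Variable
        value = value.numpy()
    if hasattr(value, "item"):  # NumPy scalar or 0-d array
        value = value.item()
    return value  # plain Python values pass through unchanged


def logs_to_python(logs):
    """Return a JSON-serializable copy of a Keras logs dict."""
    return {name: to_python(value) for name, value in (logs or {}).items()}
```

If every metric is a scalar and you don't mind integers becoming floats, float(value) is the simpler alternative.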


Handle Missing Metrics

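Not every metric is available in every hook. During fit(), validation metrics (val_*) appear only in the logs passed to on_epoch_end, only when validation data is provided, and, with validation_freq greater than 1, only in epochs where validation runs. Batch-level hooks receive training metrics only, and logs can be None: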
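A sketch of a helper that keeps only the requested metrics that are actually present; select_metrics is a hypothetical name.

```python
def select_metrics(logs, names):
    """Return {name: float} for each requested metric present in logs."""
    logs = logs or {}  # some hooks may pass None
    return {name: float(logs[name]) for name in names if name in logs}
```

Inside a callback this becomes, for example, select_metrics(logs, ["loss", "accuracy", "val_loss", "val_accuracy"]), which simply omits val_* entries in epochs without validation instead of raising a KeyError.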


Use Descriptive Metric Names

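Keep names consistent with Keras conventions: reuse the keys Keras puts in logs (such as accuracy and loss) and the val_ prefix for validation metrics, so your logged names match the training output and the History object: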
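A sketch showing how the name= argument of a metric sets its key in logs, and how Keras adds the val_ prefix for the validation counterpart. The model and random data are placeholders.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 4

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=[
        # name= becomes the key in logs; validation adds the "val_" prefix
        tf.keras.metrics.SparseCategoricalAccuracy(name="accuracy"),
        tf.keras.metrics.SparseTopKCategoricalAccuracy(k=3, name="top3_accuracy"),
    ],
)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 8)).astype("float32")
y = rng.integers(0, NUM_CLASSES, size=(100,))
history = model.fit(x, y, validation_split=0.2, epochs=1, verbose=0)
print(sorted(history.history))
```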


Common Issues

Metrics Not Appearing

Symptom: Callback runs but no metrics in Valohai

Causes & Fixes:

  • Missing validation_data → Add validation split or data

  • Incorrect metric names → Check available keys in logs

  • JSON serialization error → Convert NumPy/Tensor types to float

Debug: print the keys and value types that Keras actually passes to your callback.
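A small debugging callback, sketched under the assumption that extra lines in the execution log are acceptable; LogKeysDebugger is a hypothetical name. Add it to callbacks next to your logging callback and compare the printed names with the ones you log.

```python
import tensorflow as tf


class LogKeysDebugger(tf.keras.callbacks.Callback):
    """Print every key, value and value type in the epoch logs (hypothetical name)."""

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        for name in sorted(logs):
            value = logs[name]
            print(f"epoch {epoch + 1}: {name} = {value!r} ({type(value).__name__})")
```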


Validation Metrics Missing

Symptom: Only training metrics logged, no validation metrics

Solution: Make sure you provide validation data:
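A sketch of the two common ways to give fit() validation data; the model and random data are placeholders. validation_split holds out the last fraction of array inputs (before shuffling) and is not supported when x is a tf.data dataset, so pass validation_data explicitly when training from a dataset.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4)).astype("float32")
y = (x.sum(axis=1) > 0).astype("float32").reshape(-1, 1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Option 1: hold out the last 20% of the arrays (NumPy/tensor inputs only)
history = model.fit(x, y, validation_split=0.2, epochs=2, verbose=0)

# Option 2: pass an explicit validation set (also works with tf.data)
train_ds = tf.data.Dataset.from_tensor_slices((x[40:], y[40:])).batch(32)
val_ds = tf.data.Dataset.from_tensor_slices((x[:40], y[:40])).batch(32)
history = model.fit(train_ds, validation_data=val_ds, epochs=2, verbose=0)

print(sorted(history.history))
```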


Example Project

Check out our complete working example on GitHub:

valohai/tensorflow-example

The repository includes:

  • Complete training script with Valohai integration

  • valohai.yaml configuration

  • Example notebooks

  • Step-by-step setup instructions


Next Steps
