# Automatic Speech Recognition with NVIDIA NeMo

***

## Overview

This example demonstrates:

* Preparing and preprocessing LibriSpeech data
* Fine-tuning QuartzNet ASR models
* Evaluating Word Error Rate (WER)

***

### Steps

{% stepper %}
{% step %}
**Data Preparation**

Prepare and preprocess the LibriSpeech dataset so it is ready for training, converting the audio and transcripts into the format the QuartzNet model expects.
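NeMo's ASR dataloaders read JSON-lines manifests, where each line is a JSON object with `audio_filepath`, `duration`, and `text` keys. A minimal sketch of producing one (the file paths and durations below are placeholders for preprocessed LibriSpeech clips):

```python
import json

# Each line of a NeMo manifest is a JSON object describing one utterance.
# Paths and durations here are illustrative placeholders.
utterances = [
    {"audio_filepath": "wav/1089-134686-0000.wav", "duration": 10.43,
     "text": "he hoped there would be stew for dinner"},
    {"audio_filepath": "wav/1089-134686-0001.wav", "duration": 6.21,
     "text": "stuff it into you his belly counselled him"},
]

def write_manifest(entries, path):
    """Write one JSON object per line, as NeMo's dataset loaders expect."""
    with open(path, "w") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")

write_manifest(utterances, "train_manifest.json")
```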
{% endstep %}

{% step %}
**Environment Setup**

Configure the environment for the QuartzNet ASR model to enable efficient fine-tuning and evaluation. Ensure all dependencies and tools are installed.
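One way to set up the dependencies (the `nemo_toolkit[asr]` package name is the project's PyPI distribution; pinning versions to match your CUDA setup is left to you):

```bash
# NeMo builds on PyTorch, so install it first,
# then the ASR extras of the NeMo toolkit.
pip install torch
pip install "nemo_toolkit[asr]"
```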
{% endstep %}

{% step %}
**Model Fine-tuning**

Fine-tune the QuartzNet ASR model on the prepared LibriSpeech data to enhance its transcription capabilities.
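NeMo training runs are typically driven by a Hydra/YAML config, starting from a pretrained checkpoint such as `QuartzNet15x5Base-En` loaded via `EncDecCTCModel.from_pretrained`. A sketch of the relevant overrides (field names follow NeMo's QuartzNet config layout; the values are illustrative, not tuned):

```yaml
model:
  train_ds:
    manifest_filepath: train_manifest.json
    batch_size: 32
  validation_ds:
    manifest_filepath: val_manifest.json
    batch_size: 32
  optim:
    name: novograd   # the optimizer QuartzNet was originally trained with
    lr: 0.01
trainer:
  devices: 1
  max_epochs: 50
```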
{% endstep %}

{% step %}
**Evaluation Process**

Assess the model's performance by calculating the Word Error Rate (WER) on a test dataset to determine its accuracy.
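WER is the word-level edit distance (substitutions + insertions + deletions) between hypothesis and reference, divided by the number of reference words. A self-contained sketch of the metric, independent of NeMo's built-in helpers:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution
            dp[i][j] = min(dp[i - 1][j] + 1,             # deletion
                           dp[i][j - 1] + 1,             # insertion
                           dp[i - 1][j - 1] + cost)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("the cat sat", "the bat sat")` is one substitution over three reference words, i.e. about 0.33.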
{% endstep %}

{% step %}
**Results Analysis**

Analyze the model's predictions and WER results to identify areas for improvement, and refine the model if necessary.
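A common analysis step is ranking utterances by error rate to surface the worst transcriptions, while remembering that corpus-level WER weights utterances by reference length rather than averaging per-utterance rates. A sketch with hypothetical results (the utterance IDs and counts are made up for illustration):

```python
# Hypothetical per-utterance results: (utterance_id, reference word count, word errors)
results = [
    ("1089-134686-0000", 8, 0),
    ("1089-134686-0001", 7, 3),
    ("1188-133604-0005", 12, 1),
]

# Corpus-level WER: total errors over total reference words.
# Note this is NOT the same as averaging per-utterance WER values.
corpus_wer = sum(e for _, _, e in results) / sum(n for _, n, _ in results)

# Rank utterances by individual error rate to find the worst cases.
worst = sorted(results, key=lambda r: r[2] / r[1], reverse=True)
for utt_id, n_words, n_errors in worst:
    print(f"{utt_id}: WER {n_errors / n_words:.2%}")
```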
{% endstep %}
{% endstepper %}

***

### GitHub repository

The repository below walks through each of the above steps:

{% embed url="https://github.com/valohai/NVIDIA-NeMo-Valohai" %}
