How to use Whisper CPP in Python: Complete Guide

Introduction

Whisper.cpp is a custom inference implementation of Whisper, the speech recognition model created and released by OpenAI. It is written in C/C++ and runs entirely on the CPU. Whisper.cpp aims to provide the same functionality as the original Whisper model, though there are differences in speed and accuracy. While Whisper running on a GPU is generally more efficient, whisper.cpp can be advantageous when the model must run on a CPU, for example on Apple Silicon. In terms of accuracy, the original Whisper is considered the gold standard, while whisper.cpp is usually comparable and sometimes slightly worse. Whisper.cpp was developed by Georgi Gerganov for transcribing WAV audio files to text.

The Whisper model delivers strong speech recognition performance, and whisper.cpp is open source, allowing for modifications and custom implementations. It has virtually no external dependencies, which is unusual in the modern software landscape, and it takes advantage of advanced CPU features such as ARM NEON and x86 AVX. The result can be used to transcribe WAV audio files to text.

Python bindings for Whisper.cpp are still in development, and some existing libraries may be outdated or broken. Developers have been working on creating Python bindings for Whisper.cpp to enable its use in Python applications. However, it's important to ensure compatibility and functionality with the latest version of Whisper.cpp and its Python bindings.

How to use Whisper.cpp in Python

To use Whisper.cpp in Python, you can follow these steps:

1. Install Whisper.cpp: Clone the Whisper.cpp repository and build it. For example, you can use the following commands:
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make base.en

2. Install Python Dependencies: Create a Python environment and install the necessary dependencies, such as ane_transformers, openai-whisper, and coremltools. For instance, using Miniconda, you can create an environment and install the dependencies as follows:

conda create -n py310-whisper python=3.10 -y
conda activate py310-whisper
pip install ane_transformers openai-whisper coremltools
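After installing, it can be worth verifying from Python that the environment is actually usable, since the pip package names differ from the import names (openai-whisper installs the module "whisper"). A minimal sketch (the helper missing_modules is illustrative, not part of any package):

```python
from importlib.util import find_spec

def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]

# Import names, not pip package names: openai-whisper -> "whisper"
required = ["whisper", "ane_transformers", "coremltools"]
print(missing_modules(required))  # an empty list means the environment is ready
```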

3. Generate Core ML Model: If you plan to use Whisper.cpp with Apple's Core ML, you can generate a Core ML model from the downloaded Whisper model. For example:


./models/generate-coreml-model.sh base.en

4. Compile Whisper.cpp with Core ML Support: If you need Core ML support, rebuild Whisper.cpp with Core ML enabled using the following commands:


make clean
WHISPER_COREML=1 make -j

5. Run Whisper.cpp: You can then run Whisper.cpp with the desired model and audio input. For example:


./main -m models/ggml-base.en.bin -f samples/audio.wav

It's important to note that the Python bindings for Whisper.cpp are still in development, and some existing libraries may be outdated or broken. Therefore, it's recommended to refer to the official Whisper.cpp documentation and the latest updates on Python bindings to ensure compatibility and functionality.
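While the bindings are in flux, one portable option is to skip them entirely and call the compiled binary from Python as a subprocess. The sketch below assumes the paths from the steps above (the ./main binary and a 16 kHz WAV file); the function names are illustrative:

```python
import shlex
import subprocess

def build_whisper_command(model_path, audio_path, binary="./main"):
    """Assemble the argument list for the whisper.cpp CLI,
    matching the flags from the run step above."""
    return [binary, "-m", model_path, "-f", audio_path]

def transcribe_with_cli(model_path, audio_path):
    """Run whisper.cpp as a subprocess and return its stdout.
    Requires the compiled binary and a 16-bit, 16 kHz WAV file."""
    cmd = build_whisper_command(model_path, audio_path)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

# Show the exact shell command that would be executed:
cmd = build_whisper_command("models/ggml-base.en.bin", "samples/audio.wav")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

This avoids any binding-version mismatch at the cost of parsing the CLI's text output yourself.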

Using the whisper-cpp-python package

Here is an example Python code that uses the whisper-cpp-python module (package page: https://pypi.org/project/whisper-cpp-python/) to transcribe an audio file using the Whisper.cpp model:

from whisper_cpp_python import Whisper

# Create an instance of the Whisper class, passing the path to the model file as a parameter
model = Whisper('path/to/model/file.bin')

# Call the transcribe method with the path to the audio file as a parameter
transcription = model.transcribe('path/to/audio/file.wav')

# Print the transcription
print(transcription)

Note that you need to install the whisper-cpp-python module before running this code. You can install it using pip:


pip install whisper-cpp-python

Also, make sure to replace path/to/model/file.bin and path/to/audio/file.wav with the actual paths to the Whisper.cpp model file and the audio file you want to transcribe.
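The shape of the transcribe return value may vary between binding releases: some return a dict in the style of openai-whisper (with a "text" key), others a plain string. A small defensive helper (extract_text is a hypothetical name, not part of the package) can absorb either:

```python
def extract_text(result):
    """Return plain transcription text whether the binding produced a
    dict with a "text" key (openai-whisper style) or a raw string."""
    if isinstance(result, dict):
        return result.get("text", "").strip()
    return str(result).strip()

print(extract_text({"text": " Hello world. "}))  # -> Hello world.
```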

Difference between Whisper CPP and Whisper

The Whisper model and its C/C++ implementation, Whisper.cpp, have several differences in terms of implementation, performance, and use cases. Here's a detailed comparison based on the provided information:

  1. Implementation:
  • Whisper: The original Whisper model is implemented in Python and supports running on both the CPU and the GPU.
  • Whisper.cpp: Whisper.cpp is a custom inference implementation of the Whisper model. It is implemented in C/C++ and runs only on the CPU. It aims to provide the same functionality as the original model but with differences in the implementation.
  2. Performance:
  • Whisper: Whisper running on a GPU is generally more efficient than running on a CPU. It is considered the "gold standard" in terms of accuracy.
  • Whisper.cpp: Whisper.cpp, while running only on the CPU, can be advantageous in some cases, such as on Apple Silicon, where it is expected to be faster. In terms of accuracy, however, whisper.cpp should be similar to Whisper and is sometimes slightly worse.
  3. Use Cases:
  • Whisper: Supporting both the CPU and the GPU makes the original model suitable for a wide range of applications.
  • Whisper.cpp: Whisper.cpp is designed for scenarios where running the model on a CPU is necessary or advantageous, such as transcribing WAV audio files to text.
  4. Dependencies:
  • Whisper: The original Whisper model in Python carries the dependencies typical of Python-based machine learning models.
  • Whisper.cpp: Whisper.cpp is unique in that it has virtually no dependencies, making it suitable for scenarios where minimal dependencies are preferred.

In summary, Whisper and Whisper.cpp are two implementations of the same model, with Whisper being the original Python implementation supporting both CPU and GPU, and Whisper.cpp being a custom C/C++ implementation running only on the CPU. While Whisper is the "gold standard" in terms of accuracy, Whisper.cpp can be advantageous in certain scenarios, such as on Apple Silicon, and is designed for use cases where running the model on a CPU is necessary or preferred.

Conclusion

In conclusion, the Whisper model and its C/C++ implementation, Whisper.cpp, offer distinct advantages and are tailored to different use cases. Whisper, the original Python implementation, provides the flexibility of running on both the CPU and the GPU, making it suitable for a wide range of applications. It is considered the "gold standard" in terms of accuracy and performance, particularly when running on a GPU.

On the other hand, Whisper.cpp is designed for scenarios where running the model on a CPU is necessary or advantageous. It is implemented in C/C++ and runs only on the CPU, making it potentially faster in certain environments, such as on Apple Silicon.

Additionally, Whisper.cpp has virtually no dependencies, which can be beneficial in scenarios where minimal dependencies are preferred. While Whisper.cpp aims to provide the same functionality as the original model, it may exhibit slight differences in accuracy compared to Whisper.

Therefore, the choice between Whisper and Whisper.cpp depends on the specific requirements of the application, the available hardware, and the importance of accuracy versus minimal dependencies. Developers and practitioners should carefully consider these factors when choosing between the two implementations to ensure that they align with the performance, accuracy, and hardware constraints of their particular use case. Additionally, staying informed about the latest developments and updates for both Whisper and Whisper.cpp is essential for making well-informed decisions regarding their utilization.