whisper.m


Automatic speech recognition in MATLAB/Octave based on the excellent whisper.cpp from Georgi Gerganov and models from OpenAI's Whisper.

Installation

First, clone the repository with submodules:

git clone --recurse-submodules https://github.com/gllmflndn/whisper.m.git

MATLAB

Then compile the MEX file using make in a Terminal:

make

The Accelerate and Metal frameworks will be used on macOS. On Windows, use MSYS2 and MinGW-w64 (see MATLAB Support).

GNU Octave

If compiling for Octave, execute the following instead from a Terminal:

make MEXBIN="mkoctfile --mex" MEXEXT=mex MEXOPT=""

Usage

To run whisper.m on a pre-recorded audio file (mono, 16 kHz) called input.wav:

w = whisper('small');
[segments,tokens] = w.transcribe('input.wav',...
                                 'print_realtime', true,...
                                 'print_progress', false);
whisper.display_tokens(tokens);

Pre-trained models will be downloaded automatically from Hugging Face when needed and stored in a models directory. Model options are tiny, tiny.en, base, base.en, small, small.en, medium, medium.en and large.
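For example, to use the English-only base model instead, with the same calls as above (the model file is fetched on first use and cached in the models directory):

w = whisper('base.en');                    % downloads ggml base.en model on first use
[segments, tokens] = w.transcribe('input.wav', 'print_progress', false);
whisper.display_tokens(tokens);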

Another example to record audio data and run whisper.m:

Fs = 16000;
nbits = 16;
nchannels = 1;
id = 1; % see audiodevinfo to select the audio device (listing sketch below)
rec = audiorecorder(Fs, nbits, nchannels, id);

recDuration = 10;
disp('Begin speaking.')
recordblocking(rec, recDuration);
disp('End of recording.')
y = getaudiodata(rec);

w = whisper('small');
[segments,tokens] = w.transcribe(y', 'print_progress', false);
whisper.display_tokens(tokens);
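The example above records from the audio device with id = 1. To find the right ID on your machine, you can list the available input devices with audiodevinfo (a short sketch; device names and IDs differ per system):

info = audiodevinfo;                       % query available audio devices
for k = 1:numel(info.input)
    fprintf('%d: %s\n', info.input(k).ID, info.input(k).Name);
end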

To extract the audio track from a video as 16 kHz mono audio, you can use ffmpeg:

ffmpeg -i video.mp4 -f wav -ar 16000 -ac 1 -vn audio.wav
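Alternatively, audio that is already on disk can be converted to 16 kHz mono directly in MATLAB/Octave (a sketch; resample requires the Signal Processing Toolbox in MATLAB or the signal package in Octave, and the input file name is hypothetical):

[y, Fs] = audioread('stereo_44k.wav');     % hypothetical input recording
y = mean(y, 2);                            % mix down to mono
y = resample(y, 16000, Fs);                % resample to 16 kHz
audiowrite('audio.wav', y, 16000);         % write the 16 kHz mono file

The resampled signal could likewise be passed straight to w.transcribe as a row vector, as in the recording example above.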

There is also a demo that uses an audio file shipped with whisper.cpp:

>> whisper.demo()
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   73.62 MB
whisper_model_load: model size    =   73.54 MB
whisper_init_state: kv self size  =    2.62 MB
whisper_init_state: kv cross size =    8.79 MB
whisper_init_state: compute buffer (conv)   =   11.17 MB
whisper_init_state: compute buffer (encode) =   61.76 MB
whisper_init_state: compute buffer (cross)  =    3.67 MB
whisper_init_state: compute buffer (decode) =   18.82 MB

And so my fellow Americans ask not what your country can do for you ask what you can do for your country