Implementation of Transfer Learning from Speaker Verification to Multi-speaker Text-To-Speech Synthesis (SV2TTS) in Persian language.

Adibian/Persian-MultiSpeaker-Tacotron2


MultiSpeaker Tacotron2 in Persian language

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) in the Persian language. The core code comes from an existing SV2TTS repository and has been adapted for Persian.

Quickstart

Data structures:

dataset/persian_data/
    train_data/
        speaker1/book-1/
            sample1.txt
            sample1.wav
            ...
        ...
    test_data/
        ...
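Each `sample.wav` must be paired with a transcript file of the same basename. As a minimal sketch (not part of this repository, just a pure-Python check under the layout assumed above), you can verify the pairing before preprocessing:

```python
from pathlib import Path

def check_dataset(root):
    """Return the list of .wav files under `root` that are missing a
    matching .txt transcript with the same basename.

    Assumes the dataset/persian_data/<split>/<speaker>/<book>/ layout
    described above; any nesting depth works since rglob is recursive.
    """
    root = Path(root)
    missing = []
    for wav in sorted(root.rglob("*.wav")):
        if not wav.with_suffix(".txt").exists():
            missing.append(wav)
    return missing
```

Running this on `dataset/persian_data/train_data` before `synthesizer_preprocess_audio.py` catches samples that would otherwise be silently skipped or cause errors.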

Preprocessing:

python synthesizer_preprocess_audio.py dataset --datasets_name persian_data --subfolders train_data --no_alignments
python synthesizer_preprocess_embeds.py dataset/SV2TTS/synthesizer

Train synthesizer:

python synthesizer_train.py my_run dataset/SV2TTS/synthesizer

To synthesize a wav file, you must put all final models in the saved_models/final_models directory. If you have not trained the speaker encoder and vocoder models, you can use the pretrained models in saved_models/default.
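The fallback described above can be sketched as follows (the resolution logic is an illustrative assumption, not code from this repository; only the two directory names come from the text):

```python
from pathlib import Path

def resolve_models_dir(base="saved_models"):
    """Use saved_models/final_models if it exists and contains files;
    otherwise fall back to the pretrained models in saved_models/default."""
    base = Path(base)
    final = base / "final_models"
    if final.is_dir() and any(final.iterdir()):
        return final
    return base / "default"
```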

Inference using WavRNN as vocoder:

python inference.py --vocoder "WavRNN" --text "یک نمونه از خروجی" --ref_wav_path "/path/to/sample/reference.wav" --test_name "test1"

However, WavRNN is an older vocoder; if you want to use HiFiGAN instead, you must first download a pretrained model (trained on English).

First, install the parallel_wavegan package; see its documentation for more information.

pip install parallel_wavegan

Then download the pretrained HiFiGAN model into your saved-models directory:

from parallel_wavegan.utils import download_pretrained_model
download_pretrained_model("vctk_hifigan.v1", "saved_models/final_models/vocoder_HiFiGAN")

Now you can use HiFiGAN as a vocoder in inference command:

python inference.py --vocoder "HiFiGAN" --text "یک نمونه از خروجی" --ref_wav_path "/path/to/sample/reference.wav" --test_name "test1"

Demo

There are some output samples of the trained model in this directory.
