
RuntimeError: The size of tensor a (1174) must match the size of tensor b (903) at non-singleton dimension 1 #10

Open
KeckiTizii opened this issue May 11, 2024 · 0 comments


@KeckiTizii

In my code, I call infer_file() inside the async def rvc_tts() function, which runs on every iteration of a while loop. The first call to rvc_tts() works fine, but on the next iteration of the loop it fails with this error:

```
2024-05-11 13:54:26 | WARNING | rvc_python.modules.vc.modules | Traceback (most recent call last):
  File "d:\AI\MeguminAIProject\venv\lib\site-packages\rvc_python\modules\vc\modules.py", line 184, in vc_single
    audio_opt = self.pipeline.pipeline(
  File "d:\AI\MeguminAIProject\venv\lib\site-packages\rvc_python\modules\vc\pipeline.py", line 415, in pipeline
    self.vc(
  File "d:\AI\MeguminAIProject\venv\lib\site-packages\rvc_python\modules\vc\pipeline.py", line 268, in vc
    feats = feats * pitchff + feats0 * (1 - pitchff)
RuntimeError: The size of tensor a (1174) must match the size of tensor b (903) at non-singleton dimension 1
```

The sizes reported for tensor a and tensor b change every time the error occurs; for example, another run gave: `The size of tensor a (2590) must match the size of tensor b (1322) at non-singleton dimension 1`.
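For context, the failing line multiplies the feature tensor `feats` by a per-frame pitch mask `pitchff`. In RVC WebUI's pipeline, which rvc_python appears to package, `pitchff` has shape `(1, frames, 1)`, so the multiplication only broadcasts when both tensors have the same number of frames in dimension 1. The pure-Python sketch below models PyTorch's broadcasting rule (it is not PyTorch code); the shapes are illustrative assumptions, with 768 standing in for the feature width.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of shapes a and b, or raise on a mismatch.

    Models PyTorch/NumPy broadcasting: shapes are aligned from the trailing
    dimension, and each pair of sizes must be equal or contain a 1.
    """
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both have ndim dimensions.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for dim, (x, y) in enumerate(zip(a, b)):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise RuntimeError(
                f"The size of tensor a ({x}) must match the size of tensor b ({y}) "
                f"at non-singleton dimension {dim}"
            )
    return tuple(out)


# feats: (batch, frames, features); pitchff: (batch, frames, 1).
print(broadcast_shape((1, 903, 768), (1, 903, 1)))  # frames agree: broadcasts fine
try:
    broadcast_shape((1, 1174, 768), (1, 903, 1))  # frames disagree
except RuntimeError as err:
    print(err)
```

Read this way, the error means the features were computed for audio with 1174 frames while the pitch mask only covered 903, i.e. the pitch data and the audio being converted are out of sync.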
Here is my code:

```python
import asyncio
import os

import edge_tts
from characterai import aiocai
from googletrans import Translator
from pydub import AudioSegment, playback
from rvc_python.infer import infer_file

translator = Translator()
char = ""
client = aiocai.Client("")
OUTPUT_EDGE = "audio/megumin-edge.wav"
OUTPUT_RVC = "audio/megumin-rvc.wav"
VOICES = ["ja-JP-NanamiNeural"]
VOICE = VOICES[0]


async def generate_edge_tts(translation):
    # Synthesize the Japanese text with edge-tts and save it to OUTPUT_EDGE.
    # (A function named edge_tts would shadow the edge_tts module imported above.)
    communicate = edge_tts.Communicate(translation, VOICE, rate="+20%")
    await communicate.save(OUTPUT_EDGE)


async def rvc_tts():
    # Convert the edge-tts output to the Megumin voice with RVC.
    # infer_file() is called without await, so it runs synchronously.
    infer_file(
        input_path=OUTPUT_EDGE,
        model_path="model/megumin.pth",
        device="cuda",  # Use cpu or cuda
        f0method="harvest",  # Choose between 'harvest', 'crepe', 'rmvpe', 'pm'
        f0up_key=2,  # Transpose setting
        opt_path=OUTPUT_RVC,  # Output file path
        filter_radius=3,
        resample_sr=0,  # Set to desired sample rate or 0 for no resampling.
        rms_mix_rate=0.25,
        protect=0.33,
        version="v2",
    )


async def main():
    me = await client.get_me()
    async with await client.connect() as chat:
        new, answer = await chat.new_chat(char, me.id)

        print(f"{answer.name}: {answer.text}")

        while True:
            text = input("You: ")

            message = await chat.send_message(char, new.chat_id, text)

            translation = translator.translate(message.text, dest="ja").text
            print(f"{message.name}: {translation}")
            await generate_edge_tts(translation)
            await rvc_tts()  # Works on the first turn, fails on the second.
            PLAY_RVC = AudioSegment.from_wav(OUTPUT_RVC)
            playback.play(PLAY_RVC)
            # Delete both temporary files before the next turn.
            os.remove(OUTPUT_EDGE)
            os.remove(OUTPUT_RVC)


asyncio.run(main())
```
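A possible cause, not yet verified against rvc_python's own source: RVC WebUI's pipeline caches the `harvest` pitch curve with `functools.lru_cache` in a helper keyed on the input file path (`cache_harvest_f0`). Because the loop above writes every new edge-tts clip to the same `OUTPUT_EDGE` path, such a cache would hand the first clip's pitch data to later clips of different lengths, which matches sizes that change on every turn. The pure-Python sketch below simulates that mechanism with hypothetical stand-ins (`pitch_frames`, `feature_frames`, `synthesize`, `fresh_path`); none of them are rvc_python functions.

```python
import uuid
from functools import lru_cache

# Hypothetical stand-in for the WAV files on disk: path -> samples.
_audio_by_path = {}


@lru_cache(maxsize=None)
def pitch_frames(path):
    """Simulated pitch extractor whose result is cached by path only.

    Once a path has been seen, the cached value is returned even if a new,
    longer or shorter clip has since been written to that same path.
    """
    return len(_audio_by_path[path])


def feature_frames(path):
    """Simulated feature extractor that always reads the current audio."""
    return len(_audio_by_path[path])


def synthesize(path, n_frames):
    """Write n_frames of fake audio to path, then combine features and pitch."""
    _audio_by_path[path] = [0.0] * n_frames
    feats, pitch = feature_frames(path), pitch_frames(path)
    if feats != pitch:
        raise RuntimeError(
            f"The size of tensor a ({feats}) must match the size of tensor b ({pitch})"
        )
    return feats


def fresh_path(prefix="audio/megumin-edge"):
    """Return a new, never-before-used output path for one loop iteration."""
    return f"{prefix}-{uuid.uuid4().hex}.wav"


if __name__ == "__main__":
    # Reusing one path: the second call sees the stale cached pitch length.
    synthesize("audio/megumin-edge.wav", 903)
    try:
        synthesize("audio/megumin-edge.wav", 1174)
    except RuntimeError as err:
        print("same path:", err)

    # A fresh path per iteration means the cache never returns stale data.
    print("fresh paths:", synthesize(fresh_path(), 903), synthesize(fresh_path(), 1174))
```

If this is the cause, generating a fresh `OUTPUT_EDGE` path on each turn (the existing `os.remove` calls would still delete it afterwards), or switching `f0method` away from `harvest` to another option listed in the code comment, should avoid the mismatch.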

Thanks for the help.
