Cast error details: Unable to cast [Array] to Tensor #110

Open
Tony-Starkus opened this issue Jul 19, 2024 · 9 comments

Comments

@Tony-Starkus

Hello. I downloaded the pretrained model ljspeech v3.1, and when I run python gen_forward.py --alpha 1 --checkpoint pretrained-forward_step90k.pt --input_text 'this is whatever you want it to be' griffinlim I get the following error:

Traceback (most recent call last):
  File "/home/usertest/PycharmProjects/ForwardTacotron/gen_forward.py", line 116, in <module>
    dsp.save_wav(wav, out_path / f'{wav_name}.wav')
  File "/home/usertest/PycharmProjects/ForwardTacotron/utils/dsp.py", line 103, in save_wav
    torchaudio.save(filepath=path, src=waveform, sample_rate=self.sample_rate)
  File "/home/usertest/.virtualenvs/ForwardTacotron-Python3.10/lib/python3.10/site-packages/torchaudio/backend/sox_io_backend.py", line 429, in save
    torch.ops.torchaudio.sox_io_save_audio_file(
  File "/home/usertest/.virtualenvs/ForwardTacotron-Python3.10/lib/python3.10/site-packages/torch/_ops.py", line 502, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: torchaudio::sox_io_save_audio_file() Expected a value of type 'Tensor' for argument '_1' but instead found type 'ndarray'.
Position: 1
Value: array([0.00272604, 0.00512884, 0.00484867, ..., 0.00298105, 0.00193049,
       0.00093417], dtype=float32)
Declaration: torchaudio::sox_io_save_audio_file(str _0, Tensor _1, int _2, bool _3, float? _4, str? _5, str? _6, int? _7) -> ()
Cast error details: Unable to cast [0.00272604 0.00512884 0.00484867 ... 0.00298105 0.00193049 0.00093417] to Tensor

Can someone help me?

I am running Python 3.10 with the following package versions:

absl-py==2.1.0
attrs==23.2.0
audioread==3.0.1
Babel==2.15.0
bibtexparser==2.0.0b7
certifi==2024.7.4
cffi==1.16.0
charset-normalizer==3.3.2
clldutils==3.22.2
cmake==3.30.0
colorama==0.4.6
colorlog==6.8.2
contourpy==1.2.1
csvw==3.3.0
cycler==0.12.1
Cython==3.0.10
dataclasses==0.6
decorator==5.1.1
dlinfo==1.2.1
filelock==3.13.1
fonttools==4.53.1
fsspec==2024.2.0
grpcio==1.65.1
idna==3.7
inflect==7.3.1
isodate==0.6.1
Jinja2==3.1.3
joblib==1.4.2
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
language-tags==1.2.0
lazy_loader==0.4
librosa==0.10.0
lit==18.1.8
llvmlite==0.39.1
lxml==5.2.2
Markdown==3.6
MarkupSafe==2.1.5
matplotlib==3.9.1
more-itertools==10.3.0
mpmath==1.3.0
msgpack==1.0.8
networkx==3.2.1
numba==0.56.4
numpy==1.23.5
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
packaging==24.1
pandas==2.2.2
phonemizer==3.2.1
pillow==10.2.0
platformdirs==4.2.2
pooch==1.8.2
protobuf==4.25.3
pycparser==2.22
pylatexenc==2.10
pyparsing==3.1.2
python-dateutil==2.9.0.post0
pytz==2024.1
pyworld==0.3.4
PyYAML==6.0.1
rdflib==7.0.0
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
Resemblyzer==0.1.3
rfc3986==1.5.0
rpds-py==0.19.0
scikit-learn==1.5.1
scipy==1.14.0
segments==2.2.1
six==1.16.0
soundfile==0.12.1
soxr==0.3.7
sympy==1.12
tabulate==0.9.0
tensorboard==2.17.0
tensorboard-data-server==0.7.2
threadpoolctl==3.5.0
torch==2.0.1
torchaudio==2.0.2
tqdm==4.66.4
triton==2.0.0
typeguard==4.3.0
typing==3.7.4.3
typing_extensions==4.12.2
tzdata==2024.1
Unidecode==1.3.8
uritemplate==4.1.1
urllib3==2.2.2
webrtcvad==2.0.10
Werkzeug==3.0.3
@rmcpantoja

Hi,
Make sure torchaudio is installed properly, together with the dependencies it needs. Alternatively, use a neural vocoder such as HiFi-GAN, or an iSTFT-based vocoder such as Vocos. Honestly, these vocoders sound better than Griffin-Lim.

@Tony-Starkus
Author

Hi @rmcpantoja, thanks for the reply.

About torchaudio: requirements.txt lists torch>=1.2.0 and torchaudio==2.0.2. torchaudio 2 is compatible with PyTorch 2, which is why I installed torch==2.0.1.
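A quick way to confirm which versions are actually installed in the active environment is sketched below. The helper names are made up for illustration, and the major.minor comparison is only a rough heuristic that assumes the 2.x convention where torch and torchaudio releases share the same major.minor numbers.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def same_major_minor(a: str, b: str) -> bool:
    """Compare only the leading 'major.minor' part of two version strings."""
    return a.split(".")[:2] == b.split(".")[:2]


if __name__ == "__main__":
    torch_v = installed_version("torch")
    audio_v = installed_version("torchaudio")
    print("torch:", torch_v, "| torchaudio:", audio_v)
    # Heuristic only: assumes torch 2.x / torchaudio 2.x share major.minor.
    if torch_v and audio_v and not same_major_minor(torch_v, audio_v):
        print("warning: torch and torchaudio major.minor versions differ")
```

With the pins from this thread (torch==2.0.1, torchaudio==2.0.2) the heuristic reports no mismatch, which suggests the error comes from something other than the version pairing.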

My goal is to convert text to an audio file, and looking at gen_forward.py, griffinlim is the option that creates a WAV file.
Do you know another way to do it? I tried many scripts to convert .mel and .npy files to WAV, but without success.

Reference: https://github.com/pytorch/audio/releases/tag/v2.0.2
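For the Griffin-Lim case specifically, the traceback shows that the waveform already exists as a float32 array of samples, so one alternative is to write it to disk without torchaudio at all. The sketch below uses only the Python standard library. The function name is hypothetical, and it assumes mono samples in the range [-1.0, 1.0]. It does not turn .mel or .npy spectrogram files into audio, since those still need a vocoder.

```python
import struct
import wave


def write_wav_pcm16(path, samples, sample_rate: int) -> None:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))   # clip so the value fits in int16
        ints.append(int(round(s * 32767)))
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)                   # mono
        f.setsampwidth(2)                   # 2 bytes = 16 bits per sample
        f.setframerate(sample_rate)
        # WAV stores PCM data little-endian, hence the "<" in the format.
        f.writeframes(struct.pack(f"<{len(ints)}h", *ints))
```

A call such as write_wav_pcm16("out.wav", wav, sample_rate) would accept a NumPy array as well as a plain list, because the samples are converted one by one with float(). The sample rate has to match the one the model was trained with.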

@rmcpantoja

> Hi @rmcpantoja, thanks for the reply.
>
> About torchaudio: requirements.txt lists torch>=1.2.0 and torchaudio==2.0.2. torchaudio 2 is compatible with PyTorch 2, which is why I installed torch==2.0.1.
>
> My goal is to convert text to an audio file, and looking at gen_forward.py, griffinlim is the option that creates a WAV file. Do you know another way to do it? I tried many scripts to convert .mel and .npy files to WAV, but without success.
>
> Reference: https://github.com/pytorch/audio/releases/tag/v2.0.2

Hi,
If you pass hifigan on gen_forward's command line, the script will convert the output to .npy automatically, and you then need to pass that .npy to a vocoder. However, I have a script that runs ForwardTacotron and HiFi-GAN together, directly, without passing files between them. We also have a GUI app that supports this TTS, see here

@Tony-Starkus
Author

I checked the code of tts-remix. Can you give a little explanation about how to use it?!

@stavrosmachinima

Hey, I had the same issue. Fixed it with two lines in gen_forward.py.
I created a PR about it.
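The two lines from that PR are not quoted in this thread, so the following is only a guess at the kind of change that addresses the traceback, not the PR itself. The error says torchaudio's save op received an ndarray where it expected a Tensor, and torchaudio.save expects a 2-D tensor shaped (channels, samples). The helper name below is hypothetical.

```python
import numpy as np
import torch


def to_saveable_tensor(wav) -> torch.Tensor:
    """Return a float32 CPU tensor shaped (channels, samples)."""
    if isinstance(wav, np.ndarray):
        wav = torch.from_numpy(wav)      # wraps the array without copying
    wav = wav.detach().cpu().float()
    if wav.dim() == 1:
        wav = wav.unsqueeze(0)           # (samples,) -> (1, samples), mono
    return wav


if __name__ == "__main__":
    # Stand-in for the Griffin-Lim output seen in the traceback.
    fake_wav = np.array([0.00272604, 0.00512884, 0.00484867], dtype=np.float32)
    tensor = to_saveable_tensor(fake_wav)
    print(type(tensor).__name__, tuple(tensor.shape), tensor.dtype)
```

Applying a conversion like this to the Griffin-Lim output before it reaches dsp.save_wav should give torchaudio.save the type it asks for. Where exactly the conversion belongs in gen_forward.py is an assumption here.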

@rmcpantoja

> I checked the code of tts-remix. Can you give a little explanation about how to use it?!

Hi,
Just start the GUI with:

python tts_remix.py

The interface will open.
You just need to put the ForwardTacotron and HiFi-GAN checkpoints in place, something like:
models
models/forward
models/forward/voicename
models/forward/voicename/voicename.pt
models/forward/voicename/vocoder-voicename.pt
models/forward/voicename/vocoder-voicename.json
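The layout above can be checked before launching the GUI with a small script like the one below. This is only a sketch: the function names are hypothetical, and the mapping of each file to a role (ForwardTacotron checkpoint, HiFi-GAN checkpoint, HiFi-GAN config) is inferred from the file names rather than confirmed in this thread.

```python
from pathlib import Path


def expected_files(root, voice: str):
    """List the checkpoint paths from the layout above for one voice."""
    voice_dir = Path(root) / "forward" / voice
    return [
        voice_dir / f"{voice}.pt",            # presumably the ForwardTacotron checkpoint
        voice_dir / f"vocoder-{voice}.pt",    # presumably the HiFi-GAN checkpoint
        voice_dir / f"vocoder-{voice}.json",  # presumably the HiFi-GAN config
    ]


def missing_files(root, voice: str):
    """Return the expected paths that do not exist as files."""
    return [p for p in expected_files(root, voice) if not p.is_file()]


if __name__ == "__main__":
    for path in missing_files("models", "voicename"):
        print("missing:", path)
```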

@Tony-Starkus
Author

> Hey, I had the same issue. Fixed it with two lines in gen_forward.py. I created a PR about it.

Looks good, I am going to try it later, thanks!

Which Python version are you using? Also, can you share your pip freeze please?!

@Tony-Starkus
Author

> > I checked the code of tts-remix. Can you give a little explanation about how to use it?!
>
> Hi, just start the GUI with:
>
> python tts_remix.py
>
> The interface will open. You just need to put the ForwardTacotron and HiFi-GAN checkpoints in place, something like:
> models
> models/forward
> models/forward/voicename
> models/forward/voicename/voicename.pt
> models/forward/voicename/vocoder-voicename.pt
> models/forward/voicename/vocoder-voicename.json

Got it, I will try this. Thanks!

@stavrosmachinima

> > Hey, I had the same issue. Fixed it with two lines in gen_forward.py. I created a PR about it.
>
> Looks good, I am going to try it later, thanks!
>
> Which Python version are you using? Also, can you share your pip freeze please?!

Python 3.10, same as you. There are some slight differences in our pip freeze outputs, but they shouldn't matter.

absl-py==2.1.0
attrs==23.2.0
audioread==3.0.1
Babel==2.15.0
bibtexparser==2.0.0b7
certifi==2024.7.4
cffi==1.16.0
charset-normalizer==3.3.2
clldutils==3.22.2
cmake==3.30.1
colorama==0.4.6
colorlog==6.8.2
contourpy==1.2.1
csvw==3.3.0
cycler==0.12.1
Cython==3.0.10
dataclasses==0.6
decorator==5.1.1
dlinfo==1.2.1
filelock==3.15.4
fonttools==4.53.1
grpcio==1.65.1
idna==3.7
inflect==7.3.1
isodate==0.6.1
Jinja2==3.1.4
joblib==1.4.2
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
language-tags==1.2.0
lazy_loader==0.4
librosa==0.10.0
lit==18.1.8
llvmlite==0.39.1
lxml==5.2.2
Markdown==3.6
MarkupSafe==2.1.5
matplotlib==3.9.1
more-itertools==10.3.0
mpmath==1.3.0
msgpack==1.0.8
networkx==3.3
numba==0.56.4
numpy==1.23.5
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
packaging==24.1
pandas==2.2.2
phonemizer==3.2.1
pillow==10.4.0
platformdirs==4.2.2
pooch==1.8.2
protobuf==4.25.4
pycparser==2.22
pylatexenc==2.10
pyparsing==3.1.2
python-dateutil==2.9.0.post0
pytz==2024.1
pyworld==0.3.4
PyYAML==6.0.1
rdflib==7.0.0
referencing==0.35.1
regex==2024.7.24
requests==2.32.3
Resemblyzer==0.1.3
rfc3986==1.5.0
rpds-py==0.19.1
scikit-learn==1.5.1
scipy==1.14.0
segments==2.2.1
six==1.16.0
soundfile==0.12.1
soxr==0.4.0
sympy==1.13.1
tabulate==0.9.0
tensorboard==2.17.0
tensorboard-data-server==0.7.2
threadpoolctl==3.5.0
torch==2.0.1
torchaudio==2.0.2
tqdm==4.66.4
triton==2.0.0
typeguard==4.3.0
typing==3.7.4.3
typing_extensions==4.12.2
tzdata==2024.1
Unidecode==1.3.8
uritemplate==4.1.1
urllib3==2.2.2
webrtcvad==2.0.10
Werkzeug==3.0.3
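
To see exactly where two pip freeze outputs differ, a small comparison script can help. The sketch below uses hypothetical function names and only handles plain name==version lines. The sample pins are copied from the two lists in this thread.

```python
def parse_freeze(text: str) -> dict:
    """Map package name -> version from 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, ver = line.split("==", 1)
            pins[name.lower()] = ver   # package names are case-insensitive
    return pins


def diff_freeze(a: str, b: str) -> dict:
    """Return {name: (version_in_a, version_in_b)} for every pin that differs."""
    pa, pb = parse_freeze(a), parse_freeze(b)
    return {
        name: (pa.get(name), pb.get(name))
        for name in sorted(pa.keys() | pb.keys())
        if pa.get(name) != pb.get(name)
    }


if __name__ == "__main__":
    first = "cmake==3.30.0\nfsspec==2024.2.0\ntorch==2.0.1\ntorchaudio==2.0.2"
    second = "cmake==3.30.1\ntorch==2.0.1\ntorchaudio==2.0.2"
    for name, (old, new) in diff_freeze(first, second).items():
        print(f"{name}: {old} -> {new}")
```

A package present in only one list shows up with None on the other side. In the two lists posted here, the torch and torchaudio pins are identical.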
   
