
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 63 for tensor number 1 in the list. #37

Open
vb87 opened this issue Nov 20, 2023 · 3 comments

vb87 commented Nov 20, 2023

I'm using the latest master and running on CUDA.

Here's the mp3 file I'm using as input:
https://drive.google.com/file/d/1xR2mV-SctUknIvjKqlTYyFKHRl5annCX/view?usp=sharing

command line:
python -m audiosr -i 5.01_22303.037073170733_23517.438009756097.mp3 -s . -d cuda

getting this error:

Loading AudioSR: speech
Loading model on cuda
D:\Soft\Python\Python38\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3484.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
D:\Soft\Python\Python38\lib\site-packages\torchaudio\transforms\_transforms.py:611: UserWarning: Argument 'onesided' has been deprecated and has no influence on the behavior of this module.
  warnings.warn(
DiffusionWrapper has 258.20 M params.
Running DDIM Sampling with 50 timesteps
DDIM Sampler: 0%| | 0/50 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "D:\Soft\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Soft\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\__main__.py", line 115, in <module>
    waveform = super_resolution(
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\pipeline.py", line 168, in super_resolution
    waveform = latent_diffusion.generate_batch(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1525, in generate_batch
    samples, _ = self.sample_log(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1431, in sample_log
    samples, intermediates = ddim_sampler.sample(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddim.py", line 143, in sample
    samples, intermediates = self.ddim_sampling(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddim.py", line 237, in ddim_sampling
    outs = self.p_sample_ddim(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddim.py", line 293, in p_sample_ddim
    model_t = self.model.apply_model(x_in, t_in, c)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1030, in apply_model
    x_recon = self.model(x_noisy, t, cond_dict=cond)
  File "D:\Soft\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1686, in forward
    out = self.diffusion_model(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\modules\diffusionmodules\openaimodel.py", line 879, in forward
    h = th.cat([h, concate_tensor], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 63 for tensor number 1 in the list.

This could be related to the length of the audio, which is just 0.936 seconds. I used LosslessCut to append another mp3 file with the exact same configuration, without re-encoding, and exported it with all the same settings (same sample rate, bitrate, etc.). Running audiosr on that longer file produced no error.
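The off-by-one behind the `th.cat` failure can be illustrated in miniature. This is a hypothetical sketch (not AudioSR's actual code) of how an odd-length latent breaks a U-Net skip connection: the encoder halves the time axis with stride 2, the decoder doubles it back, and for an odd length the round trip no longer matches the saved skip tensor, so the channel-wise concatenation fails.

```python
import numpy as np

# Hypothetical U-Net round trip on an odd time length (127 here).
skip = np.zeros((1, 64, 127))    # encoder activation, odd time length
down = skip[:, :, ::2]           # stride-2 downsample -> length 64
up = np.repeat(down, 2, axis=2)  # 2x upsample -> length 128, not 127

# Channel-wise concat, analogous to th.cat([h, concate_tensor], dim=1):
# every dimension except the concat axis must match, but 127 != 128.
try:
    np.concatenate([skip, up], axis=1)
    ok = True
except ValueError:  # NumPy's analogue of the RuntimeError above
    ok = False
```

A sufficiently long (or suitably padded) input keeps every downsampled length even, which is consistent with the appended-file workaround making the error disappear.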

@yuzuda283

Same issue here.

@Susukerow45

Try a 0.512 sec wav file.

DrBrule commented Jan 2, 2024

Ran into this as well. It seems to be related to the audio file being too short. If you pad the input audio array with some trailing zeros, it should work.
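A minimal sketch of that padding workaround, assuming the waveform is already loaded as a 1-D NumPy array. The 48,000-sample block size is a placeholder choice (roughly one second at 48 kHz), not AudioSR's documented requirement; the point is only to avoid very short or odd-sized inputs.

```python
import numpy as np

def pad_trailing_zeros(waveform: np.ndarray, block: int = 48000) -> np.ndarray:
    """Append zeros so len(waveform) is a multiple of `block` samples.

    `block` is a hypothetical value; AudioSR's real internal chunking
    may expect a different granularity.
    """
    remainder = len(waveform) % block
    if remainder == 0:
        return waveform
    return np.pad(waveform, (0, block - remainder))
```

For example, the 0.936 s clip from this issue (~44,928 samples at 48 kHz) would be padded up to exactly 48,000 samples before being handed to `super_resolution`.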
