
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 63 for tensor number 1 in the list. #37

Open
vb87 opened this issue Nov 20, 2023 · 3 comments

vb87 commented Nov 20, 2023

I'm using the latest master and running on CUDA.

Here's the mp3 file I'm using as input:
https://drive.google.com/file/d/1xR2mV-SctUknIvjKqlTYyFKHRl5annCX/view?usp=sharing

command line:
python -m audiosr -i 5.01_22303.037073170733_23517.438009756097.mp3 -s . -d cuda

getting this error:

Loading AudioSR: speech
Loading model on cuda
D:\Soft\Python\Python38\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3484.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
D:\Soft\Python\Python38\lib\site-packages\torchaudio\transforms\_transforms.py:611: UserWarning: Argument 'onesided' has been deprecated and has no influence on the behavior of this module.
  warnings.warn(
DiffusionWrapper has 258.20 M params.
Running DDIM Sampling with 50 timesteps
DDIM Sampler: 0%| | 0/50 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "D:\Soft\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Soft\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\__main__.py", line 115, in <module>
    waveform = super_resolution(
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\pipeline.py", line 168, in super_resolution
    waveform = latent_diffusion.generate_batch(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1525, in generate_batch
    samples, _ = self.sample_log(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1431, in sample_log
    samples, intermediates = ddim_sampler.sample(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddim.py", line 143, in sample
    samples, intermediates = self.ddim_sampling(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddim.py", line 237, in ddim_sampling
    outs = self.p_sample_ddim(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddim.py", line 293, in p_sample_ddim
    model_t = self.model.apply_model(x_in, t_in, c)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1030, in apply_model
    x_recon = self.model(x_noisy, t, cond_dict=cond)
  File "D:\Soft\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1686, in forward
    out = self.diffusion_model(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\modules\diffusionmodules\openaimodel.py", line 879, in forward
    h = th.cat([h, concate_tensor], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 63 for tensor number 1 in the list.

This could be related to the length of the audio, which is just 0.936 seconds. I used LosslessCut to append another mp3 file with the exact same configuration, without re-encoding, and exported it with all the same settings (same sample rate, bitrate, etc.). Running audiosr on that longer file produced no error.
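The off-by-one behind the `th.cat` failure can be illustrated in miniature. This is a hypothetical sketch (not AudioSR's actual code) of how an odd-length latent breaks a U-Net skip connection: the encoder halves the time axis with stride 2, the decoder doubles it back, and for an odd length the round trip no longer matches the saved skip tensor, so the channel-wise concatenation fails.

```python
import numpy as np

# Hypothetical U-Net round trip on an odd time length (127 here).
skip = np.zeros((1, 64, 127))    # encoder activation, odd time length
down = skip[:, :, ::2]           # stride-2 downsample -> length 64
up = np.repeat(down, 2, axis=2)  # 2x upsample -> length 128, not 127

# Channel-wise concat, analogous to th.cat([h, concate_tensor], dim=1):
# every dimension except the concat axis must match, but 127 != 128.
try:
    np.concatenate([skip, up], axis=1)
    ok = True
except ValueError:  # NumPy's analogue of the RuntimeError above
    ok = False
```

A sufficiently long (or suitably padded) input keeps every downsampled length even, which is consistent with the appended-file workaround making the error disappear.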

@yuzuda283

Same issue here.

@Susukerow45

Try a 0.512 sec wav file.

DrBrule commented Jan 2, 2024

Ran into this as well. It seems to be related to the audio file being too short. If you pad the input audio array with some trailing zeros, it should work.
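A minimal sketch of that padding workaround, assuming the waveform is already loaded as a 1-D NumPy array. The 48,000-sample block size is a placeholder choice (roughly one second at 48 kHz), not AudioSR's documented requirement; the point is only to avoid very short or odd-sized inputs.

```python
import numpy as np

def pad_trailing_zeros(waveform: np.ndarray, block: int = 48000) -> np.ndarray:
    """Append zeros so len(waveform) is a multiple of `block` samples.

    `block` is a hypothetical value; AudioSR's real internal chunking
    may expect a different granularity.
    """
    remainder = len(waveform) % block
    if remainder == 0:
        return waveform
    return np.pad(waveform, (0, block - remainder))
```

For example, the 0.936 s clip from this issue (~44,928 samples at 48 kHz) would be padded up to exactly 48,000 samples before being handed to `super_resolution`.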
