Hi,
Thanks for open-sourcing your wonderful work.
Most visual-to-audio models generate and evaluate on 10-second samples, whereas the REWAS model produces 5-second audio. Could you explain why the duration was set to 5 seconds?
Additionally, I’m interested in extending the audio duration to 10 seconds. Would adjusting the shape of the initial noise latent be sufficient to achieve this, or would I need to generate two 5-second segments and combine them?
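To make the second option concrete, here is a minimal sketch of joining two independently generated clips with a short linear crossfade. It assumes mono NumPy waveforms at the same sample rate. The helper name `crossfade_concat`, the 16 kHz rate, and the 0.1 s overlap are my own illustrative assumptions, not anything taken from the REWAS code.

```python
import numpy as np


def crossfade_concat(a, b, sample_rate=16000, overlap_s=0.1):
    """Join two mono waveforms, blending the seam with a linear crossfade.

    Hypothetical helper for illustration only; sample_rate and overlap_s
    are assumed values, not REWAS settings.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    n = int(round(sample_rate * overlap_s))  # overlap length in samples
    if n <= 0:
        # No overlap requested: plain concatenation.
        return np.concatenate([a, b])
    if n > min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the segments")

    fade_out = np.linspace(1.0, 0.0, n)  # weight applied to the tail of `a`
    fade_in = 1.0 - fade_out             # weight applied to the head of `b`
    blended = a[-n:] * fade_out + b[:n] * fade_in

    # The result is shorter than len(a) + len(b) by the n overlapped samples.
    return np.concatenate([a[:-n], blended, b[n:]])


if __name__ == "__main__":
    sr = 16000
    first = np.random.randn(5 * sr)   # stand-in for the first 5-second clip
    second = np.random.randn(5 * sr)  # stand-in for the second 5-second clip
    joined = crossfade_concat(first, second, sample_rate=sr)
    print(len(joined) / sr, "seconds")
```

Because the overlap is consumed by the blend, the joined clip comes out slightly under 10 seconds, and a crossfade only smooths the seam; it would not make the two segments consistent with each other in content.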
Thanks