Stretched videos not at the correct length #12

Open
LongMingWei opened this issue Jul 20, 2024 · 0 comments
LongMingWei commented Jul 20, 2024

I am trying to sync translated audio segments with a video, using the timestamps that a speech-to-text package returns alongside each audio segment. Even with the stretch ratio calculated correctly, some stretched segments come out too long, mainly because of an unexpected long pause at the end of the segment. The attached zip folder contains an example: the original audio and the stretched one. Based on the timestamps, the stretch ratio is around 1.1 and the resulting duration should be about 5-6 seconds. However, after passing the file through the stretch audio function, the result is 8 seconds long, with a 3-second pause. It would be great to know what is causing the problem, or whether there is something I am unaware of. The relevant code and audio files are below. Thank you!

```python
import os

from pydub import AudioSegment

# `model`, `speed`, `output_dir` and `stretch_audio` are defined elsewhere in the script.


def generate_segment_audio(segment, speaker_id):
    # Unpack the start/end timestamps and the translated text of the segment
    start, end, translated_text = segment
    segment_path = os.path.join(output_dir, f'segment_{start}_{end}.wav')
    stretched_path = os.path.join(output_dir, f'segment_{start}_{end}_stretched.wav')
    duration = end - start  # Target duration taken from the timestamps

    # Generate the audio file with the TTS model
    model.tts_to_file(translated_text, speaker_id, segment_path, speed=speed)

    # Adjust the audio speed to match the target duration
    segment_audio = AudioSegment.from_file(segment_path)
    current_duration = len(segment_audio) / 1000  # pydub lengths are in ms; convert to seconds
    stretch_ratio = duration / current_duration
    print(f'{stretch_ratio} = {duration} / {current_duration}')
    stretch_audio(segment_path, stretched_path, ratio=stretch_ratio)
    return segment_path
```
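To narrow down where the pause comes from, it may help to measure the total duration and the trailing silence of both files, before and after stretching. The following is a minimal diagnostic sketch using only the Python standard library. The helper name `trailing_silence_seconds` and the amplitude `threshold` are assumptions made for illustration, not part of the code above or of any library, and the sketch only handles 16-bit PCM WAV files.

```python
import array
import sys
import wave


def trailing_silence_seconds(path, threshold=200):
    """Return (total_seconds, trailing_silent_seconds) for a 16-bit PCM WAV file.

    `threshold` is a hypothetical amplitude cutoff: samples whose absolute
    value is at or below it are treated as silence.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))

    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    n_frames = len(samples) // channels

    # Walk backwards to the last frame in which any channel exceeds the threshold.
    last_loud = -1
    for frame in range(n_frames - 1, -1, -1):
        chunk = samples[frame * channels:(frame + 1) * channels]
        if any(abs(s) > threshold for s in chunk):
            last_loud = frame
            break

    total = n_frames / rate
    trailing = (n_frames - 1 - last_loud) / rate
    return total, trailing
```

Calling it on `segment_path` and on `stretched_path` shows whether the silence already exists in the TTS output or only appears after stretching. It also shows whether the stretched duration equals the original duration multiplied by the printed `stretch_ratio`.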

audiofiles.zip
