
Compute Requirements and Execution Time #61

Open
Sedherthe opened this issue Jun 29, 2024 · 7 comments

Comments

@Sedherthe

Sedherthe commented Jun 29, 2024

I failed utterly to run it on 16 GB of RAM. Can anyone add the compute requirements and the time it takes to execute AudioSR? I'm looking for a benchmark across GPU vs. CPU vs. audio duration.
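In the meantime, a rough harness along these lines could collect those numbers (assumptions: a local test.wav, GNU time installed, and that audiosr's `-d` flag accepts `cpu` as well as the `cuda` value used later in this thread):

```bash
#!/bin/bash
set -e

# Rough benchmark sketch: run audiosr on the same material at several
# durations, on GPU and on CPU, recording wall-clock time and peak RAM.
for dur in 5 10 30; do
    # Cut a test clip of the given duration (test.wav is a placeholder).
    ffmpeg -y -i test.wav -t "$dur" -c copy "clip_${dur}s.wav"
    for dev in cuda cpu; do
        # NOTE: "-d cpu" is an assumption; this thread only shows "-d cuda".
        echo "=== ${dur}s clip on ${dev} ==="
        /usr/bin/time -v audiosr -i "clip_${dur}s.wav" -d "$dev" 2>&1 |
            grep -E "Elapsed \(wall clock\)|Maximum resident set size"
    done
done
```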

@JuvenileLocksmith

Your file is likely too big; on 24 GB of VRAM the best I could do was 30-second chunks. Try this script as your orchestrator and start with 10-second chunks:

```bash
#!/bin/bash

set -e  # Exit immediately if a command exits with a non-zero status.

# Input file and output directories
input_file="/workspace/2band.wav"
output_dir="/workspace/output_chunks"
final_output_dir="/workspace/final_output"

# Create output directories
mkdir -p "$output_dir" "$final_output_dir"

# Chunk size in seconds (30 fit in 24 GB of VRAM; try 10 if you run out of memory)
chunk_size=30

echo "Processing audio in $chunk_size-second chunks"

# Get the total duration of the input file
duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$input_file")
num_chunks=$(echo "($duration + $chunk_size - 1) / $chunk_size" | bc)

# Process audio in chunks
for i in $(seq 0 $((num_chunks - 1))); do
    start_time=$(echo "$i * $chunk_size" | bc)

    echo "Processing chunk $i starting at $start_time seconds"

    # Extract a chunk of audio (-y overwrites leftovers from earlier runs)
    ffmpeg -y -i "$input_file" -ss "$start_time" -t "$chunk_size" -c copy "$output_dir/chunk_$i.wav"

    # Process the chunk
    audiosr -i "$output_dir/chunk_$i.wav" --ddim_steps 150 -gs 7.5 --model_name basic -d cuda --seed 42

    # Find the most recent output file
    output_file=$(ls -t ./output/*/chunk_${i}_AudioSR_Processed_48K.wav | head -n 1)

    # Check that the output file was created
    if [ ! -f "$output_file" ]; then
        echo "Error: Output file for chunk $i was not created."
        exit 1
    fi

    # Move the processed file to the final output directory
    mv "$output_file" "$final_output_dir/chunk_${i}_processed.wav"

    # Remove the original chunk file to save space
    rm "$output_dir/chunk_$i.wav"

    # Optionally, reset the GPU (usually needs root and an idle GPU;
    # the || true ignores failures — remove if causing issues)
    nvidia-smi --gpu-reset || true

    echo "Chunk $i processed and cleaned up."
done

echo "All chunks processed. Concatenating final output..."

# Prepare a list of processed files
processed_files=$(ls "$final_output_dir"/chunk_*_processed.wav | sort -V | sed 's/^/file /')
echo "$processed_files" > "$final_output_dir/file_list.txt"

# Concatenate processed chunks
ffmpeg -f concat -safe 0 -i "$final_output_dir/file_list.txt" -c copy "$final_output_dir/output_upscaled.wav"

# Check that the final output file was created
if [ ! -f "$final_output_dir/output_upscaled.wav" ]; then
    echo "Error: Final output file was not created."
    exit 1
fi

# Clean up
rm "$final_output_dir"/chunk_*_processed.wav "$final_output_dir/file_list.txt"

echo "Processing complete. Final output file: $final_output_dir/output_upscaled.wav"
```

@wen0320

wen0320 commented Aug 17, 2024

If the audio is segmented and then re-synthesized, will it affect the quality?

@JuvenileLocksmith

I have no idea, but intuitively I don't think so. You can experiment and find out.

@Sedherthe
Author

I haven't had a chance to try this yet, but based on my experiments with other such models, it does affect the overall tone, and sometimes even the quality.

@wen0320

wen0320 commented Aug 20, 2024

Yes, I tested it, and the result is that it changes the timbre of the voice, making the sound deeper.

@JuvenileLocksmith

An intriguing observation: At what length does this phenomenon begin to manifest? Does it occur in clips under 15 seconds, or perhaps around 30 seconds? If this issue is widely experienced, there must be a specific threshold where it becomes apparent, suggesting that tools might be ineffective for clips below that duration.

Applio, for instance, seems to utilise a similar tool with extensive chunking. If memory serves, the tool on Replicate also employs chunking. Most tools performing tasks like audio super-resolution, particularly those grounded in deep learning, rely on chunking to process audio in segments. This approach is necessary because processing entire long audio clips in one go can be computationally prohibitive or may result in degradation due to the challenges of maintaining coherence over extended durations.
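If the seams between chunks contribute to the artifacts, one untested idea is to merge the processed chunks with a short crossfade instead of the hard concat above; a minimal sketch using ffmpeg's acrossfade filter, assuming the chunk naming from the script earlier in this thread:

```bash
#!/bin/bash
set -e

final_output_dir="/workspace/final_output"
fade=0.1  # crossfade duration in seconds

# Merge chunks pairwise, overlapping each boundary by $fade seconds.
# For a truly sample-preserving merge, the chunks would need to be
# extracted with a matching overlap in the first place.
files=($(ls "$final_output_dir"/chunk_*_processed.wav | sort -V))
cp "${files[0]}" merged.wav
for f in "${files[@]:1}"; do
    ffmpeg -y -i merged.wav -i "$f" -filter_complex "acrossfade=d=$fade" merged_next.wav
    mv merged_next.wav merged.wav
done
echo "Crossfaded output: merged.wav"
```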

@wen0320

wen0320 commented Aug 20, 2024

My conclusion is that no matter the length of the audio, this issue will occur.
