
Compute Requirements and Execution Time #61

Open
Sedherthe opened this issue Jun 29, 2024 · 7 comments

Comments

@Sedherthe

Sedherthe commented Jun 29, 2024

I failed utterly to run it on 16 GB of RAM. Can anyone add the compute requirements and the time it takes to execute AudioSR? I'm looking for a benchmark across GPU vs. CPU vs. audio duration.
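In the meantime, a rough harness along these lines could collect those numbers (assumptions: a local test.wav, GNU time installed, and that audiosr's `-d` flag accepts `cpu` as well as the `cuda` value used later in this thread):

```bash
#!/bin/bash
set -e

# Rough benchmark sketch: run audiosr on the same material at several
# durations, on GPU and on CPU, recording wall-clock time and peak RAM.
for dur in 5 10 30; do
    # Cut a test clip of the given duration (test.wav is a placeholder).
    ffmpeg -y -i test.wav -t "$dur" -c copy "clip_${dur}s.wav"
    for dev in cuda cpu; do
        # NOTE: "-d cpu" is an assumption; this thread only shows "-d cuda".
        echo "=== ${dur}s clip on ${dev} ==="
        /usr/bin/time -v audiosr -i "clip_${dur}s.wav" -d "$dev" 2>&1 |
            grep -E "Elapsed \(wall clock\)|Maximum resident set size"
    done
done
```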

@JuvenileLocksmith

Your file is likely too big; on 24 GB of VRAM the best I could do was 30-second chunks. Try this script as your orchestrator and start with 10-second chunks:

```bash
#!/bin/bash

set -e  # Exit immediately if a command exits with a non-zero status.

# Input file and output directories
input_file="/workspace/2band.wav"
output_dir="/workspace/output_chunks"
final_output_dir="/workspace/final_output"

# Create output directories
mkdir -p "$output_dir" "$final_output_dir"

# Chunk size in seconds (30 fit in 24 GB of VRAM; try 10 if you run out of memory)
chunk_size=30

echo "Processing audio in $chunk_size-second chunks"

# Get the total duration of the input file
duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$input_file")
num_chunks=$(echo "($duration + $chunk_size - 1) / $chunk_size" | bc)

# Process audio in chunks
for i in $(seq 0 $((num_chunks - 1))); do
    start_time=$(echo "$i * $chunk_size" | bc)

    echo "Processing chunk $i starting at $start_time seconds"

    # Extract a chunk of audio (-y overwrites leftovers from earlier runs)
    ffmpeg -y -i "$input_file" -ss "$start_time" -t "$chunk_size" -c copy "$output_dir/chunk_$i.wav"

    # Process the chunk
    audiosr -i "$output_dir/chunk_$i.wav" --ddim_steps 150 -gs 7.5 --model_name basic -d cuda --seed 42

    # Find the most recent output file
    output_file=$(ls -t ./output/*/chunk_${i}_AudioSR_Processed_48K.wav | head -n 1)

    # Check that the output file was created
    if [ ! -f "$output_file" ]; then
        echo "Error: Output file for chunk $i was not created."
        exit 1
    fi

    # Move the processed file to the final output directory
    mv "$output_file" "$final_output_dir/chunk_${i}_processed.wav"

    # Remove the original chunk file to save space
    rm "$output_dir/chunk_$i.wav"

    # Optionally, reset the GPU (usually needs root and an idle GPU;
    # the || true ignores failures — remove if causing issues)
    nvidia-smi --gpu-reset || true

    echo "Chunk $i processed and cleaned up."
done

echo "All chunks processed. Concatenating final output..."

# Prepare a list of processed files
processed_files=$(ls "$final_output_dir"/chunk_*_processed.wav | sort -V | sed 's/^/file /')
echo "$processed_files" > "$final_output_dir/file_list.txt"

# Concatenate processed chunks
ffmpeg -f concat -safe 0 -i "$final_output_dir/file_list.txt" -c copy "$final_output_dir/output_upscaled.wav"

# Check that the final output file was created
if [ ! -f "$final_output_dir/output_upscaled.wav" ]; then
    echo "Error: Final output file was not created."
    exit 1
fi

# Clean up
rm "$final_output_dir"/chunk_*_processed.wav "$final_output_dir/file_list.txt"

echo "Processing complete. Final output file: $final_output_dir/output_upscaled.wav"
```

@wen0320

wen0320 commented Aug 17, 2024

If the audio is segmented and then re-synthesized, will it affect the quality?

@JuvenileLocksmith

I have no idea, but intuitively I don't think so. You can experiment and find out.

@Sedherthe
Author

I haven't had a chance to try this yet, but based on my experiments with other such models, it does affect the overall tone, and sometimes even the quality.

@wen0320

wen0320 commented Aug 20, 2024

Yes, I tested it, and the result is that it changes the timbre of the voice, making the sound deeper.

@JuvenileLocksmith

An intriguing observation: At what length does this phenomenon begin to manifest? Does it occur in clips under 15 seconds, or perhaps around 30 seconds? If this issue is widely experienced, there must be a specific threshold where it becomes apparent, suggesting that tools might be ineffective for clips below that duration.

Applio, for instance, seems to utilise a similar tool with extensive chunking. If memory serves, the tool on Replicate also employs chunking. Most tools performing tasks like audio super-resolution, particularly those grounded in deep learning, rely on chunking to process audio in segments. This approach is necessary because processing entire long audio clips in one go can be computationally prohibitive or may result in degradation due to the challenges of maintaining coherence over extended durations.
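If the seams between chunks contribute to the artifacts, one untested idea is to merge the processed chunks with a short crossfade instead of the hard concat above; a minimal sketch using ffmpeg's acrossfade filter, assuming the chunk naming from the script earlier in this thread:

```bash
#!/bin/bash
set -e

final_output_dir="/workspace/final_output"
fade=0.1  # crossfade duration in seconds

# Merge chunks pairwise, overlapping each boundary by $fade seconds.
# For a truly sample-preserving merge, the chunks would need to be
# extracted with a matching overlap in the first place.
files=($(ls "$final_output_dir"/chunk_*_processed.wav | sort -V))
cp "${files[0]}" merged.wav
for f in "${files[@]:1}"; do
    ffmpeg -y -i merged.wav -i "$f" -filter_complex "acrossfade=d=$fade" merged_next.wav
    mv merged_next.wav merged.wav
done
echo "Crossfaded output: merged.wav"
```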

@wen0320

wen0320 commented Aug 20, 2024

My conclusion is that no matter the length of the audio, this issue will occur.
