Senzt/CNNBSL

Binaural Sound Localization with CNN - TensorFlow 2

This Convolutional Neural Network (CNN) model is based on Pang et al. (2019), with some adjustments (see below), and is implemented in TensorFlow 2. I created it to help people who are interested in learning how to build deep learning models for binaural localization tasks.

  • C. Pang, H. Liu and X. Li, "Multitask Learning of Time-Frequency CNN for Sound Source Localization," in IEEE Access, vol. 7, pp. 40725-40737, 2019, doi: 10.1109/ACCESS.2019.2905617.

This model predicts only the azimuth (not the elevation), in the range from -90 to 90 degrees. The features used are the Interaural Phase Difference (IPD) and the Interaural Level Difference (ILD), both extracted in the time–frequency domain.
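As a rough illustration of these two features, here is a minimal NumPy sketch. It is not the code used in this repository: the function names `stft` and `ipd_ild`, the FFT size, the hop length, and the Hann window are all assumptions chosen for the example.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns an array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T

def ipd_ild(left, right, n_fft=512, hop=256, eps=1e-8):
    """Return IPD (radians) and ILD (dB) stacked as (freq, time, 2)."""
    L = stft(left, n_fft, hop)
    R = stft(right, n_fft, hop)
    # Phase of the cross-spectrum: left phase minus right phase, wrapped.
    ipd = np.angle(L * np.conj(R))
    # Magnitude ratio in decibels; eps avoids division by zero and log(0).
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    # Assumed layout: channel 0 is IPD, channel 1 is ILD.
    return np.stack([ipd, ild], axis=-1)
```

Stacking along the last axis gives one two-channel "image" per signal, which is the kind of input a 2D convolutional layer expects.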

The data fed into this code are monaural sounds spatialized by convolving them with head-related impulse responses (HRIRs). The HRIRs used in this model come from the MIT Media Lab. You can adjust the code to learn from actual binaural recordings instead.
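The spatialization step can be sketched as one convolution per ear. The snippet below is only an illustration: `spatialise` is a hypothetical helper, and the two impulse responses are toy placeholders (a pure delay and gain), not the MIT Media Lab HRIRs.

```python
import numpy as np

def spatialise(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair.

    Returns an array of shape (2, len(mono) + len(hrir) - 1),
    with row 0 the left ear and row 1 the right ear.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Toy HRIRs (assumption): the right ear hears the sound
# 3 samples later and at half the amplitude.
hrir_l = np.zeros(8)
hrir_l[0] = 1.0
hrir_r = np.zeros(8)
hrir_r[3] = 0.5

mono = np.array([1.0, 2.0, 3.0])
binaural = spatialise(mono, hrir_l, hrir_r)
```

With measured HRIRs, the pair is selected according to the target azimuth, and that azimuth becomes the training label for the sample.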

Here are some differences between this CNN model and the model proposed in the paper:

  1. Instead of concatenating IPD and ILD into the same matrix, the CNN model stores them separately in two channels. The features are also longer.
  2. The CNN model performs regression instead of classification, because the azimuth from -90 to 90 is a continuous value rather than one of 37 classes. However, the evaluation of this model counts a prediction as correct when it is within 5 degrees of the true azimuth.
  3. There are some minor differences in settings, such as batch size, optimizer, and regularization.
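The tolerance-based evaluation of the regression output can be sketched as follows. This is not taken from Testing.py: `tolerance_accuracy` is a hypothetical helper, and treating a difference of exactly 5 degrees as correct is an assumption.

```python
def tolerance_accuracy(y_true, y_pred, tol_deg=5.0):
    """Fraction of predictions within tol_deg degrees of the true azimuth."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    # Assumption: the boundary is inclusive (|error| == tol_deg is a hit).
    hits = sum(abs(t - p) <= tol_deg for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

true_az = [-90.0, 0.0, 45.0, 90.0]
pred_az = [-86.0, 7.0, 45.0, 80.0]
accuracy = tolerance_accuracy(true_az, pred_az)
```

In this example the errors are 4, 7, 0, and 10 degrees, so two of the four predictions count as correct.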

Directions for use:

  1. Put the .wav files (at least one) into the four folders named in steps 2 and 3.
  2. The training-set directories for speech/sound and noise are SpeechTRAIN and NoiseTRAIN, respectively.
  3. The testing-set directories for speech/sound and noise are SpeechTEST and NoiseTEST, respectively.
  4. Run SpatialiseTRAIN.py and SpatialiseTEST.py to generate binaural samples and labels for training and testing, respectively.
  5. Fit the model with TrainingCNN.py.
  6. Test the accuracy by running Testing.py.
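Before running the scripts, it can help to check that the input folders are populated. The sketch below is a hypothetical pre-flight check, not part of the repository; it assumes the four folders sit directly under one root directory and that every folder needs at least one .wav file.

```python
from pathlib import Path

# Folder names as given in the directions above.
REQUIRED_DIRS = ("SpeechTRAIN", "NoiseTRAIN", "SpeechTEST", "NoiseTEST")

def count_wavs(root="."):
    """Map each required folder to the number of .wav files it contains."""
    counts = {}
    for name in REQUIRED_DIRS:
        folder = Path(root) / name
        # A missing folder is reported as holding zero files.
        counts[name] = len(list(folder.glob("*.wav"))) if folder.is_dir() else 0
    return counts

def missing_inputs(root="."):
    """Return the folders that are missing or contain no .wav files."""
    return [name for name, n in count_wavs(root).items() if n == 0]
```

Calling `missing_inputs()` from the repository root returns an empty list when all four folders are ready. Note that the `*.wav` pattern is case-sensitive on most Linux file systems, so files ending in `.WAV` would not be counted.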

Software requirements:

  1. Librosa, https://librosa.org/doc/latest/index.html
  2. Python 3.6, https://www.python.org/
  3. TensorFlow 2, https://www.tensorflow.org/
