Binaural Sound Localization with CNN - TensorFlow 2

This Convolutional Neural Network (CNN) model is based on Pang et al. (2019), with some adjustments (see below), and is implemented in TensorFlow 2. I created it to help people who want to learn how to build deep learning models for binaural localization tasks.

C. Pang, H. Liu and X. Li, "Multitask Learning of Time-Frequency CNN for Sound Source Localization," in IEEE Access, vol. 7, pp. 40725-40737, 2019, doi: 10.1109/ACCESS.2019.2905617.

This model predicts only the azimuth (not the elevation), in the range -90 to 90 degrees. The input features are the Interaural Phase Difference (IPD) and the Interaural Level Difference (ILD), both extracted in the time–frequency domain.
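
As an illustration of these two features, here is a minimal sketch that computes IPD and ILD from a pair of ear signals using NumPy only. It is not the code from this repository: the helper names (stft, ipd_ild), the frame size, the hop size, and the exact IPD/ILD formulas are assumptions chosen for the example.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Short-time Fourier transform with a Hann window; returns (freq, time)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def ipd_ild(left, right, eps=1e-8):
    """Return IPD (radians) and ILD (dB) for every time-frequency bin."""
    L, R = stft(left), stft(right)
    # Phase of the cross-spectrum, wrapped to (-pi, pi].
    ipd = np.angle(L * np.conj(R))
    # Level ratio in dB; eps avoids log of zero in silent bins.
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    return ipd, ild

# Toy input: the right ear receives a delayed, attenuated copy of the left.
rng = np.random.default_rng(0)
left = rng.standard_normal(16000)
right = 0.5 * np.roll(left, 8)

ipd, ild = ipd_ild(left, right)
print(ipd.shape, ild.shape)
```

Both outputs have one row per frequency bin and one column per frame, so they can be fed to a CNN as image-like maps.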

The data fed into this code are monaural sounds spatialized by convolving them with head-related impulse responses (HRIRs). The HRIRs used in this model come from the MIT Media Lab. You can adapt the code to learn from actual binaural recordings instead.

https://sound.media.mit.edu/resources/KEMAR.html
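
The spatialization step described above can be sketched as follows. This is a hedged example, not the content of SpatialiseTRAIN.py: the function name spatialise is hypothetical, and the two HRIRs are synthetic stand-ins (a pure delay and gain) rather than measured responses.

```python
import numpy as np

def spatialise(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair; returns (2, n)."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Hypothetical HRIRs standing in for a measured pair:
# the left ear hears the source directly, the right ear
# hears it 10 samples later and attenuated.
hrir_left = np.zeros(128)
hrir_left[0] = 1.0
hrir_right = np.zeros(128)
hrir_right[10] = 0.6

mono = np.random.default_rng(0).standard_normal(1000)
binaural = spatialise(mono, hrir_left, hrir_right)
print(binaural.shape)
```

With measured HRIRs, the pair to use would be selected according to the azimuth that serves as the training label.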

The CNN model differs from the one proposed in the paper in the following ways:

Instead of concatenating IPD and ILD into a single matrix, the CNN model stores them separately in two channels. The features are also longer.
The CNN model performs regression instead of classification, because azimuth in the range -90 to 90 degrees is a continuous quantity rather than one of 37 classes. For evaluation, however, a prediction is counted as correct when it is within five degrees of the true azimuth.
There are some minor differences in settings, such as batch size, optimizer, and regularization.
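
To make the first two differences concrete, here is a small sketch of stacking IPD and ILD into two channels and of scoring regression output with a five-degree tolerance. The function names, the array shapes, and the choice to treat an error of exactly five degrees as correct are assumptions for illustration; Testing.py may differ.

```python
import numpy as np

def stack_features(ipd, ild):
    """Stack IPD and ILD maps of shape (freq, time) into (freq, time, 2)."""
    return np.stack([ipd, ild], axis=-1)

def tolerance_accuracy(y_true, y_pred, tol_deg=5.0):
    """Fraction of predictions within tol_deg degrees of the true azimuth."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) <= tol_deg))

# Placeholder feature maps with an assumed (freq, time) shape.
features = stack_features(np.zeros((257, 61)), np.ones((257, 61)))
print(features.shape)

# Absolute errors here are 4, 7, 3.5, 0 and 10 degrees.
y_true = [-90, -45, 0, 30, 90]
y_pred = [-86.0, -52.0, 3.5, 30.0, 80.0]
print(tolerance_accuracy(y_true, y_pred))
```

Three of the five example predictions fall within the tolerance, so the reported accuracy is 0.6.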

Directions for use:

Put at least one .wav file into each of the four folders.
The training folders for speech/sound and noise are SpeechTRAIN and NoiseTRAIN, respectively.
The testing folders for speech/sound and noise are SpeechTEST and NoiseTEST, respectively.
Run SpatialiseTRAIN.py and SpatialiseTEST.py to generate binaural samples and labels for training and testing, respectively.
Fit the model with TrainingCNN.py.
Test the accuracy by running Testing.py.
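
Before running the scripts, it can help to confirm that the input folders are populated. The check below is an optional sketch and not part of the repository; the helper name missing_inputs is hypothetical, and it only matches the lowercase .wav extension.

```python
from pathlib import Path

# Folder names as given in the directions above.
REQUIRED_DIRS = ["SpeechTRAIN", "NoiseTRAIN", "SpeechTEST", "NoiseTEST"]

def missing_inputs(root="."):
    """Return the required folders that are absent or hold no .wav file."""
    root = Path(root)
    missing = []
    for name in REQUIRED_DIRS:
        folder = root / name
        if not folder.is_dir() or not any(folder.glob("*.wav")):
            missing.append(name)
    return missing

missing = missing_inputs()
if missing:
    print("Add at least one .wav file to:", ", ".join(missing))
else:
    print("All four input folders contain .wav files.")
```

Run it from the repository root so that the relative folder names resolve correctly.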

Software requirements:

Librosa, https://librosa.org/doc/latest/index.html
Python 3.6, https://www.python.org/
TensorFlow 2, https://www.tensorflow.org/

Senzt/CNNBSL

Folders and files

Latest commit

History

Repository files navigation

Binaural Sound Localization with CNN - TensorFlow 2

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages