Generating human faces with a conditional GAN conditioned on emotions identified from human speech using Speech Emotion Recognition (SER)
An image showing the overall pipeline
Below is a short demo of the web app, showing human faces generated from the emotion identified in human speech.
pandas==1.0.4
Keras==2.3.1
librosa==0.7.2
streamlit==0.61.0
tensorflow==2.0.0
numpy==1.18.1
tqdm==4.42.0
scipy==1.4.1
tensorflow_hub==0.8.0
matplotlib==3.1.3
Flask==1.1.2
ipython==7.17.0
Pillow==7.2.0
pyaudio==0.2.11
scikit_learn==0.23.2
Project
├── speech_emotion_recognition
│ ├── code
│ │ ├── ser_training.ipynb
│ │ ├── ser_prediction.ipynb
│ ├── data
│ │ ├── Audio_Speech_Actors_01-24
│ │ │ ├── Actor_01
│ │ │ │ ├── 03-01-01-01-01-01-01.wav
│ │ │ │ ├── 03-01-01-01-01-02-01.wav
│ │ │ │ ...
│ │ │ ├── Actor_02
│ │ │ ...
│ │ │ ├── Actor_24
│ ├── weights
├── conditional_gan
│ ├── code
│ │ ├── cgan_training.ipynb
│ │ ├── cgan_prediction.ipynb
│ ├── data
│ │ ├── fer2013.csv
│ ├── weights
├── streamlit_webapp
The dataset can be downloaded at:
https://www.kaggle.com/uwrfkaggler/ravdess-emotional-speech-audio
and should be placed in the location
./speech_emotion_recognition/data/
It consists of speech audio recordings in the voices of 24 actors. Five sample audio files from the first actor are included at the above location as an example.
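RAVDESS encodes the labels in the filename itself: each name has seven dash-separated two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor), and the third field is the emotion code. A minimal helper for recovering the emotion when building training labels, following the published RAVDESS naming convention:

```python
# Decode a RAVDESS filename such as 03-01-01-01-01-01-01.wav
# into its emotion label. The third dash-separated field holds
# the emotion code (per the RAVDESS naming convention).
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(name: str) -> str:
    """Return the emotion encoded in a RAVDESS .wav filename."""
    parts = name.replace(".wav", "").split("-")
    if len(parts) != 7:
        raise ValueError(f"unexpected RAVDESS filename: {name}")
    return EMOTIONS[parts[2]]

print(emotion_from_filename("03-01-01-01-01-01-01.wav"))  # neutral
```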
The dataset can be downloaded at:
https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data
and should be placed in the location
./conditional_gan/data/
We are interested in the "fer2013.csv" file from the data bundle. A sample file containing data for only 5 faces is included as an example.
[Note 1: Please host and run these notebooks on Google Colab]
[Note 2: Please mount the drive where the data files are present (follow the directory structure)]
For each of SER and cGAN, there are two separate Jupyter Notebook files, one for training and one for prediction.
./speech_emotion_recognition/code/ser_training.ipynb
The weights obtained are stored in ./speech_emotion_recognition/weights
The pretrained weights corresponding to the best model are already put at this location.
./speech_emotion_recognition/code/ser_prediction.ipynb
./conditional_gan/code/cgan_training.ipynb
./conditional_gan/code/cgan_prediction.ipynb
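At prediction time, a conditional generator receives a random noise vector together with the target emotion label (typically one-hot encoded). A minimal numpy sketch of how those inputs are assembled; LATENT_DIM and NUM_CLASSES here are illustrative assumptions, not values taken from the notebooks:

```python
import numpy as np

NUM_CLASSES = 7   # assumption: one class per FER2013 emotion
LATENT_DIM = 100  # assumption: size of the generator's noise vector

def generator_inputs(emotion_id, batch_size=1, rng=None):
    """Build the (noise, one-hot label) pair fed to a conditional generator."""
    rng = rng if rng is not None else np.random.default_rng()
    noise = rng.normal(size=(batch_size, LATENT_DIM)).astype(np.float32)
    labels = np.zeros((batch_size, NUM_CLASSES), dtype=np.float32)
    labels[:, emotion_id] = 1.0  # condition every sample on the same emotion
    return noise, labels

noise, labels = generator_inputs(emotion_id=3, batch_size=4)
print(noise.shape, labels.shape)  # (4, 100) (4, 7)
```

The same pair would then be passed to the trained generator (e.g. `generator.predict([noise, labels])` in Keras) to produce faces for that emotion.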
- Mirza, M. and Osindero, S., 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.
- Livingstone, S.R. and Russo, F.A., 2018. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PloS one, 13(5), p.e0196391.
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y., 2014. Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).
- Francois Chollet. 2017. Deep Learning with Python (1st. ed.). Manning Publications Co., USA.
- https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge
- https://medium.com/@ma.bagheri/a-tutorial-on-conditional-generative-adversarial-nets-keras-implementation-694dcafa6282
- https://machinelearningmastery.com/how-to-develop-a-conditional-generative-adversarial-network-from-scratch/