Citation:

```bibtex
@inproceedings{chen2023pre,
  title={Pre-Finetuning for Few-Shot Emotional Speech Recognition},
  author={Chen, Maximillian and Yu, Zhou},
  booktitle={INTERSPEECH 2023},
  year={2023}
}
```
Paper Link: https://arxiv.org/abs/2302.12921
Request access to the Wav2Vec2.0 Base checkpoint pre-finetuned on four corpora: https://drive.google.com/file/d/1N1JxqN8Ts2OWcoBTiHYt693DZF2sackV/view?usp=share_link
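Once access is granted and the checkpoint is downloaded, it should be loadable with the transformers library for downstream fine-tuning. A minimal sketch, assuming the archive is unpacked to a local directory; the directory name and `num_labels` below are placeholders, not values from this repository:

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

# Hypothetical local directory containing the unpacked pre-finetuned checkpoint.
CHECKPOINT_DIR = "./wav2vec2-base-prefinetuned"

# Feature extraction is the same as for the original Wav2Vec2.0 Base model.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")

# num_labels is a placeholder; set it to the number of emotion classes in your
# target corpus. The classification head is freshly initialized, so the model
# still needs to be fine-tuned on your few-shot data before use.
model = Wav2Vec2ForSequenceClassification.from_pretrained(CHECKPOINT_DIR, num_labels=4)

# Sanity check: a forward pass over one second of silent 16 kHz audio.
dummy_audio = np.zeros(16000, dtype=np.float32)
inputs = feature_extractor(dummy_audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 4])
```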
Repository under construction.
Please additionally cite the corresponding corpora if you use any of them for fine-tuning or pre-finetuning:
- Emotional Speech Dataset: https://github.com/HLTSingapore/Emotional-Speech-Data
- IEMOCAP: https://sail.usc.edu/iemocap/
- Mandarin Affective Speech: https://catalog.ldc.upenn.edu/LDC2007S09
- MSP-Podcast: https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html
- MSP-Improv: https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Improv.html
Dependencies: transformers 4.18.0 (install with `pip install transformers==4.18.0`).
Currently, the Trainer class for multitask learning supports only a single GPU.
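On a multi-GPU machine, one workaround is to pin the training process to a single device before any CUDA initialization. A minimal sketch, grounded only in standard CUDA/PyTorch behavior rather than this repository's code:

```python
import os

# Must be set before torch (or anything else) initializes CUDA, so that the
# multitask Trainer only ever sees one device.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # 1
```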