
Release Emilia and Emilia-Pipe #227

Merged
merged 55 commits into from
Jul 9, 2024


yuantuo666
Collaborator

@yuantuo666 commented Jul 1, 2024

✨ Description

We release Emilia, an extensive, multilingual, and diverse dataset, and Emilia-Pipe, the first open-source preprocessing pipeline designed to transform in-the-wild speech data into high-quality, annotated training data for speech generation.

Major contribution for this PR: @HarryHe11 @shangqwe123 @yuantuo666 @lixuyuan102

🚧 Related Issues

None

👨‍💻 Changes Proposed

  • Update the News section of README.md
  • Add a README.md in preprocessors/Emilia directory to introduce Emilia-Pipe
  • Integrate processing pipeline Emilia-Pipe under preprocessors/Emilia directory

🧑‍🤝‍🧑 Who Can Review?

@HarryHe11 @jiaqili3 @RMSnow @HeCheng0625

🛠 TODO

None

✅ Checklist

  • Code has been reviewed
  • Code complies with the project's code standards and best practices
  • Code has passed all tests
  • Code does not affect the normal use of existing features
  • Code has been commented properly
  • Documentation has been updated (if applicable)
  • Demo/checkpoint has been attached (if applicable)

@HarryHe11
Collaborator

Hi Chaoren @yuantuo666, thank you so much for raising this PR, and thanks to all of you for your efforts @yuantuo666 @shangqwe123 @lixuyuan102!

I noticed that we haven't provided the list of source audios in this PR yet. Maybe we could complete this part before merging!

Collaborator

@HarryHe11 left a comment


@yuantuo666 @lixuyuan102 @shangqwe123

Thanks so much for Chaoren's efforts in preparing this final version of Emilia-Pipe, @yuantuo666! I think most of the code is of very good quality, neat, and well-commented!

Before merging this PR, I think we should provide the lists of source audios and address my comments, as well as the important comments from Xueyao, Jiaqi, and Yuancheng. @RMSnow @jiaqili3 @HeCheng0625

  • preprocessors/Emilia/README.md (outdated, resolved)
  • preprocessors/Emilia/README.md (resolved)
  • preprocessors/Emilia/models/dnsmos.py (outdated, resolved)
  • preprocessors/Emilia/models/separate_fast.py (outdated, resolved)
  • preprocessors/Emilia/models/silero_vad.py (resolved)
Collaborator

@RMSnow left a comment


Great work! The most important remaining item is to add a high-level description of Emilia. This description should appear in both Emilia's README and Amphion's main README.

  • README.md (outdated, resolved)
  • README.md (outdated, resolved)
  • preprocessors/Emilia/README.md (outdated, resolved)
  • preprocessors/Emilia/config.json (resolved)
  • preprocessors/Emilia/env.sh (resolved)
@HarryHe11
Collaborator

@RMSnow Hi Xueyao, thank you so much for your helpful suggestions! I have addressed all of your comments accordingly!

@RMSnow self-requested a review on July 2, 2024 06:58
@HarryHe11
Collaborator

HarryHe11 commented Jul 6, 2024

@yuantuo666 @shangqwe123 @lixuyuan102 Chaoren, Zengqiang, Xuyuan,

I think we should merge this PR only after we have our arXiv link and have completed all of the following tasks:

  • Code is up to date (all tasks are completed).
  • Code is tested.
  • The demo page is up to date (the abstract is consistent with our arXiv paper).
  • Licenses and copyright declarations have been appropriately added on GitHub and Hugging Face.
  • The meta-information for the source audios is released on Hugging Face.
  • Amphion's main README announces Emilia's release and links to the sub-repo.
  • Emilia's README introduces the dataset adequately.
  • Emilia's README provides URLs to 1. the demo page, 2. the Hugging Face page, and 3. the arXiv paper.
  • Emilia's README, the demo page, and the Hugging Face page cite Emilia's arXiv paper and Amphion.

Collaborator

@RMSnow left a comment


Great work!

@RMSnow merged commit ba335d2 into open-mmlab:main on Jul 9, 2024
1 check passed

5 participants