Release Emilia and Emilia-Pipe #227

Merged
merged 55 commits into from
Jul 9, 2024
Changes from 23 commits
Commits
d32867c
init: Emilia Pipeline
yuantuo666 Jun 22, 2024
35aa804
Update README.md
HarryHe11 Jun 22, 2024
b37d86e
Update README.md
HarryHe11 Jun 22, 2024
a62a796
Update README.md
HarryHe11 Jun 22, 2024
87ccefb
Update README.md
HarryHe11 Jun 22, 2024
cc0470a
Update README.md
HarryHe11 Jun 22, 2024
b6508f1
Update README.md
lixuyuan102 Jul 1, 2024
e71b6d5
Update README.md
HarryHe11 Jul 1, 2024
50ddee9
Update README.md
lixuyuan102 Jul 1, 2024
2c49f03
Merge branch 'main' into main
yuantuo666 Jul 1, 2024
0275fcc
Update README.md
shangqwe123 Jul 1, 2024
a6cf792
Update README.md
shangqwe123 Jul 1, 2024
be09e86
Update README.md
lixuyuan102 Jul 1, 2024
bf68cf0
Update README.md
HarryHe11 Jul 1, 2024
79f4352
Update README.md
RMSnow Jul 1, 2024
43356f7
Update README.md
HarryHe11 Jul 2, 2024
d81ffd7
Update README.md
HarryHe11 Jul 2, 2024
cbab338
Update README.md
HarryHe11 Jul 2, 2024
3b1f1fd
Update env.sh
HarryHe11 Jul 2, 2024
d282fea
Update README.md
HarryHe11 Jul 2, 2024
2d4d87c
fix: LICENSE & TODO
yuantuo666 Jul 2, 2024
7c33a9c
fix: reformat
yuantuo666 Jul 2, 2024
f2366d2
Update README.md
lixuyuan102 Jul 3, 2024
a5eb5b7
Update README.md
lixuyuan102 Jul 3, 2024
fecdbfd
Update README.md
HarryHe11 Jul 3, 2024
beb4858
Update README.md
HarryHe11 Jul 3, 2024
55009e4
Update README.md
HarryHe11 Jul 3, 2024
7d1ed25
Update README.md
HarryHe11 Jul 3, 2024
315d3b7
Update README.md
HarryHe11 Jul 3, 2024
eedeed9
Update README.md
HarryHe11 Jul 4, 2024
970fa7b
Update README.md
HarryHe11 Jul 4, 2024
e446013
Update README.md
HarryHe11 Jul 4, 2024
96586eb
Update main.py
HarryHe11 Jul 5, 2024
96fa2a0
Update main.py
HarryHe11 Jul 5, 2024
080fdd8
Update main.py
HarryHe11 Jul 5, 2024
dbfe7a3
Update TODOs
HarryHe11 Jul 5, 2024
3927074
Update silero_vad.py
HarryHe11 Jul 5, 2024
5f3157d
Add comments on main.py
HarryHe11 Jul 5, 2024
cd550e5
update: todos
yuantuo666 Jul 6, 2024
56d8ab5
Update README.md
lixuyuan102 Jul 7, 2024
cba17f6
Update README.md
lixuyuan102 Jul 7, 2024
8f73a15
Update README.md
lixuyuan102 Jul 7, 2024
ff2fa00
update: test bug fix
yuantuo666 Jul 7, 2024
0a7b330
update: license
yuantuo666 Jul 7, 2024
3d78ce7
update: README
yuantuo666 Jul 7, 2024
7aa066b
Update README.md
HarryHe11 Jul 9, 2024
183466b
Update README.md
HarryHe11 Jul 9, 2024
d1b2946
Update README.md
HarryHe11 Jul 9, 2024
cf17295
Update README.md
yuantuo666 Jul 9, 2024
98e9991
Adding Demo Page link
yuantuo666 Jul 9, 2024
d7e229a
Update README.md
HarryHe11 Jul 9, 2024
c94523d
Align Reference Indent
yuantuo666 Jul 9, 2024
034f32c
Update README.md
RMSnow Jul 9, 2024
1110e63
Update README.md
HarryHe11 Jul 9, 2024
5fea925
Update README.md
RMSnow Jul 9, 2024
11 changes: 4 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
@@ -27,13 +27,9 @@

In addition to the specific generation tasks, Amphion includes several **vocoders** and **evaluation metrics**. A vocoder is an important module for producing high-quality audio signals, while evaluation metrics are critical for ensuring consistent metrics in generation tasks.

Here is the Amphion v0.1 demo, whose voice, audio effects, and singing voice are generated by our models. Just enjoy it!

[amphion-v0.1-en](https://github.com/open-mmlab/Amphion/assets/24860155/7fcdcea5-3d95-4b31-bd93-4b4da734ef9b
)

## 🚀 News
- **2024/6/17**: Amphion has a new release for its VALL-E models, it uses Llama as its underlying architecture and has better model performance, faster training speed, and more readable codes compared to our first version. [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](egs/tts/VALLE_V2/README.md)
- **2024/07/01**: Amphion now releases **Emilia**, the first open-source multilingual in-the-wild dataset for speech generation with over 101k hours of speech data, and the **Emilia-Pipe**, the first open-source preprocessing pipeline designed to transform in-the-wild speech data into high-quality training data with annotations for speech generation! [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](preprocessors/Emilia/README.md)
- **2024/06/17**: Amphion has a new release for its **VALL-E** model! It uses Llama as its underlying architecture and has better model performance, faster training speed, and more readable code compared to our first version. [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](egs/tts/VALLE_V2/README.md)
- **2024/03/12**: Amphion now supports **NaturalSpeech3 FACodec** and releases pretrained checkpoints. [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2403.03100) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-model-yellow)](https://huggingface.co/amphion/naturalspeech3_facodec) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-demo-pink)](https://huggingface.co/spaces/amphion/naturalspeech3_facodec) [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](models/codec/ns3_codec/README.md)
- **2024/02/22**: The first Amphion visualization tool, **SingVisio**, is released. [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2402.12660) [![openxlab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/Amphion/SingVisio) [![Video](https://img.shields.io/badge/Video-Demo-orange)](https://github.com/open-mmlab/Amphion/assets/33707885/0a6e39e8-d5f1-4288-b0f8-32da5a2d6e96) [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](egs/visualization/SingVisio/README.md)
- **2023/12/18**: Amphion v0.1 release. [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2312.09911) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Amphion-pink)](https://huggingface.co/amphion) [![youtube](https://img.shields.io/badge/YouTube-Demo-red)](https://www.youtube.com/watch?v=1aw0HhcggvQ) [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](https://github.com/open-mmlab/Amphion/pull/39)
@@ -79,7 +75,8 @@ Amphion provides a comprehensive objective evaluation of the generated audio. Th

### Datasets

Amphion unifies the data preprocess of the open-source datasets including [AudioCaps](https://audiocaps.github.io/), [LibriTTS](https://www.openslr.org/60/), [LJSpeech](https://keithito.com/LJ-Speech-Dataset/), [M4Singer](https://github.com/M4Singer/M4Singer), [Opencpop](https://wenet.org.cn/opencpop/), [OpenSinger](https://github.com/Multi-Singer/Multi-Singer.github.io), [SVCC](http://vc-challenge.org/), [VCTK](https://datashare.ed.ac.uk/handle/10283/3443), and more. The supported dataset list can be seen [here](egs/datasets/README.md) (updating).
- Amphion unifies the data preprocess of the open-source datasets including [AudioCaps](https://audiocaps.github.io/), [LibriTTS](https://www.openslr.org/60/), [LJSpeech](https://keithito.com/LJ-Speech-Dataset/), [M4Singer](https://github.com/M4Singer/M4Singer), [Opencpop](https://wenet.org.cn/opencpop/), [OpenSinger](https://github.com/Multi-Singer/Multi-Singer.github.io), [SVCC](http://vc-challenge.org/), [VCTK](https://datashare.ed.ac.uk/handle/10283/3443), and more. The supported dataset list can be seen [here](egs/datasets/README.md) (updating).
- Amphion (exclusively) supports the [**Emilia**](preprocessors/Emilia/README.md) dataset and its preprocessing pipeline **Emilia-Pipe** for in-the-wild speech data!

### Visualization

123 changes: 123 additions & 0 deletions preprocessors/Emilia/README.md
@@ -0,0 +1,123 @@
## Emilia

This is the official repository for the **Emilia** dataset and the **Emilia-Pipe** source code.

Emilia is a comprehensive, multilingual dataset featuring over 101k hours of speech in six languages: English (En), Chinese (Zh), German (De), French (Fr), Japanese (Ja), and Korean (Ko). The dataset includes diverse speech samples with various speaking styles.

Emilia-Pipe is the first open-source preprocessing pipeline designed to transform raw, in-the-wild speech data into high-quality training data with annotations for speech generation. This pipeline can process one hour of raw audio into model-ready data in just a few minutes, requiring only the URLs of the audio or video sources.

By downloading the raw audio files from our provided list of URLs and processing them with Emilia-Pipe, users can obtain the Emilia dataset. Additionally, users can easily use Emilia-Pipe to preprocess their own raw speech data for custom needs. By open-sourcing the Emilia-Pipe code, we aim to enable the speech community to collaborate on large-scale speech generation research.

This README explains how to install and use Emilia-Pipe.

## Pipeline Overview

The Emilia-Pipe includes the following major steps:

0. Standardization: Audio normalization
1. Source Separation: Long audio -> Long audio without BGM
2. Speaker Diarization: Get medium-length single-speaker speech data
3. Fine-grained Segmentation by VAD: Get 3-30s single-speaker speech segments
4. ASR: Get transcriptions of the speech segments
5. Filtering: Obtain the final processed dataset
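
Step 3 above targets single-speaker segments of 3-30 seconds. As a rough illustration of that length constraint only, the sketch below greedily merges consecutive VAD chunks until a segment would exceed the upper bound, then drops anything outside the range. The function `merge_vad_chunks`, its parameters, and the greedy strategy are assumptions made for this example; they are not taken from the Emilia-Pipe source, which may segment differently.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def merge_vad_chunks(
    chunks: List[Interval],
    min_len: float = 3.0,   # lower bound from the "3-30s" target in step 3
    max_len: float = 30.0,  # upper bound from the "3-30s" target in step 3
) -> List[Interval]:
    """Greedily merge consecutive chunks, then keep segments of min_len..max_len seconds."""
    segments: List[Interval] = []
    current = None
    for start, end in sorted(chunks):
        if current is None:
            current = (start, end)
        elif end - current[0] <= max_len:
            # Extending the open segment keeps it within the upper bound.
            current = (current[0], end)
        else:
            # Close the open segment and start a new one at this chunk.
            segments.append(current)
            current = (start, end)
    if current is not None:
        segments.append(current)
    # Drop segments that are too short or too long.
    return [(s, e) for s, e in segments if min_len <= e - s <= max_len]


# Hypothetical VAD output for one speaker turn.
chunks = [(0.0, 2.0), (2.5, 10.0), (10.5, 29.0), (29.5, 40.0), (100.0, 101.0)]
segments = merge_vad_chunks(chunks)
print(segments)
```

This sketch ignores the silence between chunks and simply discards any single chunk longer than `max_len`; a production pipeline would more likely cap the allowed gap and split long chunks instead.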

## Setup Steps

### 0. Prepare Environment

1. Install Python and CUDA.
2. Run the following commands to install the required packages:

```bash
conda create -y -n AudioPipeline python=3.9
conda activate AudioPipeline

bash env.sh
```

3. Download the model files.
   - BGM Separator: [UVR-MDX-NET-Inst_HQ_3](https://github.com/TRvlvr/model_repo/releases/tag/all_public_uvr_models)
   - VAD: [Silero](https://github.com/snakers4/silero-vad)
   - Speaker Diarization: [pyannote](https://github.com/pyannote/pyannote-audio)
   - ASR: [whisperx-medium](https://github.com/m-bain/whisperX)
   - AutoMOS: [DNSMOS P.835](https://github.com/microsoft/DNS-Challenge)
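
Before running the pipeline, it can help to confirm that the tools installed by `env.sh` are visible from the active conda environment. The following is a minimal sketch using only the standard library; the helper name `check_environment` and the default module list are assumptions for illustration (they mirror the packages named in `env.sh`, not a list defined by Emilia-Pipe).

```python
import importlib.util
import shutil
from typing import Dict, Iterable


def check_environment(
    modules: Iterable[str] = ("torch", "torchaudio", "onnxruntime"),
) -> Dict[str, bool]:
    """Report whether ffmpeg is on PATH and whether each top-level module is importable."""
    status = {"ffmpeg": shutil.which("ffmpeg") is not None}
    for name in modules:
        # find_spec locates a top-level module without importing it.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

The check only verifies that the packages can be located; it does not confirm that a CUDA-capable GPU is usable.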

### 1. Config File

```json
{
"language": {
"multilingual": true,
"supported": [
"zh",
"en",
"fr",
"ja",
"ko",
"de"
]
},
"entrypoint": {
// TODO: Fill in the input_folder_path.
"input_folder_path": "examples", // #1: Data input
"SAMPLE_RATE": 24000
},
"separate": {
"step1": {
// TODO: Fill in the source separation model's path.
"model_path": "/path/to/model/separate_model/UVR-MDX-NET-Inst_HQ_3.onnx", // #2: Model path
"denoise": true,
"margin": 44100,
"chunks": 15,
"n_fft": 6144,
"dim_t": 8,
"dim_f": 3072
}
},
"mos_model": {
// TODO: Fill in the DNSMOS prediction model's path.
"primary_model_path": "/path/to/model/mos_model/DNSMOS/sig_bak_ovr.onnx" // #3: Model path
},
// TODO: Fill in your Hugging Face access token for pyannote.
"huggingface_token": "<HUGGINGFACE_ACCESS_TOKEN>" // #4: Huggingface access token for pyannote
}
```

- #1: Data to be processed
- #2 - #3: Model path configuration
- #4: Huggingface access token
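
Because the config above contains `//` comments, Python's standard `json` module cannot parse it directly. The sketch below shows one way to strip such comments (without touching `//` inside string values such as URLs) and to check that the four fields marked #1 to #4 are present. The helpers `strip_line_comments` and `load_config` and the `REQUIRED_KEYS` list are hypothetical names introduced for this example; the pipeline's own loader may work differently.

```python
import json

# Fields marked #1-#4 in the config above.
REQUIRED_KEYS = [
    ("entrypoint", "input_folder_path"),
    ("separate", "step1", "model_path"),
    ("mos_model", "primary_model_path"),
    ("huggingface_token",),
]


def strip_line_comments(text: str) -> str:
    """Remove // comments that sit outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line, keeping the newline itself.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_config(text: str) -> dict:
    """Parse commented JSON and verify that the required fields exist."""
    cfg = json.loads(strip_line_comments(text))
    for path in REQUIRED_KEYS:
        node = cfg
        for key in path:
            if not isinstance(node, dict) or key not in node:
                raise KeyError("missing config field: " + ".".join(path))
            node = node[key]
    return cfg


example = """
{
  "entrypoint": {
    // TODO: Fill in the input_folder_path.
    "input_folder_path": "examples", // #1: Data input
    "SAMPLE_RATE": 24000
  },
  "separate": {"step1": {"model_path": "/path/to/model/separate_model/UVR-MDX-NET-Inst_HQ_3.onnx"}},
  "mos_model": {"primary_model_path": "/path/to/model/mos_model/DNSMOS/sig_bak_ovr.onnx"},
  "huggingface_token": "<HUGGINGFACE_ACCESS_TOKEN>"
}
"""
cfg = load_config(example)
print(cfg["entrypoint"]["input_folder_path"], cfg["entrypoint"]["SAMPLE_RATE"])
```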


### 2. Running Script

1. Change the `input_folder_path` in `config.json` to the folder path where the downloaded audio files are stored.
2. Run the following command to process the audio files:

```bash
conda activate AudioPipeline
export CUDA_VISIBLE_DEVICES=0 # Setting the GPU to run the pipeline

python main.py
```

3. Processed audio will be saved into `input_folder_path_processed`.


### 3. Check the Results

The processed audio files (24 kHz sample rate by default) are saved into `input_folder_path_processed`. The results are stored in the same folder and consist of:

1. **MP3 file**: `<original_name>_<idx>.mp3`
2. **JSON file**: `<original_name>.json`

```json
[
{
"text": "So, don't worry about that. But, like for instance, like yesterday was very hard for me to say, you know what, I should go to bed.", // Transcription
"start": 67.18, // Start timestamp
"end": 74.41, // End timestamp
"language": "en", // Language
"dnsmos": 3.44 // DNSMOS score
}
]
```
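
The per-file JSON makes it straightforward to post-filter segments. The sketch below reads records in the format shown above, computes each segment's duration from `start` and `end`, and keeps those whose `dnsmos` meets a threshold. The function `select_segments`, the threshold of 3.0, and the second record are assumptions for illustration; the sketch also assumes the real output files contain plain JSON without the explanatory comments.

```python
import json
from typing import Dict, List


def select_segments(segments: List[Dict], min_dnsmos: float = 3.0) -> List[Dict]:
    """Keep segments at or above min_dnsmos and attach their duration in seconds."""
    kept = []
    for seg in segments:
        if seg["dnsmos"] >= min_dnsmos:
            duration = round(seg["end"] - seg["start"], 2)
            kept.append({**seg, "duration": duration})
    return kept


raw = """
[
  {"text": "So, don't worry about that. But, like for instance, like yesterday was very hard for me to say, you know what, I should go to bed.",
   "start": 67.18, "end": 74.41, "language": "en", "dnsmos": 3.44},
  {"text": "(hypothetical low-quality segment)",
   "start": 80.0, "end": 84.5, "language": "en", "dnsmos": 2.1}
]
"""
kept = select_segments(json.loads(raw))
for seg in kept:
    print(seg["language"], seg["duration"], seg["dnsmos"])
```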
35 changes: 35 additions & 0 deletions preprocessors/Emilia/config.json
@@ -0,0 +1,35 @@
{
"language": {
"multilingual": true,
"supported": [
"zh",
"en",
"fr",
"ja",
"ko",
"de"
]
},
"entrypoint": {
// TODO: Fill in the input_folder_path.
"input_folder_path": "examples",
"SAMPLE_RATE": 24000
},
"separate": {
"step1": {
// TODO: Fill in the source separation model's path.
"model_path": "/path/to/model/separate_model/UVR-MDX-NET-Inst_HQ_3.onnx",
"denoise": true,
"margin": 44100,
"chunks": 15,
"n_fft": 6144,
"dim_t": 8,
"dim_f": 3072
}
},
"mos_model": {
// TODO: Fill in the DNSMOS prediction model's path.
"primary_model_path": "/path/to/model/mos_model/DNSMOS/sig_bak_ovr.onnx"
},
"huggingface_token": "<HUGGINGFACE_ACCESS_TOKEN>"
}
10 changes: 10 additions & 0 deletions preprocessors/Emilia/env.sh
@@ -0,0 +1,10 @@
#!/bin/bash
# Copyright (c) 2024 Amphion.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

conda install ffmpeg -y
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia -y
pip install -r requirements.txt
pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/