wsj0-convert

A Python script to convert the WSJ0 speech corpus to more friendly file formats.

Requirements

sph2pipe in PATH. Get it from here.
ffmpeg in PATH (not required if using the --no-flac option)
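
Before launching a long conversion, it can help to confirm that both tools are reachable. The snippet below is a minimal sketch and is not part of wsj0_convert.py; the function name `missing_tools` and its `use_flac` parameter are hypothetical and only mirror the --no-flac option described above.

```python
import shutil


def missing_tools(use_flac=True):
    """Return the names of required executables that are not found in PATH."""
    # sph2pipe is always needed; ffmpeg only when FLAC output is requested.
    required = ["sph2pipe"]
    if use_flac:
        required.append("ffmpeg")
    return [tool for tool in required if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```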

Usage

Simply run:

python wsj0_convert.py <path-to-WSJ0> <output-dir>

This creates an audio directory inside <output-dir> containing all the audio files in .flac format, or in .wav format when the --no-flac option is used. The audio files are organized in sub-directories by speaker.
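
To illustrate what a conversion of a single file might involve, here is a rough sketch that builds the two external commands: sph2pipe to decode the compressed file to .wav, then ffmpeg to encode it as .flac. This is an assumption about the general approach, not a copy of what wsj0_convert.py actually runs; `build_commands` and the example path are hypothetical.

```python
from pathlib import Path


def build_commands(src, dst_dir, use_flac=True):
    """Build the external commands that would convert one .wv1 file.

    Returns a list of argument lists suitable for subprocess.run.
    """
    src = Path(src)
    wav = Path(dst_dir) / (src.stem + ".wav")
    # Step 1: decode the SPHERE file to a RIFF/WAV file.
    commands = [["sph2pipe", "-f", "wav", str(src), str(wav)]]
    if use_flac:
        # Step 2 (skipped with --no-flac): re-encode the WAV as FLAC.
        flac = wav.with_suffix(".flac")
        commands.append(
            ["ffmpeg", "-loglevel", "error", "-y", "-i", str(wav), str(flac)]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical input path, for illustration only.
    for cmd in build_commands("wsj0/011c0201.wv1", "out/audio/011"):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes the logic easy to inspect without the tools installed.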

Notes

Only .wv1 files are converted. .wv2 files are skipped.
The original folder structure is not preserved. Output files are organized in sub-directories by speaker.
Total output size is 3.9 GB when using .flac format.
For extra speaker information (e.g. gender), see here.
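
The file selection and per-speaker layout described in these notes can be sketched as follows. This is only an illustration: it assumes the speaker ID is the first three characters of the file name (as in WSJ0 utterance names such as 011c0201), and the function `plan_outputs` is hypothetical, not taken from wsj0_convert.py.

```python
from pathlib import Path


def plan_outputs(paths, output_dir, ext=".flac"):
    """Map each .wv1 input path to its output path, grouped by speaker."""
    plan = {}
    for src in map(Path, paths):
        # Only .wv1 files are converted; .wv2 and anything else is skipped.
        if src.suffix.lower() != ".wv1":
            continue
        # Assumption: the first three characters of the name are the speaker ID.
        speaker = src.stem[:3].lower()
        plan[src] = Path(output_dir) / "audio" / speaker / (src.stem.lower() + ext)
    return plan


if __name__ == "__main__":
    # Hypothetical input paths, for illustration only.
    inputs = ["disc1/si_tr_s/011/011c0201.wv1", "disc1/si_tr_s/011/011c0201.wv2"]
    for src, dst in plan_outputs(inputs, "out").items():
        print(src, "->", dst)
```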
