Support building Kaldi to WASM with OpenBLAS #4954

Open
wants to merge 3 commits into
base: master


@msqr1 msqr1 commented Oct 15, 2024

Builds Kaldi to WASM with OpenBLAS 0.3.28, using a few small hacks. Performance increased by 20% (#4952).

@msqr1
Author

msqr1 commented Oct 16, 2024

@jtrmal PTAL at the changes as well as the guide itself here: https://github.com/msqr1/kaldi-wasm2. I also have to force the number of threads spawned by Kaldi to be 1, because multithreading in WASM is quite complicated (we can support that later). I know g_num_threads controls this, but is there any other place where Kaldi spawns threads?

Thanks!

@jtrmal
Contributor

jtrmal commented Oct 16, 2024 via email

@msqr1
Author

msqr1 commented Oct 16, 2024

Could you answer my question so I can work on it?

@msqr1 msqr1 changed the title Support building Kaldi to WASM with OpenBLAS (#4952) Support building Kaldi to WASM with OpenBLAS Oct 16, 2024
@msqr1 msqr1 changed the title Support building Kaldi to WASM with OpenBLAS Support building Kaldi to WASM with OpenBLAS (#4952) Oct 16, 2024
@msqr1 msqr1 changed the title Support building Kaldi to WASM with OpenBLAS (#4952) Support building Kaldi to WASM with OpenBLAS Oct 16, 2024
@jtrmal
Contributor

jtrmal commented Oct 16, 2024 via email

@msqr1
Author

msqr1 commented Oct 16, 2024

Is there any other place where Kaldi spawns threads, other than what g_num_threads controls in kaldi-thread.cc?

@danpovey
Contributor

I think just from libraries: some math libraries, like MKL, spawn their own threads. (This is usually not helpful and should be disabled with the appropriate environment variables or library versions.)
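
For illustration only (this sketch is not from the thread), library-level threading is commonly capped with environment variables like the ones below. The names are those documented by OpenMP, Intel MKL, and OpenBLAS; which of them matters for a given Kaldi build depends on the math library it links against, so treat the list as an assumption to check against your setup.

```shell
# Cap the internal thread pools of common math libraries at one thread.
export OMP_NUM_THREADS=1        # OpenMP runtimes (used by some BLAS builds)
export MKL_NUM_THREADS=1        # Intel MKL
export OPENBLAS_NUM_THREADS=1   # OpenBLAS

echo "OMP=$OMP_NUM_THREADS MKL=$MKL_NUM_THREADS OPENBLAS=$OPENBLAS_NUM_THREADS"
```

Exporting the variables in the shell that launches the Kaldi binary is enough; they are read by the libraries at startup.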

@msqr1
Author

msqr1 commented Oct 17, 2024

OK, so I can force Kaldi to spawn 1 thread by setting g_num_threads to 1. When building to WASM, I will have to limit every place that creates std::thread to a single thread (except the CUDA ones), right?

Btw, what is the difference between g_num_threads = 1 and g_num_threads = 0? @danpovey

@danpovey
Contributor

The vast majority of Kaldi programs only use one thread anyway so you probably don't have to do anything in most cases.

@csukuangfj
Contributor

By the way, sherpa-onnx also uses a single thread in its WebAssembly ASR and TTS apps. The speed also looks OK, e.g., it is able to do real-time speech recognition.

@msqr1
Author

msqr1 commented Oct 17, 2024

Thanks! I will TAL at that later. For now, I'm just fixing the threading issue to get this done!

@danpovey
Contributor

I wouldn't attempt to compile the entirety of Kaldi to WASM because the binary size would be enormous. There are
lots of templates and many libraries. I'd compile a single binary at a time. IDK much about how WASM works though
and how the linking etc. is done.

@msqr1
Author

msqr1 commented Oct 20, 2024

Would you PTAL again @danpovey, @jtrmal?

@jtrmal
Contributor

jtrmal commented Oct 21, 2024 via email

@csukuangfj
Contributor

ideally running in a console without needing a browser

Yes, I think that is possible.

You can also run wasm with Node.js in a console. We have been doing this with sherpa-onnx, and we even provide an npm package with wasm.

@msqr1
Author

msqr1 commented Oct 21, 2024

I'll try to do that

@jtrmal
Contributor

jtrmal commented Oct 22, 2024

@msqr1 I have a comment regarding your guide at https://github.com/msqr1/kaldi-wasm2/tree/main --
In the line

CC=emcc HOSTCC=clang-20 TARGET=RISCV64_GENERIC USE_THREAD=0 NO_SHARED=1 BINARY=32 BUILD_SINGLE=1 BUILD_DOUBLE=1 BUILD_BFLOAT16=0 BUILD_COMPLEX16=0 BUILD_COMPLEX=0 CFLAGS='-fno-exceptions -fno-rtti' make -j$(nproc)

Is clang-20 necessary? AFAIK that's still a WIP, unreleased version from git, and as such it will be a lot of hassle for your users to get it. Also, I was a bit surprised by the TARGET being RISC-V. Is that correct? Is WASM compatible with RISC-V?

I tried clang-18 and it failed on Ubuntu 24.04 in Docker on an Apple M3:

29.07 gfortran -O3 -Wall -frecursive -fno-optimize-sibling-calls  -fno-tree-vectorize  -o sblat1 sblat1.o ../libopenblas_riscv64_generic-r0.3.28.a -lgfortran -lgfortran -L/opt/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten  -lGL-getprocaddr -lal -lstubs-debug -lnoexit -lc-debug -ldlmalloc -lc++-noexcept -lc++abi-debug-noexcept -lsockets
29.08 /usr/bin/ld: /opt/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libc-debug.a: error adding symbols: file format not recognized
29.08 collect2: error: ld returned 1 exit status
29.08 make[1]: *** [Makefile:320: sblat1] Error 1
29.08 make[1]: *** Waiting for unfinished jobs....
29.84 make[1]: Leaving directory '/opt/openblas/test'
29.84 make: *** [Makefile:171: tests] Error 2

but that might be just an OpenBLAS issue...

@jtrmal
Contributor

jtrmal commented Oct 22, 2024

I was able to compile OpenBLAS using

CC=emcc HOSTCC=gcc TARGET=RISCV64_GENERIC USE_THREAD=0 NO_SHARED=1 NOFORTRAN=1 BINARY=64 BUILD_SINGLE=1 BUILD_DOUBLE=1 BUILD_BFLOAT16=0 BUILD_COMPLEX16=0 BUILD_COMPLEX=0 CFLAGS='-fno-exceptions -fno-rtti' make -j$(nproc)
  • setting NOFORTRAN=1 was important (though I'm not actually sure whether Kaldi will compile)
  • why BINARY=32 in your original setup? 4G of memory might not be enough for bigger models
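
As a rough sketch of how the invocation above could be parameterized (not something proposed in the thread), the script below rebuilds the same command string with HOSTCC and BINARY overridable from the environment. The variable names `JOBS`, `DRY_RUN`, and `OPENBLAS_BUILD_CMD` are hypothetical helpers introduced here; the make variables themselves are copied from the command above. By default it only prints the command.

```shell
#!/bin/sh
# Defaults mirror the command above; override via the environment.
HOSTCC="${HOSTCC:-gcc}"     # native compiler used for OpenBLAS host-side tools
BINARY="${BINARY:-64}"      # 64 as above; 32 was used in the original guide
JOBS="${JOBS:-$(nproc 2>/dev/null || echo 1)}"

# Assemble the full make invocation as one string.
OPENBLAS_BUILD_CMD="CC=emcc HOSTCC=$HOSTCC TARGET=RISCV64_GENERIC USE_THREAD=0 NO_SHARED=1 NOFORTRAN=1 BINARY=$BINARY BUILD_SINGLE=1 BUILD_DOUBLE=1 BUILD_BFLOAT16=0 BUILD_COMPLEX16=0 BUILD_COMPLEX=0 CFLAGS='-fno-exceptions -fno-rtti' make -j$JOBS"

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Dry run (the default): show what would be executed.
  echo "$OPENBLAS_BUILD_CMD"
else
  # Run inside the OpenBLAS source directory.
  eval "$OPENBLAS_BUILD_CMD"
fi
```

Running it with `DRY_RUN=0 BINARY=32 sh build-openblas.sh` (the script name is a placeholder) from the OpenBLAS source tree would perform the 32-bit build instead.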

@msqr1
Author

msqr1 commented Oct 22, 2024

@jtrmal

In the line

CC=emcc HOSTCC=clang-20 TARGET=RISCV64_GENERIC USE_THREAD=0 NO_SHARED=1 BINARY=32 BUILD_SINGLE=1 BUILD_DOUBLE=1 BUILD_BFLOAT16=0 BUILD_COMPLEX16=0 BUILD_COMPLEX=0 CFLAGS='-fno-exceptions -fno-rtti' make -j$(nproc)

Is clang-20 necessary? AFAIK that's still a WIP, unreleased version from git, and as such it will be a lot of hassle for your users to get it.

Oh, my bad. Cross-compiling OpenBLAS requires the native compiler to be passed in, which, in my case, is clang-20. The default is gcc. Ideally, though, we would want emcc to do everything here without needing the native compiler, but I haven't figured out how to do that yet. I will try again with emcc or gcc.

Also I was a bit surprised by the TARGET being riscV -- is that correct? Is WASM compatible with RISCV?

OpenBLAS is tailored to the target machine because it uses that machine's assembly, as you can see from the .S files for each target. For compiling to WASM, we need a target that uses pure C files, and RISCV64_GENERIC seems to be the only one. See OpenMathLib/OpenBLAS#3640

why BINARY=32 in your original setup? 4G of memory might not be enough for bigger models

WASM in browsers has barely any support for memory larger than 4G by default. WASM64 (the 64-bit pointer size is the only difference) is not standardized yet, and Emscripten still marks it as experimental. Besides, I don't think we should be running super heavy models in a browser.
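
To make the 4G figure concrete, here is a small arithmetic sketch (added for illustration, not from the thread): a 32-bit pointer can address 2^32 bytes, which is where the ceiling for 32-bit WASM comes from. The variable names are hypothetical, and the shell is assumed to use 64-bit arithmetic.

```shell
# A 32-bit pointer can address 2^32 distinct bytes.
ADDRESSABLE_BYTES=$(( 1 << 32 ))
# Convert bytes to GiB (1 GiB = 1024^3 bytes).
ADDRESSABLE_GIB=$(( ADDRESSABLE_BYTES / 1024 / 1024 / 1024 ))
echo "wasm32 address space: ${ADDRESSABLE_GIB} GiB"
```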

@jtrmal
Contributor

jtrmal commented Oct 22, 2024 via email


4 participants