
Add the F8E4M3 dtype for CUDA and CPU #2546

Closed
wants to merge 94 commits into from

Conversation

EricLBuehler
Member

No description provided.
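The description is empty, so a note on the dtype itself may help: F8E4M3 is an 8-bit float with 1 sign bit, 4 exponent bits (bias 7), and 3 mantissa bits; the common "FN" variant has no infinities and a single NaN pattern per sign. A minimal standalone decoding sketch, as an illustration of the format only, not candle's actual kernel code:

```rust
/// Decode one F8E4M3 (FN variant) byte into an f32.
/// Layout: 1 sign bit | 4 exponent bits (bias 7) | 3 mantissa bits.
fn f8e4m3_to_f32(bits: u8) -> f32 {
    let sign = if bits & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 3) & 0x0F) as i32;
    let man = (bits & 0x07) as f32;
    // E4M3FN has no infinities; exponent=15, mantissa=7 is the only NaN.
    if exp == 0x0F && (bits & 0x07) == 0x07 {
        return f32::NAN;
    }
    if exp == 0 {
        // Subnormal: sign * (mantissa / 8) * 2^(1 - bias)
        sign * (man / 8.0) * 2f32.powi(-6)
    } else {
        // Normal: sign * (1 + mantissa / 8) * 2^(exp - bias)
        sign * (1.0 + man / 8.0) * 2f32.powi(exp - 7)
    }
}
```

For example, `0x38` decodes to 1.0 and `0x7E` to 448.0, the largest finite E4M3FN value.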

EricLBuehler and others added 30 commits May 15, 2024 15:10
* Offset it

* Freeze

* Offset it (×2)

* Try out vllm impl again (×13)

* Remove debugs

* Polish it up (×2)

* Clippy

* Remove test file

* Add config for if neox

* Fix bug (×2)

* Cast cache type on rust side

* Cast types

* To dtype

* Drop temp

* Update casting (×3)

* Create dtype in bf16

* Check type

* Debug

* Check dtype (×9)

* Debug (×3)

* Check old method (×21)

* Use mistral slow rope impl (×12)

* Resetting

* Debug (×7)

* Remove debug

* Debug (×2)

* Remove debug (×2)

* Debug

* Remove debug

* Debug

* Remove debug

* Debug (×58)

* Try to use 3dim rotemb fused (×2)

* Remove contig and debug

* Check handling

* Cleanup

* Fix

* Remove prints

* Lower block dim

* Use fused layernorm

* Pass batch size

* Simplify internal API (×2)

* Try slow

* Try candle layer norm (×2)

* Fix dep of candle layer norm

* Reshape input for rank 2 (×2)

* Fix ref

* Code style

* Make dep optional

* Ensure contig (×3)

* Debug contig dmmv error (×4)

* Try other method (×5)

* Use typestate to optimize (×2)

* Fixes (×5)

* Debug via using slow rmsnorm

* Debug via using slow rope

* Remove debug

* More debugging

* Remove debug

* Remove debug

* Remove debug

* Add better error enum

* Fix diff marker

* Fix some things (×3)

* Fix dummy backends

* Re add from storage noop

* Fix removed kvconcat custom op

* Fix erroneous feature gate

* Complete metal backend refactoring

* Check if calling (×2)

* Update default for force dmmv

* Load atomic

* Debug

* Use mmvq

* Update

* Add the empty functions

* Add rope new_partial function

* Make variant of qmatmul pub (×2)

* Add the varbuilder set_device function

* Only link stdc++ if target has msvc (×4)

* Handle case of device mapping (×2)

* Add getter

* Fix (×2)

* Support nvcc flags in flash attn (×5)

* Fixes (×2)

* Fix the tests (×2)
* Support flash-attn in quantized phi3. (huggingface#2194)

* Use flash-attn in gemma. (huggingface#2195)

* Use flash-attn in gemma.

* Fix flash-attn for head dim 256.

* Remove candle-layer-norm

---------

Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* Add unfold

* Format
* Add the quantize_onto api

* Take ref

* Clippy

* Format

* Add error checking
* Use flash-attn in gemma.

* Fix for the fast bf16 cublas gemm.

* Fix some clippy lints.

* Fix another lint.

* Proper clippy fix.
* define structs

* construct ResidualConvUnit

* forward() for ResidualConvUnit

* implement FeatureFusionBlock

* implement Scratch

* implement DPTHead

* add identity module

* implement forward for DPTHead

* add get_intermediate_layers to DinoVisionTransformer

* implement DepthAnythingV2

* some minor tweaks

* fix compile errors

* fix var builder prefixes

* setup initial example

* use fixed patch size of 37 (518 / 14)

* debugged until output

* print min and max values

* add some dynamism to the output location

* scale input image

* extract prep function

* extract output path function

* normalize image with magic mean and std

* add spectral coloring

* squeeze in the right place

* make interpolation optional

* use bail instead of panic

* omit unnecessary Shape call

* remove empty curly braces

* use bail instead of assert

* use vb and pp

* remove closures

* extract config object

* Apply rustfmt.

* Fix some clippy lints.

* More lints.

* Use the array methods.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
* feat(gemm): implement Gemm operator in candle-onnx

* feat(onnx): Add support for ArgMax operator in candle-onnx

* Apply rustfmt.

* Remove argmax as it was already present.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Add: DINOv2Reg4 with PlantCLEF2024 weights and example ( See https://arxiv.org/abs/2309.16588 and https://zenodo.org/records/10848263 )

* Remove extra files + update README to download them + remove extra lines

* minor fix (README remove extra spaces)

* minor fix (README: Fix image url)

* Change: Add back interpolate_pos_encoding() + fix when no interpolation + remove extra comments + Update README (source image changed and so did the predictions)

* Fix: Improve code readability with '$ cargo clippy' and '$ cargo fmt'

* Another clippy fix.

---------

Co-authored-by: x-VEspit <vincent.espitalier@cirad.fr>
Co-authored-by: laurent <laurent.mazare@gmail.com>
EricLBuehler and others added 29 commits August 14, 2024 12:50
* Add GGUF bf16 type support

* Add non avx impl for vec_dot_bf16

* Fix from_u32

* Fix loading

* Fix dequant of bf16
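The non-AVX `vec_dot_bf16` fallback mentioned above can be sketched in scalar Rust: since bf16 is just the upper 16 bits of an f32, widening is a 16-bit left shift before the multiply-accumulate. This is an illustration of the idea, not the actual kernel (real kernels also handle rounding and blocking):

```rust
/// Widen a raw bf16 bit pattern to f32 by placing it in the high 16 bits.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Scalar (non-SIMD) dot product over two slices of raw bf16 bit patterns.
fn vec_dot_bf16_scalar(a: &[u16], b: &[u16]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| bf16_bits_to_f32(x) * bf16_bits_to_f32(y))
        .sum()
}
```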
* Expose the softcap methods

* Add some tests

* Fix generics
* Update kernels for metal bf16

* Fix typo

* Check if have bfloat
* onnx: workaround pow with negative base

rather than fully defining pow in the cpu backend (as in huggingface#2318),
this implements a much smaller change which is sufficient to evaluate silero-vad
onnx models. Specifically, checking if pow is run with 2.0 exponent, and if so
evaluate as simply `x*x` instead of the cpu backend of `e^(2.0 * ln(x))`.
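The workaround described above can be sketched outside candle with plain f32 math (the function names here are illustrative, not the PR's actual code):

```rust
/// pow as exp(e * ln(x)): NaN for negative bases, since ln(x) is undefined.
fn pow_via_exp_ln(x: f32, e: f32) -> f32 {
    (e * x.ln()).exp()
}

/// Special-case an exponent of exactly 2.0 as x * x, which is well defined
/// for any base; fall back to the exp/ln form otherwise.
fn pow_with_square_shortcut(x: f32, e: f32) -> f32 {
    if e == 2.0 {
        x * x
    } else {
        pow_via_exp_ln(x, e)
    }
}
```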

* PR: use Tensor::powf instead

powf correctly handles a negative base.
index_select does not support negative indexing, but
this change adds just enough workarounds in onnx to
allow evaluating silero-vad models (which make use of
negative indices).
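The negative-index workaround can be sketched as a normalization pass over the gather indices before the `index_select` call: ONNX allows indices to count from the end of the axis, so negatives are shifted by the axis length. A simplified standalone sketch, not the PR's exact code:

```rust
/// Map ONNX-style indices (negatives count from the end of the axis)
/// to the non-negative indices that index_select expects.
fn normalize_indices(indices: &[i64], axis_len: i64) -> Vec<u32> {
    indices
        .iter()
        .map(|&i| if i < 0 { (i + axis_len) as u32 } else { i as u32 })
        .collect()
}
```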
* silero-vad v5 example

This change adds an example of how to run silero-vad v5

* PR: rename 'vad' to 'silero-vad'

* Update README.md

---------

Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
…gingface#2442)

* Fix for parler-tts, do not add the last slice of padding tokens.

* Support for the mini model.
* Update cudarc to 0.12.

* Some cudnn tweaks.
* correct optional SE layer dimensions.
 * head_dim, not num_heads, is 32.
 * update test example output.
* Allow loading images with given std and mean

* OpenCLIP text encoder component

* Two MobileCLIP models

* Clippy fixes.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
* fix FLUX.1 weights

* added flux1-dev.safetensors
* Clippy fixes for 1.81.0.

* Another fix.
* Bump the version to 0.6.1. (huggingface#2438)

* onnx: workaround pow with negative base (huggingface#2439)

* onnx: workaround pow with negative base

rather than fully defining pow in the cpu backend (as in huggingface#2318),
this implements a much smaller change which is sufficient to evaluate silero-vad
onnx models. Specifically, checking if pow is run with 2.0 exponent, and if so
evaluate as simply `x*x` instead of the cpu backend of `e^(2.0 * ln(x))`.

* PR: use Tensor::powf instead

powf correctly handles a negative base.

* onnx: support negative index in Gather (huggingface#2440)

index_select does not support negative indexing, but
this change adds just enough workarounds in onnx to
allow evaluating silero-vad models (which make use of
negative indices).

* silero-vad v5 example (huggingface#2321)

* silero-vad v5 example

This change adds an example of how to run silero-vad v5

* PR: rename 'vad' to 'silero-vad'

* Update README.md

---------

Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>

* Fix for parler-tts, do not add the last slice of padding tokens. (huggingface#2442)

* Fix for parler-tts, do not add the last slice of padding tokens.

* Support for the mini model.

* Add FastViT model. (huggingface#2444)

* fix: qwen2 lm_head loading huggingface#2443 (huggingface#2445)

Co-authored-by: Yi Xu <xuyi@me.com>

* Update cudarc to 0.12. (huggingface#2451)

* Update cudarc to 0.12.

* Some cudnn tweaks.

* FastViT fixes. (huggingface#2452)

* correct optional SE layer dimensions.
 * head_dim, not num_heads, is 32.
 * update test example output.

* MobileCLIP models S1 and S2 (huggingface#2454)

* Allow loading images with given std and mean

* OpenCLIP text encoder component

* Two MobileCLIP models

* Clippy fixes.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>

* Fix FLUX.1 weights (huggingface#2457)

* fix FLUX.1 weights

* added flux1-dev.safetensors

* Clippy fixes for 1.81.0. (huggingface#2461)

* Clippy fixes for 1.81.0.

* Another fix.

* Make Error::msg more in line with anyhow::Error::msg

* Add context trait

* Even more flexible

* Format

---------

Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
Co-authored-by: shua <gpg@isthisa.email>
Co-authored-by: Jani Monoses <jani.monoses@gmail.com>
Co-authored-by: ilookee <lookee@live.com>
Co-authored-by: Yi Xu <xuyi@me.com>
Co-authored-by: Eugene Hauptmann <eugene.hp2012@gmail.com>
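The "Add context trait" commit above follows the anyhow pattern: attach a message to an error as it propagates. A simplified standalone sketch, using String as the error type rather than candle's actual Error:

```rust
use std::fmt::Display;

/// anyhow-style extension trait: wrap an error with extra context.
trait Context<T> {
    fn context<C: Display>(self, ctx: C) -> Result<T, String>;
}

impl<T, E: Display> Context<T> for Result<T, E> {
    fn context<C: Display>(self, ctx: C) -> Result<T, String> {
        // Prefix the original error message with the supplied context.
        self.map_err(|e| format!("{ctx}: {e}"))
    }
}
```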
* Add api to get current seed

* Remove cell for rwlock
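The seed change above (expose the current seed, and swap Cell for RwLock so reads are thread-safe) can be sketched as follows; the struct and method names are hypothetical, not candle's actual API:

```rust
use std::sync::RwLock;

/// Hypothetical device state holding the RNG seed behind an RwLock,
/// so it can be read and written from multiple threads.
struct Device {
    seed: RwLock<u64>,
}

impl Device {
    fn set_seed(&self, s: u64) {
        *self.seed.write().unwrap() = s;
    }

    /// The new getter: returns the seed currently in use.
    fn get_current_seed(&self) -> u64 {
        *self.seed.read().unwrap()
    }
}
```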
* Add the i16 dtype

* Added I16 and I32 to fix the missing arms issue (candle-onnx/eval)

* Update rust-ci.yml

* Update ci_cuda.yaml

* fmt adjustment

* Revert "Update rust-ci.yml"

This reverts commit f659d36.

* Revert "Update ci_cuda.yaml"

This reverts commit 62a4b39.