Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wasm32: Optimize small struct returns as integer values for better ABI efficiency #4757

Closed
kassane opened this issue Sep 21, 2024 · 2 comments · Fixed by #4758
Closed

Wasm32: Optimize small struct returns as integer values for better ABI efficiency #4757

kassane opened this issue Sep 21, 2024 · 2 comments · Fixed by #4758

Comments

@kassane
Copy link
Contributor

kassane commented Sep 21, 2024

The current implementation of LDC2 (latest-CI) is using the struct return (sret) mechanism for all struct returns, including small structs that could fit in a single register. This approach, while general and correct, may lead to suboptimal performance for small structs, especially on targets like wasm32-emscripten.

Note: They both have the same configuration, increasing the -sSTACK_SIZE=512KB.

full LLVM-IR: https://gist.github.com/kassane/bbae501db913ccceaa9b600b1e0930b7 (ReleaseMode)

LLVM-IR (sg_make_buffer)

Based code:

// sokol_gfx.h
typedef struct sg_buffer        { uint32_t id; } sg_buffer;
inline sg_buffer sg_make_buffer(const sg_buffer_desc& desc) { return sg_make_buffer(&desc); }
SOKOL_GFX_API_DECL sg_buffer sg_make_buffer(const sg_buffer_desc* desc);

D binding

// sokol/gfx.d

extern(C)
struct Buffer {
    uint id = 0;
}

extern(C) Buffer sg_make_buffer(const BufferDesc *);
cube.zig cube.d
; Function Attrs: noredzone nosanitize_coverage nounwind skipprofile uwtable
define dso_local void @init() #0 !dbg !632 {
; [...]
%7 = call i32 @sg_make_buffer(ptr nonnull readonly align 4 %0), !dbg !887, !noalias !888
%8 = call i32 @sg_make_buffer(ptr nonnull readonly align 4 %0), !dbg !895, !noalias !896
; [...]
; Function Attrs: noredzone nounwind uwtable
declare i32 @sg_make_buffer(ptr readonly align 4) local_unnamed_addr #7
; [#uses = 1]
define void @init() #2 {
; [...]
  call void @sg_make_buffer(ptr noalias nonnull sret(%sokol.gfx.Buffer) align 4 %.sret_tmp, ptr nonnull %vbufd) #2
  call void @sg_make_buffer(ptr noalias nonnull sret(%sokol.gfx.Buffer) align 4 %.sret_tmp1, ptr nonnull %ibufd) #2

; [...]
; [#uses = 0]
define void @_D5sokol3gfx10makeBufferFNbNiNeMKSQBgQBd10BufferDescZSQCaQBx6Buffer(ptr noalias sret(%sokol.gfx.Buffer) align 4 %.sret_arg, ptr dereferenceable(56) %desc) local_unnamed_addr #2 {
  tail call void @sg_make_buffer(ptr noalias sret(%sokol.gfx.Buffer) align 4 %.sret_arg, ptr nonnull %desc) #2
  ret void
}

; [#uses = 3]
declare void @sg_make_buffer(ptr noalias sret(%sokol.gfx.Buffer) align 4, ptr) local_unnamed_addr #2

cube output

[sg][error][id:5][line:8473] 
cube.js:63:183
[sg][info][id:5][line:8474] 
	ERROR: 1:1: '' : syntax error
cube.js:63:243
[sg][error][id:5][line:8473] 
cube.js:63:183
[sg][info][id:5][line:8474] 
	ERROR: 1:1: '' : syntax error
cube.js:63:243
onerror: RuntimeError: index out of bounds cube.html:1:1053
Uncaught RuntimeError: index out of bounds
    c http://localhost:6931/cube.js:40
    Qa http://localhost:6931/cube.js:40
    _main http://localhost:6931/cube.js:65
    a http://localhost:6931/cube.js:67
    Ec http://localhost:6931/cube.js:67
    setTimeout handler*Ec http://localhost:6931/cube.js:67
    Dc http://localhost:6931/cube.js:66
    a http://localhost:6931/cube.js:64
    <anonymous> http://localhost:6931/cube.js:64
    promise callback*Ba/< http://localhost:6931/cube.js:6
    promise callback*Ba http://localhost:6931/cube.js:6
    Z http://localhost:6931/cube.js:64
    <anonymous> http://localhost:6931/cube.js:64
cube.wasm:33396:1
WebGL warning: bufferData: target: Invalid enum value 0 (Did you typo `gl.SOMETHINGG` and pass `undefined`?)
Error: Promised response from onMessage listener went out of scope ExtensionMessagingService.js:89:34
ldc2-flags to llvmIR
$ ldmd2 -w -c -preview=all --output-ll -O3 -boundscheck=off -vcolumns -cache=/home/kassane/sokol-d/.zig-cache/o/2e1a6a13eb8fd730ff21e1a4e29b9dfd -disable-verify -Hkeep-all-bodies -i -I./src /home/kassane/sokol-d/src/examples/cube.d --d-version=CarelessAlocation -L-allow-undefined -vdmd -Xcc=-v -P-I/home/kassane/.cache/zig/p/12208a21e4a62ced46244c44c6215ff99c03c0d384ec3afea04b32c4b8ecb1baa41a/upstream/emscripten/cache/sysroot/include -Xcc=-DIMPL -Xcc=-DSOKOL_GLES3 --flto=full -mtriple=wasm32-unknown-emscripten -mcpu=generic -betterC --verrors-context

Could it be possible to simplify the function's return rather than sret?

Reference

@kassane
Copy link
Contributor Author

kassane commented Sep 21, 2024

Source in C++ (extern"C"): https://godbolt.org/z/a1ra1sxf7 to wasm32

define hidden noundef i32 @main() {
entry:
 ; [...]
  %call = call i32 @sg_make_buffer(ptr noundef nonnull %b)
  ;[...]
}
; [...]
declare i32 @sg_make_buffer(ptr noundef) local_unnamed_addr #3

@kinke
Copy link
Member

kinke commented Sep 21, 2024

Analogous to #4741, we'd need a specific TargetABI implementation for wasm, using https://github.com/WebAssembly/tool-conventions/blob/main/BasicCABI.md#function-arguments-and-return-values as reference.

Edit: The main interesting point being that only these structs would need to be rewritten and passed as a primitive type:

Any struct or union that recursively (including through nested structs, unions, and arrays) contains just a single scalar value and is not specified to have greater than natural alignment.

kinke added a commit to kinke/ldc that referenced this issue Sep 22, 2024
To resolve ldc-developers#4757 and make our wasm ABI a bit more compatible with
clang/emscripten's. This includes switching to 128-bit `real`.
kinke added a commit to kinke/ldc that referenced this issue Sep 22, 2024
To resolve ldc-developers#4757 and make our wasm ABI a bit more compatible with
clang/emscripten's. This includes switching to 128-bit `real`.
kinke added a commit to kinke/ldc that referenced this issue Sep 23, 2024
To resolve ldc-developers#4757 and make our wasm ABI a bit more compatible with
clang/emscripten's. This includes switching to 128-bit `real`.
@kinke kinke closed this as completed in 4220a1b Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants