[browser][wasm] threads and JS interop #301

Merged
merged 54 commits into from
May 14, 2024
588482b
work in progress
pavelsavara Sep 27, 2023
cf5202e
wip
pavelsavara Sep 27, 2023
ba23590
wip
pavelsavara Sep 27, 2023
7cdb8e2
XHR details
pavelsavara Sep 27, 2023
8d475d2
subtle and lazy DLL download
pavelsavara Sep 27, 2023
fb487f9
feedback
pavelsavara Sep 27, 2023
3104ff4
wip
pavelsavara Sep 27, 2023
5cfcff0
wip
pavelsavara Sep 27, 2023
8f3807b
feedback
pavelsavara Sep 29, 2023
cd560d8
http
pavelsavara Sep 29, 2023
58c4488
http
pavelsavara Sep 29, 2023
87fb56e
wip
pavelsavara Sep 29, 2023
b7910c5
wip
pavelsavara Sep 29, 2023
daf801a
wip
pavelsavara Sep 29, 2023
686f1a6
Aleksey's feedback
pavelsavara Oct 1, 2023
ea9aab0
wip
pavelsavara Oct 2, 2023
007a076
wip
pavelsavara Oct 2, 2023
9d74104
wip
pavelsavara Oct 4, 2023
b5c3157
designs 15, 16
pavelsavara Oct 9, 2023
812757c
wip
pavelsavara Oct 9, 2023
e9d4e8a
wip
pavelsavara Oct 9, 2023
d390d5a
wip
pavelsavara Oct 9, 2023
77685ab
more to (16)
pavelsavara Oct 9, 2023
433c104
more
pavelsavara Oct 9, 2023
874a5ec
whitespace
pavelsavara Oct 13, 2023
eb38609
wip
pavelsavara Oct 18, 2023
e849302
fix
pavelsavara Oct 18, 2023
346cd7d
perf
pavelsavara Oct 18, 2023
210b2c6
suspended threads and proxy disposal
pavelsavara Oct 26, 2023
000bad7
VFS notes
pavelsavara Oct 26, 2023
e8c3ae5
stdout
pavelsavara Oct 26, 2023
58791ad
wip
pavelsavara Nov 6, 2023
4f289d8
wip
pavelsavara Nov 8, 2023
39a37b5
responsive
pavelsavara Nov 9, 2023
62e8b9e
goals and non-goals
pavelsavara Nov 9, 2023
3e16562
clarify
pavelsavara Nov 9, 2023
346c905
clarify
pavelsavara Nov 9, 2023
293e31b
reorganize the doc
pavelsavara Nov 9, 2023
28cbd01
links
pavelsavara Nov 9, 2023
963a302
more
pavelsavara Nov 13, 2023
b4530bc
JSImport dispatch
pavelsavara Nov 13, 2023
4b424ff
JSImport ideas
pavelsavara Nov 15, 2023
e197b3c
more
pavelsavara Nov 20, 2023
16aeb0a
more
pavelsavara Nov 20, 2023
cf32eb0
more
pavelsavara Nov 21, 2023
40d53d7
more
pavelsavara Nov 22, 2023
4fd24c9
promise + .Wait deadlock
pavelsavara Nov 23, 2023
bf0a767
more
pavelsavara Nov 23, 2023
8d232d6
more
pavelsavara Apr 18, 2024
a4eef08
more
pavelsavara Apr 18, 2024
d11a56b
whitespace
pavelsavara Apr 18, 2024
bce30d6
feedback
pavelsavara May 14, 2024
510d40d
index
pavelsavara May 14, 2024
4e781dc
fix
pavelsavara May 14, 2024
1 change: 1 addition & 0 deletions INDEX.md
@@ -82,6 +82,7 @@ Use update-index to regenerate it:
| 2021 | [Tracking Platform Dependencies](accepted/2021/platform-dependencies/platform-dependencies.md) | [Matt Thalman](https://github.com/mthalman) |
| 2022 | [.NET 7 Version Selection Improvements](accepted/2022/version-selection.md) | [Rich Lander](https://github.com/richlander) |
| 2023 | [Experimental APIs](accepted/2023/preview-apis/preview-apis.md) | [Immo Landwerth](https://github.com/terrajobst) |
| 2023 | [Multi-threading on a browser](accepted/2023/wasm-browser-threads.md) | [Pavel Savara](https://github.com/pavelsavara) |
| 2023 | [net8.0-browser TFM for applications running in the browser](accepted/2023/net8.0-browser-tfm.md) | [Javier Calvarro](https://github.com/javiercn) |

## Drafts
243 changes: 243 additions & 0 deletions accepted/2023/wasm-browser-threads.md
@@ -0,0 +1,243 @@
# Multi-threading on a browser

**Owner** [Pavel Savara](https://github.com/pavelsavara)

## Table of contents
- [Goals](#goals)
- [Key ideas](#key-ideas)
- [State April 2024](#state-2024-april)
- [Design details](#design-details)
- [State September 2023](#state-2023-sep)
- [Alternatives](#alternatives-and-details---as-considered-2023-sep)

# Goals
- CPU-intensive workloads on the dotnet thread pool.
- Allow the user to start new managed threads using `new Thread` and join them.
- Add a new C# API for creating web workers with JS interop. Allow JS async/promises via an external event loop.
- Enable blocking `Task.Wait` and `lock()`-like APIs from C# user code on all threads.
    - The current public API throws PNSE (`PlatformNotSupportedException`) for it.
    - This is a core part of the MT value proposition.
    - If people want to use existing MT code bases, most of the time the code is full of locks.
    - People want to use existing desktop/server multi-threaded code as is.
- Allow HTTP and WS C# APIs to be used from any thread despite the underlying JS object affinity.
- Blazor `BeginInvokeDotNet`/`EndInvokeDotNetAfterTask` APIs work correctly in multithreaded apps.
- JSImport/JSExport interop to the maximum possible extent.
- Don't change or break the single-threaded build. †

## Lower priority goals
- try to make it debugging friendly
- sync C# to async JS
    - dynamic creation of new pthread
    - implement crypto via `subtle` browser API
    - allow MonoVM to lazily download DLLs from the server, instead of during startup.
    - implement synchronous APIs of the HTTP and WS clients. At the moment they throw PNSE.
- sync JS to async JS to sync C#
    - allow calls to synchronous JSExport from UI thread (callback)
- don't prevent future marshaling of JS [transferable objects](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Transferable_objects), like streams and canvas.
- offload the CPU-intensive part of WASM startup to a WebWorker, so that the pre-rendered (Blazor) UI can stay responsive during Mono VM startup.

## Non-goals
- interact with JS state on `WebWorker` of managed threads other than UI thread or dedicated `JSWebWorker`

<sub><sup>† Note: all the text below discusses MT build only, unless explicit about ST build.</sup></sub>

# Key ideas

Move all managed user code out of UI/DOM thread, so that it becomes consistent with all other threads.

## Context - Problems
**1)** If you have multithreading, any thread might need to block while waiting for any other to release a lock.
- locks are in the user code, in NuGet packages, and in Mono VM itself
- there are managed and unmanaged locks
- in the single-threaded build of the runtime, all of this is a no-op. That's why it works on the UI thread.

**2)** UI thread in the browser can't synchronously block
- that means "you cannot block" the UI thread, not just the usual "you should not block" the UI
    - `Atomics.wait()` throws `TypeError` on the UI thread
- you can spin-wait, but it's a bad idea.
    - Deadlock: when you spin-block, the JS timer loop and any messages are not pumping.
        - But code in other threads may be waiting for some such event to resolve.
        - all async/await doesn't work
        - all networking doesn't work
        - you can't create or join another web worker
        - the browser dev tools UI freezes
    - It eats your battery
    - The browser will kill your tab at a random point (Aw, snap).
    - It's not deterministic and you can't really test your app to prove it harmless.
- all the other threads/workers could synchronously block
    - `Atomics.wait()` works as expected
- if we have a managed thread on the UI thread, any `lock` or Mono GC barrier could cause a spin-wait
    - in the case of Mono code, we at least know it's of short duration
    - we should prevent it from blocking in user code
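
Whether the current JS thread may block can be probed at runtime. The following is a minimal sketch, not code from the runtime: `canBlockWait` is a hypothetical helper name, and the sketch assumes `SharedArrayBuffer` is available (in browsers that requires a cross-origin isolated page).

```javascript
// Hypothetical helper, not part of the dotnet runtime: reports whether the
// current JS thread is allowed to block synchronously.
function canBlockWait() {
    // A one-element shared cell; the value at index 0 starts as 0.
    const cell = new Int32Array(new SharedArrayBuffer(4));
    try {
        // Zero timeout: returns "timed-out" immediately where blocking is allowed.
        Atomics.wait(cell, 0, 0, 0);
        return true;
    } catch (err) {
        // The browser UI thread rejects Atomics.wait with a TypeError.
        if (err instanceof TypeError) return false;
        throw err;
    }
}
```

On a web worker this is expected to return `true`; on the browser UI thread the `TypeError` path is taken and it returns `false`.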

**3)** JavaScript engine APIs and objects have thread affinity.
- The DOM and a few other browser APIs are only available on the main UI "thread"
    - and so, you need to have C# interop with the UI, but you can't block there.
- HTTP & WS objects have affinity, but we would like to consume them (via Streams) from any managed thread
- Any `JSObject`, `JSException` and `Promise`->`Task` have thread affinity
    - they need to be disposed on the correct thread, but GC runs on a random thread

**4)** State management of JS context `self` of the worker.
- emscripten pre-allocates a pool of web workers to be used as pthreads.
    - Because they can only be created asynchronously, but `pthread_create` is a synchronous call
    - Because they are slow to start
- those pthreads have a stateful JS context `self`, which is re-used when mapped to the C# thread pool
- when we allow JS interop on a managed thread, we need a way to clean up the JS state

**5)** Blazor's `renderBatch` is using direct memory access

**6)** Dynamic creation of new WebWorker requires async operations on emscripten main thread.
- we could pre-allocate a fixed-size pthread pool. But one size doesn't fit all, and it's expensive to create a pool that is too large.

**7)** There could be a pending HTTP promise (which needs the browser event loop to resolve) and a blocking `.Wait` on the same thread and the same task/chain, which leads to a deadlock.
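
Problem **7** can be illustrated in plain JavaScript. This is a minimal sketch under stated assumptions: `continuationRanWhileBlocked` is a hypothetical name, an already resolved promise stands in for the pending HTTP promise, and a timed `Atomics.wait` stands in for the blocking `.Wait`. It has to run on a thread that may block (a worker, not the browser UI thread).

```javascript
// Returns true if the promise continuation ran while this thread was blocked.
function continuationRanWhileBlocked(blockMs) {
    let resolved = false;
    // Stand-in for an HTTP promise whose continuation needs this thread's event loop.
    Promise.resolve().then(() => { resolved = true; });

    // Stand-in for a blocking `.Wait` on the same thread.
    const cell = new Int32Array(new SharedArrayBuffer(4));
    Atomics.wait(cell, 0, 0, blockMs);

    // The continuation cannot run until this function returns to the event loop.
    return resolved;
}
```

The function returns `false` no matter how long it blocks. If the blocked code were waiting for `resolved` to flip instead of waiting for a timeout, it would never wake up.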

# State 2024 April

## What was implemented in Net9 - Deputy thread design

For other possible design options we considered [see below](#alternatives-and-details---as-considered-2023-sep).

- Introduce a dedicated web worker called the "deputy thread"
    - managed `Main()` is dispatched onto the deputy thread
- MonoVM startup on the deputy thread
    - non-GC C functions of mono are still available
- Emscripten startup stays on the UI thread
    - C functions of emscripten
    - download of assets into WASM memory
- UI/DOM thread
    - because the UI thread would be mostly idling, it could:
        - render UI, keep the debugger working
        - dynamically create pthreads
    - UI thread stays attached to Mono VM for Blazor's reasons (for Net9)
        - it keeps `renderBatch` working as is, but it's far from ideal
        - there is a risk that the UI could be suspended by a pending GC
        - It would be ideal to change Blazor so that it doesn't touch managed objects via naked pointers during render.
        - we strive to detach the UI thread from Mono
- I/O thread
    - is a helper thread which allows a `Task` to be resolved by the UI's `Promise` even when the deputy thread is blocked in `.Wait`
- JS interop from any thread is marshaled to the UI thread's JavaScript
- HTTP and WS clients are implemented in JS of the UI thread
- There is a draft of the `JSWebWorker` API
    - it allows C# users to create a dedicated JS thread
    - the `JSImport` calls are dispatched to it if you are on that thread
    - or if you pass a `JSObject` proxy with affinity to that thread as a `JSImport` parameter.
    - The API was not made public in Net9 yet
- calling synchronous `JSExports` is not supported on the UI thread
    - this could be changed by a configuration option, but it's dangerous.
- calling asynchronous `JSExports` is supported
- calling asynchronous `JSImport` is supported
- calling synchronous `JSImport` is supported without synchronous callback to C#
- Strings are marshaled by value
    - as opposed to the by-reference optimization we have in the single-threaded build
- Emscripten VFS and other syscalls
    - file system operations are single-threaded and always marshaled to the UI thread
- Emscripten pool of pthreads
    - browser threads are expensive (as compared to a normal OS)
    - creation of a `WebWorker` requires the UI thread to do it
    - there is a quite complex and slow setup for a `WebWorker` to become a pthread and then to attach as a Mono thread.
    - that's why Emscripten pre-allocates pthreads
    - this allows `pthread_create` to be synchronous and faster
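
The pre-allocation idea in the last item can be modeled as follows. This is a simplified sketch, not Emscripten's implementation: `PreallocatedPool` and its methods are hypothetical names, and plain objects stand in for web workers that have already finished their slow asynchronous startup.

```javascript
// Simulated pool: real worker creation is asynchronous, so it happens up front.
class PreallocatedPool {
    constructor(size) {
        // Each entry stands in for a web worker that has already started.
        this.idle = Array.from({ length: size }, (_, i) => ({ id: i, busy: false }));
    }

    // Synchronous, like `pthread_create`: hands out an already started worker.
    acquire() {
        const worker = this.idle.pop();
        if (worker === undefined) {
            // Growing the pool would need an asynchronous step on the UI thread.
            throw new Error("pool exhausted");
        }
        worker.busy = true;
        return worker;
    }

    // Returns a worker to the pool once its thread has exited.
    release(worker) {
        worker.busy = false;
        this.idle.push(worker);
    }
}
```

The sketch also shows the cost of a fixed size: once the pool is empty, `acquire` cannot succeed synchronously, which is why dynamic creation of pthreads matters.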

# Design details

## Define terms
- UI thread
    - this is the main browser "thread", the one with the DOM on it
    - it can't block-wait, only spin-wait
- "sidecar" thread - possible design
    - is a web worker with emscripten and mono VM started on it
    - there is no emscripten on the UI thread
    - for Blazor rendering, MAUI/BlazorWebView use the same concept
    - doing this lets all managed threads use blocking wait
- "deputy" thread - possible design
    - is a web worker and pthread with the C# `Main` entrypoint
    - emscripten startup stays on the UI thread
    - doing this lets all managed threads use blocking wait
- "managed thread"
    - is a thread with an emscripten pthread, a Mono VM attached thread and GC barriers
- "main managed thread"
    - is a thread with the C# `Main` entrypoint running on it
    - if this is the UI thread, it means that one managed thread is special
        - see problems **1,2**
- "managed thread pool thread"
    - pthread dedicated to serving the Mono thread pool
- "comlink"
    - in this document it stands for the pattern
        - dispatch to another worker via pure JS means
        - create JS proxies for types which can't be serialized, like `Function`
    - actual [comlink](https://github.com/GoogleChromeLabs/comlink)
        - doesn't implement spin-wait
    - we already have a prototype of similar functionality
        - which can spin-wait
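
The "comlink" pattern described above can be sketched in a few lines. This is an illustration only and not the comlink library's API: `toWire`, `invokeCallback` and the `callbackId` field are hypothetical names, and only top-level function properties are handled.

```javascript
// Table of local functions that the other side may call by id (hypothetical).
const callbacks = new Map();
let nextCallbackId = 1;

// Replaces top-level function properties with serializable references.
function toWire(message) {
    const wire = {};
    for (const [key, value] of Object.entries(message)) {
        if (typeof value === "function") {
            const id = nextCallbackId++;
            callbacks.set(id, value);
            wire[key] = { callbackId: id };
        } else {
            wire[key] = value;
        }
    }
    return wire;
}

// Runs on the owning side when the other worker asks to invoke a callback.
function invokeCallback(id, args) {
    const fn = callbacks.get(id);
    if (fn === undefined) throw new Error(`unknown callback ${id}`);
    return fn(...args);
}
```

The object returned by `toWire` contains only plain data, so it can be sent with `postMessage`. The receiving worker would wrap each `callbackId` in a proxy function that posts a message back, and the owner answers it with `invokeCallback`.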

## Proxies - thread affinity
- all proxies of JS objects have thread affinity
- all of them need to be used and disposed on the correct thread
    - how to dispatch to the correct thread is one of the questions here
- all of them are registered to 2 GCs
    - `Dispose` needs to be scheduled asynchronously instead of blocking Mono GC
        - because the proxy has thread affinity, but its target thread is suspended during GC, so we could not dispatch to it at that time.
    - the JS handles need to be freed only after both sides unregistered them (at the same time).
- `JSObject`
    - have a thread ID on them, so we know which thread owns them
- `JSException`
    - they are a proxy because the stack trace is lazy
    - we could eval the stack trace eagerly, so they could become a "value type"
        - but it would be expensive
- `Task`
    - continuations need to be dispatched onto the correct JS thread
    - they can't be passed back to the wrong JS thread
    - resolving `Task` could be async
- `Func`/`Action`/`JSImport`
    - callbacks need to be dispatched onto the correct JS thread
    - they can't be passed back to the wrong JS thread
    - calling functions which return `Task` could be aggressively async
        - unless the synchronous part of the implementation could throw an exception
            - which maybe our HTTP/WS could do?
        - could this difference be ignored?
- `JSExport`/`Function`
    - we already are on the correct thread in JS, unless this is the UI thread
    - would anything improve if we tried to be more async?
- `MonoString`
    - we have an optimization for interned strings: we marshal them only once by value. Subsequent calls in both directions are just a pinned pointer.
    - in the deputy design we could create a `MonoString` instance on the UI thread, but it involves a GC barrier
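
Dispatching by owner thread ID can be modeled like this. It is a single-process sketch under stated assumptions, not the runtime's implementation: `AffineProxy`, `queueFor` and `pump` are hypothetical names, threads are represented by numeric IDs, and each thread's event loop is replaced by an explicit `pump` call.

```javascript
// One pending-work queue per simulated thread id (hypothetical model).
const queues = new Map();

function queueFor(threadId) {
    if (!queues.has(threadId)) queues.set(threadId, []);
    return queues.get(threadId);
}

// A proxy remembers which thread owns the underlying JS object.
class AffineProxy {
    constructor(ownerThreadId, target) {
        this.ownerThreadId = ownerThreadId;
        this.target = target;
    }

    // Runs `action(target)` inline on the owner thread, otherwise queues it there.
    invoke(currentThreadId, action) {
        if (currentThreadId === this.ownerThreadId) {
            action(this.target);
            return "inline";
        }
        queueFor(this.ownerThreadId).push(() => action(this.target));
        return "queued";
    }
}

// Called by the owner thread's event loop to run the work queued for it.
function pump(threadId) {
    const queue = queueFor(threadId);
    while (queue.length > 0) queue.shift()();
}
```

Queued work only runs when the owner thread pumps its queue, which is why a suspended or blocked owner thread delays `Dispose` and callbacks dispatched to it.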

## JSWebWorker with JS interop
- is a proposed concept to let the user manage the JS state of the worker explicitly
    - because of problem **4**
- is a C# thread created and disposed by a new API for it
- could block on synchronization primitives
- could do full JSImport/JSExport to its own JS `self` context
- there is a `JSSynchronizationContext` installed on it
    - so that user code could dispatch back to it, in case it needs to call a `JSObject` proxy (with thread affinity)
- this thread needs to throw on any `.Wait` because of problem **7**

## HTTP and WS clients
- are implemented in terms of `JSObject` and `Promise` proxies
- they have thread affinity, see above
    - typically to the `JSWebWorker` of the creator
- but are consumed via their C# Streams from any thread.
    - therefore we need to solve the dispatch to the correct thread.
        - such dispatch will come with overhead
        - especially when called with a small buffer in a tight loop
    - or we could throw PNSE, but it may be difficult for user code to
        - know what thread created the client
        - have a means to dispatch the call there
    - other unknowing users are `XmlUrlResolver`, `XmlDownloadManager`, `X509ResourceClient`, ...
- because we could have blocking wait now, we could also implement synchronous APIs of HTTP/WS
    - so that existing user code bases would just work without change
    - this would also require a separate thread, doing the async job
    - we could use the I/O thread for it
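
The "small buffer in a tight loop" concern can be put into numbers. This is a back-of-the-envelope sketch, assuming every stream read is marshaled to the owning thread as one separate dispatch; `dispatchCount` is a hypothetical helper and the real implementation may batch differently.

```javascript
// Number of cross-thread dispatches needed to read `totalBytes` from a stream
// when each read call is marshaled separately (simplifying assumption).
function dispatchCount(totalBytes, bufferSize) {
    if (bufferSize <= 0) throw new RangeError("bufferSize must be positive");
    return Math.ceil(totalBytes / bufferSize);
}

const oneMiB = 1024 * 1024;
console.log(dispatchCount(oneMiB, 16));        // 65536 dispatches
console.log(dispatchCount(oneMiB, 64 * 1024)); // 16 dispatches
```

Under this assumption the per-call dispatch overhead is paid 4096 times more often with the 16-byte buffer than with the 64 KiB buffer for the same payload.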

## Performance
As compared to the ST build of dotnet wasm:
- the dispatch between threads (caused by JS object thread affinity) will have a negative performance impact on the JS interop
    - in the case of HTTP/WS clients used via Streams, it could be surprising
- browser performance is lower when working with SharedArrayBuffer
- Mono performance is lower because there are GC safe-points and locks in the VM code
- startup is slower because creation of WebWorker instances is slow
- VFS access is slow because it's dispatched to the UI thread
- console output is slow because its POSIX stream is dispatched to the UI thread, one call per line

# Alternatives and details - as considered 2023 Sep
See https://gist.github.com/pavelsavara/c81ef3a9e4000d67f49ddb0f1b1c2284