Gemma 2 2B crashes on mobile phone #524
Comments
Do you happen to have the console log? Besides, what is the …
It may be due to one of the limits being exceeded (not necessarily the buffer size; 2 GB sounds sufficient). Gemma requires a larger size for certain buffers than other models due to its large vocabulary size (256K, versus 128K for models like Llama 3.1). I might have to look into this later.

Edit: actually, I just saw that you mentioned Phi 3 Mini crashes as well. I will try to look into this. Meanwhile, if you have some sort of log, it would be very helpful, perhaps via remote debugging.
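The vocabulary-size point can be made concrete with some back-of-the-envelope arithmetic. Below is a minimal sketch (the function names are illustrative, not WebLLM internals): the logits buffer grows linearly with vocab size, so Gemma 2's ~256K entries need roughly twice the storage of Llama 3.1's ~128K, and a per-buffer WebGPU limit can be exceeded even when total memory looks fine.

```javascript
// Back-of-the-envelope check: does a model's fp32 logits buffer fit within a
// WebGPU per-binding limit? Names here are illustrative, not WebLLM internals.
function logitsBufferBytes(vocabSize, bytesPerElement = 4) {
  return vocabSize * bytesPerElement;
}

function fitsWithinLimit(vocabSize, maxStorageBufferBindingSize) {
  return logitsBufferBytes(vocabSize) <= maxStorageBufferBindingSize;
}

// Gemma 2 (~256K vocab) vs. Llama 3.1 (~128K vocab), per token position:
const gemmaBytes = logitsBufferBytes(256000); // 1,024,000 bytes
const llamaBytes = logitsBufferBytes(128256); //   513,024 bytes
console.log(gemmaBytes, llamaBytes);
```

On a real device you would compare against `adapter.limits.maxStorageBufferBindingSize` (or the limits requested on the `GPUDevice`) rather than a hard-coded number.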
Ahh yes, there is a DEBUG mode here: #519 (comment). Any log that may relate to the crash would be helpful, thanks!
Ah yes! There is a …
Already found it, thanks :-)
I see... thanks for the info!
There are various issues similar to this on mobile devices, probably something related to WebGPU in Chrome on Android. Nothing comes to mind off the top of my head. Not sure if updating the Android version and using the latest Chrome Canary would alleviate it.
The phone went into standby, and when I woke it up and tried running inference I saw this. It seems to be related to losing the WebGPU device. Should I call …
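On "losing the WebGPU": a `GPUDevice` exposes a `lost` promise that resolves when the device goes away, which is one way to detect this situation. A minimal sketch, assuming you hold a handle to the device (the callback wiring below is a hypothetical example, not how WebLLM handles it internally):

```javascript
// Sketch: watch for WebGPU device loss and report the reason. GPUDevice.lost
// is a real WebGPU promise; the handler wiring here is hypothetical.
function watchDeviceLoss(device, onLost) {
  device.lost.then((info) => {
    // Per the WebGPU spec, reason is "destroyed" when the device was torn
    // down deliberately; "unknown" (e.g. after mobile standby) means it was
    // genuinely lost and the engine needs to be reinitialized.
    onLost(info.reason);
  });
}

// In a real page, roughly:
//   const device = await adapter.requestDevice();
//   watchDeviceLoss(device, () => reinitializeEngine());
```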
Quick question: are you using WebWorker, ServiceWorker, or the plain MLCEngine? For ServiceWorker, my understanding is that this PR has fixed this: #471
WebWorker. I noticed I hadn't put a try-catch around WebLLM there (a testament to its quality), but I've added that now in the hope of catching the case where WebLLM says "please initialize again". But what about a setting to let WebLLM do this by itself? "Stay alive until told otherwise" could even be a default?
This seems to be an issue where the web worker is terminated when the phone goes into standby, but your frontend logic's state is still preserved, so it sends a request directly, expecting the model to be loaded. We had a similar issue with the service worker before: #471. This PR, #533, extends the service-worker fix to the web worker as well. You can test it locally, or try it out when the new npm package is published. The main logic is that when the backend realizes there is a mismatch between the model the frontend expects to be loaded and the model the backend has actually loaded, the backend calls …
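The mismatch-and-reload behavior described above can be sketched as follows. This is a hypothetical simplification for illustration, not the actual code in #533; `ensureModelLoaded` and `reload` are stand-ins for whatever the backend's real entry points are.

```javascript
// Hypothetical sketch of the backend-side check described above: if the model
// the frontend expects differs from what is actually loaded (e.g. because the
// worker was killed during standby), reload before serving the request.
function ensureModelLoaded(expectedModelId, actualModelId, reload) {
  if (expectedModelId !== actualModelId) {
    reload(expectedModelId); // stand-in for the backend's internal reload path
    return { reloaded: true, modelId: expectedModelId };
  }
  return { reloaded: false, modelId: actualModelId };
}
```

The design point is that the check lives on the backend, so the frontend does not need its own try-catch for this particular failure mode.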
This should be added to npm 0.2.56. Let me know if the issue is fixed! |
Whenever I try to load it, it crashes Chrome.
This is on a Pixel 6a with 6 GB of RAM.
To make sure it wasn't simply too big, I tried running Gemma 2 2B via Wllama (a 1.63 GB Q4 .gguf). That did run.
Additional tests