[WIP] Initial metal integration setup #1230

Closed

Conversation

ivarflakstad
Member

@ivarflakstad ivarflakstad commented Nov 1, 2023

Initial metal integration

This work has been split into several PRs 😊
#1308
#1309
#1316
#1318
#1323
#1341

ivarflakstad and others added 29 commits November 9, 2023 16:43
- Most kernels just copy themselves to get the shapes correct
- Matmul works in only one case and simply allocates an empty buffer otherwise
- Logits are randomized so the demo runs to completion.

Performance is quite bad (30ms/token), but there are lots of prints and allocs, plus some actual sending to Metal.

Couldn't get throughput much higher by removing the obvious blockers (println + the matmuls that actually run).

Allocations take between 1us and 100us and seem very stable. Maybe Metal doesn't really have a smart allocator and we'll need to own it.
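
To illustrate what "owning" allocation could look like, here is a minimal sketch of a size-keyed buffer pool that reuses released buffers instead of allocating fresh ones. It is an assumption for illustration only, not what this PR implements: `BufferPool`, `acquire`, and `release` are hypothetical names, and a plain `Vec<u8>` stands in for a Metal buffer.

```rust
use std::collections::HashMap;

/// Hypothetical pool that recycles buffers by exact byte size.
/// `Vec<u8>` is a stand-in for a real device buffer.
pub struct BufferPool {
    /// Released buffers, grouped by their length in bytes.
    free: HashMap<usize, Vec<Vec<u8>>>,
    /// Number of times the pool had to allocate a new buffer.
    pub fresh_allocations: usize,
}

impl BufferPool {
    pub fn new() -> Self {
        Self {
            free: HashMap::new(),
            fresh_allocations: 0,
        }
    }

    /// Return a recycled buffer of exactly `size` bytes if one is free,
    /// otherwise allocate a new zeroed one. Recycled contents are NOT cleared.
    pub fn acquire(&mut self, size: usize) -> Vec<u8> {
        if let Some(buf) = self.free.get_mut(&size).and_then(|bufs| bufs.pop()) {
            return buf;
        }
        self.fresh_allocations += 1;
        vec![0u8; size]
    }

    /// Hand a buffer back to the pool so a later `acquire` can reuse it.
    pub fn release(&mut self, buf: Vec<u8>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }
}
```

With this sketch, acquiring 1024 bytes, releasing the buffer, and acquiring 1024 bytes again leaves `fresh_allocations` at 1. Whether pooling actually helps here would need measuring against the 1us to 100us allocation times reported above.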
…ayout offset (like CudaSlice.slice) for candle integration
- Added proper kernel type check (through modules + macro)
- Split contiguous and strided into 2 different kernels
- Verified on long range + strided values.
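
As a rough CPU-side illustration of why contiguous and strided inputs are handled by separate kernels, the sketch below contrasts the two indexing schemes. It is an assumption-laden sketch, not the Metal kernels from this PR: the `Layout` struct, its fields, and the `copy_*` helpers are hypothetical names modelled on the shape, stride, and start-offset idea mentioned above.

```rust
/// Hypothetical layout: shape, strides (in elements), and a start offset.
pub struct Layout {
    pub shape: Vec<usize>,
    pub stride: Vec<usize>,
    pub start_offset: usize,
}

impl Layout {
    /// Row-major (C-order) layout for `shape`, starting at offset 0.
    pub fn contiguous(shape: &[usize]) -> Self {
        let mut stride = vec![1; shape.len()];
        for i in (0..shape.len().saturating_sub(1)).rev() {
            stride[i] = stride[i + 1] * shape[i + 1];
        }
        Self { shape: shape.to_vec(), stride, start_offset: 0 }
    }

    /// True when the strides match a row-major layout (size-1 dims ignored).
    pub fn is_contiguous(&self) -> bool {
        let mut expected = 1;
        for (dim, stride) in self.shape.iter().zip(self.stride.iter()).rev() {
            if *dim != 1 && *stride != expected {
                return false;
            }
            expected *= dim;
        }
        true
    }

    /// Map a logical linear index to a position in the underlying storage.
    pub fn strided_index(&self, linear: usize) -> usize {
        let mut rem = linear;
        let mut idx = self.start_offset;
        for (dim, stride) in self.shape.iter().zip(self.stride.iter()).rev() {
            idx += (rem % dim) * stride;
            rem /= dim;
        }
        idx
    }
}

/// Contiguous path: a single slice starting at the offset, no index math.
pub fn copy_contiguous(src: &[f32], layout: &Layout) -> Vec<f32> {
    let n: usize = layout.shape.iter().product();
    src[layout.start_offset..layout.start_offset + n].to_vec()
}

/// Strided path: every element needs its storage index computed.
pub fn copy_strided(src: &[f32], layout: &Layout) -> Vec<f32> {
    let n: usize = layout.shape.iter().product();
    (0..n).map(|i| src[layout.strided_index(i)]).collect()
}
```

For a 3x4 row-major buffer holding 0..12, a transposed view with shape `[4, 3]` and stride `[1, 4]` is not contiguous, and `copy_strided` yields `0, 4, 8, 1, 5, 9, ...`. The contiguous path skips the per-element index computation entirely, which is the usual motivation for keeping it as its own kernel.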
@ivarflakstad
Member Author

Closing since this has been split into several PRs
