| Commit message | Author | Age | Files | Lines |
| |
* module docs
* varbuilder gguf docs
* add a link to gguf files
* small additional mod doc titles
* safetensor docs
* more core docs
* more module docs in candle_core
* 2 more link fixes
| |
* Add some fast Metal MLX SDPA kernels (#32)
* Sketch the sdpa kernel
* Add full sdpa kernel
* Add test
* Add vectorized kernel for decoding
* Update tests
* Add some docs
* Fix sdpa_vector names
* Add softcapping for vectorized sdpa
* Add softcapping for full sdpa
* Add support for head dim 32, 96, 256
* Update docs
* Add update notice
* Clippy and format
* Conditional compilation for bf16
* Use it in quantized llama
* Some review comments
* Use set_params!
* Remove unused
* Remove feature
* Fix metal sdpa for v stride
* Remove comma
* Add the dim method to layout and shape.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
| |
* Retrieve more information from PyTorch checkpoints.
* Add enough support to load dino-v2 backbone weights.
| |
* Add the permute op (similar to pytorch).
* Add the backprop for dimension permutation.
| |
* Avoid recomputing the index from scratch each time.
* More performance optimisations.
| |
* Introduce the strided blocks.
* Use the strided blocks to speed up the copy.
* Add more testing.
| |
* Add backtrace information to errors where relevant.
* More backtrace information.
* Add to the FAQ.
| |
* Cosmetic cleanups to the error enum.
* More error cleanup.
* Proper error handling rather than panicking.
* Add a dedicated conv1d error.