path: root/candle-metal-kernels/src/lib.rs
Commit message  Author  Age  Files  Lines
* Sync upstream MLX sdpa vector kernels with mask (#2718) (HEAD, main)  Eric Buehler  2025-01-16  1  -1/+187
* Lint fixes introduced with Rust 1.83 (#2646)  Anubhab Bandyopadhyay  2024-11-28  1  -10/+10
* Add some fast Metal MLX SDPA kernels (#2584)  Eric Buehler  2024-11-05  1  -1/+322
* UG metal integration. (#2580)  Laurent Mazare  2024-10-27  1  -1/+1
* Tweak some metal tests. (#2528)  Laurent Mazare  2024-10-02  1  -5/+0
* Efficient implementation of `Tensor::ones()` for `metal` (#2512)  Anubhab Bandyopadhyay  2024-10-01  1  -0/+28
* Integrate the MLX gemm kernels (#2468)  Laurent Mazare  2024-09-11  1  -16/+195
* Metal bgemm min changes (#2364)  ivarflakstad  2024-08-01  1  -0/+2
* Use RAII for terminating the encoding. (#2353)  Laurent Mazare  2024-07-24  1  -49/+41
* Use a trait for the encoder provider (so that encoder can ultimately be reuse...  Laurent Mazare  2024-07-24  1  -120/+120
* Add a metal kernel for col2im1d. (#2214)  Laurent Mazare  2024-05-25  1  -0/+33
* Add the layernorm specialized op. (#2212)  Laurent Mazare  2024-05-24  1  -0/+63
* Separate quantized phi-3 implementation. (#2157)  Laurent Mazare  2024-05-04  1  -1/+1
* Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114)  MilkFather  2024-04-29  1  -1/+1
* Add argsort. (#2132)  Laurent Mazare  2024-04-27  1  -0/+40
* Metal Unary: Add benchmarks and process kernels in a tile based fashion (#2056)  Thomas Santerre  2024-04-21  1  -35/+82
* Handle multiple dimensions in metal QMM + two fixes. (#2097)  Laurent Mazare  2024-04-20  1  -7/+8
* Use BufferOffset in metal backend ops. (#2029)  Laurent Mazare  2024-04-08  1  -104/+51
* Rework the buffer offset logic for metal kernels (#2028)  Laurent Mazare  2024-04-07  1  -223/+66
* Optimize copy-2d for metal. (#2024)  Laurent Mazare  2024-04-07  1  -8/+49
* Add the rope THD kernel. (#2014)  Laurent Mazare  2024-04-05  1  -0/+45
* Add support for "sign" on tensors (#2012)  Thomas Santerre  2024-04-04  1  -1/+1
* Fix for the RWKV models. (#1955)  Laurent Mazare  2024-03-28  1  -4/+4
* More flexible matmul contiguity checks. (#1949)  Laurent Mazare  2024-03-27  1  -4/+8
* Contiguous variant of the rope kernel. (#1929)  Laurent Mazare  2024-03-25  1  -0/+43
* Fast kernels for rotary embeddings. (#1928)  Laurent Mazare  2024-03-24  1  -0/+41
* Add support for strided index-select on Metal (#1909)  Thomas Santerre  2024-03-22  1  -2/+10
* Add support for conv_transpose2d on Metal backend (#1903)  Thomas Santerre  2024-03-21  1  -0/+58
* RmsNorm kernel for metal. (#1895)  Laurent Mazare  2024-03-21  1  -0/+58
* Add support for conv_transpose1d for metal backend (#1874)  Thomas Santerre  2024-03-19  1  -0/+53
* Add avg_pool2d metal implementation for the metal backend (#1869)  Thomas Santerre  2024-03-18  1  -1/+1
* Add support for max_pool2d for Metal backend (#1863)  Thomas Santerre  2024-03-18  1  -0/+33
* Optimize the cat operation on contiguous tensors (#1855)  Laurent Mazare  2024-03-17  1  -0/+50
* Metal random-generation bug fixes (#1811)  Niklas Hallqvist  2024-03-08  1  -4/+8
* feat: add silu activation function (#1706)  OlivierDehaene  2024-02-14  1  -1/+1
* Merge pull request #1606 from FL33TW00D/feature/larger-batches  Christopher Fleetwood  2024-01-29  1  -7/+6
|\
| * chore: final  FL33TW00D  2024-01-22  1  -15/+10
| * chore: actual fix  FL33TW00D  2024-01-19  1  -2/+3
| * chore: switch to buffer  FL33TW00D  2024-01-19  1  -10/+14
| * fix: larger batches  FL33TW00D  2024-01-18  1  -7/+6
* | Revert public EncoderParam  Ivar Flakstad  2024-01-17  1  -1/+1
* | Merge branch 'main' into ivarflakstad/metal-prng  Ivar Flakstad  2024-01-17  1  -59/+180
|\|
| * Quantized GGUF style (#1523)  Nicolas Patry  2024-01-17  1  -52/+176
* | Seed should be updated by random kernel result.  Ivar Flakstad  2024-01-15  1  -4/+8
* | Merge branch 'main' into ivarflakstad/metal-prng  Ivar Flakstad  2024-01-12  1  -2/+2
|\|
| * Add relu kernel for metal (#1488)  Juarez Bochi  2024-01-10  1  -2/+2
* | Merge branch 'main' into ivarflakstad/metal-prng  Ivar Flakstad  2024-01-07  1  -1/+12
|\|
| * Metal: support unary abs (#1503)  Gonzalo  2023-12-30  1  -1/+4
| * Metal: more u8/u32 (#1502)  Gonzalo  2023-12-29  1  -0/+4
| * Metal: i64 basic support (#1495)  Gonzalo  2023-12-29  1  -0/+4