path: root/candle-core/src/metal_backend
Commit message | Author | Age | Files | Lines
* add scatter add (#2656) | zachcp | 2024-12-01 | 1 | -0/+1
* add u32 - U32 gather (#2653) | zachcp | 2024-11-30 | 1 | -0/+1
* 20241118 docs (#2629) | zachcp | 2024-11-19 | 1 | -0/+2
    * module docs
    * varbuilder gguf docs
    * add a link to gguf files
    * small additional mod doc titles
    * safetensor docs
    * more core docs
    * more module docs in candle_core
    * 2 more link fixes
* Add some missing index-select metal kernels. (#2613) | Laurent Mazare | 2024-11-12 | 1 | -1/+10
    * Add some missing index-select metal kernels.
    * Make some matrix contiguous pre-matmul.
* UG metal integration. (#2580) | Laurent Mazare | 2024-10-27 | 1 | -0/+22
* Switch to using the MLX matmul by default. (#2547) | Laurent Mazare | 2024-10-06 | 1 | -3/+3
* Efficient implementation of `Tensor::ones()` for `metal` (#2512) | Anubhab Bandyopadhyay | 2024-10-01 | 1 | -4/+32
    * WIP: hopefully better const impl
    * with GPU
    * More tests on
    * Reverting primitive for
    * Incorporating review changes - added check elem count check in kernel, using for call strategy
    * rustfmt ran
* Metal commands refactoring (#2489) | Laurent Mazare | 2024-09-21 | 2 | -99/+113
    * Split out the commands part of the metal device.
    * Make most fields private.
    * Move the allocator back.
    * Rework the encoder provider type.
* Add a couple cast metal kernels. (#2479) | Laurent Mazare | 2024-09-15 | 1 | -8/+31
* Missing metal kernels. (#2474) | Laurent Mazare | 2024-09-12 | 1 | -0/+2
* Hook the MLX matmul kernels in candle-core. (#2473) | Laurent Mazare | 2024-09-12 | 2 | -0/+38
* Use the new MLX kernels to handle the BF16 matmul. (#2470) | Laurent Mazare | 2024-09-11 | 1 | -24/+44
* Enable BF16 on metal. (#2380) | Laurent Mazare | 2024-08-01 | 1 | -0/+1
* Enable the affine kernel for u8/u32. (#2376) | Laurent Mazare | 2024-08-01 | 1 | -0/+2
* Add a basic metal example with capture (#2324) | Laurent Mazare | 2024-07-09 | 1 | -1/+7
    * Add some tracing.
    * Get the trace to work.
* Fix a bug in the metal implementation of col2im1d. (#2284) | Laurent Mazare | 2024-06-22 | 1 | -1/+6
* add where_cond f32 for metal (#2236) | Lionel Touati | 2024-06-02 | 1 | -0/+1
* Add a metal kernel for col2im1d. (#2214) | Laurent Mazare | 2024-05-25 | 1 | -34/+92
    * Add a metal kernel for col2im1d.
    * Enable the col2im variant.
    * Bugfix.
    * Revert the quantized tweak.
* Use write rather than try-write on the metal rw-locks. (#2162) | Laurent Mazare | 2024-05-05 | 2 | -7/+13
* Separate quantized phi-3 implementation. (#2157) | Laurent Mazare | 2024-05-04 | 1 | -3/+0
    * Separate quantized phi-3 implementation.
    * Integrate the quantized phi3 model.
    * Small fixes, get the generation to work properly.
    * Keep the old llama implementation around.
    * Change the default.
* Add argsort. (#2132) | Laurent Mazare | 2024-04-27 | 1 | -1/+1
    * Add the argsort cuda kernels.
    * CPU version of arg-sort.
    * Hook the cuda kernel + rework the cpu bits.
    * Add some dedicated test.
    * Working cuda kernel.
    * Metal kernel.
    * Metal adjustments.
    * Bugfix.
    * Use the fast rope in qwen.
    * Rework the expert selection in qwen.
* Add StorageRef. (#2113) | Laurent Mazare | 2024-04-23 | 1 | -1/+14
    * Add the storage-ref bits.
    * Add the metal implementation.
* Metal Unary: Add benchmarks and process kernels in a tile based fashion (#2056) | Thomas Santerre | 2024-04-21 | 1 | -147/+232
    * add basic unary bench for sqrt
    * process unary commands in tiles of 4
    * re-enable all benchmarks
    * rename helper to unary
    * modify approach to split up tiled and non-tiled operations
    * undo bench ignore for other tests
    * update tile size to 2
    * only perform the optimization on the contiguous even numbered element case
* Fix for the batch dim in the quantized matmul example. (#2073) | Laurent Mazare | 2024-04-15 | 1 | -1/+1
    * Fix for the batch dim in the quantized matmul example.
    * Enable more tests on cuda.
    * Add a test for qmm with a batch.
    * Fix the zeros-dim test on metal.
* Add missing bfloat unary strided kernels and fix typo (#2058) | ivarflakstad | 2024-04-14 | 1 | -0/+20
* Add a synchronize method to devices. (#2055) | Laurent Mazare | 2024-04-14 | 1 | -0/+4
    * Add a synchronize method to devices.
    * Metal version.
* Support gather on bf16 for metal. (#2035) | Laurent Mazare | 2024-04-10 | 1 | -0/+1
* Use BufferOffset in metal backend ops. (#2029) | Laurent Mazare | 2024-04-08 | 1 | -50/+39
    * Use BufferOffset in the metal backend.
    * More BufferOffset usage.
    * Use in where-cond.
* Rework the buffer offset logic for metal kernels (#2028) | Laurent Mazare | 2024-04-07 | 1 | -39/+43
    * Move the metal kernels utils in a separate module.
    * Use the BufferOffset for unary ops.
    * Fix clippy lints.
    * Use the new BufferOffset.
    * Adapt the binary ops.
    * Affine.
    * More ops (powf, elu, cast).
* Add support for "sign" on tensors (#2012) | Thomas Santerre | 2024-04-04 | 1 | -0/+4
    * add the sign unary operator
    * remove unneeded import
    * remove unneeded import
    * undo formatting
    * undo formatting
    * remove unnecessary redefinition
    * allow gradient to flow through for sign and round
    * fix cpu ops to ensure that negzero and positive zero are handled properly
    * clippy fixes
    * Properly avoid gradient tracking.
    * Use a branchless version.
    ---------
    Co-authored-by: laurent <laurent.mazare@gmail.com>
* update dtypes checks for several metal operations (#2010) | Thomas Santerre | 2024-04-04 | 1 | -27/+45
* Backend refactoring. (#1966) | Laurent Mazare | 2024-03-29 | 2 | -0/+2071
    * Backend refactoring.
    * Metal tweaks.
    * Move the cudnn module.