path: root/candle-core
Commit message | Author | Date | Files | Lines (-/+)
* 20241118 docs (#2629) | zachcp | 2024-11-19 | 22 | -12/+47
* Add max-all/min-all. (#2616) (example below) | Laurent Mazare | 2024-11-14 | 1 | -0/+36
* Add some missing index-select metal kernels. (#2613) | Laurent Mazare | 2024-11-12 | 1 | -1/+10
* Update docs (#2553) | zachcp | 2024-11-11 | 1 | -0/+14
* Add some fast Metal MLX SDPA kernels (#2584) | Eric Buehler | 2024-11-05 | 2 | -0/+12
* UG metal integration. (#2580) | Laurent Mazare | 2024-10-27 | 5 | -10/+87
* Support for UG kernels. (#2579) | Laurent Mazare | 2024-10-27 | 7 | -2/+137
* Testcases (#2567) | Anubhab Bandyopadhyay | 2024-10-17 | 2 | -3/+278
* Switch to using the MLX matmul by default. (#2547) | Laurent Mazare | 2024-10-06 | 1 | -3/+3
* Fix for cudnn bf16 conv2d. (#2535) | Laurent Mazare | 2024-10-02 | 2 | -10/+14
* Add support for cuda streams. (#2532) | Laurent Mazare | 2024-10-02 | 3 | -0/+24
* Efficient implementation of `Tensor::ones()` for `metal` (#2512) (example below) | Anubhab Bandyopadhyay | 2024-10-01 | 2 | -4/+62
* Cuda quantized mmv bugfix. (#2526) | Laurent Mazare | 2024-10-01 | 1 | -1/+25
* Yet another cuda qmm padding fix. (#2509) | Laurent Mazare | 2024-09-30 | 1 | -25/+55
* Bugfix for the metal elu kernel. (#2490) | Laurent Mazare | 2024-09-21 | 1 | -0/+13
* Metal commands refactoring (#2489) | Laurent Mazare | 2024-09-21 | 2 | -99/+113
* Improve error message (#2485) | ivnsch | 2024-09-20 | 1 | -1/+5
* Add a couple cast metal kernels. (#2479) | Laurent Mazare | 2024-09-15 | 1 | -8/+31
* Export TensorIndexer public to candle users (#2477) | Shengtuo Hu | 2024-09-13 | 1 | -1/+1
* Missing metal kernels. (#2474) | Laurent Mazare | 2024-09-12 | 1 | -0/+2
* Hook the MLX matmul kernels in candle-core. (#2473) | Laurent Mazare | 2024-09-12 | 2 | -0/+38
* Use the new MLX kernels to handle the BF16 matmul. (#2470) | Laurent Mazare | 2024-09-11 | 2 | -26/+46
* Complete the missing backticks in the comments (#2469) | hongmengning | 2024-09-11 | 1 | -0/+3
* Update cudarc to 0.12. (#2451) | Laurent Mazare | 2024-08-27 | 2 | -2/+4
* Stream tensor (#2429) | Laurent Mazare | 2024-08-17 | 2 | -0/+208
* Support Minus(u) for arbitrary values of u, e.g. Minus(3). (#2428) | Laurent Mazare | 2024-08-17 | 1 | -0/+4
* Add documentation examples for `Tensor::i` and `Tensor::narrow` methods (#2308) (example below) | Carsten Csiky | 2024-08-10 | 2 | -8/+169
* optimize gradient for silu a bit (#2393) | MilkFather | 2024-08-04 | 1 | -2/+2
* Revert the bf16 gemm metal changes for now. (#2386) | Laurent Mazare | 2024-08-01 | 1 | -2/+2
* Add a minimal test for the metal bf16 matmul. (#2381) | Laurent Mazare | 2024-08-01 | 1 | -0/+20
* Enable BF16 on metal. (#2380) | Laurent Mazare | 2024-08-01 | 1 | -0/+1
* Add get_ids to GradStore (#2379) | Takanori MAEHARA | 2024-08-01 | 1 | -0/+5
* Use BF16 on metal when possible. (#2378) | Laurent Mazare | 2024-08-01 | 1 | -0/+16
* Fix log_sum_exp to handle large positive/negative inputs (#2367) (example below) | Yun-Jhong Wu | 2024-08-01 | 2 | -6/+34
* Enable the affine kernel for u8/u32. (#2376) | Laurent Mazare | 2024-08-01 | 1 | -0/+2
* Add support for Llama 3.1 (#2359) | Eric Buehler | 2024-07-26 | 5 | -10/+10
* Fix for backprop in ConvTranspose2D with stride of 2 (#2337) | Ivor Wanders | 2024-07-17 | 2 | -2/+99
* Fix Elu gradient NaN on large input (#2328) | Alexey Gerasev | 2024-07-16 | 1 | -1/+2
* Add a basic metal example with capture (#2324) | Laurent Mazare | 2024-07-09 | 3 | -1/+39
* Fix a bug in the metal implementation of col2im1d. (#2284) | Laurent Mazare | 2024-06-22 | 1 | -1/+6
* Fix the fast bf16 gemm cublas kernels. (#2274) | Laurent Mazare | 2024-06-18 | 4 | -12/+24
* Automatically upcast for to_u64 (#2244) | Eric Buehler | 2024-06-04 | 1 | -1/+7
* add where_cond f32 for metal (#2236) | Lionel Touati | 2024-06-02 | 1 | -0/+1
* Add a metal kernel for col2im1d. (#2214) | Laurent Mazare | 2024-05-25 | 1 | -34/+92
* Add the layernorm specialized op. (#2212) | Laurent Mazare | 2024-05-24 | 2 | -1/+39
* More efficient cuda implementation for ConvTranspose1d. (#2211) | Laurent Mazare | 2024-05-24 | 2 | -4/+75
* Add a slice_set op. (#2193) (example below) | Laurent Mazare | 2024-05-18 | 2 | -0/+87
* Add SliceSafetensors. (#2179) | Laurent Mazare | 2024-05-11 | 2 | -0/+71
* Make it possible to use TF32 accumulation in F32 matmuls. (#2178) | Laurent Mazare | 2024-05-11 | 3 | -30/+89
* Use write rather than try-write on the metal rw-locks. (#2162) | Laurent Mazare | 2024-05-05 | 2 | -7/+13
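
The max-all/min-all ops from #2616 reduce a tensor over all of its dimensions at once. A minimal sketch of typical usage, assuming the `Tensor::max_all`/`Tensor::min_all` methods on a CPU device (the snippet is illustrative, not taken from the commit):

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let t = Tensor::new(&[[1f32, 2., 3.], [4., 5., 6.]], &Device::Cpu)?;
    // Reduce over every dimension down to a scalar tensor.
    let max = t.max_all()?.to_scalar::<f32>()?;
    let min = t.min_all()?.to_scalar::<f32>()?;
    println!("max={max} min={min}"); // max=6 min=1
    Ok(())
}
```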
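
The `Tensor::ones()` change in #2512 only affects how the metal backend fills the buffer; the public API is unchanged. A small sketch, assuming candle is built with the `metal` feature and falling back to the CPU device when a Metal device cannot be created:

```rust
use candle_core::{DType, Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Device::new_metal errors out when the `metal` feature is not compiled in.
    let device = Device::new_metal(0).unwrap_or(Device::Cpu);
    let ones = Tensor::ones((2, 3), DType::F32, &device)?;
    println!("{ones}");
    Ok(())
}
```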
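
#2308 adds documentation examples for `Tensor::i` and `Tensor::narrow`; the kind of usage they cover looks roughly like the sketch below (indexing goes through the `IndexOp` trait, which #2477 later exports to candle users):

```rust
use candle_core::{Device, IndexOp, Tensor};

fn main() -> candle_core::Result<()> {
    let t = Tensor::arange(0f32, 12., &Device::Cpu)?.reshape((3, 4))?;
    // Integer indexing selects a row and drops that dimension: shape [4].
    let row = t.i(1)?;
    // Range indexing keeps the dimensions: shape [2, 2].
    let block = t.i((0..2, 1..3))?;
    // narrow(dim, start, len) takes a contiguous slice along one dimension: shape [3, 2].
    let cols = t.narrow(1, 1, 2)?;
    println!("{row}\n{block}\n{cols}");
    Ok(())
}
```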
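
#2367 fixes `log_sum_exp` so that large-magnitude inputs no longer overflow to infinity. A sketch of the case it addresses, assuming `log_sum_exp` takes the dimension(s) to reduce over (the exact signature is not shown in the log):

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Naively computing log(sum(exp(x))) would overflow for these magnitudes.
    let t = Tensor::new(&[[1000f32, 1001., 1002.], [-1000., -1001., -1002.]], &Device::Cpu)?;
    let lse = t.log_sum_exp(1)?;
    println!("{lse}"); // finite values, roughly 1002.41 and -999.59
    Ok(())
}
```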
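
The `slice_set` op from #2193 copies one tensor into a slice of another along a chosen dimension. A rough sketch, assuming the signature is `slice_set(&self, src, dim, offset)` and that the destination storage is overwritten in place:

```rust
use candle_core::{DType, Device, Tensor};

fn main() -> candle_core::Result<()> {
    let dst = Tensor::zeros((4, 3), DType::F32, &Device::Cpu)?;
    let src = Tensor::ones((2, 3), DType::F32, &Device::Cpu)?;
    // Write `src` into rows 1..3 of `dst` (offset 1 along dimension 0).
    dst.slice_set(&src, 0, 1)?;
    println!("{dst}");
    Ok(())
}
```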