path: root/candle-core
Commit message | Author | Date | Files | Lines (-/+)
* Add a minimal test for the metal bf16 matmul. (#2381) | Laurent Mazare | 2024-08-01 | 1 | -0/+20
* Enable BF16 on metal. (#2380) | Laurent Mazare | 2024-08-01 | 1 | -0/+1
* Add get_ids to GradStore (#2379) | Takanori MAEHARA | 2024-08-01 | 1 | -0/+5
* Use BF16 on metal when possible. (#2378) | Laurent Mazare | 2024-08-01 | 1 | -0/+16
* Fix log_sum_exp to handle large positive/negative inputs (#2367) | Yun-Jhong Wu | 2024-08-01 | 2 | -6/+34
* Enable the affine kernel for u8/u32. (#2376) | Laurent Mazare | 2024-08-01 | 1 | -0/+2
* Add support for Llama 3.1 (#2359) | Eric Buehler | 2024-07-26 | 5 | -10/+10
* Fix for backprop in ConvTranspose2D with stride of 2 (#2337) | Ivor Wanders | 2024-07-17 | 2 | -2/+99
* Fix Elu gradient NaN on large input (#2328) | Alexey Gerasev | 2024-07-16 | 1 | -1/+2
* Add a basic metal example with capture (#2324) | Laurent Mazare | 2024-07-09 | 3 | -1/+39
* Fix a bug in the metal implementation of col2im1d. (#2284) | Laurent Mazare | 2024-06-22 | 1 | -1/+6
* Fix the fast bf16 gemm cublas kernels. (#2274) | Laurent Mazare | 2024-06-18 | 4 | -12/+24
* Automatically upcast for to_u64 (#2244) | Eric Buehler | 2024-06-04 | 1 | -1/+7
* add where_cond f32 for metal (#2236) | Lionel Touati | 2024-06-02 | 1 | -0/+1
* Add a metal kernel for col2im1d. (#2214) | Laurent Mazare | 2024-05-25 | 1 | -34/+92
* Add the layernorm specialized op. (#2212) | Laurent Mazare | 2024-05-24 | 2 | -1/+39
* More efficient cuda implementation for ConvTranspose1d. (#2211) | Laurent Mazare | 2024-05-24 | 2 | -4/+75
* Add a slice_set op. (#2193) | Laurent Mazare | 2024-05-18 | 2 | -0/+87
* Add SliceSafetensors. (#2179) | Laurent Mazare | 2024-05-11 | 2 | -0/+71
* Make it possible to use TF32 accumulation in F32 matmuls. (#2178) | Laurent Mazare | 2024-05-11 | 3 | -30/+89
* Use write rather than try-write on the metal rw-locks. (#2162) | Laurent Mazare | 2024-05-05 | 2 | -7/+13
* Separate quantized phi-3 implementation. (#2157) | Laurent Mazare | 2024-05-04 | 2 | -4/+1
* Bump the version number to 0.5.1. (#2155) | Laurent Mazare | 2024-05-03 | 3 | -39/+2
* F16/BF16 bugfix (bis). (#2143) | Laurent Mazare | 2024-04-29 | 1 | -14/+36
* Bugfix the recent f16/bf16 changes. (#2142) | Laurent Mazare | 2024-04-29 | 1 | -8/+8
* Bug Fix: When converting a tensor to a variable, clone if the tensor is alrea... | Jeffrey Dallatezza | 2024-04-29 | 1 | -2/+7
* Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114) | MilkFather | 2024-04-29 | 1 | -2/+2
* Add a toggle for F16/BF16 accumulation in gemm. (#2141) | Laurent Mazare | 2024-04-29 | 3 | -15/+150
* Add a forward_via_f16 method to the qmatmul op. (#2138) | Laurent Mazare | 2024-04-28 | 1 | -0/+19
* Add the cuda dequantize f16 kernels. (#2137) | Laurent Mazare | 2024-04-28 | 4 | -18/+242
* Add a sort function. (#2134) | Laurent Mazare | 2024-04-28 | 2 | -0/+35
* Add argsort. (#2132) | Laurent Mazare | 2024-04-27 | 4 | -1/+241
* Add StorageRef. (#2113) | Laurent Mazare | 2024-04-23 | 10 | -5/+108
* Update zip requirement from 0.6.6 to 1.1.1 (#2103) | dependabot[bot] | 2024-04-22 | 1 | -1/+1
* Metal Unary: Add benchmarks and process kernels in a tile based fashion (#2056) | Thomas Santerre | 2024-04-21 | 4 | -147/+283
* Small cleanups to the llama multi-process example. (#2098) | Laurent Mazare | 2024-04-20 | 1 | -1/+5
* Handle multiple dimensions in metal QMM + two fixes. (#2097) | Laurent Mazare | 2024-04-20 | 1 | -15/+20
* Fix the silu gradient issue on 0. (#2083) | Laurent Mazare | 2024-04-18 | 1 | -1/+1
* Add more QMMV cuda kernels. (#2077) | Laurent Mazare | 2024-04-18 | 2 | -15/+25
* Add the mmv kernels for small batch sizes. (#2075) | Laurent Mazare | 2024-04-16 | 2 | -19/+81
* Fix for the batch dim in the quantized matmul example. (#2073) | Laurent Mazare | 2024-04-15 | 3 | -38/+38
* Add a function to clear the KV cache in falcon. (#2066) | Laurent Mazare | 2024-04-15 | 1 | -0/+1
* Handle zero dims in some simple operations. (#2064) | Laurent Mazare | 2024-04-15 | 2 | -0/+43
* Faster kernels for quantized matmul on cuda (#2060) | Laurent Mazare | 2024-04-15 | 1 | -6/+137
* Expose the synchronize function on the generic device. (#2062) | Laurent Mazare | 2024-04-14 | 1 | -0/+8
* Add missing bfloat unary strided kernels and fix typo (#2058) | ivarflakstad | 2024-04-14 | 1 | -0/+20
* Add a synchronize method to devices. (#2055) | Laurent Mazare | 2024-04-14 | 6 | -0/+24
* Add benchmarks for qmatmul operations (#2048) | Thomas Santerre | 2024-04-13 | 3 | -0/+74
* Support gather on bf16 for metal. (#2035) | Laurent Mazare | 2024-04-10 | 1 | -0/+1
* Use BufferOffset in metal backend ops. (#2029) | Laurent Mazare | 2024-04-08 | 1 | -50/+39
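The log_sum_exp entry (#2367) concerns inputs large enough to overflow or underflow exp. Below is a minimal sketch of the standard max-shift trick, assuming plain f32 slices rather than candle tensors. The function names are hypothetical, and this is not the code from that commit.

```rust
/// Naive log-sum-exp: exp overflows to +inf for large inputs and
/// underflows to 0 (so ln gives -inf) for very negative ones.
fn naive_log_sum_exp(xs: &[f32]) -> f32 {
    xs.iter().map(|&x| x.exp()).sum::<f32>().ln()
}

/// Max-shifted log-sum-exp:
/// log(sum_i exp(x_i)) = m + log(sum_i exp(x_i - m)), with m = max_i x_i.
/// Every shifted exponent is <= 0, so no term overflows, and the largest
/// term is exactly exp(0) = 1, so the sum cannot underflow to 0.
fn log_sum_exp(xs: &[f32]) -> f32 {
    let m = xs.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Empty input or all -inf: the sum is 0 and log(0) = -inf.
    // A +inf maximum would make x - m = inf - inf = NaN, so return it directly.
    if m.is_infinite() {
        return m;
    }
    let sum: f32 = xs.iter().map(|&x| (x - m).exp()).sum();
    m + sum.ln()
}

fn main() {
    for xs in [[1000.0_f32, 1000.0], [-1000.0, -1000.0]] {
        println!(
            "{:?}: naive = {}, stable = {}",
            xs,
            naive_log_sum_exp(&xs),
            log_sum_exp(&xs)
        );
    }
}
```

With two equal inputs the exact answer is x + ln 2. The naive version loses it at both extremes, while the shifted version stays finite.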
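For the Elu gradient entry (#2328), one common way a NaN appears for large inputs is blending both branches of the derivative with 0/1 masks. For large x, exp(x) overflows to +inf, and 0 * inf is NaN. The sketch below assumes that failure mode for illustration only. It is not taken from the commit, and both helper names are hypothetical.

```rust
/// ELU'(x) = 1 for x > 0, alpha * exp(x) for x <= 0.
/// Fragile form: evaluates both branches and masks them with 0.0/1.0.
/// For x above roughly 88.7, exp(x) is +inf in f32, and 0.0 * inf = NaN.
fn elu_grad_masked(x: f32, alpha: f32) -> f32 {
    let pos = if x > 0.0 { 1.0 } else { 0.0 };
    let neg = 1.0 - pos;
    pos * 1.0 + neg * alpha * x.exp()
}

/// Branch-selected form: exp is only evaluated when x <= 0,
/// where exp(x) <= 1 and cannot overflow.
fn elu_grad_selected(x: f32, alpha: f32) -> f32 {
    if x > 0.0 {
        1.0
    } else {
        alpha * x.exp()
    }
}

fn main() {
    for x in [-1.0_f32, 0.5, 100.0] {
        println!(
            "x = {}: masked = {}, selected = {}",
            x,
            elu_grad_masked(x, 1.0),
            elu_grad_selected(x, 1.0)
        );
    }
}
```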
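The silu gradient entry (#2083) concerns x = 0. Suppose, purely as a hypothetical, that a backward pass recovers sigmoid(x) as silu(x) / x. The gradient then becomes 0 / 0 = NaN at zero. Computing sigmoid directly gives the finite value 0.5. The sketch assumes f32 scalars and hypothetical helper names, and it does not claim to reproduce the original bug or its fix.

```rust
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

fn silu(x: f32) -> f32 {
    x * sigmoid(x)
}

/// Hypothetical fragile form: recovers sigmoid(x) as silu(x) / x,
/// which is 0.0 / 0.0 = NaN at x = 0.
fn silu_grad_via_division(x: f32) -> f32 {
    let s = silu(x) / x;
    s * (1.0 + x * (1.0 - s))
}

/// Direct form: d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x))).
fn silu_grad(x: f32) -> f32 {
    let s = sigmoid(x);
    s * (1.0 + x * (1.0 - s))
}

fn main() {
    println!("via division at 0: {}", silu_grad_via_division(0.0));
    println!("direct at 0: {}", silu_grad(0.0));
}
```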
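For the sort and argsort entries (#2134, #2132), the sketch below shows what argsort returns: the permutation of indices that would sort the input. It assumes a plain f32 slice and uses Rust's standard stable sort. It says nothing about how candle's kernels implement this, and the function name is hypothetical.

```rust
/// Returns the indices that would sort `xs` in ascending or descending order.
/// f32::total_cmp gives a total order (NaNs included), so the sort never
/// has to deal with an incomparable pair.
fn argsort(xs: &[f32], ascending: bool) -> Vec<u32> {
    let mut idx: Vec<u32> = (0..xs.len() as u32).collect();
    idx.sort_by(|&a, &b| {
        let ord = xs[a as usize].total_cmp(&xs[b as usize]);
        if ascending {
            ord
        } else {
            ord.reverse()
        }
    });
    idx
}

fn main() {
    let xs = [3.0_f32, 1.0, 2.0];
    println!("ascending: {:?}", argsort(&xs, true));
    println!("descending: {:?}", argsort(&xs, false));
}
```

A sort can then be expressed as gathering `xs` at the returned indices.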