path: root/candle-core
Commit message | Author | Date | Files | Lines
...
* Automatically upcast for to_u64 (#2244) | Eric Buehler | 2024-06-04 | 1 | -1/+7
* add where_cond f32 for metal (#2236) | Lionel Touati | 2024-06-02 | 1 | -0/+1
* Add a metal kernel for col2im1d. (#2214) | Laurent Mazare | 2024-05-25 | 1 | -34/+92
* Add the layernorm specialized op. (#2212) | Laurent Mazare | 2024-05-24 | 2 | -1/+39
* More efficient cuda implementation for ConvTranspose1d. (#2211) | Laurent Mazare | 2024-05-24 | 2 | -4/+75
* Add a slice_set op. (#2193) | Laurent Mazare | 2024-05-18 | 2 | -0/+87
* Add SliceSafetensors. (#2179) | Laurent Mazare | 2024-05-11 | 2 | -0/+71
* Make it possible to use TF32 accumulation in F32 matmuls. (#2178) | Laurent Mazare | 2024-05-11 | 3 | -30/+89
* Use write rather than try-write on the metal rw-locks. (#2162) | Laurent Mazare | 2024-05-05 | 2 | -7/+13
* Separate quantized phi-3 implementation. (#2157) | Laurent Mazare | 2024-05-04 | 2 | -4/+1
* Bump the version number to 0.5.1. (#2155) | Laurent Mazare | 2024-05-03 | 3 | -39/+2
* F16/BF16 bugfix (bis). (#2143) | Laurent Mazare | 2024-04-29 | 1 | -14/+36
* Bugfix the recent f16/bf16 changes. (#2142) | Laurent Mazare | 2024-04-29 | 1 | -8/+8
* Bug Fix: When converting a tensor to a variable, clone if the tensor is alrea... | Jeffrey Dallatezza | 2024-04-29 | 1 | -2/+7
* Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114) | MilkFather | 2024-04-29 | 1 | -2/+2
* Add a toggle for F16/BF16 accumulation in gemm. (#2141) | Laurent Mazare | 2024-04-29 | 3 | -15/+150
* Add a forward_via_f16 method to the qmatmul op. (#2138) | Laurent Mazare | 2024-04-28 | 1 | -0/+19
* Add the cuda dequantize f16 kernels. (#2137) | Laurent Mazare | 2024-04-28 | 4 | -18/+242
* Add a sort function. (#2134) | Laurent Mazare | 2024-04-28 | 2 | -0/+35
* Add argsort. (#2132) | Laurent Mazare | 2024-04-27 | 4 | -1/+241
* Add StorageRef. (#2113) | Laurent Mazare | 2024-04-23 | 10 | -5/+108
* Update zip requirement from 0.6.6 to 1.1.1 (#2103) | dependabot[bot] | 2024-04-22 | 1 | -1/+1
* Metal Unary: Add benchmarks and process kernels in a tile based fashion (#2056) | Thomas Santerre | 2024-04-21 | 4 | -147/+283
* Small cleanups to the llama multi-process example. (#2098) | Laurent Mazare | 2024-04-20 | 1 | -1/+5
* Handle multiple dimensions in metal QMM + two fixes. (#2097) | Laurent Mazare | 2024-04-20 | 1 | -15/+20
* Fix the silu gradient issue on 0. (#2083) | Laurent Mazare | 2024-04-18 | 1 | -1/+1
* Add more QMMV cuda kernels. (#2077) | Laurent Mazare | 2024-04-18 | 2 | -15/+25
* Add the mmv kernels for small batch sizes. (#2075) | Laurent Mazare | 2024-04-16 | 2 | -19/+81
* Fix for the batch dim in the quantized matmul example. (#2073) | Laurent Mazare | 2024-04-15 | 3 | -38/+38
* Add a function to clear the KV cache in falcon. (#2066) | Laurent Mazare | 2024-04-15 | 1 | -0/+1
* Handle zero dims in some simple operations. (#2064) | Laurent Mazare | 2024-04-15 | 2 | -0/+43
* Faster kernels for quantized matmul on cuda (#2060) | Laurent Mazare | 2024-04-15 | 1 | -6/+137
* Expose the synchronize function on the generic device. (#2062) | Laurent Mazare | 2024-04-14 | 1 | -0/+8
* Add missing bfloat unary strided kernels and fix typo (#2058) | ivarflakstad | 2024-04-14 | 1 | -0/+20
* Add a synchronize method to devices. (#2055) | Laurent Mazare | 2024-04-14 | 6 | -0/+24
* Add benchmarks for qmatmul operations (#2048) | Thomas Santerre | 2024-04-13 | 3 | -0/+74
* Support gather on bf16 for metal. (#2035) | Laurent Mazare | 2024-04-10 | 1 | -0/+1
* Use BufferOffset in metal backend ops. (#2029) | Laurent Mazare | 2024-04-08 | 1 | -50/+39
* Rework the buffer offset logic for metal kernels (#2028) | Laurent Mazare | 2024-04-07 | 1 | -39/+43
* Handle the batch dimension in quantized MMV on metal. (#2022) | Laurent Mazare | 2024-04-06 | 1 | -1/+4
* first commit (#2018) | Jorge António | 2024-04-05 | 1 | -1/+1
* Add support for "sign" on tensors (#2012) | Thomas Santerre | 2024-04-04 | 5 | -10/+57
* Fix the matmul layout for accelerate & mkl. (#2011) | Laurent Mazare | 2024-04-04 | 3 | -26/+8
* update dtypes checks for several metal operations (#2010) | Thomas Santerre | 2024-04-04 | 1 | -27/+45
* Optimize the gelu f16 opt. (#2008) | Laurent Mazare | 2024-04-04 | 2 | -8/+19
* Split the cuda error file. (#2003) | Laurent Mazare | 2024-04-04 | 2 | -65/+67
* Relax the contiguous check for cuda kernels. (#2000) | Laurent Mazare | 2024-04-03 | 1 | -1/+6
* Improve the handling of matmul with squeezed layouts. (#1998) | Laurent Mazare | 2024-04-02 | 4 | -138/+150
* modify access for conv and op to be pub to allow external packages to have cu... | Thomas Santerre | 2024-04-01 | 1 | -2/+2
* Quantized cuda tweaks. (#1981) | Laurent Mazare | 2024-04-01 | 1 | -89/+62