forks/candle.git (branch: main)
path: root/candle-core
| Commit message | Author | Age | Files | Lines |
| --- | --- | --- | --- | --- |
| Add a minimal test for the metal bf16 matmul. (#2381) | Laurent Mazare | 2024-08-01 | 1 | -0/+20 |
| Enable BF16 on metal. (#2380) | Laurent Mazare | 2024-08-01 | 1 | -0/+1 |
| Add get_ids to GradStore (#2379) | Takanori MAEHARA | 2024-08-01 | 1 | -0/+5 |
| Use BF16 on metal when possible. (#2378) | Laurent Mazare | 2024-08-01 | 1 | -0/+16 |
| Fix log_sum_exp to handle large positive/negative inputs (#2367) | Yun-Jhong Wu | 2024-08-01 | 2 | -6/+34 |
| Enable the affine kernel for u8/u32. (#2376) | Laurent Mazare | 2024-08-01 | 1 | -0/+2 |
| Add support for Llama 3.1 (#2359) | Eric Buehler | 2024-07-26 | 5 | -10/+10 |
| Fix for backprop in ConvTranspose2D with stride of 2 (#2337) | Ivor Wanders | 2024-07-17 | 2 | -2/+99 |
| Fix Elu gradient NaN on large input (#2328) | Alexey Gerasev | 2024-07-16 | 1 | -1/+2 |
| Add a basic metal example with capture (#2324) | Laurent Mazare | 2024-07-09 | 3 | -1/+39 |
| Fix a bug in the metal implemtation of col2im1d. (#2284) | Laurent Mazare | 2024-06-22 | 1 | -1/+6 |
| Fix the fast bf16 gemm cublas kernels. (#2274) | Laurent Mazare | 2024-06-18 | 4 | -12/+24 |
| Automatically upcast for to_u64 (#2244) | Eric Buehler | 2024-06-04 | 1 | -1/+7 |
| add where_cond f32 for metal (#2236) | Lionel Touati | 2024-06-02 | 1 | -0/+1 |
| Add a metal kernel for col2im1d. (#2214) | Laurent Mazare | 2024-05-25 | 1 | -34/+92 |
| Add the layernorm specialized op. (#2212) | Laurent Mazare | 2024-05-24 | 2 | -1/+39 |
| More efficient cuda implementation for ConvTranspose1d. (#2211) | Laurent Mazare | 2024-05-24 | 2 | -4/+75 |
| Add a slice_set op. (#2193) | Laurent Mazare | 2024-05-18 | 2 | -0/+87 |
| Add SliceSafetensors. (#2179) | Laurent Mazare | 2024-05-11 | 2 | -0/+71 |
| Make it possible to use TF32 accumulation in F32 matmuls. (#2178) | Laurent Mazare | 2024-05-11 | 3 | -30/+89 |
| Use write rather than try-write on the metal rw-locks. (#2162) | Laurent Mazare | 2024-05-05 | 2 | -7/+13 |
| Separate quantized phi-3 implementation. (#2157) | Laurent Mazare | 2024-05-04 | 2 | -4/+1 |
| Bump the version number to 0.5.1. (#2155) | Laurent Mazare | 2024-05-03 | 3 | -39/+2 |
| F16/BF16 bugfix (bis). (#2143) | Laurent Mazare | 2024-04-29 | 1 | -14/+36 |
| Bugfix the recent f16/bf16 changes. (#2142) | Laurent Mazare | 2024-04-29 | 1 | -8/+8 |
| Bug Fix: When converting a tensor to a variable, clone if the tensor is alrea... | Jeffrey Dallatezza | 2024-04-29 | 1 | -2/+7 |
| Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114) | MilkFather | 2024-04-29 | 1 | -2/+2 |
| Add a toggle for F16/BF16 accumulation in gemm. (#2141) | Laurent Mazare | 2024-04-29 | 3 | -15/+150 |
| Add a forward_via_f16 method to the qmatmul op. (#2138) | Laurent Mazare | 2024-04-28 | 1 | -0/+19 |
| Add the cuda dequantize f16 kernels. (#2137) | Laurent Mazare | 2024-04-28 | 4 | -18/+242 |
| Add a sort function. (#2134) | Laurent Mazare | 2024-04-28 | 2 | -0/+35 |
| Add argsort. (#2132) | Laurent Mazare | 2024-04-27 | 4 | -1/+241 |
| Add StorageRef. (#2113) | Laurent Mazare | 2024-04-23 | 10 | -5/+108 |
| Update zip requirement from 0.6.6 to 1.1.1 (#2103) | dependabot[bot] | 2024-04-22 | 1 | -1/+1 |
| Metal Unary: Add benchmarks and process kernels in a tile based fashion (#2056) | Thomas Santerre | 2024-04-21 | 4 | -147/+283 |
| Small cleanups to the llama multi-process example. (#2098) | Laurent Mazare | 2024-04-20 | 1 | -1/+5 |
| Handle multiple dimensions in metal QMM + two fixes. (#2097) | Laurent Mazare | 2024-04-20 | 1 | -15/+20 |
| Fix the silu gradient issue on 0. (#2083) | Laurent Mazare | 2024-04-18 | 1 | -1/+1 |
| Add more QMMV cuda kernels. (#2077) | Laurent Mazare | 2024-04-18 | 2 | -15/+25 |
| Add the mmv kernels for small batch sizes. (#2075) | Laurent Mazare | 2024-04-16 | 2 | -19/+81 |
| Fix for the batch dim in the quantized matmul example. (#2073) | Laurent Mazare | 2024-04-15 | 3 | -38/+38 |
| Add a function to clear the KV cache in falcon. (#2066) | Laurent Mazare | 2024-04-15 | 1 | -0/+1 |
| Handle zero dims in some simple operations. (#2064) | Laurent Mazare | 2024-04-15 | 2 | -0/+43 |
| Faster kernels for quantized matmul on cuda (#2060) | Laurent Mazare | 2024-04-15 | 1 | -6/+137 |
| Expose the synchronize function on the generic device. (#2062) | Laurent Mazare | 2024-04-14 | 1 | -0/+8 |
| Add missing bfloat unary strided kernels and fix typo (#2058) | ivarflakstad | 2024-04-14 | 1 | -0/+20 |
| Add a synchronize method to devices. (#2055) | Laurent Mazare | 2024-04-14 | 6 | -0/+24 |
| Add benchmarks for qmatmul operations (#2048) | Thomas Santerre | 2024-04-13 | 3 | -0/+74 |
| Support gather on bf16 for metal. (#2035) | Laurent Mazare | 2024-04-10 | 1 | -0/+1 |
| Use BufferOffset in metal backend ops. (#2029) | Laurent Mazare | 2024-04-08 | 1 | -50/+39 |