path: root/candle-core
Commit message (PR) · Author · Date · Files · Lines changed
* Detach all grads during backprop. (#1243) [Laurent Mazare, 2023-11-05; 1 file, -4/+21]
  - Add an environment variable to select the backprop behavior.
  - Update the comment.
* feat: add backprop for elu (#1269) [drbh, 2023-11-04; 2 files, -1/+34]
  - Cosmetic tweaks.
  - Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Allow using gguf-v3 files. (#1262) [Laurent Mazare, 2023-11-03; 1 file, -5/+15]
* feat: impl backprop for erf and gelu-erf (#1258) [drbh, 2023-11-03; 3 files, -3/+59]
  - feat: unary tests added for erf and gelu-erf
  - fix: (clippy) remove immediately dereferenced ref
  - fix: improve comments with pytorch code snippet
  - fix: adjust comment typo in backprop impl
* Backprop support for conv1d (cpu only for now). (#1255) [Laurent Mazare, 2023-11-03; 1 file, -1/+38]
* Test for the transposed conv1d. (#1254) [Laurent Mazare, 2023-11-03; 2 files, -1/+17]
* Add the conv-transpose1d op. (#1251) [Laurent Mazare, 2023-11-03; 8 files, -0/+221]
  - Skeleton structure for conv-transpose1d.
  - CPU implementation for conv-transpose1d.
* Lazy detach. (#1242) [Laurent Mazare, 2023-11-02; 2 files, -10/+20]
* Add a hack for generating random uniform/normal for f16/bf16. (#1228) [Laurent Mazare, 2023-10-31; 1 file, -4/+16]
* PyO3: Add `equal` and `__richcmp__` to `candle.Tensor` (#1099) [Lukas Kreussel, 2023-10-30; 1 file, -1/+1]
  - add `equal` to tensor
  - add `__richcmp__` support for tensors and scalars
  - Add `abs` + `candle.testing`
  - remove duplicated `broadcast_shape_binary_op`
  - `candle.i16` => `candle.i64`
  - `tensor.nelements` -> `tensor.nelement`
  - Cleanup `abs`
* Support negative steps in arange. (#1218) [Laurent Mazare, 2023-10-30; 2 files, -3/+33]
* Add i64-abs. (#1216) [Laurent Mazare, 2023-10-29; 2 files, -1/+42]
* Marian MT model (#1210) [Laurent Mazare, 2023-10-29; 1 file, -10/+10]
  - Skeleton files for the marian MT model.
  - Marian initialization.
  - Implement the attention forward method.
  - Forward pass for the encoder side.
  - Expose the encoder and decoder.
  - Start plugging the decoder.
  - Forward pass for the decoder layer.
  - Set up the marian example.
  - Add some missing backtraces.
  - Bugfix.
* Fix the conv2d gradient computation. (#1214) [Laurent Mazare, 2023-10-29; 2 files, -0/+72]
* Allow for different behavior between training and eval (#1213) [Laurent Mazare, 2023-10-29; 2 files, -0/+17]
  - Forward with training.
  - Do not use dropout on vgg evaluation.
* No need for the even constraint on vecdot-q40-q80. (#1202) [Laurent Mazare, 2023-10-28; 4 files, -41/+2]
* Add a quantized variant of llama2.c (#1197) [Laurent Mazare, 2023-10-27; 2 files, -28/+2]
  - Clippy fixes.
* Add some missing backtraces. (#1193) [Laurent Mazare, 2023-10-27; 1 file, -6/+12]
* Enable the test for meshgrid + fix the implementation. (#1175) [Laurent Mazare, 2023-10-25; 1 file, -24/+28]
* Implemented meshgrid (#1174) [Wouter Doppenberg, 2023-10-25; 2 files, -0/+66]
  - Resolved feedback from LaurentMazare
  - Rustfmt
  - Updated docstring
  - Removed outdated error mode from docstring
* fix ucopy for `f64` tensors (#1170) [Ibiyemi Abiodun, 2023-10-24; 1 file, -1/+1]
* derivative for GELU (#1160) [KGrewal1, 2023-10-23; 2 files, -1/+22]
  - add tests
* Handle LongStorage in pytorch checkpoints. (#1152) [Laurent Mazare, 2023-10-22; 1 file, -0/+1]
* Expose the track-op method. (#1148) [Laurent Mazare, 2023-10-22; 1 file, -1/+1]
* Add get_on_dim. (#1142) [Laurent Mazare, 2023-10-21; 1 file, -0/+18]
* Add pad_with_same. (#1127) [Laurent Mazare, 2023-10-18; 2 files, -0/+56]
  - More model cloning.
  - More cloning on quantized models.
  - Add some tests.
* Better error message when overflowing in narrow. (#1119) [Laurent Mazare, 2023-10-18; 1 file, -9/+17]
* Refactor the pth tensor extraction. (#1109) [Laurent Mazare, 2023-10-16; 1 file, -44/+48]
* Read all the tensors in a PyTorch pth file. (#1106) [Laurent Mazare, 2023-10-16; 1 file, -0/+13]
* Improve the reshape error messages. (#1096) [Laurent Mazare, 2023-10-15; 1 file, -68/+33]
  - Add the verbose-prompt flag to the phi example.
* Avoid trying to backprop through non-differentiable layers. (#1094) [Laurent Mazare, 2023-10-14; 1 file, -2/+12]
* Create a new curand instead of reseeding. (#1089) [Laurent Mazare, 2023-10-14; 2 files, -1/+10]
* Fix the npy read function and add some testing. (#1080) [Laurent Mazare, 2023-10-12; 5 files, -2/+33]
* Use full tensors for zeros and ones (#1071) [Laurent Mazare, 2023-10-11; 1 file, -16/+6]
  - Only optimize float tensors.
* Only optimize float tensors. (#1069) [Laurent Mazare, 2023-10-10; 1 file, -0/+14]
* Remove some unused bits. (#1067) [Laurent Mazare, 2023-10-09; 1 file, -2/+1]
* Make the cuda rng seedable. (#1056) [Laurent Mazare, 2023-10-08; 4 files, -0/+16]
* Better control on the optional dequantization in QMatMul (#1049) [Laurent Mazare, 2023-10-07; 1 file, -7/+28]
  - Cosmetic change to the quantized whisper model.
  - Fix the dequantization.
  - Add the dequantize all variable.
* Add the round-to function. (#1039) [Laurent Mazare, 2023-10-05; 2 files, -0/+18]
* fix: fix index_select cuda kernel for src target dim different than ids dim when selecting dim > 0 (#1037) [Gonzalo, 2023-10-05; 2 files, -2/+13]
  - cargo fmt
* Add the rounding operators. (#1030) [Laurent Mazare, 2023-10-04; 4 files, -0/+133]
  - Avoid tracking gradients for the rounding operations.
  - Add some rounding tests.
* Simd128 optimized q8k vecdot. (#1026) [Laurent Mazare, 2023-10-03; 2 files, -0/+33]
* AVX optimized q8k vecdot. (#1024) [Laurent Mazare, 2023-10-03; 2 files, -0/+35]
* Fix for the index-select cuda setup. (#1022) [Laurent Mazare, 2023-10-03; 2 files, -1/+16]
  - Better fix + add some testing.
* neon optimized q8k multiplication. (#1021) [Laurent Mazare, 2023-10-02; 2 files, -3/+36]
  - Bugfixes.
  - simdification.
* Add the q8k vec-dot multiplication. (#1019) [Laurent Mazare, 2023-10-02; 2 files, -2/+46]
* Improve the quantized whisper setup. (#1018) [Laurent Mazare, 2023-10-02; 2 files, -17/+26]
  - Fix the config file paths.
  - Use the standard matmul where possible.
* Improve the testing of the optimized quantized vec-dot ops (#1016) [Laurent Mazare, 2023-10-02; 2 files, -5/+68]
  - Expose the unopt functions for testing.
  - Better testing of the optimized quantized computations.
* Simd128 version of q6k vec-dot. (#1015) [Laurent Mazare, 2023-10-01; 2 files, -1/+127]
  - Add a specific function for the simd128 q6k vec-dot.
  - Simdification.
  - More simdification.
* Bump the version to 0.3.0. (#1014) [Laurent Mazare, 2023-10-01; 1 file, -1/+1]
  - Changelog update.