Commit message | Author | Date | Files | Lines
* Detach all grads during backprop. (#1243) | Laurent Mazare | 2023-11-05 | 1 | -4/+21
  * Add an environment variable to select the backprop behavior.
  * Update the comment.
* feat: add backprop for elu (#1269) | drbh | 2023-11-04 | 2 | -1/+34
  * Cosmetic tweaks.
  Co-authored-by: Laurent <laurent.mazare@gmail.com>
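The elu backprop entry (#1269) boils down to one derivative. As a standalone sketch of the math a backward pass has to produce (the textbook ELU with an `alpha` parameter, checked against a finite difference; this is not candle's kernel code):

```rust
// ELU and its derivative. ELU(x) = x for x > 0, alpha * (exp(x) - 1) otherwise.
fn elu(x: f64, alpha: f64) -> f64 {
    if x > 0.0 { x } else { alpha * (x.exp() - 1.0) }
}

// d/dx elu(x) = 1 for x > 0, alpha * exp(x) otherwise.
fn elu_grad(x: f64, alpha: f64) -> f64 {
    if x > 0.0 { 1.0 } else { alpha * x.exp() }
}

fn main() {
    let (x, alpha, eps) = (-0.5f64, 1.0f64, 1e-6);
    // Check the analytic gradient against a central finite difference.
    let numeric = (elu(x + eps, alpha) - elu(x - eps, alpha)) / (2.0 * eps);
    assert!((elu_grad(x, alpha) - numeric).abs() < 1e-6);
    println!("elu'({x}) = {}", elu_grad(x, alpha));
}
```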
* Allow using gguf-v3 files. (#1262) | Laurent Mazare | 2023-11-03 | 1 | -5/+15
* feat: impl backprop for erf and gelu-erf (#1258) | drbh | 2023-11-03 | 3 | -3/+59
  * feat: unary tests added for erf and gelu-erf
  * fix: (clippy) remove immediately dereferenced ref
  * fix: improve comments with pytorch code snippet
  * fix: adjust comment typo in backprop impl
* Backprop support for conv1d (cpu only for now). (#1255) | Laurent Mazare | 2023-11-03 | 1 | -1/+38
* Test for the transposed conv1d. (#1254) | Laurent Mazare | 2023-11-03 | 2 | -1/+17
* Add the conv-transpose1d op. (#1251) | Laurent Mazare | 2023-11-03 | 8 | -0/+221
  * Skeleton structure for conv-transpose1d.
  * CPU implementation for conv-transpose1d.
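The conv-transpose1d op added in #1251 scatters each input element through the kernel rather than gathering, as a regular convolution does. A minimal single-channel, stride-1, zero-padding sketch (a hypothetical `conv_transpose1d` helper, not the candle implementation):

```rust
// Naive CPU sketch of a 1D transposed convolution (single channel,
// stride 1, no padding, no dilation).
fn conv_transpose1d(input: &[f64], kernel: &[f64]) -> Vec<f64> {
    // Output length: l_in + k - 1, the stride-1 / zero-padding case of
    // the general formula (l_in - 1) * stride - 2 * padding + k.
    let mut out = vec![0.0; input.len() + kernel.len() - 1];
    for (i, &x) in input.iter().enumerate() {
        for (j, &w) in kernel.iter().enumerate() {
            // Each input element scatters a scaled copy of the kernel.
            out[i + j] += x * w;
        }
    }
    out
}

fn main() {
    let out = conv_transpose1d(&[1.0, 2.0], &[1.0, 10.0, 100.0]);
    assert_eq!(out, vec![1.0, 12.0, 120.0, 200.0]);
    println!("{out:?}");
}
```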
* Lazy detach. (#1242) | Laurent Mazare | 2023-11-02 | 2 | -10/+20
* Add a hack for generating random uniform/normal for f16/bf16. (#1228) | Laurent Mazare | 2023-10-31 | 1 | -4/+16
* PyO3: Add `equal` and `__richcmp__` to `candle.Tensor` (#1099) | Lukas Kreussel | 2023-10-30 | 1 | -1/+1
  * add `equal` to tensor
  * add `__richcmp__` support for tensors and scalars
  * typo fixes
  * Add `abs` + `candle.testing`
  * remove duplicated `broadcast_shape_binary_op`
  * `candle.i16` => `candle.i64`
  * `tensor.nelements` -> `tensor.nelement`
  * Cleanup `abs`
* Support negative steps in arange. (#1218) | Laurent Mazare | 2023-10-30 | 2 | -3/+33
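Negative-step support in arange (#1218) means counting down from `start` toward an exclusive `end`. A plain-Rust sketch of those semantics (mirroring the numpy/pytorch convention; the helper here is illustrative, not candle's API):

```rust
// arange with a signed step: walks from `start` toward `end` (exclusive),
// in either direction depending on the sign of `step`.
fn arange(start: f64, end: f64, step: f64) -> Vec<f64> {
    assert!(step != 0.0, "step must be non-zero");
    let mut out = Vec::new();
    let mut v = start;
    while (step > 0.0 && v < end) || (step < 0.0 && v > end) {
        out.push(v);
        v += step;
    }
    out
}

fn main() {
    assert_eq!(arange(5.0, 0.0, -1.0), vec![5.0, 4.0, 3.0, 2.0, 1.0]);
    assert_eq!(arange(0.0, 3.0, 1.0), vec![0.0, 1.0, 2.0]);
    println!("{:?}", arange(5.0, 0.0, -1.0));
}
```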
* Add i64-abs. (#1216) | Laurent Mazare | 2023-10-29 | 2 | -1/+42
* Marian MT model (#1210) | Laurent Mazare | 2023-10-29 | 1 | -10/+10
  * Skeleton files for the marian MT model.
  * Marian initialization.
  * Implement the attention forward method.
  * Forward pass for the encoder side.
  * Expose the encoder and decoder.
  * Start plugging the decoder.
  * Forward pass for the decoder layer.
  * Set up the marian example.
  * Add some missing backtraces.
  * Bugfix.
* Fix the conv2d gradient computation. (#1214) | Laurent Mazare | 2023-10-29 | 2 | -0/+72
* Allow for different behavior between training and eval (#1213) | Laurent Mazare | 2023-10-29 | 2 | -0/+17
  * Forward with training.
  * Do not use dropout on vgg evaluation.
* No need for the even constraint on vecdot-q40-q80. (#1202) | Laurent Mazare | 2023-10-28 | 4 | -41/+2
* Add a quantized variant of llama2.c (#1197) | Laurent Mazare | 2023-10-27 | 2 | -28/+2
  * Clippy fixes.
* Add some missing backtraces. (#1193) | Laurent Mazare | 2023-10-27 | 1 | -6/+12
* Enable the test for meshgrid + fix the implementation. (#1175) | Laurent Mazare | 2023-10-25 | 1 | -24/+28
* Implemented meshgrid (#1174) | Wouter Doppenberg | 2023-10-25 | 2 | -0/+66
  * Resolved feedback from LaurentMazare
  * Rustfmt
  * Updated docstring
  * Removed outdated error mode from docstring
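The meshgrid op (#1174) expands two 1D inputs into a pair of 2D grids. A plain-Rust sketch of the "ij" indexing convention (the first output repeats `x` along rows, the second repeats `y` along columns); this is only an illustration of the semantics, candle's version operates on tensors:

```rust
// meshgrid with "ij" indexing over two 1D slices.
fn meshgrid_ij(x: &[f64], y: &[f64]) -> (Vec<Vec<f64>>, Vec<Vec<f64>>) {
    // xs[i][j] = x[i]: each row holds one x value, repeated y.len() times.
    let xs = x.iter().map(|&xi| vec![xi; y.len()]).collect();
    // ys[i][j] = y[j]: every row is a copy of y.
    let ys = x.iter().map(|_| y.to_vec()).collect();
    (xs, ys)
}

fn main() {
    let (xs, ys) = meshgrid_ij(&[1.0, 2.0], &[10.0, 20.0, 30.0]);
    assert_eq!(xs, vec![vec![1.0, 1.0, 1.0], vec![2.0, 2.0, 2.0]]);
    assert_eq!(ys, vec![vec![10.0, 20.0, 30.0], vec![10.0, 20.0, 30.0]]);
    println!("{xs:?}\n{ys:?}");
}
```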
* fix ucopy for `f64` tensors (#1170) | Ibiyemi Abiodun | 2023-10-24 | 1 | -1/+1
* derivative for GELU (#1160) | KGrewal1 | 2023-10-23 | 2 | -1/+22
  * add tests
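The GELU derivative entry (#1160) needs the gradient of the activation in closed form. As a sketch of the math, assuming the common tanh approximation of GELU (this is a textbook derivation, checked against a finite difference, not the candle source):

```rust
// Tanh-approximation GELU: 0.5 * x * (1 + tanh(c * (x + a * x^3))).
const C: f64 = 0.7978845608028654; // sqrt(2 / pi)
const A: f64 = 0.044715;

fn gelu(x: f64) -> f64 {
    0.5 * x * (1.0 + (C * (x + A * x * x * x)).tanh())
}

// Product rule on 0.5 * x * (1 + tanh u) with u = c * (x + a * x^3).
fn gelu_grad(x: f64) -> f64 {
    let u = C * (x + A * x * x * x);
    let t = u.tanh();
    0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * C * (1.0 + 3.0 * A * x * x)
}

fn main() {
    for &x in &[-2.0, -0.3, 0.0, 0.7, 1.5f64] {
        let eps = 1e-6;
        // Central finite difference as a sanity check on the analytic form.
        let numeric = (gelu(x + eps) - gelu(x - eps)) / (2.0 * eps);
        assert!((gelu_grad(x) - numeric).abs() < 1e-6);
    }
    println!("gelu'(1.5) = {}", gelu_grad(1.5));
}
```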
* Handle LongStorage in pytorch checkpoints. (#1152) | Laurent Mazare | 2023-10-22 | 1 | -0/+1
* Expose the track-op method. (#1148) | Laurent Mazare | 2023-10-22 | 1 | -1/+1
* Add get_on_dim. (#1142) | Laurent Mazare | 2023-10-21 | 1 | -0/+18
* Add pad_with_same. (#1127) | Laurent Mazare | 2023-10-18 | 2 | -0/+56
  * More model cloning.
  * More cloning on quantized models.
  * Add pad-with-same.
  * Add some tests.
* Better error message when overflowing in narrow. (#1119) | Laurent Mazare | 2023-10-18 | 1 | -9/+17
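The invariant behind the narrow error message in #1119 is that `start + len` must not exceed the dimension size, and the sum itself can wrap for large `usize` arguments, which is presumably the overflow the title refers to. A hedged sketch with hypothetical helper names (not candle's actual check):

```rust
// Validate a narrow(start, len) against a dimension of size dim_size.
// checked_add guards against start + len wrapping around usize::MAX.
fn check_narrow(dim_size: usize, start: usize, len: usize) -> Result<(), String> {
    match start.checked_add(len) {
        Some(end) if end <= dim_size => Ok(()),
        _ => Err(format!(
            "narrow out of range: start {start} + len {len} exceeds dim size {dim_size}"
        )),
    }
}

fn main() {
    assert!(check_narrow(10, 2, 8).is_ok());
    assert!(check_narrow(10, 2, 9).is_err());
    // Without checked_add, this addition would wrap instead of erroring.
    assert!(check_narrow(10, usize::MAX, 2).is_err());
    println!("narrow bounds checks passed");
}
```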
* Refactor the pth tensor extraction. (#1109) | Laurent Mazare | 2023-10-16 | 1 | -44/+48
* Read all the tensors in a PyTorch pth file. (#1106) | Laurent Mazare | 2023-10-16 | 1 | -0/+13
* Improve the reshape error messages. (#1096) | Laurent Mazare | 2023-10-15 | 1 | -68/+33
  * Add the verbose-prompt flag to the phi example.
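A good reshape error message, as in #1096, reports both shapes and both element counts; the underlying rule is simply that the counts must match. A minimal sketch with a hypothetical helper (not the candle API):

```rust
// A reshape is valid iff the source and target shapes describe the same
// number of elements.
fn check_reshape(from: &[usize], to: &[usize]) -> Result<(), String> {
    let n_from: usize = from.iter().product();
    let n_to: usize = to.iter().product();
    if n_from == n_to {
        Ok(())
    } else {
        // Report both shapes and counts so the mismatch is obvious.
        Err(format!(
            "cannot reshape {from:?} ({n_from} elements) into {to:?} ({n_to} elements)"
        ))
    }
}

fn main() {
    assert!(check_reshape(&[2, 3, 4], &[6, 4]).is_ok());
    assert!(check_reshape(&[2, 3, 4], &[5, 5]).is_err());
    println!("reshape checks passed");
}
```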
* Avoid trying to backprop through non-differentiable layers. (#1094) | Laurent Mazare | 2023-10-14 | 1 | -2/+12
* Create a new curand instead of reseeding. (#1089) | Laurent Mazare | 2023-10-14 | 2 | -1/+10
* Fix the npy read function and add some testing. (#1080) | Laurent Mazare | 2023-10-12 | 5 | -2/+33
* Use full tensors for zeros and ones (#1071) | Laurent Mazare | 2023-10-11 | 1 | -16/+6
  * Only optimize float tensors.
* Only optimize float tensors. (#1069) | Laurent Mazare | 2023-10-10 | 1 | -0/+14
* Remove some unused bits. (#1067) | Laurent Mazare | 2023-10-09 | 1 | -2/+1
* Make the cuda rng seedable. (#1056) | Laurent Mazare | 2023-10-08 | 4 | -0/+16
* Better control on the optional dequantization in QMatMul (#1049) | Laurent Mazare | 2023-10-07 | 1 | -7/+28
  * Cosmetic change to the quantized whisper model.
  * Fix the dequantization.
  * Add the dequantize all variable.
* Add the round-to function. (#1039) | Laurent Mazare | 2023-10-05 | 2 | -0/+18
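Rounding to a given number of digits, as the round-to function in #1039 does, is usually implemented as scale, round to the nearest integer, scale back. A sketch of those semantics (an illustrative helper, not candle's implementation):

```rust
// Round x to `digits` decimal places: scale up, round, scale down.
fn round_to(x: f64, digits: i32) -> f64 {
    let mult = 10f64.powi(digits);
    (x * mult).round() / mult
}

fn main() {
    assert_eq!(round_to(3.14159, 2), 3.14);
    assert_eq!(round_to(2.71828, 0), 3.0);
    println!("{}", round_to(3.14159, 2));
}
```

Note that this inherits f64 rounding behavior: `.round()` in Rust rounds halfway cases away from zero.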
* fix: fix index_select cuda kernel for src target dim different than ids dim when selecting dim > 0 (#1037) | Gonzalo | 2023-10-05 | 2 | -2/+13
  * cargo fmt
* Add the rounding operators. (#1030) | Laurent Mazare | 2023-10-04 | 4 | -0/+133
  * Avoid tracking gradients for the rounding operations.
  * Add some rounding tests.
* Simd128 optimized q8k vecdot. (#1026) | Laurent Mazare | 2023-10-03 | 2 | -0/+33
* AVX optimized q8k vecdot. (#1024) | Laurent Mazare | 2023-10-03 | 2 | -0/+35
* Fix for the index-select cuda setup. (#1022) | Laurent Mazare | 2023-10-03 | 2 | -1/+16
  * Better fix + add some testing.
* neon optimized q8k multiplication. (#1021) | Laurent Mazare | 2023-10-02 | 2 | -3/+36
  * Bugfixes.
  * Simdification.
* Add the q8k vec-dot multiplication. (#1019) | Laurent Mazare | 2023-10-02 | 2 | -2/+46
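The q8k vec-dot added in #1019 (and then simd-optimized for simd128, AVX, and neon in the entries above) is a blockwise quantized dot product: integer values are multiplied and accumulated inside each block, and the per-block float scales are applied once per block. A toy sketch of that structure; the block size and layout here are illustrative, not the actual ggml `q8_K` format:

```rust
// Toy blockwise quantized vectors: i8 values plus one f32 scale per block.
const BLOCK: usize = 4;

struct QBlock {
    scale: f32,
    qs: [i8; BLOCK],
}

// dot(a, b) = sum over blocks of scale_a * scale_b * (integer dot of qs).
fn vec_dot(a: &[QBlock], b: &[QBlock]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            // Integer accumulation inside the block, one float multiply outside:
            // this is what the simd versions vectorize.
            let isum: i32 = x.qs.iter().zip(&y.qs).map(|(&p, &q)| p as i32 * q as i32).sum();
            x.scale * y.scale * isum as f32
        })
        .sum()
}

fn main() {
    let a = [QBlock { scale: 0.5, qs: [1, 2, 3, 4] }];
    let b = [QBlock { scale: 2.0, qs: [4, 3, 2, 1] }];
    // isum = 4 + 6 + 6 + 4 = 20; 0.5 * 2.0 * 20 = 20
    assert_eq!(vec_dot(&a, &b), 20.0);
    println!("{}", vec_dot(&a, &b));
}
```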
* Improve the quantized whisper setup. (#1018) | Laurent Mazare | 2023-10-02 | 2 | -17/+26
  * Fix the config file paths.
  * Use the standard matmul where possible.
* Improve the testing of the optimized quantized vec-dot ops (#1016) | Laurent Mazare | 2023-10-02 | 2 | -5/+68
  * Expose the unopt functions for testing.
  * Better testing of the optimized quantized computations.
* Simd128 version of q6k vec-dot. (#1015) | Laurent Mazare | 2023-10-01 | 2 | -1/+127
  * Add a specific function for the simd128 q6k vec-dot.
  * Simdification.
  * More simdification.
* Bump the version to 0.3.0. (#1014) | Laurent Mazare | 2023-10-01 | 1 | -1/+1
  * Changelog update.