| Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|
| Fix the fast bf16 gemm cublas kernels. (#2274) | Laurent Mazare | 2024-06-18 | 1 | -1/+4 |
| Make it possible to use TF32 accumulation in F32 matmuls. (#2178) | Laurent Mazare | 2024-05-11 | 1 | -24/+18 |
| Cuda kernel for dequantizing q8k. (#1760) | Laurent Mazare | 2024-02-26 | 1 | -4/+4 |
| Cuda acceleration for quantized model. (#1754) | Laurent Mazare | 2024-02-25 | 1 | -16/+23 |
| Dilated convolutions (#657) | Laurent Mazare | 2023-08-29 | 1 | -3/+3 |
| Add to the cuda example a reproduction of the issue. (#579) | Laurent Mazare | 2023-08-24 | 1 | -2/+11 |
| Add a test for conv2d with padding + bugfix the random number generation on c... | Laurent Mazare | 2023-08-24 | 1 | -0/+3 |
| Add some group parameter to convolutions. (#566) | Laurent Mazare | 2023-08-23 | 1 | -1/+1 |
| Cudnn support (#445) | Laurent Mazare | 2023-08-14 | 1 | -5/+4 |
| More accelerate optimizations (#427) | Laurent Mazare | 2023-08-13 | 1 | -0/+3 |
| Rename the candle crate to candle-core (#301) | Laurent Mazare | 2023-08-02 | 1 | -1/+1 |
| Simplify the parameters used by sum and sum_keepdim. (#165) | Laurent Mazare | 2023-07-14 | 1 | -2/+2 |
| Use the same default as pytorch for sum. (#164) | Laurent Mazare | 2023-07-13 | 1 | -2/+2 |
| Sketch a fast cuda kernel for reduce-sum. (#109) | Laurent Mazare | 2023-07-08 | 1 | -0/+15 |
| Add some very simple sum benchmark. (#108) | Laurent Mazare | 2023-07-08 | 1 | -34/+0 |
| Add mkl support for matrix multiply. (#86) | Laurent Mazare | 2023-07-06 | 1 | -0/+3 |
| Refactor the hierarchy. | Nicolas Patry | 2023-06-27 | 1 | -0/+31 |