path: root/candle-core/examples/cuda_basics.rs
| Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|
| Fix the fast bf16 gemm cublas kernels. (#2274) | Laurent Mazare | 2024-06-18 | 1 | -1/+4 |
| Make it possible to use TF32 accumulation in F32 matmuls. (#2178) | Laurent Mazare | 2024-05-11 | 1 | -24/+18 |
| Cuda kernel for dequantizing q8k. (#1760) | Laurent Mazare | 2024-02-26 | 1 | -4/+4 |
| Cuda acceleration for quantized model. (#1754) | Laurent Mazare | 2024-02-25 | 1 | -16/+23 |
| Dilated convolutions (#657) | Laurent Mazare | 2023-08-29 | 1 | -3/+3 |
| Add to the cuda example a reproduction of the issue. (#579) | Laurent Mazare | 2023-08-24 | 1 | -2/+11 |
| Add a test for conv2d with padding + bugfix the random number generation on c... | Laurent Mazare | 2023-08-24 | 1 | -0/+3 |
| Add some group parameter to convolutions. (#566) | Laurent Mazare | 2023-08-23 | 1 | -1/+1 |
| Cudnn support (#445) | Laurent Mazare | 2023-08-14 | 1 | -5/+4 |
| More accelerate optimizations (#427) | Laurent Mazare | 2023-08-13 | 1 | -0/+3 |
| Rename the candle crate to candle-core (#301) | Laurent Mazare | 2023-08-02 | 1 | -1/+1 |
| Simplify the parameters used by sum and sum_keepdim. (#165) | Laurent Mazare | 2023-07-14 | 1 | -2/+2 |
| Use the same default as pytorch for sum. (#164) | Laurent Mazare | 2023-07-13 | 1 | -2/+2 |
| Sketch a fast cuda kernel for reduce-sum. (#109) | Laurent Mazare | 2023-07-08 | 1 | -0/+15 |
| Add some very simple sum benchmark. (#108) | Laurent Mazare | 2023-07-08 | 1 | -34/+0 |
| Add mkl support for matrix multiply. (#86) | Laurent Mazare | 2023-07-06 | 1 | -0/+3 |
| Refactor the hierarchy. | Nicolas Patry | 2023-06-27 | 1 | -0/+31 |