path: root/candle-core/src/quantized/cuda.rs
Commit message | Author | Date | Files | Lines
* Clippy fixes for the cuda feature. (#2650) | Laurent Mazare | 2024-11-29 | 1 | -1/+1
* Cuda quantized mmv bugfix. (#2526) | Laurent Mazare | 2024-10-01 | 1 | -1/+25
* Yet another cuda qmm padding fix. (#2509) | Laurent Mazare | 2024-09-30 | 1 | -25/+55
* Add the cuda dequantize f16 kernels. (#2137) | Laurent Mazare | 2024-04-28 | 1 | -13/+75
* Add more QMMV cuda kernels. (#2077) | Laurent Mazare | 2024-04-18 | 1 | -8/+10
* Add the mmv kernels for small batch sizes. (#2075) | Laurent Mazare | 2024-04-16 | 1 | -18/+46
* Fix for the batch dim in the quantized matmul example. (#2073) | Laurent Mazare | 2024-04-15 | 1 | -1/+1
* Add a function to clear the KV cache in falcon. (#2066) | Laurent Mazare | 2024-04-15 | 1 | -0/+1
* Faster kernels for quantized matmul on cuda (#2060) | Laurent Mazare | 2024-04-15 | 1 | -6/+137
* Quantized cuda tweaks. (#1981) | Laurent Mazare | 2024-04-01 | 1 | -89/+62
* Switch the default to using the faster kernels. (#1978) | Laurent Mazare | 2024-04-01 | 1 | -1/+1
* More ggml cuda kernels (#1977) | Laurent Mazare | 2024-04-01 | 1 | -7/+147
* Properly handle the batch dimension in cuda quantized matmul. (#1832) | Laurent Mazare | 2024-03-10 | 1 | -1/+1
* Handle Q5_0 and Q5_1 quants in cuda. | laurent | 2024-02-29 | 1 | -16/+38
* Fix the block size for some cuda kernels. (#1767) | Laurent Mazare | 2024-02-27 | 1 | -13/+15
* Cuda kernel for dequantizing q8k. (#1760) | Laurent Mazare | 2024-02-26 | 1 | -18/+16
* Cuda acceleration for quantized model. (#1754) | Laurent Mazare | 2024-02-25 | 1 | -0/+321