path: root/candle-core/src/quantized
Commit message | Author | Date | Files | Lines (-/+)
* Add a Context trait similar to anyhow::Context. (#2676) | Laurent Mazare | 2024-12-22 | 2 files | -4/+4
* Clippy fixes for the cuda feature. (#2650) | Laurent Mazare | 2024-11-29 | 1 file | -1/+1
* Lint fixes introduced with Rust 1.83 (#2646) | Anubhab Bandyopadhyay | 2024-11-28 | 2 files | -3/+3
* 20241118 docs (#2629) | zachcp | 2024-11-19 | 3 files | -3/+3
* Cuda quantized mmv bugfix. (#2526) | Laurent Mazare | 2024-10-01 | 1 file | -1/+25
* Yet another cuda qmm padding fix. (#2509) | Laurent Mazare | 2024-09-30 | 1 file | -25/+55
* Automatically upcast for to_u64 (#2244) | Eric Buehler | 2024-06-04 | 1 file | -1/+7
* Bump the version number to 0.5.1. (#2155) | Laurent Mazare | 2024-05-03 | 1 file | -1/+0
* Add a forward_via_f16 method to the qmatmul op. (#2138) | Laurent Mazare | 2024-04-28 | 1 file | -0/+19
* Add the cuda dequantize f16 kernels. (#2137) | Laurent Mazare | 2024-04-28 | 3 files | -17/+122
* Handle multiple dimensions in metal QMM + two fixes. (#2097) | Laurent Mazare | 2024-04-20 | 1 file | -15/+20
* Add more QMMV cuda kernels. (#2077) | Laurent Mazare | 2024-04-18 | 1 file | -8/+10
* Add the mmv kernels for small batch sizes. (#2075) | Laurent Mazare | 2024-04-16 | 1 file | -18/+46
* Fix for the batch dim in the quantized matmul example. (#2073) | Laurent Mazare | 2024-04-15 | 1 file | -1/+1
* Add a function to clear the KV cache in falcon. (#2066) | Laurent Mazare | 2024-04-15 | 1 file | -0/+1
* Faster kernels for quantized matmul on cuda (#2060) | Laurent Mazare | 2024-04-15 | 1 file | -6/+137
* Handle the batch dimension in quantized MMV on metal. (#2022) | Laurent Mazare | 2024-04-06 | 1 file | -1/+4
* Quantized cuda tweaks. (#1981) | Laurent Mazare | 2024-04-01 | 1 file | -89/+62
* Switch the default to using the faster kernels. (#1978) | Laurent Mazare | 2024-04-01 | 1 file | -1/+1
* More ggml cuda kernels (#1977) | Laurent Mazare | 2024-04-01 | 1 file | -7/+147
* Properly handle the batch dimension in cuda quantized matmul. (#1832) | Laurent Mazare | 2024-03-10 | 1 file | -1/+1
* Fix dequantization. (#1823) | Laurent Mazare | 2024-03-08 | 1 file | -1/+1
* Improve metal buffer usage (#1807) | ivarflakstad | 2024-03-07 | 1 file | -2/+7
* Handle Q5_0 and Q5_1 quants in cuda. | laurent | 2024-02-29 | 1 file | -16/+38
* Fix the block size for some cuda kernels. (#1767) | Laurent Mazare | 2024-02-27 | 1 file | -13/+15
* Cuda kernel for dequantizing q8k. (#1760) | Laurent Mazare | 2024-02-26 | 1 file | -18/+16
* Cuda acceleration for quantized model. (#1754) | Laurent Mazare | 2024-02-25 | 6 files | -48/+430
* Qmetal tweaks (#1704) | Laurent Mazare | 2024-02-13 | 3 files | -100/+141
* Fixing quantized llama demo on metal. (#1703) | Nicolas Patry | 2024-02-13 | 3 files | -0/+19
* Quantized GGUF style (#1523) | Nicolas Patry | 2024-01-17 | 4 files | -82/+485
* Bugfix for dequantizing q5k layers. (#1569) | Laurent Mazare | 2024-01-11 | 1 file | -4/+4
* Simpler repro for the neon optimization issue + bugfix (#1544) | Laurent Mazare | 2024-01-07 | 1 file | -152/+56
* Fix the quantized mistral example. (#1478) | Laurent Mazare | 2023-12-25 | 1 file | -1/+1
* Fix a couple typos (#1451) | Laurent Mazare | 2023-12-17 | 2 files | -3/+3
* Implement the module trait directly for QMatMul. (#1372) | Laurent Mazare | 2023-11-25 | 1 file | -2/+2
* Allow using gguf-v3 files. (#1262) | Laurent Mazare | 2023-11-03 | 1 file | -5/+15
* No need for the even constraint on vecdot-q40-q80. (#1202) | Laurent Mazare | 2023-10-28 | 4 files | -41/+2
* Add a quantized variant of llama2.c (#1197) | Laurent Mazare | 2023-10-27 | 2 files | -28/+2
* Better control on the optional dequantization in QMatMul (#1049) | Laurent Mazare | 2023-10-07 | 1 file | -7/+28
* Simd128 optimized q8k vecdot. (#1026) | Laurent Mazare | 2023-10-03 | 2 files | -0/+33
* AVX optimized q8k vecdot. (#1024) | Laurent Mazare | 2023-10-03 | 2 files | -0/+35
* neon optimized q8k multiplication. (#1021) | Laurent Mazare | 2023-10-02 | 2 files | -3/+36
* Add the q8k vec-dot multiplication. (#1019) | Laurent Mazare | 2023-10-02 | 1 file | -2/+18
* Improve the quantized whisper setup. (#1018) | Laurent Mazare | 2023-10-02 | 1 file | -10/+19
* Improve the testing of the optimized quantized vec-dot ops (#1016) | Laurent Mazare | 2023-10-02 | 1 file | -2/+60
* Simd128 version of q6k vec-dot. (#1015) | Laurent Mazare | 2023-10-01 | 2 files | -1/+127
* Simd128 version of the q2k-q8k vecdot product. (#1011) | Laurent Mazare | 2023-09-30 | 2 files | -45/+75
* Simd128 q2k vecdot (#982) | Laurent Mazare | 2023-09-28 | 2 files | -4/+57
* Sketch a simd128 optimized q4k vecdot. (#977) | Laurent Mazare | 2023-09-27 | 2 files | -1/+97
* Simd128 vec-dot for q4_0. (#974) | Laurent Mazare | 2023-09-27 | 2 files | -1/+54