path: root/candle-core/src/quantized/mod.rs
Commit message | Author | Date | Files | Lines (-deleted/+added)
* Add a Context trait similar to anyhow::Context. (#2676) | Laurent Mazare | 2024-12-22 | 1 | -2/+2
* 20241118 docs (#2629) | zachcp | 2024-11-19 | 1 | -0/+1
* Add a forward_via_f16 method to the qmatmul op. (#2138) | Laurent Mazare | 2024-04-28 | 1 | -0/+19
* Add the cuda dequantize f16 kernels. (#2137) | Laurent Mazare | 2024-04-28 | 1 | -4/+43
* Fix dequantization. (#1823) | Laurent Mazare | 2024-03-08 | 1 | -1/+1
* Cuda acceleration for quantized model. (#1754) | Laurent Mazare | 2024-02-25 | 1 | -4/+32
* Qmetal tweaks (#1704) | Laurent Mazare | 2024-02-13 | 1 | -91/+12
* Fixing quantized llama demo on metal. (#1703) | Nicolas Patry | 2024-02-13 | 1 | -0/+12
* Quantized GGUF style (#1523) | Nicolas Patry | 2024-01-17 | 1 | -48/+254
* Implement the module trait directly for QMatMul. (#1372) | Laurent Mazare | 2023-11-25 | 1 | -2/+2
* Better control on the optional dequantization in QMatMul (#1049) | Laurent Mazare | 2023-10-07 | 1 | -7/+28
* Improve the quantized whisper setup. (#1018) | Laurent Mazare | 2023-10-02 | 1 | -10/+19
* simd128 optimized q8_0 vecdot (#972) | Laurent Mazare | 2023-09-27 | 1 | -0/+2
* Add a quantized version of the t5 model. (#921) | Laurent Mazare | 2023-09-21 | 1 | -1/+1
* Support for quantized tensors in the python api. (#706) | Laurent Mazare | 2023-09-01 | 1 | -3/+11
* Llama quantization. (#625) | Laurent Mazare | 2023-08-27 | 1 | -0/+4
* Add a function to write gguf files. (#585) | Laurent Mazare | 2023-08-24 | 1 | -1/+38
* Preliminary GGUF support. (#557) | Laurent Mazare | 2023-08-23 | 1 | -0/+1
* Add quantization support for `q2k`, `q3k`, `q4k` and `q5k` (#524) | Lukas Kreussel | 2023-08-22 | 1 | -0/+1
* Neon support for quantization. (#519) | Laurent Mazare | 2023-08-19 | 1 | -0/+2
* Add a simple Module trait and implement it for the various nn layers (#500) | Laurent Mazare | 2023-08-18 | 1 | -0/+1
* Tensor -> QTensor conversion (#496) | Laurent Mazare | 2023-08-18 | 1 | -3/+40
* Relax the requirements on CustomOp. (#486) | Laurent Mazare | 2023-08-17 | 1 | -3/+3
* Move the avx specific bits to a separate file. (#481) | Laurent Mazare | 2023-08-17 | 1 | -0/+2
* Get the ggml based llama to generate some text. (#464) | Laurent Mazare | 2023-08-16 | 1 | -13/+18
* Add quantized tensors. (#458) | Laurent Mazare | 2023-08-15 | 1 | -1/+113
* Split out the quantized file. (#456) | Laurent Mazare | 2023-08-15 | 1 | -0/+82