path: root/candle-core/src/quantized
Commit message | Author | Date | Files | Lines (-/+)
... | | | |
simd128 optimized q8_0 vecdot (#972) | Laurent Mazare | 2023-09-27 | 3 | -0/+54
Use the gelu-erf activation. (#969) | Laurent Mazare | 2023-09-26 | 1 | -3/+3
Avoid some overflows on wasm32. (#968) | Laurent Mazare | 2023-09-26 | 2 | -3/+14
Add a quantized version of the t5 model. (#921) | Laurent Mazare | 2023-09-21 | 1 | -1/+1
Fix some errors about BlockQ8_1 (#776) | zmlcc | 2023-09-08 | 1 | -3/+5
Add `ggufv2` support (#725) | Lukas Kreussel | 2023-09-03 | 1 | -21/+97
Support for quantized tensors in the python api. (#706) | Laurent Mazare | 2023-09-01 | 1 | -3/+11
Small cleanups (avoid some possible mutations) (#670) | Laurent Mazare | 2023-08-30 | 1 | -99/+59
Neon optimized vecdot (#666) | Laurent Mazare | 2023-08-29 | 2 | -8/+369
Add `avx` implementations of `q2k`, `q3k` and `q5k` vec-dot functions (#654) | Lukas Kreussel | 2023-08-29 | 2 | -8/+403
AVX version of the q4k vecdot. (#651) | Laurent Mazare | 2023-08-29 | 2 | -9/+120
Neon optimized version of the q4k vecdot product. (#632) | Laurent Mazare | 2023-08-27 | 2 | -1/+99
Llama quantization. (#625) | Laurent Mazare | 2023-08-27 | 1 | -0/+4
Add the quantize command. (#624) | Laurent Mazare | 2023-08-27 | 1 | -1/+2
Fix for q5_1 quantization. (#617) | Laurent Mazare | 2023-08-27 | 1 | -1/+1
Quantization tests + fix some issues. (#616) | Laurent Mazare | 2023-08-27 | 1 | -6/+6
More missing quantized bits. (#615) | Laurent Mazare | 2023-08-27 | 1 | -7/+94
Missing quants ops (#611) | Laurent Mazare | 2023-08-26 | 1 | -13/+123
Another transmute tweak. (#610) | Laurent Mazare | 2023-08-26 | 1 | -20/+19
Avoid using tmp values. (#609) | Laurent Mazare | 2023-08-26 | 1 | -20/+8
Add reference implementation for `q4k` and `q5k` (#586) | Lukas Kreussel | 2023-08-26 | 1 | -4/+177
Avoid some transmutes. (#607) | Laurent Mazare | 2023-08-25 | 1 | -10/+5
Neon intrinsics for the q8_0 vecdot. (#604) | Laurent Mazare | 2023-08-25 | 2 | -0/+64
AVX version for the q8-0 multiplications. (#598) | Laurent Mazare | 2023-08-25 | 2 | -1/+23
Generic implementation of vecdot for q80. (#596) | Laurent Mazare | 2023-08-25 | 1 | -2/+18
Add a function to write gguf files. (#585) | Laurent Mazare | 2023-08-24 | 2 | -4/+163
Reference implementations of `q2k` and `q3k` vec-dot functions (#580) | Lukas Kreussel | 2023-08-24 | 1 | -7/+179
GGUF support in the quantized model. (#559) | Laurent Mazare | 2023-08-23 | 1 | -2/+88
Handle GGUF files in tensor-tools. (#558) | Laurent Mazare | 2023-08-23 | 1 | -2/+10
Preliminary GGUF support. (#557) | Laurent Mazare | 2023-08-23 | 2 | -0/+221
Avoid some mutable variables (take 2). (#554) | Laurent Mazare | 2023-08-22 | 2 | -37/+29
Revert "Avoid some mut in quantized functions. (#550)" (#552) | Laurent Mazare | 2023-08-22 | 2 | -30/+39
Avoid some mut in quantized functions. (#550) | Laurent Mazare | 2023-08-22 | 2 | -39/+30
Add quantization support for `q2k`, `q3k`, `q4k` and `q5k` (#524) | Lukas Kreussel | 2023-08-22 | 3 | -399/+901
Neon support for quantization. (#519) | Laurent Mazare | 2023-08-19 | 3 | -0/+228
Basic `qmatmul` parallelization (#492) | Lukas Kreussel | 2023-08-18 | 1 | -5/+15
Add a simple Module trait and implement it for the various nn layers (#500) | Laurent Mazare | 2023-08-18 | 1 | -0/+1
Tensor -> QTensor conversion (#496) | Laurent Mazare | 2023-08-18 | 2 | -4/+41
Q6K quantization (#495) | Laurent Mazare | 2023-08-17 | 1 | -2/+207
AVX version of the q6k vec-dot. (#493) | Laurent Mazare | 2023-08-17 | 2 | -1/+104
Relax the requirements on CustomOp. (#486) | Laurent Mazare | 2023-08-17 | 1 | -3/+3
Move the avx specific bits to a separate file. (#481) | Laurent Mazare | 2023-08-17 | 3 | -116/+119
AVX version of the vecdot for q4_0. (#474) | Laurent Mazare | 2023-08-17 | 1 | -0/+75
Add vecdot for q6k-q8k. (#476) | Laurent Mazare | 2023-08-16 | 1 | -2/+56
Use a zipped iterator. (#475) | Laurent Mazare | 2023-08-16 | 1 | -11/+54
Add a kv-cache to the quantized llama example. (#466) | Laurent Mazare | 2023-08-16 | 1 | -4/+4
Get the ggml based llama to generate some text. (#464) | Laurent Mazare | 2023-08-16 | 3 | -23/+39
Add quantized tensors. (#458) | Laurent Mazare | 2023-08-15 | 2 | -106/+139
Quantized support for f16 and f32 (#457) | Laurent Mazare | 2023-08-15 | 1 | -0/+74
Split out the quantized file. (#456) | Laurent Mazare | 2023-08-15 | 3 | -0/+1104