path: root/candle-core/src/quantized
Commit message | Author | Age | Files | Lines
* Simpler repro for the neon optimization issue + bugfix (#1544) | Laurent Mazare | 2024-01-07 | 1 | -152/+56
* Fix the quantized mistral example. (#1478) | Laurent Mazare | 2023-12-25 | 1 | -1/+1
* Fix a couple typos (#1451) | Laurent Mazare | 2023-12-17 | 2 | -3/+3
* Implement the module trait directly for QMatMul. (#1372) | Laurent Mazare | 2023-11-25 | 1 | -2/+2
* Allow using gguf-v3 files. (#1262) | Laurent Mazare | 2023-11-03 | 1 | -5/+15
* No need for the even constraint on vecdot-q40-q80. (#1202) | Laurent Mazare | 2023-10-28 | 4 | -41/+2
* Add a quantized variant of llama2.c (#1197) | Laurent Mazare | 2023-10-27 | 2 | -28/+2
* Better control on the optional dequantization in QMatMul (#1049) | Laurent Mazare | 2023-10-07 | 1 | -7/+28
* Simd128 optimized q8k vecdot. (#1026) | Laurent Mazare | 2023-10-03 | 2 | -0/+33
* AVX optimized q8k vecdot. (#1024) | Laurent Mazare | 2023-10-03 | 2 | -0/+35
* neon optimized q8k multiplication. (#1021) | Laurent Mazare | 2023-10-02 | 2 | -3/+36
* Add the q8k vec-dot multiplication. (#1019) | Laurent Mazare | 2023-10-02 | 1 | -2/+18
* Improve the quantized whisper setup. (#1018) | Laurent Mazare | 2023-10-02 | 1 | -10/+19
* Improve the testing of the optimized quantized vec-dot ops (#1016) | Laurent Mazare | 2023-10-02 | 1 | -2/+60
* Simd128 version of q6k vec-dot. (#1015) | Laurent Mazare | 2023-10-01 | 2 | -1/+127
* Simd128 version of the q2k-q8k vecdot product. (#1011) | Laurent Mazare | 2023-09-30 | 2 | -45/+75
* Simd128 q2k vecdot (#982) | Laurent Mazare | 2023-09-28 | 2 | -4/+57
* Sketch a simd128 optimized q4k vecdot. (#977) | Laurent Mazare | 2023-09-27 | 2 | -1/+97
* Simd128 vec-dot for q4_0. (#974) | Laurent Mazare | 2023-09-27 | 2 | -1/+54
* simd128 optimized q8_0 vecdot (#972) | Laurent Mazare | 2023-09-27 | 3 | -0/+54
* Use the gelu-erf activation. (#969) | Laurent Mazare | 2023-09-26 | 1 | -3/+3
* Avoid some overflows on wasm32. (#968) | Laurent Mazare | 2023-09-26 | 2 | -3/+14
* Add a quantized version of the t5 model. (#921) | Laurent Mazare | 2023-09-21 | 1 | -1/+1
* Fix some errors about BlockQ8_1 (#776) | zmlcc | 2023-09-08 | 1 | -3/+5
* Add `ggufv2` support (#725) | Lukas Kreussel | 2023-09-03 | 1 | -21/+97
* Support for quantized tensors in the python api. (#706) | Laurent Mazare | 2023-09-01 | 1 | -3/+11
* Small cleanups (avoid some possible mutations) (#670) | Laurent Mazare | 2023-08-30 | 1 | -99/+59
* Neon optimized vecdot (#666) | Laurent Mazare | 2023-08-29 | 2 | -8/+369
* Add `avx` implemenetations of `q2k`, `q3k` and `q5k` vec-dot functions (#654) | Lukas Kreussel | 2023-08-29 | 2 | -8/+403
* AVX version of the q4k vecdot. (#651) | Laurent Mazare | 2023-08-29 | 2 | -9/+120
* Neon optimized version of the q4k vecdot product. (#632) | Laurent Mazare | 2023-08-27 | 2 | -1/+99
* Llama quantization. (#625) | Laurent Mazare | 2023-08-27 | 1 | -0/+4
* Add the quantize command. (#624) | Laurent Mazare | 2023-08-27 | 1 | -1/+2
* Fix for q5_1 quantization. (#617) | Laurent Mazare | 2023-08-27 | 1 | -1/+1
* Quantization tests + fix some issues. (#616) | Laurent Mazare | 2023-08-27 | 1 | -6/+6
* More missing quantized bits. (#615) | Laurent Mazare | 2023-08-27 | 1 | -7/+94
* Missing quants ops (#611) | Laurent Mazare | 2023-08-26 | 1 | -13/+123
* Another transmute tweak. (#610) | Laurent Mazare | 2023-08-26 | 1 | -20/+19
* Avoid using tmp values. (#609) | Laurent Mazare | 2023-08-26 | 1 | -20/+8
* Add reference implementation for `q4k` and `q5k` (#586) | Lukas Kreussel | 2023-08-26 | 1 | -4/+177
* Avoid some transmutes. (#607) | Laurent Mazare | 2023-08-25 | 1 | -10/+5
* Neon intrinsics for the q8_0 vecdot. (#604) | Laurent Mazare | 2023-08-25 | 2 | -0/+64
* AVX version for the q8-0 multiplications. (#598) | Laurent Mazare | 2023-08-25 | 2 | -1/+23
* Generic implementation of vecdot for q80. (#596) | Laurent Mazare | 2023-08-25 | 1 | -2/+18
* Add a function to write gguf files. (#585) | Laurent Mazare | 2023-08-24 | 2 | -4/+163
* Referenze implementations of `q2k` and `q3k` vec-dot functions (#580) | Lukas Kreussel | 2023-08-24 | 1 | -7/+179
* GGUF support in the quantized model. (#559) | Laurent Mazare | 2023-08-23 | 1 | -2/+88
* Handle GGUF files in tensor-tools. (#558) | Laurent Mazare | 2023-08-23 | 1 | -2/+10
* Preliminary GGUF support. (#557) | Laurent Mazare | 2023-08-23 | 2 | -0/+221
* Avoid some mutable variables (take 2). (#554) | Laurent Mazare | 2023-08-22 | 2 | -37/+29