path: root/candle-examples/examples/quantized
Commit message [Author, Date, Files changed, Lines -/+]
* Add the SmolLM2 models. (#2595) [Laurent Mazare, 2024-11-03, 1 file, -1/+24]
* Force the revision for the phi3-llama quantized models. (#2159) [Laurent Mazare, 2024-05-04, 1 file, -2/+11]
* Add a toggle for F16/BF16 accumulation in gemm. (#2141) [Laurent Mazare, 2024-04-29, 1 file, -0/+3]
* Add the phi-v3 quantized model. (#2118) [Laurent Mazare, 2024-04-24, 1 file, -24/+35]
* Add support for llama3 on the quantized example (#2086) [Thomas Santerre, 2024-04-18, 1 file, -8/+23]
* Include topk sampling in the quantized example. (#2005) [Laurent Mazare, 2024-04-04, 1 file, -7/+19]
* Switch the default to using the faster kernels. (#1978) [Laurent Mazare, 2024-04-01, 1 file, -3/+3]
* More ggml cuda kernels (#1977) [Laurent Mazare, 2024-04-01, 1 file, -0/+8]
* Add a flag to force running the quantized model on CPUs. (#1778) [Laurent Mazare, 2024-02-28, 1 file, -1/+5]
* Add an option to split the prompt. (#1766) [Laurent Mazare, 2024-02-27, 1 file, -1/+14]
* Quantized GGUF style (#1523) [Nicolas Patry, 2024-01-17, 1 file, -7/+9]
* Support mistral instruct v0.2. (#1475) [Laurent Mazare, 2023-12-23, 1 file, -4/+15]
* Mixtral quantized instruct. (#1447) [Laurent Mazare, 2023-12-16, 1 file, -0/+11]
* Update the readme to mention mixtral. (#1443) [Laurent Mazare, 2023-12-15, 1 file, -0/+13]
* Quantized mixtral model (#1442) [Laurent Mazare, 2023-12-15, 1 file, -1/+12]
* Add the leo models to the quantized examples. (#1398) [Laurent Mazare, 2023-12-03, 1 file, -31/+46]
* Add quantized Starling, fix open-chat prompt (#1393) [Lucas de Ávila Martins, 2023-12-02, 1 file, -6/+36]
* Fix OpenChat 3.5 tokenizer (#1347) [Lucas de Ávila Martins, 2023-11-19, 1 file, -1/+3]
* Add OpenChat 3.5 to quantized examples (#1346) [Lucas de Ávila Martins, 2023-11-19, 1 file, -7/+39]
* Fix quantized zephyr chat prompt (#1314) (#1317) [Michael Leandersson, 2023-11-11, 1 file, -2/+7]
* Quantized model small tweaks (#1290) [Laurent Mazare, 2023-11-07, 1 file, -39/+54]
* Adds check for 7b-zephyr and uses correct template (#1283) [DTJ11235, 2023-11-06, 1 file, -3/+6]
* Add support for Zephyr-7b in the quantized model. (#1124) [Laurent Mazare, 2023-10-18, 1 file, -2/+12]
* Fix the prompt for mistral when using instruct/interactive mode. (#1013) [Laurent Mazare, 2023-10-01, 1 file, -12/+31]
* Integrate TheBloke quantized mistral weights. (#1012) [Laurent Mazare, 2023-09-30, 1 file, -2/+26]
* Add a gif to the quantized readme. (#833) [Laurent Mazare, 2023-09-13, 2 files, -0/+2]
* Add more example readmes. (#828) [Laurent Mazare, 2023-09-12, 1 file, -1/+1]
* Implement top_p / nucleus sampling (#819) [Juarez Bochi, 2023-09-12, 1 file, -1/+5]
* Add a small readme for the quantized example. (#823) [Laurent Mazare, 2023-09-12, 1 file, -0/+35]
* Move more models to candle-transformers (#796) [Laurent Mazare, 2023-09-10, 2 files, -372/+1]
* Tweak some quantized args (#692) [Laurent Mazare, 2023-08-31, 1 file, -5/+14]
* Interactive mode for the quantized model. (#690) [Laurent Mazare, 2023-08-31, 2 files, -55/+109]
* Neon optimized vecdot (#666) [Laurent Mazare, 2023-08-29, 2 files, -364/+371]
* Remove some dead-code annotations. (#629) [Laurent Mazare, 2023-08-27, 1 file, -11/+0]
* Add some optional repeat penalty. (#623) [Laurent Mazare, 2023-08-27, 1 file, -17/+5]
* Generic implementation of vecdot for q80. (#596) [Laurent Mazare, 2023-08-25, 1 file, -5/+23]
* Get the rms epsilon from GGUF. (#565) [Laurent Mazare, 2023-08-23, 1 file, -8/+10]
* Fix the quantized example. (#564) [Laurent Mazare, 2023-08-23, 1 file, -2/+2]
* add chat models in quantized example (#551) [cksac, 2023-08-23, 1 file, -0/+18]
* GGUF support in the quantized model. (#559) [Laurent Mazare, 2023-08-23, 1 file, -45/+143]
* GQA support in the quantized model. (#555) [Laurent Mazare, 2023-08-22, 1 file, -5/+31]
* Add some llama-v2 variants. (#545) [Laurent Mazare, 2023-08-22, 1 file, -3/+22]
* Add some optional repeat penalty. (#535) [Laurent Mazare, 2023-08-21, 1 file, -0/+33]
* Add a yolo-v3 example. (#528) [Laurent Mazare, 2023-08-20, 1 file, -0/+6]
* Line up the llama.cpp implementation with the candle one. (#518) [Laurent Mazare, 2023-08-19, 1 file, -40/+78]
* Add a simple Module trait and implement it for the various nn layers (#500) [Laurent Mazare, 2023-08-18, 1 file, -1/+1]
* Q6K quantization (#495) [Laurent Mazare, 2023-08-17, 1 file, -0/+8]
* Add the whisper small model. (#490) [Laurent Mazare, 2023-08-17, 1 file, -1/+1]
* Add a verbose-prompt mode, similar to llama.cpp. (#489) [Laurent Mazare, 2023-08-17, 1 file, -5/+13]
* Layer norm tweaks (#482) [Laurent Mazare, 2023-08-17, 1 file, -18/+4]