path: root/candle-examples/examples/llama
Commit message | Author | Date | Files | Lines
* Add the SmolLM2 models. (#2595) | Laurent Mazare | 2024-11-03 | 1 | -14/+43
* Fix the repo name for llama 3.1. (#2576) | Laurent Mazare | 2024-10-26 | 1 | -2/+2
* Add some llama-3.2 examples. (#2508) | Laurent Mazare | 2024-09-26 | 1 | -1/+13
* Add support for Llama 3.1 (#2359) | Eric Buehler | 2024-07-26 | 1 | -6/+24
* Support top-k in the llama example. (#2150) | Laurent Mazare | 2024-05-01 | 1 | -3/+21
* Better time measurement for the llama example. (#2106) | Laurent Mazare | 2024-04-22 | 1 | -2/+5
* Use llama v3 by default + add to readme. (#2094) | Laurent Mazare | 2024-04-20 | 1 | -1/+1
* Also enable llama-v3 8b instruct. (#2088) | Laurent Mazare | 2024-04-19 | 1 | -1/+3
* Llama v3. (#2085) | Laurent Mazare | 2024-04-18 | 1 | -9/+13
* Make the cache for the llama model explicit too. (#1745) | Laurent Mazare | 2024-02-22 | 1 | -3/+3
* Use the tokenizer-output-stream in the llama example. (#1715) | Laurent Mazare | 2024-02-15 | 1 | -11/+9
* fix index_pos bug when kv cache is disabled. (#1517) | optman | 2024-01-06 | 1 | -4/+4
* Add support for tiny-llama-1.1b. (#1512) | Laurent Mazare | 2023-12-31 | 1 | -2/+9
* Rework the llama example config, add the solar model. (#1485) | Laurent Mazare | 2023-12-26 | 1 | -72/+36
* Adapt more examples to the updated safetensor api. (#947) | Laurent Mazare | 2023-09-23 | 1 | -9/+1
* Implement top_p / nucleus sampling (#819) | Juarez Bochi | 2023-09-12 | 1 | -1/+5
* Move some models to candle-transformers so that it's easier to re-use. (#794) | Laurent Mazare | 2023-09-10 | 2 | -448/+1
* Add some optional repeat penalty. (#623) | Laurent Mazare | 2023-08-27 | 1 | -0/+18
* s/panic/bail/ | Nicolas Patry | 2023-08-25 | 1 | -2/+2
* Adding support for codellama in examples. | Nicolas Patry | 2023-08-25 | 2 | -6/+26
* GQA support in the quantized model. (#555) | Laurent Mazare | 2023-08-22 | 1 | -1/+1
* Add a simple Module trait and implement it for the various nn layers (#500) | Laurent Mazare | 2023-08-18 | 1 | -1/+1
* Add an abstract type for RmsNorm. (#499) | Laurent Mazare | 2023-08-18 | 1 | -1/+1
* Layer norm tweaks (#482) | Laurent Mazare | 2023-08-17 | 1 | -19/+4
* Add some tracing to the quantized example. (#473) | Laurent Mazare | 2023-08-16 | 1 | -1/+0
* Fixing llamav1 | Nicolas Patry | 2023-08-16 | 1 | -2/+2
* Get the ggml based llama to generate some text. (#464) | Laurent Mazare | 2023-08-16 | 1 | -5/+1
* Clippy. | Nicolas Patry | 2023-08-16 | 1 | -5/+5
* Using the real config from the hub when available. | Nicolas Patry | 2023-08-16 | 2 | -43/+75
* Tweak the llama example. (#450) | Laurent Mazare | 2023-08-15 | 1 | -63/+14
* Support local weights & dynamic outputs (#447) | Guoqing Bao | 2023-08-15 | 1 | -15/+39
* Add a cuda kernel for upsampling. (#441) | Laurent Mazare | 2023-08-14 | 1 | -2/+2
* Remove the checkpoint conversion script. (#405) | Laurent Mazare | 2023-08-11 | 2 | -202/+0
* Support the Accelerate BLAS on macOS. (#325) | Laurent Mazare | 2023-08-05 | 1 | -0/+3
* Add some tracing to llama. (#318) | Laurent Mazare | 2023-08-03 | 2 | -4/+53
* Use u8 tensors for masks. (#273) | Laurent Mazare | 2023-07-29 | 1 | -2/+1
* Support both llama v1 and llama v2. (#272) | Laurent Mazare | 2023-07-28 | 2 | -2/+20
* Line-up the llama implementation with the python-transformers one. (#271) | Laurent Mazare | 2023-07-28 | 1 | -43/+28
* Softmax numerical stability. (#267) | Laurent Mazare | 2023-07-28 | 1 | -1/+1
* Upgrading hf-hub to `0.2.0` (Modified API to not pass the Repo around… | Nicolas Patry | 2023-07-27 | 1 | -4/+4
* Switch to using llama-v2 by default. (#251) | Laurent Mazare | 2023-07-26 | 1 | -4/+4
* Lining up the flash attn version with the non-flash one. (#248) | Laurent Mazare | 2023-07-26 | 1 | -11/+10
* Again set a few extra params in flash-attn. (#245) | Laurent Mazare | 2023-07-26 | 1 | -1/+5
* Proper flash-attn parameters. (#244) | Laurent Mazare | 2023-07-26 | 1 | -4/+12
* Better handling of dtypes in llama. (#243) | Laurent Mazare | 2023-07-26 | 2 | -13/+12
* Add flash attention (#241) | Laurent Mazare | 2023-07-26 | 2 | -8/+30
* Rename the .r functions to .dims so as to be a bit more explicit. (#220) | Laurent Mazare | 2023-07-22 | 1 | -6/+6
* Support for MQA for llama v2. (#205) | Laurent Mazare | 2023-07-20 | 2 | -109/+122
* Removing `candle-hub` internal to extract into `hf-hub` standalone. | Nicolas Patry | 2023-07-19 | 1 | -1/+1
* Add some 'cuda-if-available' helper function. (#172) | Laurent Mazare | 2023-07-15 | 1 | -14/+1