path: root/candle-examples/examples/llama
Commit message | Author | Date | Files | Lines
* Add the SmolLM2 models. (#2595) | Laurent Mazare | 2024-11-03 | 1 file | -14/+43
  - Add the SmolLM2 models.
  - More SmolLM2 support.
* Fix the repo name for llama 3.1. (#2576) | Laurent Mazare | 2024-10-26 | 1 file | -2/+2
  - Fix the repo name for llama 3.1.
  - Fix the book.
* Add some llama-3.2 examples. (#2508) | Laurent Mazare | 2024-09-26 | 1 file | -1/+13
  - Add some llama-3.2 examples.
  - Support tie-word-embeddings for llama.
* Add support for Llama 3.1 (#2359) | Eric Buehler | 2024-07-26 | 1 file | -6/+24
  - Add Llama 3.1 rope
  - Clippy
  - Format
  - Clippy
  - Add support for multiple eos tokens
  - Untagged either
  - Remove either dep and fix settings.json
  - Make the max positional embeddings configurable
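The Llama 3.1 change adds support for several end-of-sequence tokens. A minimal sketch of how a generation loop can stop on any of a set of EOS ids (the ids below are illustrative, not taken from the example; real models expose theirs via the tokenizer/config):

```rust
use std::collections::HashSet;

/// Stop condition for a generation loop that recognizes several EOS ids.
fn is_eos(token: u32, eos_ids: &HashSet<u32>) -> bool {
    eos_ids.contains(&token)
}

fn main() {
    // Illustrative ids only.
    let eos_ids: HashSet<u32> = [128001u32, 128009].into_iter().collect();
    for &tok in &[42u32, 7, 128009, 11] {
        if is_eos(tok, &eos_ids) {
            println!("hit EOS token {tok}, stopping");
            break;
        }
        println!("generated token {tok}");
    }
}
```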
* Support top-k in the llama example. (#2150) | Laurent Mazare | 2024-05-01 | 1 file | -3/+21
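Top-k sampling restricts the next-token choice to the k most likely tokens. A standalone sketch of the idea on a plain probability vector, not the example's actual implementation:

```rust
/// Keep only the `k` largest probabilities and renormalize.
/// A sketch of the idea, not the example's actual code.
fn top_k_filter(probs: &mut [f32], k: usize) {
    if k == 0 || k >= probs.len() {
        return;
    }
    // The k-th largest probability acts as the cutoff.
    let mut sorted = probs.to_vec();
    sorted.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let cutoff = sorted[k - 1];
    for p in probs.iter_mut() {
        if *p < cutoff {
            *p = 0.0;
        }
    }
    // Renormalize what is left (ties at the cutoff are all kept).
    let sum: f32 = probs.iter().sum();
    for p in probs.iter_mut() {
        *p /= sum;
    }
}

fn main() {
    let mut probs = vec![0.1, 0.4, 0.05, 0.3, 0.15];
    top_k_filter(&mut probs, 2);
    println!("{probs:?}"); // only the two largest entries stay non-zero
}
```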
* Better time measurement for the llama example. (#2106) | Laurent Mazare | 2024-04-22 | 1 file | -2/+5
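A common way to report generation speed, in the spirit of this timing change, is to start the clock after the prompt has been processed and divide the number of generated tokens by the elapsed time. A generic sketch, not the example's code:

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

fn main() {
    // Start the clock after prompt processing so it only covers generation.
    let start = Instant::now();
    let mut generated = 0usize;
    for _ in 0..50 {
        sleep(Duration::from_millis(2)); // stand-in for sampling one token
        generated += 1;
    }
    let dt = start.elapsed();
    println!(
        "{generated} tokens generated ({:.2} token/s)",
        generated as f64 / dt.as_secs_f64()
    );
}
```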
* Use llama v3 by default + add to readme. (#2094) | Laurent Mazare | 2024-04-20 | 1 file | -1/+1
* Also enable llama-v3 8b instruct. (#2088) | Laurent Mazare | 2024-04-19 | 1 file | -1/+3
* Llama v3. (#2085) | Laurent Mazare | 2024-04-18 | 1 file | -9/+13
  - Llama v3.
  - Tweak the default params + handle special tokens.
  - Small tweak.
* Make the cache for the llama model explicit too. (#1745) | Laurent Mazare | 2024-02-22 | 1 file | -3/+3
* Use the tokenizer-output-stream in the llama example. (#1715) | Laurent Mazare | 2024-02-15 | 1 file | -11/+9
  - Use the tokenizer-output-stream in the llama example.
  - Also use tokenizer-output-stream for llama2-c.
* fix index_pos bug when kv cache is disabled. (#1517) | optman | 2024-01-06 | 1 file | -4/+4
  - fix index_pos bug when kv cache is disabled
  - Tweak the fix.
  Co-authored-by: laurent <laurent.mazare@gmail.com>
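The index_pos fix concerns how the position offset is computed when the key/value cache is turned off: with the cache, only the newly sampled token is fed and the offset advances; without it, the whole context is re-fed and the offset stays at zero. A schematic sketch of that control flow, with illustrative names rather than the example's:

```rust
/// Decide what to feed the model and at which starting position,
/// depending on whether the kv cache is enabled. Illustrative only.
fn step_input(tokens: &[u32], step: usize, use_kv_cache: bool) -> (Vec<u32>, usize) {
    if use_kv_cache && step > 0 {
        // Only the last token is new; previous keys/values are cached.
        (vec![*tokens.last().unwrap()], tokens.len() - 1)
    } else {
        // No cache (or first step): feed the full context from position 0.
        (tokens.to_vec(), 0)
    }
}

fn main() {
    let ctx = vec![1u32, 5, 9, 12];
    println!("{:?}", step_input(&ctx, 3, true));  // ([12], 3)
    println!("{:?}", step_input(&ctx, 3, false)); // ([1, 5, 9, 12], 0)
}
```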
* Add support for tiny-llama-1.1b. (#1512) | Laurent Mazare | 2023-12-31 | 1 file | -2/+9
* Rework the llama example config, add the solar model. (#1485) | Laurent Mazare | 2023-12-26 | 1 file | -72/+36
* Adapt more examples to the updated safetensor api. (#947) | Laurent Mazare | 2023-09-23 | 1 file | -9/+1
  - Simplify the safetensor usage.
  - Convert more examples.
  - Move more examples.
  - Adapt stable-diffusion.
* Implement top_p / nucleus sampling (#819) | Juarez Bochi | 2023-09-12 | 1 file | -1/+5
  - Implement top_p / nucleus sampling
  - Update changelog
  - rustfmt
  - Add tests
  - Fix clippy warning
  - Fix another clippy error
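Nucleus (top-p) sampling keeps the smallest set of tokens whose cumulative probability reaches p and renormalizes over that set. A standalone sketch of the idea, not the code added in this PR:

```rust
/// Zero out tokens outside the smallest nucleus whose cumulative
/// probability reaches `top_p`, then renormalize. Sketch only.
fn top_p_filter(probs: &mut [f32], top_p: f32) {
    // Indices sorted by descending probability.
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());

    let mut cumulative = 0.0;
    let mut keep = vec![false; probs.len()];
    for &i in &order {
        keep[i] = true;
        cumulative += probs[i];
        if cumulative >= top_p {
            break;
        }
    }
    for (i, p) in probs.iter_mut().enumerate() {
        if !keep[i] {
            *p = 0.0;
        }
    }
    let sum: f32 = probs.iter().sum();
    for p in probs.iter_mut() {
        *p /= sum;
    }
}

fn main() {
    let mut probs = vec![0.5, 0.3, 0.1, 0.05, 0.05];
    top_p_filter(&mut probs, 0.8);
    println!("{probs:?}"); // the 0.5 and 0.3 entries survive, renormalized
}
```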
* Move some models to candle-transformers so that it's easier to re-use. (#794) | Laurent Mazare | 2023-09-10 | 2 files | -448/+1
  - Move some models to candle-transformers so that they can be shared.
  - Also move falcon.
  - Move Llama.
  - Move whisper (partial).
* Add some optional repeat penalty. (#623) | Laurent Mazare | 2023-08-27 | 1 file | -0/+18
  - Add some optional repeat penalty.
  - Add the missing files.
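A repeat penalty makes tokens that already appear in the recent context less likely to be sampled again, typically by shrinking their logits. A hedged sketch of the usual CTRL-style penalty, not necessarily the example's exact helper:

```rust
/// Penalize logits of tokens that already occur in `context`.
/// Sketch of the common approach: divide positive logits, multiply negative ones.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, context: &[u32]) {
    for &token in context {
        if let Some(logit) = logits.get_mut(token as usize) {
            *logit = if *logit >= 0.0 {
                *logit / penalty
            } else {
                *logit * penalty
            };
        }
    }
}

fn main() {
    let mut logits = vec![2.0, -1.0, 0.5, 3.0];
    apply_repeat_penalty(&mut logits, 1.3, &[0, 1]);
    println!("{logits:?}"); // tokens 0 and 1 are now less attractive
}
```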
* s/panic/bail/ | Nicolas Patry | 2023-08-25 | 1 file | -2/+2
* Adding support for codellama in examples. | Nicolas Patry | 2023-08-25 | 2 files | -6/+26
  Codellama requires bf16 for now (error to convert from bf16 to f16). Multiprocess demo not functional for it because flash-attn only supports f16 for now.
* GQA support in the quantized model. (#555) | Laurent Mazare | 2023-08-22 | 1 file | -1/+1
  - GQA support in the quantized model.
  - Fix the reshaping.
  - Fix the main llama model.
  - Infer the proper gqa from the model kind.
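With grouped-query attention, several query heads share one key/value head, so the kv heads are repeated (or the query heads mapped down) before the attention product. A small sketch of the head mapping, with hypothetical head counts:

```rust
/// Map each query head to the kv head it shares under GQA.
/// With n_head = 8 and n_kv_head = 2, heads 0..=3 use kv head 0 and heads 4..=7 use kv head 1.
fn kv_head_for_query_head(query_head: usize, n_head: usize, n_kv_head: usize) -> usize {
    assert!(n_head % n_kv_head == 0, "query heads must divide evenly over kv heads");
    query_head / (n_head / n_kv_head)
}

fn main() {
    let (n_head, n_kv_head) = (8, 2); // hypothetical counts
    for q in 0..n_head {
        println!("query head {q} -> kv head {}", kv_head_for_query_head(q, n_head, n_kv_head));
    }
}
```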
* Add a simple Module trait and implement it for the various nn layers (#500) | Laurent Mazare | 2023-08-18 | 1 file | -1/+1
  - Start adding the module trait.
  - Use the module trait.
  - Implement module for qmatmul.
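The point of a shared module trait is that every layer exposes the same forward-pass entry point. A stripped-down sketch of that shape, using plain `Vec<f32>` instead of tensors; candle's real trait differs (it works on tensors and returns a Result):

```rust
/// A minimal stand-in for a forward-pass trait; not candle's actual API.
trait Module {
    fn forward(&self, xs: &[f32]) -> Vec<f32>;
}

/// A toy layer that just multiplies its input by a constant.
struct Scale(f32);

impl Module for Scale {
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        xs.iter().map(|x| x * self.0).collect()
    }
}

fn main() {
    let layer = Scale(2.0);
    println!("{:?}", layer.forward(&[1.0, 2.0, 3.0])); // [2.0, 4.0, 6.0]
}
```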
* Add an abstract type for RmsNorm. (#499) | Laurent Mazare | 2023-08-18 | 1 file | -1/+1
* Layer norm tweaks (#482) | Laurent Mazare | 2023-08-17 | 1 file | -19/+4
  - Add some options to make layer-norm more configurable.
  - Add the rms-norm variant.
  - Replace the RmsNorm with the shared bits.
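RMS norm skips the mean-centering step of layer norm and only rescales by the root mean square before applying a learned per-channel weight. A scalar sketch of the computation on a flat slice:

```rust
/// y_i = w_i * x_i / sqrt(mean(x^2) + eps). Sketch of rms-norm on one row.
fn rms_norm(xs: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = xs.iter().map(|x| x * x).sum::<f32>() / xs.len() as f32;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    xs.iter().zip(weight).map(|(x, w)| x * scale * w).collect()
}

fn main() {
    let xs = [1.0f32, 2.0, 3.0, 4.0];
    let w = [1.0f32; 4];
    println!("{:?}", rms_norm(&xs, &w, 1e-5));
}
```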
* Add some tracing to the quantized example. (#473) | Laurent Mazare | 2023-08-16 | 1 file | -1/+0
* Fixing llamav1 | Nicolas Patry | 2023-08-16 | 1 file | -2/+2
* Get the ggml based llama to generate some text. (#464) | Laurent Mazare | 2023-08-16 | 1 file | -5/+1
  - Add more stats to the ggml example.
  - Build a quantized model from the file content.
  - Move the tensor retrieval in the main crate.
  - Start adding the forward pass.
  - Add more to the forward pass of the quantized llama.
  - Apply the attention layers.
  - Add the sampling loop.
  - Get the sampling loop to work.
  - Minor tweak.
  - Add a quantize/dequantize test.
  - Bugfix.
  - Add a comment + swap the order.
  - Bugfixes.
* Clippy. | Nicolas Patry | 2023-08-16 | 1 file | -5/+5
* Using the real config from the hub when available. | Nicolas Patry | 2023-08-16 | 2 files | -43/+75
* Tweak the llama example. (#450) | Laurent Mazare | 2023-08-15 | 1 file | -63/+14
* Support local weights & dynamic outputs (#447) | Guoqing Bao | 2023-08-15 | 1 file | -15/+39
  - Support local weights & dynamic outputs
  - Revise as suggested
  - Cargo code format
* Add a cuda kernel for upsampling. (#441) | Laurent Mazare | 2023-08-14 | 1 file | -2/+2
  - Add a cuda kernel for upsampling.
  - Update for the latest tokenizers version.
* Remove the checkpoint conversion script. (#405) | Laurent Mazare | 2023-08-11 | 2 files | -202/+0
  - Remove the checkpoint conversion script.
  - Remove references to the script.
* Support the Accelerate BLAS on macOS. (#325) | Laurent Mazare | 2023-08-05 | 1 file | -0/+3
  - Add the accelerate feature.
  - Ffi tweaks.
* Add some tracing to llama. (#318) | Laurent Mazare | 2023-08-03 | 2 files | -4/+53
* Use u8 tensors for masks. (#273) | Laurent Mazare | 2023-07-29 | 1 file | -2/+1
* Support both llama v1 and llama v2. (#272) | Laurent Mazare | 2023-07-28 | 2 files | -2/+20
* Line-up the llama implementation with the python-transformers one. (#271) | Laurent Mazare | 2023-07-28 | 1 file | -43/+28
  - Line-up the llama implementation with the python-transformers one.
  - Also lineup the multiprocess version.
* Softmax numerical stability. (#267) | Laurent Mazare | 2023-07-28 | 1 file | -1/+1
  - Softmax numerical stability.
  - Fix the flash-attn test.
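The numerical-stability fix for softmax is the classic one: subtract the row maximum before exponentiating so that exp never overflows, which leaves the result unchanged mathematically. A sketch:

```rust
/// Numerically stable softmax: shifting by the max keeps exp() in a safe range.
fn softmax(xs: &[f32]) -> Vec<f32> {
    let max = xs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

fn main() {
    // Without the shift, exp(1000.0) would overflow to infinity.
    println!("{:?}", softmax(&[1000.0, 1001.0, 1002.0]));
}
```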
* Upgrading hf-hub to `0.2.0` (Modified API to not pass the Repo around all the time) | Nicolas Patry | 2023-07-27 | 1 file | -4/+4
* Switch to using llama-v2 by default. (#251) | Laurent Mazare | 2023-07-26 | 1 file | -4/+4
* Lining up the flash attn version with the non-flash one. (#248) | Laurent Mazare | 2023-07-26 | 1 file | -11/+10
  - Move the flash-attn function in the proper crate.
  - Causality tweak.
* Again set a few extra params in flash-attn. (#245) | Laurent Mazare | 2023-07-26 | 1 file | -1/+5
  - Again set a few extra params.
  - Use the appropriate kernel sizes.
  - Add all the kernel sizes.
  - Parallel compiling.
  - Reduce the amount of parallelism.
  - Add the missing kernel.
  - Fix a typo.
  - Remove bf16 support for now.
* Proper flash-attn parameters. (#244) | Laurent Mazare | 2023-07-26 | 1 file | -4/+12
  - Proper flash-attn parameters.
  - Set the flash attention parameters.
  - Add more validations.
  - Setup the o_ flash attn parameters.
  - More flash-attn support.
  - Set more flash attn parameters.
* Better handling of dtypes in llama. (#243) | Laurent Mazare | 2023-07-26 | 2 files | -13/+12
* Add flash attention (#241) | Laurent Mazare | 2023-07-26 | 2 files | -8/+30
  - Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
  - More flash attn.
  - Set up the flash attn parameters.
  - Get things to compile locally.
  - Move the flash attention files in a different directory.
  - Build the static C library with nvcc.
  - Add more flash attention.
  - Update the build part.
  - Better caching.
  - Exclude flash attention from the default workspace.
  - Put flash-attn behind a feature gate.
  - Get the flash attn kernel to run.
  - Move the flags to a more appropriate place.
  - Enable flash attention in llama.
  - Use flash attention in llama.
* Rename the .r functions to .dims so as to be a bit more explicit. (#220) | Laurent Mazare | 2023-07-22 | 1 file | -6/+6
* Support for MQA for llama v2. (#205) | Laurent Mazare | 2023-07-20 | 2 files | -109/+122
  - Support for MQA for llama v2.
  - More llama-v2.
  - Move the rotary embedding precomputation in the cache.
  - Add a v2 flag.
  - Use the hf model.
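One of the items above moves the rotary-embedding precomputation into the cache. That precomputation is just cos/sin tables over positions and inverse frequencies; a generic sketch of it, with an illustrative head size and base (not the example's code):

```rust
/// Precompute the cos/sin tables used by rotary position embeddings.
/// Returns (cos, sin), each with shape [max_pos][head_dim / 2] as nested Vecs.
fn rope_tables(max_pos: usize, head_dim: usize, base: f32) -> (Vec<Vec<f32>>, Vec<Vec<f32>>) {
    // inv_freq[i] = base^(-2i / head_dim), the standard RoPE frequencies.
    let inv_freq: Vec<f32> = (0..head_dim / 2)
        .map(|i| 1.0 / base.powf(2.0 * i as f32 / head_dim as f32))
        .collect();
    let mut cos = Vec::with_capacity(max_pos);
    let mut sin = Vec::with_capacity(max_pos);
    for pos in 0..max_pos {
        cos.push(inv_freq.iter().map(|f| (pos as f32 * f).cos()).collect());
        sin.push(inv_freq.iter().map(|f| (pos as f32 * f).sin()).collect());
    }
    (cos, sin)
}

fn main() {
    // Illustrative sizes; real llama configs use a larger head_dim and context.
    let (cos, sin) = rope_tables(4, 8, 10_000.0);
    println!("cos[1] = {:?}", cos[1]);
    println!("sin[1] = {:?}", sin[1]);
}
```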
* Removing `candle-hub` internal to extract into `hf-hub` standalone. | Nicolas Patry | 2023-07-19 | 1 file | -1/+1
* Add some 'cuda-if-available' helper function. (#172) | Laurent Mazare | 2023-07-15 | 1 file | -14/+1
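The helper added here picks a CUDA device when one is available and falls back to the CPU otherwise. A sketch of how such a helper tends to look, assuming candle_core's `Device::cuda_if_available` constructor (check the crate for the exact current API):

```rust
use candle_core::{Device, Result};

/// Prefer CUDA device 0 when a GPU is present, otherwise fall back to the CPU.
/// Assumes `Device::cuda_if_available`; not necessarily the example's helper.
fn device(force_cpu: bool) -> Result<Device> {
    if force_cpu {
        Ok(Device::Cpu)
    } else {
        Device::cuda_if_available(0)
    }
}

fn main() -> Result<()> {
    let dev = device(false)?;
    println!("running on {dev:?}");
    Ok(())
}
```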