path: root/candle-examples/examples/llama/main.rs
Commit message | Author | Date | Files | Lines
* Add the SmolLM2 models. (#2595) | Laurent Mazare | 2024-11-03 | 1 | -14/+43
    * Add the SmolLM2 models.
    * More SmolLM2 support.
* Fix the repo name for llama 3.1. (#2576) | Laurent Mazare | 2024-10-26 | 1 | -2/+2
    * Fix the repo name for llama 3.1.
    * Fix the book.
* Add some llama-3.2 examples. (#2508) | Laurent Mazare | 2024-09-26 | 1 | -1/+13
    * Add some llama-3.2 examples.
    * Support tie-word-embeddings for llama.
* Add support for Llama 3.1 (#2359) | Eric Buehler | 2024-07-26 | 1 | -6/+24
    * Add Llama 3.1 rope
    * Clippy
    * Format
    * Clippy
    * Add support for multiple eos tokens:
    * Untagged either
    * Remove either dep and fix settings.json
    * Make the max positional embeddings configurable
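Supporting multiple eos tokens generally amounts to checking each sampled token against a set of stop ids rather than a single one. A minimal, self-contained sketch of that idea; the token ids and helper below are illustrative, not the example's actual API:

```rust
use std::collections::HashSet;

/// Illustrative stand-in for the model's sampler: it just pops pre-baked
/// token ids so the sketch runs on its own.
fn next_token(stream: &mut Vec<u32>) -> Option<u32> {
    stream.pop()
}

fn main() {
    // Llama 3.1 style configs can list several stop tokens; collect them in a set.
    let eos_token_ids: HashSet<u32> = [128001, 128008, 128009].into_iter().collect();
    let mut fake_model_output = vec![128009, 42, 7, 13]; // popped from the back

    let mut generated = Vec::new();
    while let Some(token) = next_token(&mut fake_model_output) {
        if eos_token_ids.contains(&token) {
            break; // any of the configured eos ids ends generation
        }
        generated.push(token);
    }
    println!("generated {} tokens before hitting an eos id", generated.len());
}
```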
* Support top-k in the llama example. (#2150) | Laurent Mazare | 2024-05-01 | 1 | -3/+21
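For context, top-k sampling keeps only the k highest-probability tokens and renormalizes before drawing a sample. A small sketch in plain Rust (the example itself routes this through candle's sampling helpers, so treat the code below as an illustration of the technique only):

```rust
/// Zero out everything outside the k most probable tokens, then renormalize.
fn top_k_filter(probs: &[f32], k: usize) -> Vec<f32> {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    let keep: std::collections::HashSet<usize> = idx.into_iter().take(k).collect();
    let mut out: Vec<f32> = probs
        .iter()
        .enumerate()
        .map(|(i, &p)| if keep.contains(&i) { p } else { 0.0 })
        .collect();
    let total: f32 = out.iter().sum();
    out.iter_mut().for_each(|p| *p /= total);
    out
}

/// Draw an index from a distribution given a uniform number in [0, 1).
fn sample_with(probs: &[f32], u: f32) -> usize {
    let mut acc = 0.0;
    for (i, &p) in probs.iter().enumerate() {
        acc += p;
        if u < acc {
            return i;
        }
    }
    probs.len() - 1
}

fn main() {
    let probs = [0.05, 0.4, 0.1, 0.3, 0.15];
    let filtered = top_k_filter(&probs, 2); // only tokens 1 and 3 survive
    println!("token = {}", sample_with(&filtered, 0.42));
}
```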
* Better time measurement for the llama example. (#2106) | Laurent Mazare | 2024-04-22 | 1 | -2/+5
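A common way to report generation throughput is to time only the token loop and divide the token count by the elapsed seconds. A sketch with std::time::Instant; the exact accounting in the example may differ:

```rust
use std::time::Instant;

fn main() {
    let tokens_to_generate = 100_000u32;
    // Start the clock after model loading and prompt processing, so the
    // reported rate reflects generation only.
    let start = Instant::now();
    let mut acc = 0u64;
    for t in 0..tokens_to_generate {
        acc = acc.wrapping_add(t as u64); // stand-in for one decode step
    }
    let dt = start.elapsed();
    println!(
        "{} tokens generated ({:.2} token/s), checksum {}",
        tokens_to_generate,
        tokens_to_generate as f64 / dt.as_secs_f64(),
        acc
    );
}
```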
* Use llama v3 by default + add to readme. (#2094) | Laurent Mazare | 2024-04-20 | 1 | -1/+1
* Also enable llama-v3 8b instruct. (#2088) | Laurent Mazare | 2024-04-19 | 1 | -1/+3
* Llama v3. (#2085) | Laurent Mazare | 2024-04-18 | 1 | -9/+13
    * Llama v3.
    * Tweak the default params + handle special tokens.
    * Small tweak.
* Make the cache for the llama model explicit too. (#1745) | Laurent Mazare | 2024-02-22 | 1 | -3/+3
* Use the tokenizer-output-stream in the llama example. (#1715) | Laurent Mazare | 2024-02-15 | 1 | -11/+9
    * Use the tokenizer-output-stream in the llama example.
    * Also use tokenizer-output-stream for llama2-c.
* fix index_pos bug when kv cache is disabled. (#1517) | optman | 2024-01-06 | 1 | -4/+4
    * fix index_pos bug when kv cache is disabled
    * Tweak the fix.
    Co-authored-by: laurent <laurent.mazare@gmail.com>
* Add support for tiny-llama-1.1b. (#1512) | Laurent Mazare | 2023-12-31 | 1 | -2/+9
* Rework the llama example config, add the solar model. (#1485) | Laurent Mazare | 2023-12-26 | 1 | -72/+36
* Adapt more examples to the updated safetensor api. (#947) | Laurent Mazare | 2023-09-23 | 1 | -9/+1
    * Simplify the safetensor usage.
    * Convert more examples.
    * Move more examples.
    * Adapt stable-diffusion.
* Implement top_p / nucleus sampling (#819) | Juarez Bochi | 2023-09-12 | 1 | -1/+5
    * Implement top_p / nucleus sampling
    * Update changelog
    * rustfmt
    * Add tests
    * Fix clippy warning
    * Fix another clippy error
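Nucleus (top_p) sampling keeps the smallest set of tokens whose cumulative probability reaches p and renormalizes over that set. A self-contained sketch of the idea, not the code this commit added:

```rust
/// Keep the smallest prefix of tokens (by descending probability) whose
/// cumulative mass reaches `p`, zero out the rest, and renormalize.
fn top_p_filter(probs: &[f32], p: f32) -> Vec<f32> {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());

    let mut keep = Vec::new();
    let mut cum = 0.0;
    for &i in &idx {
        keep.push(i);
        cum += probs[i];
        if cum >= p {
            break;
        }
    }

    let mut out = vec![0.0; probs.len()];
    for &i in &keep {
        out[i] = probs[i] / cum;
    }
    out
}

fn main() {
    let probs = [0.05, 0.4, 0.1, 0.3, 0.15];
    // With p = 0.6, only the two most likely tokens (0.4 and 0.3) survive.
    println!("{:?}", top_p_filter(&probs, 0.6));
}
```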
* Move some models to candle-transformers so that it's easier to re-use. (#794) | Laurent Mazare | 2023-09-10 | 1 | -2/+1
    * Move some models to candle-transformers so that they can be shared.
    * Also move falcon.
    * Move Llama.
    * Move whisper (partial).
* Add some optional repeat penalty. (#623) | Laurent Mazare | 2023-08-27 | 1 | -0/+18
    * Add some optional repeat penalty.
    * Add the missing files.
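A repeat penalty typically divides positive logits (and multiplies negative ones) for tokens that already appear in the recent context, which discourages the model from looping. A rough sketch of that general rule; the penalty value and window are illustrative and may not match the example's implementation:

```rust
/// Penalize the logits of tokens that already occur in `context`.
/// Positive logits are divided by the penalty, negative ones multiplied,
/// so the adjustment always lowers the token's probability.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, context: &[u32]) {
    for &tok in context {
        if let Some(l) = logits.get_mut(tok as usize) {
            *l = if *l >= 0.0 { *l / penalty } else { *l * penalty };
        }
    }
}

fn main() {
    let mut logits = vec![1.5, -0.5, 2.0, 0.1];
    // Only penalize tokens seen in the last few generated positions.
    let recent_context = [2u32, 1u32];
    apply_repeat_penalty(&mut logits, 1.1, &recent_context);
    println!("{:?}", logits); // token 2: 2.0 -> ~1.82, token 1: -0.5 -> -0.55
}
```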
* s/panic/bail/ | Nicolas Patry | 2023-08-25 | 1 | -2/+2
* Adding support for codellama in examples. | Nicolas Patry | 2023-08-25 | 1 | -5/+15
    Codellama requires bf16 for now (error to convert from bf16 to f16).
    Multiprocess demo not functional for it because flash-attn only supports f16 for now.
* Add some tracing to the quantized example. (#473) | Laurent Mazare | 2023-08-16 | 1 | -1/+0
* Using the real config from the hub when available. | Nicolas Patry | 2023-08-16 | 1 | -10/+18
* Tweak the llama example. (#450) | Laurent Mazare | 2023-08-15 | 1 | -63/+14
* Support local weights & dynamic outputs (#447) | Guoqing Bao | 2023-08-15 | 1 | -15/+39
    * Support local weights & dynamic outputs
    * Revise as suggested
    * Cargo code format
* Add a cuda kernel for upsampling. (#441) | Laurent Mazare | 2023-08-14 | 1 | -2/+2
    * Add a cuda kernel for upsampling.
    * Update for the latest tokenizers version.
* Remove the checkpoint conversion script. (#405) | Laurent Mazare | 2023-08-11 | 1 | -3/+0
    * Remove the checkpoint conversion script.
    * Remove references to the script.
* Support the Accelerate BLAS on macOS. (#325) | Laurent Mazare | 2023-08-05 | 1 | -0/+3
    * Add the accelerate feature.
    * Ffi tweaks.
* Add some tracing to llama. (#318) | Laurent Mazare | 2023-08-03 | 1 | -0/+14
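Tracing here generally means wrapping the per-step work in spans so a subscriber can attribute time to it. The sketch below uses the tracing and tracing-subscriber crates in their plain form; the example's own subscriber setup (and span names) differ, so treat this purely as an assumption-laden illustration:

```rust
// Cargo.toml (assumed): tracing = "0.1", tracing-subscriber = "0.3"
use tracing::{info, span, Level};

fn forward_pass(step: usize) -> f64 {
    // A span per forward pass lets a subscriber measure where time goes.
    let _span = span!(Level::TRACE, "forward", step = step as u64).entered();
    (step as f64).sin() // stand-in for the model's actual work
}

fn main() {
    // Plain fmt subscriber; a chrome-trace layer could be plugged in instead.
    tracing_subscriber::fmt::init();
    let mut total = 0.0;
    for step in 0..3 {
        total += forward_pass(step);
    }
    info!(total, "done");
}
```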
* Support both llama v1 and llama v2. (#272) | Laurent Mazare | 2023-07-28 | 1 | -1/+5
* Upgrading hf-hub to `0.2.0` (Modified API to not pass the Repo around all the time) | Nicolas Patry | 2023-07-27 | 1 | -4/+4
* Switch to using llama-v2 by default. (#251) | Laurent Mazare | 2023-07-26 | 1 | -4/+4
* Better handling of dtypes in llama. (#243) | Laurent Mazare | 2023-07-26 | 1 | -1/+1
* Add flash attention (#241) | Laurent Mazare | 2023-07-26 | 1 | -1/+4
    * Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
    * More flash attn.
    * Set up the flash attn parameters.
    * Get things to compile locally.
    * Move the flash attention files in a different directory.
    * Build the static C library with nvcc.
    * Add more flash attention.
    * Update the build part.
    * Better caching.
    * Exclude flash attention from the default workspace.
    * Put flash-attn behind a feature gate.
    * Get the flash attn kernel to run.
    * Move the flags to a more appropriate place.
    * Enable flash attention in llama.
    * Use flash attention in llama.
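Putting flash attention behind a feature gate means the CUDA kernels are only compiled when the feature is requested, with the regular attention path used otherwise. A minimal sketch of that cfg pattern; the feature name and function bodies are illustrative, not the crate's actual code:

```rust
// Cargo.toml (assumed):
// [features]
// flash-attn = []

#[cfg(feature = "flash-attn")]
fn attention(q: &[f32], k: &[f32], v: &[f32]) -> Vec<f32> {
    // Would dispatch to the fused flash-attention kernel here.
    fallback_attention(q, k, v)
}

#[cfg(not(feature = "flash-attn"))]
fn attention(q: &[f32], k: &[f32], v: &[f32]) -> Vec<f32> {
    fallback_attention(q, k, v)
}

fn fallback_attention(q: &[f32], k: &[f32], v: &[f32]) -> Vec<f32> {
    // Stand-in so the sketch compiles: just average the three inputs.
    q.iter()
        .zip(k)
        .zip(v)
        .map(|((a, b), c)| (a + b + c) / 3.0)
        .collect()
}

fn main() {
    let out = attention(&[1.0], &[2.0], &[3.0]);
    println!("{out:?} (built with flash-attn: {})", cfg!(feature = "flash-attn"));
}
```

With the feature declared in Cargo.toml, the gated path is selected by building with something like `cargo run --features flash-attn`.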
* Support for MQA for llama v2. (#205) | Laurent Mazare | 2023-07-20 | 1 | -29/+18
    * Support for MQA for llama v2.
    * More llama-v2.
    * Move the rotary embedding precomputation in the cache.
    * Add a v2 flag.
    * Use the hf model.
* Removing `candle-hub` internal to extract into `hf-hub` standalone. | Nicolas Patry | 2023-07-19 | 1 | -1/+1
* Add some 'cuda-if-available' helper function. (#172) | Laurent Mazare | 2023-07-15 | 1 | -14/+1
* Removing cuda default. | Nicolas Patry | 2023-07-14 | 1 | -1/+11
    Seems very important for the many exploring users, usually on laptops without GPUs.
    Adding more README instructions in a follow-up.
* Add a cli argument to easily switch the dtype. (#161) | Laurent Mazare | 2023-07-13 | 1 | -6/+7
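Switching the dtype from the command line usually comes down to a string flag that is mapped to the numeric type before loading the weights. A sketch with clap's derive API; the flag name, accepted values, and the DType enum below are assumptions, not the example's exact interface:

```rust
// Cargo.toml (assumed): clap = { version = "4", features = ["derive"] }
use clap::Parser;

#[derive(Debug, Clone, Copy)]
enum DType {
    F16,
    Bf16,
    F32,
}

#[derive(Parser, Debug)]
struct Args {
    /// Numeric type used for the weights and activations.
    #[arg(long, default_value = "f16")]
    dtype: String,
}

fn main() {
    let args = Args::parse();
    let dtype = match args.dtype.as_str() {
        "f16" => DType::F16,
        "bf16" => DType::Bf16,
        "f32" => DType::F32,
        other => panic!("unsupported dtype {other}"),
    };
    println!("loading weights as {dtype:?}");
}
```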
* Sketch the candle-transformers crate. (#147) | Laurent Mazare | 2023-07-12 | 1 | -17/+3
    * Sketch the candle-transformers crate.
    * Format the empty files.
* Use arange in the examples. (#146) | Laurent Mazare | 2023-07-12 | 1 | -4/+3
* Add from_iter and arange, use it in the doctests. (#145) | Laurent Mazare | 2023-07-12 | 1 | -1/+0
* Llama batch (#144) | Laurent Mazare | 2023-07-12 | 1 | -3/+2
    * Add a batch dimension to llama.
    * Bugfixes.
* Allow for lazy loading of npz files, use it in llama to reduce memory usage in the cpu version. (#141) | Laurent Mazare | 2023-07-11 | 1 | -5/+1
* Resurrect the llama npy support. (#140) | Laurent Mazare | 2023-07-11 | 1 | -2/+8
* Refactor the llama example to make it more in sync with the other ones. (#139) | Laurent Mazare | 2023-07-11 | 1 | -349/+19
    * Refactor the llama example to make it more in sync with the other ones.
    * Make clippy happy.
    * Properly load the safetensor weights.
    * Get llama back to a working state for the safetensors case.
* Add a KV cache to falcon. (#104) | Laurent Mazare | 2023-07-07 | 1 | -2/+1
* Creating new sync Api for `candle-hub`. | Nicolas Patry | 2023-07-06 | 1 | -5/+4
    - `api::Api` -> `api::tokio::api` (And created new `api::sync::Api`).
    - Remove `tokio` from all our examples.
    - Using similar codebase for now instead of ureq (for simplicity).
* MKL adjustments. (#87) | Laurent Mazare | 2023-07-06 | 1 | -0/+3
* Add mkl support for matrix multiply. (#86) | Laurent Mazare | 2023-07-06 | 1 | -1/+4
    * Fix some rebase issues.
    * Use mkl instead.
    * Use mkl in bert.
    * Add the optional mkl feature.
    * Conditional compilation based on the mkl feature.
    * Add more mkl support.
* Support dim indexes in cat. | laurent | 2023-07-05 | 1 | -11/+10