| Commit message | Author | Date | Files | Lines |
|---|---|---|---|---|
| Add the SmolLM2 models. (#2595)<br>• More SmolLM2 support. | Laurent Mazare | 2024-11-03 | 1 | -14/+43 |
| Fix the repo name for llama 3.1. (#2576)<br>• Fix the book. | Laurent Mazare | 2024-10-26 | 1 | -2/+2 |
| Add some llama-3.2 examples. (#2508)<br>• Support tie-word-embeddings for llama. | Laurent Mazare | 2024-09-26 | 1 | -1/+13 |
| Add support for Llama 3.1 (#2359)<br>• Add Llama 3.1 rope.<br>• Add support for multiple eos tokens.<br>• Remove the either dep and fix settings.json.<br>• Make the max positional embeddings configurable. | Eric Buehler | 2024-07-26 | 1 | -6/+24 |
| Support top-k in the llama example. (#2150) (see the top-k sketch after the table) | Laurent Mazare | 2024-05-01 | 1 | -3/+21 |
| Better time measurement for the llama example. (#2106) | Laurent Mazare | 2024-04-22 | 1 | -2/+5 |
| Use llama v3 by default + add to readme. (#2094) | Laurent Mazare | 2024-04-20 | 1 | -1/+1 |
| Also enable llama-v3 8b instruct. (#2088) | Laurent Mazare | 2024-04-19 | 1 | -1/+3 |
| Llama v3. (#2085)<br>• Tweak the default params + handle special tokens. | Laurent Mazare | 2024-04-18 | 1 | -9/+13 |
| Make the cache for the llama model explicit too. (#1745) | Laurent Mazare | 2024-02-22 | 1 | -3/+3 |
| Use the tokenizer-output-stream in the llama example. (#1715)<br>• Also use tokenizer-output-stream for llama2-c. | Laurent Mazare | 2024-02-15 | 1 | -11/+9 |
| Fix index_pos bug when kv cache is disabled. (#1517)<br>• Co-authored-by: laurent <laurent.mazare@gmail.com> | optman | 2024-01-06 | 1 | -4/+4 |
| Add support for tiny-llama-1.1b. (#1512) | Laurent Mazare | 2023-12-31 | 1 | -2/+9 |
| Rework the llama example config, add the solar model. (#1485) | Laurent Mazare | 2023-12-26 | 1 | -72/+36 |
| Adapt more examples to the updated safetensor api. (#947)<br>• Simplify the safetensor usage.<br>• Convert and move more examples.<br>• Adapt stable-diffusion. | Laurent Mazare | 2023-09-23 | 1 | -9/+1 |
| Implement top_p / nucleus sampling (#819) (see the nucleus-sampling sketch after the table)<br>• Update changelog.<br>• Add tests. | Juarez Bochi | 2023-09-12 | 1 | -1/+5 |
| Move some models to candle-transformers so that it's easier to re-use. (#794)<br>• Also move falcon, Llama, and whisper (partial). | Laurent Mazare | 2023-09-10 | 1 | -2/+1 |
| Add some optional repeat penalty. (#623) (see the repeat-penalty sketch after the table)<br>• Add the missing files. | Laurent Mazare | 2023-08-27 | 1 | -0/+18 |
| s/panic/bail/ | Nicolas Patry | 2023-08-25 | 1 | -2/+2 |
| Adding support for codellama in examples.<br>• CodeLlama requires bf16 for now (converting from bf16 to f16 errors out).<br>• The multiprocess demo is not functional for it because flash-attn only supports f16 for now. | Nicolas Patry | 2023-08-25 | 1 | -5/+15 |
| Add some tracing to the quantized example. (#473) | Laurent Mazare | 2023-08-16 | 1 | -1/+0 |
| Using the real config from the hub when available. | Nicolas Patry | 2023-08-16 | 1 | -10/+18 |
| Tweak the llama example. (#450) | Laurent Mazare | 2023-08-15 | 1 | -63/+14 |
| Support local weights & dynamic outputs (#447) | Guoqing Bao | 2023-08-15 | 1 | -15/+39 |
| Add a cuda kernel for upsampling. (#441)<br>• Update for the latest tokenizers version. | Laurent Mazare | 2023-08-14 | 1 | -2/+2 |
| Remove the checkpoint conversion script. (#405)<br>• Remove references to the script. | Laurent Mazare | 2023-08-11 | 1 | -3/+0 |
| Support the Accelerate BLAS on macOS. (#325)<br>• Add the accelerate feature.<br>• FFI tweaks. | Laurent Mazare | 2023-08-05 | 1 | -0/+3 |
| Add some tracing to llama. (#318) | Laurent Mazare | 2023-08-03 | 1 | -0/+14 |
| Support both llama v1 and llama v2. (#272) | Laurent Mazare | 2023-07-28 | 1 | -1/+5 |
| Upgrading hf-hub to `0.2.0` (modified the API to not pass the Repo around all the time). | Nicolas Patry | 2023-07-27 | 1 | -4/+4 |
| Switch to using llama-v2 by default. (#251) | Laurent Mazare | 2023-07-26 | 1 | -4/+4 |
| Better handling of dtypes in llama. (#243) | Laurent Mazare | 2023-07-26 | 1 | -1/+1 |
| Add flash attention (#241)<br>• Import the flash-attn v2 kernels from Dao-AILab and build the static C library with nvcc.<br>• Exclude flash attention from the default workspace and put it behind a feature gate.<br>• Enable and use flash attention in llama. | Laurent Mazare | 2023-07-26 | 1 | -1/+4 |
| Support for MQA for llama v2. (#205)<br>• Move the rotary embedding precomputation into the cache.<br>• Add a v2 flag.<br>• Use the hf model. | Laurent Mazare | 2023-07-20 | 1 | -29/+18 |
| Removing `candle-hub`, extracted into the standalone `hf-hub` crate. | Nicolas Patry | 2023-07-19 | 1 | -1/+1 |
| Add a 'cuda-if-available' helper function. (#172) | Laurent Mazare | 2023-07-15 | 1 | -14/+1 |
| Removing cuda default.<br>• Important for the many users exploring on laptops without GPUs.<br>• More README instructions to come in a follow-up. | Nicolas Patry | 2023-07-14 | 1 | -1/+11 |
| Add a CLI argument to easily switch the dtype. (#161) | Laurent Mazare | 2023-07-13 | 1 | -6/+7 |
| Sketch the candle-transformers crate. (#147)<br>• Format the empty files. | Laurent Mazare | 2023-07-12 | 1 | -17/+3 |
| Use arange in the examples. (#146) | Laurent Mazare | 2023-07-12 | 1 | -4/+3 |
| Add `from_iter` and `arange`, use them in the doctests. (#145) | Laurent Mazare | 2023-07-12 | 1 | -1/+0 |
| Llama batch (#144)<br>• Add a batch dimension to llama.<br>• Bugfixes. | Laurent Mazare | 2023-07-12 | 1 | -3/+2 |
| Allow for lazy loading of npz files, use it in llama to reduce memory usage in the cpu version. (#141) | Laurent Mazare | 2023-07-11 | 1 | -5/+1 |
| Resurrect the llama npy support. (#140) | Laurent Mazare | 2023-07-11 | 1 | -2/+8 |
| Refactor the llama example to make it more in sync with the other ones. (#139)<br>• Properly load the safetensor weights.<br>• Get llama back to a working state for the safetensors case. | Laurent Mazare | 2023-07-11 | 1 | -349/+19 |
| Add a KV cache to falcon. (#104) | Laurent Mazare | 2023-07-07 | 1 | -2/+1 |
| Creating a new sync Api for `candle-hub`.<br>• `api::Api` -> `api::tokio::Api` (and created the new `api::sync::Api`).<br>• Remove `tokio` from all our examples.<br>• Using a similar codebase for now instead of ureq (for simplicity). | Nicolas Patry | 2023-07-06 | 1 | -5/+4 |
| MKL adjustments. (#87) | Laurent Mazare | 2023-07-06 | 1 | -0/+3 |
| Add mkl support for matrix multiply. (#86)<br>• Add the optional mkl feature with conditional compilation.<br>• Use mkl in bert.<br>• Fix some rebase issues. | Laurent Mazare | 2023-07-06 | 1 | -1/+4 |
| Support dim indexes in cat. | laurent | 2023-07-05 | 1 | -11/+10 |
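
The sampling-related commits above (top-k #2150, top_p #819, repeat penalty #623) each boil down to a small logit post-processing step before the next token is drawn. The sketches below are minimal plain-Rust illustrations with hypothetical helper names; they are not the candle API or the code from these commits.

Top-k keeps only the k most likely tokens by masking everything else to negative infinity:

```rust
/// Top-k filtering sketch (hypothetical helper, plain Rust): keep the k
/// largest logits and mask the rest to -inf so that sampling can only
/// pick from the k most likely tokens. Ties at the threshold may keep a
/// few extra entries; real implementations break ties explicitly.
fn top_k_filter(logits: &mut [f32], k: usize) {
    if k == 0 || k >= logits.len() {
        return;
    }
    // Find the k-th largest logit by sorting a copy in descending order.
    // (Assumes no NaN logits; partial_cmp would panic on NaN.)
    let mut sorted = logits.to_vec();
    sorted.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let threshold = sorted[k - 1];
    for l in logits.iter_mut() {
        if *l < threshold {
            *l = f32::NEG_INFINITY;
        }
    }
}

fn main() {
    let mut logits = vec![0.1, 2.5, -1.0, 3.0, 0.7];
    top_k_filter(&mut logits, 2);
    println!("{logits:?}"); // [-inf, 2.5, -inf, 3.0, -inf]
}
```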
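
Nucleus sampling (top_p) instead keeps the smallest set of tokens whose cumulative probability reaches `top_p`, so the cutoff adapts: the nucleus shrinks when the model is confident and grows when it is uncertain. A sketch, assuming `probs` is already softmax-normalized:

```rust
/// Nucleus (top-p) sketch (hypothetical helper): return the indices of the
/// smallest set of tokens whose cumulative probability reaches `top_p`,
/// in decreasing probability order. Sampling then draws only from this set.
fn top_p_indices(probs: &[f32], top_p: f32) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    // Sort token indices by decreasing probability (assumes no NaN).
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    let mut cumulative = 0.0f32;
    let mut nucleus = Vec::new();
    for i in idx {
        nucleus.push(i);
        cumulative += probs[i];
        if cumulative >= top_p {
            break;
        }
    }
    nucleus
}

fn main() {
    let probs = [0.5, 0.1, 0.3, 0.1];
    println!("{:?}", top_p_indices(&probs, 0.75)); // [0, 2]
}
```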
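
The repeat penalty damps tokens that already occurred in the recent context window so the model is less likely to loop. The divide-positive / multiply-negative rule for a penalty above 1.0 is an assumed common convention, not necessarily what the commit implemented:

```rust
use std::collections::HashSet;

/// Repeat-penalty sketch (hypothetical helper): shrink the logit of every
/// token id that already appeared in the recent context, making repeats
/// less likely to be sampled again.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, context: &[u32]) {
    let seen: HashSet<u32> = context.iter().copied().collect();
    for &token_id in &seen {
        if let Some(logit) = logits.get_mut(token_id as usize) {
            if *logit >= 0.0 {
                *logit /= penalty; // positive logits shrink toward 0
            } else {
                *logit *= penalty; // negative logits move further below 0
            }
        }
    }
}

fn main() {
    let mut logits = vec![1.2, -0.4, 0.8];
    apply_repeat_penalty(&mut logits, 1.5, &[0, 1]);
    println!("{logits:?}"); // approximately [0.8, -0.6, 0.8]
}
```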