path: root/candle-examples/examples/llama/main.rs
Commit message | Author | Date | Files | Lines
* Add the SmolLM2 models. (#2595) | Laurent Mazare | 2024-11-03 | 1 | -14/+43
    * Add the SmolLM2 models.
    * More SmolLM2 support.
* Fix the repo name for llama 3.1. (#2576) | Laurent Mazare | 2024-10-26 | 1 | -2/+2
    * Fix the repo name for llama 3.1.
    * Fix the book.
* Add some llama-3.2 examples. (#2508) | Laurent Mazare | 2024-09-26 | 1 | -1/+13
    * Add some llama-3.2 examples.
    * Support tie-word-embeddings for llama.
* Add support for Llama 3.1 (#2359) | Eric Buehler | 2024-07-26 | 1 | -6/+24
    * Add Llama 3.1 rope
    * Clippy
    * Format
    * Clippy
    * Add support for multiple eos tokens:
    * Untagged either
    * Remove either dep and fix settings.json
    * Make the max positional embeddings configurable
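Supporting multiple eos tokens generally amounts to checking each sampled token against a set of stop ids rather than a single one. A minimal, self-contained sketch of that idea; the token ids and helper below are illustrative, not the example's actual API:

```rust
use std::collections::HashSet;

/// Illustrative stand-in for the model's sampler: it just pops pre-baked
/// token ids so the sketch runs on its own.
fn next_token(stream: &mut Vec<u32>) -> Option<u32> {
    stream.pop()
}

fn main() {
    // Llama 3.1 style configs can list several stop tokens; collect them in a set.
    let eos_token_ids: HashSet<u32> = [128001, 128008, 128009].into_iter().collect();
    let mut fake_model_output = vec![128009, 42, 7, 13]; // popped from the back

    let mut generated = Vec::new();
    while let Some(token) = next_token(&mut fake_model_output) {
        if eos_token_ids.contains(&token) {
            break; // any of the configured eos ids ends generation
        }
        generated.push(token);
    }
    println!("generated {} tokens before hitting an eos id", generated.len());
}
```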
* Support top-k in the llama example. (#2150) | Laurent Mazare | 2024-05-01 | 1 | -3/+21
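For context, top-k sampling keeps only the k highest-probability tokens and renormalizes before drawing a sample. A small sketch in plain Rust (the example itself routes this through candle's sampling helpers, so treat the code below as an illustration of the technique only):

```rust
/// Zero out everything outside the k most probable tokens, then renormalize.
fn top_k_filter(probs: &[f32], k: usize) -> Vec<f32> {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    let keep: std::collections::HashSet<usize> = idx.into_iter().take(k).collect();
    let mut out: Vec<f32> = probs
        .iter()
        .enumerate()
        .map(|(i, &p)| if keep.contains(&i) { p } else { 0.0 })
        .collect();
    let total: f32 = out.iter().sum();
    out.iter_mut().for_each(|p| *p /= total);
    out
}

/// Draw an index from a distribution given a uniform number in [0, 1).
fn sample_with(probs: &[f32], u: f32) -> usize {
    let mut acc = 0.0;
    for (i, &p) in probs.iter().enumerate() {
        acc += p;
        if u < acc {
            return i;
        }
    }
    probs.len() - 1
}

fn main() {
    let probs = [0.05, 0.4, 0.1, 0.3, 0.15];
    let filtered = top_k_filter(&probs, 2); // only tokens 1 and 3 survive
    println!("token = {}", sample_with(&filtered, 0.42));
}
```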
* Better time measurement for the llama example. (#2106) | Laurent Mazare | 2024-04-22 | 1 | -2/+5
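A common way to report generation throughput is to time only the token loop and divide the token count by the elapsed seconds. A sketch with std::time::Instant; the exact accounting in the example may differ:

```rust
use std::time::Instant;

fn main() {
    let tokens_to_generate = 100_000u32;
    // Start the clock after model loading and prompt processing, so the
    // reported rate reflects generation only.
    let start = Instant::now();
    let mut acc = 0u64;
    for t in 0..tokens_to_generate {
        acc = acc.wrapping_add(t as u64); // stand-in for one decode step
    }
    let dt = start.elapsed();
    println!(
        "{} tokens generated ({:.2} token/s), checksum {}",
        tokens_to_generate,
        tokens_to_generate as f64 / dt.as_secs_f64(),
        acc
    );
}
```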
* Use llama v3 by default + add to readme. (#2094) | Laurent Mazare | 2024-04-20 | 1 | -1/+1
* Also enable llama-v3 8b instruct. (#2088) | Laurent Mazare | 2024-04-19 | 1 | -1/+3
* Llama v3. (#2085) | Laurent Mazare | 2024-04-18 | 1 | -9/+13
    * Llama v3.
    * Tweak the default params + handle special tokens.
    * Small tweak.
* Make the cache for the llama model explicit too. (#1745) | Laurent Mazare | 2024-02-22 | 1 | -3/+3
* Use the tokenizer-output-stream in the llama example. (#1715) | Laurent Mazare | 2024-02-15 | 1 | -11/+9
    * Use the tokenizer-output-stream in the llama example.
    * Also use tokenizer-output-stream for llama2-c.
* fix index_pos bug when kv cache is disabled. (#1517) | optman | 2024-01-06 | 1 | -4/+4
    * fix index_pos bug when kv cache is disabled
    * Tweak the fix.
    Co-authored-by: laurent <laurent.mazare@gmail.com>
* Add support for tiny-llama-1.1b. (#1512) | Laurent Mazare | 2023-12-31 | 1 | -2/+9
* Rework the llama example config, add the solar model. (#1485) | Laurent Mazare | 2023-12-26 | 1 | -72/+36
* Adapt more examples to the updated safetensor api. (#947) | Laurent Mazare | 2023-09-23 | 1 | -9/+1
    * Simplify the safetensor usage.
    * Convert more examples.
    * Move more examples.
    * Adapt stable-diffusion.
* Implement top_p / nucleus sampling (#819) | Juarez Bochi | 2023-09-12 | 1 | -1/+5
    * Implement top_p / nucleus sampling
    * Update changelog
    * rustfmt
    * Add tests
    * Fix clippy warning
    * Fix another clippy error
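Nucleus (top_p) sampling keeps the smallest set of tokens whose cumulative probability reaches p and renormalizes over that set. A self-contained sketch of the idea, not the code this commit added:

```rust
/// Keep the smallest prefix of tokens (by descending probability) whose
/// cumulative mass reaches `p`, zero out the rest, and renormalize.
fn top_p_filter(probs: &[f32], p: f32) -> Vec<f32> {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());

    let mut keep = Vec::new();
    let mut cum = 0.0;
    for &i in &idx {
        keep.push(i);
        cum += probs[i];
        if cum >= p {
            break;
        }
    }

    let mut out = vec![0.0; probs.len()];
    for &i in &keep {
        out[i] = probs[i] / cum;
    }
    out
}

fn main() {
    let probs = [0.05, 0.4, 0.1, 0.3, 0.15];
    // With p = 0.6, only the two most likely tokens (0.4 and 0.3) survive.
    println!("{:?}", top_p_filter(&probs, 0.6));
}
```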
* Move some models to candle-transformers so that it's easier to re-use. (#794) | Laurent Mazare | 2023-09-10 | 1 | -2/+1
    * Move some models to candle-transformers so that they can be shared.
    * Also move falcon.
    * Move Llama.
    * Move whisper (partial).
* Add some optional repeat penalty. (#623) | Laurent Mazare | 2023-08-27 | 1 | -0/+18
    * Add some optional repeat penalty.
    * Add the missing files.
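A repeat penalty typically divides positive logits (and multiplies negative ones) for tokens that already appear in the recent context, which discourages the model from looping. A rough sketch of that general rule; the penalty value and window are illustrative and may not match the example's implementation:

```rust
/// Penalize the logits of tokens that already occur in `context`.
/// Positive logits are divided by the penalty, negative ones multiplied,
/// so the adjustment always lowers the token's probability.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, context: &[u32]) {
    for &tok in context {
        if let Some(l) = logits.get_mut(tok as usize) {
            *l = if *l >= 0.0 { *l / penalty } else { *l * penalty };
        }
    }
}

fn main() {
    let mut logits = vec![1.5, -0.5, 2.0, 0.1];
    // Only penalize tokens seen in the last few generated positions.
    let recent_context = [2u32, 1u32];
    apply_repeat_penalty(&mut logits, 1.1, &recent_context);
    println!("{:?}", logits); // token 2: 2.0 -> ~1.82, token 1: -0.5 -> -0.55
}
```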
* s/panic/bail/ | Nicolas Patry | 2023-08-25 | 1 | -2/+2
* Adding support for codellama in examples. | Nicolas Patry | 2023-08-25 | 1 | -5/+15
    Codellama requires bf16 for now (error to convert from bf16 to f16).
    Multiprocess demo not functional for it because flash-attn only supports f16 for now.
* Add some tracing to the quantized example. (#473) | Laurent Mazare | 2023-08-16 | 1 | -1/+0
* Using the real config from the hub when available. | Nicolas Patry | 2023-08-16 | 1 | -10/+18
* Tweak the llama example. (#450) | Laurent Mazare | 2023-08-15 | 1 | -63/+14
* Support local weights & dynamic outputs (#447) | Guoqing Bao | 2023-08-15 | 1 | -15/+39
    * Support local weights & dynamic outputs
    * Revise as suggested
    * Cargo code format
* Add a cuda kernel for upsampling. (#441) | Laurent Mazare | 2023-08-14 | 1 | -2/+2
    * Add a cuda kernel for upsampling.
    * Update for the latest tokenizers version.
* Remove the checkpoint conversion script. (#405) | Laurent Mazare | 2023-08-11 | 1 | -3/+0
    * Remove the checkpoint conversion script.
    * Remove references to the script.
* Support the Accelerate BLAS on macOS. (#325) | Laurent Mazare | 2023-08-05 | 1 | -0/+3
    * Add the accelerate feature.
    * Ffi tweaks.
* Add some tracing to llama. (#318) | Laurent Mazare | 2023-08-03 | 1 | -0/+14
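Tracing here generally means wrapping the per-step work in spans so a subscriber can attribute time to it. The sketch below uses the tracing and tracing-subscriber crates in their plain form; the example's own subscriber setup (and span names) differ, so treat this purely as an assumption-laden illustration:

```rust
// Cargo.toml (assumed): tracing = "0.1", tracing-subscriber = "0.3"
use tracing::{info, span, Level};

fn forward_pass(step: usize) -> f64 {
    // A span per forward pass lets a subscriber measure where time goes.
    let _span = span!(Level::TRACE, "forward", step = step as u64).entered();
    (step as f64).sin() // stand-in for the model's actual work
}

fn main() {
    // Plain fmt subscriber; a chrome-trace layer could be plugged in instead.
    tracing_subscriber::fmt::init();
    let mut total = 0.0;
    for step in 0..3 {
        total += forward_pass(step);
    }
    info!(total, "done");
}
```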
* Support both llama v1 and llama v2. (#272) | Laurent Mazare | 2023-07-28 | 1 | -1/+5
* Upgrading hf-hub to `0.2.0` (Modified API to not pass the Repo around all the time) | Nicolas Patry | 2023-07-27 | 1 | -4/+4
* Switch to using llama-v2 by default. (#251) | Laurent Mazare | 2023-07-26 | 1 | -4/+4
* Better handling of dtypes in llama. (#243) | Laurent Mazare | 2023-07-26 | 1 | -1/+1
* Add flash attention (#241) | Laurent Mazare | 2023-07-26 | 1 | -1/+4
    * Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
    * More flash attn.
    * Set up the flash attn parameters.
    * Get things to compile locally.
    * Move the flash attention files in a different directory.
    * Build the static C library with nvcc.
    * Add more flash attention.
    * Update the build part.
    * Better caching.
    * Exclude flash attention from the default workspace.
    * Put flash-attn behind a feature gate.
    * Get the flash attn kernel to run.
    * Move the flags to a more appropriate place.
    * Enable flash attention in llama.
    * Use flash attention in llama.
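Putting flash attention behind a feature gate means the CUDA kernels are only compiled when the feature is requested, with the regular attention path used otherwise. A minimal sketch of that cfg pattern; the feature name and function bodies are illustrative, not the crate's actual code:

```rust
// Cargo.toml (assumed):
// [features]
// flash-attn = []

#[cfg(feature = "flash-attn")]
fn attention(q: &[f32], k: &[f32], v: &[f32]) -> Vec<f32> {
    // Would dispatch to the fused flash-attention kernel here.
    fallback_attention(q, k, v)
}

#[cfg(not(feature = "flash-attn"))]
fn attention(q: &[f32], k: &[f32], v: &[f32]) -> Vec<f32> {
    fallback_attention(q, k, v)
}

fn fallback_attention(q: &[f32], k: &[f32], v: &[f32]) -> Vec<f32> {
    // Stand-in so the sketch compiles: just average the three inputs.
    q.iter()
        .zip(k)
        .zip(v)
        .map(|((a, b), c)| (a + b + c) / 3.0)
        .collect()
}

fn main() {
    let out = attention(&[1.0], &[2.0], &[3.0]);
    println!("{out:?} (built with flash-attn: {})", cfg!(feature = "flash-attn"));
}
```

With the feature declared in Cargo.toml, the gated path is selected by building with something like `cargo run --features flash-attn`.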
* Support for MQA for llama v2. (#205) | Laurent Mazare | 2023-07-20 | 1 | -29/+18
    * Support for MQA for llama v2.
    * More llama-v2.
    * Move the rotary embedding precomputation in the cache.
    * Add a v2 flag.
    * Use the hf model.
* Removing `candle-hub` internal to extract into `hf-hub` standalone. | Nicolas Patry | 2023-07-19 | 1 | -1/+1
* Add some 'cuda-if-available' helper function. (#172) | Laurent Mazare | 2023-07-15 | 1 | -14/+1
* Removing cuda default. | Nicolas Patry | 2023-07-14 | 1 | -1/+11
    Seems very important for the many exploring users, usually on laptops without GPUs.
    Adding more README instructions in a follow-up.
* Add a cli argument to easily switch the dtype. (#161) | Laurent Mazare | 2023-07-13 | 1 | -6/+7
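Switching the dtype from the command line usually comes down to a string flag that is mapped to the numeric type before loading the weights. A sketch with clap's derive API; the flag name, accepted values, and the DType enum below are assumptions, not the example's exact interface:

```rust
// Cargo.toml (assumed): clap = { version = "4", features = ["derive"] }
use clap::Parser;

#[derive(Debug, Clone, Copy)]
enum DType {
    F16,
    Bf16,
    F32,
}

#[derive(Parser, Debug)]
struct Args {
    /// Numeric type used for the weights and activations.
    #[arg(long, default_value = "f16")]
    dtype: String,
}

fn main() {
    let args = Args::parse();
    let dtype = match args.dtype.as_str() {
        "f16" => DType::F16,
        "bf16" => DType::Bf16,
        "f32" => DType::F32,
        other => panic!("unsupported dtype {other}"),
    };
    println!("loading weights as {dtype:?}");
}
```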
* Sketch the candle-transformers crate. (#147) | Laurent Mazare | 2023-07-12 | 1 | -17/+3
    * Sketch the candle-transformers crate.
    * Format the empty files.
* Use arange in the examples. (#146) | Laurent Mazare | 2023-07-12 | 1 | -4/+3
* Add from_iter and arange, use it in the doctests. (#145) | Laurent Mazare | 2023-07-12 | 1 | -1/+0
* Llama batch (#144) | Laurent Mazare | 2023-07-12 | 1 | -3/+2
    * Add a batch dimension to llama.
    * Bugfixes.
* Allow for lazy loading of npz files, use it in llama to reduce memory usage in the cpu version. (#141) | Laurent Mazare | 2023-07-11 | 1 | -5/+1
* Resurrect the llama npy support. (#140) | Laurent Mazare | 2023-07-11 | 1 | -2/+8
* Refactor the llama example to make it more in sync with the other ones. (#139) | Laurent Mazare | 2023-07-11 | 1 | -349/+19
    * Refactor the llama example to make it more in sync with the other ones.
    * Make clippy happy.
    * Properly load the safetensor weights.
    * Get llama back to a working state for the safetensors case.
* Add a KV cache to falcon. (#104) | Laurent Mazare | 2023-07-07 | 1 | -2/+1
* Creating new sync Api for `candle-hub`. | Nicolas Patry | 2023-07-06 | 1 | -5/+4
    - `api::Api` -> `api::tokio::api` (And created new `api::sync::Api`).
    - Remove `tokio` from all our examples.
    - Using similar codebase for now instead of ureq (for simplicity).
* MKL adjustments. (#87) | Laurent Mazare | 2023-07-06 | 1 | -0/+3
* Add mkl support for matrix multiply. (#86) | Laurent Mazare | 2023-07-06 | 1 | -1/+4
    * Fix some rebase issues.
    * Use mkl instead.
    * Use mkl in bert.
    * Add the optional mkl feature.
    * Conditional compilation based on the mkl feature.
    * Add more mkl support.
* Support dim indexes in cat. | laurent | 2023-07-05 | 1 | -11/+10