Commit message log
* Add the SmolLM2 models.
* More SmolLM2 support.

* Fix the repo name for llama 3.1.
* Fix the book.

* Add some llama-3.2 examples.
* Support tie-word-embeddings for llama.

* Add Llama 3.1 rope
* Clippy
* Format
* Clippy
* Add support for multiple eos tokens:
* Untagged either
* Remove either dep and fix settings.json
* Make the max positional embeddings configurable
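
The Llama 3.1 rope change refers to rescaling the rotary-embedding frequencies for long contexts. A minimal sketch of that rescaling, assuming the commonly published Llama 3.1 parameters (scaling factor 8, low/high frequency factors 1 and 4, original context of 8192) and plain `f32` slices rather than candle tensors:

```rust
use std::f32::consts::PI;

// Rescale rotary-embedding inverse frequencies the Llama 3.1 way:
// high-frequency dims stay as-is, low-frequency dims are divided by the
// factor, and the band in between is smoothly interpolated.
fn llama31_scale_inv_freqs(inv_freqs: &[f32]) -> Vec<f32> {
    let factor = 8.0f32;
    let low_freq_factor = 1.0f32;
    let high_freq_factor = 4.0f32;
    let original_max_pos = 8192.0f32;

    let low_freq_wavelen = original_max_pos / low_freq_factor;
    let high_freq_wavelen = original_max_pos / high_freq_factor;

    inv_freqs
        .iter()
        .map(|&inv_freq| {
            let wavelen = 2.0 * PI / inv_freq;
            if wavelen < high_freq_wavelen {
                inv_freq
            } else if wavelen > low_freq_wavelen {
                inv_freq / factor
            } else {
                let smooth = (original_max_pos / wavelen - low_freq_factor)
                    / (high_freq_factor - low_freq_factor);
                (1.0 - smooth) * inv_freq / factor + smooth * inv_freq
            }
        })
        .collect()
}

fn main() {
    let head_dim = 128usize;
    let theta = 500_000f32; // rope theta used by the Llama 3 family
    let inv_freqs: Vec<f32> = (0..head_dim / 2)
        .map(|i| 1.0 / theta.powf(2.0 * i as f32 / head_dim as f32))
        .collect();
    let scaled = llama31_scale_inv_freqs(&inv_freqs);
    println!("first scaled inv_freq: {}", scaled[0]);
}
```
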
* Llama v3.
* Tweak the default params + handle special tokens.
* Small tweak.

* Use the tokenizer-output-stream in the llama example.
* Also use tokenizer-output-stream for llama2-c.

* fix index_pos bug when kv cache is disabled
* Tweak the fix.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
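
For context on what `index_pos` tracks: with a warm kv cache only the newest token is fed and the position offset equals the number of cached tokens, while with the cache disabled the whole context is re-fed from position 0. A hypothetical sketch of that bookkeeping (the function and variable names are illustrative, not the example's actual code):

```rust
// Pick which tokens to feed and at which position offset for one step of
// the generation loop, depending on whether the kv cache is enabled.
fn context_for_step(tokens: &[u32], step: usize, use_kv_cache: bool) -> (&[u32], usize) {
    if use_kv_cache && step > 0 {
        // Warm cache: feed only the newest token, offset by the cached length.
        (&tokens[tokens.len() - 1..], tokens.len() - 1)
    } else {
        // No cache (or first step): re-feed everything from position 0.
        (tokens, 0)
    }
}

fn main() {
    let tokens = vec![1u32, 15043, 29892];
    println!("{:?}", context_for_step(&tokens, 2, true));
    println!("{:?}", context_for_step(&tokens, 2, false));
}
```
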
* Simplify the safetensor usage.
* Convert more examples.
* Move more examples.
* Adapt stable-diffusion.

* Implement top_p / nucleus sampling
* Update changelog
* rustfmt
* Add tests
* Fix clippy warning
* Fix another clippy error
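
Nucleus (top-p) sampling keeps only the smallest set of tokens whose cumulative probability reaches `top_p`, renormalizes, and samples from that set. A self-contained sketch over a plain probability vector (candle's sampling code works on logits and draws its own random numbers; the uniform draw is passed in here to keep the example dependency-free):

```rust
// Sample a token index from `probs` using nucleus (top-p) sampling.
// `rand_uniform` is a uniform draw in [0, 1).
fn sample_top_p(probs: &[f32], top_p: f32, rand_uniform: f32) -> usize {
    // Sort indices by probability, highest first.
    let mut indices: Vec<usize> = (0..probs.len()).collect();
    indices.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());

    // Keep the smallest prefix whose cumulative probability reaches top_p.
    let mut cumulative = 0.0;
    let mut nucleus = Vec::new();
    for &i in &indices {
        nucleus.push(i);
        cumulative += probs[i];
        if cumulative >= top_p {
            break;
        }
    }

    // Sample proportionally from the renormalized nucleus.
    let mut r = rand_uniform * cumulative;
    for &i in &nucleus {
        r -= probs[i];
        if r <= 0.0 {
            return i;
        }
    }
    *nucleus.last().unwrap()
}

fn main() {
    let probs = [0.5, 0.3, 0.15, 0.05];
    println!("sampled index: {}", sample_top_p(&probs, 0.9, 0.42));
}
```
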
* Move some models to candle-transformers so that they can be shared.
* Also move falcon.
* Move Llama.
* Move whisper (partial).

* Add some optional repeat penalty.
* Add the missing files.
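
The repeat penalty here is the llama.cpp-style one: logits of tokens that already appear in the recent context are pushed toward zero so they are less likely to be sampled again. A hedged sketch of that logic on plain slices (candle ships a small utility for this; the exact signature below is illustrative):

```rust
// Penalize tokens that already occurred in `context` by scaling their logits
// toward zero. A penalty > 1.0 makes repetition less likely.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, context: &[u32]) {
    for &token in context {
        if let Some(logit) = logits.get_mut(token as usize) {
            if *logit >= 0.0 {
                *logit /= penalty;
            } else {
                *logit *= penalty;
            }
        }
    }
}

fn main() {
    let mut logits = vec![2.0, -1.0, 0.5];
    apply_repeat_penalty(&mut logits, 1.1, &[0, 1]);
    println!("{logits:?}");
}
```

A penalty of 1.0 leaves the logits unchanged; values slightly above 1 are the usual setting.
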
Codellama requires bf16 for now (error to convert from bf16 to f16).
Multiprocess demo not functional for it because flash-attn only supports
f16 for now.

* GQA support in the quantized model.
* Fix the reshaping.
* Fix the main llama model.
* Infer the proper gqa from the model kind.
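
Grouped-query attention (GQA) lets several query heads share one key/value head, which shrinks the kv cache; the reshaping fix above is about repeating the kv heads so they line up with the query heads. A small sketch of the index bookkeeping only (real code repeats the kv tensor along the head dimension):

```rust
// Map a query head index to the kv head it shares under GQA.
fn kv_head_for_query_head(query_head: usize, n_head: usize, n_kv_head: usize) -> usize {
    assert!(n_head % n_kv_head == 0, "query heads must be a multiple of kv heads");
    let group_size = n_head / n_kv_head;
    query_head / group_size
}

fn main() {
    // Llama-2 70B style setup: 64 query heads sharing 8 kv heads.
    let (n_head, n_kv_head) = (64, 8);
    for q in [0, 7, 8, 63] {
        println!("query head {q} -> kv head {}", kv_head_for_query_head(q, n_head, n_kv_head));
    }
}
```

With `n_kv_head = 1` this degenerates to multi-query attention, the MQA case mentioned further down the log.
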
* Start adding the module trait.
* Use the module trait.
* Implement module for qmatmul.
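
A module trait boils down to a single `forward` method that every layer implements, so layers can be composed and called uniformly. A toy sketch with a placeholder `Tensor` type (candle's real trait operates on its own tensor type and returns a `Result`):

```rust
// Stand-in tensor type for illustration only.
#[derive(Debug)]
struct Tensor(Vec<f32>);

// The trait: any layer exposes a forward pass from tensor to tensor.
trait Module {
    fn forward(&self, xs: &Tensor) -> Tensor;
}

// A trivial layer implementing the trait.
struct Scale(f32);

impl Module for Scale {
    fn forward(&self, xs: &Tensor) -> Tensor {
        Tensor(xs.0.iter().map(|v| v * self.0).collect())
    }
}

fn main() {
    let layer = Scale(2.0);
    let y = layer.forward(&Tensor(vec![1.0, 2.0]));
    println!("{:?}", y);
}
```
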
* Add some options to make layer-norm more configurable.
* Add the rms-norm variant.
* Replace the RmsNorm with the shared bits.
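
RmsNorm differs from LayerNorm in that it skips the mean subtraction and bias, normalizing by the root-mean-square of the activations before applying a learned scale; sharing both behind one configurable layer-norm is what the commit describes. A minimal sketch on plain slices:

```rust
// RMSNorm: x * w / sqrt(mean(x^2) + eps), no mean subtraction, no bias.
fn rms_norm(xs: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = xs.iter().map(|v| v * v).sum::<f32>() / xs.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    xs.iter()
        .zip(weight)
        .map(|(x, w)| x * inv_rms * w)
        .collect()
}

fn main() {
    let xs = [1.0, 2.0, 3.0, 4.0];
    let weight = [1.0; 4];
    println!("{:?}", rms_norm(&xs, &weight, 1e-5));
}
```
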
* Add more stats to the ggml example.
* Build a quantized model from the file content.
* Move the tensor retrieval in the main crate.
* Start adding the forward pass.
* Add more to the forward pass of the quantized llama.
* Apply the attention layers.
* Add the sampling loop.
* Get the sampling loop to work.
* Minor tweak.
* Add a quantize/dequantize test.
* Bugfix.
* Add a comment + swap the order.
* Bugfixes.
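
For the quantize/dequantize test, the idea is a round trip: split a tensor into fixed-size blocks, store one scale per block plus small integers, then reconstruct and check the error stays small. A simplified 8-bit variant in the spirit of the ggml formats (the real q4/q8 block layouts and sizes differ):

```rust
const BLOCK: usize = 32;

// Quantize into (scale, int8 values) blocks, one scale per block.
fn quantize(xs: &[f32]) -> Vec<(f32, Vec<i8>)> {
    xs.chunks(BLOCK)
        .map(|block| {
            let amax = block.iter().fold(0f32, |m, &v| m.max(v.abs()));
            let scale = if amax == 0.0 { 1.0 } else { amax / 127.0 };
            let qs = block.iter().map(|&v| (v / scale).round() as i8).collect();
            (scale, qs)
        })
        .collect()
}

// Reconstruct the f32 values from the quantized blocks.
fn dequantize(blocks: &[(f32, Vec<i8>)]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|(scale, qs)| qs.iter().map(move |&q| q as f32 * scale))
        .collect()
}

fn main() {
    let xs: Vec<f32> = (0..64).map(|i| (i as f32 / 7.0).sin()).collect();
    let ys = dequantize(&quantize(&xs));
    let max_err = xs.iter().zip(&ys).map(|(a, b)| (a - b).abs()).fold(0f32, f32::max);
    println!("max round-trip error: {max_err}");
}
```
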
* Support local weights & dynamic outputs
* Revise as suggested
* Cargo code format

* Add a cuda kernel for upsampling.
* Update for the latest tokenizers version.

* Remove the checkpoint conversion script.
* Remove references to the script.

* Add the accelerate feature.
* Ffi tweaks.

* Line-up the llama implementation with the python-transformers one.
* Also lineup the multiprocess version.

* Softmax numerical stability.
* Fix the flash-attn test.
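
Softmax numerical stability is the classic trick of subtracting the row maximum before exponentiating, which avoids overflow for large logits without changing the result. A minimal sketch:

```rust
// Numerically stable softmax: shift by the max before exp.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    // Without the max subtraction, exp(1000.0) would overflow to infinity.
    println!("{:?}", softmax(&[1000.0, 999.0, 998.0]));
}
```
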
all the time)

* Move the flash-attn function in the proper crate.
* Causality tweak.

* Again set a few extra params.
* Use the appropriate kernel sizes.
* Add all the kernel sizes.
* Parallel compiling.
* Reduce the amount of parallelism.
* Add the missing kernel.
* Fix a typo.
* Remove bf16 support for now.
* Proper flash-attn parameters.
* Set the flash attention parameters.
* Add more validations.
* Setup the o_ flash attn parameters.
* More flash-attn support.
* Set more flash attn parameters.

* Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
* More flash attn.
* Set up the flash attn parameters.
* Get things to compile locally.
* Move the flash attention files in a different directory.
* Build the static C library with nvcc.
* Add more flash attention.
* Update the build part.
* Better caching.
* Exclude flash attention from the default workspace.
* Put flash-attn behind a feature gate.
* Get the flash attn kernel to run.
* Move the flags to a more appropriate place.
* Enable flash attention in llama.
* Use flash attention in llama.
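
Putting flash-attn behind a feature gate means the CUDA kernels are only compiled and linked when the Cargo feature is enabled, with a fallback path otherwise. A generic sketch of the pattern (the feature name matches the commit, but the functions are placeholders, not candle's code):

```rust
// Only compiled when building with `--features flash-attn`.
#[cfg(feature = "flash-attn")]
fn attention_backend() -> &'static str {
    "fused flash-attention kernels"
}

// Fallback used in the default build.
#[cfg(not(feature = "flash-attn"))]
fn attention_backend() -> &'static str {
    "plain matmul + softmax attention"
}

fn main() {
    println!("attention backend: {}", attention_backend());
}
```
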
* Support for MQA for llama v2.
* More llama-v2.
* Move the rotary embedding precomputation in the cache.
* Add a v2 flag.
* Use the hf model.