forks/candle.git

	Commit message	Author	Age	Files	Lines
*	Flash-Attn upgrade / SoftCap Candle-FlashAttn [1/n] (#2688)	Michael Feil	2024-12-31	1	-6/+9
	* update flash-attn v1 * restore: hdim224 * add 224 flash_fwd_template * remove whitespace
*	Update the flash attn kernels. (#2333)	Laurent Mazare	2024-07-15	1	-161/+263
*	Use flash-attn in gemma. (#2195)	Laurent Mazare	2024-05-18	1	-0/+4
	* Use flash-attn in gemma. * Fix flash-attn for head dim 256.
*	chore: update flash attention kernels (#1518)	OlivierDehaene	2024-01-05	1	-30/+33
	* chore: update flash attention kernels * fmt * remove unused kernels * force f32 * correct stride
*	Add flash attention (#241)	Laurent Mazare	2023-07-26	1	-0/+251
	* Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab. * More flash attn. * Set up the flash attn parameters. * Get things to compile locally. * Move the flash attention files in a different directory. * Build the static C library with nvcc. * Add more flash attention. * Update the build part. * Better caching. * Exclude flash attention from the default workspace. * Put flash-attn behind a feature gate. * Get the flash attn kernel to run. * Move the flags to a more appropriate place. * Enable flash attention in llama. * Use flash attention in llama.