path: root/candle-flash-attn/kernels/softmax.h
Commit message | Author | Age | Files | Lines
* Update the flash attn kernels. (#2333) | Laurent Mazare | 2024-07-15 | 1 | -166/+71
* chore: update flash attention kernels (#1518) | OlivierDehaene | 2024-01-05 | 1 | -23/+34
    * chore: update flash attention kernels
    * fmt
    * remove unused kernels
    * force f32
    * correct stride
* Add flash attention (#241) | Laurent Mazare | 2023-07-26 | 1 | -0/+272
    * Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
    * More flash attn.
    * Set up the flash attn parameters.
    * Get things to compile locally.
    * Move the flash attention files in a different directory.
    * Build the static C library with nvcc.
    * Add more flash attention.
    * Update the build part.
    * Better caching.
    * Exclude flash attention from the default workspace.
    * Put flash-attn behind a feature gate.
    * Get the flash attn kernel to run.
    * Move the flags to a more appropriate place.
    * Enable flash attention in llama.
    * Use flash attention in llama.