Commit log (message, author, date, files changed, lines changed):

* **Flash-Attn upgrade / SoftCap Candle-FlashAttn [1/n] (#2688)** (Michael Feil, 2024-12-31, 1 file, -9/+4)
  * update flash-attn v1
  * restore: hdim224
  * add 224 flash_fwd_template
  * remove whitespace
* **Update the flash attn kernels. (#2333)** (Laurent Mazare, 2024-07-15, 1 file, -6/+29)
* **chore: update flash attention kernels (#1518)** (OlivierDehaene, 2024-01-05, 1 file, -12/+42)
  * chore: update flash attention kernels
  * fmt
  * remove unused kernels
  * force f32
  * correct stride
* **Add flash attention (#241)** (Laurent Mazare, 2023-07-26, 1 file, -0/+141)
  * Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
  * More flash attn.
  * Set up the flash attn parameters.
  * Get things to compile locally.
  * Move the flash attention files in a different directory.
  * Build the static C library with nvcc.
  * Add more flash attention.
  * Update the build part.
  * Better caching.
  * Exclude flash attention from the default workspace.
  * Put flash-attn behind a feature gate.
  * Get the flash attn kernel to run.
  * Move the flags to a more appropriate place.
  * Enable flash attention in llama.
  * Use flash attention in llama.