path: root/candle-flash-attn
Each entry below lists the commit message, author, date, and the number of files and lines changed, followed by the individual commit body items.
...
* Add some flash attn test (#253) (Laurent Mazare, 2023-07-26, 4 files changed, +123/-12)
    * Add some flash-attn test.
    * Add the cpu test.
    * Fail when the head dimension is not a multiple of 8.
    * Polish the flash attention test.
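A minimal sketch of the kind of test #253 describes: a CPU reference that computes softmax(q k^T * scale) v, and, when the flash-attn feature and a CUDA device are available, a comparison against the fused kernel. The `cpu_attention` helper is hypothetical, and the `flash_attn(q, k, v, softmax_scale, causal)` signature with a (batch, seq_len, num_heads, head_dim) layout is an assumption about the crate's API rather than a copy of the repository's actual test.

```rust
use candle_core::{DType, Device, Result, Tensor, D};

// Hypothetical CPU reference: softmax(q k^T * scale) v, with q/k/v laid out as
// (batch, seq_len, num_heads, head_dim), the layout flash-attn expects.
fn cpu_attention(q: &Tensor, k: &Tensor, v: &Tensor, scale: f64) -> Result<Tensor> {
    // Bring heads next to the batch dimension: (b, h, s, d).
    let (q, k, v) = (q.transpose(1, 2)?, k.transpose(1, 2)?, v.transpose(1, 2)?);
    let att = (q.matmul(&k.transpose(D::Minus2, D::Minus1)?)? * scale)?;
    let att = candle_nn::ops::softmax(&att, D::Minus1)?;
    att.matmul(&v)?.transpose(1, 2) // back to (b, s, h, d)
}

fn main() -> Result<()> {
    // head_dim is kept a multiple of 8, the constraint the test enforces.
    let (b, s, h, d) = (2, 16, 4, 64);
    let cpu = Device::Cpu;
    let q = Tensor::randn(0f32, 1., (b, s, h, d), &cpu)?;
    let k = Tensor::randn(0f32, 1., (b, s, h, d), &cpu)?;
    let v = Tensor::randn(0f32, 1., (b, s, h, d), &cpu)?;
    let reference = cpu_attention(&q, &k, &v, 1. / (d as f64).sqrt())?;

    #[cfg(feature = "flash-attn")]
    {
        // The fused kernel only runs on CUDA with f16/bf16 inputs.
        let dev = Device::new_cuda(0)?;
        let to_gpu = |t: &Tensor| -> Result<Tensor> { t.to_dtype(DType::F16)?.to_device(&dev) };
        let flash = candle_flash_attn::flash_attn(
            &to_gpu(&q)?,
            &to_gpu(&k)?,
            &to_gpu(&v)?,
            1. / (d as f32).sqrt(),
            /* causal */ false,
        )?;
        // Compare against the CPU reference with a loose tolerance to allow for f16 rounding.
        let diff = (flash.to_dtype(DType::F32)?.to_device(&cpu)? - &reference)?
            .abs()?
            .sum_all()?
            .to_scalar::<f32>()?;
        assert!(diff / (b * s * h * d) as f32 < 1e-2, "flash-attn and cpu reference diverge");
    }
    Ok(())
}
```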
* Use bail rather than wrapping a string where possible. (#249) (Laurent Mazare, 2023-07-26, 1 file changed, +2/-2)
    * Use bail rather than wrapping a string where possible.
    * Revert the cuda default bit.
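The pattern behind #249 looks roughly like the snippet below; `check_head_dim` is a hypothetical helper, not code from this commit, but it shows the difference between building an error by wrapping a formatted string and using candle's bail! macro.

```rust
use candle_core::{bail, Result, Tensor, D};

// Hypothetical validation helper illustrating the bail! style: instead of
// wrapping a formatted string in an error variant by hand, bail! constructs
// and returns the error in one step.
fn check_head_dim(q: &Tensor) -> Result<()> {
    let head_dim = q.dim(D::Minus1)?;
    if head_dim % 8 != 0 {
        // Before: return Err(candle_core::Error::Msg(format!("unsupported head dim {head_dim}")));
        bail!("flash-attn requires a head dimension that is a multiple of 8, got {head_dim}")
    }
    Ok(())
}
```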
* Lining up the flash attn version with the non-flash one. (#248) (Laurent Mazare, 2023-07-26, 1 file changed, +18/-1)
    * Move the flash-attn function in the proper crate.
    * Causality tweak.
* Again set a few extra params in flash-attn. (#245) (Laurent Mazare, 2023-07-26, 20 files changed, +471/-115)
    * Again set a few extra params.
    * Use the appropriate kernel sizes.
    * Add all the kernel sizes.
    * Parallel compiling.
    * Reduce the amount of parallelism.
    * Add the missing kernel.
    * Fix a typo.
    * Remove bf16 support for now.
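The build-related items in #245 (one kernel per head size, parallel compilation, then reining the parallelism back in) roughly correspond to a build script that shells out to nvcc once per .cu file. The sketch below is an assumption about how such a build.rs can be organized, using rayon with a capped thread pool; it is not the crate's actual build script, and the kernel file names are illustrative only.

```rust
// build.rs sketch (illustrative, not the crate's actual build script): compile
// each flash-attn kernel with nvcc, a few at a time, so the objects can later
// be archived into a static library the Rust side links against.
use rayon::prelude::*;
use std::path::PathBuf;
use std::process::Command;

fn main() {
    let out_dir = PathBuf::from(std::env::var("OUT_DIR").unwrap());
    // One .cu file per head-size/dtype instantiation; names are hypothetical.
    let kernels: Vec<&str> = vec![
        "kernels/flash_fwd_hdim64_fp16_sm80.cu",
        "kernels/flash_fwd_hdim128_fp16_sm80.cu",
    ];

    // Cap the parallelism so several concurrent nvcc invocations do not
    // exhaust memory ("reduce the amount of parallelism").
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(4)
        .build()
        .unwrap();
    pool.install(|| {
        kernels.par_iter().for_each(|&src| {
            let obj = out_dir
                .join(PathBuf::from(src).file_stem().unwrap())
                .with_extension("o");
            let status = Command::new("nvcc")
                .args(["-O3", "-std=c++17", "--expt-relaxed-constexpr", "-c"])
                .arg(src)
                .arg("-o")
                .arg(&obj)
                .status()
                .expect("failed to run nvcc");
            assert!(status.success(), "nvcc failed on {src}");
        });
    });

    // Archiving the objects into a static library and emitting the matching
    // cargo:rustc-link-lib directive is elided here.
    println!("cargo:rustc-link-search=native={}", out_dir.display());
}
```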
* Proper flash-attn parameters. (#244) (Laurent Mazare, 2023-07-26, 3 files changed, +122/-8)
    * Proper flash-attn parameters.
    * Set the flash attention parameters.
    * Add more validations.
    * Set up the o_ flash attn parameters.
    * More flash-attn support.
    * Set more flash attn parameters.
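The "parameters" these commits keep filling in are the fields the flash-attn v2 kernels read from a forward-parameters struct: device pointers (including the o_ output fields mentioned above), strides, sequence lengths, the softmax scale, and the causal flag. The sketch below shows what a Rust-side mirror of such a struct and its validation can look like; the field names and layout are assumptions, not the crate's actual FFI definition.

```rust
use core::ffi::c_void;

// Illustrative mirror of a flash-attn forward-parameters struct; field names
// and layout are hypothetical, not the crate's actual FFI definition.
#[repr(C)]
struct FlashFwdParams {
    // Device pointers for the query/key/value inputs and the o_ output.
    q_ptr: *const c_void,
    k_ptr: *const c_void,
    v_ptr: *const c_void,
    o_ptr: *mut c_void,
    // Strides (in elements) between consecutive batches for each tensor.
    q_batch_stride: u32,
    k_batch_stride: u32,
    v_batch_stride: u32,
    o_batch_stride: u32,
    // Problem sizes.
    b: u32,        // batch size
    h: u32,        // number of heads
    seqlen_q: u32, // query sequence length
    seqlen_k: u32, // key/value sequence length
    head_dim: u32,
    // Softmax scale (usually 1/sqrt(head_dim)) and causal masking flag.
    softmax_scale: f32,
    is_causal: bool,
}

// The kind of validation the commits mention: reject shapes the kernels
// cannot handle before handing the pointers to the CUDA side.
fn validate(p: &FlashFwdParams) -> Result<(), String> {
    if p.head_dim % 8 != 0 {
        return Err(format!("head_dim must be a multiple of 8, got {}", p.head_dim));
    }
    if p.head_dim > 256 {
        return Err(format!("head_dim must be <= 256, got {}", p.head_dim));
    }
    Ok(())
}
```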
* Specific cache dir for the flash attn build artifacts. (#242) (Laurent Mazare, 2023-07-26, 1 file changed, +10/-10)
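Commit #242 points the build artifacts at a dedicated cache directory so the expensive nvcc compilation is not redone whenever cargo's OUT_DIR changes. A sketch of the usual pattern, assuming an environment variable named CANDLE_FLASH_ATTN_BUILD_DIR; the crate's build.rs is the source of truth for the actual variable name.

```rust
// build.rs fragment (sketch): prefer a user-specified cache directory for the
// compiled kernels, falling back to cargo's OUT_DIR. The env var name is an
// assumption here.
use std::path::PathBuf;

fn kernel_build_dir() -> PathBuf {
    match std::env::var("CANDLE_FLASH_ATTN_BUILD_DIR") {
        // Reusing a stable directory lets successive builds skip .cu files
        // whose object files are already up to date.
        Ok(dir) => PathBuf::from(dir),
        Err(_) => PathBuf::from(std::env::var("OUT_DIR").expect("OUT_DIR not set")),
    }
}
```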
* Add flash attention (#241) (Laurent Mazare, 2023-07-26, 15 files changed, +2655/-0)
    * Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
    * More flash attn.
    * Set up the flash attn parameters.
    * Get things to compile locally.
    * Move the flash attention files in a different directory.
    * Build the static C library with nvcc.
    * Add more flash attention.
    * Update the build part.
    * Better caching.
    * Exclude flash attention from the default workspace.
    * Put flash-attn behind a feature gate.
    * Get the flash attn kernel to run.
    * Move the flags to a more appropriate place.
    * Enable flash attention in llama.
    * Use flash attention in llama.
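The feature gate and the llama integration from #241 come together in a dispatch along these lines: the optional candle-flash-attn crate is only referenced when the feature is enabled, and the fallback keeps the usual masked-softmax attention. This is a hedged sketch of the pattern, not the repository's llama code; the flash_attn signature and the tensor layouts are assumptions.

```rust
use candle_core::{Result, Tensor, D};

// Feature-gated wrapper: only reference the optional candle-flash-attn crate
// when the feature is enabled, otherwise keep a stub so the rest of the model
// compiles unchanged. (Sketch; the signature is an assumption.)
#[cfg(feature = "flash-attn")]
fn flash_attn(q: &Tensor, k: &Tensor, v: &Tensor, softmax_scale: f32, causal: bool) -> Result<Tensor> {
    candle_flash_attn::flash_attn(q, k, v, softmax_scale, causal)
}

#[cfg(not(feature = "flash-attn"))]
fn flash_attn(_: &Tensor, _: &Tensor, _: &Tensor, _: f32, _: bool) -> Result<Tensor> {
    candle_core::bail!("this binary was not compiled with the flash-attn feature")
}

// In the attention block, a runtime flag then picks the fused kernel or the
// plain masked-softmax path. q/k/v here are (batch, heads, seq, head_dim).
fn attention(
    q: &Tensor,
    k: &Tensor,
    v: &Tensor,
    mask: &Tensor,
    scale: f64,
    use_flash_attn: bool,
) -> Result<Tensor> {
    if use_flash_attn {
        // The kernel expects (batch, seq, heads, head_dim) and applies the
        // causal mask itself via the boolean flag.
        let (q, k, v) = (q.transpose(1, 2)?, k.transpose(1, 2)?, v.transpose(1, 2)?);
        flash_attn(&q, &k, &v, scale as f32, /* causal */ true)?.transpose(1, 2)
    } else {
        // Standard path: scores = softmax(q k^T * scale + mask), out = scores v.
        let att = (q.matmul(&k.transpose(D::Minus2, D::Minus1)?)? * scale)?;
        let att = att.broadcast_add(mask)?;
        let att = candle_nn::ops::softmax(&att, D::Minus1)?;
        att.matmul(v)
    }
}
```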