path: root/candle-flash-attn
Commit message | Author | Age | Files | Lines
* Bump the caret version to 0.8.2. (#2703) | Laurent Mazare | 2025-01-07 | 1 | -2/+2
* Flash-Attn upgrade / SoftCap Candle-FlashAttn [3/n] (#2690) | Michael Feil | 2024-12-31 | 3 | -4/+7
    * update flash-attn v1
    * restore: hdim224
    * add 224 flash_fwd_template
    * remove whitespace
    * softcap is working, including test and api.
    * make softcap test case better
    * unpadded lse added
* Flash-Attn upgrade / SoftCap Candle-FlashAttn [2/n] (#2689) | Michael Feil | 2024-12-31 | 4 | -3/+182
    * update flash-attn v1
    * restore: hdim224
    * add 224 flash_fwd_template
    * remove whitespace
    * softcap is working, including test and api.
    * make softcap test case better
    ---------
    Co-authored-by: laurent <laurent.mazare@gmail.com>
* Flash-Attn upgrade / SoftCap Candle-FlashAttn [1/n] (#2688) | Michael Feil | 2024-12-31 | 41 | -82/+139
    * update flash-attn v1
    * restore: hdim224
    * add 224 flash_fwd_template
    * remove whitespace
* Bump the crate version to 0.8.1. (#2662) | Laurent Mazare | 2024-12-07 | 1 | -2/+2
* Bump the crate version to 0.8.0. (#2612) | Laurent Mazare | 2024-11-12 | 1 | -2/+2
* Bump the crate version to 0.7.2. (#2517) | Laurent Mazare | 2024-09-29 | 1 | -2/+2
* Move the candle version to 0.7.1. (#2495) | Laurent Mazare | 2024-09-22 | 1 | -2/+2
* Bump the crate version. (#2491) | Laurent Mazare | 2024-09-21 | 1 | -2/+2
* Bump the version to 0.6.1. (#2438) | Laurent Mazare | 2024-08-22 | 1 | -2/+2
* Update the flash attn kernels. (#2333) | Laurent Mazare | 2024-07-15 | 51 | -899/+2274
* Bump the crate version. (#2248) | Laurent Mazare | 2024-06-05 | 1 | -2/+2
* Use flash-attn in gemma. (#2195) | Laurent Mazare | 2024-05-18 | 2 | -1/+7
    * Use flash-attn in gemma.
    * Fix flash-attn for head dim 256.
* Bump the version number to 0.5.1. (#2155) | Laurent Mazare | 2024-05-03 | 1 | -2/+2
    * Bump the version number to 0.5.1.
    * Fix clippy lints for 1.78.
    * More clippy fixes.
* Bumping the version number to 0.5.0. (#2009) | Laurent Mazare | 2024-04-04 | 1 | -2/+2
* Bump the crate versions to 0.4.2. (#1821) | Laurent Mazare | 2024-03-08 | 1 | -2/+2
* Bump the version number to 0.4.1. (#1768) | Laurent Mazare | 2024-02-27 | 1 | -2/+2
    * Fix the block size for some cuda kernels.
    * Bump the version number to 0.4.1.
* Bump the crate version to 0.4.0. (#1658) | Laurent Mazare | 2024-02-04 | 1 | -2/+2
* Explicit version for packages that are not in the workspace. (#1642) | Laurent Mazare | 2024-01-31 | 1 | -1/+1
* Moving to a proper build crate `bindgen_cuda`. (#1531) | Nicolas Patry | 2024-01-07 | 2 | -241/+36
    * Moving to a proper build crate `bindgen_cuda`.
    * Fmt.
* Unpin more of the workplace relative dependencies. (#1535) | Laurent Mazare | 2024-01-07 | 1 | -2/+2
* chore: update flash attention kernels (#1518) | OlivierDehaene | 2024-01-05 | 28 | -465/+1086
    * chore: update flash attention kernels
    * fmt
    * remove unused kernels
    * force f32
    * correct stride
* Bump the crate version to 0.3.3. (#1490) | Laurent Mazare | 2023-12-28 | 1 | -3/+3
* Bump the crate version to 0.3.2. (#1452) | Laurent Mazare | 2023-12-17 | 1 | -3/+3
* Update for 0.3.1. (#1324) | Laurent Mazare | 2023-11-11 | 1 | -3/+3
* Fix for flash-attn. (#1310) | Laurent Mazare | 2023-11-10 | 1 | -2/+2
    Co-authored-by: laurent <laurent@par2dc5-ai-prd-cl01dgx02.cm.cluster>
* feat: parse Cuda compute cap from env (#1066) | OlivierDehaene | 2023-10-16 | 1 | -36/+52
    * feat: add support for multiple compute caps
    * Revert to one compute cap
    * fmt
    * fix
* Bump the version to 0.3.0. (#1014) | Laurent Mazare | 2023-10-01 | 1 | -3/+3
    * Bump the version to 0.3.0.
    * Changelog update.
* Bump the crate versions to v0.2.3. (#886) | Laurent Mazare | 2023-09-18 | 1 | -3/+3
    * Bump the crate version.
    * Also update the python bindings.
* Bump the crate version + update the changelog. (#822) | Laurent Mazare | 2023-09-12 | 1 | -3/+3
* Shape with holes (#770) | Laurent Mazare | 2023-09-08 | 1 | -3/+6
    * Shape with holes.
    * rustfmt.
* Add small customization to the build (#768) | Zsombor | 2023-09-08 | 1 | -4/+20
    * Add ability to override the compiler used by NVCC from an environment variable
    * Allow relative paths in CANDLE_FLASH_ATTN_BUILD_DIR
    * Add the compilation failure to the readme, with a possible solution
    * Adjust the error message, and remove the special handling of the relative paths
* Properly set the is_bf16 flag. (#738) | Laurent Mazare | 2023-09-04 | 1 | -6/+10
* BF16 support for flash-attn. (#737) | Laurent Mazare | 2023-09-04 | 1 | -41/+81
* Add back the bf16 flash-attn kernels. (#730) | Laurent Mazare | 2023-09-04 | 4 | -22/+25
* Add some documentation. (#673) | Laurent Mazare | 2023-08-30 | 1 | -3/+3
    * Add some documentation.
    * Bump the crate version.
* Bump the crate version + update CHANGELOG. (#628) | Laurent Mazare | 2023-08-27 | 1 | -3/+3
* Add some group parameter to convolutions. (#566) | Laurent Mazare | 2023-08-23 | 1 | -3/+3
    * Add some group parameter to convolutions.
    * Avoid some unnecessary groups checks.
    * Move the tensor convolution bits.
    * Properh handling of groups.
    * Bump the crate version.
    * And add a changelog.
* Bump the crates version to 0.1.2. (#522) | Laurent Mazare | 2023-08-20 | 1 | -3/+3
* Relax the requirements on CustomOp. (#486) | Laurent Mazare | 2023-08-17 | 1 | -2/+2
    * Relax the requirements on CustomOp.
    * Simplify the custom-ops when no backward is required.
* add c++17 flags (#452) | Chengxu Yang | 2023-08-15 | 1 | -0/+1
* Rename vec-dot to vec-ops. (#449) | Laurent Mazare | 2023-08-15 | 1 | -3/+3
    * Rename vec-dot to vec-ops.
    * Also bump the crate version.
    * Add a currently empty readme.
* Add the license files. (#335) | Laurent Mazare | 2023-08-07 | 1 | -1/+1
* Update the repo location. (#305) | Laurent Mazare | 2023-08-02 | 1 | -1/+1
* Add some missing readme files. (#304) | Laurent Mazare | 2023-08-02 | 1 | -0/+1
* Add version numbers for all the candle crates (#303) | Laurent Mazare | 2023-08-02 | 1 | -2/+2
    * Switch to candle-gemm for the time being.
    * Add the missing versions.
* Rename the candle crate to candle-core (#301) | Laurent Mazare | 2023-08-02 | 1 | -1/+1
    * Rename to candle-core.
    * More candle-core renaming.
* Fix the flash-attention function names. (#282) | Laurent Mazare | 2023-07-31 | 1 | -2/+2
* Flash attention without padding (varlen). (#281) | Laurent Mazare | 2023-07-31 | 4 | -4/+283
    * Expose the seqlen variable for flash-attn without padding.
    * Fix the batched call.
    * Adapt for the varlen variant.
    * No need to set the batch strides when in varlen mode.
    * Add a test (disabled at the moment).
    * Get the test to work properly.
* Softmax numerical stability. (#267) | Laurent Mazare | 2023-07-28 | 2 | -1/+2
    * Softmax numerical stability.
    * Fix the flash-attn test.