path: root/candle-flash-attn/Cargo.toml
Commit message (#PR) · Author · Date · Files changed · Lines (-/+)
* Bump the crate version to 0.8.2. (#2703) · Laurent Mazare · 2025-01-07 · 1 file · -2/+2
* Bump the crate version to 0.8.1. (#2662) · Laurent Mazare · 2024-12-07 · 1 file · -2/+2
* Bump the crate version to 0.8.0. (#2612) · Laurent Mazare · 2024-11-12 · 1 file · -2/+2
* Bump the crate version to 0.7.2. (#2517) · Laurent Mazare · 2024-09-29 · 1 file · -2/+2
* Move the candle version to 0.7.1. (#2495) · Laurent Mazare · 2024-09-22 · 1 file · -2/+2
* Bump the crate version. (#2491) · Laurent Mazare · 2024-09-21 · 1 file · -2/+2
* Bump the version to 0.6.1. (#2438) · Laurent Mazare · 2024-08-22 · 1 file · -2/+2
* Bump the crate version. (#2248) · Laurent Mazare · 2024-06-05 · 1 file · -2/+2
* Bump the version number to 0.5.1. (#2155) · Laurent Mazare · 2024-05-03 · 1 file · -2/+2
  - Bump the version number to 0.5.1.
  - Fix clippy lints for 1.78.
  - More clippy fixes.
* Bump the version number to 0.5.0. (#2009) · Laurent Mazare · 2024-04-04 · 1 file · -2/+2
* Bump the crate versions to 0.4.2. (#1821) · Laurent Mazare · 2024-03-08 · 1 file · -2/+2
* Bump the version number to 0.4.1. (#1768) · Laurent Mazare · 2024-02-27 · 1 file · -2/+2
  - Fix the block size for some cuda kernels.
  - Bump the version number to 0.4.1.
* Bump the crate version to 0.4.0. (#1658) · Laurent Mazare · 2024-02-04 · 1 file · -2/+2
* Explicit version for packages that are not in the workspace. (#1642) · Laurent Mazare · 2024-01-31 · 1 file · -1/+1
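The entry above touches a single manifest line: in Cargo, a dependency declared only by `path` works for local builds but is stripped on `cargo publish`, so a crate that should be publishable also needs an explicit `version`. A minimal sketch of the pattern (crate name and version number are illustrative, not taken from this commit):

```toml
[dependencies]
# `path` is used for local workspace builds; `version` is what
# crates.io resolves against once the crate is published.
candle-core = { path = "../candle-core", version = "0.3.3" }
```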
* Moving to a proper build crate `bindgen_cuda`. (#1531) · Nicolas Patry · 2024-01-07 · 1 file · -2/+2
  - Moving to a proper build crate `bindgen_cuda`.
  - Fmt.
* Unpin more of the workspace relative dependencies. (#1535) · Laurent Mazare · 2024-01-07 · 1 file · -2/+2
* Bump the crate version to 0.3.3. (#1490) · Laurent Mazare · 2023-12-28 · 1 file · -3/+3
* Bump the crate version to 0.3.2. (#1452) · Laurent Mazare · 2023-12-17 · 1 file · -3/+3
* Update for 0.3.1. (#1324) · Laurent Mazare · 2023-11-11 · 1 file · -3/+3
* Bump the version to 0.3.0. (#1014) · Laurent Mazare · 2023-10-01 · 1 file · -3/+3
  - Bump the version to 0.3.0.
  - Changelog update.
* Bump the crate versions to v0.2.3. (#886) · Laurent Mazare · 2023-09-18 · 1 file · -3/+3
  - Bump the crate version.
  - Also update the python bindings.
* Bump the crate version + update the changelog. (#822) · Laurent Mazare · 2023-09-12 · 1 file · -3/+3
* Add some documentation. (#673) · Laurent Mazare · 2023-08-30 · 1 file · -3/+3
  - Add some documentation.
  - Bump the crate version.
* Bump the crate version + update CHANGELOG. (#628) · Laurent Mazare · 2023-08-27 · 1 file · -3/+3
* Add some group parameter to convolutions. (#566) · Laurent Mazare · 2023-08-23 · 1 file · -3/+3
  - Add some group parameter to convolutions.
  - Avoid some unnecessary groups checks.
  - Move the tensor convolution bits.
  - Proper handling of groups.
  - Bump the crate version.
  - And add a changelog.
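The `groups` parameter added above splits the input channels into independent chunks, each convolved with its own slice of filters. A naive, self-contained sketch of the semantics (function name, layout, and API here are illustrative, not candle's actual implementation):

```rust
/// Naive grouped 1-D convolution (no padding, stride 1).
/// input:  [c_in][len], weight: [c_out][c_in / groups][k].
fn conv1d_grouped(
    input: &[Vec<f32>],
    weight: &[Vec<Vec<f32>>],
    groups: usize,
) -> Vec<Vec<f32>> {
    let c_in = input.len();
    let c_out = weight.len();
    let k = weight[0][0].len();
    let len_out = input[0].len() - k + 1;
    let in_per_group = c_in / groups;
    let out_per_group = c_out / groups;
    let mut out = vec![vec![0.0; len_out]; c_out];
    for oc in 0..c_out {
        // Each output channel only sees the input channels of its group.
        let g = oc / out_per_group;
        for (ic_w, ic) in (g * in_per_group..(g + 1) * in_per_group).enumerate() {
            for t in 0..len_out {
                for j in 0..k {
                    out[oc][t] += input[ic][t + j] * weight[oc][ic_w][j];
                }
            }
        }
    }
    out
}

fn main() {
    // With groups = 2 and identity 1-tap filters, input channel 0 feeds
    // only output channel 0, and channel 1 only output channel 1.
    let input = vec![vec![1.0, 2.0, 3.0], vec![10.0, 20.0, 30.0]];
    let weight = vec![vec![vec![1.0]], vec![vec![1.0]]];
    println!("{:?}", conv1d_grouped(&input, &weight, 2));
}
```

With `groups = 1` this reduces to an ordinary convolution; with `groups = c_in` it behaves like a depthwise convolution.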
* Bump the crates version to 0.1.2. (#522) · Laurent Mazare · 2023-08-20 · 1 file · -3/+3
* Rename vec-dot to vec-ops. (#449) · Laurent Mazare · 2023-08-15 · 1 file · -3/+3
  - Rename vec-dot to vec-ops.
  - Also bump the crate version.
  - Add a currently empty readme.
* Add the license files. (#335) · Laurent Mazare · 2023-08-07 · 1 file · -1/+1
* Update the repo location. (#305) · Laurent Mazare · 2023-08-02 · 1 file · -1/+1
* Add version numbers for all the candle crates. (#303) · Laurent Mazare · 2023-08-02 · 1 file · -2/+2
  - Switch to candle-gemm for the time being.
  - Add the missing versions.
* Rename the candle crate to candle-core. (#301) · Laurent Mazare · 2023-08-02 · 1 file · -1/+1
  - Rename to candle-core.
  - More candle-core renaming.
* Softmax numerical stability. (#267) · Laurent Mazare · 2023-07-28 · 1 file · -0/+1
  - Softmax numerical stability.
  - Fix the flash-attn test.
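The numerical-stability fix above refers to the standard max-subtraction trick: subtracting the row maximum before exponentiating keeps every exponent at or below zero, so large logits cannot overflow to infinity. A standalone sketch of the technique (this is not candle's actual implementation):

```rust
/// Numerically stable softmax over a slice of logits.
fn softmax(xs: &[f32]) -> Vec<f32> {
    // Shift by the maximum so the largest exponent is exp(0) = 1.
    let max = xs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    // Without the shift, exp(1000.0_f32) overflows to infinity and the
    // division would yield NaN; with it, each probability is 0.5.
    println!("{:?}", softmax(&[1000.0, 1000.0]));
}
```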
* Add some flash attn test. (#253) · Laurent Mazare · 2023-07-26 · 1 file · -0/+3
  - Add some flash-attn test.
  - Add the cpu test.
  - Fail when the head is not a multiple of 8.
  - Polish the flash attention test.
* Again set a few extra params in flash-attn. (#245) · Laurent Mazare · 2023-07-26 · 1 file · -0/+2
  - Again set a few extra params.
  - Use the appropriate kernel sizes.
  - Add all the kernel sizes.
  - Parallel compiling.
  - Reduce the amount of parallelism.
  - Add the missing kernel.
  - Fix a typo.
  - Remove bf16 support for now.
* Add flash attention. (#241) · Laurent Mazare · 2023-07-26 · 1 file · -0/+18
  - Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
  - More flash attn.
  - Set up the flash attn parameters.
  - Get things to compile locally.
  - Move the flash attention files in a different directory.
  - Build the static C library with nvcc.
  - Add more flash attention.
  - Update the build part.
  - Better caching.
  - Exclude flash attention from the default workspace.
  - Put flash-attn behind a feature gate.
  - Get the flash attn kernel to run.
  - Move the flags to a more appropriate place.
  - Enable flash attention in llama.
  - Use flash attention in llama.
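Two of the steps above ("Exclude flash attention from the default workspace" and "Put flash-attn behind a feature gate") follow the usual Cargo pattern: the CUDA-only crate becomes an optional dependency that a named feature pulls in on demand, so default builds need no CUDA toolchain. A minimal sketch of the pattern in a consuming crate's manifest (the exact manifest lines are illustrative, not copied from candle):

```toml
[dependencies]
candle-flash-attn = { path = "../candle-flash-attn", optional = true }

[features]
# Default builds compile without CUDA; `--features flash-attn` opts in.
default = []
flash-attn = ["dep:candle-flash-attn"]
```

The `dep:` prefix (Cargo 1.60+) enables the optional dependency without also exposing an implicit feature of the same name.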