| author | Laurent Mazare <laurent.mazare@gmail.com> | 2023-07-26 07:48:10 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2023-07-26 07:48:10 +0100 |
| commit | d9f9c859afaeed95df420aca5fdb73f52f9239c5 (patch) | |
| tree | 2ef898b2906a24b57ea42b0294bc51b928f0513c /candle-flash-attn/Cargo.toml | |
| parent | c97d51243c177e0497ea7147f426c4cc1e532c9b (diff) | |
Add flash attention (#241)
* Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
* More flash attn.
* Set up the flash attn parameters.
* Get things to compile locally.
* Move the flash attention files to a different directory.
* Build the static C library with nvcc.
* Add more flash attention.
* Update the build part.
* Better caching.
* Exclude flash attention from the default workspace.
* Put flash-attn behind a feature gate.
* Get the flash attn kernel to run.
* Move the flags to a more appropriate place.
* Enable flash attention in llama.
* Use flash attention in llama (see the feature-gate sketch after this list).
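The last bullets describe putting the new kernel behind a Cargo feature and calling it from the llama example. Below is a minimal, hypothetical sketch of what such a gate could look like; the `flash-attn` feature name, the `candle_flash_attn::flash_attn` signature, and the fallback code are assumptions made for illustration and are not part of this diff.

```rust
// Hypothetical feature-gated attention dispatch (not part of this diff).
use candle::{Result, Tensor};

#[cfg(feature = "flash-attn")]
fn attention(q: &Tensor, k: &Tensor, v: &Tensor, softmax_scale: f32) -> Result<Tensor> {
    // Fused CUDA kernel; the final `true` is assumed to request causal masking.
    candle_flash_attn::flash_attn(q, k, v, softmax_scale, true)
}

#[cfg(not(feature = "flash-attn"))]
fn attention(q: &Tensor, k: &Tensor, v: &Tensor, softmax_scale: f32) -> Result<Tensor> {
    // Plain softmax(Q K^T * scale) V fallback (causal masking omitted for brevity).
    let att = (q.matmul(&k.t()?)? * softmax_scale as f64)?;
    let att = candle_nn::ops::softmax(&att, candle::D::Minus1)?;
    att.matmul(v)
}
```

With a gate like this, the fused path is opted into at build time (e.g. `cargo build --features flash-attn`), while the rest of the workspace keeps compiling without the flash-attn CUDA code.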
Diffstat (limited to 'candle-flash-attn/Cargo.toml')
-rw-r--r-- | candle-flash-attn/Cargo.toml | 18 |
1 file changed, 18 insertions, 0 deletions
diff --git a/candle-flash-attn/Cargo.toml b/candle-flash-attn/Cargo.toml
new file mode 100644
index 00000000..25201a0e
--- /dev/null
+++ b/candle-flash-attn/Cargo.toml
@@ -0,0 +1,18 @@
+[package]
+name = "candle-flash-attn"
+version = "0.1.0"
+edition = "2021"
+
+description = "Flash attention layer for the candle ML framework."
+repository = "https://github.com/LaurentMazare/candle"
+keywords = ["blas", "tensor", "machine-learning"]
+categories = ["science"]
+license = "MIT/Apache-2.0"
+readme = "README.md"
+
+[dependencies]
+candle = { path = "../candle-core", features = ["cuda"] }
+half = { version = "2.3.1", features = ["num-traits"] }
+
+[build-dependencies]
+anyhow = { version = "1", features = ["backtrace"] }
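The `anyhow` entry under `[build-dependencies]` implies a `build.rs` that is not part of this diff. As a rough, hypothetical sketch of the "build the static C library with nvcc" step from the commit message, such a script might shell out to nvcc and archive the result into a static library; the kernel file name, library name, and flags below are assumptions, not taken from the candle sources.

```rust
// Hypothetical build.rs sketch (not the actual candle-flash-attn build script).
use anyhow::{Context, Result};
use std::{env, path::PathBuf, process::Command};

fn main() -> Result<()> {
    let out_dir = PathBuf::from(env::var("OUT_DIR").context("OUT_DIR not set")?);
    // Rebuild only when the (hypothetical) kernel source changes.
    println!("cargo:rerun-if-changed=kernels/flash_fwd_kernel.cu");

    // Compile the CUDA kernel to an object file with nvcc.
    let status = Command::new("nvcc")
        .args(["-O3", "-arch=sm_80", "--compiler-options", "-fPIC", "-c"])
        .arg("kernels/flash_fwd_kernel.cu")
        .arg("-o")
        .arg(out_dir.join("flash_fwd_kernel.o"))
        .status()
        .context("failed to run nvcc")?;
    anyhow::ensure!(status.success(), "nvcc failed");

    // Archive the object file into a static library that rustc can link against.
    let status = Command::new("ar")
        .arg("crs")
        .arg(out_dir.join("libflashattention.a"))
        .arg(out_dir.join("flash_fwd_kernel.o"))
        .status()
        .context("failed to run ar")?;
    anyhow::ensure!(status.success(), "ar failed");

    println!("cargo:rustc-link-search=native={}", out_dir.display());
    println!("cargo:rustc-link-lib=static=flashattention");
    println!("cargo:rustc-link-lib=dylib=cudart");
    Ok(())
}
```

The `cargo:rerun-if-changed` directive is the standard way to keep such a script from recompiling the kernels on every build, which is relevant to the "better caching" bullet in the commit message.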