| author | Laurent Mazare <laurent.mazare@gmail.com> | 2023-07-26 07:48:10 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2023-07-26 07:48:10 +0100 |
| commit | d9f9c859afaeed95df420aca5fdb73f52f9239c5 (patch) | |
| tree | 2ef898b2906a24b57ea42b0294bc51b928f0513c /.gitmodules | |
| parent | c97d51243c177e0497ea7147f426c4cc1e532c9b (diff) | |
| download | candle-d9f9c859afaeed95df420aca5fdb73f52f9239c5.tar.gz, candle-d9f9c859afaeed95df420aca5fdb73f52f9239c5.tar.bz2, candle-d9f9c859afaeed95df420aca5fdb73f52f9239c5.zip | |
Add flash attention (#241)
* Add some flash-attn kernels; import the flash-attn v2 code from Dao-AILab.
* More flash attn.
* Set up the flash attn parameters.
* Get things to compile locally.
* Move the flash attention files into a different directory.
* Build the static C library with nvcc.
* Add more flash attention.
* Update the build setup.
* Better caching.
* Exclude flash attention from the default workspace.
* Put flash-attn behind a feature gate.
* Get the flash attn kernel to run.
* Move the flags to a more appropriate place.
* Enable flash attention in llama.
* Use flash attention in llama (see the sketch after this list).
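The last two items wire the new kernel into the llama example behind a Cargo feature. A minimal sketch of that pattern, assuming the feature is named `flash-attn`, the in-workspace `candle`/`candle_nn` crate names, and a `candle_flash_attn::flash_attn(q, k, v, softmax_scale, causal)` entry point; the exact API at this commit may differ:

```rust
use candle::{Result, Tensor};

// Fused path: dispatch to the flash-attn v2 CUDA kernel.
#[cfg(feature = "flash-attn")]
fn attention(q: &Tensor, k: &Tensor, v: &Tensor, softmax_scale: f32) -> Result<Tensor> {
    candle_flash_attn::flash_attn(q, k, v, softmax_scale, /* causal= */ true)
}

// Fallback path: plain softmax(Q K^T * scale) V, with causal masking
// omitted here for brevity.
#[cfg(not(feature = "flash-attn"))]
fn attention(q: &Tensor, k: &Tensor, v: &Tensor, softmax_scale: f32) -> Result<Tensor> {
    let att = (q.matmul(&k.t()?)? * softmax_scale as f64)?;
    let att = candle_nn::ops::softmax(&att, candle::D::Minus1)?;
    att.matmul(v)
}
```

With such a gate in place, building the example with the feature enabled (e.g. `--features flash-attn`) selects the fused kernel, while default builds keep the portable fallback.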
Diffstat (limited to '.gitmodules')
| -rw-r--r-- | .gitmodules | 3 |

1 file changed, 3 insertions(+), 0 deletions(-)
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 00000000..12631cbc
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "candle-examples/examples/flash-attn/cutlass"]
+	path = candle-flash-attn/cutlass
+	url = https://github.com/NVIDIA/cutlass.git
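The `.gitmodules` entry pulls in NVIDIA's cutlass, whose headers the kernel build needs on its include path when the static C library is compiled with nvcc. A hedged `build.rs` sketch of that step, using the `cc` crate's CUDA mode; the file name `kernels/flash_api.cu` and the flags are illustrative, not the crate's actual layout:

```rust
// build.rs — compile the imported flash-attn v2 CUDA sources into a static
// library with nvcc and link it into the crate.
fn main() {
    // Rebuild only when the kernel source changes.
    println!("cargo:rerun-if-changed=kernels/flash_api.cu");
    cc::Build::new()
        .cuda(true)                       // drive compilation through nvcc
        .include("cutlass/include")       // headers from the cutlass submodule
        .flag("-O3")
        .flag("--expt-relaxed-constexpr") // cutlass relies on relaxed constexpr
        .file("kernels/flash_api.cu")
        .compile("flashattention");       // emits and links libflashattention.a
}
```

Note that a fresh checkout needs `git submodule update --init` first so that `candle-flash-attn/cutlass` is populated before the build script runs.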