author    Laurent Mazare <laurent.mazare@gmail.com>  2023-07-26 07:48:10 +0100
committer GitHub <noreply@github.com>  2023-07-26 07:48:10 +0100
commit    d9f9c859afaeed95df420aca5fdb73f52f9239c5 (patch)
tree      2ef898b2906a24b57ea42b0294bc51b928f0513c /.gitmodules
parent    c97d51243c177e0497ea7147f426c4cc1e532c9b (diff)
Add flash attention (#241)
* Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
* More flash attn.
* Set up the flash attn parameters.
* Get things to compile locally.
* Move the flash attention files in a different directory.
* Build the static C library with nvcc.
* Add more flash attention.
* Update the build part.
* Better caching.
* Exclude flash attention from the default workspace.
* Put flash-attn behind a feature gate.
* Get the flash attn kernel to run.
* Move the flags to a more appropriate place.
* Enable flash attention in llama.
* Use flash attention in llama.
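Since the commit message says flash-attn sits behind a feature gate and is excluded from the default workspace, it must be enabled explicitly at build time. A minimal sketch of how that might look; the feature name "flash-attn" and the "llama" example name are assumptions taken from the commit message, not confirmed by this diff:

    # Build with the (assumed) flash-attn feature gate enabled; requires nvcc,
    # since the kernels are compiled into a static C library at build time.
    cargo build --release --features flash-attn
    # Run the llama example with flash attention (example name assumed).
    cargo run --example llama --release --features flash-attn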
Diffstat (limited to '.gitmodules')
-rw-r--r--  .gitmodules | 3 +++
1 file changed, 3 insertions(+), 0 deletions(-)
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 00000000..12631cbc
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "candle-examples/examples/flash-attn/cutlass"]
+ path = candle-flash-attn/cutlass
+ url = https://github.com/NVIDIA/cutlass.git
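Because cutlass is pulled in as a git submodule, a fresh checkout needs it initialized before the flash-attn kernels can build. A minimal sketch of the standard git commands; the repository clone URL is an assumption, since this page only shows the .gitmodules diff:

    # Clone the repository (URL assumed) and fetch the cutlass submodule
    # at the path registered in .gitmodules.
    git clone https://github.com/huggingface/candle.git
    cd candle
    git submodule update --init candle-flash-attn/cutlass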