From d9f9c859afaeed95df420aca5fdb73f52f9239c5 Mon Sep 17 00:00:00 2001
From: Laurent Mazare
Date: Wed, 26 Jul 2023 07:48:10 +0100
Subject: Add flash attention (#241)

* Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
* More flash attn.
* Set up the flash attn parameters.
* Get things to compile locally.
* Move the flash attention files in a different directory.
* Build the static C library with nvcc.
* Add more flash attention.
* Update the build part.
* Better caching.
* Exclude flash attention from the default workspace.
* Put flash-attn behind a feature gate.
* Get the flash attn kernel to run.
* Move the flags to a more appropriate place.
* Enable flash attention in llama.
* Use flash attention in llama.
---
 .gitmodules | 3 +++
 1 file changed, 3 insertions(+)
 create mode 100644 .gitmodules

(limited to '.gitmodules')

diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 00000000..12631cbc
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "candle-examples/examples/flash-attn/cutlass"]
+	path = candle-flash-attn/cutlass
+	url = https://github.com/NVIDIA/cutlass.git
-- 
cgit v1.2.3