| author | Jorge António <matroid@outlook.com> | 2024-10-07 16:30:56 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2024-10-07 17:30:56 +0200 |
| commit | edf7668291a30d6c73dd0fb884a74d1d78e5786d (patch) | |
| tree | f3ac50d3adfb1006ca89a72a05fb6baa006ef1c7 | |
| parent | e4a96f9e7c2b88dec33b6076cc9756ac76d44df1 (diff) | |
improve (#2548)
| -rw-r--r-- | README.md | 1 |
|---|---|---|

1 file changed, 1 insertion, 0 deletions
```diff
@@ -187,6 +187,7 @@ And then head over to
 - [`candle-sampling`](https://github.com/EricLBuehler/candle-sampling): Sampling techniques for Candle.
 - [`gpt-from-scratch-rs`](https://github.com/jeroenvlek/gpt-from-scratch-rs): A port of Andrej Karpathy's _Let's build GPT_ tutorial on YouTube showcasing the Candle API on a toy problem.
 - [`candle-einops`](https://github.com/tomsanbear/candle-einops): A pure rust implementation of the python [einops](https://github.com/arogozhnikov/einops) library.
+- [`atoma-infer`](https://github.com/atoma-network/atoma-infer): A Rust library for fast inference at scale, leveraging FlashAttention2 for efficient attention computation and PagedAttention for efficient KV-cache memory management, with multi-GPU support. It is OpenAI API compatible.
 
 If you have an addition to this list, please submit a pull request.
```