author	Nicolas Patry <patry.nicolas@protonmail.com>	2023-08-01 16:36:53 +0200
committer	Nicolas Patry <patry.nicolas@protonmail.com>	2023-08-02 18:40:24 +0200
commit	a44471a305f2bc768c4f0dd0e7d23a7cfe3cb408 (patch)
tree	f2f51f7e58f0fd7bfb03bc67e4b7bac99278d340 /candle-book
parent	45642a8530fdfbd64fcac118aed59b7cb7dfaf45 (diff)
Adding more details on how to load things.
- Loading with memmap
- Loading a sharded tensor
- Moved some snippets to `candle-examples/src/lib.rs`. This is because managing book-specific dependencies is a pain: https://github.com/rust-lang/mdBook/issues/706
- This causes a non-aligned inclusion (https://github.com/rust-lang/mdBook/pull/1856), which we have to ignore fmt to remove. mdBook might need some more love :)
Diffstat (limited to 'candle-book')
-rw-r--r--	candle-book/src/inference/hub.md	| 46
1 file changed, 35 insertions, 11 deletions
diff --git a/candle-book/src/inference/hub.md b/candle-book/src/inference/hub.md
index de514322..01492df1 100644
--- a/candle-book/src/inference/hub.md
+++ b/candle-book/src/inference/hub.md
@@ -25,6 +25,8 @@ let weights = candle::safetensors::load(weights, &Device::Cpu);
We now have access to all the [tensors](https://huggingface.co/bert-base-uncased?show_tensors=true) within the file.
+You can check all the names of the tensors [here](https://huggingface.co/bert-base-uncased?show_tensors=true).
+
## Using async
@@ -35,17 +37,9 @@ cargo add hf-hub --features tokio
```
```rust,ignore
-# extern crate candle;
-# extern crate hf_hub;
-use hf_hub::api::tokio::Api;
-use candle::Device;
-
-let api = Api::new().unwrap();
-let repo = api.model("bert-base-uncased".to_string());
-
-let weights = repo.get("model.safetensors").await.unwrap();
-
-let weights = candle::safetensors::load(weights, &Device::Cpu);
+# This snippet is tested directly in the examples crate because it unfortunately needs external dependencies.
+# See [this](https://github.com/rust-lang/mdBook/issues/706)
+{{#include ../../../candle-examples/src/lib.rs:book_hub_1}}
```
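For context, the `book_hub_1` snippet moved into `candle-examples/src/lib.rs` presumably still performs the same async download as the inline code it replaces; a sketch reconstructed from that removed code, using the `hf-hub` tokio API:

```rust,ignore
use candle::Device;
use hf_hub::api::tokio::Api;

// Download the weights asynchronously from the Hugging Face Hub,
// then load the safetensors file onto the CPU.
let api = Api::new().unwrap();
let repo = api.model("bert-base-uncased".to_string());
let weights = repo.get("model.safetensors").await.unwrap();
let weights = candle::safetensors::load(weights, &Device::Cpu);
```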
@@ -78,3 +72,33 @@ let output = linear.forward(&input_ids);
```
For a full reference, you can check out the full [bert](https://github.com/LaurentMazare/candle/tree/main/candle-examples/examples/bert) example.
+
+## Memory mapping
+
+For more efficient loading, instead of reading the whole file into memory, you can memory-map it with [`memmap2`](https://docs.rs/memmap2/latest/memmap2/).
+
+**Note**: Be careful with memory mapping: it seems to cause issues on [Windows and WSL](https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5893),
+and it will definitely be slower on network-mounted disks, because it issues more read calls.
+
+```rust,ignore
+{{#include ../../../candle-examples/src/lib.rs:book_hub_2}}
+```
+
+**Note**: This operation is **unsafe**. [See the safety notice](https://docs.rs/memmap2/latest/memmap2/struct.Mmap.html#safety).
+In practice, model files should never be modified, and the mmaps should mostly be read-only anyway, so the caveat most likely does not apply, but always keep it in mind.
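The `book_hub_2` snippet is included from the examples crate rather than shown inline; a minimal sketch of memory-mapped loading, assuming `memmap2`'s `MmapOptions` API and a `load_buffer`-style helper in `candle::safetensors` that accepts a byte slice:

```rust,ignore
use candle::Device;
use memmap2::MmapOptions;
use std::fs::File;

let file = File::open("model.safetensors").unwrap();
// SAFETY: the underlying file must not be modified while mapped
// (see the memmap2 safety notice linked above).
let mmap = unsafe { MmapOptions::new().map(&file).unwrap() };
// Parse the safetensors buffer directly from the mapped bytes.
let weights = candle::safetensors::load_buffer(&mmap[..], &Device::Cpu);
```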
+
+
+## Tensor Parallel Sharding
+
+When using multiple GPUs with tensor parallelism in order to get good latency, you can load only the part of the tensor you need.
+
+For that, you need to use the [`safetensors`](https://crates.io/crates/safetensors) crate directly.
+
+```bash
+cargo add safetensors
+```
+
+
+```rust,ignore
+{{#include ../../../candle-examples/src/lib.rs:book_hub_3}}
+```
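The `book_hub_3` snippet lives in the examples crate, but the core idea can be sketched without any crate: compute the byte range covering just the rows your rank needs from a row-major tensor buffer, and read only that range. The helper name and the 4-way split below are hypothetical, for illustration only:

```rust
/// Hypothetical helper: byte range covering rows `start_row..end_row` of a
/// row-major 2D tensor with the given shape and element size in bytes.
/// This is the arithmetic that lets a rank read only its shard of a buffer.
fn shard_byte_range(
    shape: (usize, usize),
    elem_size: usize,
    start_row: usize,
    end_row: usize,
) -> (usize, usize) {
    let row_bytes = shape.1 * elem_size;
    (start_row * row_bytes, end_row * row_bytes)
}

fn main() {
    // A 768x768 f32 tensor split row-wise across 4 GPUs:
    // rank 1 only needs rows 192..384 of the data buffer.
    let (begin, end) = shard_byte_range((768, 768), 4, 192, 384);
    println!("rank 1 reads bytes {begin}..{end}");
}
```

With `safetensors`, the header gives each tensor's dtype, shape, and data offsets, so this range can be turned into an absolute offset into the file and fetched without touching the rest of the weights.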