author | Nicolas Patry <patry.nicolas@protonmail.com> | 2023-08-01 16:36:53 +0200
committer | Nicolas Patry <patry.nicolas@protonmail.com> | 2023-08-02 18:40:24 +0200
commit | a44471a305f2bc768c4f0dd0e7d23a7cfe3cb408 (patch)
tree | f2f51f7e58f0fd7bfb03bc67e4b7bac99278d340 /candle-book
parent | 45642a8530fdfbd64fcac118aed59b7cb7dfaf45 (diff)
Adding more details on how to load things.
- Loading with memmap
- Loading a sharded tensor
- Moved some snippets to `candle-examples/src/lib.rs`, because managing book-specific
dependencies is a pain: https://github.com/rust-lang/mdBook/issues/706
- This causes a non-aligned inclusion (https://github.com/rust-lang/mdBook/pull/1856), which we have
to tell `fmt` to ignore in order to remove.
mdbook might need some more love :)
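The memmap and sharded-tensor bullets both lean on the safetensors on-disk layout: an 8-byte little-endian header length, a JSON header giving each tensor's `data_offsets`, then the raw data. That layout is what makes memory mapping and partial loads cheap, since a loader can locate any tensor's byte range without touching the rest of the file. A hypothetical, dependency-free Rust sketch of that layout (not one of the book's `book_hub_*` snippets; the tensor name `w` and the in-memory buffer are made up for illustration):

```rust
fn main() {
    // Build an in-memory stand-in for a safetensors file: a JSON header
    // describing one tensor "w" stored at bytes 0..8 of the data section.
    let header = br#"{"w":{"dtype":"F32","shape":[2],"data_offsets":[0,8]}}"#;
    let mut file = Vec::new();
    file.extend_from_slice(&(header.len() as u64).to_le_bytes()); // u64 LE header length
    file.extend_from_slice(header);                               // JSON header
    file.extend_from_slice(&1.0f32.to_le_bytes());                // raw tensor data
    file.extend_from_slice(&2.0f32.to_le_bytes());

    // A loader only has to read the first 8 bytes plus the header to locate
    // any tensor's byte range; the data section can stay unread or be mmapped.
    let header_len = u64::from_le_bytes(file[..8].try_into().unwrap()) as usize;
    let json = std::str::from_utf8(&file[8..8 + header_len]).unwrap();
    assert!(json.contains("data_offsets"));
    let w = &file[8 + header_len..8 + header_len + 8];
    println!("first element of `w`: {}", f32::from_le_bytes(w[..4].try_into().unwrap()));
}
```

The real `candle::safetensors::load` and the `safetensors` crate do this parsing for you; the sketch only shows why it can be done lazily.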
Diffstat (limited to 'candle-book')
-rw-r--r-- | candle-book/src/inference/hub.md | 46
1 file changed, 35 insertions(+), 11 deletions(-)
````diff
diff --git a/candle-book/src/inference/hub.md b/candle-book/src/inference/hub.md
index de514322..01492df1 100644
--- a/candle-book/src/inference/hub.md
+++ b/candle-book/src/inference/hub.md
@@ -25,6 +25,8 @@
 let weights = candle::safetensors::load(weights, &Device::Cpu);
 
 We now have access to all the [tensors](https://huggingface.co/bert-base-uncased?show_tensors=true)
 within the file.
 
+You can check all the names of the tensors [here](https://huggingface.co/bert-base-uncased?show_tensors=true)
+
 ## Using async
@@ -35,17 +37,9 @@
 cargo add hf-hub --features tokio
 ```
 
 ```rust,ignore
-# extern crate candle;
-# extern crate hf_hub;
-use hf_hub::api::tokio::Api;
-use candle::Device;
-
-let api = Api::new().unwrap();
-let repo = api.model("bert-base-uncased".to_string());
-
-let weights = repo.get("model.safetensors").await.unwrap();
-
-let weights = candle::safetensors::load(weights, &Device::Cpu);
+# This is tested directly in examples crate because it needs external dependencies unfortunately:
+# See [this](https://github.com/rust-lang/mdBook/issues/706)
+{{#include ../../../candle-examples/src/lib.rs:book_hub_1}}
 ```
 
@@ -78,3 +72,33 @@
 let output = linear.forward(&input_ids);
 ```
 
 For a full reference, you can check out the full [bert](https://github.com/LaurentMazare/candle/tree/main/candle-examples/examples/bert) example.
+
+## Memory mapping
+
+For more efficient loading, instead of reading the file, you could use [`memmap2`](https://docs.rs/memmap2/latest/memmap2/)
+
+**Note**: Be careful about memory mapping it seems to cause issues on [Windows, WSL](https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5893)
+and will definitely be slower on network mounted disk, because it will issue more read calls.
+
+```rust,ignore
+{{#include ../../../candle-examples/src/lib.rs:book_hub_2}}
+```
+
+**Note**: This operation is **unsafe**. [See the safety notice](https://docs.rs/memmap2/latest/memmap2/struct.Mmap.html#safety).
+In practice model files should never be modified, and the mmaps should be mostly READONLY anyway, so the caveat most likely does not apply, but always keep it in mind.
+
+
+## Tensor Parallel Sharding
+
+When using multiple GPUs to use in Tensor Parallel in order to get good latency, you can load only the part of the Tensor you need.
+
+For that you need to use [`safetensors`](https://crates.io/crates/safetensors) directly.
+
+```bash
+cargo add safetensors
+```
+
+
+```rust,ignore
+{{#include ../../../candle-examples/src/lib.rs:book_hub_3}}
+```
````
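The "Tensor Parallel Sharding" section this diff introduces is about reading only the byte range a given rank needs instead of the whole checkpoint. A hypothetical std-only sketch of that idea using plain seek-and-read (the book's real `book_hub_3` snippet uses the `safetensors` crate instead; the file name and the 4-rank split here are made up for illustration):

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};

fn main() -> std::io::Result<()> {
    // Pretend this file holds a large tensor; a tensor-parallel rank only
    // needs its own shard, i.e. one byte range, not the whole file.
    let path = std::env::temp_dir().join("shard_demo.bin");
    File::create(&path)?.write_all(&(0u8..64).collect::<Vec<u8>>())?;

    // Rank 1 of 4: read only the second quarter (bytes 16..32).
    let (offset, len) = (16u64, 16usize);
    let mut shard = vec![0u8; len];
    let mut f = File::open(&path)?;
    f.seek(SeekFrom::Start(offset))?;
    f.read_exact(&mut shard)?;
    assert_eq!(shard, (16u8..32).collect::<Vec<u8>>());

    println!("loaded {} bytes starting at offset {}", shard.len(), offset);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

With safetensors the per-tensor `data_offsets` from the header play the role of the hard-coded `(offset, len)` above, so each rank can compute and fetch exactly its slice.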