path: root/candle-book
author    Nicolas Patry <patry.nicolas@protonmail.com>    2023-08-09 16:50:11 +0200
committer GitHub <noreply@github.com>    2023-08-09 16:50:11 +0200
commit    dece0b8a76c5e816cf93013f2ee54fd6e2bcbcae (patch)
tree      db22146ca8d4bacd9cdc76672f87d949d84cb36e /candle-book
parent    b80348d22f8f0dadb6cc4101bde031d5de69a9a5 (diff)
parent    dba31473d40c88fed22574ba96021dc59f25f3f7 (diff)
Merge pull request #263 from huggingface/book_3
Book 3 (advanced loading + hub)
Diffstat (limited to 'candle-book')
-rw-r--r--  candle-book/src/SUMMARY.md  26
-rw-r--r--  candle-book/src/cuda/README.md  1
-rw-r--r--  candle-book/src/cuda/porting.md  1
-rw-r--r--  candle-book/src/cuda/writing.md  1
-rw-r--r--  candle-book/src/error_manage.md  50
-rw-r--r--  candle-book/src/inference/README.md  6
-rw-r--r--  candle-book/src/inference/hub.md  103
-rw-r--r--  candle-book/src/training/serialization.md (renamed from candle-book/src/inference/serialization.md)  0
8 files changed, 175 insertions, 13 deletions
diff --git a/candle-book/src/SUMMARY.md b/candle-book/src/SUMMARY.md
index ddd6e916..3432f66f 100644
--- a/candle-book/src/SUMMARY.md
+++ b/candle-book/src/SUMMARY.md
@@ -12,16 +12,16 @@
- [Running a model](inference/README.md)
- [Using the hub](inference/hub.md)
- - [Serialization](inference/serialization.md)
- - [Advanced Cuda usage](inference/cuda/README.md)
- - [Writing a custom kernel](inference/cuda/writing.md)
- - [Porting a custom kernel](inference/cuda/porting.md)
-- [Error management](error_manage.md)
-- [Creating apps](apps/README.md)
- - [Creating a WASM app](apps/wasm.md)
- - [Creating a REST api webserver](apps/rest.md)
- - [Creating a desktop Tauri app](apps/dekstop.md)
-- [Training](training/README.md)
- - [MNIST](training/mnist.md)
- - [Fine-tuning](training/finetuning.md)
-- [Using MKL](advanced/mkl.md)
+- [Error management]()
+- [Advanced Cuda usage]()
+ - [Writing a custom kernel]()
+ - [Porting a custom kernel]()
+- [Using MKL]()
+- [Creating apps]()
+ - [Creating a WASM app]()
+ - [Creating a REST api webserver]()
+ - [Creating a desktop Tauri app]()
+- [Training]()
+ - [MNIST]()
+ - [Fine-tuning]()
+ - [Serialization]()
diff --git a/candle-book/src/cuda/README.md b/candle-book/src/cuda/README.md
new file mode 100644
index 00000000..68434cbf
--- /dev/null
+++ b/candle-book/src/cuda/README.md
@@ -0,0 +1 @@
+# Advanced Cuda usage
diff --git a/candle-book/src/cuda/porting.md b/candle-book/src/cuda/porting.md
new file mode 100644
index 00000000..e332146d
--- /dev/null
+++ b/candle-book/src/cuda/porting.md
@@ -0,0 +1 @@
+# Porting a custom kernel
diff --git a/candle-book/src/cuda/writing.md b/candle-book/src/cuda/writing.md
new file mode 100644
index 00000000..0fe1f3dc
--- /dev/null
+++ b/candle-book/src/cuda/writing.md
@@ -0,0 +1 @@
+# Writing a custom kernel
diff --git a/candle-book/src/error_manage.md b/candle-book/src/error_manage.md
index 042e191f..c1a16bd9 100644
--- a/candle-book/src/error_manage.md
+++ b/candle-book/src/error_manage.md
@@ -1 +1,51 @@
# Error management
+
+You might have seen a lot of `.unwrap()` or `?` in the code base.
+If you're unfamiliar with Rust, check out the [Rust book](https://doc.rust-lang.org/book/ch09-02-recoverable-errors-with-result.html)
+for more information.
+
+What's important to know, though, is that if you want to know *where* a particular operation failed,
+you can simply set `RUST_BACKTRACE=1` to get the location where the model actually failed.
+
+Let's look at some failing code:
+
+```rust,ignore
+let x = Tensor::zeros((1, 784), DType::F32, &device)?;
+let y = Tensor::zeros((1, 784), DType::F32, &device)?;
+let z = x.matmul(&y)?;
+```
+
+Running it will print the following at runtime:
+
+```bash
+Error: ShapeMismatchBinaryOp { lhs: [1, 784], rhs: [1, 784], op: "matmul" }
+```
+
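+You can enable the backtrace when launching the program, for example like this (the exact command is illustrative and depends on how you run your binary):
+
+```bash
+RUST_BACKTRACE=1 cargo run
+```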
+
+After adding `RUST_BACKTRACE=1`:
+
+
+```bash
+Error: WithBacktrace { inner: ShapeMismatchBinaryOp { lhs: [1, 784], rhs: [1, 784], op: "matmul" }, backtrace: Backtrace [{ fn: "candle::error::Error::bt", file: "/home/nicolas/.cargo/git/checkouts/candle-5bb8ef7e0626d693/f291065/candle-core/src/error.rs", line: 200 }, { fn: "candle::tensor::Tensor::matmul", file: "/home/nicolas/.cargo/git/checkouts/candle-5bb8ef7e0626d693/f291065/candle-core/src/tensor.rs", line: 816 }, { fn: "myapp::main", file: "./src/main.rs", line: 29 }, { fn: "core::ops::function::FnOnce::call_once", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs", line: 250 }, { fn: "std::sys_common::backtrace::__rust_begin_short_backtrace", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/sys_common/backtrace.rs", line: 135 }, { fn: "std::rt::lang_start::{{closure}}", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs", line: 166 }, { fn: "core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs", line: 284 }, { fn: "std::panicking::try::do_call", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs", line: 500 }, { fn: "std::panicking::try", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs", line: 464 }, { fn: "std::panic::catch_unwind", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs", line: 142 }, { fn: "std::rt::lang_start_internal::{{closure}}", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs", line: 148 }, { fn: "std::panicking::try::do_call", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs", line: 500 }, { fn: "std::panicking::try", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs", line: 464 }, { fn: "std::panic::catch_unwind", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs", line: 142 }, { fn: "std::rt::lang_start_internal", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs", line: 148 }, { fn: "std::rt::lang_start", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs", line: 165 }, { fn: "main" }, { fn: "__libc_start_main" }, { fn: "_start" }] }
+```
+
+Not super pretty at the moment, but we can see that the error occurred at `{ fn: "myapp::main", file: "./src/main.rs", line: 29 }`.
+
+
+Another thing to note is that since Rust is compiled, it is not necessarily as easy to recover proper stack traces,
+especially in release builds. We use [`anyhow`](https://docs.rs/anyhow/latest/anyhow/) for that.
+The library is still young; please [report](https://github.com/LaurentMazare/candle/issues) any issues detecting where an error is coming from.
+
+## Cuda error management
+
+When running a model on CUDA, you might get a stack trace that does not really represent the error.
+The reason is that CUDA is asynchronous by nature, so the error might be caught while you are launching totally different kernels.
+
+One way to avoid this is to set the `CUDA_LAUNCH_BLOCKING=1` environment variable. This forces every kernel to be launched sequentially.
+You might however still see the error reported on other kernels, as the faulty kernel might exit without an error but corrupt some pointer, in which case the error will only surface when the corresponding `CudaSlice` is dropped.
+
+
+If this occurs, you can use [`compute-sanitizer`](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html).
+This tool is like `valgrind` but for CUDA, and will help locate errors in the kernels.
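+
+For example (a sketch; the binary path and cargo invocation are illustrative):
+
+```bash
+# Force synchronous kernel launches so the failure is reported on the right kernel.
+CUDA_LAUNCH_BLOCKING=1 cargo run --release
+
+# Run the compiled binary under compute-sanitizer to locate memory errors in kernels.
+compute-sanitizer ./target/release/myapp
+```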
+
+
diff --git a/candle-book/src/inference/README.md b/candle-book/src/inference/README.md
index c82f85e1..1b75a310 100644
--- a/candle-book/src/inference/README.md
+++ b/candle-book/src/inference/README.md
@@ -1 +1,7 @@
# Running a model
+
+
+In order to run an existing model, you will need to download and use its existing weights.
+Most models are already available on https://huggingface.co/ in the [`safetensors`](https://github.com/huggingface/safetensors) format.
+
+Let's get started by running an old model: `bert-base-uncased`.
diff --git a/candle-book/src/inference/hub.md b/candle-book/src/inference/hub.md
index 6242c070..b924b76d 100644
--- a/candle-book/src/inference/hub.md
+++ b/candle-book/src/inference/hub.md
@@ -1 +1,104 @@
# Using the hub
+
+Install the [`hf-hub`](https://github.com/huggingface/hf-hub) crate:
+
+```bash
+cargo add hf-hub
+```
+
+Then let's start by downloading the [model file](https://huggingface.co/bert-base-uncased/tree/main).
+
+
+```rust
+# extern crate candle_core;
+# extern crate hf_hub;
+use hf_hub::api::sync::Api;
+use candle_core::Device;
+
+let api = Api::new().unwrap();
+let repo = api.model("bert-base-uncased".to_string());
+
+let weights = repo.get("model.safetensors").unwrap();
+
+let weights = candle_core::safetensors::load(weights, &Device::Cpu);
+```
+
+We now have access to all the [tensors](https://huggingface.co/bert-base-uncased?show_tensors=true) within the file;
+you can check all of their names [here](https://huggingface.co/bert-base-uncased?show_tensors=true).
+
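+As a quick sanity check, you can also iterate over the loaded map and print each tensor name and shape (a small sketch; it assumes the result of the `load` call above is unwrapped into its `HashMap<String, Tensor>`):
+
+```rust,ignore
+let weights = weights.unwrap();
+for (name, tensor) in weights.iter() {
+    // `load` returns a HashMap<String, Tensor>, so we can list every entry.
+    println!("{name}: {:?}", tensor.shape());
+}
+```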
+
+## Using async
+
+`hf-hub` comes with an async API.
+
+```bash
+cargo add hf-hub --features tokio
+```
+
+```rust,ignore
+# This is tested directly in examples crate because it needs external dependencies unfortunately:
+# See [this](https://github.com/rust-lang/mdBook/issues/706)
+{{#include ../../../candle-examples/src/lib.rs:book_hub_1}}
+```
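+
+The include above is the version that is actually tested in the examples crate. As a rough sketch (assuming a binary crate with the `tokio` crate available, including its `macros` and `rt-multi-thread` features), the async flow looks like this:
+
+```rust,ignore
+use hf_hub::api::tokio::Api;
+
+#[tokio::main]
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
+    let api = Api::new()?;
+    let repo = api.model("bert-base-uncased".to_string());
+    // Downloads the weights (or reuses the cached copy) without blocking the runtime.
+    let weights = repo.get("model.safetensors").await?;
+    println!("weights downloaded to {weights:?}");
+    Ok(())
+}
+```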
+
+
+## Using in a real model
+
+Now that we have our weights, we can use them in our BERT architecture:
+
+```rust
+# extern crate candle_core;
+# extern crate candle_nn;
+# extern crate hf_hub;
+# use hf_hub::api::sync::Api;
+#
+# let api = Api::new().unwrap();
+# let repo = api.model("bert-base-uncased".to_string());
+#
+# let weights = repo.get("model.safetensors").unwrap();
+use candle_core::{Device, Tensor, DType};
+use candle_nn::Linear;
+
+let weights = candle_core::safetensors::load(weights, &Device::Cpu).unwrap();
+
+let weight = weights.get("bert.encoder.layer.0.attention.self.query.weight").unwrap();
+let bias = weights.get("bert.encoder.layer.0.attention.self.query.bias").unwrap();
+
+let linear = Linear::new(weight.clone(), Some(bias.clone()));
+
+let input_ids = Tensor::zeros((3, 768), DType::F32, &Device::Cpu).unwrap();
+let output = linear.forward(&input_ids).unwrap();
+```
+
+For a full reference, you can check out the complete [bert](https://github.com/LaurentMazare/candle/tree/main/candle-examples/examples/bert) example.
+
+## Memory mapping
+
+For more efficient loading, instead of reading the file, you could use [`memmap2`](https://docs.rs/memmap2/latest/memmap2/).
+
+**Note**: Be careful with memory mapping: it seems to cause issues on [Windows and WSL](https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5893),
+and it will definitely be slower on a network-mounted disk because it issues more read calls.
+
+```rust,ignore
+{{#include ../../../candle-examples/src/lib.rs:book_hub_2}}
+```
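+
+The tested version lives in the examples crate and is included above; a minimal sketch of the idea, assuming a local `model.safetensors` file and the `memmap2` and `safetensors` crates, looks like this:
+
+```rust,ignore
+use memmap2::Mmap;
+use std::fs::File;
+
+let file = File::open("model.safetensors")?;
+// SAFETY: the underlying file must not be modified while it is mapped.
+let buffer = unsafe { Mmap::map(&file)? };
+// The mmap dereferences to a byte slice, so the weights are not copied here.
+let tensors = safetensors::SafeTensors::deserialize(&buffer)?;
+```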
+
+**Note**: This operation is **unsafe**. [See the safety notice](https://docs.rs/memmap2/latest/memmap2/struct.Mmap.html#safety).
+In practice, model files should never be modified and the mmaps should mostly be read-only anyway, so the caveat most likely does not apply, but always keep it in mind.
+
+
+## Tensor Parallel Sharding
+
+When using multiple GPUs with tensor parallelism in order to get good latency, you can load only the part of the tensor you need on each device.
+
+For that, you need to use the [`safetensors`](https://crates.io/crates/safetensors) crate directly.
+
+```bash
+cargo add safetensors
+```
+
+
+```rust,ignore
+{{#include ../../../candle-examples/src/lib.rs:book_hub_3}}
+```
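+
+The include above is the tested version. As a rough sketch of the idea, each rank can read only its own rows of a weight matrix sharded along the first dimension (the tensor name, rank, and world size below are illustrative):
+
+```rust,ignore
+use safetensors::SafeTensors;
+use std::fs;
+
+let (rank, world_size) = (0usize, 2usize);
+let buffer = fs::read("model.safetensors")?;
+let tensors = SafeTensors::deserialize(&buffer)?;
+let view = tensors.tensor("bert.encoder.layer.0.attention.self.query.weight")?;
+
+let shape = view.shape();
+// Assume the first dimension divides evenly across ranks.
+let rows_per_rank = shape[0] / world_size;
+// Row-major layout: one row spans the product of the remaining dims times the element size.
+let row_bytes: usize = shape[1..].iter().product::<usize>() * view.dtype().size();
+let start = rank * rows_per_rank * row_bytes;
+// Only this slice of the raw bytes needs to be turned into a Tensor on the local device.
+let shard: &[u8] = &view.data()[start..start + rows_per_rank * row_bytes];
+```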
diff --git a/candle-book/src/inference/serialization.md b/candle-book/src/training/serialization.md
index 0dfc62d3..0dfc62d3 100644
--- a/candle-book/src/inference/serialization.md
+++ b/candle-book/src/training/serialization.md