Commit messages
* feat: add silu activation function
* use silu/arg in grad
* update candle-nn
* use node
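
For reference, SiLU is x * sigmoid(x). Below is a minimal sketch of that computation with candle tensor ops; the exact entry point this PR adds (for example an ops helper or an Activation variant) may differ, so treat the names as illustrative.

```rust
use candle_core::{Device, Result, Tensor};

// SiLU (a.k.a. swish): silu(x) = x * sigmoid(x).
// Sketch only: candle-nn gains a dedicated helper in this PR, but the exact
// path (e.g. candle_nn::ops::silu or an Activation::Silu variant) may vary.
fn silu(xs: &Tensor) -> Result<Tensor> {
    // sigmoid(x) = 1 / (1 + exp(-x))
    let sig = (xs.neg()?.exp()? + 1.0)?.recip()?;
    xs * &sig
}

fn main() -> Result<()> {
    let xs = Tensor::new(&[-2.0f32, -1.0, 0.0, 1.0, 2.0], &Device::Cpu)?;
    println!("{}", silu(&xs)?);
    Ok(())
}
```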
* Detach the tensors on batch-norm eval.
* Fix pyo3 bindings.
* Black tweak.
* Formatting.
* Also update the pyo3-onnx formatting.
* Apply black.
`candle-nn` already exposes a trait to define custom backends. However,
it's not possible to actually construct a `VarBuilder` with a custom
backend because the constructor is not exposed.
This change makes the constructor public and renames it from `new` to
`from_backend` so that it is not mistaken for the primary constructor
(which could be confusing to users).
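
A hedged sketch of what the now-public constructor enables. The trait name, method signatures and argument order below are assumptions pieced together from the description above; check candle-nn's var_builder module before relying on them.

```rust
// Sketch only: the trait shape below is assumed for illustration.
use candle_core::{DType, Device, Result, Shape, Tensor};
use candle_nn::var_builder::{SimpleBackend, VarBuilder};
use candle_nn::Init;

struct ZeroBackend;

impl SimpleBackend for ZeroBackend {
    // Hand back an all-zero tensor for every requested variable.
    fn get(&self, s: Shape, _name: &str, _h: Init, dtype: DType, dev: &Device) -> Result<Tensor> {
        Tensor::zeros(s, dtype, dev)
    }
    fn contains_tensor(&self, _name: &str) -> bool {
        true
    }
}

fn main() -> Result<()> {
    // The renamed constructor: previously `new`, now exposed as `from_backend`.
    let vb = VarBuilder::from_backend(Box::new(ZeroBackend), DType::F32, Device::Cpu);
    let w = vb.get((2, 3), "weight")?;
    println!("{w}");
    Ok(())
}
```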
* Metal quantized modifications proposal.
- Add a device param, wherever needed.
- Create new QMetal storage thing that implements QuantizedType.
- Update everywhere needed.
Fix Python.
Fixing examples.
Fix: fmt + clippy + stub.
Moving everything around.
Only missing the actual implementations.
Fixing everything + adding dequantized kernels.
More work.
Fixing matmul.
Fmt + Clippy
Some clippy fixes.
Working state.
Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, a new test catches it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented, it seems.
Q8K Metal -> Never implemented in Metal.
Fixing Q2K bug (present in ggml).
* Cleanup.
* Fix the rebase.
* Removing the fences speeds everything up and *is* correct this time...
* Cleanup the fence.
* After rebase.
* Bad code removal.
* Rebase after phi2 merge + fix replit default to CPU.
* Making the CI happy.
* More happy tests.
---------
Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
* Update the Phi model to use the updated architecture.
* Add more of the phi model.
* Repeat KV + caching.
* Apply the rotary embeddings.
* Add support for the new phi model in the phi example.
* Fix a couple glitches.
* Fix a couple more glitches.
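
The "Repeat KV" step refers to grouped-query attention, where each key/value head is shared by several query heads and must be expanded to match them. A rough sketch of that expansion with candle tensor ops (illustrative, not the actual phi code):

```rust
use candle_core::{Result, Tensor};

// Expand (batch, n_kv_heads, seq, head_dim) to
// (batch, n_kv_heads * n_rep, seq, head_dim) so that grouped key/value
// heads line up with the query heads.
fn repeat_kv(xs: &Tensor, n_rep: usize) -> Result<Tensor> {
    if n_rep == 1 {
        return Ok(xs.clone());
    }
    let (b, n_kv, seq, dim) = xs.dims4()?;
    xs.unsqueeze(2)?
        .expand((b, n_kv, n_rep, seq, dim))?
        .contiguous()?
        .reshape((b, n_kv * n_rep, seq, dim))
}
```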
* Simplify the one-hot implementation, support arbitrary rank.
* More cleanup.
* add one-hot encoding
* one_hot: improve error handling, use generic to_vecN::<D>
Bails if the index value is equal to or greater than the depth value,
which would result in an out-of-bounds error.
A redundant check is added to ensure the index value does not exceed
the length of the one-hot matrix size, which would also result in an
out-of-bounds error.
Bails if the index value is less than -1. If the index value is -1,
then it ignores the setting of the on_value for the index value. Only
values that are less than -1 are considered errors.
* one-hot: use two generics, one_hot::<I, O>, for input and output data types
Separating the input and output data types allows the input tensor
indices to be a different data type than the output encoded tensor data type.
For example, one_hot::<i64, u8>(...) will take an input tensor of i64 values
and encode the output tensor using u8 values.
The generic I::DTYPE must match the data type of the input indices, otherwise
the method will bail.
Additionally, this method adds an `allow_f64` option to enable the input indices
data type to be f64 values. f64 values are disabled by default.
TODO: indices data type and the generic I data type are currently not compile-time
checked.
* one_hot: remove input generic, use indices dtype matching
This commit removes the to_f64() type cast and explicitly
matches the DType from the input tensor. Currently, only U8,
U32 and I64 are supported for input tensors.
The match arms on the dtype are verbose. It would be nice
to use a generic type with the WithDType trait bound to
pass to the to_vecN method and then return an inner value.
Open to suggestions for better approaches here to reduce
the match arm verbosity.
* one_hot: use flat_map iterator over dims instead of nested for loop
This commit replaces the nested for loops with a flat_map iterator over
the dimensions of the input tensor.
This commit also adds a test for a rank 3 input tensor.
* one_hot: use mandatory on/off-values, remove const msgs
This commit also updates doc tests, comments and test cases.
* Small cleanups.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
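
A small usage sketch matching the final design described above (indices dtype taken from the tensor, a single generic for the on/off value type, mandatory on/off values). The module path and exact signature are assumptions; check candle-nn's encoding module.

```rust
use candle_core::{Device, Result, Tensor};
use candle_nn::encoding::one_hot;

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Indices may be U8, U32 or I64; an index of -1 leaves its row at the off value.
    let indices = Tensor::new(&[2i64, 0, -1], &dev)?;
    let depth = 4;
    // The on/off values are mandatory and fix the output dtype (here u8).
    let encoded = one_hot(indices, depth, 1u8, 0u8)?;
    println!("{encoded}");
    Ok(())
}
```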
* Add training to batchnorm with exponential moving average
* Add more checks to batch norm
* Resolve some review comments
* Add with_momentum variants of `new` methods
* Add check for range of momentum variable; update batch norm test
* Run cargo fmt
* Add back num_features parameter
* Format; tiny simplification
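
For reference, the running statistics kept in training mode follow the usual exponential-moving-average update; a tiny sketch of that arithmetic (the momentum convention is an assumption and may be flipped relative to the actual candle-nn code):

```rust
// Exponential moving average used for batch norm's running statistics.
// `momentum` must lie in [0, 1] (hence the range check mentioned above).
fn ema_update(running: f64, batch_stat: f64, momentum: f64) -> f64 {
    (1.0 - momentum) * running + momentum * batch_stat
}

fn main() {
    // One update step with momentum 0.1: 0.9 * 0.0 + 0.1 * 0.5 = 0.05.
    println!("{}", ema_update(0.0, 0.5, 0.1));
}
```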
Starting to fix some tests.
Few fixes.
Going back on remote metal-rs.
Reusing a single buffer (for now) to speed things up.
Adding some half kernels.
Make all tests panic instead of failing randomly.
Putting back f16 index select.
Add erf.
Working version for llama2-c.
Fixes + cache compute_pipeline_state.
BF16 metal fix.
Remove some prints.
new_owned -> new()..to_owned().
Better batched matmul.
Metal operational.
Reuse buffers on our own reference counts.
Tmp gemm.
Revert "Tmp gemm."
This reverts commit c65f68e98814b65daa596696bda076a73303dd82.
Interleave committing.
Speeding up copies using blit.
Fmt.
Fmt.
Remove the assert!
Fmt all.
Fixes after big rebase.
Add softmax for half and bfloat + tests
Fixing Llama example + accumulate softmax in float.
* Mixtral quantized instruct.
* Fix a couple typos.
* Expose AdamW parameters
* Use reference
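
A sketch of what exposing the parameters looks like from the caller's side; the struct and field names follow candle-nn's AdamW parameter struct as I understand it, so treat them as assumptions to verify.

```rust
use candle_core::{DType, Device, Result, Var};
use candle_nn::{AdamW, Optimizer, ParamsAdamW};

fn main() -> Result<()> {
    let w = Var::zeros((2, 2), DType::F32, &Device::Cpu)?;
    // Field names assumed from candle-nn's params struct; the remaining
    // hyper-parameters fall back to their defaults.
    let params = ParamsAdamW {
        lr: 1e-3,
        weight_decay: 0.01,
        ..Default::default()
    };
    let mut opt = AdamW::new(vec![w.clone()], params)?;
    // A dummy scalar loss just to drive one optimizer step.
    let loss = w.as_tensor().sqr()?.sum_all()?;
    opt.backward_step(&loss)?;
    Ok(())
}
```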
* Speedup ShardedSafeTensors to load Tensors with default hints
* Tweaks.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Add support to UL2 model family
* Update docs with UL2
* Create ActivationWithOptionalGating to avoid polluting activations
* Also refactor quantized t5
* Remove useless conversion
* Revert Activation::NewGelu name change
* Remove useless return
* Apply rustfmt and clippy recommendations
* Reuse t5::ActivationWithOptionalGating in quantized version
* (cosmetic change) use a match rather than ifs + avoid early returns.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
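
"Gating" here means the T5/UL2 feed-forward block multiplies an activated projection by a second, linear projection instead of using a single activated branch. A rough sketch with assumed names (not the candle-transformers implementation):

```rust
use candle_core::{Result, Tensor};
use candle_nn::{Linear, Module};

// Gated feed-forward as used by T5-style models with gated activations
// (e.g. gated GELU). Field names are illustrative only.
struct GatedFfn {
    wi_0: Linear, // activated branch
    wi_1: Linear, // linear "gate" branch
    wo: Linear,
}

impl GatedFfn {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        let hidden = self.wi_0.forward(xs)?.gelu()?;
        let gate = self.wi_1.forward(xs)?;
        self.wo.forward(&(hidden * gate)?)
    }
}
```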
* Add hard-sigmoid and hard-swish activations
* Update ops.rs
* Use / rather than div.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
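
The two activations are the usual piecewise-linear approximations: hardsigmoid(x) = clamp(x/6 + 1/2, 0, 1) and hardswish(x) = x * hardsigmoid(x). A sketch with candle tensor ops (not necessarily the exact ops.rs code):

```rust
use candle_core::{Device, Result, Tensor};

fn hard_sigmoid(xs: &Tensor) -> Result<Tensor> {
    // x / 6 + 1/2, clamped to [0, 1].
    xs.affine(1.0 / 6.0, 0.5)?.clamp(0f32, 1f32)
}

fn hard_swish(xs: &Tensor) -> Result<Tensor> {
    xs * &hard_sigmoid(xs)?
}

fn main() -> Result<()> {
    let xs = Tensor::new(&[-4.0f32, -1.0, 0.0, 1.0, 4.0], &Device::Cpu)?;
    println!("{}", hard_swish(&xs)?);
    Ok(())
}
```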
* Forward with training.
* Do not use dropout on vgg evaluation.
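
The fix is the usual pattern of threading a `train` flag through the forward pass so dropout only fires during training; a minimal sketch (the function name is illustrative, not the vgg code):

```rust
use candle_core::{Result, Tensor};

// Dropout applied only while training; at evaluation time the input passes
// through unchanged, which is the behaviour the vgg fix above restores.
fn dropout_t(xs: &Tensor, drop_p: f32, train: bool) -> Result<Tensor> {
    if train {
        candle_nn::ops::dropout(xs, drop_p)
    } else {
        Ok(xs.clone())
    }
}
```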
* Add fuse-conv-bn method for Conv2d
* no unwrap
* run rustfmt and clippy
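
Folding the batch norm into the preceding convolution rewrites the conv weight and bias per output channel. The arithmetic, sketched on plain scalars (illustrative only, not the candle method itself):

```rust
// Per-output-channel fusion of a BatchNorm into a convolution:
//   scale   = gamma / sqrt(running_var + eps)
//   w_fused = w * scale
//   b_fused = (b - running_mean) * scale + beta
fn fuse_channel(
    w: f32, b: f32,        // conv weight (one tap) and bias for this channel
    gamma: f32, beta: f32, // batch-norm affine parameters
    mean: f32, var: f32, eps: f32,
) -> (f32, f32) {
    let scale = gamma / (var + eps).sqrt();
    (w * scale, (b - mean) * scale + beta)
}
```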
* add bce with logit loss
* add bce with logit loss
* remove imports
* fix tiny bug
* add test documentation and refactor function
* fix test cases and formatting
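
The added loss combines a sigmoid with binary cross-entropy. A rough sketch with candle ops; candle-nn's own version lives in the loss module and may be more numerically careful (e.g. working on the logits directly):

```rust
use candle_core::{Result, Tensor};

// Binary cross-entropy on raw logits:
//   loss = -mean(t * log(p) + (1 - t) * log(1 - p)),  p = sigmoid(logits)
fn bce_with_logits(logits: &Tensor, target: &Tensor) -> Result<Tensor> {
    let p = candle_nn::ops::sigmoid(logits)?;
    let one_minus_t = target.affine(-1.0, 1.0)?; // 1 - target
    let one_minus_p = p.affine(-1.0, 1.0)?;      // 1 - p
    let loss = ((target * p.log()?)? + (one_minus_t * one_minus_p.log()?)?)?;
    loss.mean_all()?.neg()
}
```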
* Add some preliminary support for resnet.
* Add an actual resnet example.
* Only optimize float tensors.
* Use full tensors for zeros and ones.
* Add a benchmark for the matmul slowness.
* Add the convmixer model.
* Proper adaptive pooling.