Commit messages
* Fix lint errors introduced with Rust 1.83.
* Run rustfmt.
* Fix more lints.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Remove some unused macros.
* More unused fixes.
* pyo3 update.
* Stub fix.
* Update for pyo3 0.21.
* Also adapt the RL example.
* Fix for the pyo3-onnx bindings...
* Print details on failures.
* Revert pyi.
* Detach the tensors on batch-norm eval.
* Fix pyo3 bindings.
* Black tweak.
* Formatting.
* Also update the pyo3-onnx formatting.
* Apply black.
* Metal quantized modifications proposal.
  - Add a device param wherever needed.
  - Create a new QMetal storage type that implements QuantizedType.
  - Update everywhere needed; fix Python, the examples, fmt, clippy and the stubs.
  - Move everything around; only the actual implementations are missing.
  - Fix everything and add the dequantize kernels; fix matmul, fmt and clippy.
  - Working state, with known issues:
      Q2K Metal -> bugged (also present in GGML).
      Q4K CPU -> bugged (present previously, the new test catches it).
      Q5K CPU -> bugged (present previously).
      Q8_1, both backends -> never really implemented, it seems.
      Q8K Metal -> never implemented in Metal.
  - Fix the Q2K bug (present in ggml).
* Cleanup.
* Fix the rebase.
* Removing the fences speeds everything up and *is* correct this time...
* Cleanup the fence.
* After rebase.
* Bad code removal.
* Rebase after the phi2 merge + fix the replit default to CPU.
* Making the CI happy.
* More happy tests.
---------
Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
* Mixtral quantized instruct.
* Fix a couple typos.
* Metal part 1 - Scaffolding for metal.
* Remove tracing.
* Start onnx integration
* Merge remote-tracking branch 'upstream/main' into feat/pyo3-onnx
* Implement ONNXModel
* `fmt`
* add `onnx` flag to python ci
* Pin `protoc` to `25.0`
* Setup `protoc` in wheel builds
* Build wheels with `onnx`
* Install `protoc` in manylinux containers
* `apt` -> `yum`
* Download `protoc` via bash script
* Back to `manylinux: auto`
* Disable `onnx` builds for linux
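
A rough sketch of how the `ONNXModel` binding added here might be driven from Python. Only `ONNXModel` itself is named in the change above; the module path `candle.onnx`, the constructor argument, and the `run` method with its dict-in/dict-out layout are assumptions that should be checked against the generated stubs.

```python
# Hedged sketch of the pyo3 ONNX binding; `run` and its input/output layout
# are assumptions, only `ONNXModel` is named in the change above.
import candle
from candle.onnx import ONNXModel

model = ONNXModel("model.onnx")                     # hypothetical local model file
inputs = {"input": candle.randn((1, 3, 224, 224))}  # input name -> Tensor mapping
outputs = model.run(inputs)                         # assumed to return name -> Tensor
for name, tensor in outputs.items():
    print(name, tensor.shape)
```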
* add `equal` to tensor
* add `__richcmp__` support for tensors and scalars
* typo
* more typos
* Add `abs` + `candle.testing`
* remove duplicated `broadcast_shape_binary_op`
* `candle.i16` => `candle.i64`
* `tensor.nelements` -> `tensor.nelement`
* Cleanup `abs`
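
A short sketch of the comparison support described above. The `==` / `>` operators and `abs` come from the change itself; the exact helper exported by `candle.testing` (here `assert_equal`) is an assumed name.

```python
# Sketch of __richcmp__-based comparisons; `testing.assert_equal` is an
# assumed helper name from this change set.
import candle
from candle import testing

a = candle.Tensor([1.0, -2.0, 3.0])
b = candle.Tensor([1.0, 2.0, 3.0])

mask = a == b        # element-wise tensor/tensor comparison
positive = a > 0.0   # tensor/scalar comparison
testing.assert_equal(a.abs(), b)  # abs() added in the same change
print(mask, positive)
```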
* Negative and `*args` shape handling
* Rename to `PyShapeWithHole` + validate that only one hole exists
* Regenerate stubs
---------
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
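
A minimal sketch of the shape handling above: at most one `-1` "hole" is allowed and it is inferred from the element count, and shapes can be given as a tuple or as `*args`. The use of `candle.ones` here is incidental and assumed to be available.

```python
# Sketch of negative / *args shape handling as described in the change above.
import candle

t = candle.ones((4, 6))
a = t.reshape((2, -1))  # the single -1 hole is inferred as 12
b = t.reshape(3, -1)    # *args form of the same call
print(a.shape, b.shape)
```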
* Add maturin ci
* fix paths
* Change sdist path
* Convert PyTorch tensors.
* Separate tests for PyTorch tensor conversion.
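
A hedged sketch of the PyTorch interop mentioned above. Whether `candle.Tensor` accepts a `torch.Tensor` directly is an assumption about where this change hooks in; going through `.numpy()` is shown as the conservative route.

```python
# Sketch only: direct torch.Tensor conversion is what the change describes,
# but the exact entry point (the Tensor constructor) is an assumption.
import torch
import candle

torch_t = torch.arange(6, dtype=torch.float32).reshape(2, 3)
converted = candle.Tensor(torch_t)          # assumed: accepted since this change
via_numpy = candle.Tensor(torch_t.numpy())  # conservative fallback via numpy
print(converted.shape, via_numpy.shape)
```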
* Add `mkl` support
* Set `mkl` path on linux
* Add PyO3 ci
* Update python.yml
* Format `bert.py`
* Add proper `None` and `tensor` indexing
* Allow indexing via lists + allow tensor/list indexing outside of first dimension
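
A sketch of the indexing behaviour described above: `None` inserts a new dimension, and lists or tensors can be used as indices beyond the first dimension. Dtype requirements for index tensors are assumed to be handled by the bindings.

```python
# Sketch of None / list / tensor indexing from the change above.
import candle

t = candle.rand((4, 5))
expanded = t[None]               # None adds a leading dimension -> (1, 4, 5)
picked = t[[0, 2]]               # list indexing on the first dimension
cols = t[:, [1, 3]]              # list indexing outside the first dimension
rows = t[candle.Tensor([0, 2])]  # tensor indexing
print(expanded.shape, picked.shape, cols.shape, rows.shape)
```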
* add `.to()` operator
* Only allow each value to be provided once via `args` or `kwargs`
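
A sketch of the `.to()` operator described above. The "each value at most once" rule is from the change itself; the `dtype`/`device` keyword names and the `"cpu"` device string are assumptions.

```python
# Sketch of .to(); keyword names are assumed, and duplicate values
# (e.g. two dtypes) are rejected per the change above.
import candle

t = candle.rand((2, 2))
a = t.to(candle.f16)                      # positional dtype
b = t.to(device="cpu")                    # keyword device
c = t.to(dtype=candle.f32, device="cpu")  # both at once, each given only once
print(a.dtype, b.device, c.dtype)
```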
* Some first `Module` implementations
* Add `state_dict` and `load_state_dict` functionality
* Move modules around and create `candle.nn.Linear`
* Add `nn.Embedding` and `nn.LayerNorm`
* Add BERT implementation
* Batch q-matmul
* Automatically dequantize `QTensors` if a `Tensor` is expected
* Add Module `.to()`, `.cuda()`, `.cpu()` and `.type()` functionality
* Unittests for `Module`, `Tensor` and `candle.utils`
* Add `pytorch` like slicing to `Tensor`
* Cleanup and BERT fixes
* `black` formatting + unit-test for `nn.Linear`
* Refactor slicing implementation
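
A sketch of the PyTorch-style module API this change describes. The class and method names (`nn.Module`, `nn.Linear`, `state_dict`, `load_state_dict`) come from the bullets above; the constructor signatures and automatic submodule registration are assumptions modelled on PyTorch.

```python
# Hedged sketch of candle.nn modules; signatures are assumed, only the class
# and method names come from the change above.
import candle
from candle import nn


class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 2)  # assumed (in_features, out_features) order

    def forward(self, x):
        return self.proj.forward(x)


model = Tiny()
out = model.forward(candle.rand((1, 4)))
weights = model.state_dict()    # name -> tensor mapping
model.load_state_dict(weights)  # round-trip the same weights
print(out.shape, sorted(weights.keys()))
```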
* Improve the quantized whisper setup.
* Fix the config file paths.
* Use the standard matmul where possible.
* Bump the version to 0.3.0.
* Changelog update.
* Bump the crate version.
* Also update the python bindings.
* Start generating return types
* Finish tensor type hinting
* Add `save_gguf` to `utils`
* Typehint `quant-llama.py`
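
A hedged sketch of the `save_gguf` helper mentioned above. The `(path, tensors, metadata)` argument order and the use of quantized tensors via `Tensor.quantize` are assumptions; the type hints generated by this change are the authoritative reference.

```python
# Sketch only: the save_gguf signature below is assumed, as is Tensor.quantize.
import candle
from candle import utils

tensors = {"weight": candle.rand((4, 32)).quantize("q4_0")}  # q4_0 blocks are 32 wide
metadata = {"general.name": "tiny-example"}
utils.save_gguf("tiny.gguf", tensors, metadata)
```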
* Begin to generate typehints.
* Generate correct stubs.
* Correctly include stubs.
* Add comments and typehints to static functions.
* Ensure the candle-pyo3 directory.
* Make `llama.rope.freq_base` optional.
* `fmt`
* Return the metadata in the gguf pyo3 bindings.
* Read the metadata in the quantized llama example.
* Get inference to work on gguf files.
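
A sketch of reading back the gguf metadata exposed here, in the spirit of the quantized llama example; the `(tensors, metadata)` return shape and the dict types are assumptions.

```python
# Sketch: load_gguf is assumed to return (tensors, metadata) as used by the
# quantized llama example mentioned above.
from candle import utils

tensors, metadata = utils.load_gguf("model-q4_0.gguf")  # hypothetical file name
print(metadata.get("general.architecture"))             # e.g. "llama"
print(len(tensors), "tensors loaded")
```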
* More quantized llama in python.
* Expose a couple more functions.
* Apply the last layer.
* Use the vocab from the ggml files.
* Sketch a quantized llama using the pyo3 api.
* Add more ops.
* Expose a few more functions to use in the quantized model.
* Rope embeddings.
* Get the forward pass to work.
* Add more pyo3 support.
* Add some support for quantized tensors in pyo3.
* Add an arc layer on qmatmul.
* Add the quantized matmul.
* Quantization support.
* More quantization support.
* Test the python quantization.
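
A sketch of the quantization support described above: quantize a float tensor, run the quantized matmul, and dequantize. The method names (`quantize`, `matmul_t`, `dequantize`) mirror the Rust quantized API and are assumptions as far as the Python surface goes.

```python
# Hedged sketch of QTensor usage; names mirror the Rust quantized API and are
# assumed to be what the pyo3 layer exposes.
import candle

w = candle.rand((64, 32))
qw = w.quantize("q4_0")   # float Tensor -> QTensor (32-wide q4_0 blocks)
x = candle.rand((1, 32))
y = qw.matmul_t(x)        # quantized matmul, computes x @ w^T -> (1, 64)
w_back = qw.dequantize()  # back to a float Tensor for inspection
print(y.shape, w_back.shape)
```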
* Add some documentation.
* Bump the crate version.
* Add some group parameter to convolutions.
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Proper handling of groups.
* Bump the crate version.
* And add a changelog.
* Add the i64 dtype.
* Adapt the cuda kernels.
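
For reference, a one-liner showing the `i64` dtype as surfaced through the Python bindings (`candle.i64` also appears in the later `candle.i16` => `candle.i64` change); the `.to(candle.i64)` cast is an assumption borrowed from the later `.to()` change.

```python
# Sketch: an i64 tensor via the Python bindings; the cast call is assumed.
import candle

t = candle.Tensor([1, 2, 3]).to(candle.i64)
print(t.dtype)  # i64
```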