| Commit message | Author | Age | Files | Lines |
| |
* Quantized version of mistral.
* Integrate the quantized mistral variant.
* Use the quantized weight files.
* Tweak the quantization command.
* Fix the dtype when computing the rotary embeddings.
* Update the readme with the quantized version.
* Fix the decoding of the remaining tokens.
| |
* Use yoke to provide a self-referential container for mmaped safetensor files.
* Add the new self-owned type for safetensor files without removing the previous version.
* Add routing.
* Add an initializer for the case of multiple files.
| |
* Use the proper block size for quantizing models.
* Use the proper dimension.
| |
* Load gguf files for the quantized t5.
* Add the quantized t5 example.
* Allow for loading local files.
* Add some support for quantizing safetensor files.
* Transpose before quantizing.
* Quantized t5.
* Retrieve the weights from the hub.
| |
* Add a custom softmax implementation.
* Add softmaxlastdim to the benchmarks.
* And add a test.
* Support more dtypes.
* Polish the code.
* Use the slow implementation on cuda.
* Add a todo for the cuda kernel.
| |
* Add the dilation parameter.
* Restore the basic optimizer example.
* Dilation support in cudnn.
* Use the dilation parameter in the cpu backend.
* More dilation support.
* No support for dilation in transposed convolutions.
* Add dilation to a test.
* Remove a print.
* Helper function.
| |
* Add the quantize command.
* Bugfix for writing gguf files.
* And add a comment.
| |
* More pickle support.
* Be more verbose.
| |
* Add to the cuda example a reproduction of the issue.
* Tweak.
* Add a test using non-square matrixes.
* Fix the conv2d kernel.
* Display the error.
* And tweak the comment.
| |
cuda. (#578)
* Add a test for conv2d with padding.
* Cosmetic changes.
* Bugfix the rand function on the cuda backend.
| |
* Add some group parameter to convolutions.
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Proper handling of groups.
* Bump the crate version.
* And add a changelog.
| |
* Retrieve more information from PyTorch checkpoints.
* Add enough support to load dino-v2 backbone weights.
| |
* Pickle work-in-progress.
* More unpickling.
* More pickling.
* Proper handling of setitems.
* Clippy.
* Again more pickling.
* Restore the example.
* Add enough pickle support to get the list of tensors.
* Read the data from zip files.
* Retrieve the tensor shape.
* Extract the size and dtype.
* More storage types.
* Improve the destructuring.
* Also support ggml files.
| |
* Pickle work-in-progress.
* More unpickling.
* More pickling.
* Proper handling of setitems.
* Clippy.
* Again more pickling.
* Restore the example.
* Add enough pickle support to get the list of tensors.
* Read the data from zip files.
* Retrieve the tensor shape.
* Extract the size and dtype.
* More storage types.
* Improve the destructuring.
| |
* Sketch some qmatmul test.
* Add the quantization function.
* More testing.
* Make the test smaller and faster.
* Add some shape checking.
| |
* AVX version of the vecdot for q4_0.
* Tweak the avx bits.
* Add a qmatmul benchmark.
* Fix the quantized test.
| |
* Add a cudnn feature to be used for conv2d.
* Allocate the proper workspace.
* Only create a single cudnn handle per cuda device.
* Proper cudnn usage.
* Bugfix.
| |
* Add a softmax bench.
* Add the vectorized sum reduce.
| |
* Add more tracing to the whisper example.
* Support accelerate in more examples.
* Use accelerate for pointwise functions.
* Use accelerate for binary operations too.
* Bugfix for binary operation: use the rhs before the lhs.
| |
* Refactor the benchmark example.
* Rename the example.
* Add some comments.
| |
* Add a conv1d benchmark based on the whisper sizes.
* Enforce the batch-dim in conv1d.
| |
* Add the accelerate feature.
* Ffi tweaks.
| |
* Rename to candle-core.
* More candle-core renaming.
| |
* Simplify Tensor::randn.
* Also switch Tensor::rand to use a generic dtype.
* Support sampling for f16.
* Cleanup.
| |
* Sketch a fast cuda kernel for reduce-sum.
* Sketch the rust support code for the fast sum kernel.
* More work on the fast kernel.
* Add some testing ground.
* A couple fixes for the fast sum kernel.
| |
* Add some very simple sum benchmark.
* Rename the file.
| |
* Fix some rebase issues.
* Use mkl instead.
* Use mkl in bert.
* Add the optional mkl feature.
* Conditional compilation based on the mkl feature.
* Add more mkl support.
|