* Only optimize float tensors.
* Use full tensors for zeros and ones.
* Add a benchmark for the matmul slowness.
* Add the convmixer model.
* Proper adaptive pooling.
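Adaptive pooling derives each window from the ratio of input size to output size, rather than using a fixed kernel and stride. The Rust sketch below shows the 1-D average case. It assumes the window convention documented for PyTorch's adaptive average pooling: start = floor(i * in / out), end = ceil((i + 1) * in / out). Whether the commit uses exactly this convention is an assumption, and `adaptive_avg_pool1d` is a hypothetical name.

```rust
/// Adaptive average pooling over a 1-D signal (illustrative sketch).
/// Output cell `i` averages input[start..end], where
/// start = floor(i * in_len / out_len) and end = ceil((i + 1) * in_len / out_len).
/// Windows may overlap, and every input element lands in at least one window.
fn adaptive_avg_pool1d(input: &[f32], out_len: usize) -> Vec<f32> {
    let in_len = input.len();
    assert!(out_len > 0 && in_len > 0, "lengths must be non-zero");
    (0..out_len)
        .map(|i| {
            let start = (i * in_len) / out_len; // floor division
            let end = ((i + 1) * in_len + out_len - 1) / out_len; // ceiling division
            let window = &input[start..end]; // never empty: end > start
            window.iter().sum::<f32>() / window.len() as f32
        })
        .collect()
}
```

With 5 inputs and 3 outputs, the windows are [0, 2), [1, 4) and [3, 5). They overlap, which is what separates adaptive pooling from plain pooling when the sizes do not divide evenly.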

* Improve the quantized whisper setup.
* Fix the config file paths.
* Use the standard matmul where possible.

* im2col implementation for conv2d.
* Fix for the im2col implementation to match the current conv2d.
* Small optimization.
* Add a cuda kernel.
* Handle arbitrary layouts.
* Im2Col cuda code.
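The im2col approach turns a convolution into one matrix multiplication. Each kernel-sized input patch is copied into one column of a matrix, and the flattened kernels are then multiplied against that matrix. The sketch below is a minimal single-image CPU version in Rust. It assumes a contiguous `[c_in, h, w]` layout, stride 1, no padding and no dilation. It does not cover the CUDA kernel or the arbitrary layouts mentioned in the commits, and the function names are hypothetical. Like deep-learning conv2d, it computes a cross-correlation, so the kernel is not flipped.

```rust
/// im2col for one image stored row-major as [c_in, h, w].
/// Returns a (c_in * kh * kw) x (h_out * w_out) row-major matrix whose columns
/// are the input patches under each output position, plus (h_out, w_out).
/// Assumes stride 1, no padding, no dilation, and kh <= h, kw <= w.
fn im2col(input: &[f32], c_in: usize, h: usize, w: usize, kh: usize, kw: usize) -> (Vec<f32>, usize, usize) {
    let (h_out, w_out) = (h - kh + 1, w - kw + 1);
    let n_cols = h_out * w_out;
    let mut cols = vec![0.0f32; c_in * kh * kw * n_cols];
    for c in 0..c_in {
        for ki in 0..kh {
            for kj in 0..kw {
                // One matrix row per (channel, kernel row, kernel column).
                let row = (c * kh + ki) * kw + kj;
                for oy in 0..h_out {
                    for ox in 0..w_out {
                        let src = (c * h + oy + ki) * w + ox + kj;
                        cols[row * n_cols + oy * w_out + ox] = input[src];
                    }
                }
            }
        }
    }
    (cols, h_out, w_out)
}

/// conv2d as a matmul: weights [c_out, c_in * kh * kw] times the im2col matrix.
/// The row-major (c_out, h_out * w_out) product already has the memory layout
/// of a (c_out, h_out, w_out) result, so the final reshape only changes the shape.
fn conv2d_im2col(
    input: &[f32], c_in: usize, h: usize, w: usize,
    weight: &[f32], c_out: usize, kh: usize, kw: usize,
) -> (Vec<f32>, usize, usize) {
    let (cols, h_out, w_out) = im2col(input, c_in, h, w, kh, kw);
    let k = c_in * kh * kw;
    let n = h_out * w_out;
    assert_eq!(weight.len(), c_out * k, "weight must be [c_out, c_in, kh, kw]");
    let mut out = vec![0.0f32; c_out * n];
    for o in 0..c_out {
        for p in 0..k {
            let wv = weight[o * k + p];
            for q in 0..n {
                out[o * n + q] += wv * cols[p * n + q];
            }
        }
    }
    (out, h_out, w_out)
}
```

The cost is memory: the column matrix stores `kh * kw` copies of most input elements. In return, the convolution becomes one large, cache-friendly matmul.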

* Add an im2col based benchmark.
* Reshape the final result.

* Add a custom softmax implementation.
* Add softmaxlastdim to the benchmarks.
* And add a test.
* Support more dtypes.
* Polish the code.
* Use the slow implementation on cuda.
* Add a todo for the cuda kernel.
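A softmax specialized to the last dimension can process each contiguous row on its own. It takes the row max, exponentiates the shifted values, and divides by their sum. Subtracting the max does not change the result mathematically, but it prevents `exp` from overflowing on large inputs. The sketch below is a minimal CPU version that assumes a contiguous row-major buffer. `softmax_last_dim` is a hypothetical name, not the project's API.

```rust
/// Numerically stable softmax over the last dimension of a row-major buffer
/// whose last dimension has size `dim`: each contiguous chunk of `dim`
/// elements is normalized independently.
fn softmax_last_dim(xs: &[f32], dim: usize) -> Vec<f32> {
    assert!(dim > 0 && xs.len() % dim == 0, "length must be a multiple of dim");
    let mut out = Vec::with_capacity(xs.len());
    for row in xs.chunks(dim) {
        // Shift by the row max so exp() stays in range for large inputs.
        let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = row.iter().map(|&x| (x - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        out.extend(exps.iter().map(|e| e / sum));
    }
    out
}
```

A naive `exp(x) / sum(exp(x))` would return NaN for the row `[1000.0, 1000.0]`. The shifted version returns `[0.5, 0.5]`.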

* Add a couple functions required for yolo.
* Add the yolo-v3 example.
* Add minimum and maximum.
* Use the newly introduced maximum.
* Cuda support for min/max + add some testing.
* Allow for more tests to work with accelerate.
* Fix a typo.
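The log does not say where the yolo-v3 example uses the new `maximum`. In detection code, element-wise min/max commonly appears in box intersection, for example when computing intersection-over-union (IoU) for non-maximum suppression. The sketch below shows that use on plain slices. The `[x1, y1, x2, y2]` box layout and the names `minimum`, `maximum` and `iou` are assumptions, not the example's actual code.

```rust
/// Element-wise maximum of two equal-length slices.
fn maximum(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x.max(*y)).collect()
}

/// Element-wise minimum of two equal-length slices.
fn minimum(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x.min(*y)).collect()
}

/// Intersection-over-union for boxes given as [x1, y1, x2, y2] (assumed layout).
fn iou(a: &[f32; 4], b: &[f32; 4]) -> f32 {
    let lo = maximum(&a[..2], &b[..2]); // top-left corner of the intersection
    let hi = minimum(&a[2..], &b[2..]); // bottom-right corner of the intersection
    // Clamp at zero so disjoint boxes get an empty intersection.
    let w = (hi[0] - lo[0]).max(0.0);
    let h = (hi[1] - lo[1]).max(0.0);
    let inter = w * h;
    let area = |r: &[f32; 4]| (r[2] - r[0]) * (r[3] - r[1]);
    inter / (area(a) + area(b) - inter)
}
```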

* Start adding the module trait.
* Use the module trait.
* Implement module for qmatmul.
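A module trait lets any layer with a forward pass, including a quantized matmul, be used and composed through one interface. In the sketch below, `Module`, `Linear`, `QMatMul` and `run_all` are hypothetical simplifications that operate on plain `f32` slices. The project's real trait works on tensors and most likely returns a `Result`. Storing the quantized weights as `i8` with a single scale is an assumption made for brevity, since real quantized formats usually use per-block scales.

```rust
/// Minimal stand-in for a module abstraction: anything with a forward pass.
trait Module {
    fn forward(&self, xs: &[f32]) -> Vec<f32>;
}

/// Hypothetical dense layer: y = W x + b, with W stored row-major as [out, in].
struct Linear {
    weight: Vec<f32>,
    bias: Vec<f32>,
    in_dim: usize,
}

impl Module for Linear {
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        assert_eq!(xs.len(), self.in_dim);
        self.weight
            .chunks(self.in_dim)
            .zip(&self.bias)
            .map(|(row, b)| row.iter().zip(xs).map(|(w, x)| w * x).sum::<f32>() + b)
            .collect()
    }
}

/// Hypothetical quantized matmul: i8 weights [out, in] dequantized with one scale.
struct QMatMul {
    qweight: Vec<i8>,
    scale: f32,
    in_dim: usize,
}

impl Module for QMatMul {
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        assert_eq!(xs.len(), self.in_dim);
        self.qweight
            .chunks(self.in_dim)
            .map(|row| row.iter().zip(xs).map(|(&q, x)| q as f32 * self.scale * x).sum::<f32>())
            .collect()
    }
}

/// Since both types implement Module, they can be chained without knowing
/// their concrete types.
fn run_all(layers: &[&dyn Module], xs: &[f32]) -> Vec<f32> {
    layers.iter().fold(xs.to_vec(), |acc, layer| layer.forward(&acc))
}
```

Because callers depend only on `forward`, a quantized layer can replace a full-precision one without changing the model code.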