| Commit message | Author | Age | Files | Lines |
| |
* Only optimize float tensors.
* Use full tensors for zeros and ones.
* Add a benchmark for the matmul slowness.
* Add the convmixer model.
* Proper adaptive pooling.
| |
* Improve the quantized whisper setup.
* Fix the config file paths.
* Use the standard matmul where possible.
| |
* im2col implementation for conv2d.
* Fix for the im2col implementation to match the current conv2d.
* Small optimization.
* Add a cuda kernel.
* Handle arbitrary layouts.
* Im2Col cuda code.
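
For readers unfamiliar with the technique named in the entries above, im2col turns a 2D convolution into a matrix product: every receptive-field patch of the input is unfolded into a row, the rows are multiplied by the flattened kernels, and the flat result is reshaped to the output grid. The following is a minimal pure-Python sketch under stated assumptions (no padding, no dilation, contiguous nested lists). It is not the code from these commits; the names `im2col` and `conv2d_im2col` are hypothetical and chosen only for illustration.

```python
def im2col(x, kh, kw, stride=1):
    """Unfold a (C, H, W) nested list into one row per output position.

    Each row has length C * kh * kw and holds the patch values in
    (channel, kernel row, kernel column) order. Assumes no padding.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    h_out = (h - kh) // stride + 1
    w_out = (w - kw) // stride + 1
    cols = []
    for i in range(h_out):
        for j in range(w_out):
            patch = [
                x[ch][i * stride + di][j * stride + dj]
                for ch in range(c)
                for di in range(kh)
                for dj in range(kw)
            ]
            cols.append(patch)
    return cols, h_out, w_out


def conv2d_im2col(x, kernel, stride=1):
    """Convolve x (C, H, W) with kernel (O, C, KH, KW) via im2col.

    Returns a nested list of shape (O, H_out, W_out).
    """
    kh, kw = len(kernel[0][0]), len(kernel[0][0][0])
    cols, h_out, w_out = im2col(x, kh, kw, stride)
    out = []
    for k in kernel:
        # Flatten one output channel's kernel in the same order as the patches.
        flat_k = [v for ch in k for row in ch for v in row]
        # One dot product per output position: this is the matmul step.
        flat_out = [sum(a * b for a, b in zip(patch, flat_k)) for patch in cols]
        # Reshape the flat result back to (H_out, W_out).
        out.append([flat_out[r * w_out:(r + 1) * w_out] for r in range(h_out)])
    return out


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # one channel, 3x3
kernel = [[[[1, 0], [0, 1]]]]             # one output channel, 2x2
result = conv2d_im2col(x, kernel)
```

The trade-off is extra memory for the unfolded matrix in exchange for reusing an optimized matmul.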
| |
* Add an im2col based benchmark.
* Reshape the final result.
|
* Add a custom softmax implementation.
* Add softmaxlastdim to the benchmarks.
* And add a test.
* Support more dtypes.
* Polish the code.
* Use the slow implementation on cuda.
* Add a todo for the cuda kernel.
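
As an illustration of what a softmax over the last dimension computes, here is a minimal pure-Python sketch. It is an assumption-laden reference, not the implementation added in these commits: the function name `softmax_last_dim` is hypothetical, inputs are nested lists of floats, and the maximum is subtracted before exponentiation, which is a common way to avoid overflow.

```python
import math


def softmax_last_dim(x):
    """Apply softmax along the innermost axis of a nested list of floats."""
    if not x:
        return []
    if isinstance(x[0], list):
        # Recurse until the innermost (last) dimension is reached.
        return [softmax_last_dim(row) for row in x]
    m = max(x)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax_last_dim([[0.0, 0.0], [1000.0, 1000.0]])
```

Each innermost row is normalized independently, so every row of the result sums to 1.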
|