Commit log
|
* feat: Add unipc multistep scheduler
* chore: Clippy and formatting
* chore: Update comments
* chore: Avoid unsafety in float ordering
* refactor: Update Scheduler::step mutability requirements
* fix: Corrector img2img
* chore: Update unipc ref link to latest diffusers release
* chore: Deduplicate float ordering
* fix: Panic when running with dev profile
|
* onnx: fix pad, unsqueeze
Both implementations have off-by-one errors:
- The Pad 'reflect' cycle for e.g. `dim==3` is `[0,1,2,1]`, which has
  length 4 (i.e. `dim*2 - 2`), not 5 (the current code uses `dim*2 - 1`).
- Unsqueeze(-1) for a tensor with `dim==3` should insert at axis 3
  (i.e. `dim+index+1`), not 2 (the current `dim+index`).
In addition, Pad miscalculates the starting padding: to pad 2 elements
at the start with an index cycle of length 6, we should skip 4 elements
of the cycle, but the current code skips only 2. A more visual
representation of what's going on is below:
```
pad_start: 2
data: [a,b,c,d]
indices: [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0, ..] // zigzag between 0..4
actual: skip [ c d| c b a b]
expected: ~ skip ~ [ c b| a b c d]
```
The values between `[` and `|` are padding, and the values between
`|` and `]` should match the original data being padded. The corrected
index computation is sketched after this entry.
* Fix clippy lints.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
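
A minimal sketch of the corrected reflect-pad index computation described above, as standalone Rust rather than the actual candle-onnx code (`reflect_pad_indices` is a hypothetical helper):
```rust
/// Indices into `data` (of length `dim`) that produce reflect padding.
/// Assumes dim >= 2.
fn reflect_pad_indices(dim: usize, pad_start: usize, pad_end: usize) -> Vec<usize> {
    // Zigzag cycle, e.g. dim == 4 -> [0, 1, 2, 3, 2, 1];
    // its period is dim * 2 - 2 (the buggy code used dim * 2 - 1).
    let cycle: Vec<usize> = (0..dim).chain((1..dim - 1).rev()).collect();
    let period = cycle.len();
    // Starting `pad_start` positions early in the cycle means skipping
    // `period - pad_start` elements (the buggy code skipped `pad_start`).
    let skip = (period - pad_start % period) % period;
    cycle
        .into_iter()
        .cycle()
        .skip(skip)
        .take(pad_start + dim + pad_end)
        .collect()
}

fn main() {
    // data [a, b, c, d], pad_start = 2: indices [2, 1, 0, 1, 2, 3],
    // i.e. c b | a b c d, matching the `expected` row above.
    assert_eq!(reflect_pad_indices(4, 2, 0), vec![2, 1, 0, 1, 2, 3]);
}
```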
|
* Use the batches in Stable Diffusion that were already there but went unused.
Also factor out the `save_image` function.
* Clippy + cosmetic fixes.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
|
* Add a --seed argument to the stable-diffusion example.
* When no seed is specified, leave it unset and use the engine's default. This makes the CPU engine work again when no --seed is given, and bails out when a seed is passed, since the engine does not currently support it. A sketch of this handling follows this entry.
---------
Co-authored-by: niklas <niklas@appli.se>
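
A minimal sketch of the optional-seed handling with clap; `set_seed` is a stand-in for whatever seeding hook the selected engine actually exposes:
```rust
use clap::Parser;

#[derive(Parser)]
struct Args {
    /// RNG seed; when omitted, the engine's default behaviour is kept.
    #[arg(long)]
    seed: Option<u64>,
}

// Stand-in for an engine hook: an engine without seeding support would
// return an error here, producing the bailout described above.
fn set_seed(_seed: u64) -> Result<(), String> {
    Err("this engine does not support seeding yet".to_string())
}

fn main() -> Result<(), String> {
    let args = Args::parse();
    if let Some(seed) = args.seed {
        set_seed(seed)?; // only touch the RNG when --seed was given
    }
    Ok(())
}
```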
|
Move the --sd-version flag out of the prompt.
|
* Add support for SD Turbo
* Set Leading as the default timestep spacing in euler_ancestral discrete
* Use the appropriate default values for n_steps and guidance_scale.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
|
- clippy::needless-borrows-for-generic-args
- clippy::reserve-after-initialization
|
* Override the repo for SDXL f16 vae weights.
* Slightly simpler change.
|
* Stable Diffusion readme.
* Fix the image path.
* Move the assets.
* Resize the sample image.
* Lower resolution.
|
* img2img pipeline for stable diffusion.
* Rename the arguments + fix.
* Fix for zero strength.
* Another fix.
* Another fix.
* Revert.
* Include the backtrace.
* Noise scaling (see the sketch after this entry).
* Fix the height/width.
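
A scalar sketch of the strength logic behind the pipeline above: how the starting step is chosen and how the encoded init image is noised. The DDPM-style schedule here is an assumption, not candle's actual scheduler API:
```rust
// strength == 1.0 -> start from pure noise (step 0);
// strength == 0.0 -> start at step n_steps, i.e. skip denoising
// entirely, which is the zero-strength case fixed above.
fn img2img_start_step(n_steps: usize, strength: f64) -> usize {
    assert!((0.0..=1.0).contains(&strength));
    n_steps - (n_steps as f64 * strength) as usize
}

// Noise the encoded init image for the chosen timestep, where
// `alpha_cumprod` is the cumulative alpha at that timestep.
fn add_noise(latent: f64, noise: f64, alpha_cumprod: f64) -> f64 {
    alpha_cumprod.sqrt() * latent + (1.0 - alpha_cumprod).sqrt() * noise
}
```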
|
* Add a custom softmax implementation (a scalar sketch follows this entry).
* Add softmaxlastdim to the benchmarks.
* And add a test.
* Support more dtypes.
* Polish the code.
* Use the slow implementation on cuda.
* Add a todo for the cuda kernel.
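
A scalar sketch of the numerically stable softmax-over-last-dim such an op implements; candle's actual kernels are vectorized and dtype-generic, this only shows the max-subtraction trick:
```rust
/// Softmax over the last dimension of a flattened tensor whose last
/// dimension has size `dim`. Subtracting the row max keeps exp() from
/// overflowing without changing the result.
fn softmax_last_dim(data: &mut [f32], dim: usize) {
    for row in data.chunks_mut(dim) {
        let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let mut sum = 0f32;
        for x in row.iter_mut() {
            *x = (*x - max).exp();
            sum += *x;
        }
        for x in row.iter_mut() {
            *x /= sum;
        }
    }
}
```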
|
* Simplify usage of the pool functions.
* Small tweak.
* Attempt at using apply to simplify the convnet definition.
|
* Add the dilation parameter (the output-size formula is sketched after this entry).
* Restore the basic optimizer example.
* Dilation support in cudnn.
* Use the dilation parameter in the cpu backend.
* More dilation support.
* No support for dilation in transposed convolutions.
* Add dilation to a test.
* Remove a print.
* Helper function.
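
How dilation enters the output-size computation: a dilated kernel spans `dilation * (k - 1) + 1` input positions. A small sketch with a hypothetical helper, not the exact candle function:
```rust
fn conv2d_out_dim(in_dim: usize, k: usize, padding: usize, stride: usize, dilation: usize) -> usize {
    // A dilated kernel behaves like a larger kernel with holes.
    let effective_k = dilation * (k - 1) + 1;
    (in_dim + 2 * padding - effective_k) / stride + 1
}

fn main() {
    // 32-wide input, 3x3 kernel, dilation 2 acts like a 5x5 kernel.
    assert_eq!(conv2d_out_dim(32, 3, 0, 1, 2), 28);
}
```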
|
* Use multiple transformer layers in the same cross-attn blocks.
* Make the context contiguous if required.
|
* Preliminary support for SDXL.
* More SDXL support.
* More SDXL.
* Use the proper clip config.
* Querying for existing tensors.
* More robust test.
|
* Remove some dead-code annotations.
* More dead code removal.
* One more.
* CI fix.
|
* Trace the softmax op.
* Inline the sum.
* Add min/max vec operations.
|
* Add some group parameter to convolutions.
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Proper handling of groups (shape constraints sketched after this entry).
* Bump the crate version.
* And add a changelog.
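
A sketch of the shape constraints that grouped convolutions introduce; `check_conv_groups` is a hypothetical helper, not candle's validation code:
```rust
/// Grouped convolution splits the channels into `groups` independent
/// convolutions, so both channel counts must divide evenly.
fn check_conv_groups(c_in: usize, c_out: usize, groups: usize) -> Result<(), String> {
    if c_in % groups != 0 || c_out % groups != 0 {
        return Err(format!(
            "c_in ({c_in}) and c_out ({c_out}) must both be divisible by groups ({groups})"
        ));
    }
    // With groups, the weight tensor has shape (c_out, c_in / groups, k_h, k_w):
    // each output channel only sees c_in / groups input channels.
    Ok(())
}
```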
|
* Skeleton files for neon support of quantization.
* SIMD version for q4 vecdot (a scalar reference of the block dot product follows this entry).
* Also simdify the q6k multiplication.
* Add some timings to stable-diffusion.
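
A scalar reference of the q4 dot product that the NEON code vectorizes, assuming a ggml-style Q4_0 block of 32 weights stored as 16 nibble pairs plus one scale; the exact block layout is an assumption here:
```rust
struct BlockQ4 {
    scale: f32,        // per-block scale (f16 in the real format)
    nibbles: [u8; 16], // 32 quantized weights, two per byte
}

/// Dot product of quantized blocks against f32 activations.
/// Requires ys.len() >= blocks.len() * 32.
fn vec_dot_q4_f32(blocks: &[BlockQ4], ys: &[f32]) -> f32 {
    let mut acc = 0f32;
    for (ib, block) in blocks.iter().enumerate() {
        let base = ib * 32;
        for (i, byte) in block.nibbles.iter().enumerate() {
            // Low nibble holds weight i, high nibble weight i + 16,
            // both stored with a bias of 8.
            let lo = (byte & 0x0f) as i32 - 8;
            let hi = (byte >> 4) as i32 - 8;
            acc += block.scale * (lo as f32 * ys[base + i] + hi as f32 * ys[base + i + 16]);
        }
    }
    acc
}
```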
|
* Load the image from disk and convert it to a tensor (sketched after this entry).
* Tweak the function name.
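
A sketch of the loading step with the `image` crate; the result is left as a plain `Vec<f32>` in CHW order rather than constructing a candle tensor:
```rust
fn load_image_chw(path: &str) -> Result<(Vec<f32>, usize, usize), Box<dyn std::error::Error>> {
    let img = image::open(path)?.to_rgb8();
    let (width, height) = img.dimensions();
    let raw = img.into_raw(); // HWC u8 layout
    let (w, h) = (width as usize, height as usize);
    // Rearrange HWC -> CHW and scale to [0, 1].
    let mut chw = vec![0f32; 3 * w * h];
    for y in 0..h {
        for x in 0..w {
            for c in 0..3 {
                chw[c * h * w + y * w + x] = raw[(y * w + x) * 3 + c] as f32 / 255.;
            }
        }
    }
    Ok((chw, h, w))
}
```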
|
* Start adding the module trait (its shape is sketched after this entry).
* Use the module trait.
* Implement module for qmatmul.
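
The rough shape of such a Module trait, with placeholder types standing in for candle's `Tensor` and error type:
```rust
// Placeholders for the real candle types.
struct Tensor;
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

trait Module {
    fn forward(&self, xs: &Tensor) -> Result<Tensor>;
}

// Any layer implementing the trait (linear, qmatmul, ...) can then be
// called uniformly, which is what enables apply-style chaining.
struct Identity;
impl Module for Identity {
    fn forward(&self, _xs: &Tensor) -> Result<Tensor> {
        Ok(Tensor)
    }
}
```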
|
* F16 support for stable diffusion.
* Keep the attention bits in F32 (see the precision note after this entry).
* Keep more of the attention bits in F32.
* More mixed precision support.
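
Why the attention softmax stays in F32: `exp` overflows F16's maximum of roughly 65504 for quite small logits, so the pattern is upcast, softmax, downcast. A small demonstration using the `half` crate (assumed here; candle's F16 representation may differ):
```rust
use half::f16;

fn main() {
    let logit = 12.0f32;
    let in_f32 = logit.exp();                // ~162754.8, fine in f32
    let in_f16 = f16::from_f32(logit.exp()); // overflows to +inf in f16
    println!("f32: {in_f32}, f16: {in_f16}");
    assert!(in_f16.is_infinite());
}
```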
|
* Add flash-attention for the stable-diffusion example.
* Change the dtype.
* Silly fix.
* Another fix.
* Revert the dtype back to the query dtype after applying flash-attn.
|
* Track the conv2d operations in stable-diffusion.
* Add more tracing to stable-diffusion (the span pattern is sketched after this entry).
* Also trace the resnet bits.
* Trace the attention blocks.
* Also trace the attention inner part.
* Small tweak.
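
The instrumentation pattern these commits apply, sketched with the `tracing` crate; the struct and span names are illustrative:
```rust
use tracing::{span, Level};

struct Conv2dTraced {
    span: tracing::Span,
}

impl Conv2dTraced {
    fn new() -> Self {
        // Create the span once at construction time.
        Self { span: span!(Level::TRACE, "conv2d") }
    }

    fn forward(&self) {
        // The span is recorded from here until the guard is dropped.
        let _enter = self.span.enter();
        // ... the actual conv2d work would go here ...
    }
}
```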
|
* Retrieve the model files from the HF hub in the stable diffusion example (sketched after this entry).
* Add to the readme.
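
A minimal sketch of fetching weights with the `hf-hub` crate; the repo id and filename here are illustrative, not necessarily what the example uses:
```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api = hf_hub::api::sync::Api::new()?;
    let repo = api.model("runwayml/stable-diffusion-v1-5".to_string());
    // Downloads (or reuses the local cache) and returns the file path.
    let weights = repo.get("unet/diffusion_pytorch_model.safetensors")?;
    println!("{}", weights.display());
    Ok(())
}
```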
|
* Fix the stable-diffusion vae.
* Fix for saving images.
|
* Use the image crate to write the generated images (sketched after this entry).
* Make the dependency optional.
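
A sketch of the write path with the `image` crate, assuming the pixels are already `u8` RGB in HWC order:
```rust
fn save_image(path: &str, pixels: &[u8], width: u32, height: u32) -> Result<(), image::ImageError> {
    // Encodes based on the file extension (png, jpg, ...).
    image::save_buffer(path, pixels, width, height, image::ColorType::Rgb8)
}
```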
|
* Fixes for the stable diffusion example.
* Bugfix.
* Another fix.
* Fix for group-norm.
* More fixes to get SD to work.
|
* Some CLIP fixes for stable diffusion.
* Add the avg-pool2d operation on cpu.
|
* Skeleton for the avg-pool2d and upsample-nearest2d ops (the index mapping is sketched after this entry).
* Preliminary conv2d support.
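
The index mapping behind upsample-nearest2d: each output pixel reads the nearest input pixel by integer scaling. A scalar sketch of the core loop, not the actual kernel:
```rust
fn upsample_nearest2d(src: &[f32], h: usize, w: usize, oh: usize, ow: usize) -> Vec<f32> {
    let mut dst = vec![0f32; oh * ow];
    for oy in 0..oh {
        for ox in 0..ow {
            let iy = (oy * h) / oh; // nearest source row
            let ix = (ox * w) / ow; // nearest source column
            dst[oy * ow + ox] = src[iy * w + ix];
        }
    }
    dst
}
```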
|
* Simple pad support.
* Fix the tensor indexing when padding.
|
* Implement group-norm (the normalization is sketched after this entry).
* Add some testing for group-norm.
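
A scalar reference of group-norm for one batch element in CHW layout, without the learned scale and shift; it assumes `channels` is divisible by `groups`:
```rust
fn group_norm(x: &mut [f32], channels: usize, groups: usize, hw: usize, eps: f32) {
    let group_c = channels / groups;
    // Each chunk covers one group's channels over all spatial positions.
    for group in x.chunks_mut(group_c * hw) {
        let n = group.len() as f32;
        let mean = group.iter().sum::<f32>() / n;
        let var = group.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / n;
        let inv_std = 1.0 / (var + eps).sqrt();
        for v in group.iter_mut() {
            *v = (*v - mean) * inv_std;
        }
    }
}
```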
|
* Add the recip unary op.
* Fix the cuda kernel.
* Use the recip op in sigmoid (see the sketch after this entry).
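
The composition this enables, in scalar form: `sigmoid(x) = recip(1 + exp(-x))`. A sketch mirroring the tensor version:
```rust
fn recip(x: f32) -> f32 {
    1.0 / x
}

fn sigmoid(x: f32) -> f32 {
    recip(1.0 + (-x).exp())
}

fn main() {
    assert!((sigmoid(0.0) - 0.5).abs() < 1e-6);
}
```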
|
* Start adding a stable-diffusion example.
* Proper computation of the causal mask (sketched after this entry).
* Add the chunk operation.
* Work in progress: port the attention module.
* Add some dummy modules for conv2d and group-norm, get the attention module to compile.
* Re-enable the 2d convolution.
* Add the embeddings module.
* Add the resnet module.
* Add the unet blocks.
* Add the unet.
* And add the variational auto-encoder.
* Use the pad function from utils.
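
A scalar sketch of the causal-mask computation mentioned above: position i may attend to positions j <= i, and later positions get negative infinity before the softmax. Not the candle tensor version:
```rust
fn causal_mask(seq_len: usize) -> Vec<f32> {
    let mut mask = vec![0f32; seq_len * seq_len];
    for i in 0..seq_len {
        for j in (i + 1)..seq_len {
            // Future positions are masked out of the attention.
            mask[i * seq_len + j] = f32::NEG_INFINITY;
        }
    }
    mask
}
```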