path: root/candle-examples/examples/stable-diffusion
Commit message | Author | Age | Files | Lines
* UniPC for diffusion sampling (#2684) | Nick Senger | 2025-01-01 | 1 | -2/+2
  - feat: Add unipc multistep scheduler
  - chore: Clippy and formatting
  - chore: Update comments
  - chore: Avoid unsafety in float ordering
  - refactor: Update Scheduler::step mutability requirements
  - fix: Corrector img2img
  - chore: Update unipc ref link to latest diffusers release
  - chore: Deduplicate float ordering
  - fix: Panic when running with dev profile
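The float-ordering items above stem from the fact that Rust's `f64` is only `PartialOrd` (NaN breaks totality), so ordering floats otherwise needs `unwrap()` on `partial_cmp` or unsafe bit tricks. A minimal sketch of the safe pattern using `f64::total_cmp` (a general illustration, not necessarily the exact change in the commit; `sort_floats` is a hypothetical helper name):

```rust
// Sorting floats safely: f64 is only PartialOrd because NaN compares
// as unordered. f64::total_cmp (stable since Rust 1.62) implements the
// IEEE-754 total order, so no unsafe code or unwrap is needed.
fn sort_floats(xs: &mut [f64]) {
    xs.sort_by(|a, b| a.total_cmp(b));
}

fn main() {
    let mut xs = vec![0.3_f64, f64::NAN, 0.1, 0.2];
    sort_floats(&mut xs);
    // Under total_cmp, (positive) NaN sorts after every ordinary value.
    assert_eq!(&xs[..3], &[0.1, 0.2, 0.3]);
    assert!(xs[3].is_nan());
    println!("{:?}", xs);
}
```

Deduplicating this comparator in one helper is also what the "Deduplicate float ordering" item suggests the commit did across the scheduler code.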
* onnx: fix pad, unsqueeze (#2317) | shua | 2024-07-23 | 1 | -1/+1
  - onnx: fix pad, unsqueeze

    Both implementations have off-by-one errors:
    - The Pad 'reflect' cycle for e.g. `dim==3` is `[0,1,2,1]`, which has length 4 (i.e. `dim*2 - 2`), not 5 (the current code uses `dim*2 - 1`).
    - Unsqueeze(-1) for a tensor with `dim==3` should be 3 (i.e. `dim+index+1`), not 2 (the current `dim+index`).

    In addition, Pad incorrectly calculates the starting padding. If we want to pad out 2 elements at the start, and the cycle of indices has length 6, then we should skip 4 elements, but currently we skip 2. A more visual representation of what's going on:

    ```
    pad_start: 2
    data:     [a,b,c,d]
    indices:  [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0, ..] // zigzag between 0..4
    actual:       skip [ c d| c b a b]
    expected:   ~ skip ~ [ c b| a b c d]
    ```

    The values between `[` and `|` are padding, and the values between `|` and `]` should match the original data being padded.
  - Fix clippy lints.
  Co-authored-by: Laurent <laurent.mazare@gmail.com>
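The corrected indexing can be sketched in plain Rust (a minimal illustration of the reflect cycle and the negative-axis rule described above, not the candle-onnx implementation; `reflect_pad` and `unsqueeze_axis` are hypothetical names):

```rust
// Reflect-pad a 1-D slice (assumes len >= 2). The reflect cycle for
// length n has period 2*n - 2 (e.g. n = 4 -> indices 0,1,2,3,2,1), and
// padding `pad_start` elements means starting the walk `pad_start`
// positions *before* index 0 within that cycle.
fn reflect_pad(data: &[i32], pad_start: usize, pad_end: usize) -> Vec<i32> {
    let n = data.len();
    let cycle = (2 * n - 2) as isize; // period of the zigzag
    let idx = |i: isize| -> usize {
        let m = i.rem_euclid(cycle) as usize;
        if m < n { m } else { 2 * n - 2 - m }
    };
    (0..(pad_start + n + pad_end) as isize)
        .map(|i| data[idx(i - pad_start as isize)])
        .collect()
}

// Unsqueeze with a possibly negative axis: -1 on a rank-3 tensor
// inserts at axis 3 (rank + index + 1), as the fix describes.
fn unsqueeze_axis(rank: usize, index: isize) -> usize {
    if index < 0 { (rank as isize + index + 1) as usize } else { index as usize }
}

fn main() {
    // pad_start = 2 on [a,b,c,d] -> [c,b | a,b,c,d]; 1,2,3,4 stand in for a..d.
    assert_eq!(reflect_pad(&[1, 2, 3, 4], 2, 0), vec![3, 2, 1, 2, 3, 4]);
    assert_eq!(unsqueeze_axis(3, -1), 3);
    println!("ok");
}
```

Walking the cycle backwards from index 0 is what produces the expected `[c b | a b c d]` prefix instead of the buggy `[c d | c b a b]`.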
* Utilize batches in Stable Diffusion (#2071) | NorilskMajor | 2024-04-16 | 2 | -17/+60
  - Utilize batches in Stable Diffusion that were already there, but unutilized. Also refactor out the `save_image` function.
  - Clippy + cosmetic fixes.
  Co-authored-by: laurent <laurent.mazare@gmail.com>
* Improve the error message on overlong prompts. (#1908) | Laurent Mazare | 2024-03-21 | 1 | -0/+14
* Add a --seed argument to the stable-diffusion example. (#1812) | Niklas Hallqvist | 2024-03-08 | 1 | -0/+8
  - Add a --seed argument to the stable-diffusion example.
  - When no seed is specified, do not set one; use the engine's default instead. This makes the CPU engine work again when no --seed is given, and causes a bailout when a seed is given, as the engine does not currently support it.
  Co-authored-by: niklas <niklas@appli.se>
* Fix typo in README (#1740) | Daniel Varga | 2024-02-22 | 1 | -1/+1
* Properly format the Stable Diffusion example run with params (#1511) | stano | 2024-01-01 | 1 | -1/+1
  - Move the --sd-version flag out of the prompt.
* Add more mentions of SDXL Turbo in the readme. (#1397) | Laurent Mazare | 2023-12-03 | 1 | -6/+16
* Stable Diffusion Turbo Support (#1395) | Edwin Cheng | 2023-12-03 | 1 | -31/+90
  - Add support for SD Turbo
  - Set Leading as default in euler_ancestral discrete
  - Use the appropriate default values for n_steps and guidance_scale.
  Co-authored-by: Laurent <laurent.mazare@gmail.com>
* fix: address clippy 0.1.74 issues (#1336) | drbh | 2023-11-16 | 1 | -2/+2
  - clippy::needless-borrows-for-generic-args
  - clippy::reserve-after-initialization
* Mention the flash-attention restriction in the readme. (#1158) | Laurent Mazare | 2023-10-23 | 1 | -0/+3
* Remove some unused bits. (#1067) | Laurent Mazare | 2023-10-09 | 1 | -1/+0
* Override the repo for SDXL f16 vae weights. (#1064) | Laurent Mazare | 2023-10-09 | 1 | -2/+13
  - Override the repo for SDXL f16 vae weights.
  - Slightly simpler change.
* Add the clamping for stable-diffusion. (#1041) | Laurent Mazare | 2023-10-05 | 1 | -2/+1
* Use the module trait in stable-diffusion. (#817) | Laurent Mazare | 2023-09-11 | 1 | -1/+1
* Stable-Diffusion readme (#814) | Laurent Mazare | 2023-09-11 | 2 | -0/+63
  - Stable Diffusion readme.
  - Fix the image path.
  - Move the assets.
  - Resize the sample image.
  - Lower resolution.
* Move the stable-diffusion modeling code so that it's easier to re-use. (#812) | Laurent Mazare | 2023-09-11 | 12 | -3303/+1
* Fix for cudnn to work with img2img. (#753) | Laurent Mazare | 2023-09-06 | 1 | -0/+2
* img2img pipeline for stable diffusion. (#752) | Laurent Mazare | 2023-09-06 | 2 | -11/+85
  - img2img pipeline for stable diffusion.
  - Rename the arguments + fix.
  - Fix for zero strength.
  - Another fix.
  - Another fix.
  - Revert.
  - Include the backtrace.
  - Noise scaling.
  - Fix the height/width.
* Add a custom softmax implementation. (#744) | Laurent Mazare | 2023-09-05 | 1 | -1/+1
  - Add a custom softmax implementation.
  - Add softmaxlastdim to the benchmarks.
  - And add a test.
  - Support more dtypes.
  - Polish the code.
  - Use the slow implementation on cuda.
  - Add a todo for the cuda kernel.
* Simplify usage of the pool functions. (#662) | Laurent Mazare | 2023-08-29 | 1 | -1/+1
  - Simplify usage of the pool functions.
  - Small tweak.
  - Attempt at using apply to simplify the convnet definition.
* Dilated convolutions (#657) | Laurent Mazare | 2023-08-29 | 1 | -0/+2
  - Add the dilation parameter.
  - Restore the basic optimizer example.
  - Dilation support in cudnn.
  - Use the dilation parameter in the cpu backend.
  - More dilation support.
  - No support for dilation in transposed convolutions.
  - Add dilation to a test.
  - Remove a print.
  - Helper function.
* Use multiple transformer layers in the same cross-attn blocks. (#653) | Laurent Mazare | 2023-08-29 | 4 | -22/+43
  - Use multiple transformer layers in the same cross-attn blocks.
  - Make the context contiguous if required.
* Preliminary support for SDXL. (#647) | Laurent Mazare | 2023-08-29 | 4 | -57/+253
  - Preliminary support for SDXL.
  - More SDXL support.
  - More SDXL.
  - Use the proper clip config.
  - Querying for existing tensors.
  - More robust test.
* Remove some dead-code annotations. (#629) | Laurent Mazare | 2023-08-27 | 4 | -15/+0
  - Remove some dead-code annotations.
  - More dead code removal.
  - One more.
  - CI fix.
* Trace softmax (#568) | Laurent Mazare | 2023-08-23 | 1 | -3/+8
  - Trace the softmax op.
  - Inline the sum.
  - Add min/max vec operations.
* Add a groups parameter to convolutions. (#566) | Laurent Mazare | 2023-08-23 | 4 | -4/+10
  - Add a groups parameter to convolutions.
  - Avoid some unnecessary groups checks.
  - Move the tensor convolution bits.
  - Proper handling of groups.
  - Bump the crate version.
  - And add a changelog.
* Print some per-step timings in stable-diffusion. (#520) | Laurent Mazare | 2023-08-20 | 1 | -1/+4
  - Skeleton files for neon support of quantization.
  - SIMD version for q4 vecdot.
  - Also simdify the q6k multiplication.
  - Add some timings to stable-diffusion.
* dinov2 - read images from disk and compute the class probabilities (#503) | Laurent Mazare | 2023-08-18 | 2 | -21/+2
  - Load the image from disk and convert it to a tensor.
  - Tweak the function name.
* Add a simple Module trait and implement it for the various nn layers (#500) | Laurent Mazare | 2023-08-18 | 7 | -0/+7
  - Start adding the module trait.
  - Use the module trait.
  - Implement module for qmatmul.
* F16 support for stable diffusion (#488) | Laurent Mazare | 2023-08-17 | 6 | -43/+99
  - F16 support for stable diffusion.
  - Keep the attention bits in F32.
  - Keep more of the attention bits in F32.
  - More mixed precision support.
* Flash-attention support in stable diffusion (#487) | Laurent Mazare | 2023-08-17 | 5 | -32/+78
  - Add flash-attention for the stable-diffusion example.
  - Change the dtype.
  - Silly fix.
  - Another fix.
  - Revert the dtype back to the query dtype after applying flash-attn.
* Track the conv2d operations in stable-diffusion. (#431) | Laurent Mazare | 2023-08-13 | 6 | -22/+144
  - Track the conv2d operations in stable-diffusion.
  - Add more tracing to stable-diffusion.
  - Also trace the resnet bits.
  - Trace the attention blocks.
  - Also trace the attention inner part.
  - Small tweak.
* Allow using accelerate with stable-diffusion. (#430) | Laurent Mazare | 2023-08-13 | 1 | -0/+3
* Stable diffusion: retrieve the model files from the HF hub. (#414) | Laurent Mazare | 2023-08-11 | 2 | -34/+71
  - Retrieve the model files from the HF hub in the stable diffusion example.
  - Add to the readme.
* Fix the stable-diffusion vae. (#398) | Laurent Mazare | 2023-08-10 | 2 | -4/+4
  - Fix the stable-diffusion vae.
  - Fix for saving images.
* Write the generated images using the image crate. (#363) | Laurent Mazare | 2023-08-09 | 2 | -6/+25
  - Use the image crate to write the generated images.
  - Make the dependency optional.
* Fix the padding used in stable diffusion. (#362) | Laurent Mazare | 2023-08-09 | 2 | -5/+4
* Fixes for the stable diffusion example. (#342) | Laurent Mazare | 2023-08-08 | 4 | -6/+17
  - Fixes for the stable diffusion example.
  - Bugfix.
  - Another fix.
  - Fix for group-norm.
  - More fixes to get SD to work.
* Some CLIP fixes for stable diffusion. (#338) | Laurent Mazare | 2023-08-07 | 2 | -14/+10
  - Some CLIP fixes for stable diffusion.
  - Add the avg-pool2d operation on cpu.
* Skeleton for the avg-pool2d and upsample-nearest2d ops. (#337) | Laurent Mazare | 2023-08-07 | 2 | -15/+4
  - Skeleton for the avg-pool2d and upsample-nearest2d ops.
  - Preliminary conv2d support.
* Simple pad support. (#336) | Laurent Mazare | 2023-08-07 | 3 | -7/+5
  - Simple pad support.
  - Fix the tensor indexing when padding.
* Implement group-norm. (#334) | Laurent Mazare | 2023-08-07 | 2 | -6/+2
  - Implement group-norm.
  - Add some testing for group-norm.
* Main diffusion loop for the SD example. (#332) | Laurent Mazare | 2023-08-06 | 3 | -8/+242
* Add the recip op + use it in stable-diffusion. (#331) | Laurent Mazare | 2023-08-06 | 1 | -5/+13
  - Add the recip unary op.
  - Fix the cuda kernel.
  - Use the recip op in sigmoid.
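The last item relies on the identity sigmoid(x) = 1 / (1 + exp(-x)), so sigmoid decomposes into a negation, an exp, an add, and a recip. A plain-Rust sketch of that identity (an illustration of the decomposition, not candle's tensor implementation):

```rust
// sigmoid expressed through a reciprocal, mirroring the identity
// sigmoid(x) = recip(1 + exp(-x)).
fn recip(x: f64) -> f64 {
    1.0 / x
}

fn sigmoid(x: f64) -> f64 {
    recip(1.0 + (-x).exp())
}

fn main() {
    assert!((sigmoid(0.0) - 0.5).abs() < 1e-12);
    assert!(sigmoid(10.0) > 0.9999);
    println!("sigmoid(0) = {}", sigmoid(0.0));
}
```

Building sigmoid from existing unary ops like recip avoids adding a dedicated sigmoid kernel per backend.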
* Add the ddim scheduler. (#330) | Laurent Mazare | 2023-08-06 | 5 | -0/+445
* Add a stable diffusion example (#328) | Laurent Mazare | 2023-08-06 | 9 | -0/+2560
  - Start adding a stable-diffusion example.
  - Proper computation of the causal mask.
  - Add the chunk operation.
  - Work in progress: port the attention module.
  - Add some dummy modules for conv2d and group-norm, get the attention module to compile.
  - Re-enable the 2d convolution.
  - Add the embeddings module.
  - Add the resnet module.
  - Add the unet blocks.
  - Add the unet.
  - And add the variational auto-encoder.
  - Use the pad function from utils.