
[Feature Request] taesd VAE (distilled VAE) #36


Closed
ClashSAN opened this issue Aug 28, 2023 · 10 comments · Fixed by #88

Comments

@ClashSAN

"taesd is a tiny, distilled version of the Stable Diffusion VAE."

The image generation results for this VAE (showcased on its GitHub page) look nearly identical to the full VAE's; supporting it in stable-diffusion.cpp could increase generation speed.

Github: https://github.com/madebyollin/taesd

@leejet (Owner) commented Aug 28, 2023

I've taken a look, and it appears that the VAE model structure differs from the original. I'll find some time to study whether corresponding support needs to be added.

@leejet (Owner) commented Aug 28, 2023

By the way, I believe the speed bottleneck is not in the VAE image generation phase but rather in the UNet sampling phase.

@Green-Sky (Contributor) commented Nov 18, 2023

With LCM support, the VAE now accounts for ~50% of generation time (on my CPU).

The VAE is also the point of peak memory consumption.

@FSSRepo (Contributor) commented Nov 20, 2023

@leejet The operation that consumes the most memory in the VAE is im2col: up to 1 GB for a 512 x 512 image and 2 GB for a 512 x 768 image. I'm considering splitting the input into chunks and merging them in the output as a memory optimization, but compute time will probably be slightly slower.
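The im2col numbers above can be sanity-checked with a little arithmetic: im2col unfolds every k×k input patch into a column, so the scratch buffer holds roughly `out_h * out_w * in_channels * k * k` elements. The helper below is a hypothetical sketch (not part of stable-diffusion.cpp), and the 128-channel full-resolution layer is an assumption based on the SD VAE decoder architecture.

```python
def im2col_bytes(height, width, in_channels, kernel=3, dtype_size=4):
    """Bytes needed by an im2col buffer for a stride-1, padded 2D conv:
    one (in_channels * kernel * kernel) column per output pixel."""
    return height * width * in_channels * kernel * kernel * dtype_size

# The VAE decoder runs 3x3 convs at full image resolution; assuming a
# 128-channel layer in fp32:
print(im2col_bytes(512, 512, 128) / 2**20)  # 1152.0 MiB for 512x512
print(im2col_bytes(768, 512, 128) / 2**20)  # 1728.0 MiB for 512x768
```

These estimates land in the same ballpark as the ~1 GB / ~2 GB figures quoted above, which is consistent with a single full-resolution im2col dominating peak memory.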

@Green-Sky (Contributor)

They are discussing tiled decoding here: madebyollin/taesd#8
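The core of tiled decoding is just splitting each spatial axis into overlapping tiles so each conv pass only needs an im2col buffer for one tile, then blending the overlaps on merge. The sketch below only computes the tile coordinates; the function name, tile size, and overlap are illustrative, not taken from the linked discussion.

```python
def make_tiles(size, tile, overlap):
    """Return (start, end) ranges of width `tile` covering [0, size),
    with consecutive tiles overlapping by at least `overlap` pixels.
    The last tile is clamped so it ends exactly at the border."""
    step = tile - overlap
    starts = range(0, max(size - overlap, 1), step)
    return [(min(s, size - tile), min(s, size - tile) + tile) for s in starts]

tiles = make_tiles(512, 256, 32)  # e.g. tile one axis of a 512-px plane
```

In a real implementation the overlapping regions would be merged with a weighted blend (e.g. a linear ramp across the overlap) to hide seams, at the cost of re-computing the overlapped pixels.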

@FSSRepo (Contributor) commented Nov 23, 2023

@leejet I've been working since yesterday on implementing this autoencoder, but for some reason I'm getting a somewhat over-saturated image. I don't know what I've gotten wrong.

https://github.com/FSSRepo/stable-diffusion.cpp/tree/taesd-impl

Ported from: https://github.com/madebyollin/taesd/blob/main/taesd.py

|                 | compute buffer | time (CPU backend) |
|-----------------|----------------|--------------------|
| AutoEncoderKL   | 1664 MB        | 23 s               |
| TinyAutoEncoder | 416 MB         | 2 s                |

(output images were attached for both)
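One possible cause of over-saturated output (an assumption on my part, not something confirmed in this thread): the two decoders use different output-range conventions. AutoEncoderKL decodes to roughly [-1, 1] and the pipeline rescales with (x + 1) / 2, while TAESD's decoder already emits values in roughly [0, 1]. Applying the wrong convention shifts and clips the colors. A minimal sketch:

```python
import numpy as np

def to_uint8_from_signed(x):
    """Post-process a decoder output in [-1, 1] (AutoEncoderKL convention)."""
    return np.clip((x + 1.0) / 2.0 * 255.0, 0, 255).astype(np.uint8)

def to_uint8_from_unit(x):
    """Post-process a decoder output in [0, 1] (TAESD convention)."""
    return np.clip(x * 255.0, 0, 255).astype(np.uint8)

mid_gray = np.array([0.5])
print(to_uint8_from_unit(mid_gray))    # [127] -- correct for a [0,1] output
print(to_uint8_from_signed(mid_gray))  # [191] -- too bright if misapplied
```

Another convention difference worth checking when porting from taesd.py is latent scaling: TAESD consumes the UNet-space latent directly, without the `1 / scale_factor` division that AutoencoderKL's decode path applies.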

@Green-Sky (Contributor)

@FSSRepo do you know if your model uses a fine-tuned VAE?

@FSSRepo (Contributor) commented Nov 23, 2023

I don't know.

@Green-Sky (Contributor)

@FSSRepo in any case, good job :) the speed improvement and low memory usage are insane.

@FSSRepo (Contributor) commented Nov 24, 2023

> in any case, good job :) the speed improvement and low memory usage are insane.

I haven't yet checked whether there is memory fragmentation; since many operations are repeated, memory usage could be improved further.
