TY - GEN
T1 - Improved Diffusion Model for Fast Image Generation
AU - Mao, Maoyu
AU - Shen, Zhuoyi
AU - Fan, Pengfei
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/12/2
Y1 - 2024/12/2
N2 - This paper provides methods and implementations to increase the speed of image generation with little sacrifice of image quality. The study reviews the development of image generation techniques, including early convolutional neural networks, variational autoencoders (VAEs), and generative models such as generative adversarial networks (GANs). In recent years, denoising diffusion probabilistic models (DDPMs) have become a new trend in image generation. This trend is mainly attributed to the superior performance of DDPMs in generating high-quality images, as well as their higher stability and consistency when dealing with complex data distributions. Compared with traditional GANs, DDPMs offer more granular control and more predictable generation quality through a process of gradual denoising, which has led to widespread interest and adoption in both academia and industry. To reduce the time and computational cost required for DDPMs to generate images, this study uses an improved variant of DDPM, the Denoising Diffusion Implicit Model (DDIM). This model effectively speeds up image generation by optimising the diffusion process while incurring only a small loss in the quality of the generated images. First, the DDIM model was evaluated on four different datasets, namely the Swiss Roll, MNIST, CIFAR10, and CelebA datasets, in order to comprehensively assess its performance and applicability. In addition, to compare the quality of the images generated by DDIM and DDPM, the experiments use the Frechet Inception Distance (FID) as an objective evaluation metric. The final experimental results show that the DDIM model obtained an FID score of 21.67 on unconditional CIFAR10 and an FID score of 18.87 on unconditional CelebA. In comparison, the DDPM model obtained an FID score of 12.14 on unconditional CIFAR10 and an FID score of 5.25 on unconditional CelebA. As for generation speed, DDIM was 793.50% faster than DDPM when generating 50 CIFAR10 images at time step 1000. These results demonstrate the effectiveness of the DDIM model on different types of datasets, and also highlight that it trades some quality of the generated images for its advantage in generation speed compared to the DDPM model. This study demonstrates the potential and usefulness of the DDIM model for fast, high-quality image generation, and points out directions for further improvement of the model.
AB - This paper provides methods and implementations to increase the speed of image generation with little sacrifice of image quality. The study reviews the development of image generation techniques, including early convolutional neural networks, variational autoencoders (VAEs), and generative models such as generative adversarial networks (GANs). In recent years, denoising diffusion probabilistic models (DDPMs) have become a new trend in image generation. This trend is mainly attributed to the superior performance of DDPMs in generating high-quality images, as well as their higher stability and consistency when dealing with complex data distributions. Compared with traditional GANs, DDPMs offer more granular control and more predictable generation quality through a process of gradual denoising, which has led to widespread interest and adoption in both academia and industry. To reduce the time and computational cost required for DDPMs to generate images, this study uses an improved variant of DDPM, the Denoising Diffusion Implicit Model (DDIM). This model effectively speeds up image generation by optimising the diffusion process while incurring only a small loss in the quality of the generated images. First, the DDIM model was evaluated on four different datasets, namely the Swiss Roll, MNIST, CIFAR10, and CelebA datasets, in order to comprehensively assess its performance and applicability. In addition, to compare the quality of the images generated by DDIM and DDPM, the experiments use the Frechet Inception Distance (FID) as an objective evaluation metric. The final experimental results show that the DDIM model obtained an FID score of 21.67 on unconditional CIFAR10 and an FID score of 18.87 on unconditional CelebA. In comparison, the DDPM model obtained an FID score of 12.14 on unconditional CIFAR10 and an FID score of 5.25 on unconditional CelebA. As for generation speed, DDIM was 793.50% faster than DDPM when generating 50 CIFAR10 images at time step 1000. These results demonstrate the effectiveness of the DDIM model on different types of datasets, and also highlight that it trades some quality of the generated images for its advantage in generation speed compared to the DDPM model. This study demonstrates the potential and usefulness of the DDIM model for fast, high-quality image generation, and points out directions for further improvement of the model.
KW - Denoising Diffusion Implicit Models
KW - Denoising Diffusion Probabilistic Models
KW - Fast Image Generation
KW - Frechet Inception Distance
UR - http://www.scopus.com/inward/record.url?scp=85215940832&partnerID=8YFLogxK
U2 - 10.1145/3696271.3696305
DO - 10.1145/3696271.3696305
M3 - Conference Proceeding
AN - SCOPUS:85215940832
T3 - ACM International Conference Proceeding Series
SP - 209
EP - 216
BT - MLMI 2024 - Proceedings of the 2024 7th International Conference on Machine Learning and Machine Intelligence
PB - Association for Computing Machinery
T2 - 7th International Conference on Machine Learning and Machine Intelligence, MLMI 2024
Y2 - 2 August 2024 through 4 August 2024
ER -