Improved Diffusion Model for Fast Image Generation

Maoyu Mao, Zhuoyi Shen, Pengfei Fan

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

This paper presents methods and implementations for accelerating image generation with minimal sacrifice of image quality. The study first reviews the development of image generation techniques, including early convolutional neural networks, variational autoencoders (VAEs), and generative adversarial networks (GANs). In recent years, denoising diffusion probabilistic models (DDPMs) have become the new trend in image generation, mainly owing to their superior performance in generating high-quality images and their greater stability and consistency on complex data distributions. Compared with traditional GANs, DDPMs offer more granular control and more predictable generation quality through a process of gradual denoising, which has led to widespread interest and use in academia and industry. However, to reduce the time and computational cost DDPMs require to generate images, this study adopts an improved variant, the Denoising Diffusion Implicit Model (DDIM), which effectively speeds up image generation by optimising the diffusion process while incurring only a small loss in the quality of the generated images. First, the DDIM model was evaluated on four datasets, namely the Swiss Volume, MNIST, CIFAR10, and CelebA datasets, to comprehensively assess its performance and applicability. In addition, to compare the quality of images generated by DDIM and DDPM, the experiments use the Fréchet Inception Distance (FID) as an objective evaluation metric. The final experimental results show that the DDIM model obtained an FID score of 21.67 on the unconditional CIFAR10 dataset and an FID score of 18.87 on the unconditional CelebA dataset.
In comparison, the DDPM model obtained an FID score of 12.14 on the unconditional CIFAR10 dataset and an FID score of 5.25 on the unconditional CelebA dataset. As for generation speed, DDIM is 793.50% faster than DDPM when generating 50 CIFAR10 images at time step 1000. These results demonstrate the effectiveness of the DDIM model on different types of datasets, and also highlight that it trades some quality of the generated images for its advantage in generation speed relative to DDPM. This study demonstrates the potential and usefulness of the DDIM model in the field of fast, high-quality image generation, and points out directions for further improvement of the model in the future.
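The speed-up the abstract reports comes from DDIM's deterministic (non-Markovian) sampling update, which lets the sampler skip timesteps. As a rough illustration only (the function name, NumPy formulation, and variable names are assumptions, not the authors' code), one deterministic DDIM step with η = 0 can be sketched as:

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0).

    x_t: current noisy sample; eps_pred: the model's noise prediction;
    alpha_bar_t / alpha_bar_prev: cumulative noise-schedule products at
    the current and the (possibly much earlier) target timestep.
    """
    # Predict the clean sample x_0 from the noise estimate.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Move deterministically to the target timestep (no fresh noise added),
    # which is what allows large jumps across the schedule.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred
```

Because no stochastic term is injected, consecutive timesteps need not be adjacent, so a trajectory of, say, 50 steps can replace the full 1000-step DDPM chain, at the cost of some sample quality, consistent with the FID gap reported above.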
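The FID metric used for the comparisons fits a Gaussian (mean and covariance) to Inception features of real and generated images and measures the Fréchet distance between the two Gaussians. A minimal sketch of the closed-form distance, assuming NumPy and SciPy and precomputed feature statistics (this is the standard formula, not the authors' implementation):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet Inception Distance between two Gaussians:
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^(1/2))."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Lower is better: identical feature distributions give an FID of 0, which is why the smaller DDPM scores above indicate higher image quality.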

Original language: English
Title of host publication: MLMI 2024 - Proceedings of the 2024 7th International Conference on Machine Learning and Machine Intelligence
Publisher: Association for Computing Machinery
Pages: 209-216
Number of pages: 8
ISBN (Electronic): 9798400717833
DOIs
Publication status: Published - 2 Dec 2024
Event: 7th International Conference on Machine Learning and Machine Intelligence, MLMI 2024 - Osaka, Japan
Duration: 2 Aug 2024 - 4 Aug 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 7th International Conference on Machine Learning and Machine Intelligence, MLMI 2024
Country/Territory: Japan
City: Osaka
Period: 2/08/24 - 4/08/24

Keywords

  • Denoising Diffusion Implicit Models
  • Denoising Diffusion Probabilistic Models
  • Fast Image Generation
  • Frechet Inception Distance
