TY - GEN
T1 - Improved Diffusion Model for Fast Image Generation
AU - Mao, Maoyu
AU - Shen, Zhuoyi
AU - Fan, Pengfei
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/12/2
Y1 - 2024/12/2
N2 - This paper provides methods and implementations to increase the speed of image generation with little sacrifice of image quality. The study reviews the development of image generation techniques, including early convolutional neural networks, variational autoencoders (VAEs), and generative models such as generative adversarial networks (GANs). In recent years, denoising diffusion probabilistic models (DDPMs) have become a new trend in image generation. This trend is mainly attributed to the superior performance of DDPMs in generating high-quality images, as well as their higher stability and consistency when dealing with complex data distributions. Compared with traditional GANs, DDPMs offer more granular control and more predictable generation quality through a process of gradual denoising, which has led to widespread interest and adoption in both academia and industry. To reduce the time and computational cost required for DDPMs to generate images, this study uses an improved variant of DDPM, the Denoising Diffusion Implicit Model (DDIM). This model effectively speeds up image generation by optimising the diffusion process while incurring only a small loss in the quality of the generated images. First, the DDIM model was evaluated on four different datasets, namely the Swiss Roll, MNIST, CIFAR10, and CelebA datasets, in order to comprehensively assess its performance and applicability. In addition, to compare the quality of the images generated by DDIM and DDPM, the experiments use the Frechet Inception Distance (FID) as an objective evaluation metric. The final experimental results show that the DDIM model obtained an FID score of 21.67 on unconditional CIFAR10 and an FID score of 18.87 on unconditional CelebA. In comparison, the DDPM model obtained an FID score of 12.14 on unconditional CIFAR10 and an FID score of 5.25 on unconditional CelebA. As for generation speed, DDIM was 793.50% faster than DDPM when generating 50 CIFAR10 images at time step 1000. These results demonstrate the effectiveness of the DDIM model on different types of datasets, and also highlight that it trades some quality of the generated images for its advantage in generation speed compared to the DDPM model. This study demonstrates the potential and usefulness of the DDIM model for fast, high-quality image generation, and points out directions for further improvement of the model.
AB - This paper provides methods and implementations to increase the speed of image generation with little sacrifice of image quality. The study reviews the development of image generation techniques, including early convolutional neural networks, variational autoencoders (VAEs), and generative models such as generative adversarial networks (GANs). In recent years, denoising diffusion probabilistic models (DDPMs) have become a new trend in image generation. This trend is mainly attributed to the superior performance of DDPMs in generating high-quality images, as well as their higher stability and consistency when dealing with complex data distributions. Compared with traditional GANs, DDPMs offer more granular control and more predictable generation quality through a process of gradual denoising, which has led to widespread interest and adoption in both academia and industry. To reduce the time and computational cost required for DDPMs to generate images, this study uses an improved variant of DDPM, the Denoising Diffusion Implicit Model (DDIM). This model effectively speeds up image generation by optimising the diffusion process while incurring only a small loss in the quality of the generated images. First, the DDIM model was evaluated on four different datasets, namely the Swiss Roll, MNIST, CIFAR10, and CelebA datasets, in order to comprehensively assess its performance and applicability. In addition, to compare the quality of the images generated by DDIM and DDPM, the experiments use the Frechet Inception Distance (FID) as an objective evaluation metric. The final experimental results show that the DDIM model obtained an FID score of 21.67 on unconditional CIFAR10 and an FID score of 18.87 on unconditional CelebA. In comparison, the DDPM model obtained an FID score of 12.14 on unconditional CIFAR10 and an FID score of 5.25 on unconditional CelebA. As for generation speed, DDIM was 793.50% faster than DDPM when generating 50 CIFAR10 images at time step 1000. These results demonstrate the effectiveness of the DDIM model on different types of datasets, and also highlight that it trades some quality of the generated images for its advantage in generation speed compared to the DDPM model. This study demonstrates the potential and usefulness of the DDIM model for fast, high-quality image generation, and points out directions for further improvement of the model.
KW - Denoising Diffusion Implicit Models
KW - Denoising Diffusion Probabilistic Models
KW - Fast Image Generation
KW - Frechet Inception Distance
UR - http://www.scopus.com/inward/record.url?scp=85215940832&partnerID=8YFLogxK
U2 - 10.1145/3696271.3696305
DO - 10.1145/3696271.3696305
M3 - Conference Proceeding
AN - SCOPUS:85215940832
T3 - ACM International Conference Proceeding Series
SP - 209
EP - 216
BT - MLMI 2024 - Proceedings of the 2024 7th International Conference on Machine Learning and Machine Intelligence
PB - Association for Computing Machinery
T2 - 7th International Conference on Machine Learning and Machine Intelligence, MLMI 2024
Y2 - 2 August 2024 through 4 August 2024
ER -