Referring flexible image restoration

Runwei Guan; Rongsheng Hu; Zhuhao Zhou; Tianlang Xue; Ka Lok Man; Jeremy Smith; Eng Gee Lim; Weiping Ding; Yutao Yue

doi:10.1016/j.eswa.2025.126857

Corresponding author: Yutao Yue

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

In reality, images often exhibit multiple degradations, such as rain and fog at night (triple degradations). However, in many cases, individuals may not want to remove all degradations, for instance, a blurry lens revealing a beautiful snowy landscape (double degradations). In such scenarios, people may only desire to deblur. These situations and requirements shed light on a new challenge in image restoration, where a model must perceive and remove specific degradation types specified by human commands in images with multiple degradations. We term this task Referring Flexible Image Restoration (RFIR). To address this, we first construct a large-scale synthetic dataset called RFIR, comprising 153,423 samples with the degraded image, text prompt for specific degradation removal and restored image. RFIR consists of five basic degradation types: blur, rain, haze, low light and snow while six main sub-categories are included for varying degrees of degradation removal. To tackle the challenge, we propose a novel transformer-based multi-task model named TransRFIR, which simultaneously perceives degradation types in the degraded image and removes specific degradation upon text prompt. TransRFIR is based on two devised modules, Multi-Head Agent Self-Attention (MHASA) for multi-degradation context modeling and Multi-Head Agent Cross Attention (MHACA) for efficient alignment between prompt and referred degradations, where MHASA and MHACA introduce the agent token and reach the linear complexity, achieving lower computation cost than vanilla self-attention and cross-attention and obtain competitive performances. Our TransRFIR achieves state-of-the-art performances compared with other counterparts and is proven as an effective basic structure for image restoration. We release our project at https://github.com/GuanRunwei/FIR-CP.
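The abstract states that MHASA and MHACA introduce an agent token to reach linear complexity, but it does not give the formulation. The following is a minimal sketch of the general agent-attention pattern, not the authors' implementation. It assumes a single head, no learned projections, and agent tokens obtained by average-pooling the queries; the function names (`pool_agents`, `agent_attention`) are hypothetical. A small set of n agent tokens first aggregates information from the keys and values, then every query reads from the agents, so the cost grows as O(N·n·d) rather than the O(N²·d) of vanilla attention.

```python
import math
import random


def matmul(a, b):
    """Multiply an (r x k) matrix by a (k x c) matrix, both given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_scores(q, k):
    """Dot-product scores between rows of q and rows of k, scaled by 1/sqrt(d)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    return [[s * scale for s in row] for row in matmul(q, transpose(k))]


def pool_agents(q, num_agents):
    """Average-pool N query rows into num_agents agent tokens (assumed scheme)."""
    if len(q) % num_agents:
        raise ValueError("number of queries must be divisible by num_agents")
    chunk = len(q) // num_agents
    agents = []
    for i in range(num_agents):
        rows = q[i * chunk:(i + 1) * chunk]
        agents.append([sum(col) / len(rows) for col in zip(*rows)])
    return agents


def agent_attention(q, k, v, num_agents):
    """Two-step attention routed through a small set of agent tokens."""
    agents = pool_agents(q, num_agents)
    # Step 1: agents aggregate keys/values, shape (n x M) @ (M x d) -> (n x d).
    agent_values = matmul(softmax_rows(scaled_scores(agents, k)), v)
    # Step 2: queries read from the agents, shape (N x n) @ (n x d) -> (N x d).
    return matmul(softmax_rows(scaled_scores(q, agents)), agent_values)


random.seed(0)
N, d, n_agents = 16, 4, 4
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
out = agent_attention(tokens, tokens, tokens, n_agents)
print(len(out), len(out[0]))  # 16 4
```

Passing the same tokens as `q`, `k` and `v` corresponds to the self-attention case; passing image tokens as `q` and prompt tokens as `k` and `v` corresponds to the cross-attention case. Both uses are illustrative assumptions about how such modules are typically wired, not details taken from the paper.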

Original language	English
Article number	126857
Journal	Expert Systems with Applications
Volume	274
DOIs	https://doi.org/10.1016/j.eswa.2025.126857
Publication status	Published - 15 May 2025

Keywords

Cross attention
Multi-modal learning
Prompt learning
Referring flexible image restoration

Cite this

@article{e06bd602d3754e55a473dfdee91dd0d1,

title = "Referring flexible image restoration",

abstract = "In reality, images often exhibit multiple degradations, such as rain and fog at night (triple degradations). However, in many cases, individuals may not want to remove all degradations, for instance, a blurry lens revealing a beautiful snowy landscape (double degradations). In such scenarios, people may only desire to deblur. These situations and requirements shed light on a new challenge in image restoration, where a model must perceive and remove specific degradation types specified by human commands in images with multiple degradations. We term this task Referring Flexible Image Restoration (RFIR). To address this, we first construct a large-scale synthetic dataset called RFIR, comprising 153,423 samples with the degraded image, text prompt for specific degradation removal and restored image. RFIR consists of five basic degradation types: blur, rain, haze, low light and snow while six main sub-categories are included for varying degrees of degradation removal. To tackle the challenge, we propose a novel transformer-based multi-task model named TransRFIR, which simultaneously perceives degradation types in the degraded image and removes specific degradation upon text prompt. TransRFIR is based on two devised modules, Multi-Head Agent Self-Attention (MHASA) for multi-degradation context modeling and Multi-Head Agent Cross Attention (MHACA) for efficient alignment between prompt and referred degradations, where MHASA and MHACA introduce the agent token and reach the linear complexity, achieving lower computation cost than vanilla self-attention and cross-attention and obtain competitive performances. Our TransRFIR achieves state-of-the-art performances compared with other counterparts and is proven as an effective basic structure for image restoration. We release our project at https://github.com/GuanRunwei/FIR-CP.",

keywords = "Cross attention, Multi-modal learning, Prompt learning, Referring flexible image restoration",

author = "Runwei Guan and Rongsheng Hu and Zhuhao Zhou and Tianlang Xue and Man, {Ka Lok} and Jeremy Smith and Lim, {Eng Gee} and Weiping Ding and Yutao Yue",

note = "Publisher Copyright: {\textcopyright} 2025 The Authors",

year = "2025",

month = may,

day = "15",

doi = "10.1016/j.eswa.2025.126857",

language = "English",

volume = "274",

journal = "Expert Systems with Applications",

issn = "0957-4174",

publisher = "Elsevier",

}

TY - JOUR

T1 - Referring flexible image restoration

AU - Guan, Runwei

AU - Hu, Rongsheng

AU - Zhou, Zhuhao

AU - Xue, Tianlang

AU - Man, Ka Lok

AU - Smith, Jeremy

AU - Lim, Eng Gee

AU - Ding, Weiping

AU - Yue, Yutao

PY - 2025/5/15

Y1 - 2025/5/15

N2 - In reality, images often exhibit multiple degradations, such as rain and fog at night (triple degradations). However, in many cases, individuals may not want to remove all degradations, for instance, a blurry lens revealing a beautiful snowy landscape (double degradations). In such scenarios, people may only desire to deblur. These situations and requirements shed light on a new challenge in image restoration, where a model must perceive and remove specific degradation types specified by human commands in images with multiple degradations. We term this task Referring Flexible Image Restoration (RFIR). To address this, we first construct a large-scale synthetic dataset called RFIR, comprising 153,423 samples with the degraded image, text prompt for specific degradation removal and restored image. RFIR consists of five basic degradation types: blur, rain, haze, low light and snow while six main sub-categories are included for varying degrees of degradation removal. To tackle the challenge, we propose a novel transformer-based multi-task model named TransRFIR, which simultaneously perceives degradation types in the degraded image and removes specific degradation upon text prompt. TransRFIR is based on two devised modules, Multi-Head Agent Self-Attention (MHASA) for multi-degradation context modeling and Multi-Head Agent Cross Attention (MHACA) for efficient alignment between prompt and referred degradations, where MHASA and MHACA introduce the agent token and reach the linear complexity, achieving lower computation cost than vanilla self-attention and cross-attention and obtain competitive performances. Our TransRFIR achieves state-of-the-art performances compared with other counterparts and is proven as an effective basic structure for image restoration. We release our project at https://github.com/GuanRunwei/FIR-CP.

KW - Cross attention

KW - Multi-modal learning

KW - Prompt learning

KW - Referring flexible image restoration

UR - http://www.scopus.com/inward/record.url?scp=85218456974&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2025.126857

DO - 10.1016/j.eswa.2025.126857

M3 - Article

AN - SCOPUS:85218456974

SN - 0957-4174

VL - 274

JO - Expert Systems with Applications

JF - Expert Systems with Applications

M1 - 126857

ER -
