Design and Benchmarking of a Multimodality Sensor for Robotic Manipulation With GAN-Based Cross-Modality Interpretation

Dandan Zhang*, Wen Fan, Jialin Lin, Haoran Li, Qingzheng Cong, Weiru Liu, Nathan F. Lepora, Shan Luo

*Corresponding author for this work

doi:10.1109/TRO.2025.3526296

Research output: Contribution to journal › Article › peer-review

Abstract

In this article, we present the design and benchmark of an innovative sensor, ViTacTip, which fulfills the demand for advanced multimodal sensing in a compact design. A notable feature of ViTacTip is its transparent skin, which incorporates a "see-through-skin" mechanism. This mechanism aims at capturing detailed object features upon contact, significantly improving both vision-based and proximity perception capabilities. In parallel, the biomimetic tips embedded in the sensor's skin are designed to amplify contact details, thus substantially augmenting tactile and derived force perception abilities. To demonstrate the multimodal capabilities of ViTacTip, we developed a multitask learning model that enables simultaneous recognition of hardness, material, and textures. To assess the functionality and validate the versatility of ViTacTip, we conducted extensive benchmarking experiments, including object recognition, contact point detection, pose regression, and grating identification. To facilitate seamless switching between various sensing modalities, we employed a generative adversarial network (GAN)-based approach. This method enhances the applicability of the ViTacTip sensor across diverse environments by enabling cross-modality interpretation.
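The abstract states that a single multitask model recognizes hardness, material, and texture simultaneously, but this record does not describe its architecture. A common way to build such a model, assumed here purely for illustration, is a shared encoder feeding one classification head per task, trained on a weighted sum of per-task cross-entropy losses. The pure-Python toy below sketches only that general pattern; the class counts in `TASKS`, the single dense layer standing in for the encoder, and the equal task weights are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]
Head = Tuple[Matrix, Vector]  # (weights, bias) of one linear layer

# Task names follow the abstract; the class counts are hypothetical placeholders.
TASKS: Dict[str, int] = {"hardness": 3, "material": 4, "texture": 5}


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x: Vector, shared: Head, heads: Dict[str, Head]) -> Dict[str, Vector]:
    """Encode the input once with the shared layer, then apply every task head."""
    features = relu(linear(x, *shared))
    return {task: softmax(linear(features, *head)) for task, head in heads.items()}


def multitask_loss(probs: Dict[str, Vector], labels: Dict[str, int],
                   task_weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of per-task cross-entropy losses (equal weights by default)."""
    task_weights = task_weights or {task: 1.0 for task in probs}
    return sum(task_weights[t] * -math.log(probs[t][labels[t]]) for t in probs)


if __name__ == "__main__":
    feature_dim, hidden_dim = 6, 4
    # Toy stand-in for a flattened sensor image (hypothetical input).
    x = [0.1 * i for i in range(feature_dim)]
    shared = ([[0.05] * feature_dim for _ in range(hidden_dim)], [0.0] * hidden_dim)
    # All-zero heads give uniform predictions, so the loss is ln 3 + ln 4 + ln 5 = ln 60.
    heads = {t: ([[0.0] * hidden_dim for _ in range(n)], [0.0] * n) for t, n in TASKS.items()}
    probs = forward(x, shared, heads)
    loss = multitask_loss(probs, {"hardness": 0, "material": 2, "texture": 4})
    print(f"joint loss = {loss:.4f}")
```

Running the script prints `joint loss = 4.0943`: with all-zero head weights each head predicts a uniform distribution, so the joint loss is ln 3 + ln 4 + ln 5 = ln 60. In a practical system the shared encoder would typically be a convolutional network over the sensor images, and the per-task weights could be tuned rather than fixed at 1.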

Original language	English
Pages (from-to)	1278-1295
Number of pages	18
Journal	IEEE Transactions on Robotics
Volume	41
DOIs	https://doi.org/10.1109/TRO.2025.3526296
Publication status	Published - 2025
Externally published	Yes

Keywords

Cross-modality interpretation
generative adversarial network (GAN)
multimodality sensing
vision-based tactile sensor (VBTS)

Cite this

@article{13985c285b3e431da7185c084c8e1b14,

title = "Design and Benchmarking of a Multimodality Sensor for Robotic Manipulation With {GAN}-Based Cross-Modality Interpretation",

abstract = "In this article, we present the design and benchmark of an innovative sensor, ViTacTip, which fulfills the demand for advanced multimodal sensing in a compact design. A notable feature of ViTacTip is its transparent skin, which incorporates a {"}see-through-skin{"} mechanism. This mechanism aims at capturing detailed object features upon contact, significantly improving both vision-based and proximity perception capabilities. In parallel, the biomimetic tips embedded in the sensor's skin are designed to amplify contact details, thus substantially augmenting tactile and derived force perception abilities. To demonstrate the multimodal capabilities of ViTacTip, we developed a multitask learning model that enables simultaneous recognition of hardness, material, and textures. To assess the functionality and validate the versatility of ViTacTip, we conducted extensive benchmarking experiments, including object recognition, contact point detection, pose regression, and grating identification. To facilitate seamless switching between various sensing modalities, we employed a generative adversarial network (GAN)-based approach. This method enhances the applicability of the ViTacTip sensor across diverse environments by enabling cross-modality interpretation.",

keywords = "Cross-modality interpretation, generative adversarial network (GAN), multimodality sensing, vision-based tactile sensor (VBTS)",

author = "Dandan Zhang and Wen Fan and Jialin Lin and Haoran Li and Qingzheng Cong and Weiru Liu and Lepora, {Nathan F.} and Shan Luo",

note = "Publisher Copyright: {\textcopyright} 2025 IEEE.",

year = "2025",

doi = "10.1109/TRO.2025.3526296",

language = "English",

volume = "41",

pages = "1278--1295",

journal = "IEEE Transactions on Robotics",

issn = "1552-3098",

}

TY - JOUR

T1 - Design and Benchmarking of a Multimodality Sensor for Robotic Manipulation With GAN-Based Cross-Modality Interpretation

AU - Zhang, Dandan

AU - Fan, Wen

AU - Lin, Jialin

AU - Li, Haoran

AU - Cong, Qingzheng

AU - Liu, Weiru

AU - Lepora, Nathan F.

AU - Luo, Shan

PY - 2025

Y1 - 2025

N2 - In this article, we present the design and benchmark of an innovative sensor, ViTacTip, which fulfills the demand for advanced multimodal sensing in a compact design. A notable feature of ViTacTip is its transparent skin, which incorporates a "see-through-skin" mechanism. This mechanism aims at capturing detailed object features upon contact, significantly improving both vision-based and proximity perception capabilities. In parallel, the biomimetic tips embedded in the sensor's skin are designed to amplify contact details, thus substantially augmenting tactile and derived force perception abilities. To demonstrate the multimodal capabilities of ViTacTip, we developed a multitask learning model that enables simultaneous recognition of hardness, material, and textures. To assess the functionality and validate the versatility of ViTacTip, we conducted extensive benchmarking experiments, including object recognition, contact point detection, pose regression, and grating identification. To facilitate seamless switching between various sensing modalities, we employed a generative adversarial network (GAN)-based approach. This method enhances the applicability of the ViTacTip sensor across diverse environments by enabling cross-modality interpretation.

AB - In this article, we present the design and benchmark of an innovative sensor, ViTacTip, which fulfills the demand for advanced multimodal sensing in a compact design. A notable feature of ViTacTip is its transparent skin, which incorporates a "see-through-skin" mechanism. This mechanism aims at capturing detailed object features upon contact, significantly improving both vision-based and proximity perception capabilities. In parallel, the biomimetic tips embedded in the sensor's skin are designed to amplify contact details, thus substantially augmenting tactile and derived force perception abilities. To demonstrate the multimodal capabilities of ViTacTip, we developed a multitask learning model that enables simultaneous recognition of hardness, material, and textures. To assess the functionality and validate the versatility of ViTacTip, we conducted extensive benchmarking experiments, including object recognition, contact point detection, pose regression, and grating identification. To facilitate seamless switching between various sensing modalities, we employed a generative adversarial network (GAN)-based approach. This method enhances the applicability of the ViTacTip sensor across diverse environments by enabling cross-modality interpretation.

KW - Cross-modality interpretation

KW - generative adversarial network (GAN)

KW - multimodality sensing

KW - vision-based tactile sensor (VBTS)

UR - http://www.scopus.com/inward/record.url?scp=85214555401&partnerID=8YFLogxK

U2 - 10.1109/TRO.2025.3526296

DO - 10.1109/TRO.2025.3526296

M3 - Article

AN - SCOPUS:85214555401

SN - 1552-3098

VL - 41

SP - 1278

EP - 1295

JO - IEEE Transactions on Robotics

JF - IEEE Transactions on Robotics

ER -
