Abstract
Unsupervised skill discovery seeks to acquire diverse useful skills without extrinsic reward via unsupervised reinforcement learning (RL), with the discovered skills efficiently adapting to multiple downstream tasks. However, recent advanced skill discovery methods struggle to balance state exploration and skill diversity, particularly when the potential skills are rich and hard to discern. In this article, we propose contrastive dynamic skill discovery (ComSD), which generates diverse and exploratory unsupervised skills through a novel intrinsic incentive, the contrastive dynamic reward. It combines a particle-based exploration reward, which drives agents to reach far-flung states for exploratory skill acquisition, with a novel contrastive diversity reward that promotes discriminability between different skills. Moreover, a novel dynamic weighting mechanism between these two rewards is proposed to balance state exploration and skill diversity, further enhancing the quality of the discovered skills. Extensive experiments and analysis demonstrate that ComSD can generate diverse behaviors at different exploratory levels for multijoint robots, enabling state-of-the-art adaptation performance on challenging downstream tasks. It also discovers distinguishable, far-reaching exploration skills in a challenging tree-like 2-D maze.
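The abstract describes an intrinsic reward that mixes a particle-based exploration term with a contrastive diversity term under a dynamic weight. The paper itself gives the exact formulation; the sketch below is only an illustrative NumPy approximation of that structure. All names, the k-nearest-neighbor entropy estimate, the InfoNCE-style contrastive score, and the scalar weight `alpha` are assumptions for illustration, not the authors' definitions.

```python
import numpy as np

def particle_exploration_reward(states, k=3):
    # Assumed particle-based estimate: reward each state by the log distance
    # to its k-th nearest neighbor in the batch (far-flung states score high).
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    kth_nn = np.sort(dists, axis=1)[:, k]  # index 0 is the zero self-distance
    return np.log(1.0 + kth_nn)

def contrastive_diversity_reward(state_emb, skill_emb, temperature=0.5):
    # Assumed InfoNCE-style score: state i is the "positive" for skill i,
    # so rewards are high when states are discriminable by their skill.
    logits = state_emb @ skill_emb.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    idx = np.arange(len(state_emb))
    return np.log(probs[idx, idx] + 1e-8)

def comsd_intrinsic_reward(states, state_emb, skill_emb, alpha):
    # Hypothetical dynamic weight alpha in [0, 1] trading off state
    # exploration against skill diversity, per the abstract's description.
    return (alpha * particle_exploration_reward(states)
            + (1.0 - alpha) * contrastive_diversity_reward(state_emb, skill_emb))
```

In practice the weighting would be scheduled or adapted during pretraining rather than held fixed; the scalar `alpha` here only stands in for that mechanism.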
Original language | English |
---|---|
Pages (from-to) | 2234-2247 |
Number of pages | 14 |
Journal | IEEE Transactions on Cybernetics |
Volume | 55 |
Issue number | 5 |
DOIs | |
Publication status | Published - 2025 |
Externally published | Yes |
Keywords
- Contrastive learning (CL)
- deep reinforcement learning (DRL)
- exploration and exploitation
- multitask adaptation
- reinforcement learning (RL) pretraining
- skill discovery
- unsupervised RL