TY - GEN
T1 - Genre-Conditioned Long-Term 3D Dance Generation Driven by Music
AU - Huang, Yuhang
AU - Zhang, Junjie
AU - Liu, Shuyan
AU - Bao, Qian
AU - Zeng, Dan
AU - Chen, Zhineng
AU - Liu, Wu
N1 - Publisher Copyright:
© 2022 IEEE
PY - 2022
Y1 - 2022
N2 - Dancing to music is an artistic human behavior; however, enabling machines to generate dances from music remains challenging. Most existing works have made progress in tackling the problem of motion prediction conditioned on music, yet they rarely consider the importance of the musical genre. In this paper, we focus on generating long-term 3D dance from music with a specific genre. Specifically, we construct a pure transformer-based architecture to correlate motion features and music features. To utilize the genre information, we propose to embed the genre categories into the transformer decoder so that the genre information can guide every frame. Moreover, different from previous inference schemes, we introduce motion queries to output the dance sequence in parallel, which significantly improves efficiency. Extensive experiments on the AIST++ [1] dataset show that our model outperforms state-of-the-art methods with a much faster inference speed.
AB - Dancing to music is an artistic human behavior; however, enabling machines to generate dances from music remains challenging. Most existing works have made progress in tackling the problem of motion prediction conditioned on music, yet they rarely consider the importance of the musical genre. In this paper, we focus on generating long-term 3D dance from music with a specific genre. Specifically, we construct a pure transformer-based architecture to correlate motion features and music features. To utilize the genre information, we propose to embed the genre categories into the transformer decoder so that the genre information can guide every frame. Moreover, different from previous inference schemes, we introduce motion queries to output the dance sequence in parallel, which significantly improves efficiency. Extensive experiments on the AIST++ [1] dataset show that our model outperforms state-of-the-art methods with a much faster inference speed.
KW - 3D dance generation
KW - genre-conditioned
KW - modality fusion
KW - music-driven
UR - http://www.scopus.com/inward/record.url?scp=85134065235&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9747838
DO - 10.1109/ICASSP43922.2022.9747838
M3 - Conference Proceeding
AN - SCOPUS:85134065235
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4858
EP - 4862
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Y2 - 23 May 2022 through 27 May 2022
ER -