Nonparametric Learning of Two-Layer ReLU Residual Units

Zhunxuan Wang; Linyun He; Chunchuan Lyu; Shay B. Cohen

Research output: Contribution to journal › Article › peer-review

Abstract

We describe an algorithm that learns two-layer residual units using rectified linear unit (ReLU) activation: suppose the input x is from a distribution with support space R^d and the ground-truth generative model is a residual unit of this type, given by y = B^∗[(A^∗x)⁺ + x], where ground-truth network parameters A^∗ ∈ R^{d×d} represent a full-rank matrix with nonnegative entries and B^∗ ∈ R^{m×d} is full-rank with m ≥ d and for c ∈ R^d, [c⁺]_i = max{0, c_i}. We design layer-wise objectives as functionals whose analytic minimizers express the exact ground-truth network in terms of its parameters and nonlinearities. Following this objective landscape, learning residual units from finite samples can be formulated using convex optimization of a nonparametric function: for each layer, we first formulate the corresponding empirical risk minimization (ERM) as a positive semi-definite quadratic program (QP), then we show the solution space of the QP can be equivalently determined by a set of linear inequalities, which can then be efficiently solved by linear programming (LP). We further prove the strong statistical consistency of our algorithm, and demonstrate its robustness and sample efficiency through experimental results on synthetic data and a set of benchmark regression datasets.
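
As a rough illustration of the generative model stated in the abstract, the following NumPy sketch evaluates y = B^∗[(A^∗x)⁺ + x] for a single input. It covers only the forward model, not the paper's learning algorithm. The function name `residual_unit` and the example matrices `A_star` and `B_star` are hypothetical choices made for illustration; they satisfy the stated conditions (A^∗ full-rank with nonnegative entries, B^∗ full-rank with m ≥ d) but do not come from the paper.

```python
import numpy as np


def residual_unit(x, A, B):
    """Evaluate y = B [(A x)^+ + x] for one input vector x (hypothetical helper)."""
    hidden = np.maximum(A @ x, 0.0)  # element-wise ReLU: [c^+]_i = max{0, c_i}
    return B @ (hidden + x)          # add the skip connection, then apply the outer layer


# Illustrative parameters only (assumed, not from the paper): d = 2, m = 3.
A_star = np.array([[1.0, 2.0],
                   [0.0, 1.0]])      # full-rank, nonnegative entries
B_star = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])      # full column rank, m >= d

# Here A x = [-3, -2], so the ReLU zeroes it and only the skip path survives.
y_inactive = residual_unit(np.array([1.0, -2.0]), A_star, B_star)

# Here A x = [3, 1], so both hidden units are active.
y_active = residual_unit(np.array([1.0, 1.0]), A_star, B_star)
```

The two inputs were picked so that one leaves every hidden unit inactive and the other activates both, which makes the contribution of the skip connection easy to see.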

Original language	English
Journal	Transactions on Machine Learning Research
Volume	2022-November
Publication status	Published - 1 Nov 2022
Externally published	Yes

Cite this

@article{a91c3836bff54e2593891a3157bc7d05,

title = "Nonparametric Learning of Two-Layer ReLU Residual Units",

abstract = "We describe an algorithm that learns two-layer residual units using rectified linear unit (ReLU) activation: suppose the input $x$ is from a distribution with support space $\mathbb{R}^d$ and the ground-truth generative model is a residual unit of this type, given by $y = B^*[(A^* x)^+ + x]$, where ground-truth network parameters $A^* \in \mathbb{R}^{d \times d}$ represent a full-rank matrix with nonnegative entries and $B^* \in \mathbb{R}^{m \times d}$ is full-rank with $m \geq d$ and for $c \in \mathbb{R}^d$, $[c^+]_i = \max\{0, c_i\}$. We design layer-wise objectives as functionals whose analytic minimizers express the exact ground-truth network in terms of its parameters and nonlinearities. Following this objective landscape, learning residual units from finite samples can be formulated using convex optimization of a nonparametric function: for each layer, we first formulate the corresponding empirical risk minimization (ERM) as a positive semi-definite quadratic program (QP), then we show the solution space of the QP can be equivalently determined by a set of linear inequalities, which can then be efficiently solved by linear programming (LP). We further prove the strong statistical consistency of our algorithm, and demonstrate its robustness and sample efficiency through experimental results on synthetic data and a set of benchmark regression datasets.",

author = "Zhunxuan Wang and Linyun He and Chunchuan Lyu and Cohen, {Shay B.}",

year = "2022",

month = nov,

day = "1",

language = "English",

volume = "2022-November",

journal = "Transactions on Machine Learning Research",

issn = "2835-8856",

}

TY - JOUR

T1 - Nonparametric Learning of Two-Layer ReLU Residual Units

AU - Wang, Zhunxuan

AU - He, Linyun

AU - Lyu, Chunchuan

AU - Cohen, Shay B.

PY - 2022/11/1

Y1 - 2022/11/1

N2 - We describe an algorithm that learns two-layer residual units using rectified linear unit (ReLU) activation: suppose the input x is from a distribution with support space R^d and the ground-truth generative model is a residual unit of this type, given by y = B^∗[(A^∗x)⁺ + x], where ground-truth network parameters A^∗ ∈ R^{d×d} represent a full-rank matrix with nonnegative entries and B^∗ ∈ R^{m×d} is full-rank with m ≥ d and for c ∈ R^d, [c⁺]_i = max{0, c_i}. We design layer-wise objectives as functionals whose analytic minimizers express the exact ground-truth network in terms of its parameters and nonlinearities. Following this objective landscape, learning residual units from finite samples can be formulated using convex optimization of a nonparametric function: for each layer, we first formulate the corresponding empirical risk minimization (ERM) as a positive semi-definite quadratic program (QP), then we show the solution space of the QP can be equivalently determined by a set of linear inequalities, which can then be efficiently solved by linear programming (LP). We further prove the strong statistical consistency of our algorithm, and demonstrate its robustness and sample efficiency through experimental results on synthetic data and a set of benchmark regression datasets.

AB - We describe an algorithm that learns two-layer residual units using rectified linear unit (ReLU) activation: suppose the input x is from a distribution with support space R^d and the ground-truth generative model is a residual unit of this type, given by y = B^∗[(A^∗x)⁺ + x], where ground-truth network parameters A^∗ ∈ R^{d×d} represent a full-rank matrix with nonnegative entries and B^∗ ∈ R^{m×d} is full-rank with m ≥ d and for c ∈ R^d, [c⁺]_i = max{0, c_i}. We design layer-wise objectives as functionals whose analytic minimizers express the exact ground-truth network in terms of its parameters and nonlinearities. Following this objective landscape, learning residual units from finite samples can be formulated using convex optimization of a nonparametric function: for each layer, we first formulate the corresponding empirical risk minimization (ERM) as a positive semi-definite quadratic program (QP), then we show the solution space of the QP can be equivalently determined by a set of linear inequalities, which can then be efficiently solved by linear programming (LP). We further prove the strong statistical consistency of our algorithm, and demonstrate its robustness and sample efficiency through experimental results on synthetic data and a set of benchmark regression datasets.

UR - http://www.scopus.com/inward/record.url?scp=105000039419&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:105000039419

SN - 2835-8856

VL - 2022-November

JO - Transactions on Machine Learning Research

JF - Transactions on Machine Learning Research

ER -
