Contractivity of neural ODEs: an eigenvalue optimization problem
Nicola Guglielmi,
Arturo De Marinis,
Anton Savostianov,
Francesco Tudisco,
Mathematics of Computation (In press),
(2024)
Abstract
We propose a novel methodology to solve a key eigenvalue optimization problem which arises in the contractivity analysis of neural ODEs. When looking at contractivity properties of a one-layer weight-tied neural ODE $\dot{u}(t)=σ(Au(t)+b)$ (where $u,b \in {\mathbb R}^n$, $A$ is a given $n \times n$ matrix, $σ: {\mathbb R} \to {\mathbb R}^+$ denotes an activation function and, for a vector $z \in {\mathbb R}^n$, $σ(z) \in {\mathbb R}^n$ has to be interpreted entry-wise), we are led to study the logarithmic norm of a set of products of type $D A$, where $D$ is a diagonal matrix such that ${\mathrm{diag}}(D) \in σ'({\mathbb R}^n)$. Specifically, given a real number $c$ (usually $c=0$), the problem consists in finding the largest positive interval $χ\subseteq [0,\infty)$ such that the logarithmic norm $μ(DA) \le c$ for all diagonal matrices $D$ with $D_{ii}\in χ$. We propose a two-level nested methodology: an inner level where, for a given $χ$, we compute an optimizer $D^\star(χ)$ by a gradient system approach, and an outer level where we tune $χ$ so that the value $c$ is reached by $μ(D^\star(χ)A)$. We extend the proposed two-level approach to the general multilayer, and possibly time-dependent, case $\dot{u}(t) = σ( A_k(t) \ldots σ( A_{1}(t) u(t) + b_{1}(t) ) \ldots + b_{k}(t) )$ and we present several numerical examples to illustrate its behaviour, including its stabilizing performance on a one-layer neural ODE applied to the classification of the MNIST handwritten digits dataset.
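To make the two-level structure concrete, here is a minimal Python sketch under several assumptions that are not stated in the abstract: the logarithmic norm is taken in the 2-norm, $μ_2(M) = λ_{\max}((M+M^\top)/2)$; the interval has the form $χ = [m, 1]$ with only the lower end $m$ unknown; and the inner gradient system is replaced by brute-force enumeration of the corners of the box $[m,1]^n$, which is exact for $μ_2$ (a convex function of $D$ attains its maximum over a box at a vertex) but only feasible for small $n$. The function names and the test matrix are hypothetical and chosen for illustration only; this is not the authors' algorithm.

```python
import itertools
import numpy as np

def log_norm_2(M):
    """Logarithmic 2-norm: largest eigenvalue of the symmetric part of M."""
    return float(np.linalg.eigvalsh((M + M.T) / 2.0)[-1])

def worst_case_log_norm(A, m, upper=1.0):
    """Inner level (brute-force stand-in for the gradient system):
    maximum of mu_2(D A) over diagonal D with entries in [m, upper].
    mu_2(D A) is convex in D, so it suffices to check the 2^n corners."""
    n = A.shape[0]
    best = -np.inf
    for corner in itertools.product((m, upper), repeat=n):
        best = max(best, log_norm_2(np.diag(corner) @ A))
    return best

def smallest_lower_bound(A, c=0.0, upper=1.0, tol=1e-10):
    """Outer level: bisection on m so that the worst case over [m, upper]
    reaches c. Shrinking m enlarges the box, so the worst case is
    nonincreasing in m, which makes bisection applicable."""
    if log_norm_2(upper * A) > c:
        raise ValueError("mu_2(upper * A) > c: no admissible interval")
    lo, hi = 0.0, upper
    if worst_case_log_norm(A, lo, upper) <= c:
        return lo
    # Invariant: worst case at hi is <= c, worst case at lo is > c.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if worst_case_log_norm(A, mid, upper) <= c:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical 2x2 example with mu_2(A) = -1.
A = np.array([[-2.0, 1.0],
              [1.0, -2.0]])
m_star = smallest_lower_bound(A, c=0.0)
print(m_star)  # approximately 0.0718, i.e. 7 - 4*sqrt(3) for this A
```

For this particular $A$ the threshold can be checked by hand: the critical corner is $D = \mathrm{diag}(m, 1)$, and $μ_2(DA) \le 0$ holds exactly when $16m \ge (1+m)^2$, giving $m \ge 7 - 4\sqrt{3}$. The enumeration costs $2^n$ eigenvalue computations, so it only serves to illustrate the nested structure on small matrices.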
Please cite this paper as:
@article{guglielmi2024contractivity,
title={Contractivity of neural ODEs: an eigenvalue optimization problem},
author={Guglielmi, Nicola and De Marinis, Arturo and Savostianov, Anton and Tudisco, Francesco},
journal={arXiv:2402.13092},
year={2024}
}
Links:
arxiv
Keywords:
neural ode
deep learning
neural networks
adversarial attacks
nonlinear eigenvalues
eigenvalue optimization