Contractivity of neural ODEs: an eigenvalue optimization problem

Nicola Guglielmi, Arturo De Marinis, Anton Savostianov, Francesco Tudisco,
Mathematics of Computation (In press), (2024)

Abstract

We propose a novel methodology to solve a key eigenvalue optimization problem which arises in the contractivity analysis of neural ODEs. When looking at contractivity properties of a one-layer weight-tied neural ODE $\dot{u}(t)=\sigma(Au(t)+b)$ (where $u,b \in {\mathbb R}^n$, $A$ is a given $n \times n$ matrix, $\sigma: {\mathbb R} \to {\mathbb R}^+$ denotes an activation function and, for a vector $z \in {\mathbb R}^n$, $\sigma(z) \in {\mathbb R}^n$ has to be interpreted entry-wise), we are led to study the logarithmic norm of a set of products of type $DA$, where $D$ is a diagonal matrix such that ${\mathrm{diag}}(D) \in \sigma'({\mathbb R}^n)$. Specifically, given a real number $c$ (usually $c=0$), the problem consists in finding the largest positive interval $\chi \subseteq [0,\infty)$ such that the logarithmic norm $\mu(DA) \le c$ for all diagonal matrices $D$ with $D_{ii}\in \chi$. We propose a two-level nested methodology: an inner level where, for a given $\chi$, we compute an optimizer $D^\star(\chi)$ by a gradient system approach, and an outer level where we tune $\chi$ so that the value $c$ is reached by $\mu(D^\star(\chi)A)$. We extend the proposed two-level approach to the general multilayer, and possibly time-dependent, case $\dot{u}(t) = \sigma( A_k(t) \ldots \sigma( A_{1}(t) u(t) + b_{1}(t) ) \ldots + b_{k}(t) )$ and we present several numerical examples to illustrate its behaviour, including its stabilizing performance on a one-layer neural ODE applied to the classification of the MNIST handwritten digits dataset.
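
To make the two-level structure concrete, here is a minimal Python sketch on a toy $2 \times 2$ matrix. It is not the gradient system method of the paper. It rests on several assumptions that the abstract does not state: the logarithmic norm is taken in the 2-norm, so $\mu_2(M)$ is the largest eigenvalue of $(M+M^\top)/2$; the interval has the form $\chi=[m,1]$ with the upper end fixed; and the inner optimizer is replaced by brute-force enumeration of the vertices of the box $[m,1]^n$, which is exact because $\mu_2(DA)$ is convex in $\mathrm{diag}(D)$ but only feasible for small $n$. The function names and the matrix `A` are hypothetical.

```python
import itertools

import numpy as np


def log_norm_2(M):
    """Logarithmic 2-norm: largest eigenvalue of the symmetric part of M."""
    sym = 0.5 * (M + M.T)
    return float(np.linalg.eigvalsh(sym)[-1])  # eigvalsh sorts ascending


def worst_case_log_norm(A, m, upper=1.0):
    """Inner level (brute force): max of mu_2(D A) over D_ii in [m, upper].

    mu_2(D A) is convex in diag(D), so its maximum over the box is attained
    at a vertex. Enumerating all 2^n vertices stands in for the paper's
    gradient system approach and is only practical for small n.
    """
    n = A.shape[0]
    best = -np.inf
    for d in itertools.product((m, upper), repeat=n):
        best = max(best, log_norm_2(np.diag(d) @ A))
    return best


def smallest_lower_bound(A, c=0.0, upper=1.0, tol=1e-10):
    """Outer level: bisection for the smallest m with worst case <= c.

    Shrinking the box [m, upper]^n can only lower the worst case, so the
    worst case is nonincreasing in m and bisection applies.
    """
    if worst_case_log_norm(A, upper, upper) > c:
        raise ValueError("mu_2(upper * A) > c: no admissible interval")
    if worst_case_log_norm(A, 0.0, upper) <= c:
        return 0.0
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if worst_case_log_norm(A, mid, upper) <= c:
            hi = mid  # mid is admissible, try a larger interval
        else:
            lo = mid  # mid is not admissible, shrink the interval
    return hi


# Hypothetical test matrix with mu_2(A) = -0.5 < 0.
A = np.array([[-2.0, 3.0],
              [0.0, -2.0]])
m_star = smallest_lower_bound(A, c=0.0)
print(f"mu_2(A) = {log_norm_2(A):.4f}, smallest m = {m_star:.6f}")
```

For this particular `A`, the worst vertex is $D=\mathrm{diag}(1,m)$, and a hand computation gives $\mu_2(DA)\le 0$ exactly when $m \ge 9/16$, so the bisection should return a value close to $0.5625$. For realistic dimensions, vertex enumeration is infeasible.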

Please cite this paper as:

@article{guglielmi2024contractivity,
  title={Contractivity of neural ODEs: an eigenvalue optimization problem},
  author={Guglielmi, Nicola and De Marinis, Arturo and Savostianov, Anton and Tudisco, Francesco},
  journal={arXiv:2402.13092},
  year={2024}
}

Links: arxiv doi

Keywords: neural ODE, deep learning, neural networks, adversarial attacks, nonlinear eigenvalues, eigenvalue optimization