Many practical models in the engineering sciences belong to the class of generalized linear models $y = Kx + \varepsilon$, with $y \in \mathbb{R}^m$, $x \in \mathbb{R}^n$, and $K \in \mathbb{R}^{m \times n}$ a linear operator mapping $x$ to $y$. In system identification, these include OKID, where $y$ is the response of an unknown linear system, $K$ is a Toeplitz matrix constructed from an input sequence, and $x$ is the vector of Markov parameters of the unknown system. Likewise, linear stochastic estimation and the linear deconvolution problem can be cast as generalized linear models.

Of interest to us are situations where the operator $K$ itself is unknown. Given training pairs $(x_i, y_i)$, one can formulate an ordinary least-squares regression problem to identify it. Yet, for typical engineering problems, $x$ and $y$ are in general high-dimensional vectors, so we are unlikely to have sufficient data to obtain a good statistical estimate of $K$. One can however regularize the problem by assuming $K$ to be a low-rank linear operator. A good estimate can then be obtained by solving
$$\operatorname*{minimize}_{K} \; \lVert Y - KX \rVert \quad \text{subject to} \quad \operatorname{rank}(K) = r,$$
where $X$ and $Y$ are data matrices and $r$ is the rank of the desired approximation. Although non-convex, this rank-constrained problem admits a closed-form solution.

We will in particular illustrate how many of the classical modal decompositions used in the community fall into this framework, e.g. POD, DMD, or CCA, to name just a few. One key feature of this work is that it provides a tractable optimal formulation of Dynamic Mode Decomposition and clearly establishes the connection between POD and DMD modes, a long-standing question in the community. Under certain conditions, it also relates the problem of finding the best rank-$r$ DMD model to that of maximizing the mutual information between $x$ and $y$, providing a better understanding of the statistical interpretation of DMD analysis.
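To make the rank-constrained problem concrete, the sketch below shows one standard closed-form route (reduced-rank regression): project $Y$ onto the row space of $X$, truncate to the $r$ leading left singular vectors, and map back through the pseudo-inverse of $X$. This is a minimal illustration assuming the Frobenius norm; the function name `low_rank_lsq` and the toy data are ours, not from the original text.

```python
import numpy as np

def low_rank_lsq(X, Y, r):
    """Closed-form solution of  min_K ||Y - K X||_F  s.t. rank(K) <= r.

    Minimal sketch, assuming the Frobenius norm. X and Y hold one
    sample per column, i.e. the model reads Y = K X + noise.
    """
    Xp = np.linalg.pinv(X)            # Moore-Penrose pseudo-inverse X^+
    Yproj = Y @ Xp @ X                # project Y onto the row space of X
    Q, _, _ = np.linalg.svd(Yproj, full_matrices=False)
    Q = Q[:, :r]                      # r leading left singular vectors
    return Q @ (Q.T @ Y) @ Xp         # K_r = Q Q^T Y X^+


# Usage sketch: recover a random rank-2 operator from noisy pairs (x_i, y_i).
rng = np.random.default_rng(0)
K_true = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 30))
X = rng.standard_normal((30, 200))                       # columns x_i
Y = K_true @ X + 0.01 * rng.standard_normal((20, 200))   # columns y_i
K_hat = low_rank_lsq(X, Y, r=2)
print(np.linalg.norm(K_hat - K_true) / np.linalg.norm(K_true))
```

With $X$ and $Y$ chosen as time-shifted snapshot matrices ($y_i = x_{i+1}$), the same routine would return a rank-$r$ operator of the kind discussed in the DMD context above, whose eigendecomposition supplies the modes.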