(higher-order) gradient of vector-valued functions


Suppose you are given a function $g$ of a scalar argument $s$, and its first few derivatives $$\eqalign{ g^{(0)}=g(s),\,\,\,g^{(1)}=\frac{dg(s)}{ds},\,\,\,g^{(2)}=\frac{d^2g(s)}{ds^2},\,\,\ldots\cr }$$ Applying the function element-wise to a vector $(z=A^Tx)$ yields a vector of the same size. The differentials of such a vectorized function can be written using the elementwise/Hadamard product $$\eqalign{ dg^{(0)} &= g^{(1)}\circ dz= g^{(1)}\circ A^Tdx \cr dg^{(1)} &= g^{(2)}\circ dz= g^{(2)}\circ A^Tdx \cr }$$ To eliminate the Hadamard products, you can create a diagonal matrix from the vector, e.g. $G={\rm Diag}(g),$ and use a regular matrix product. For such diagonal matrices, I'll use the corresponding uppercase letter $$\eqalign{ dg^{(0)} &= G^{(1)}A^Tdx \cr dg^{(1)} &= G^{(2)}A^Tdx \cr dg^{(2)} &= G^{(3)}A^Tdx \cr }$$
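
To make the Diag trick concrete, here is a minimal numerical sanity check (not part of the original answer): the Jacobian of $x\mapsto g(A^Tx)$ should equal $G^{(1)}A^T$. It uses JAX for automatic differentiation and takes $g=\tanh$ as an arbitrary stand-in for the elementwise function, with random $A$ and $x$ of made-up sizes.

```python
# Sanity check of dg = g'(z) ∘ (Aᵀdx) = Diag(g'(z)) Aᵀ dx,
# i.e. the Jacobian of x ↦ g(Aᵀx) is G⁽¹⁾Aᵀ.  g = tanh is an arbitrary stand-in.
import jax
import jax.numpy as jnp

kA, kx = jax.random.split(jax.random.PRNGKey(0))
n, m = 4, 3
A = jax.random.normal(kA, (n, m))
x = jax.random.normal(kx, (n,))

g = jnp.tanh
z = A.T @ x

J_auto = jax.jacfwd(lambda xx: g(A.T @ xx))(x)   # autodiff Jacobian, shape (m, n)
g1 = jax.vmap(jax.grad(g))(z)                    # g⁽¹⁾(z), elementwise derivative
J_formula = jnp.diag(g1) @ A.T                   # G⁽¹⁾Aᵀ

print(jnp.allclose(J_auto, J_formula, atol=1e-5))   # True
```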

Use this to find the differential of the function $f$.
NB: Instead of $\langle v,g\rangle\,\,$ I use $\,v\!:\!g,\,$ which is easier to type. $$\eqalign{ f(x) &= v:g^{(0)} \cr df &= v:dg^{(0)} = v:G^{(1)}A^Tdx \cr &= AG^{(1)}v:dx \cr f^{(1)} &= AG^{(1)}v \cr }$$ There's the first derivative. Taking its differential leads us to the second derivative $$\eqalign{ df^{(1)} &= A\,dG^{(1)}v \cr &= A(dg^{(1)}\circ v) \cr &= A(v\circ dg^{(1)}) \cr &= AV\,dg^{(1)} \cr &= AVG^{(2)}A^T\,dx \cr f^{(2)} &= AVG^{(2)}A^T \cr }$$ That's the second derivative, now on to the third $$\eqalign{ df^{(2)} &= AV\,dG^{(2)}\,A^T \cr &= AV\,{\rm Diag}\big(dg^{(2)}\big)\,A^T \cr &= AV\,{\rm Diag}\big(G^{(3)}A^Tdx\big)\,A^T \cr &= AV\,{\mathbb E}\,A:{\rm Diag}\big(G^{(3)}A^Tdx\big) \cr &= AV\,{\mathbb E}\,A:{\mathbb H}\,G^{(3)}A^T\,dx \cr f^{(3)} &= AV\,{\mathbb E}\,A:{\mathbb H}\,G^{(3)}A^T \cr }$$ where ${\mathbb E}$ is a 4th order tensor whose components can be written in terms of Kronecker deltas $${\mathbb E}_{ijkl}=\delta_{ik}\delta_{jl}$$ ${\mathbb H}$ is a 3rd order tensor whose components ${\mathbb H}_{ijk}=1$ when all indices are equal, but are zero otherwise.
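
As a sanity check on the first two results, the following sketch compares $f^{(1)}=AG^{(1)}v$ and $f^{(2)}=AVG^{(2)}A^T$ against JAX's autodiff. Again $g=\tanh$ and the random $A,x,v$ are arbitrary stand-ins, not anything fixed by the question.

```python
# Check f⁽¹⁾ = AG⁽¹⁾v and f⁽²⁾ = AVG⁽²⁾Aᵀ against JAX autodiff (g = tanh stand-in).
import jax
import jax.numpy as jnp

kA, kx, kv = jax.random.split(jax.random.PRNGKey(1), 3)
n, m = 4, 3
A = jax.random.normal(kA, (n, m))
x = jax.random.normal(kx, (n,))
v = jax.random.normal(kv, (m,))

g = jnp.tanh
f = lambda xx: v @ g(A.T @ xx)              # f(x) = v : g(Aᵀx)

z = A.T @ x
g1 = jax.vmap(jax.grad(g))(z)               # g⁽¹⁾(z)
g2 = jax.vmap(jax.grad(jax.grad(g)))(z)     # g⁽²⁾(z)

f1 = A @ (g1 * v)                           # AG⁽¹⁾v, written with a Hadamard product
f2 = A @ jnp.diag(v * g2) @ A.T             # AVG⁽²⁾Aᵀ

print(jnp.allclose(jax.grad(f)(x), f1, atol=1e-5))      # True
print(jnp.allclose(jax.hessian(f)(x), f2, atol=1e-5))   # True
```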

You can write the 3rd derivative of $f$ in index notation as $$\eqalign{ f^{(3)}_{ils} &= A_{ij}V_{jk}\,{\mathbb E}_{klmn}\,A_{np}{\mathbb H}_{mpq}\,G^{(3)}_{qr}A^T_{rs} \cr }$$ If you're not familiar with the summation convention used in index notation, a repeated index implies a summation over that index. For example $$C_{ip} = A_{ijk}B_{jkp} \equiv \sum_j\sum_kA_{ijk}B_{jkp}$$
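
The index expression can also be evaluated directly with an einsum, building ${\mathbb E}$ and ${\mathbb H}$ from Kronecker deltas, and compared against a third-order automatic derivative. The sketch below is only a numerical check under the same arbitrary assumptions as above ($g=\tanh$, random data).

```python
# Evaluate f⁽³⁾_ils = A_ij V_jk E_klmn A_np H_mpq G⁽³⁾_qr Aᵀ_rs with an einsum,
# building E and H from Kronecker deltas, and compare with third-order autodiff.
import jax
import jax.numpy as jnp

kA, kx, kv = jax.random.split(jax.random.PRNGKey(2), 3)
n, m = 4, 3
A = jax.random.normal(kA, (n, m))
x = jax.random.normal(kx, (n,))
v = jax.random.normal(kv, (m,))

g = jnp.tanh
f = lambda xx: v @ g(A.T @ xx)

z = A.T @ x
g3 = jax.vmap(jax.grad(jax.grad(jax.grad(g))))(z)   # g⁽³⁾(z)
V, G3 = jnp.diag(v), jnp.diag(g3)

E = jnp.einsum('km,ln->klmn', jnp.eye(m), jnp.eye(n))   # E_klmn = δ_km δ_ln
H = jnp.einsum('mp,pq->mpq', jnp.eye(m), jnp.eye(m))    # H_mpq = 1 iff m = p = q

f3 = jnp.einsum('ij,jk,klmn,np,mpq,qr,rs->ils',
                A, V, E, A, H, G3, A.T)

f3_auto = jax.jacfwd(jax.hessian(f))(x)                 # shape (n, n, n)
print(jnp.allclose(f3, f3_auto, atol=1e-4))             # True
```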



Comments

  • user3536409 over 3 years

    I need to find an expression for the third-order gradient $\nabla^{(3)}_{x}f(\mathbf{x})$ (with respect to the input vector $\mathbf{x} \in \mathbb{R}^{n}$) of the following function:

    $f(\mathbf{x}) = \langle\mathbf{v},\,g(A^{T}\mathbf{x})\rangle$

    where $f(\mathbf{x}): \mathbb{R}^{n} \rightarrow \mathbb{R}$, $\mathbf{x} \in \mathbb{R}^{n}$, $\mathbf{v} \in \mathbb{R}^{m}$, $A^{T} \in \mathbb{R}^{m \times n}$, and $g(\mathbf{y}):\mathbb{R}^{m} \rightarrow \mathbb{R}^{m}$. It is assumed that $g$ is as smooth as necessary to calculate the gradients. Also, $g$ is an elementwise operator (for example, a sigmoid transfer function).

    I am not experienced in calculating higher-order gradients, and working out the steps needed to derive the answer is very confusing. I have tried expanding the inner product and the matrix multiplication to eliminate all the vector/matrix products. This yields an expression whose partial derivatives I can compute in the familiar way (i.e., treating every variable other than the one being differentiated as a constant). However, this method seems very cumbersome, and matrix calculus (https://en.wikipedia.org/wiki/Matrix_calculus) seems to be a more efficient way of solving the problem.

    So I am wondering whether anyone knows how to compute the given third-order gradient using matrix calculus and could possibly elaborate on the steps involved? Hopefully this will help me understand matrix calculus better.