Finding a derivative of a dot product between matrices

I'm going to use the name $x$ for $w_0$, $y$ for $w_1$, $P$ for $L_0$ and $Q$ for $L_1$. Then the $ij$ entry of $Q$ is
\begin{align} q_{ij} &= \sum_s x_{is} (y \cdot P)_{sj}\\ &= \sum_s x_{is} \sum_t y_{st} p_{tj}\\ &= \sum_{s,t} x_{is} y_{st} p_{tj} \end{align}
The derivative of this with respect to $x_{ab}$ involves derivatives of $y_{st}$ with respect to $x_{ab}$ (all zeroes!), derivatives of $p_{tj}$ with respect to $x_{ab}$ (all zeroes!), and derivatives of $x_{is}$ with respect to $x_{ab}$, which are all zeroes unless $i = a$ and $s = b$, in which case the derivative is $1$. So only the $s = b$ terms of the sum survive, and only when $i = a$. We get
\begin{align} \frac{\partial q_{ij} }{\partial x_{ab}} &= \frac{\partial}{\partial x_{ab}} \left( \sum_{s,t} x_{is} y_{st} p_{tj}\right)\\ &= \begin{cases} 0 & i \ne a \\ \sum_{t}y_{bt}p_{tj} & i = a \end{cases} \end{align}
In other words, $\frac{\partial q_{ij}}{\partial x_{ab}} = \delta_{ia}\,(y \cdot P)_{bj}$, where $\delta_{ia}$ is $1$ if $i = a$ and $0$ otherwise.

If you apply this formula to every value $i,j$ that constitutes a legal index of $L_1$, and every $a,b$ that constitutes a legal index of $w_0$, you'll get your answer.
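
As a sanity check, the formula can be compared against a finite-difference approximation. The following is a minimal sketch in plain Python. The shapes ($x$ is $2 \times 3$, $y$ is $3 \times 4$, $P$ is $4 \times 2$) and the helper names `matmul`, `dq_dx` and `rand` are assumptions chosen for illustration; they are not taken from the question.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][s] * B[s][j] for s in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dq_dx(y, P, i, j, a, b):
    """Closed form: d q_ij / d x_ab is 0 if i != a, else sum_t y_bt * p_tj."""
    if i != a:
        return 0.0
    return sum(y[b][t] * P[t][j] for t in range(len(P)))

def rand(rows, cols):
    """Random matrix with entries in [-1, 1]."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
x, y, P = rand(2, 3), rand(3, 4), rand(4, 2)  # assumed shapes
Q = matmul(x, matmul(y, P))

h = 1e-6
max_err = 0.0
for a in range(2):
    for b in range(3):
        # Perturb the single entry x_ab and recompute Q.
        x_pert = [row[:] for row in x]
        x_pert[a][b] += h
        Q_pert = matmul(x_pert, matmul(y, P))
        for i in range(2):
            for j in range(2):
                numeric = (Q_pert[i][j] - Q[i][j]) / h
                exact = dq_dx(y, P, i, j, a, b)
                max_err = max(max_err, abs(numeric - exact))

print(max_err)  # largest gap between the formula and the finite difference
```

Because $Q$ is linear in $x$, the finite difference should agree with the formula up to floating-point rounding. Note that the full derivative here has one entry for every combination of $i, j, a, b$, so it is a four-index array rather than a single matrix.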

By the way, your statement that "The right derivative with respect to some matrix must yield a matrix with the same shape as that varying matrix." doesn't seem entirely correct to me, at least not without some very clear description of what you think a derivative is. If you have a function from $\Bbb R$ to $M(n, k)$ (the set of $n \times k$ matrices), say $t \mapsto A(t)$, then the derivative with respect to the single variable $t$ is a matrix with the same shape as $A(t)$. If you have a function $(x, y) \mapsto B(x, y)$ from two variables to $M(n, k)$, then the derivative with respect to each variable is an $n \times k$ matrix, so the derivative with respect to both $x$ and $y$ should have at least two times $nk$ components.
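
To make the component count concrete, here is a small sketch in plain Python. The map `B` and the helper `partial` are hypothetical, invented only for this illustration, with $n = 2$ and $k = 3$.

```python
def B(x, y, n=2, k=3):
    """Hypothetical map from two scalars to an n-by-k matrix."""
    return [[x * (i + 1) + y * y * (j + 1) for j in range(k)] for i in range(n)]

def partial(f, args, which, h=1e-6):
    """Central-difference partial derivative of a matrix-valued f, entry by entry."""
    lo, hi = list(args), list(args)
    lo[which] -= h
    hi[which] += h
    F_lo, F_hi = f(*lo), f(*hi)
    return [[(p - m) / (2 * h) for p, m in zip(row_hi, row_lo)]
            for row_hi, row_lo in zip(F_hi, F_lo)]

dB_dx = partial(B, (0.5, 2.0), 0)  # one n-by-k matrix
dB_dy = partial(B, (0.5, 2.0), 1)  # another n-by-k matrix
full = [dB_dx, dB_dy]              # the two partials stacked together

count = sum(len(row) for M in full for row in M)
print(count)  # number of entries in the stacked derivative: 2 * n * k
```

Each partial derivative has the shape of the output, $n \times k$, and stacking one per input variable gives $2nk$ numbers. The same bookkeeping applies when the input is a matrix: there is one output-shaped block per input entry.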

Author: dsillman2000

Current Purdue undergrad majoring in Mathematics and Data Science. Expected to graduate Spring '22.

Updated on August 20, 2020

Comments

  • dsillman2000, about 3 years

    I'm trying to work out a machine learning program that minimizes error by taking the derivative of an error function and changing matrices that represent parameters to minimize that error.

    I have four matrices, $\mathbf{L_2}\in\mathbb{R}^{5\times 1}, \mathbf{L_0}\in\mathbb{R}^{3\times 1}, \mathbf{w_0}\in\mathbb{R}^{5\times 3}, \mathbf{w_1}\in\mathbb{R}^{4\times 5}$. The latter two matrices are parameter matrices, and $\mathbf{L_0}$ is the input matrix. I have an expression that describes the output of the algorithm (which is a matrix), $\mathbf{L_2} = \mathbf{w_1}\cdot\left(\mathbf{w_0\cdot L_0}\right)$. Treating the matrices symbolically, I was able to find rather easily: $$\frac{\partial \mathbf{L_2}}{\partial \mathbf{w_1}} = \left(\mathbf{w_0\cdot L_0}\right)\in\mathbb{R}^{4\times 5}$$ The output of my program for this calculation substantiates that this is the correct derivative. This is also confirmed by the shape of the matrix. The right derivative with respect to some matrix must yield a matrix with the same shape as that varying matrix.

    However, I ran into issues calculating $\frac{\partial \mathbf{L_2}}{\partial \mathbf{w_0}}$ because, symbolically, the derivative looks like it should come out to be:

    $$\frac{\partial \mathbf{L_2}}{\partial \mathbf{w_0}} = \mathbf{w_1\cdot L_0}$$ but that doesn't work at all, because the shapes of those matrices are incompatible. Trying to work it out by hand was difficult due to my limited knowledge of multivariable linear algebra calculus, so I was hoping I could find some help here.

  • dsillman2000, about 6 years
    Thanks for your answer, but you found what I already knew (the derivative with respect to $w_1$). I want to find the derivative with respect to $w_0$.
  • John Hughes, about 6 years
    Edited to reflect this request. Sorry about the misreading.