Matrix Derivatives I: Low-Order Traces

Definition

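Here I will explain how to compute some particular matrix derivatives using tensor (index) notation, in which an index repeated within a term is summed over (the Einstein convention). The derivative of a scalar $f$ with respect to a matrix $X$ is the matrix whose $(k,l)$ entry is $\partial f/\partial X_{kl}$. First, I state a simple and quite intuitive rule for the derivative of a matrix component with respect to an arbitrary matrix component: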

$$ \frac{\partial X_{ij}}{\partial X_{kl}}=\delta_{ik} \delta_{jl} \tag{1} $$

We begin with simple expressions: the trace of the matrix with respect to which we differentiate, and traces of products of that variable matrix with constant matrices. Rule (1), together with the product rule, drives all of the manipulations below.
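The rule $\partial X_{ij}/\partial X_{kl} = \delta_{ik}\delta_{jl}$ says that each component of $X$ responds only to a perturbation of itself. The minimal pure-Python sketch below makes this concrete; it is an illustration, not part of the derivation, and it assumes a matrix stored as a list of rows. The helper names `kronecker` and `component_derivative` are hypothetical. It perturbs $X_{kl}$, measures the change in $X_{ij}$ for every index combination, and reports the largest deviation from $\delta_{ik}\delta_{jl}$.

```python
import random

def kronecker(a, b):
    """delta_ab: 1 if the two indices agree, 0 otherwise."""
    return 1.0 if a == b else 0.0

def component_derivative(X, i, j, k, l, h=1e-5):
    """Central-difference estimate of dX_ij / dX_kl (hypothetical helper)."""
    X_plus = [row[:] for row in X]
    X_minus = [row[:] for row in X]
    X_plus[k][l] += h
    X_minus[k][l] -= h
    return (X_plus[i][j] - X_minus[i][j]) / (2 * h)

rng = random.Random(0)
n = 3
X = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

# Largest deviation from delta_ik * delta_jl over all n^4 index combinations.
worst = max(
    abs(component_derivative(X, i, j, k, l) - kronecker(i, k) * kronecker(j, l))
    for i in range(n) for j in range(n) for k in range(n) for l in range(n)
)
print(worst)  # near zero: only the (k, l) = (i, j) component responds
```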

First-Order Trace Derivatives

Trace of $X$

$$ \frac{\partial}{\partial X} \textnormal{Tr}(X)=I $$

To solve this first problem, notice that the trace of $X$ is written in tensor notation as $X_{ii}$, that is, with the same first and second index, summed over $i$:

$$ \frac{\partial}{\partial X} \textnormal{Tr}(X) = \frac{\partial X_{ii}}{\partial X_{kl}} = \delta_{ik} \delta_{il}=\delta_{kl}=I $$

where the last equality holds because the derivative with respect to $X$ is the matrix whose $(k,l)$ entry is the derivative with respect to $X_{kl}$, and $\delta_{kl}$ is precisely the $(k,l)$ entry of the identity matrix.
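This result is easy to sanity-check numerically. The following minimal pure-Python sketch is an illustration, not part of the derivation: it assumes matrices stored as lists of rows, and its helper names (`trace`, `numerical_gradient`, `max_abs_diff`) are hypothetical. It estimates every $\partial \textnormal{Tr}(X)/\partial X_{kl}$ with a central difference and compares the resulting matrix with $I$.

```python
import random

def trace(M):
    """M_ii with the repeated index i summed."""
    return sum(M[i][i] for i in range(len(M)))

def numerical_gradient(f, X, h=1e-5):
    """Central-difference estimate of G with G[k][l] = df/dX_kl."""
    G = [[0.0] * len(X[0]) for _ in X]
    for k in range(len(X)):
        for l in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[k][l] += h
            X_minus[k][l] -= h
            G[k][l] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G

def max_abs_diff(P, Q):
    """Largest entrywise difference between two matrices of the same shape."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

rng = random.Random(0)
n = 4
X = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
identity = [[1.0 if k == l else 0.0 for l in range(n)] for k in range(n)]

G = numerical_gradient(trace, X)
print(max_abs_diff(G, identity))  # near zero, up to rounding error
```

Pure Python keeps the sketch dependency-free; with a linear-algebra library the helpers would shrink to one-liners.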

Traces of $XA$, $X^T A$, $AXB$ and $AX^T B$

$$ \frac{\partial}{\partial X} \textnormal{Tr}(XA)=A^T $$

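For this case we use the fact that, in index notation, a product of matrices contracts adjacent indices (the second index of one factor with the first index of the next), so $\textnormal{Tr}(XA) = X_{ij}A_{ji}$: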

$$
\begin{aligned}
\frac{\partial}{\partial X} \textnormal{Tr}(XA) & = \frac{\partial (X_{ij} A_{ji})}{\partial X_{kl}} =\frac{\partial X_{ij}}{\partial X_{kl}} A_{ji} \\
& = \delta_{ik} \delta_{jl} A_{ji} =A_{lk} = (A^T)_{kl}=A^T
\end{aligned}
$$

The same problem with the transpose is tackled in the same fashion, using $(X^T)_{ij} = X_{ji}$:

$$
\begin{aligned}
\frac{\partial}{\partial X} \textnormal{Tr}(X^T A) & = \frac{\partial ((X^T)_{ij} A_{ji})}{\partial X_{kl}} =\frac{\partial X_{ji}}{\partial X_{kl}} A_{ji} \\
& = \delta_{jk} \delta_{il} A_{ji} = A_{kl} = A
\end{aligned}
$$
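A finite-difference check shows where the transpose goes in these two results: for a random, non-symmetric $A$, the numerical gradient of $\textnormal{Tr}(XA)$ should match $A^T$, while that of $\textnormal{Tr}(X^T A)$ should match $A$ itself. This is a minimal pure-Python sketch, not part of the derivation, assuming matrices stored as lists of rows; `matmul`, `transpose`, `trace`, `numerical_gradient` and `max_abs_diff` are hypothetical helper names defined in the block.

```python
import random

def matmul(P, Q):
    """(PQ)_ik = P_ij Q_jk, with the repeated index j summed."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    """(P^T)_ij = P_ji."""
    return [list(col) for col in zip(*P)]

def trace(M):
    """M_ii with the repeated index i summed."""
    return sum(M[i][i] for i in range(len(M)))

def numerical_gradient(f, X, h=1e-5):
    """Central-difference estimate of G with G[k][l] = df/dX_kl."""
    G = [[0.0] * len(X[0]) for _ in X]
    for k in range(len(X)):
        for l in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[k][l] += h
            X_minus[k][l] -= h
            G[k][l] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G

def max_abs_diff(P, Q):
    """Largest entrywise difference between two matrices of the same shape."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

rng = random.Random(1)
X = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
A = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]

# f(X) = Tr(XA): the gradient should be A^T.
G_xa = numerical_gradient(lambda M: trace(matmul(M, A)), X)
# f(X) = Tr(X^T A): the gradient should be A itself.
G_xta = numerical_gradient(lambda M: trace(matmul(transpose(M), A)), X)

print(max_abs_diff(G_xa, transpose(A)))  # near zero
print(max_abs_diff(G_xta, A))            # near zero
print(max_abs_diff(G_xa, A))             # not small for a generic, non-symmetric A
```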

As for the product of $X$ with two constant matrices, there are two index contractions in tensor notation:

$$
\begin{aligned}
\frac{\partial}{\partial X} \textnormal{Tr}(AXB) & =\frac{\partial (A_{ij} X_{jk} B_{ki})}{\partial X_{pq}}= \frac{\partial X_{jk}}{\partial X_{pq}} A_{ij} B_{ki} = \delta_{jp} \delta_{kq} A_{ij}B_{ki} \\
& = A_{ip} B_{qi} = B_{qi} A_{ip} = (BA)_{qp}=((BA)^T)_{pq} = (BA)^T \\
& = A^T B^T
\end{aligned}
$$

and, swapping $X$ for $X^T$,

$$
\begin{aligned}
\frac{\partial}{\partial X}\textnormal{Tr}(AX^T B) & =\frac{\partial (A_{ij}(X^T)_{jk}B_{ki})}{\partial X_{pq}} = \frac{\partial X_{kj}}{\partial X_{pq}} A_{ij} B_{ki} = \delta_{kp}\delta_{jq}A_{ij}B_{ki} \\
& = A_{iq}B_{pi}=B_{pi}A_{iq} = (BA)_{pq}=BA
\end{aligned}
$$
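A finite-difference sketch can confirm both results, including the fact that $A^T B^T$ and $(BA)^T$ are the same matrix. This minimal pure-Python illustration is not part of the derivation; it assumes matrices stored as lists of rows, and all helper names in it (`matmul`, `transpose`, `trace`, `numerical_gradient`, `max_abs_diff`, `random_matrix`) are hypothetical. The gradient of $\textnormal{Tr}(AXB)$ should match $A^T B^T$, and that of $\textnormal{Tr}(AX^T B)$ should match $BA$.

```python
import random

def matmul(P, Q):
    """(PQ)_ik = P_ij Q_jk, with the repeated index j summed."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    """(P^T)_ij = P_ji."""
    return [list(col) for col in zip(*P)]

def trace(M):
    """M_ii with the repeated index i summed."""
    return sum(M[i][i] for i in range(len(M)))

def numerical_gradient(f, X, h=1e-5):
    """Central-difference estimate of G with G[k][l] = df/dX_kl."""
    G = [[0.0] * len(X[0]) for _ in X]
    for k in range(len(X)):
        for l in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[k][l] += h
            X_minus[k][l] -= h
            G[k][l] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G

def max_abs_diff(P, Q):
    """Largest entrywise difference between two matrices of the same shape."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def random_matrix(n, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

rng = random.Random(3)
X, A, B = (random_matrix(3, rng) for _ in range(3))

# f(X) = Tr(AXB): expected gradient A^T B^T, which equals (BA)^T.
G_axb = numerical_gradient(lambda M: trace(matmul(matmul(A, M), B)), X)
# f(X) = Tr(A X^T B): expected gradient BA.
G_axtb = numerical_gradient(lambda M: trace(matmul(matmul(A, transpose(M)), B)), X)

print(max_abs_diff(G_axb, matmul(transpose(A), transpose(B))))  # near zero
print(max_abs_diff(G_axb, transpose(matmul(B, A))))              # near zero
print(max_abs_diff(G_axtb, matmul(B, A)))                        # near zero
```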

Second-Order Trace Derivatives

Traces of $X^2$ and $X^2 B$

$$
\begin{aligned}
\frac{\partial}{\partial X} \textnormal{Tr}(X^2) & =\frac{\partial (X_{ij}X_{ji})}{\partial X_{pq}} =\frac{\partial X_{ij}}{\partial X_{pq}} X_{ji}+X_{ij}\frac{\partial X_{ji}}{\partial X_{pq}} \\
& = \delta_{ip}\delta_{jq} X_{ji} +X_{ij} \delta_{jp}\delta_{iq} \\
& =X_{qp}+X_{qp}=2(X^T)_{pq} =2X^T
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial}{\partial X} \textnormal{Tr}(X^2 B) & = \frac{\partial (X_{ij}X_{jk}B_{ki})}{\partial X_{pq}} \\
& = \frac{\partial X_{ij}}{\partial X_{pq}} X_{jk}B_{ki} + X_{ij}\frac{\partial X_{jk}}{\partial X_{pq}}B_{ki} \\
& = \delta_{ip} \delta_{jq} X_{jk} B_{ki} + X_{ij} \delta_{jp} \delta_{kq} B_{ki} = X_{qk}B_{kp}+X_{ip}B_{qi}= (XB)_{qp} + B_{qi}X_{ip} \\
&=(XB)_{qp}+(BX)_{qp}=((XB+BX)^T)_{pq}=(XB+BX)^T
\end{aligned}
$$
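Because $\textnormal{Tr}(X^2)$ and $\textnormal{Tr}(X^2 B)$ are quadratic in $X$, a central difference reproduces their gradients up to rounding error, so a numerical check is cheap. The pure-Python sketch below is an illustration rather than part of the derivation; it assumes matrices stored as lists of rows, and its helpers (`matmul`, `transpose`, `trace`, `add`, `scale`, `numerical_gradient`, `max_abs_diff`) are hypothetical names. It compares the numerical gradients with $2X^T$ and $(XB+BX)^T$.

```python
import random

def matmul(P, Q):
    """(PQ)_ik = P_ij Q_jk, with the repeated index j summed."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    """(P^T)_ij = P_ji."""
    return [list(col) for col in zip(*P)]

def trace(M):
    """M_ii with the repeated index i summed."""
    return sum(M[i][i] for i in range(len(M)))

def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def scale(c, P):
    return [[c * p for p in row] for row in P]

def numerical_gradient(f, X, h=1e-5):
    """Central-difference estimate of G with G[k][l] = df/dX_kl."""
    G = [[0.0] * len(X[0]) for _ in X]
    for k in range(len(X)):
        for l in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[k][l] += h
            X_minus[k][l] -= h
            G[k][l] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G

def max_abs_diff(P, Q):
    """Largest entrywise difference between two matrices of the same shape."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

rng = random.Random(5)
X = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
B = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]

# d/dX Tr(X^2) = 2 X^T
G_sq = numerical_gradient(lambda M: trace(matmul(M, M)), X)
err_sq = max_abs_diff(G_sq, scale(2.0, transpose(X)))

# d/dX Tr(X^2 B) = (XB + BX)^T
G_sq_b = numerical_gradient(lambda M: trace(matmul(matmul(M, M), B)), X)
err_sq_b = max_abs_diff(G_sq_b, transpose(add(matmul(X, B), matmul(B, X))))

print(err_sq, err_sq_b)  # both near zero
```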

Traces of $X^T B X$ and $X B X^T$

$$
\begin{aligned}
\frac{\partial}{\partial X} \textnormal{Tr}(X^T BX) & = \frac{\partial (X_{ji}B_{jk}X_{ki})}{\partial X_{pq}}\\
& = \frac{\partial X_{ji}}{\partial X_{pq}} B_{jk}X_{ki} +X_{ji}B_{jk} \frac{\partial X_{ki}}{\partial X_{pq}} \\
& = \delta_{jp}\delta_{iq} B_{jk}X_{ki}+X_{ji}B_{jk}\delta_{kp}\delta_{iq} = B_{pk}X_{kq}+X_{jq}B_{jp} \\
& = (BX)_{pq}+ (B^T)_{pj} X_{jq} = (BX)_{pq}+(B^T X)_{pq}\\
&=(BX+B^T X)_{pq}=BX+B^T X
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial}{\partial X} \textnormal{Tr}(XBX^T) &= \frac{\partial (X_{ij}B_{jk}X_{ik})}{\partial X_{pq}} \\
&=\frac{\partial X_{ij}}{\partial X_{pq}} B_{jk}X_{ik}+ X_{ij}B_{jk} \frac{\partial X_{ik}}{\partial X_{pq}} \\
&=\delta_{ip}\delta_{jq}B_{jk}X_{ik}+X_{ij}B_{jk}\delta_{ip}\delta_{kq} =B_{qk} X_{pk}+X_{pj} B_{jq} \\
&=X_{pk} (B^T)_{kq}+(XB)_{pq}=(XB^T)_{pq}+(XB)_{pq}=(XB^T+XB)_{pq} \\
&=XB^T+XB
\end{aligned}
$$
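Both results can be confirmed with a central-difference gradient, which is exact up to rounding for functions quadratic in $X$ such as these. What follows is a minimal pure-Python sketch, not part of the derivation, assuming matrices stored as lists of rows and using hypothetical helper names (`matmul`, `transpose`, `trace`, `add`, `numerical_gradient`, `max_abs_diff`). It compares the numerical gradients of $\textnormal{Tr}(X^T B X)$ and $\textnormal{Tr}(X B X^T)$ with $BX + B^T X$ and $XB^T + XB$; a random, non-symmetric $B$ is used so that $B$ and $B^T$ are not interchangeable.

```python
import random

def matmul(P, Q):
    """(PQ)_ik = P_ij Q_jk, with the repeated index j summed."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    """(P^T)_ij = P_ji."""
    return [list(col) for col in zip(*P)]

def trace(M):
    """M_ii with the repeated index i summed."""
    return sum(M[i][i] for i in range(len(M)))

def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def numerical_gradient(f, X, h=1e-5):
    """Central-difference estimate of G with G[k][l] = df/dX_kl."""
    G = [[0.0] * len(X[0]) for _ in X]
    for k in range(len(X)):
        for l in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[k][l] += h
            X_minus[k][l] -= h
            G[k][l] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G

def max_abs_diff(P, Q):
    """Largest entrywise difference between two matrices of the same shape."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

rng = random.Random(6)
X = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
B = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]

# d/dX Tr(X^T B X) = BX + B^T X
G_xtbx = numerical_gradient(lambda M: trace(matmul(matmul(transpose(M), B), M)), X)
err_xtbx = max_abs_diff(G_xtbx, add(matmul(B, X), matmul(transpose(B), X)))

# d/dX Tr(X B X^T) = X B^T + X B
G_xbxt = numerical_gradient(lambda M: trace(matmul(matmul(M, B), transpose(M))), X)
err_xbxt = max_abs_diff(G_xbxt, add(matmul(X, transpose(B)), matmul(X, B)))

print(err_xtbx, err_xbxt)  # both near zero
```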

Traces of $AXBX$, $A^T XCAX$, $X^T BXC$ and $AXBX^T C$

$$
\begin{aligned}
\frac{\partial}{\partial X} \textnormal{Tr}(AXBX) &= \frac{\partial (A_{ij}X_{jk}B_{kl}X_{li})}{\partial X_{pq}} \\
& = A_{ij} \frac{\partial X_{jk}}{\partial X_{pq}} B_{kl}X_{li}+ A_{ij}X_{jk}B_{kl}\frac{\partial X_{li}}{\partial X_{pq}} \\
&=A_{ij}\delta_{jp}\delta_{kq} B_{kl}X_{li}+ A_{ij}X_{jk}B_{kl}\delta_{lp}\delta_{iq} =A_{ip}B_{ql}X_{li}+A_{qj}X_{jk}B_{kp} \\
&=(A^T)_{pi} (X^T)_{il} (B^T)_{lq} + (AXB)_{qp}=(A^T X^T B^T)_{pq}+((AXB)^T)_{pq} \\
&=(A^T X^T B^T+(AXB)^T)_{pq}=A^T X^T B^T+B^T X^T A^T
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial}{\partial X} \textnormal{Tr}(A^T XCAX) &= \frac{\partial(A_{ji}X_{jk}C_{kl}A_{lm}X_{mi})}{\partial X_{pq}} \\
&=A_{ji} \frac{\partial X_{jk}}{\partial X_{pq}} C_{kl}A_{lm}X_{mi}+ A_{ji}X_{jk}C_{kl}A_{lm}\frac{\partial X_{mi}}{\partial X_{pq}} \\
&=A_{ji}\delta_{jp}\delta_{kq}C_{kl}A_{lm}X_{mi}+ A_{ji}X_{jk}C_{kl}A_{lm}\delta_{mp}\delta_{iq} \\
&=A_{pi}C_{ql}A_{lm}X_{mi}+A_{jq}X_{jk}C_{kl}A_{lp}\\
&=A_{pi}(X^T)_{im}(A^T)_{ml}(C^T)_{lq}+(A^T)_{pl}(C^T)_{lk}(X^T)_{kj}A_{jq}\\
&=(AX^TA^TC^T)_{pq}+(A^TC^TX^TA)_{pq}=AX^TA^TC^T+A^TC^TX^TA
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial}{\partial X} \textnormal{Tr}(X^TBXC)&= \frac{\partial(X_{ji}B_{jk}X_{kl}C_{li})}{\partial X_{pq}} \\
&=\delta_{jp}\delta_{iq}B_{jk}X_{kl}C_{li}+X_{ji}B_{jk}\delta_{kp}\delta_{lq}C_{li}\\
&=B_{pk}X_{kl}C_{lq}+X_{ji}B_{jp}C_{qi}=(BXC)_{pq}+(B^T)_{pj}X_{ji}(C^T)_{iq} \\
&=(BXC)_{pq}+(B^TXC^T)_{pq}=BXC+B^TXC^T
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial}{\partial X} \textnormal{Tr}(AXBX^TC)&= \frac{\partial(A_{ij}X_{jk}B_{kl}X_{ml}C_{mi})}{\partial X_{pq}} \\
&=A_{ij}\delta_{jp}\delta_{kq}B_{kl}X_{ml}C_{mi}+A_{ij}X_{jk}B_{kl}\delta_{mp}\delta_{lq}C_{mi}\\
&=A_{ip}B_{ql}X_{ml}C_{mi}+A_{ij}X_{jk}B_{kq}C_{pi}\\
&=(A^T)_{pi}(C^T)_{im}X_{ml}(B^T)_{lq}+C_{pi}A_{ij}X_{jk}B_{kq}\\
&=(A^TC^TXB^T)_{pq}+(CAXB)_{pq}=A^TC^TXB^T+CAXB
\end{aligned}
$$
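These four identities can be checked together. The expected gradients used below are $A^T X^T B^T + B^T X^T A^T$ for $\textnormal{Tr}(AXBX)$, $AX^TA^TC^T + A^TC^TX^TA$ for $\textnormal{Tr}(A^T XCAX)$, $BXC + B^TXC^T$ for $\textnormal{Tr}(X^T BXC)$ and $A^TC^TXB^T + CAXB$ for $\textnormal{Tr}(AXBX^T C)$. The pure-Python sketch is an illustration under stated assumptions rather than a reference implementation: matrices are lists of rows, `chain` multiplies several matrices left to right, and every helper name is hypothetical. Each expected gradient is compared with a central-difference estimate, which is exact up to rounding because every trace here is quadratic in $X$.

```python
import random
from functools import reduce

def matmul(P, Q):
    """(PQ)_ik = P_ij Q_jk, with the repeated index j summed."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    """(P^T)_ij = P_ji."""
    return [list(col) for col in zip(*P)]

def trace(M):
    """M_ii with the repeated index i summed."""
    return sum(M[i][i] for i in range(len(M)))

def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def chain(*Ms):
    """Left-to-right product M1 M2 ... Mr (hypothetical helper)."""
    return reduce(matmul, Ms)

def numerical_gradient(f, X, h=1e-5):
    """Central-difference estimate of G with G[k][l] = df/dX_kl."""
    G = [[0.0] * len(X[0]) for _ in X]
    for k in range(len(X)):
        for l in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[k][l] += h
            X_minus[k][l] -= h
            G[k][l] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G

def max_abs_diff(P, Q):
    """Largest entrywise difference between two matrices of the same shape."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def random_matrix(n, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

T = transpose
rng = random.Random(7)
X, A, B, C = (random_matrix(3, rng) for _ in range(4))

# name -> (scalar function of X, expected gradient at X)
checks = {
    "Tr(AXBX)": (lambda M: trace(chain(A, M, B, M)),
                 add(chain(T(A), T(X), T(B)), chain(T(B), T(X), T(A)))),
    "Tr(A^T XCAX)": (lambda M: trace(chain(T(A), M, C, A, M)),
                     add(chain(A, T(X), T(A), T(C)), chain(T(A), T(C), T(X), A))),
    "Tr(X^T BXC)": (lambda M: trace(chain(T(M), B, M, C)),
                    add(chain(B, X, C), chain(T(B), X, T(C)))),
    "Tr(AXBX^T C)": (lambda M: trace(chain(A, M, B, T(M), C)),
                     add(chain(T(A), T(C), X, T(B)), chain(C, A, X, B))),
}

errors = {name: max_abs_diff(numerical_gradient(f, X), expected)
          for name, (f, expected) in checks.items()}
print(errors)  # every value should be near zero
```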
CC BY-SA 4.0 Miguel Bustamante. Last modified: August 08, 2023. Website built with Franklin.jl and the Julia programming language.