核范數求次梯度


Start with the SVD decomposition of $x$:

$$x=U\Sigma V^T$$

Then $$\|x\|_*=tr(\sqrt{x^Tx})=tr(\sqrt{(U\Sigma V^T)^T(U\Sigma V^T)})$$

$$\Rightarrow \|x\|_*=tr(\sqrt{V\Sigma U^T U\Sigma V^T})=tr(\sqrt{V\Sigma^2V^T})$$

By circularity of trace:

$$\Rightarrow \|x\|_*=tr(\sqrt{V^TV\Sigma^2})=tr(\sqrt{V^TV\Sigma^2})=tr(\sqrt{\Sigma^2})=tr(\Sigma)$$

Since the elements of $\Sigma$ are non-negative.

Therefore nuclear norm can be also defined as the sum of the absolute values of the singular value decomposition of the input matrix.

Now, note that the absolute value function is not differentiable on every point in its domain, but you can find a subgradient.


$$\frac{\partial \|x\|_*}{\partial x}=\frac{\partial tr(\Sigma)}{\partial x}=\frac{ tr(\partial\Sigma)}{\partial x}$$

You should find $\partial\Sigma$. Since $\Sigma$ is diagonal, the subdifferential set of $\Sigma$ is: $\partial\Sigma=\Sigma\Sigma^{-1}\partial\Sigma$, now we have:

$$\frac{\partial \|x\|_*}{\partial x}=\frac{ tr(\Sigma\Sigma^{-1}\partial\Sigma)}{\partial x}$$ (I)

So we should find $\partial\Sigma$.

$x=U\Sigma V^T$, therefore:
$$\partial x=\partial U\Sigma V^T+U\partial\Sigma V^T+U\Sigma\partial V^T$$

Therefore:

$$U\partial\Sigma V^T=\partial x-\partial U\Sigma V^T-U\Sigma\partial V^T$$

$$\Rightarrow U^TU\partial\Sigma V^TV=U^T\partial xV-U^T\partial U\Sigma V^TV-U^TU\Sigma\partial V^TV$$


$$\Rightarrow \partial\Sigma =U^T\partial xV-U^T\partial U\Sigma - \Sigma\partial V^TV$$

\begin{align}
\Rightarrow\\
tr(\partial\Sigma) &=& tr(U^T\partial xV-U^T\partial U\Sigma - \Sigma\partial V^TV)\\
&=& tr(U^T\partial xV)+tr(-U^T\partial U\Sigma - \Sigma\partial V^TV)
\end{align}


You can show that $tr(-U^T\partial U\Sigma - \Sigma\partial V^TV)=0$ (Hint: diagonal and antisymmetric matrices, proof in the comments.), therefore:

$$tr(\partial\Sigma) = tr(U^T\partial xV)$$

By substitution into (I):

$$\frac{\partial \|x\|_*}{\partial x}= \frac{ tr(\partial\Sigma)}{\partial x} =\frac{ tr(U^T\partial xV)}{\partial x}=\frac{ tr(VU^T\partial x)}{\partial x}=(VU^T)^T$$

Therefore you can use $U V^T$ as the subgradient.

 

 參考:這里


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM