Subgradient of the nuclear norm


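Start with the singular value decomposition (SVD) of $x$, where $U$ and $V$ have orthonormal columns ($U^TU=I$, $V^TV=I$) and $\Sigma$ is diagonal with non-negative entries: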

$$x=U\Sigma V^T$$

Then $$\|x\|_*=tr(\sqrt{x^Tx})=tr(\sqrt{(U\Sigma V^T)^T(U\Sigma V^T)})$$

$$\Rightarrow \|x\|_*=tr(\sqrt{V\Sigma U^T U\Sigma V^T})=tr(\sqrt{V\Sigma^2V^T})$$

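Since $V^TV=I$, we have $(V\Sigma V^T)^2=V\Sigma^2V^T$, and $V\Sigma V^T$ is positive semidefinite because the elements of $\Sigma$ are non-negative; hence $\sqrt{V\Sigma^2V^T}=V\Sigma V^T$. By circularity of the trace:

$$\Rightarrow \|x\|_*=tr(V\Sigma V^T)=tr(V^TV\Sigma)=tr(\Sigma)$$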

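Therefore the nuclear norm can also be defined as the sum of the singular values of the input matrix, i.e. the sum of their absolute values (the $\ell_1$ norm of the vector of singular values).

Note, however, that the absolute value function is not differentiable at zero, so the nuclear norm is not differentiable wherever $x$ is rank-deficient (has a zero singular value); there you can still find a subgradient.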
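A small numerical check of this identity, written as a sketch in Python with NumPy (an assumption; the original contains no code). It computes $tr(\sqrt{x^Tx})$ from the eigenvalues of $x^Tx$ and compares it with the sum of singular values and with NumPy's built-in `np.linalg.norm(x, "nuc")`. The function names are hypothetical, chosen for illustration.

```python
import numpy as np


def nuclear_norm_via_sqrt(x):
    """tr(sqrt(x^T x)): the trace of a matrix square root is the sum of the
    square roots of its eigenvalues; x^T x is symmetric PSD, so use eigvalsh."""
    evals = np.linalg.eigvalsh(x.T @ x)
    # Clip tiny negative round-off before taking square roots.
    return float(np.sum(np.sqrt(np.clip(evals, 0.0, None))))


def nuclear_norm_via_svd(x):
    """Sum of the singular values of x."""
    return float(np.sum(np.linalg.svd(x, compute_uv=False)))


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))

a = nuclear_norm_via_sqrt(x)
b = nuclear_norm_via_svd(x)
c = np.linalg.norm(x, "nuc")
print(a, b, c)  # the three values agree up to floating-point round-off
```

In practice the SVD route is preferable: squaring $x$ into $x^Tx$ loses precision for small singular values.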


$$\frac{\partial \|x\|_*}{\partial x}=\frac{\partial tr(\Sigma)}{\partial x}=\frac{ tr(\partial\Sigma)}{\partial x}$$

Here $\partial$ denotes a differential, so we need an expression for $\partial\Sigma$. Since $\Sigma$ is diagonal, and invertible when all singular values are positive, we can write $\partial\Sigma=\Sigma\Sigma^{-1}\partial\Sigma$, which gives:

$$\frac{\partial \|x\|_*}{\partial x}=\frac{ tr(\Sigma\Sigma^{-1}\partial\Sigma)}{\partial x} \tag{I}$$

It remains to find $\partial\Sigma$.

Differentiating $x=U\Sigma V^T$ with the product rule:
$$\partial x=\partial U\Sigma V^T+U\partial\Sigma V^T+U\Sigma\partial V^T$$

Therefore:

$$U\partial\Sigma V^T=\partial x-\partial U\Sigma V^T-U\Sigma\partial V^T$$

Multiplying on the left by $U^T$ and on the right by $V$:

$$\Rightarrow U^TU\partial\Sigma V^TV=U^T\partial xV-U^T\partial U\Sigma V^TV-U^TU\Sigma\partial V^TV$$

and using $U^TU=I$, $V^TV=I$:

$$\Rightarrow \partial\Sigma =U^T\partial xV-U^T\partial U\Sigma - \Sigma\partial V^TV$$
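A finite-difference sketch (NumPy assumed; the helper `aligned_svd` is hypothetical) that checks this expression for $\partial\Sigma$ numerically, for a random full-rank $x$ whose singular values are distinct so that $U$ and $V$ are differentiable. Singular vectors are only defined up to sign, so the perturbed SVDs are sign-aligned with the unperturbed one before differencing. It also checks that $U^T\partial U$ and $\partial V^TV$ are antisymmetric.

```python
import numpy as np


def aligned_svd(x, U_ref):
    """Thin SVD with each (u_j, v_j) pair's sign flipped to agree with U_ref."""
    U, s, Vh = np.linalg.svd(x, full_matrices=False)
    signs = np.sign(np.sum(U * U_ref, axis=0))  # +1 or -1 per column
    return U * signs, s, Vh * signs[:, None]    # flip u_j and v_j together


rng = np.random.default_rng(3)
x = rng.standard_normal((4, 3))    # full rank, distinct singular values (almost surely)
dx = rng.standard_normal((4, 3))   # perturbation direction
eps = 1e-6

U, s, Vh = np.linalg.svd(x, full_matrices=False)
V = Vh.T
Sigma = np.diag(s)
Up, sp, Vhp = aligned_svd(x + eps * dx, U)
Um, sm, Vhm = aligned_svd(x - eps * dx, U)

# Central-difference differentials along dx
dU = (Up - Um) / (2 * eps)
dSigma = np.diag((sp - sm) / (2 * eps))
dVt = (Vhp - Vhm) / (2 * eps)      # differential of V^T

rhs = U.T @ dx @ V - U.T @ dU @ Sigma - Sigma @ dVt @ V
print(np.max(np.abs(dSigma - rhs)))                       # close to 0

A = U.T @ dU   # U^T dU
B = dVt @ V    # dV^T V
print(np.max(np.abs(A + A.T)), np.max(np.abs(B + B.T)))   # both close to 0
```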

\begin{align}
\Rightarrow tr(\partial\Sigma) &= tr(U^T\partial xV-U^T\partial U\Sigma - \Sigma\partial V^TV)\\
&= tr(U^T\partial xV)+tr(-U^T\partial U\Sigma - \Sigma\partial V^TV)
\end{align}


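The second trace vanishes. Differentiating $U^TU=I$ gives $\partial U^TU+U^T\partial U=0$, so $U^T\partial U$ is antisymmetric and has a zero diagonal; likewise $\partial V^TV$ is antisymmetric because $V^TV=I$. Multiplying an antisymmetric matrix by the diagonal matrix $\Sigma$ (on either side) leaves a zero diagonal, so $tr(-U^T\partial U\Sigma - \Sigma\partial V^TV)=0$. Therefore: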

$$tr(\partial\Sigma) = tr(U^T\partial xV)$$
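This identity can be checked without differentiating $U$ or $V$ at all, since $tr(\partial\Sigma)$ is just the directional derivative of the sum of singular values along $\partial x$. A minimal NumPy sketch, assuming a random full-rank $x$ and a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 3))    # full rank (almost surely)
dx = rng.standard_normal((4, 3))   # arbitrary direction
eps = 1e-6

U, s, Vh = np.linalg.svd(x, full_matrices=False)
V = Vh.T

# Left side: tr(dSigma), i.e. the change in the sum of singular values
s_plus = np.linalg.svd(x + eps * dx, compute_uv=False)
s_minus = np.linalg.svd(x - eps * dx, compute_uv=False)
lhs = (s_plus.sum() - s_minus.sum()) / (2 * eps)

# Right side: tr(U^T dx V), which needs only the SVD of the unperturbed x
rhs = np.trace(U.T @ dx @ V)
print(lhs, rhs)  # agree up to finite-difference error
```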

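By substitution into (I), using circularity of the trace and the fact that a differential of the form $tr(A\,\partial x)$ corresponds to the gradient $A^T$: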

$$\frac{\partial \|x\|_*}{\partial x}= \frac{ tr(\partial\Sigma)}{\partial x} =\frac{ tr(U^T\partial xV)}{\partial x}=\frac{ tr(VU^T\partial x)}{\partial x}=(VU^T)^T$$

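Therefore you can use $UV^T$ as the subgradient; when $x$ has full rank it is in fact the gradient. When $x$ is rank-deficient, $UV^T$ built from any SVD of $x$ is still a valid subgradient, and the full subdifferential is $\{U_rV_r^T+W : U_r^TW=0,\ WV_r=0,\ \|W\|_2\le 1\}$, where $U_r$, $V_r$ keep only the singular vectors of the nonzero singular values.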
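A sketch that tests the result numerically, assuming NumPy; `nuclear_subgradient` and its `tol` parameter are hypothetical names for illustration. For a full-rank $x$ it compares $UV^T$ with a finite-difference gradient of the nuclear norm. For a rank-1 $x$, where the norm is not differentiable, it instead checks the subgradient inequality $\|y\|_*\ge\|x\|_*+\langle G,\,y-x\rangle$ on random matrices $y$, using the compact SVD (nonzero singular values only).

```python
import numpy as np


def nuc(x):
    return np.linalg.norm(x, "nuc")


def nuclear_subgradient(x, tol=1e-10):
    """Return U_r V_r^T from the compact SVD (singular values above a relative tol)."""
    U, s, Vh = np.linalg.svd(x, full_matrices=False)
    r = int(np.sum(s > tol * max(s.max(), 1.0)))
    return U[:, :r] @ Vh[:r, :]


rng = np.random.default_rng(2)

# (a) Full-rank x: U V^T should match the finite-difference gradient.
x = rng.standard_normal((4, 3))
G = nuclear_subgradient(x)
eps = 1e-6
fd = np.zeros_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        e = np.zeros_like(x)
        e[i, j] = eps
        fd[i, j] = (nuc(x + e) - nuc(x - e)) / (2 * eps)
print(np.max(np.abs(fd - G)))

# (b) Rank-1 x: check ||y||_* >= ||x||_* + <G, y - x> for 1000 random y.
x1 = np.outer(rng.standard_normal(4), rng.standard_normal(3))
G1 = nuclear_subgradient(x1)
ok = all(
    nuc(y) >= nuc(x1) + np.sum(G1 * (y - x1)) - 1e-9
    for y in rng.standard_normal((1000, 4, 3))
)
print(ok)  # True
```

Part (b) must hold exactly in theory: $\|G\|_2=1$ and $\langle G,x\rangle=\|x\|_*$, so $\langle G,y\rangle\le\|y\|_*$ for every $y$; the small `1e-9` slack only absorbs round-off.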

 

Reference: here

