Definition of the sigmoid(x) function:
\[\begin{align*} \sigma(x) &= \frac{1}{1+e^{-x}} \\ \sigma'(x) &= \sigma(x)(1-\sigma(x)) \end{align*} \]
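A quick numerical sanity check of the derivative identity (a minimal NumPy sketch; the test point x = 0.7 is an arbitrary choice):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x, eps = 0.7, 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)  # central difference
analytic = sigmoid(x) * (1 - sigmoid(x))                     # sigma(x) * (1 - sigma(x))
print(numeric, analytic)  # both approximately 0.2217
```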
Let \(z = w \cdot x\). The logistic regression classification model is then defined as follows:
\[\begin{align*} P(Y=1|x) &= \frac{e^{w \cdot x}}{1+e^{w \cdot x}} \\ &= \frac{1}{1+e^{-w \cdot x}} \\ &= \sigma (w \cdot x) \\ &= \sigma (z) \\ P(Y=0|x) &= \frac{1}{1+e^{w \cdot x}} \\ &= 1 - \sigma (z) \end{align*} \]
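As a sanity check, the two conditional probabilities sum to 1:

\[ P(Y=1|x) + P(Y=0|x) = \frac{e^{w \cdot x}}{1+e^{w \cdot x}} + \frac{1}{1+e^{w \cdot x}} = 1 \]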
From the above, in a binary classification problem the sigmoid output is exactly the predicted probability that the sample's label is 1.
From the definition of the cross-entropy function:
\[crossEntropy = -\sum_{i=0}^{classNum-1}{y^{label}_i\log {y^{pred}_i}} \]
Both \(y^{label}\) and \(y^{pred}\) are probability distributions; the label \(y^{label}\) is one-hot encoded to represent its distribution.
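For example, with three classes, a one-hot label \((0, 1, 0)\), and a hypothetical prediction \((0.2, 0.7, 0.1)\), only the true class contributes (natural log):

\[ crossEntropy = -\left[ 0 \cdot \log 0.2 + 1 \cdot \log 0.7 + 0 \cdot \log 0.1 \right] = -\log 0.7 \approx 0.357 \]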
For binary classification, noting that \(y^{label}_0 = 1 - y^{label}_1\) and \(y^{pred}_0 = 1 - y^{pred}_1\), the cross-entropy loss is:
\[\begin{align*} Loss_{crossEntropy}& = - \left[ y^{label}_0 \log y_0^{pred} + y^{label}_1 \log y_1^{pred}\right] \\ &= - \left[ y^{label}_1 \log y_1^{pred} + (1- y^{label}_1) \log(1- y_1^{pred})\right] \end{align*} \]
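For instance, with \(y^{label}_1 = 1\) and a hypothetical prediction \(y^{pred}_1 = 0.8\), the second term vanishes and \(Loss = -\log 0.8 \approx 0.223\).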
And since the sigmoid output is the predicted value for label 1, i.e. \(y^{pred}_1 = \sigma(z)\), we have
\[\begin{align*} Loss &= - \left[ y^{label}_1 \log \sigma(z) + (1- y^{label}_1) \log (1- \sigma(z)) \right] \\ \frac{\partial{Loss}}{{\partial{\sigma(z)}}} &= - \left[ \frac {y^{label}_1}{\sigma(z)} - \frac{(1- y^{label}_1)}{(1- \sigma(z))} \right] \\ &= \frac {\sigma(z) -y^{label}_1}{\sigma(z){(1- \sigma(z))}} \\ \end{align*} \]
Given \(z = w \cdot x\), the derivative of the Loss function with respect to w follows from the chain rule:
\[\begin{align*} \frac{\partial Loss}{\partial w} &= \frac{\partial{Loss}}{{\partial{\sigma(z)}}}\frac{\partial{\sigma(z)}}{{\partial{z}}}\frac{\partial{z}}{{\partial{w}}} \\ &= \frac {\sigma(z) -y^{label}_1}{\sigma(z){(1- \sigma(z))}} \cdot {\sigma(z)(1-\sigma(z))} \cdot x \\ &= x \cdot \left ({\sigma(z) -y^{label}_1} \right) \end{align*} \]
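A minimal numerical check of the whole chain (NumPy; the variable names and random test values are ours): the central-difference gradient of the loss should match \(x \cdot (\sigma(z) - y^{label}_1)\).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, x, y1):
    # Binary cross-entropy for one sample; y1 is y_1^label in {0, 1}.
    p = sigmoid(np.dot(w, x))
    return -(y1 * np.log(p) + (1 - y1) * np.log(1 - p))

rng = np.random.default_rng(0)
w = rng.normal(size=3)
x = rng.normal(size=3)
y1 = 1.0

# Analytic gradient from the derivation: x * (sigma(z) - y1).
analytic = x * (sigmoid(np.dot(w, x)) - y1)

# Central-difference approximation, one coordinate at a time.
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    dw = np.zeros_like(w)
    dw[i] = eps
    numeric[i] = (loss(w + dw, x, y1) - loss(w - dw, x, y1)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # expect: True
```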
As for the bias b, it can be viewed as the weight of a constant input \(x_0 = 1\), so \(\frac{\partial Loss}{\partial b} = \sigma(z) - y^{label}_1\).
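Putting both gradients to work, here is a minimal gradient-descent sketch (NumPy; the toy AND-style data, learning rate, and iteration count are arbitrary choices for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples, 2 features; labels are y_1^label in {0, 1}.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])

w = np.zeros(2)
b = 0.0
lr = 0.5  # learning rate, an arbitrary choice

for _ in range(1000):
    p = sigmoid(X @ w + b)           # sigma(z) for every sample
    grad_w = X.T @ (p - y) / len(y)  # mean of x * (sigma(z) - y)
    grad_b = np.mean(p - y)          # mean of (sigma(z) - y), since x_0 = 1
    w -= lr * grad_w
    b -= lr * grad_b

print(np.round(sigmoid(X @ w + b), 2))  # probabilities approach the labels
```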