Derivative of the Cross Entropy Loss

2019-10-22 14:20:51

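Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.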

Original link: https://blog.csdn.net/chaipp0607/article/details/101946040

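Cross Entropy is a loss function commonly used in classification problems. Earlier posts covered the derivation of binary cross entropy and the role cross entropy plays; this post works through the derivative of the cross entropy loss.

Denote the outputs of the model's last layer of neurons by $f_0, \dots, f_{C-1}$, where $C$ is the number of classes, and denote the outputs after softmax activation by $p_0, \dots, p_{C-1}$. Then:

$$p_i = \frac{e^{f_i}}{\sum_{k=0}^{C-1} e^{f_k}}$$

Denote the true labels by $y_0, \dots, y_{C-1}$. The cross entropy loss $L$ is:

$$L = -\sum_{i=0}^{C-1} y_i \log p_i$$

Here $\log$ is shorthand: to make the derivative convenient, we take the base to be $e$, i.e. $\log$ means $\ln$.

Now take the partial derivative of $L$ with respect to the output $f_i$ of the $i$-th neuron, $\frac{\partial L}{\partial f_i}$. Writing $L = \sum_{j=0}^{C-1} L_j$ with $L_j = -y_j \log p_j$, the chain rule gives:

$$\frac{\partial L}{\partial f_i} = \sum_{j=0}^{C-1} \frac{\partial L_j}{\partial p_j} \frac{\partial p_j}{\partial f_i}$$

A note on indices: softmax uses the subscripts $i$ and $k$, and the cross entropy also uses $i$, but these two $i$ are not the same. The softmax denominator contains every neuron output $f$, so the partial derivative of every activated $p$ with respect to any $f_i$ is nonzero; at the same time $L$ contains every $p$. To avoid reusing $i$, we introduce a new subscript $j$ for $p$, which takes the $C$ values $0, \dots, C-1$.

We now differentiate each factor in turn: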

$$\frac{\partial L_j}{\partial p_j} = \frac{\partial (-y_j \log p_j)}{\partial p_j}$$

Since we take the base of $\log$ to be $e$ (so $\log$ is $\ln$), and $\frac{d}{dp}\ln p = \frac{1}{p}$, we get:

$$\frac{\partial L_j}{\partial p_j} = \frac{\partial (-y_j \log p_j)}{\partial p_j} = -\frac{y_j}{p_j}$$
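The forward definitions and this first factor can be checked with a minimal NumPy sketch. It computes $p$ with softmax and $L$ with the natural log, then treats $p_0, \dots, p_{C-1}$ as independent variables (as the chain rule above does) and compares $-y_j/p_j$ with a central finite difference of $L$ in each $p_j$. NumPy, the function names, the step size `h`, and the example logits are assumptions for illustration, not taken from the original post.

```python
import numpy as np

def softmax(f):
    # subtracting max(f) cancels between numerator and denominator; avoids overflow
    e = np.exp(f - np.max(f))
    return e / e.sum()

def loss_from_p(p, y):
    # L = -sum_j y_j * ln(p_j); np.log is the natural log
    return -np.sum(y * np.log(p))

def dL_dp_analytic(p, y):
    # dL_j/dp_j = -y_j / p_j
    return -y / p

def dL_dp_numeric(p, y, h=1e-6):
    # perturb each p_j on its own, holding the other entries fixed
    g = np.zeros_like(p)
    for j in range(p.size):
        step = np.zeros_like(p)
        step[j] = h
        g[j] = (loss_from_p(p + step, y) - loss_from_p(p - step, y)) / (2 * h)
    return g

f = np.array([2.0, 1.0, 0.1])   # example logits (hypothetical)
y = np.array([1.0, 0.0, 0.0])   # one-hot label for class 0
p = softmax(f)
print(loss_from_p(p, y))
print(dL_dp_analytic(p, y))
print(dL_dp_numeric(p, y))
```

With equal logits every $p_i = 1/C$, so a one-hot label gives $L = \ln C$, a quick sanity check on the forward pass.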

Next we need $\frac{\partial p_j}{\partial f_i}$. Every $p_j$ contains $f_i$ through the softmax denominator, so $\frac{\partial p_j}{\partial f_i}$ is never 0; however, the result differs between $j = i$ and $j \neq i$, so the two cases are handled separately:

  • First, the case $j = i$, using the quotient rule (here $'$ denotes $\partial/\partial f_i$, so $(e^{f_i})' = e^{f_i}$ and $\left(\sum_{k=0}^{C-1} e^{f_k}\right)' = e^{f_i}$):
$$\frac{\partial p_j}{\partial f_i} = \frac{\partial p_i}{\partial f_i} = \frac{\partial}{\partial f_i}\left(\frac{e^{f_i}}{\sum_{k=0}^{C-1} e^{f_k}}\right)$$
$$= \frac{(e^{f_i})' \sum_{k=0}^{C-1} e^{f_k} - e^{f_i}\left(\sum_{k=0}^{C-1} e^{f_k}\right)'}{\left(\sum_{k=0}^{C-1} e^{f_k}\right)^2}$$
$$= \frac{e^{f_i}\sum_{k=0}^{C-1} e^{f_k} - \left(e^{f_i}\right)^2}{\left(\sum_{k=0}^{C-1} e^{f_k}\right)^2} = \frac{e^{f_i}}{\sum_{k=0}^{C-1} e^{f_k}} - \left(\frac{e^{f_i}}{\sum_{k=0}^{C-1} e^{f_k}}\right)^2$$
$$= p_i - p_i^2 = p_i(1 - p_i)$$
  • 然后j≠ijneq ij​=i时: ∂pj∂fi=∂efj∑k=0C−1efk∂fifrac{partial p_{j}}{partial f_{i}}= frac{partial frac{e^{f_{j}}}{sum_{k=0}^{C-1} e^{f_{k}}}}{partial f_{i}} ∂fi​∂pj​​=∂fi​∂∑k=0C−1​efk​efj​​​ =(efj)′∑k=0C−1efk−efj(∑k=0C−1efk)′(∑k=0C−1efk)2= frac{ (e^{f_{j}})' sum_{k=0}^{C-1} e^{f_{k}} - e^{f_{j}}(sum_{k=0}^{C-1} e^{f_{k}})' }{(sum_{k=0}^{C-1} e^{f_{k}})^{2}} =(∑k=0C−1​efk​)2(efj​)′∑k=0C−1​efk​−efj​(∑k=0C−1​efk​)′​ =−efiefj(∑k=0C−1efk)2=−efi∑k=0C−1efkefj∑k=0C−1efk= frac{ - e^{f_{i}} e^{f_{j}} }{(sum_{k=0}^{C-1} e^{f_{k}})^{2}} = - frac{ e^{f_{i}} }{sum_{k=0}^{C-1} e^{f_{k}}} frac{ e^{f_{j}} }{sum_{k=0}^{C-1} e^{f_{k}}}=(∑k=0C−1​efk​)2−efi​efj​​=−∑k=0C−1​efk​efi​​∑k=0C−1​efk​efj​​ =−pipj = -p_{i}p_{j}=−pi​pj​

The final partial derivative is the sum of the two parts, the single $j = i$ term plus the terms with $j \neq i$:
$$\frac{\partial L}{\partial f_i} = \frac{\partial L_i}{\partial p_i}\frac{\partial p_i}{\partial f_i} + \sum_{j=0,\, j\neq i}^{C-1} \frac{\partial L_j}{\partial p_j}\frac{\partial p_j}{\partial f_i}$$
$$= -\frac{y_i}{p_i}\, p_i(1-p_i) + \sum_{j=0,\, j\neq i}^{C-1} (-p_i p_j)\left(-\frac{y_j}{p_j}\right)$$
$$= -y_i(1-p_i) + \sum_{j=0,\, j\neq i}^{C-1} p_i y_j$$
$$= y_i p_i - y_i + \sum_{j=0,\, j\neq i}^{C-1} p_i y_j$$

在上式中,j≠ijneq ij​=i的情况中刚好缺了j=ij=ij=i,所以可以继续改写为: =∑j=0C−1piyj−yi=sum_{j=0}^{C-1}p_{i}y_{j} - y_{i} =j=0∑C−1​pi​yj​−yi​ =pi∑j=0C−1yj−yi=p_{i}sum_{j=0}^{C-1}y_{j} - y_{i} =pi​j=0∑C−1​yj​−yi​ 而∑j=0C−1yj=1sum_{j=0}^{C-1}y_{j} = 1∑j=0C−1​yj​=1,所以: =pi∑j=0C−1yj−yi=pi−yi=p_{i}sum_{j=0}^{C-1}y_{j} - y_{i} = p_{i}-y_{i} =pi​j=0∑C−1​yj​−yi​=pi​−yi​
