Back Propagation

Definitions

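  • Let $v_j$ denote the input to the activation function of neuron $j$, so that $y_j=\varphi_j(v_j)$. The induced local field is $v_j=\sum_{i=0}^{N_k}w_{ji}y_i$, where $N_k$ is the number of inputs applied to neuron $j$, and the weight $w_{j0}$ (applied to the fixed input $y_0=+1$) equals the bias $b_j$ of neuron $j$.
  • Define the error signal $e_j=d_j-y_j$, the difference between the desired output and the actual output of neuron $j$.
  • Define the instantaneous error energy of neuron $j$ as $\varepsilon_j=\frac{1}{2}e_j^2$. The total instantaneous error energy is $\varepsilon=\frac{1}{2}\sum_{j\in C}e_j^2$, where $C$ is the set of output neurons.
  • Define the average error energy over the $N$ training examples as $\varepsilon_{av}=\frac{1}{2N}\sum_{n=1}^{N}\sum_{j\in C}e_j^2(n)$, where $e_j(n)$ is the error of neuron $j$ on example $n$.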
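
The definitions above can be sketched in a few lines of Python. This is only an illustration: the logistic sigmoid is an assumed choice for $\varphi_j$, and the helper names `neuron_forward` and `error_energy` are hypothetical, not taken from the original notes.

```python
import math

def sigmoid(v):
    """Logistic function, assumed here as an example activation phi_j."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron_forward(weights, inputs):
    """Return (v_j, y_j) for one neuron.

    weights[0] plays the role of w_j0 = b_j and multiplies the fixed input y_0 = +1.
    """
    ys = [1.0] + list(inputs)                      # prepend y_0 = +1
    v = sum(w * y for w, y in zip(weights, ys))    # induced local field v_j
    return v, sigmoid(v)                           # y_j = phi_j(v_j)

def error_energy(desired, outputs):
    """Total instantaneous error energy: 0.5 * sum of e_j^2 over output neurons."""
    return 0.5 * sum((d - y) ** 2 for d, y in zip(desired, outputs))

# One neuron with bias 0.5 and two inputs.
v, y = neuron_forward([0.5, 1.0, -1.0], [2.0, 2.5])   # v = 0.5 + 2.0 - 2.5
e = 1.0 - y                                           # error signal for d_j = 1
eps = error_energy([1.0], [y])
```

With these numbers the induced local field is 0, so the output is 0.5, the error signal is 0.5 and the error energy is 0.125.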

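In stochastic mode the weights are adjusted on an example-by-example basis, and the cost function being minimized is the total instantaneous error energy $\varepsilon$.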

In batch mode the weights are adjusted on an epoch-by-epoch basis, and the cost function being minimized is the average error energy $\varepsilon_{av}$.
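
To make the difference between the two cost functions concrete, the sketch below evaluates both on a made-up epoch of four examples with two output neurons each. The error values and the function names are illustrative assumptions only.

```python
def instantaneous_energy(errors):
    """epsilon for one example: 0.5 * sum of squared output errors."""
    return 0.5 * sum(e * e for e in errors)

def average_energy(errors_per_example):
    """epsilon_av: mean of the instantaneous energies over one epoch."""
    n = len(errors_per_example)
    return sum(instantaneous_energy(errs) for errs in errors_per_example) / n

# Hypothetical error signals e_j(n) for N = 4 examples and 2 output neurons.
epoch_errors = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [2.0, 2.0]]

# Stochastic mode minimizes each of these values in turn ...
per_example = [instantaneous_energy(errs) for errs in epoch_errors]
# ... while batch mode minimizes their average.
eps_av = average_energy(epoch_errors)
```

Here `per_example` is `[0.5, 2.0, 1.0, 4.0]` and `eps_av` is `1.875`.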

Stochastic mode

Compute the partial derivative of the cost function with respect to a weight, using the chain rule:

$$\begin{aligned}
\frac{\partial \varepsilon}{\partial w_{ji}} &= \frac{\partial \varepsilon}{\partial e_j}\frac{\partial e_j}{\partial y_j}\frac{\partial y_j}{\partial v_j}\frac{\partial v_j}{\partial w_{ji}}\\
&= e_j\cdot(-1)\cdot\varphi_j'(v_j)\cdot y_i\\
&= -e_j\varphi_j'(v_j)y_i
\end{aligned}$$

Define the local gradient $\delta_j$ as

$$\delta_j=-\frac{\partial \varepsilon}{\partial v_j}=-\frac{\partial \varepsilon}{\partial e_j}\frac{\partial e_j}{\partial y_j}\frac{\partial y_j}{\partial v_j}=e_j\varphi_j'(v_j)$$

so that the derivative with respect to the weight can be rewritten as

$$\frac{\partial \varepsilon}{\partial w_{ji}}=-\delta_j y_i$$

This gives the weight correction $\Delta w_{ji}$, where $\eta$ is the learning rate of the back-propagation algorithm:

$$\begin{aligned}
\Delta w_{ji} &= -\eta\frac{\partial \varepsilon}{\partial w_{ji}}\\
&= \eta\delta_j y_i
\end{aligned}$$
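
As a sanity check on the derivation above, the following sketch applies the correction to a single output neuron and compares the analytic derivative $-\delta_j y_i$ with a central finite difference. The logistic activation, the numeric values, and the names `energy` and `delta_rule_step` are assumptions made for illustration.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def sigmoid_prime(v):
    s = sigmoid(v)
    return s * (1.0 - s)

def energy(w, ys, d):
    """Instantaneous error energy of one output neuron."""
    v = sum(wi * yi for wi, yi in zip(w, ys))
    e = d - sigmoid(v)
    return 0.5 * e * e

def delta_rule_step(w, ys, d, eta):
    """Return (delta_j, [Delta w_ji]) with Delta w_ji = eta * delta_j * y_i."""
    v = sum(wi * yi for wi, yi in zip(w, ys))
    e = d - sigmoid(v)                    # error signal e_j
    delta = e * sigmoid_prime(v)          # local gradient delta_j
    return delta, [eta * delta * yi for yi in ys]

w = [0.1, -0.2, 0.4]      # w[0] is the bias weight w_j0
ys = [1.0, 0.5, -1.5]     # ys[0] = +1 is the fixed bias input
d = 1.0
delta, dw = delta_rule_step(w, ys, d, eta=0.5)

# Analytic derivative from the text versus a central finite difference.
analytic = [-delta * yi for yi in ys]
h = 1e-6
numeric = []
for i in range(len(w)):
    plus = list(w)
    minus = list(w)
    plus[i] += h
    minus[i] -= h
    numeric.append((energy(plus, ys, d) - energy(minus, ys, d)) / (2 * h))

# Applying the correction should lower the error energy for this example.
w_new = [wi + dwi for wi, dwi in zip(w, dw)]
```

The two derivative estimates agree to within numerical precision, and `energy(w_new, ys, d)` is smaller than `energy(w, ys, d)` for this learning rate.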

When neuron $j$ is an output node, the local gradient follows directly:

$$\delta_j=e_j\varphi_j'(v_j)$$

When neuron $j$ is a hidden node, no desired response $d_j$ is available for it, so the local gradient is re-expressed through $y_j$:

$$\delta_j=-\frac{\partial \varepsilon}{\partial y_j}\frac{\partial y_j}{\partial v_j}=-\frac{\partial \varepsilon}{\partial y_j}\varphi_j'(v_j)$$

With $k$ ranging over the output neurons fed by neuron $j$, so that $\varepsilon=\frac{1}{2}\sum_{k}e_k^2$,

$$\frac{\partial \varepsilon}{\partial y_j}=\sum_{k}e_k\frac{\partial e_k}{\partial y_j}=\sum_{k}e_k\frac{\partial e_k}{\partial v_k}\frac{\partial v_k}{\partial y_j}$$

Since $e_k=d_k-y_k=d_k-\varphi_k(v_k)$ and $v_k=\sum_{j}w_{kj}y_j$, we have $\frac{\partial e_k}{\partial v_k}=-\varphi_k'(v_k)$ and $\frac{\partial v_k}{\partial y_j}=w_{kj}$, hence

$$\begin{aligned}
\frac{\partial \varepsilon}{\partial y_j} &= \sum_{k}e_k\frac{\partial e_k}{\partial v_k}\frac{\partial v_k}{\partial y_j}\\
&= \sum_{k}e_k\left(-\varphi_k'(v_k)\right)w_{kj}\\
&= -\sum_{k}\delta_k w_{kj}
\end{aligned}$$

The local gradient of a hidden neuron is therefore

$$\begin{aligned}
\delta_j &= -\frac{\partial \varepsilon}{\partial y_j}\varphi_j'(v_j)\\
&= \varphi_j'(v_j)\sum_{k}\delta_k w_{kj}
\end{aligned}$$
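
The hidden-node formula can be checked numerically on a small network. The sketch below assumes a 2-2-2 network with logistic activations and arbitrary weights (all names and values are hypothetical); it computes $\delta_j$ for the hidden layer from the output-layer $\delta_k$ and compares $-\delta_j y_i$ with a finite-difference derivative of $\varepsilon$.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def sigmoid_prime(v):
    s = sigmoid(v)
    return s * (1.0 - s)

def forward(W_h, W_o, x):
    """Column 0 of each weight matrix is the bias weight (fixed input +1)."""
    y_in = [1.0] + list(x)
    v_h = [sum(w * y for w, y in zip(row, y_in)) for row in W_h]
    y_h = [1.0] + [sigmoid(v) for v in v_h]
    v_o = [sum(w * y for w, y in zip(row, y_h)) for row in W_o]
    y_o = [sigmoid(v) for v in v_o]
    return y_in, v_h, v_o, y_o

def energy(W_h, W_o, x, d):
    y_o = forward(W_h, W_o, x)[3]
    return 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y_o))

def hidden_weight_gradient(W_h, W_o, x, d):
    """d(epsilon)/d(w_ji) for the hidden layer, via delta_j = phi'(v_j) * sum_k delta_k * w_kj."""
    y_in, v_h, v_o, y_o = forward(W_h, W_o, x)
    # Output layer: delta_k = e_k * phi'(v_k)
    delta_o = [(dk - yk) * sigmoid_prime(vk) for dk, yk, vk in zip(d, y_o, v_o)]
    # Hidden neuron j feeds column j + 1 of W_o (column 0 is the bias).
    delta_h = [
        sigmoid_prime(v_h[j]) * sum(dk * row[j + 1] for dk, row in zip(delta_o, W_o))
        for j in range(len(v_h))
    ]
    return [[-dj * yi for yi in y_in] for dj in delta_h]

W_h = [[0.1, 0.4, -0.3], [-0.2, 0.5, 0.2]]
W_o = [[0.3, -0.6, 0.8], [-0.1, 0.7, 0.2]]
x, d = [1.0, -2.0], [1.0, 0.0]
analytic = hidden_weight_gradient(W_h, W_o, x, d)

h = 1e-6
max_diff = 0.0
for j in range(len(W_h)):
    for i in range(len(W_h[j])):
        plus = [row[:] for row in W_h]
        minus = [row[:] for row in W_h]
        plus[j][i] += h
        minus[j][i] -= h
        numeric = (energy(plus, W_o, x, d) - energy(minus, W_o, x, d)) / (2 * h)
        max_diff = max(max_diff, abs(numeric - analytic[j][i]))
```

A small `max_diff` indicates that back-propagating the $\delta_k$ through $w_{kj}$ reproduces the true derivative for the hidden weights.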

Batch mode

In batch mode the cost function is the average error energy $\varepsilon_{av}=\frac{1}{2N}\sum_{n=1}^{N}\sum_{j\in C}e_j^2(n)$, where $e_j(n)$ is the error of neuron $j$ on example $n$. Hence

$$\frac{\partial \varepsilon_{av}}{\partial e_j(n)}=\frac{1}{N}e_j(n)$$

Substituting this into the stochastic-mode derivation and summing over the examples gives the partial derivative of the cost function with respect to a weight:

$$\begin{aligned}
\frac{\partial \varepsilon_{av}}{\partial w_{ji}} &= \sum_{n=1}^{N}\frac{\partial \varepsilon_{av}}{\partial e_j(n)}\frac{\partial e_j(n)}{\partial y_j(n)}\frac{\partial y_j(n)}{\partial v_j(n)}\frac{\partial v_j(n)}{\partial w_{ji}}\\
&= \frac{1}{N}\sum_{n=1}^{N}e_j(n)\cdot(-1)\cdot\varphi_j'(v_j(n))\cdot y_i(n)\\
&= -\frac{1}{N}\sum_{n=1}^{N}e_j(n)\varphi_j'(v_j(n))y_i(n)
\end{aligned}$$

In terms of the per-example local gradient $\delta_j(n)$,

$$\frac{\partial \varepsilon_{av}}{\partial w_{ji}}=-\frac{1}{N}\sum_{n=1}^{N}\delta_j(n)y_i(n)$$
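
The averaged gradient can be sketched for a single sigmoid output neuron as follows. The data set, weights, and function names are made up for illustration; the check compares the averaged $-\delta_j(n)y_i(n)$ with a finite-difference derivative of $\varepsilon_{av}$.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def average_energy(w, examples):
    """epsilon_av for one output neuron over all (inputs, desired) pairs."""
    total = 0.0
    for ys, d in examples:
        v = sum(wi * yi for wi, yi in zip(w, ys))
        total += 0.5 * (d - sigmoid(v)) ** 2
    return total / len(examples)

def batch_gradient(w, examples):
    """d(epsilon_av)/d(w_ji) = -(1/N) * sum_n delta_j(n) * y_i(n)."""
    grad = [0.0] * len(w)
    for ys, d in examples:
        v = sum(wi * yi for wi, yi in zip(w, ys))
        y = sigmoid(v)
        delta = (d - y) * y * (1.0 - y)      # delta_j(n) = e_j(n) * phi'(v_j(n))
        for i, yi in enumerate(ys):
            grad[i] -= delta * yi            # accumulate -delta_j(n) * y_i(n)
    return [g / len(examples) for g in grad]

# Each input vector starts with the fixed bias input +1.
examples = [([1.0, 0.0, 1.0], 1.0), ([1.0, 1.0, 0.0], 0.0), ([1.0, 1.0, 1.0], 1.0)]
w = [0.2, -0.4, 0.3]
analytic = batch_gradient(w, examples)

h = 1e-6
numeric = []
for i in range(len(w)):
    plus = list(w)
    minus = list(w)
    plus[i] += h
    minus[i] -= h
    numeric.append((average_energy(plus, examples) - average_energy(minus, examples)) / (2 * h))
```

Note that the weights stay fixed while the sum over $n$ is accumulated; they change only once the whole epoch has been processed.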

When neuron $j$ is an output node, the local gradient for example $n$ follows directly:

$$\delta_j(n)=e_j(n)\varphi_j'(v_j(n))$$

When neuron $j$ is a hidden node, the local gradient for example $n$ is again re-expressed through $y_j(n)$, where $\varepsilon(n)=\frac{1}{2}\sum_{k\in C}e_k^2(n)$ is the instantaneous error energy of example $n$:

$$\delta_j(n)=-\frac{\partial \varepsilon(n)}{\partial y_j(n)}\frac{\partial y_j(n)}{\partial v_j(n)}=-\frac{\partial \varepsilon(n)}{\partial y_j(n)}\varphi_j'(v_j(n))$$

$$\frac{\partial \varepsilon(n)}{\partial y_j(n)}=\sum_{k}e_k(n)\frac{\partial e_k(n)}{\partial y_j(n)}=\sum_{k}e_k(n)\frac{\partial e_k(n)}{\partial v_k(n)}\frac{\partial v_k(n)}{\partial y_j(n)}$$

Since $e_k(n)=d_k(n)-y_k(n)=d_k(n)-\varphi_k(v_k(n))$ and $v_k(n)=\sum_{j}w_{kj}y_j(n)$,

$$\begin{aligned}
\frac{\partial \varepsilon(n)}{\partial y_j(n)} &= \sum_{k}e_k(n)\frac{\partial e_k(n)}{\partial v_k(n)}\frac{\partial v_k(n)}{\partial y_j(n)}\\
&= \sum_{k}e_k(n)\left(-\varphi_k'(v_k(n))\right)w_{kj}\\
&= -\sum_{k}\delta_k(n)w_{kj}
\end{aligned}$$

The local gradient of a hidden neuron for example $n$ is therefore

$$\begin{aligned}
\delta_j(n) &= -\frac{\partial \varepsilon(n)}{\partial y_j(n)}\varphi_j'(v_j(n))\\
&= \varphi_j'(v_j(n))\sum_{k}\delta_k(n)w_{kj}
\end{aligned}$$

The local gradients are thus computed per example exactly as in stochastic mode. The averaging factor $\frac{1}{N}\sum_{n=1}^{N}$ enters only once, in the weight gradient $\frac{\partial \varepsilon_{av}}{\partial w_{ji}}=-\frac{1}{N}\sum_{n=1}^{N}\delta_j(n)y_i(n)$.
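
Putting the pieces together, a batch-mode epoch accumulates $\delta_j(n)y_i(n)$ over all examples and updates the weights once at the end. The sketch below is one possible arrangement, not a reference implementation: it assumes a 2-2-1 network with logistic activations, the XOR data set, hand-picked initial weights, and a learning rate of 0.5, none of which come from the original notes.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(W_h, w_o, x):
    """Index 0 of every weight vector is the bias weight (fixed input +1)."""
    y_in = [1.0] + list(x)
    y_h = [1.0] + [sigmoid(sum(w * y for w, y in zip(row, y_in))) for row in W_h]
    y_o = sigmoid(sum(w * y for w, y in zip(w_o, y_h)))
    return y_in, y_h, y_o

def average_energy(W_h, w_o, data):
    return sum(0.5 * (d - forward(W_h, w_o, x)[2]) ** 2 for x, d in data) / len(data)

def batch_epoch(W_h, w_o, data, eta):
    """One epoch: sum delta_j(n) * y_i(n) over all examples, then update once."""
    acc_h = [[0.0] * len(row) for row in W_h]
    acc_o = [0.0] * len(w_o)
    for x, d in data:
        y_in, y_h, y_o = forward(W_h, w_o, x)
        # Output node: delta(n) = e(n) * phi'(v(n)); for the logistic function phi' = y * (1 - y).
        delta_o = (d - y_o) * y_o * (1.0 - y_o)
        for i, yi in enumerate(y_h):
            acc_o[i] += delta_o * yi
        # Hidden node j: delta_j(n) = phi'(v_j(n)) * delta_o(n) * w_oj (single output neuron).
        for j in range(len(W_h)):
            yj = y_h[j + 1]
            delta_j = yj * (1.0 - yj) * delta_o * w_o[j + 1]
            for i, yi in enumerate(y_in):
                acc_h[j][i] += delta_j * yi
    n = len(data)
    new_W_h = [[w + eta * a / n for w, a in zip(row, acc)] for row, acc in zip(W_h, acc_h)]
    new_w_o = [w + eta * a / n for w, a in zip(w_o, acc_o)]
    return new_W_h, new_w_o

xor_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
W_h = [[0.1, 0.5, -0.4], [-0.2, -0.3, 0.6]]
w_o = [1.0, 0.7, -0.5]

initial = average_energy(W_h, w_o, xor_data)
for _ in range(100):
    W_h, w_o = batch_epoch(W_h, w_o, xor_data, eta=0.5)
final = average_energy(W_h, w_o, xor_data)
```

All local gradients within an epoch are computed from the same weights, and the update applied at the end uses the averaged sum, which is the only place the factor $1/N$ appears.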