Continuing from the previous tutorials, this post walks through implementing the formulas we covered, step by step, in Python. I am using an IPython notebook here; this kind of notebook is one of the reasons I fell in love with Python. Code and the formulas it implements can sit side by side for comparison, and plots from experiments can go straight into the notebook as well, which makes it a wonderful tool. Next to each formula I attach the corresponding code; if anything is unclear, feel free to leave a comment and ask.

In [1]:
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
 
In [2]:
import numpy as np
 

Set the layer sizes of the network and initialize the parameters

In [3]:
sizes=[2,3,1]
num_layers = len(sizes)
biases = [np.random.randn(y, 1) for y in sizes[1:]]  # the input layer has no biases
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]  # weight matrix shapes: (3, 2) and (1, 3)
In [4]:
np.random.randn(2, 3) #Return a sample (or samples) from the “standard normal” distribution.
Out[4]:
array([[-0.40753557,  1.69663529,  1.68384902],
       [ 0.2270228 , -0.67659325, -1.56037246]])
 
  • The first array holds the hidden-layer biases; the second holds the output neuron's bias
In [5]:
biases
Out[5]:
[array([[ 1.67783431],
        [ 1.48388232],
        [ 1.25672544]]), array([[-0.37243728]])]
 
  • The first array holds the weights between the input layer and the hidden layer
  • The second array holds the weights between the hidden layer and the output layer
In [6]:
weights
Out[6]:
[array([[-0.67155003, -0.16412354],
        [ 1.03533557, -1.62712015],
        [ 0.81559388,  0.33848189]]),
 array([[ 1.55154725,  0.75984726, -0.83059525]])]
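Purely to make the dimensions explicit (this check is an addition, not one of the original notebook cells), the shapes follow directly from sizes = [2, 3, 1]:

print([b.shape for b in biases])   # [(3, 1), (1, 1)]: one bias column per non-input layer
print([w.shape for w in weights])  # [(3, 2), (1, 3)]: (neurons in layer l, neurons in layer l-1)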
 

Prepare zero matrices to store the computed partial derivatives

In [7]:
nabla_b = [np.zeros(b.shape) for b in biases]
nabla_b
Out[7]:
[array([[ 0.],
        [ 0.],
        [ 0.]]), array([[ 0.]])]
In [8]:
nabla_w = [np.zeros(w.shape) for w in weights]
nabla_w
Out[8]:
[array([[ 0.,  0.],
        [ 0.,  0.],
        [ 0.,  0.]]), array([[ 0.,  0.,  0.]])]
 

Define the helper functions

In [9]:
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))

def cost_derivative(output_activations, y):
    """Return the vector of partial derivatives \partial C_x / \partial a for the output activations."""
    return (output_activations - y)
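The cost function itself never appears in the code; cost_derivative is the gradient of the quadratic cost that the notebook implicitly assumes. Written out, that cost would look like this (a reference sketch, not an original cell):

def cost(output_activations, y):
    """Quadratic cost C_x = 0.5 * ||a - y||^2; its derivative w.r.t. a is (a - y)."""
    return 0.5 * np.sum((output_activations - y) ** 2)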
 

Create the training data x, y

In [10]:
np.random.seed(1)
x = 10 * np.random.randn(sizes[0], 1)
y = np.array([1])
print(x)
print(y)
 
[[ 16.24345364]
 [ -6.11756414]]
[1]
 

Feedforward pass

 

Matrix view of the feedforward pass: W (layer 2 × layer 1) · X (layer 1 × N) = Z (layer 2 × N), where N is the number of samples (here N = 1). The rows of W index the hidden neurons, its columns index the input neurons:

$$\begin{bmatrix} w_{0,0} & w_{0,1} \\ w_{1,0} & w_{1,1} \\ w_{2,0} & w_{2,1} \end{bmatrix}\begin{bmatrix} x_{0} \\ x_{1} \end{bmatrix} + \begin{bmatrix} b_{0} \\ b_{1} \\ b_{2} \end{bmatrix} = \begin{bmatrix} w_{0,0}x_{0} + w_{0,1}x_{1} + b_{0} \\ w_{1,0}x_{0} + w_{1,1}x_{1} + b_{1} \\ w_{2,0}x_{0} + w_{2,1}x_{1} + b_{2} \end{bmatrix}$$

Each row of the result is the pre-activation z of one hidden neuron.
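To see the diagram's shape bookkeeping in the actual arrays (a small sanity check added here, not an original cell), the hidden layer's pre-activations can be computed in one line:

z_hidden = np.dot(weights[0], x) + biases[0]  # (3, 2) dot (2, 1) + (3, 1) -> (3, 1)
print(z_hidden.shape)                         # (3, 1): one z per hidden neuron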
In [11]:
activation = x
activations = [x] # list to store all the activations, layer by layer
zs = [] # list to store all the z vectors, layer by layer
for b, w in zip(biases, weights):    
    z = np.dot(w, activation)+b
    zs.append(z)
    activation = sigmoid(z)
    activations.append(activation)
 
  • The z values (pre-activations) of each layer's neurons. zs holds two arrays; np.array(zs)[0] is the first of them, i.e. the hidden layer's z values
In [12]:
print(np.array(zs, dtype=object))
 
[array([[ -8.22642123],
       [ 28.25531946],
       [ 12.43410211]])
 array([[-0.44276705]])]
 
  • The activation values a of each layer's neurons
In [13]:
print(np.array(activations, dtype=object))
 
[array([[ 16.24345364],
       [ -6.11756414]])
 array([[  2.67420380e-04],
       [  1.00000000e+00],
       [  9.99996020e-01]])
 array([[ 0.39108184]])]
 

Backward pass (backpropagation)

 
  • Compute the last layer's sensitivity (delta) $$\delta_{j}^{L} = \frac{\partial E}{\partial a_{j}^{L}}\ f^{'}\left( z_{j}^{L} \right)$$
In [14]:
delta = cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])
delta
Out[14]:
array([[-0.14500584]])
 
  • The derivative of the total error with respect to the output-layer bias b is exactly the last layer's delta $$\frac{\partial E}{\partial b_{j}^{L}} = \delta_{j}^{L}$$
In [15]:
nabla_b[-1] = delta
 
  • Apply the formula to get the derivative of the total error with respect to the last layer's weights $$\frac{\partial E}{\partial w_{\text{jk}}^{l}} = a_{k}^{l - 1}\delta_{j}^{l}$$
 

nabla_w[j, k] = delta[j, 1] * a[1, k]  (a j × 1 column of deltas times a 1 × k row of activations)

In [16]:
nabla_w[-1] = np.dot(delta, activations[-2].transpose())
nabla_w[-1]
Out[16]:
array([[ -3.87775177e-05,  -1.45005844e-01,  -1.45005266e-01]])
 
  • Compute the derivative of the activation function at the second-to-last layer $$f^{'}\left( z_{k}^{L - 1} \right)$$
In [17]:
z = zs[-2]
f_prime = sigmoid_prime(z)
print(f_prime)
 
[[  2.67348866e-04]
 [  5.35571587e-13]
 [  3.98047234e-06]]
 
  • Substitute into the formula $$\delta_{k}^{l - 1} = f^{'}\left( z_{k}^{l - 1} \right)*\sum_{j}^{}{\delta_{j}^{l}w_{\text{jk}}^{l}}$$
 

wᵀ[k, j] · delta[j, 1] * f_prime[k, 1]  (matrix product over j, then an element-wise product with f_prime)

In [18]:
delta_l_1 = np.dot(weights[-1].transpose(), delta) * f_prime
delta_l_1
Out[18]:
array([[ -6.01490617e-05],
       [ -5.90105057e-14],
       [  4.79412722e-07]])
 

Store the computed delta in the nabla_b matrix

In [19]:
nabla_b[-2] = delta_l_1
print(nabla_b[-2])
 
[[ -6.01490617e-05]
 [ -5.90105057e-14]
 [  4.79412722e-07]]
 
  • Apply the formula to get the derivative of the total error with respect to the second-to-last layer's weights $$\frac{\partial E}{\partial w_{\text{ki}}^{l-1}} = a_{i}^{l - 2}\delta_{k}^{l-1}$$
 

nabla_w[k, i] = delta[k, 1] * a[1, i]

In [20]:
nabla_w[-2] = np.dot(delta_l_1, activations[-2-1].transpose())
print(nabla_w[-2])
 
[[ -9.77028495e-04   3.67965743e-04]
 [ -9.58534413e-13   3.61000553e-13]
 [  7.78731833e-06  -2.93283808e-06]]
 

Putting it all together in one function

In [21]:
def backprop(x, y):
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    # feedforward
    activation = x
    activations = [x] # list to store all the activations, layer by layer
    zs = [] # list to store all the z vectors, layer by layer
    for b, w in zip(biases, weights):
        z = np.dot(w, activation)+b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # backward pass
    delta = cost_derivative(activations[-1], y) * \
        sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    # note: l is indexed differently in the code; l = 1 is the last layer, l = 2 the second-to-last
    for l in range(2, num_layers):
        z = zs[-l]
        f_prime = sigmoid_prime(z)
        delta = np.dot(weights[-l+1].transpose(), delta) * f_prime
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
    return (nabla_b, nabla_w)
 

Pass x as the input and y as the target; the backprop helper returns the partial derivatives

In [22]:
nabla_b, nabla_w=backprop(x,y)
In [23]:
nabla_w
Out[23]:
[array([[ -9.77028495e-04,   3.67965743e-04],
        [ -9.58534413e-13,   3.61000553e-13],
        [  7.78731833e-06,  -2.93283808e-06]]),
 array([[ -3.87775177e-05,  -1.45005844e-01,  -1.45005266e-01]])]
In [24]:
nabla_b
Out[24]:
[array([[ -6.01490617e-05],
        [ -5.90105057e-14],
        [  4.79412722e-07]]), array([[-0.14500584]])]
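As a sanity check (an addition to the notebook, assuming the quadratic cost C = 0.5 * (a - y)^2 discussed above), the analytic gradients for the first weight matrix can be compared against numerical ones obtained by nudging each weight and re-running the forward pass; the two should agree to several decimal places:

def feedforward(a):
    """Forward pass only, returning the network output for input a."""
    for b, w in zip(biases, weights):
        a = sigmoid(np.dot(w, a) + b)
    return a

eps = 1e-6
numeric_nabla_w0 = np.zeros_like(weights[0])
for j in range(weights[0].shape[0]):
    for k in range(weights[0].shape[1]):
        weights[0][j, k] += eps                       # nudge one weight up
        c_plus = 0.5 * np.sum((feedforward(x) - y) ** 2)
        weights[0][j, k] -= 2 * eps                   # nudge it down
        c_minus = 0.5 * np.sum((feedforward(x) - y) ** 2)
        weights[0][j, k] += eps                       # restore the original value
        numeric_nabla_w0[j, k] = (c_plus - c_minus) / (2 * eps)

print(np.allclose(numeric_nabla_w0, nabla_w[0], atol=1e-8))  # expected True if backprop is correct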
 

Summary of the key formulas

 

Formula 1: the derivative of the total error with respect to a layer's weights, computed from that layer's delta and the previous layer's activations $$\frac{\partial E}{\partial w_{\text{jk}}^{l}} = a_{k}^{l - 1}\delta_{j}^{l}$$

 

nabla_w[j, k] = delta[j, 1] * a[1, k]

 

Formula 2: compute a layer's delta from the delta of the layer after it, which the backward pass has already computed $$\delta_{k}^{l - 1} = f^{'}\left( z_{k}^{l - 1} \right)*\sum_{j}^{}{\delta_{j}^{l}w_{\text{jk}}^{l}}$$

 

wᵀ[k, j] · delta[j, 1] * f_prime[k, 1]
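Once nabla_b and nabla_w are available, the natural next step is a gradient-descent update that moves every parameter against its gradient. A minimal sketch follows (the learning rate eta = 3.0 is an arbitrary choice for illustration, not something fixed by this notebook):

eta = 3.0  # learning rate, arbitrary for this sketch
nabla_b, nabla_w = backprop(x, y)
weights = [w - eta * nw for w, nw in zip(weights, nabla_w)]
biases = [b - eta * nb for b, nb in zip(biases, nabla_b)]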

 
