01. Neural Networks and Deep Learning, W2: Neural Network Basics (Assignment: Logistic Regression Image Recognition)

2021-02-19 11:47:03

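Contents

    • Programming Assignment 1
      • 1. Basic numpy functions
        • 1.1 Writing the sigmoid function
        • 1.2 Writing the derivative of the sigmoid function
        • 1.3 The reshape operation
        • 1.4 Normalization
        • 1.5 Broadcasting
      • 2. Vectorization
        • 2.1 L1 and L2 loss functions
    • Programming Assignment 2. Cat image recognition
      • 1. Importing packages
      • 2. Previewing the data
      • 3. General structure of the algorithm
      • 4. Building the algorithm
        • 4.1 Helper functions
        • 4.2 Initializing parameters
        • 4.3 Forward and backward propagation
        • 4.4 Updating parameters with gradient descent
        • 4.5 Merging all functions into a model
        • 4.6 Analysis
        • 4.7 Testing the model with your own image
      • 5. Summary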

For the multiple-choice quiz, see the linked blog post.

Programming Assignment 1

1. Basic numpy functions

1.1 Writing the sigmoid function
import math

def basic_sigmoid(x):
    """
    Compute sigmoid of x.

    Arguments:
    x -- A scalar

    Return:
    s -- sigmoid(x)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1/(1 + math.pow(math.e, -x))
# or  s = 1/(1 + math.exp(-x))
    ### END CODE HERE ###
    
    return s
  • The math package is not recommended: deep learning mostly works with vectors, and math functions cannot operate on vectors
### One reason why we use "numpy" instead of "math" in Deep Learning ###
x = [1, 2, 3]
basic_sigmoid(x) # you will see this give an error when you run it, because x is a vector.
# Raises a TypeError: a Python list does not support unary minus.
import numpy as np

# example of np.exp
x = np.array([1, 2, 3])
print(np.exp(x)) # result is (exp(1), exp(2), exp(3))
# [ 2.71828183  7.3890561  20.08553692]
# numpy can operate on vectors
  • The sigmoid function written with numpy
import numpy as np # this means you can access numpy functions by writing np.function() instead of numpy.function()

def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1/(1 + np.exp(-x))
    ### END CODE HERE ###
    
    return s
x = np.array([1, 2, 3])
sigmoid(x)
# array([0.73105858, 0.88079708, 0.95257413])
1.2 Writing the derivative of the sigmoid function
# GRADED FUNCTION: sigmoid_derivative

def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.
    
    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    s = sigmoid(x)
    ds = s*(1-s)
    ### END CODE HERE ###
    
    return ds
x = np.array([1, 2, 3])
sigmoid_derivative(x)
print ("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))
# sigmoid_derivative(x) = [0.19661193 0.10499359 0.04517666]
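One way to sanity-check the analytic derivative s*(1-s) is to compare it with a central finite difference. The following is a minimal sketch: sigmoid and sigmoid_derivative are restated so the snippet runs on its own, while numerical_derivative and the step size eps are illustrative names that are not part of the assignment.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

def numerical_derivative(f, x, eps=1e-6):
    # Central difference: (f(x + eps) - f(x - eps)) / (2 * eps)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = np.array([1.0, 2.0, 3.0])
analytic = sigmoid_derivative(x)
numeric = numerical_derivative(sigmoid, x)

# Largest disagreement between the two estimates; it should be tiny
max_gap = np.max(np.abs(analytic - numeric))
print(max_gap)
```

If the two estimates disagree by more than rounding noise, the analytic formula is the first place to look for a mistake.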
1.3 The reshape operation

Flatten the image data. A dimension you do not want to compute by hand can be set to -1, and numpy infers it automatically.

# GRADED FUNCTION: image2vector
def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)
    
    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    v = image.reshape(-1,1)
    ### END CODE HERE ###
    
    return v
# This is a 3 by 3 by 2 array, typically images will be (num_px_x, num_px_y,3) where 3 represents the RGB values
image = np.array([[[ 0.67826139,  0.29380381],
        [ 0.90714982,  0.52835647],
        [ 0.4215251 ,  0.45017551]],

       [[ 0.92814219,  0.96677647],
        [ 0.85304703,  0.52351845],
        [ 0.19981397,  0.27417313]],

       [[ 0.60659855,  0.00533165],
        [ 0.10820313,  0.49978937],
        [ 0.34144279,  0.94630077]]])

print ("image2vector(image) = " + str(image2vector(image)))
# Output
image2vector(image) = [[0.67826139]
 [0.29380381]
 [0.90714982]
 [0.52835647]
 [0.4215251 ]
 [0.45017551]
 [0.92814219]
 [0.96677647]
 [0.85304703]
 [0.52351845]
 [0.19981397]
 [0.27417313]
 [0.60659855]
 [0.00533165]
 [0.10820313]
 [0.49978937]
 [0.34144279]
 [0.94630077]]
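The -1 inference can be seen directly from the resulting shapes. This is a small sketch on a made-up batch (the array batch and its 4 x 8 x 8 x 3 size are assumptions for illustration only):

```python
import numpy as np

# Hypothetical batch: 4 images of 8 x 8 pixels with 3 channels
batch = np.arange(4 * 8 * 8 * 3).reshape(4, 8, 8, 3)

# One image flattened to a column vector; -1 is inferred as 8*8*3 = 192
v = batch[0].reshape(-1, 1)

# Whole batch flattened to one row per image; -1 is again inferred as 192
rows = batch.reshape(batch.shape[0], -1)

print(v.shape)     # (192, 1)
print(rows.shape)  # (4, 192)
```

Only one dimension may be given as -1, since numpy derives it from the total number of elements.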
1.4 Normalization

Normalization usually makes gradient descent converge faster.

# GRADED FUNCTION: normalizeRows

def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).
    
    Argument:
    x -- A numpy matrix of shape (n, m)
    
    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    # Compute x_norm as the norm 2 of x. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    
    # Divide x by its norm.
    x = x/x_norm
    ### END CODE HERE ###

    return x
x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))
# normalizeRows(x) = [[0.         0.6        0.8       ]
#                     [0.13736056 0.82416338 0.54944226]]
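The division in normalizeRows works because of keepdims=True. The sketch below shows the shapes involved and what happens without it; the variable names are illustrative.

```python
import numpy as np

x = np.array([[0.0, 3.0, 4.0],
              [1.0, 6.0, 4.0]])

# keepdims=True keeps the norms as a (2, 1) column, which broadcasts across each row
x_norm = np.linalg.norm(x, axis=1, keepdims=True)
normalized = x / x_norm

print(x_norm.shape)                         # (2, 1)
print(np.linalg.norm(normalized, axis=1))   # every row now has length 1

# Without keepdims the norms have shape (2,), which cannot broadcast against (2, 3)
flat_norm = np.linalg.norm(x, axis=1)
try:
    x / flat_norm
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)
```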
1.5 Broadcasting

See the official documentation.

# GRADED FUNCTION: softmax

def softmax(x):
    """Calculates the softmax for each row of the input x.

    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n,m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n,m)
    """
    
    ### START CODE HERE ### (≈ 3 lines of code)
    # Apply exp() element-wise to x. Use np.exp(...).
    x_exp = np.exp(x)

    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    
    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = x_exp/x_sum

    ### END CODE HERE ###
    
    return s
x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0 ,0]])
print("softmax(x) = " + str(softmax(x)))
# Output
softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04
  1.21052389e-04]
 [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
  8.01252314e-04]]
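The softmax above calls np.exp on the raw inputs, which can overflow for large values. A common variation, shown here as a sketch and not as part of the assignment, subtracts each row's maximum first; softmax_stable is a hypothetical name.

```python
import numpy as np

def softmax_stable(x):
    # Subtracting the row maximum leaves the result mathematically unchanged
    # but keeps np.exp from overflowing on large inputs.
    shifted = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

x = np.array([[9, 2, 5, 0, 0],
              [7, 5, 0, 0, 0]])
s = softmax_stable(x)
print(s.sum(axis=1))  # each row sums to 1

# Inputs this large would overflow np.exp without the shift
big = np.array([[1000.0, 1001.0, 1002.0]])
s_big = softmax_stable(big)
print(s_big)
```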

2. Vectorization

Vectorized computation is more concise and more efficient.

2.1 L1 and L2 loss functions
def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L1 loss function defined above
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    loss = np.sum(abs(yhat-y))
    ### END CODE HERE ###
    
    return loss
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat,y)))
# L1 = 1.1

import numpy as np
a = np.array([1, 2, 3])
np.dot(a, a)
# 14
# GRADED FUNCTION: L2

def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L2 loss function defined above
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    loss = np.dot(yhat-y, yhat-y)
    ### END CODE HERE ###
    
    return loss
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(L2(yhat,y)))
# L2 = 0.43
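To make the point about vectorization concrete, the sketch below computes the same two losses with explicit Python loops and with numpy expressions. The loop versions l1_loop and l2_loop are illustrative helpers, not part of the assignment.

```python
import numpy as np

def l1_loop(yhat, y):
    # Explicit loop: sum of absolute differences
    total = 0.0
    for i in range(len(y)):
        total += abs(yhat[i] - y[i])
    return total

def l2_loop(yhat, y):
    # Explicit loop: sum of squared differences
    total = 0.0
    for i in range(len(y)):
        total += (yhat[i] - y[i]) ** 2
    return total

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])

# Vectorized equivalents: one expression each, no Python-level loop
l1_vec = np.sum(np.abs(yhat - y))
l2_vec = np.dot(yhat - y, yhat - y)

print(l1_loop(yhat, y), l1_vec)
print(l2_loop(yhat, y), l2_vec)
```

Both forms give the same values; the vectorized ones are shorter and push the loop into numpy's compiled code.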

Programming Assignment 2. Cat image recognition

Recognize cats using a neural network.

1. Importing packages

import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset

%matplotlib inline

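2. Previewing the data

The steps are: work out the dimensions of the data, reshape the data, and normalize the data.

The training set is labeled y = 1 for cat and y = 0 for non-cat. The test set is also labeled. Each image has 3 channels.

  • Loading the data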
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
  • Previewing an image
# Example of a picture
index = 24
plt.imshow(train_set_x_orig[index])
print ("y = " + str(train_set_y[:, index]) + ", it's a '" +
       classes[np.squeeze(train_set_y[:, index])].decode("utf-8") +
       "' picture.")
# Output
y = [1], it's a 'cat' picture.
  • Size of the data
### START CODE HERE ### (≈ 3 lines of code)
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
### END CODE HERE ###

print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
# Output
Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Height/Width of each image: num_px = 64
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)
  • Flattening the image arrays of the examples
# Reshape the training and test examples

### START CODE HERE ### (≈ 2 lines of code)
train_set_x_flatten = train_set_x_orig.reshape(m_train, -1).T
test_set_x_flatten = test_set_x_orig.reshape(m_test, -1).T
### END CODE HERE ###

print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
print ("sanity check after reshaping: " + str(train_set_x_flatten[0:5,0]))
# Output
train_set_x_flatten shape: (12288, 209)
train_set_y shape: (1, 209)
test_set_x_flatten shape: (12288, 50)
test_set_y shape: (1, 50)
sanity check after reshaping: [17 31 56 22 33]
  • Pixel values range from 0 to 255, so normalize the data by dividing by 255
train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.
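The flattening step uses reshape(m, -1).T and not reshape(-1, m). Both give the same shape, but only the first keeps each image in its own column. The sketch below checks this on a small synthetic array standing in for train_set_x_orig; the sizes and variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical stand-in for train_set_x_orig: 5 "images" of 4 x 4 pixels, 3 channels
m, num_px = 5, 4
x_orig = np.arange(m * num_px * num_px * 3).reshape(m, num_px, num_px, 3)

x_flatten = x_orig.reshape(m, -1).T   # one column per image
x_wrong = x_orig.reshape(-1, m)       # same shape, but pixels of different images get mixed

x_scaled = x_flatten / 255.           # values now lie in [0, 1]

# Column i of the correct version can be reshaped back into image i
restored = x_flatten[:, 2].reshape(num_px, num_px, 3)
print(np.array_equal(restored, x_orig[2]))   # True
print(x_flatten.shape, x_wrong.shape)        # (48, 5) (48, 5)
print(np.array_equal(x_flatten, x_wrong))    # False
```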

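3. General structure of the algorithm

Build a logistic regression using a neural-network mindset.

4. Building the algorithm

Define the model structure (for example, the number of input features), initialize the model parameters, then loop:

  1. Compute the current loss (forward propagation)
  2. Compute the current gradient (backward propagation)
  3. Update the parameters (gradient descent)
4.1 Helper functions
  • The sigmoid function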
# GRADED FUNCTION: sigmoid

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    ### START CODE HERE ### (≈ 1 line of code)
    s = 1/(1 + np.exp(-z))
    ### END CODE HERE ###
    
    return s
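4.2 Initializing parameters

All parameters of logistic regression can be initialized to 0. A neural network with hidden layers cannot be initialized this way, because identical zero weights make every hidden unit compute the same thing.
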
# GRADED FUNCTION: initialize_with_zeros

def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    
    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)
    
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros((dim, 1))
    b = 0
    ### END CODE HERE ###

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    
    return w, b
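The remark that zero initialization does not work for neural networks can be illustrated with a small experiment. The sketch below is not part of the assignment: it trains a hypothetical one-hidden-layer network (4 sigmoid units, made-up data X and Y) from all-zero parameters and checks that every hidden unit ends up with identical weights.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 20))      # 3 features, 20 made-up examples
Y = np.zeros((1, 20))
Y[0, :5] = 1.0                    # 5 positive labels, 15 negative
m = X.shape[1]

# One hidden layer with 4 units, every parameter initialized to zero
W1, b1 = np.zeros((4, 3)), np.zeros((4, 1))
W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

for _ in range(50):
    # Forward pass
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    # Backward pass for the cross-entropy cost
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    # Gradient descent step
    W1 -= 0.5 * dW1
    b1 -= 0.5 * db1
    W2 -= 0.5 * dW2
    b2 -= 0.5 * db2

# The weights did move, yet all hidden units are still copies of each other
print(np.any(W1 != 0), np.allclose(W1, W1[0]))
```

Logistic regression has no hidden units, so this symmetry problem does not arise and zeros are a fine starting point.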
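4.3 Forward and backward propagation

Forward propagation computes the activation and the cost; backward propagation computes the gradients. Both are implemented in propagate:
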
# GRADED FUNCTION: propagate

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    
    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    
    m = X.shape[1]
    
    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    
    A = sigmoid(np.dot(w.T, X) + b)             # compute activation
    # w is a column vector and A is a row vector; np.dot performs matrix multiplication
    cost = np.sum(Y*np.log(A) + (1-Y)*np.log(1-A))/(-m)  # compute cost
    # Y is a row vector; * multiplies element-wise

    ### END CODE HERE ###
    
    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = np.dot(X, (A-Y).T)/m
    db = np.sum(A-Y, axis=1)/m
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    
    grads = {"dw": dw,
             "db": db}
    
    return grads, cost
w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
# Output
dw = [[0.99993216]
 [1.99980262]]
db = [0.49993523]
cost = 6.000064773192205
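For reference, the quantities that propagate computes can be written out as follows. The formulas are reconstructed from the code above, with $a^{(i)}$ denoting the i-th entry of A, $y^{(i)}$ the i-th label, and $m$ the number of examples:

```latex
\begin{aligned}
A &= \sigma(w^T X + b) \\
J &= -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log a^{(i)} + \left(1-y^{(i)}\right)\log\left(1-a^{(i)}\right) \right] \\
\frac{\partial J}{\partial w} &= \frac{1}{m} X (A - Y)^T \\
\frac{\partial J}{\partial b} &= \frac{1}{m}\sum_{i=1}^{m}\left(a^{(i)} - y^{(i)}\right)
\end{aligned}
```

In the test call above both activations are very close to 1 while the second label is 0, which is why the cost is as large as about 6.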
4.4 Updating parameters with gradient descent
# GRADED FUNCTION: optimize

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps
    
    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
    
    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """
    
    costs = []
    
    for i in range(num_iterations):
        
        
        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ### 
        grads, cost = propagate(w, b, X, Y)
        ### END CODE HERE ###
        
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        
        # update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        ### END CODE HERE ###
        
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        
        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    
    params = {"w": w,
              "b": b}
    
    grads = {"dw": dw,
             "db": db}
    
    return params, grads, costs
params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
# Output
w = [[0.1124579 ]
 [0.23106775]]
b = [1.55930492]
dw = [[0.90158428]
 [1.76250842]]
db = [0.43046207]
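Because gradient descent relies on dw and db being correct, a finite-difference check is a useful safeguard before running the optimizer. The sketch below restates the propagate computation as cost_and_grads so that it runs on its own; that name, the small test values of w and b, and the step size eps are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_grads(w, b, X, Y):
    # Same computation as propagate(), restated so the snippet is self-contained
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return cost, dw, db

w = np.array([[0.1], [-0.2]])
b = 0.3
X = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.array([[1, 0]])

cost, dw, db = cost_and_grads(w, b, X, Y)

# Central differences: perturb one parameter at a time and watch the cost
eps = 1e-6
dw_num = np.zeros_like(w)
for k in range(w.shape[0]):
    step = np.zeros_like(w)
    step[k, 0] = eps
    c_plus, _, _ = cost_and_grads(w + step, b, X, Y)
    c_minus, _, _ = cost_and_grads(w - step, b, X, Y)
    dw_num[k, 0] = (c_plus - c_minus) / (2 * eps)

c_plus, _, _ = cost_and_grads(w, b + eps, X, Y)
c_minus, _, _ = cost_and_grads(w, b - eps, X, Y)
db_num = (c_plus - c_minus) / (2 * eps)

print(np.max(np.abs(dw - dw_num)), abs(db - db_num))  # both should be tiny
```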
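  • The learned parameters can be used to make predictions

Compute the prediction $\hat{Y} = A = \sigma(w^T X + b)$, then classify from it: label 0 if the value is <= 0.5, otherwise 1.
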
# GRADED FUNCTION: predict

def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    
    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    
    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0], 1)
    
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid(np.dot(w.T, X) + b)
    ### END CODE HERE ###
    
    for i in range(A.shape[1]):
        
        # Convert probabilities A[0,i] to actual predictions p[0,i]
        ### START CODE HERE ### (≈ 4 lines of code)
        Y_prediction[0][i] = 0 if A[0][i] <= 0.5 else 1
        ### END CODE HERE ###
    
    assert(Y_prediction.shape == (1, m))
    
    return Y_prediction
print ("predictions = " + str(predict(w, b, X)))
# Output
predictions = [[1. 1.]]
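The per-example loop in predict can also be written as a single comparison. This is a sketch of that alternative, not the assignment's required form; predict_vectorized is a hypothetical name, and the values of w and b are copied from the optimize output shown earlier.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_vectorized(w, b, X):
    # Probabilities for all examples at once, shape (1, m)
    A = sigmoid(np.dot(w.T, X) + b)
    # The comparison replaces the loop: <= 0.5 maps to 0, anything larger to 1
    return (A > 0.5).astype(float)

w = np.array([[0.1124579], [0.23106775]])
b = 1.55930492
X = np.array([[1, 2], [3, 4]])

predictions = predict_vectorized(w, b, X)
print("predictions = " + str(predictions))
```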
4.5 Merging all functions into a model
# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously
    
    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations
    
    Returns:
    d -- dictionary containing information about the model.
    """
    
    ### START CODE HERE ###
    
    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost = print_cost)
    
    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    
    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    ### END CODE HERE ###

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)
# Output
Cost after iteration 0: 0.693147
Cost after iteration 100: 0.584508
Cost after iteration 200: 0.466949
Cost after iteration 300: 0.376007
Cost after iteration 400: 0.331463
Cost after iteration 500: 0.303273
Cost after iteration 600: 0.279880
Cost after iteration 700: 0.260042
Cost after iteration 800: 0.242941
Cost after iteration 900: 0.228004
Cost after iteration 1000: 0.214820
Cost after iteration 1100: 0.203078
Cost after iteration 1200: 0.192544
Cost after iteration 1300: 0.183033
Cost after iteration 1400: 0.174399
Cost after iteration 1500: 0.166521
Cost after iteration 1600: 0.159305
Cost after iteration 1700: 0.152667
Cost after iteration 1800: 0.146542
Cost after iteration 1900: 0.140872
train accuracy: 99.04306220095694 %
test accuracy: 70.0 %
  • The model performs very well on the training set but only moderately on the test set, which indicates overfitting
# Example of a test-set picture with its predicted label (index 24 happens to be classified correctly).
index = 24
plt.imshow(test_set_x[:,index].reshape((num_px, num_px, 3)))
print ("y = " + str(test_set_y[0,index]) + ", you predicted that it is a \"" + classes[int(d["Y_prediction_test"][0,index])].decode("utf-8") + "\" picture.")
# Output
y = 1, you predicted that it is a "cat" picture.

Change index to inspect the predicted and true labels of other test examples.

  • Plotting the cost over the course of gradient descent
# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()
  • Increasing the number of training iterations to 3000 (2000 above) gives:
train accuracy: 99.52153110047847 %
test accuracy: 68.0 %

Training accuracy rises while test accuracy falls: this is overfitting.

4.6 Analysis
  • Comparison across different learning rates
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()
# Output
learning rate is: 0.01
train accuracy: 99.52153110047847 %
test accuracy: 68.0 %

-------------------------------------------------------

learning rate is: 0.001
train accuracy: 88.99521531100478 %
test accuracy: 64.0 %

-------------------------------------------------------

learning rate is: 0.0001
train accuracy: 68.42105263157895 %
test accuracy: 36.0 %

-------------------------------------------------------
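  • A learning rate that is too large easily causes oscillation and can prevent convergence (here 0.01 is not too bad and converges in the end)
  • A low cost does not imply a good model; check for overfitting (very good on the training set, poor on the test set)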
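The effect of the learning rate is easiest to see on a toy problem. The sketch below runs gradient descent on the one-dimensional function f(w) = w**2, which is an assumed stand-in and not the cat model; gradient_descent and the three rates are illustrative choices.

```python
import numpy as np

def gradient_descent(learning_rate, steps=20, w0=1.0):
    # Minimize f(w) = w**2, whose gradient is 2*w
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - learning_rate * 2 * w
        history.append(w)
    return np.array(history)

small = gradient_descent(0.01)   # shrinks slowly toward 0
good = gradient_descent(0.4)     # converges quickly
large = gradient_descent(1.1)    # overshoots: the sign flips and the magnitude grows

print(abs(small[-1]), abs(good[-1]), abs(large[-1]))
```

With the smallest rate the iterate has barely moved after 20 steps, and with the largest it oscillates around the minimum and diverges, which mirrors the trade-off described in the bullets above.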
4.7 Testing the model with your own image
## START CODE HERE ## (PUT YOUR IMAGE NAME) 
my_image = "cat1.jpg"   # change this to the name of your image file 
## END CODE HERE ##

# We preprocess the image to fit your algorithm.
fname = "images/" + my_image
image = Image.open(fname)
# Scale pixels to [0, 1] to match the normalized training data
my_image = np.array(image.resize((num_px, num_px))).reshape((1, num_px*num_px*3)).T / 255.
my_predicted_image = predict(d["w"], d["b"], my_image)

plt.imshow(image)
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.")

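5. Summary

  • Data handling matters: data dimensions and data normalization
  • Write each function separately: initialization, forward and backward propagation, parameter updates with gradient descent
  • Combine them into a model
  • Tune hyperparameters such as the learning rate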
