Machine Learning Experiment 5: Regularization, Explained with a Code Implementation

2020-11-13 16:18:27

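  1. Why introduce regularization?

Linear regression and logistic regression can both overfit: the error on the training set is small, but the error on the test set is large. Adding a regularization term to the cost function counters overfitting, so that performance on the test set stays close to performance on the training set.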

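For example, a linear regression can be fitted with polynomials of different degrees (the equations and the fitted curves were shown as figures in the original post).

Clearly, a low-degree polynomial tends to underfit, while a high-degree polynomial tends to overfit.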

We therefore add a regularization term to the cost function.

Other regularization terms can be used as well.
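The formula images from the original post are not reproduced here. As a sketch of what such terms look like, the L2 (ridge) penalty used in the rest of this post can be written as below, where J_0 stands for the unregularized cost. The L1 (lasso) penalty is included only as one common alternative; choosing it as the example is an assumption and is not taken from the original, and scaling conventions for both terms vary.

```latex
% L2 (ridge) penalty, the form used in this post; \theta_0 is not penalized
J_{\mathrm{L2}}(\theta) = J_0(\theta) + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2

% L1 (lasso) penalty, shown only as an example of another regularizer
J_{\mathrm{L1}}(\theta) = J_0(\theta) + \frac{\lambda}{m}\sum_{j=1}^{n}\lvert\theta_j\rvert
```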

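  2. Regularization for linear regression

(1) First, plot the data (the plot was a figure in the original post).

We can see that there are only 7 data points, so the model can easily overfit (the larger the training set, the less likely overfitting becomes).

(2) Fit a fifth-degree polynomial by linear regression: h(x) = θ0 + θ1·x + θ2·x^2 + θ3·x^3 + θ4·x^4 + θ5·x^5.

This is still linear regression because the hypothesis is a linear combination of the powers of x. The original x is a one-dimensional feature, so we expand it into a six-dimensional vector: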

m=length(y);
x=[ones(m,1),x,x.^2,x.^3,x.^4,x.^5];

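With this construction, the columns of x are linearly independent (provided there are at least six distinct x values) and h(x) is a linear combination of them, so the problem becomes a multivariate linear regression.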
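As a quick check of the construction above, the sketch below builds the same six-column matrix for seven made-up, evenly spaced x values (placeholders, not the ex5Linx.dat data) and prints its shape and rank. The names x_raw and X are chosen here for illustration only.

```matlab
% Hypothetical inputs: 7 evenly spaced points standing in for the real data
x_raw = linspace(-1, 1, 7)';
m = length(x_raw);

% Same construction as in the post: columns 1, x, x^2, ..., x^5
X = [ones(m,1), x_raw, x_raw.^2, x_raw.^3, x_raw.^4, x_raw.^5];

disp(size(X))   % number of examples by number of features
disp(rank(X))   % full column rank means the columns are linearly independent
```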

(3) Cost function
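The formula image is missing from this copy of the post. Judging from the normal-equation code used in step (5), the regularized cost presumably has the standard form below. The 1/(2m) scaling is an assumption; it does not change the minimizer. The penalty sum starts at j = 1, so θ0 is excluded.

```latex
J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]
```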

Here λ is the regularization parameter.

(4) Solve with the normal equation
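The closed-form solution was also shown as an image. Written out so that it matches the MATLAB code in step (5), where the identity matrix has its first diagonal entry set to zero, it reads:

```latex
\theta = \left(X^{T}X + \lambda
\begin{bmatrix}
0 &   &        &   \\
  & 1 &        &   \\
  &   & \ddots &   \\
  &   &        & 1
\end{bmatrix}\right)^{-1} X^{T}y
```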

Note: in the matrix that multiplies λ, the entry for θ0 is zero, i.e. θ0 is not penalized.

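The choice of λ matters: if it is too large, all the penalized parameters are driven toward zero and the model degenerates into the constant θ0, which means it underfits.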
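The effect of a very large λ can be seen on a toy problem. The sketch below uses made-up y values (not the ex5Lin data set) and a deliberately huge λ. Under these assumptions the penalized coefficients should come out close to zero, and the unpenalized θ0 should come out close to the mean of y, which is the best constant fit in the least-squares sense.

```matlab
% Made-up data for illustration only (not the ex5Lin data set)
x_raw = linspace(-1, 1, 7)';
y = [0.9; 0.2; 0.5; -0.1; 0.4; 0.8; 0.3];
m = length(y);
X = [ones(m,1), x_raw, x_raw.^2, x_raw.^3, x_raw.^4, x_raw.^5];

% Penalty matrix: identity with the (1,1) entry zeroed so theta0 is free
D = eye(6);
D(1,1) = 0;

lambda_big = 1e8;                          % deliberately huge
theta_big = (X'*X + lambda_big*D) \ (X'*y);

disp(theta_big')   % penalized entries are close to 0
disp(mean(y))      % compare with the first entry of theta_big
```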

(5) Computation:

lambda=1;
Lambda=lambda.*eye(6);
Lambda(1)=0;   % do not penalize theta0
theta=(x'*x+Lambda)\(x'*y)   % regularized normal equation
figure;
x_=(minx:0.01:maxx)';
x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5];
hold on
plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
plot(x_,x_1*theta,'--b','LineWidth',2);
legend({'data','5-th line'})
title('lambda=1')
xlabel('x')
ylabel('y')
hold off

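Here λ takes the values 0, 1 and 10. The fitted curves were shown as figures in the original post.

The computed parameters are: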

Theta(λ=0) = 6×1
    0.4725
    0.6814
   -1.3801
   -5.9777
    2.4417
    4.7371

Theta(λ=1) = 6×1
    0.3976
   -0.4207
    0.1296
   -0.3975
    0.1753
   -0.3394

Theta(λ=10) = 6×1
    0.5205
   -0.1825
    0.0606
   -0.1482
    0.0743
   -0.1280

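With λ=0 the curve passes very close to the data points but clearly overfits. With λ=1 the data points are spread fairly evenly on both sides of the curve. With λ=10 the curve clearly underfits.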
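The shrinking effect can also be read off the numbers directly. The sketch below takes the three parameter vectors reported above and compares the Euclidean norm of their penalized part (everything except θ0). The variable names are chosen here for illustration.

```matlab
% Parameter vectors copied from the results listed above
theta_0  = [0.4725;  0.6814; -1.3801; -5.9777; 2.4417;  4.7371];
theta_1  = [0.3976; -0.4207;  0.1296; -0.3975; 0.1753; -0.3394];
theta_10 = [0.5205; -0.1825;  0.0606; -0.1482; 0.0743; -0.1280];

% Norm of the penalized part only (theta0 is excluded, as in the cost function)
n0  = norm(theta_0(2:end));
n1  = norm(theta_1(2:end));
n10 = norm(theta_10(2:end));

fprintf('lambda=0: %.4f  lambda=1: %.4f  lambda=10: %.4f\n', n0, n1, n10);
```

The norm drops as λ grows, which is what the penalty term is designed to do.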

  3. Regularization for logistic regression

(1) Plot the raw data

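In the plot, '+' marks positive examples and 'o' marks negative examples.

They are plotted as follows:

pos = find(y); neg = find(y == 0);   % indices of the two classes
plot(x(pos,1), x(pos,2), '+')
hold on
plot(x(neg,1), x(neg,2), 'o')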

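(2) The hypothesis function and the feature mapping

Note: x is a two-dimensional vector. Here we map it to a higher-dimensional vector that contains all monomials of its two components up to degree 6. The feature-mapping function is:

function out = map_feature(feat1, feat2)
% Map two features to all polynomial terms feat1^(i-j) * feat2^j up to degree 6
degree = 6;
out = ones(size(feat1(:,1)));   % constant column
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (feat1.^(i-j)).*(feat2.^j);
    end
end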
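To see how many features this mapping produces, the sketch below repeats the same double loop but only counts the columns, and compares the result with the closed-form count of monomials in two variables up to a given degree. The variable names are illustrative; the count should agree with the "117x28" size noted in the iteration code of this post.

```matlab
degree = 6;

% Count the columns produced by the double loop: one constant column,
% then i+1 monomials u^(i-j) * v^j for each total degree i = 1..degree
n_features = 1;
for i = 1:degree
    for j = 0:i
        n_features = n_features + 1;
    end
end

% Closed-form count of monomials in two variables up to the given degree
n_closed = (degree + 1) * (degree + 2) / 2;

disp([n_features, n_closed])
```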

(3) The regularized cost function
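The formula itself was an image in the original. Read off from the line that computes J(i) in the iteration code of this post, the regularized cost is as follows. The sum over j starts at 1, so θ0 is not penalized.

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\bigl(1-h_\theta(x^{(i)})\bigr)\Bigr] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2,
\qquad h_\theta(x) = \frac{1}{1+e^{-\theta^{T}x}}
```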

θ is updated with Newton's method, where H is the Hessian matrix of J and the other term is the gradient of J.
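In symbols, and consistent with the grad and H lines of the iteration code in this post, the update can be sketched as follows. Here h denotes the vector of sigmoid outputs for all training examples and ⊙ the element-wise product; this notation is chosen here and may differ from the original figure.

```latex
\theta^{(t+1)} = \theta^{(t)} - H^{-1}\,\nabla_\theta J

\nabla_\theta J = \frac{1}{m}X^{T}(h - y) + \frac{\lambda}{m}\begin{bmatrix}0\\ \theta_1\\ \vdots\\ \theta_n\end{bmatrix}

H = \frac{1}{m}X^{T}\,\mathrm{diag}\bigl(h\odot(1-h)\bigr)\,X + \frac{\lambda}{m}\begin{bmatrix}0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1\end{bmatrix}
```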

(4) Iterative solution

[m, n] = size(x);
theta = zeros(n, 1);
g = @(z)(1.0 ./ (1.0 + exp(-z)));   % sigmoid
% disp(theta)
lambda = 0;
iteration = 20;
J = zeros(iteration, 1);
for i = 1:iteration
    z = x*theta;   % x: 117x28, theta: 28x1
    h = g(z);      % sigmoid hypothesis
    % Calculate J (for testing convergence)
    J(i) = -(1/m)*sum(y.*log(h) + (1-y).*log(1-h)) + ...
        (lambda/(2*m))*norm(theta(2:end))^2;   % theta0 is excluded
    % norm returns the Euclidean norm of the vector
    % Calculate gradient and Hessian
    G = (lambda/m).*theta; G(1) = 0;    % regularization part of the gradient
    L = (lambda/m).*eye(n); L(1) = 0;   % regularization part of the Hessian
    grad = ((1/m).*x' * (h-y)) + G;
    H = ((1/m).*x'*diag(h)*diag(1-h)*x) + L;
    % Here is the actual update
    theta = theta - H\grad;
end

Once θ has been computed, plot the decision boundary to visualize the result.
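The loop above needs the ex5Log data files. As a self-contained sketch of the same Newton iteration, the example below runs it on a tiny made-up one-dimensional data set (eight points, not linearly separable) with λ = 1. The data, the iteration count and the names X, x_raw and final_grad are assumptions made for illustration.

```matlab
% Made-up 1-D data, not linearly separable (illustration only)
x_raw = [-2; -1.5; -1; -0.5; 0.5; 1; 1.5; 2];
y     = [ 0;  0;    1;  0;   1;   0; 1;   1];
X = [ones(size(x_raw)), x_raw];          % intercept column plus one feature
[m, n] = size(X);

g = @(z) 1.0 ./ (1.0 + exp(-z));         % sigmoid
lambda = 1;
iterations = 10;
theta = zeros(n, 1);
J = zeros(iterations, 1);

for k = 1:iterations
    h = g(X*theta);

    % Cost at the current theta (theta(1) is not penalized)
    J(k) = -(1/m)*sum(y.*log(h) + (1-y).*log(1-h)) ...
           + (lambda/(2*m))*norm(theta(2:end))^2;

    % Gradient and Hessian with the intercept entry left unregularized
    G = (lambda/m).*theta;   G(1) = 0;
    L = (lambda/m).*eye(n);  L(1,1) = 0;
    grad = (1/m).*(X'*(h - y)) + G;
    H = (1/m).*(X'*diag(h.*(1-h))*X) + L;

    theta = theta - H\grad;              % Newton step
end

% Gradient at the final theta, as a convergence check
h = g(X*theta);
G = (lambda/m).*theta;  G(1) = 0;
final_grad = (1/m).*(X'*(h - y)) + G;

disp(J')
disp(norm(final_grad))
```

Printing J makes it easy to see whether the cost has levelled off, and a gradient norm near zero indicates that the iteration has reached a stationary point of the regularized cost.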
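(5) Results

λ again takes the values 0, 1 and 10. The decision boundary is drawn as a contour line with MATLAB's contour function. When evaluating the grid, each point (u, v) must go through the same feature mapping as the training data, as follows:

% Define the ranges of the grid
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);
% Initialize space for the values to be plotted (rows follow v, columns follow u)
z = zeros(length(v), length(u));
% Evaluate z = theta*x over the grid
for i = 1:length(u)
    for j = 1:length(v)
        % Notice the order of j, i here!
        z(j,i) = map_feature(u(i), v(j))*theta;
    end
end

The resulting plots were shown as figures in the original post.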
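The index order in the loop matters because contour(u, v, z) expects z to have one row per value of v and one column per value of u, which is also the layout that meshgrid produces. The sketch below checks this with a stand-in function f in place of map_feature(u, v)*theta and with grids of different lengths (both are assumptions made for illustration), so that a wrong orientation would show up as a size mismatch.

```matlab
% Stand-in for map_feature(u, v)*theta: any function of (u, v) will do here
f = @(u, v) u.^2 + 3*v;

% Deliberately different grid lengths so that a wrong orientation would show up
u = linspace(-1, 1.5, 5);
v = linspace(-1, 1.5, 4);

% Loop version, same index order as in the post: rows follow v, columns follow u
z = zeros(length(v), length(u));
for i = 1:length(u)
    for j = 1:length(v)
        z(j,i) = f(u(i), v(j));
    end
end

% meshgrid produces the layout that contour(u, v, z) expects
[U, V] = meshgrid(u, v);
z_ref = f(U, V);

disp(size(z))
disp(max(abs(z(:) - z_ref(:))))
```

With z filled as z(j,i), no further transpose is needed before calling contour.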
Again, we can see that the model overfits for λ=0 and underfits for λ=10.

Appendix: source code
1. Linear regression with regularization

clc,clear
x=load("ex5Linx.dat");
y=load("ex5Liny.dat");
x0=x; y0=y;   % keep the raw data for plotting
figure;
plot(x, y, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
title('training data')
xlabel('x')
ylabel('y')
minx=min(x);
maxx=max(x);
m=length(y);
x=[ones(m,1),x,x.^2,x.^3,x.^4,x.^5];   % design matrix, m x 6
disp(size(x(1,:)))   % 1x6
theta=zeros(size(x(1,:)));   % placeholder, overwritten below

% ---- lambda = 0 ----
lambda=0;
Lambda=lambda.*eye(6);
Lambda(1)=0;   % do not penalize theta0
theta=(x'*x+Lambda)\(x'*y)   % regularized normal equation
figure;
x_=(minx:0.01:maxx)';
x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5];
hold on
plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
plot(x_,x_1*theta,'--b','LineWidth',2);
legend({'data','5-th line'})
title('lambda=0')
xlabel('x')
ylabel('y')
hold off

% ---- lambda = 1 ----
lambda=1;
Lambda=lambda.*eye(6);
Lambda(1)=0;
theta=(x'*x+Lambda)\(x'*y)
figure;
x_=(minx:0.01:maxx)';
x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5];
hold on
plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
plot(x_,x_1*theta,'--b','LineWidth',2);
legend({'data','5-th line'})
title('lambda=1')
xlabel('x')
ylabel('y')
hold off

% ---- lambda = 10 ----
lambda=10;
Lambda=lambda.*eye(6);
Lambda(1)=0;
theta=(x'*x+Lambda)\(x'*y)
figure;
x_=(minx:0.01:maxx)';
x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5];
hold on
plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
plot(x_,x_1*theta,'--b','LineWidth',2);
legend({'data','5-th line'})
title('lambda=10')
xlabel('x')
ylabel('y')
hold off

2. Logistic regression with regularization

clc,clear;
x = load('ex5Logx.dat');
y = load('ex5Logy.dat');
x0=x;   % keep the raw data for plotting
y0=y;
figure
% Find the indices for the 2 classes
pos = find(y); neg = find(y == 0);
plot(x(pos,1),x(pos,2),'+')
hold on
plot(x(neg,1),x(neg,2),'o')
u=x(:,1);
v=x(:,2);
x = map_feature(u,v);   % 117x28 after the feature mapping
[m, n] = size(x);
theta = zeros(n, 1);
g = @(z)(1.0 ./ (1.0 + exp(-z)));   % sigmoid
% disp(theta)

% ---- lambda = 0 ----
lambda=0;
iteration=20;
J = zeros(iteration, 1);
for i=1:iteration
    z = x*theta;   % x: 117x28, theta: 28x1
    h = g(z);      % sigmoid hypothesis

    % Calculate J (for testing convergence)
    J(i) = -(1/m)*sum(y.*log(h) + (1-y).*log(1-h)) + ...
        (lambda/(2*m))*norm(theta(2:end))^2;   % theta0 is excluded
    % norm returns the Euclidean norm of the vector

    % Calculate gradient and Hessian
    G = (lambda/m).*theta; G(1) = 0;    % regularization part of the gradient
    L = (lambda/m).*eye(n); L(1) = 0;   % regularization part of the Hessian

    grad = ((1/m).*x' * (h-y)) + G;
    H = ((1/m).*x'*diag(h)*diag(1-h)*x) + L;

    % Here is the actual update
    theta = theta - H\grad;
end
% Define the ranges of the grid
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);

% Initialize space for the values to be plotted (rows follow v, columns follow u)
z = zeros(length(v), length(u));

% Evaluate z = theta*x over the grid
for i = 1:length(u)
    for j = 1:length(v)
        % Notice the order of j, i here!
        z(j,i) = map_feature(u(i), v(j))*theta;
    end
end
% z(j,i) already has rows indexed by v and columns indexed by u, which is
% the layout contour(u,v,z) expects, so z is not transposed here
% Plot z = 0 by specifying the range [0, 0]
contour(u,v,z,[0,0], 'LineWidth', 2)
xlim([-1.00 1.50])
ylim([-0.8 1.20])
legend({'y=1','y=0','Decision Boundary'})
title('lambda=0')
xlabel('u')
ylabel('v')

% ---- lambda = 1 ----
% theta is not reset here, so Newton's method starts from the lambda=0 solution
lambda=1;
% lambda=10
iteration=20;
J = zeros(iteration, 1);
for i=1:iteration
    z = x*theta;   % x: 117x28, theta: 28x1
    h = g(z);      % sigmoid hypothesis

    % Calculate J (for testing convergence)
    J(i) = -(1/m)*sum(y.*log(h) + (1-y).*log(1-h)) + ...
        (lambda/(2*m))*norm(theta(2:end))^2;   % theta0 is excluded

    % Calculate gradient and Hessian
    G = (lambda/m).*theta; G(1) = 0;
    L = (lambda/m).*eye(n); L(1) = 0;

    grad = ((1/m).*x' * (h-y)) + G;
    H = ((1/m).*x'*diag(h)*diag(1-h)*x) + L;

    % Here is the actual update
%     disp(H\grad)
    theta = theta - H\grad;
%     disp(theta)
%     disp(i)
end
% Define the ranges of the grid
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);

% Initialize space for the values to be plotted
z = zeros(length(v), length(u));

% Evaluate z = theta*x over the grid
for i = 1:length(u)
    for j = 1:length(v)
        % Notice the order of j, i here!
        z(j,i) = map_feature(u(i), v(j))*theta;
    end
end
% Plot z = 0 by specifying the range [0, 0] (no transpose needed, see above)
figure;
pos = find(y0); neg = find(y0 == 0);
plot(x0(pos,1),x0(pos,2),'+')
hold on
plot(x0(neg,1),x0(neg,2),'o')
contour(u,v,z,[0,0], 'LineWidth', 2)
xlim([-1.00 1.50])
ylim([-0.8 1.20])
legend({'y=1','y=0','Decision Boundary'})
title('lambda=1')
xlabel('u')
ylabel('v')

% ---- lambda = 10 ----
% theta again starts from the previous (lambda=1) solution
lambda=10;
iteration=20;
J = zeros(iteration, 1);
for i=1:iteration
    z = x*theta;   % x: 117x28, theta: 28x1
    h = g(z);      % sigmoid hypothesis

    % Calculate J (for testing convergence)
    J(i) = -(1/m)*sum(y.*log(h) + (1-y).*log(1-h)) + ...
        (lambda/(2*m))*norm(theta(2:end))^2;   % theta0 is excluded

    % Calculate gradient and Hessian
    G = (lambda/m).*theta; G(1) = 0;
    L = (lambda/m).*eye(n); L(1) = 0;

    grad = ((1/m).*x' * (h-y)) + G;
    H = ((1/m).*x'*diag(h)*diag(1-h)*x) + L;

    % Here is the actual update
    theta = theta - H\grad;
end
% Define the ranges of the grid
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);

% Initialize space for the values to be plotted
z = zeros(length(v), length(u));

% Evaluate z = theta*x over the grid
for i = 1:length(u)
    for j = 1:length(v)
        % Notice the order of j, i here!
        z(j,i) = map_feature(u(i), v(j))*theta;
    end
end
% Plot z = 0 by specifying the range [0, 0] (no transpose needed, see above)
figure;
pos = find(y0); neg = find(y0 == 0);
plot(x0(pos,1),x0(pos,2),'+')
hold on
plot(x0(neg,1),x0(neg,2),'o')
contour(u,v,z,[0,0], 'LineWidth', 2)
xlim([-1.00 1.50])
ylim([-0.8 1.20])
legend({'y=1','y=0','Decision Boundary'})
title('lambda=10')
xlabel('u')
ylabel('v')
