- 为什么要引入正则化?
在做线性回归或者逻辑回归的时候,会遇到过拟合问题,即,在训练集上的error很小,但是在测试集上的偏差却很大。因此,引入正则化项,防止过拟合。保证在测试集上获得和在训练集上相同的效果。
例如:对于线性回归,不同幂次的方程如下
通过训练得到的结果如下:
明显,对于低次方程,容易产生欠拟合,而对于高次方程,容易产生过拟合现象。
因此,我们引入正则化项:
其他的正则化因子
- 关于线性回归的正则化
(1)首先,绘制数据图像:
我们可以看到,只有7个数据点,因此,很容易过拟合,(训练数据集越大,越不容易过拟合)。
(2)我们用一个五次的多项式做线性回归:
之所以是线性回归,是因为对于每一个x的不同幂次,它们是线性组合的。对于初始的x,它是一个一维的特征,因此,我们将x重新构造,得到一个六维的向量。
m=length(y);
x=[ones(m,1),x,x.^2,x.^3,x.^4,x.^5];
如上实现,那么对于x的每一个维度,它们都是线性无关的,h(x)是它们的线性组合,因此,此时问题是一个多维线性回归问题。
(3)损失函数
其中λ是正则化参数。
(4)采用正规方程方式求解
注意:λ后的矩阵,θ0不参与计算,即不对θ0进行惩罚。
对于不同的参数λ,如果过大,则会把所有的参数都最小化了,导致模型编程常数θ0
即造成欠拟合。
(5)计算方法:
lambda=1;
Lambda=lambda.*eye(6);
Lambda(1)=0;
theta=(x'*x Lambda)x'*y
figure;
x_=(minx:0.01:maxx)';
x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5]
hold on
plot(x0, y0, 'o', 'MarkerFacecolor', 'r', 'MarkerSize', 8);
plot(x_,x_1*theta,'--b','LineWidth',2);
legend({'data','5-th line'})
title('lambda=1')
xlabel('x')
ylabel('y')
hold off
其中λ取0,1,10.结果如下:
计算结果如下:
代码语言:javascript复制Theta(λ=0) = 6×1
代码语言:javascript复制 0.4725
代码语言:javascript复制 0.6814
代码语言:javascript复制 -1.3801
代码语言:javascript复制 -5.9777
代码语言:javascript复制 2.4417
代码语言:javascript复制 4.7371
代码语言:javascript复制Theta(λ=1) = 6×1
代码语言:javascript复制 0.3976
代码语言:javascript复制 -0.4207
代码语言:javascript复制 0.1296
代码语言:javascript复制 -0.3975
代码语言:javascript复制 0.1753
代码语言:javascript复制 -0.3394
代码语言:javascript复制Theta(λ=10) = 6×1
代码语言:javascript复制 0.5205
代码语言:javascript复制 -0.1825
代码语言:javascript复制 0.0606
代码语言:javascript复制 -0.1482
代码语言:javascript复制 0.0743
代码语言:javascript复制 -0.1280
我们可以看出,当λ=0时,曲线很好的拟合了数据点,但是也明显产生了过拟合;而当λ=1时,数据点相对均匀地分布在曲线的两侧,而λ=10时,欠拟合现象明显。
- 关于逻辑回归的正则化
- 绘制原始数据
其中,‘ ’表示正例,‘o’表示反例。
绘制方法如下:
pos = find(y); neg = find(y == 0);
plot (x(pos,1),x(pos,2),' ')
hold on
plot (x(neg,1),x(neg,2),'o')
- 预测函数与x的转化
注意:x是一个二维的向量,我们此处将x转化为一个高维的向量,同时,最高次数为6.特征映射函数如下:
function out = map_feature(feat1, feat2)
degree = 6;
out = ones(size(feat1(:,1)));
for i = 1:degree
for j = 0:i
out(:, end 1) = (feat1.^(i-j)).*(feat2.^j);
end
end
正则化后的损失函数
参数θ的更新规则,其中H是Hessian矩阵,另一个参数为J的梯度。
- 迭代求解
- [m, n] = size(x); theta = zeros(n, 1); g =@(z)(1.0 ./ (1.0 exp(-z))); % disp(theta) lambda=0 iteration=20 J = zeros(iteration, 1); for i=1:iteration z = x*theta;% x:117x28 theta 28x1 h = g(z) ;% sigmoid h % Calculate J (for testing convergence) J(i) =-(1/m)*sum(y.*log(h) (1-y).*log(1-h)) ... (lambda/(2*m))*norm(theta(2:end))^2; %不包括theta(0) %norm求的是向量theta的欧几里德范数 % Calculate gradient and hessian. G = (lambda/m).*theta; G(1) = 0; % gradient L = (lambda/m).*eye(n); L(1) = 0;% Hessian grad = ((1/m).*x' * (h-y)) G; H = ((1/m).*x'*diag(h)*diag(1-h)*x) L; % Here is the actual update theta = theta - Hgrad; end 计算出θ的值,然后绘制决策边界,可视化展示计算结果。
- 结果展示
- 其中λ的取值同样为0,1,10; 注意,采用MATLAB中的contour函数通过等高线的方式进行绘制,同时,在取值连线的时候注意要对u,v做同样的处理,如下: % Define the ranges of the grid u = linspace(-1, 1.5, 200); v = linspace(-1, 1.5, 200); % Initialize space for the values to be plotted z = zeros(length(u), length(v)); % Evaluate z = theta*x over the grid for i = 1:length(u) for j = 1:length(v) % Notice the order of j, i here! z(j,i) = map_feature(u(i), v(j))*theta; end end 绘制图像结果如下:
- 同样,我们可以看到对于λ=0,过拟合,λ=10,欠拟合。
附录 源代码
代码语言:javascript复制附录:程序源代码
1. 线性回归 正则化
2. clc,clear
3. x=load("ex5Linx.dat");
4. y=load("ex5Liny.dat");
5. x0=x,y0=y
6. figure;
7. plot(x, y, 'o', 'MarkerFacecolor', 'r', 'MarkerSize', 8);
8. title('training data')
9. xlabel('x')
10. ylabel('y')
11. minx=min(x);
12. maxx=max(x);
13. m=length(y);
14. x=[ones(m,1),x,x.^2,x.^3,x.^4,x.^5];
15. disp(size(x(1,:))) %1x6
16. theta=zeros(size(x(1,:)))
17. lambda=0;
18. Lambda=lambda.*eye(6);
19. Lambda(1)=0;
20. theta=(x'*x Lambda)x'*y
21. figure;
22. x_=(minx:0.01:maxx)';
23. x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5]
24. hold on
25. plot(x0, y0, 'o', 'MarkerFacecolor', 'r', 'MarkerSize', 8);
26. plot(x_,x_1*theta,'--b','LineWidth',2);
27. legend({'data','5-th line'})
28. title('lambda=0')
29. xlabel('x')
30. ylabel('y')
31. hold off
32. lambda=1;
33. Lambda=lambda.*eye(6);
34. Lambda(1)=0;
35. theta=(x'*x Lambda)x'*y
36. figure;
37. x_=(minx:0.01:maxx)';
38. x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5]
39. hold on
40. plot(x0, y0, 'o', 'MarkerFacecolor', 'r', 'MarkerSize', 8);
41. plot(x_,x_1*theta,'--b','LineWidth',2);
42. legend({'data','5-th line'})
43. title('lambda=1')
44. xlabel('x')
45. ylabel('y')
46. hold off
47. lambda=10;
48. Lambda=lambda.*eye(6);
49. Lambda(1)=0;
50. theta=(x'*x Lambda)x'*y
51. figure;
52. x_=(minx:0.01:maxx)';
53. x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5]
54. hold on
55. plot(x0, y0, 'o', 'MarkerFacecolor', 'r', 'MarkerSize', 8);
56. plot(x_,x_1*theta,'--b','LineWidth',2);
57. legend({'data','5-th line'})
58. title('lambda=10')
59. xlabel('x')
60. ylabel('y')
61. hold off
2. 逻辑回归 正则化
1. clc,clear;
2. x = load ('ex5Logx.dat') ;
3. y = load ('ex5Logy.dat') ;
4. x0=x
5. y0=y
6. figure
7. % Find the i n d i c e s f or th e 2 c l a s s e s
8. pos = find(y); neg = find(y == 0);
9. plot (x(pos,1),x(pos,2),' ')
10. hold on
11. plot (x(neg,1),x(neg,2),'o')
12. u=x(:,1)
13. v=x(:,2)
14. x = map_feature (u,v)
15. [m, n] = size(x);
16. theta = zeros(n, 1);
17. g =@(z)(1.0 ./ (1.0 exp(-z)));
18. % disp(theta)
19. lambda=0
20. iteration=20
21. J = zeros(iteration, 1);
22. for i=1:iteration
23. z = x*theta;% x:117x28 theta 28x1
24. h = g(z) ;% sigmoid h
25.
26. % Calculate J (for testing convergence)
27. J(i) =-(1/m)*sum(y.*log(h) (1-y).*log(1-h)) ...
28. (lambda/(2*m))*norm(theta(2:end))^2; %不包括theta(0)
29. %norm求的是向量theta的欧几里德范数
30.
31. % Calculate gradient and hessian.
32. G = (lambda/m).*theta; G(1) = 0; % gradient
33. L = (lambda/m).*eye(n); L(1) = 0;% Hessian
34.
35. grad = ((1/m).*x' * (h-y)) G;
36. H = ((1/m).*x'*diag(h)*diag(1-h)*x) L;
37.
38. % Here is the actual update
39. theta = theta - Hgrad;
40.
41. end
42. % Define the ranges of the grid
43. u = linspace(-1, 1.5, 200);
44. v = linspace(-1, 1.5, 200);
45.
46. % Initialize space for the values to be plotted
47. z = zeros(length(u), length(v));
48.
49. % Evaluate z = theta*x over the grid
50. for i = 1:length(u)
51. for j = 1:length(v)
52. % Notice the order of j, i here!
53. z(j,i) = map_feature(u(i), v(j))*theta;
54. end
55. end
56. % Because of the way that contour plotting works
57. % in Matlab, we need to transpose z, or
58. % else the axis orientation will be flipped!
59. z = z'
60. % Plot z = 0 by specifying the range [0, 0]
61. contour(u,v,z,[0,0], 'LineWidth', 2)
62. xlim([-1.00 1.50])
63. ylim([-0.8 1.20])
64. legend({'y=1','y=0','Decision Boundary'})
65. title('lambda=0')
66. xlabel('u')
67. ylabel('v')
68. lambda=1
69. % lambda=10
70. iteration=20
71. J = zeros(iteration, 1);
72. for i=1:iteration
73. z = x*theta;% x:117x28 theta 28x1
74. h = g(z) ;% sigmoid h
75.
76. % Calculate J (for testing convergence)
77. J(i) =-(1/m)*sum(y.*log(h) (1-y).*log(1-h)) ...
78. (lambda/(2*m))*norm(theta(2:end))^2; %不包括theta(0)
79. %norm求的是向量theta的欧几里德范数
80.
81. % Calculate gradient and hessian.
82. G = (lambda/m).*theta; G(1) = 0; % gradient
83. L = (lambda/m).*eye(n); L(1) = 0;% Hessian
84.
85. grad = ((1/m).*x' * (h-y)) G;
86. H = ((1/m).*x'*diag(h)*diag(1-h)*x) L;
87.
88. % Here is the actual update
89. % disp(Hgrad)
90. theta = theta - Hgrad;
91. % disp(theta)
92. % disp(i)
93. end
94. % Define the ranges of the grid
95. u = linspace(-1, 1.5, 200);
96. v = linspace(-1, 1.5, 200);
97.
98. % Initialize space for the values to be plotted
99. z = zeros(length(u), length(v));
100.
101. % Evaluate z = theta*x over the grid
102. for i = 1:length(u)
103. for j = 1:length(v)
104. % Notice the order of j, i here!
105. z(j,i) = map_feature(u(i), v(j))*theta;
106. end
107. end
108. % Because of the way that contour plotting works
109. % in Matlab, we need to transpose z, or
110. % else the axis orientation will be flipped!
111. z = z'
112. % Plot z = 0 by specifying the range [0, 0]
113. figure;
114. pos = find(y0); neg = find(y0 == 0);
115. plot (x0(pos,1),x0(pos,2),' ')
116. hold on
117. plot (x0(neg,1),x0(neg,2),'o')
118. contour(u,v,z,[0,0], 'LineWidth', 2)
119. xlim([-1.00 1.50])
120. ylim([-0.8 1.20])
121. legend({'y=1','y=0','Decision Boundary'})
122. title('lambda=1')
123. xlabel('u')
124. ylabel('v')
125. lambda=10
126. iteration=20
127. J = zeros(iteration, 1);
128. for i=1:iteration
129. z = x*theta;% x:117x28 theta 28x1
130. h = g(z) ;% sigmoid h
131.
132. % Calculate J (for testing convergence)
133. J(i) =-(1/m)*sum(y.*log(h) (1-y).*log(1-h)) ...
134. (lambda/(2*m))*norm(theta(2:end))^2; %不包括theta(0)
135. %norm求的是向量theta的欧几里德范数
136.
137. % Calculate gradient and hessian.
138. G = (lambda/m).*theta; G(1) = 0; % gradient
139. L = (lambda/m).*eye(n); L(1) = 0;% Hessian
140.
141. grad = ((1/m).*x' * (h-y)) G;
142. H = ((1/m).*x'*diag(h)*diag(1-h)*x) L;
143.
144. % Here is the actual update
145. theta = theta - Hgrad;
146. end
147. % Define the ranges of the grid
148. u = linspace(-1, 1.5, 200);
149. v = linspace(-1, 1.5, 200);
150.
151. % Initialize space for the values to be plotted
152. z = zeros(length(u), length(v));
153.
154. % Evaluate z = theta*x over the grid
155. for i = 1:length(u)
156. for j = 1:length(v)
157. % Notice the order of j, i here!
158. z(j,i) = map_feature(u(i), v(j))*theta;
159. end
160. end
161. % Because of the way that contour plotting works
162. % in Matlab, we need to transpose z, or
163. % else the axis orientation will be flipped!
164. z = z'
165. % Plot z = 0 by specifying the range [0, 0]
166. figure;
167. pos = find(y0); neg = find(y0 == 0);
168. plot (x0(pos,1),x0(pos,2),' ')
169. hold on
170. plot (x0(neg,1),x0(neg,2),'o')
171. contour(u,v,z,[0,0], 'LineWidth', 2)
172. xlim([-1.00 1.50])
173. ylim([-0.8 1.20])
174. legend({'y=1','y=0','Decision Boundary'})
175. title('lambda=10')
176. xlabel('u')
177. ylabel('v')