探索时间序列，预测未来

文章期号：20190702

掌握预测，不能少的技能时间序列预测

1，什么是时间序列

时间序列（time series）是按时间顺序记录的一组数据。其中观察的时间可以是年份，季度，月份或其它任何时间形式，为了方便表述，文中用 t 表示所观察的时间， Yt表示在时间t上的观测值。

2，影响时间序列变化的成分

时间序列的变化可能受到一种或多种因素的影响，导致在不同的时间上取值是有差异的，这些影响因素称为时间序列的组成要素，一个时间序列通常由4种要素组成：趋势，季节变动，循环波动和不规则波动。

趋势：在较长时期内的一种持续向上或持续向下的势能波动。

季节变动：以年为周期长度的固定变动

循环波动：非固定长度的周期性变动

不规则波动：它是时间序列种除去趋势，季节变动，循环波动之后剩余的波动，是由偶然因素引起的误差性波动。

3，时间序列的模型

趋势（T），季节变动（S），循环波动（C）和不规则波动（I）组合的时间序列表达式：

四种不同成分的时间序列

4，时间序列预测方法与评估

预测方法的选择

一种预测方法的好坏取决于预测误差的大小，预测误差是预测值于实际值的差距，有平均误差，平均绝对误差，均方误差，平均百分比误差和平均绝对百分比误差等，其中较为常有的是均方误差（误差平方和的平均数：MSE）公式：

R模拟几种常有的时间序列:

4.1，简单指数平滑预测（加权平均的一种形式）

代码语言：javascript复制

> example11_1
# A tibble: 16 x 6
      X1  年份 粮食产量 居民消费水平 原煤产量   CPI
   <int> <int>    <dbl>        <int>    <dbl> <dbl>
 1     1  2000   46218.         3721     13.8 100. 
 2     2  2001   45264.         3987     14.7 101. 
 3     3  2002   45706.         4301     15.5  99.2
 4     4  2003   43070.         4606     18.4 101. 
 5     5  2004   46947          5138     21.2 104. 
....
> example11_1<-ts(example11_1,start = 20000)
> cpiforecast<-HoltWinters(example11_1[,6],beta = F, gamma = F)
> cpiforecast
Holt-Winters exponential smoothing without trend and without seasonal component.
Call:
HoltWinters(x = example11_1[, 6], beta = F, gamma = F)
Smoothing parameters:
 alpha: 0.2635414
 beta : FALSE
 gamma: FALSE

Coefficients:
      [,1]
a 102.3659
> par(cex=0.7,family="SimSun")
> plot(example11_1[,6],type='o',xlab="时间",ylab="居民消费价格指数")
> #历史数据拟合值
> cpiforecast$fitted
> lines(cpiforecast$fitted[,1],type='o',lty=2,col="blue");
> legend(x='topleft',legend=c("观测值","拟合值"),lty = 1:2,col = c(1,4))
#预估样本外的值（可以预测一个置信区间）
> library(forecast)
> cpi<-forecast.HoltWinter(cpiforecast,h=1)

4.2，Holt 指数平滑预测（考虑了趋势成分）

代码语言：javascript复制

> example11_1<-ts(example11_1,start = 2000)> grainforecast<-HoltWinters(example11_1[,3],gamma = F)> grainforecastHolt-Winters exponential smoothing with trend and without seasonal component.Call:HoltWinters(x = example11_1[, 3], gamma = F)Smoothing parameters: alpha: 0.666955 beta : 0.5802066 gamma: FALSECoefficients:       [,1]a 62211.953b  1108.538> #绘制拟合图> plot(example11_1[,3],type='o',xlab="时间",ylab="粮食产量")> lines(example11_1[,2][c(-1,-2)],grainforecast$fitted[,1],type='o',col="blue")> legend(x='topleft',legend=c("观测值","拟合值"),lty = 1:2,col = c(1,4))> #预估样本外的值（可以预测一个置信区间）> library(forecast)> grainforecast11<-forecast.HoltWinter(grainforecast,h=1)#画出实际值和预测值图>plot(grainforecast11,type='o',xlab="时间",ylab="粮食产量",main="")

4 .3， Winter指数平滑预测

代码语言：javascript复制

> View(example11_4)> example11_4# A tibble: 24 x 4      X1  年份  季度 销售量   <int> <int> <int>  <int> 1     1  2010     1    123 2     2  2010     2    132 3     3  2010     3    137# ... with 14 more rows> > #把数据整理成以季度为周期的格式> example11_4<-ts(example11_4[,4],start = 2010, frequency = 4)> example11_4     Qtr1 Qtr2 Qtr3 Qtr42010  123  132  137  1262011  130  138  142  1322012  138  141  150  1372013  143  147  158  1432014  147  153  166  1512015  159  163  174  161> #确定模型参数阿尔法，陪她，gamma以及模型系数a,b,s> saleforecast<-HoltWinters(example11_4)> saleforecastHolt-Winters exponential smoothing with trend and additive seasonal component.Call:HoltWinters(x = example11_4)Smoothing parameters: alpha: 0.3900487 beta : 0.09609663 gamma: 1Coefficients:          [,1]a  165.6380871b    1.8531103s1  -0.9005488s2   1.2526248s3  10.8458341s4  -4.6380871> #Winter模型的拟合图> plot(example11_4,type='o',xlab="时间",ylab="销售时间")> lines(saleforecast$fitted[,1],type='o',lty=2,col='blue')> legend(x='topleft',legend=c("观测值","拟合值"),lty = 1:2,col = c(1,4))> #预估样本外的值（可以预测一个置信区间）> library(forecast)> saleforecast11<-forecast.HoltWinter(saleforecast,h=8)#画出实际值和预测值图>plot(saleforecast11,type='o',xlab="时间",ylab="销售量",main="")

4.4，线性趋势预测

4.5 ，非线性趋势预测（指数曲线，多阶曲线）

指数曲线的方程为：

代码语言：javascript复制

> example11_1
Time Series:
Start = 2000 
End = 2015 
Frequency = 1 
X1 年份 粮食产量 居民消费水平 原煤产量   CPI
2000  1 2000  46217.5         3721    13.84 100.4
2001  2 2001  45263.7         3987    14.72 100.7
2002  3 2002  45705.8         4301    15.50  99.2
2003  4 2003  43069.5         4606    18.35 101.2
.......
> y<-log(example11_1[,4])
> y
Time Series:
Start = 2000 
End = 2015 
Frequency = 1 
[1] 8.221748 8.290794 8.366603 8.435115 8.544419 8.660601 8.766550 8.932213 9.071883 9.160520
[11] 9.298260 9.482960 9.595535 9.692149 9.785717 9.868275
> x<-1:16
> fit<-lm(y~x)
> fit

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
8.008        0.118

> exp(8.008)
[1] 3004.901
>

多阶曲线：当有k个拐点时，需要拟合k 1阶曲线，一般表达式为：

代码语言：javascript复制

> example11_1
Time Series:
Start = 2000 
End = 2015 
Frequency = 1 
     X1 年份 粮食产量 居民消费水平 原煤产量   CPI
2000  1 2000  46217.5         3721    13.84 100.4
2001  2 2001  45263.7         3987    14.72 100.7
2002  3 2002  45705.8         4301    15.50  99.2
2003  4 2003  43069.5         4606    18.35 101.2
2004  5 2004  46947.0         5138    21.23 103.9
2005  6 2005  48402.2         5771    23.65 101.8
2006  7 2006  49804.2         6416    25.70 101.5
2007  8 2007  50160.3         7572    27.60 104.8
.......
> y<-example11_1[,5]
> x<-1:16
> fit1<-lm(y~x I(x^2))
> fit1

Call:
lm(formula = y ~ x   I(x^2))

Coefficients:
(Intercept)            x       I(x^2)  
    8.87205      2.84856     -0.05432

> #二阶曲线预测
> predata1<-predict(fit1,data.frame(x=1:17))
> #二阶曲线预测值的残差
> residuals1<fit1$residuals
错误: 找不到对象'residuals1'
> residuals1<-fit1$residuals
> #拟合三阶曲线
> fit2<-lm(y~x I(x^2) I(x^3))
> fit2

Call:
lm(formula = y ~ x   I(x^2)   I(x^3))

Coefficients:
(Intercept)            x       I(x^2)       I(x^3)  
   13.66489     -0.09771      0.36610     -0.01649

> predata1<-predict(fit2,data.frame(x=1:17))
> predata1<-predict(fit1,data.frame(x=1:17))
> predata2<-predict(fit2,data.frame(x=1:17))
> residuals2<-fit2$residuals
> plot(2000:2016,predata2,type='o',lty=2,col='red',xlab = "时间",ylim = c(10,45),ylab="原煤产量")
> lines(2000:2016,predata1,type = 'o',lty=3,col='blue')
> points(example11_1[,2],example11_1[,5],type = 'o',pch=19)
> abline(v=2015,lty=6)
>

4.6，分解预测

分解预测是先将时间序列的各个成分依次分解出来，而后再进行预测的。分离出季节性成分，建立线性预测模型或者非线性模型，预测值乘以相对应的季节指数得到最终的预测值。

代码语言：javascript复制

> example11_4
     Qtr1 Qtr2 Qtr3 Qtr4
2010  123  132  137  126
2011  130  138  142  132
2012  138  141  150  137
2013  143  147  158  143
2014  147  153  166  151
2015  159  163  174  161
> #计算季节指数
> salecompose<-decompose(example11_4,type="multiplicative")
> names(salecompose)
[1] "x"        "seasonal" "trend"    "random"   "figure"   "type"    
> salecompose$seasonal
          Qtr1      Qtr2      Qtr3      Qtr4
2010 0.9828110 1.0052503 1.0562235 0.9557153
2011 0.9828110 1.0052503 1.0562235 0.9557153
2012 0.9828110 1.0052503 1.0562235 0.9557153
2013 0.9828110 1.0052503 1.0562235 0.9557153
2014 0.9828110 1.0052503 1.0562235 0.9557153
2015 0.9828110 1.0052503 1.0562235 0.9557153
> #绘制成分图
> plot(salecompose,type='o')
> 
> #
> #季节调整后的序列图
> seasonaladjust<-example11_4/salecompose$seasonal
> 
> x<-1:24
> fit<-lm(seasonaladjust~x)
> fit

Call:
lm(formula = seasonaladjust ~ x)

Coefficients:
(Intercept)            x  
    124.312        1.692

> #最终预测值
> predata<-predict(fit,data.frame(x=1:28))*rep(salecompose$seasonal[1:4],7)
> predata<-ts(predata,start = 2010,frequency = 4);predata
         Qtr1     Qtr2     Qtr3     Qtr4
2010 123.8384 128.3670 136.6635 125.2762
2011 130.4911 135.1716 143.8132 131.7455
2012 137.1438 141.9762 150.9628 138.2148
2013 143.7965 148.7808 158.1124 144.6841
2014 150.4492 155.5854 165.2620 151.1533
2015 157.1019 162.3899 172.4117 157.6226
2016 163.7546 169.1945 179.5613 164.0919
> plot(predata,type='o',lty=2,col='blue',xlab="时间",ylab="销售量")
> lines(example11_4)
> legend(x='topleft',legend=c("实际值","预测值"),lty = 1:2,col = c(1,4))
> abline(v=2016,lty=6,col="grey")
>

成分分解图

分解预测图

至此，常有的几种时间序列预测模型整理完成，大家也可以对不同模型的预测效果做两两的残差对比，根据不同的实际情况，选择恰当的模型，会事半功倍。

series time

0 人点赞