Keywords: factor decomposition; ARMA model; GDP forecast
1. Introduction
1.1 Background
Since the reform and opening up began in 1978, China's economy has developed rapidly and steadily, and after China joined the WTO its growth reached a new level. GDP (Gross Domestic Product) is the basic statistical indicator of national economic output and reflects the state of a country's economy; it is the core indicator of the national accounts. Because GDP summarizes the most basic aspects of the macroeconomy, it can measure not only the overall scale of national output and income but also economic fluctuations and cycles. Hence, fitting and analyzing GDP accurately is of great importance for understanding a country's macroeconomic trend. The aim of this article is to build a GDP forecasting model and use it to predict China's future GDP.
1.2 Method
Many methods have been used to analyze economic phenomena, and time series analysis is one of the most effective. A time series is a collection of observations of well-defined data items obtained through repeated measurements over time. Time series methods use economic theory mainly as a guide to variable selection and rely on past patterns in the data to predict the future. An observed time series can be decomposed into three components: the trend (long-term direction), the seasonal component (systematic, calendar-related movements) and the irregular component (unsystematic, short-term fluctuations). When these components are present, we can apply a decomposition method to extract useful information from the data; here we call this approach factor decomposition.
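As a minimal sketch of additive decomposition, the R snippet below builds a hypothetical quarterly series from a known linear trend, a fixed seasonal pattern and small noise. The series is simulated and is not the GDP data. It then splits the series with `stats::decompose`, which estimates the trend with a centered moving average. The seed, the seasonal pattern and the noise level are illustrative assumptions.

```r
# Hypothetical quarterly series: linear trend + fixed seasonal pattern + small noise
set.seed(1)
n_years <- 19
t_idx <- seq_len(4 * n_years)
season_true <- c(-3, -1, 0, 4)  # Q1..Q4 effects, summing to zero
y <- 10 + 0.5 * t_idx + rep(season_true, n_years) + rnorm(length(t_idx), sd = 0.1)
y_ts <- ts(y, start = c(1992, 1), frequency = 4)

# Additive decomposition: y = trend + seasonal + random
dec <- decompose(y_ts, type = "additive")
print(round(dec$figure, 2))  # estimated seasonal effects for Q1..Q4
```

The estimated quarterly effects should lie close to `season_true`. The first two and last two values of the trend and irregular components are `NA` because the centered moving average cannot be computed there.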
The trend component typically represents the longer term developments of the time series of interest and is often specified as a smooth function of time.
The seasonal component captures recurring, though gradually changing, patterns within each year. Seasonality is very common in economic time series, and when it is present a seasonal adjustment method should be used. Seasonal adjustment is the process of estimating, and then removing from a time series, the influences that are systematic and calendar related. Observed data need to be seasonally adjusted because seasonal effects can conceal both the true underlying movement of the series and certain non-seasonal characteristics that may interest analysts.
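Seasonal adjustment as described above means estimating the seasonal component and subtracting it. The sketch below does this with `stats::stl` (Seasonal-Trend decomposition by Loess). This is a different decomposition method from the regression approach used later in this article. It is applied to a hypothetical simulated quarterly series, and the data and the `s.window = "periodic"` setting are illustrative assumptions.

```r
# Hypothetical quarterly series (simulated, not the GDP data)
set.seed(2)
n_years <- 19
t_idx <- seq_len(4 * n_years)
season_true <- c(-3, -1, 0, 4)
y_ts <- ts(10 + 0.5 * t_idx + rep(season_true, n_years) + rnorm(4 * n_years, sd = 0.1),
           start = c(1992, 1), frequency = 4)

# STL with a fixed (periodic) seasonal pattern
fit <- stl(y_ts, s.window = "periodic")
seasonal <- fit$time.series[, "seasonal"]

# Seasonally adjusted series = observed minus estimated seasonal component
y_sa <- y_ts - seasonal
print(round(seasonal[1:4], 2))  # estimated Q1..Q4 effects
```

After adjustment, the large quarter-to-quarter jumps caused by the seasonal pattern disappear, leaving the trend and the irregular movements.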
The irregular component represents fluctuations caused by random, non-systematic factors and is usually defined as the residual. Because a deterministic decomposition may not capture all the structure in the data, we should test the residuals. If there is no autocorrelation among them, the information in the time series has been fully recovered by the deterministic decomposition.
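One common way to carry out this residual check is the Ljung-Box portmanteau test, available in R as `Box.test(..., type = "Ljung-Box")`. The sketch below applies it to two hypothetical series: a clearly autocorrelated simulated AR(1) with coefficient 0.8, and simulated white noise. The series, the sample size of 76 and the choice of 8 lags are illustrative assumptions.

```r
set.seed(3)
n <- 76  # same length as the quarterly sample, for illustration
white <- rnorm(n)
ar1 <- as.numeric(arima.sim(list(ar = 0.8), n = n))

# H0: no autocorrelation up to the chosen lag
lb_white <- Box.test(white, lag = 8, type = "Ljung-Box")
lb_ar1 <- Box.test(ar1, lag = 8, type = "Ljung-Box")
print(c(white = lb_white$p.value, ar1 = lb_ar1$p.value))
```

A small p-value, as for the AR(1) series, indicates that the residuals still carry autocorrelation that a model such as ARMA can exploit. When the test is applied to the residuals of a fitted ARMA model, the `fitdf` argument should be set to the number of estimated ARMA coefficients.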
If autocorrelation is present, an ARMA model can be used to fit the residuals. ARMA is one of the most common time series models and is often used to make accurate short-term predictions. Its main idea is that the current value of a series can be expressed as a combination of its own past values and past random shocks, and this relationship can be used to predict future values. The series modeled by ARMA is a set of time-ordered random variables. Each one looks uncertain when observed individually, but taken together they show a regularity that can be described by a statistical model. The ARMA model consists of two parts, an autoregressive (AR) part and a moving average (MA) part. It is usually referred to as the ARMA(p,q) model, $X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + v_t + \theta_1 v_{t-1} + \cdots + \theta_q v_{t-q}$, where $v_t$ is white noise, p is the order of the autoregressive part and q is the order of the moving average part.

2. Data Analysis
2.1 Dataset
The data we collected contain quarterly GDP from 1992 to 2010. Most other forecasting articles use 1978-2011, but we chose this shorter period because economic growth in the first 10-15 years after 1978 was relatively slow compared with growth from 1990 onward, and we want to remove the interference of these early data. A second reason for using recent data (1992-2010) is that quarterly GDP data before 1992 are hard to obtain, because China's statistical system was still imperfect at the end of the 20th century.
Table 1. Quarterly GDP of China, 1992-2010 (Unit: 100 million CNY)

Time      GDP     Time      GDP     Time      GDP       Time      GDP
1992.01   4974    1997.01   16257   2002.01   25376     2007.01   755.9
1992.02   6358    1997.02   18697   2002.02   27965     2007.02   61243
1992.03   7119    1997.03   19148   2002.03   29716     2007.03   102.2
1992.04   8472    1997.04   24871   2002.04   37276     2007.04   85709.2
1993.01   6500    1998.01   17501   2003.01   28861.8   2008.01   66283.8
1993.02   8044    1998.02   19722   2003.02   31007.1   2008.02   74194
1993.03   9048    1998.03   20372   2003.03   33460.4   2008.03   768.3
1993.04   11742   1998.04   26807   2003.04   42493.5   2008.04   97019.3
1994.01   9065    1999.01   18790   2004.01   33420.6   2009.01   69816.9
1994.02   11085   1999.02   20765   2004.02   36985.3   2009.02   78386.7
1994.03   12447   1999.03   21859   2004.03   39561.7   2009.03   83099.7
1994.04   15601   1999.04   28263   2004.04   49910.7   2009.04   109599.5
1995.01   11858   2000.01   207     2005.01   39117.4   2010.01   82613.4
1995.02   14110   2000.02   23101   2005.02   42795.2   2010.02   92265.4
1995.03   15535   2000.03   24340   2005.03   44744.4   2010.03   97747.9
1995.04   19291   2000.04   31127   2005.04   58280.4   2010.04   128886.1
1996.01   14261   2001.01   23300   2006.01   45315.8
1996.02   16601   2001.02   25651   2006.02   50112.7
1996.03   17671   2001.03   26867   2006.03   51912.8
1996.04   224     2001.04   33837   2006.04   673.1

(Data source: National statistical database of China)

We treat these historical GDP data as a time series. We analyze them, build a forecasting model from their past patterns, and use the model to predict future GDP.

2.2 Data Graphical Analysis
Figure 1 shows a plot of the data. There is a significant long-term trend in the series, and the seasonality varies over time. The trend appears to be curved (roughly quadratic), while the seasonality shows a strong yearly component occurring at lags that are multiples of s = 4. The sample ACF of the data, displayed in Figure 2, also shows significant seasonality.
Figure 1. Quarterly GDP of China from 1992(1) to 2010(4)
Figure 2. Sample ACF of the GDP data
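The seasonal signature in a sample ACF can be seen on a toy example. The hypothetical quarterly series below is a repeating four-quarter pattern plus small noise, with no trend (unlike the GDP series). For such a series, `acf()` peaks at lags 4 and 8 and is negative at lag 2. The pattern and the noise level are illustrative assumptions.

```r
set.seed(7)
s <- rep(c(-3, -1, 0, 4), 19) + rnorm(76, sd = 0.1)  # hypothetical seasonal series
r <- acf(s, lag.max = 8, plot = FALSE)$acf[, 1, 1]   # element 1 is lag 0
names(r) <- 0:8
print(round(r, 2))
```

In the real GDP series the trend makes the ACF decay slowly, so the seasonal peaks appear as bumps on top of that decay rather than as clean spikes.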
3. Time Series Model
3.1 Factor Decomposition
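Based on the previous analysis, we now use factor decomposition to build a time series model. The principle is to use the decomposition to extract useful information from the data and to measure the influence of the trend and the seasonality. Let $Y_t$ denote GDP and $x$ denote time. We set up the decomposition model as follows:

(1) $Y_t = T_t + S_t + \varepsilon_t$
(2) $T_t = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$
(3) $S_t = \alpha_0 + \alpha_1 D_1 + \alpha_2 D_2 + \alpha_3 D_3$

In the fitted model the constants $\beta_0$ and $\alpha_0$ cannot be separated, so they are combined into a single intercept.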
We use (2) as the trend model because the pattern in Figure 1 suggests a curved trend that a cubic polynomial can capture. D1, D2 and D3 in (3) are seasonal dummy variables for the first, second and third quarters; the fourth quarter is the baseline. They are defined as D1 = (1,0,0,0,1,0,0,0,…), D2 = (0,1,0,0,0,1,0,0,…) and D3 = (0,0,1,0,0,0,1,0,…).
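To make the design of (1)-(3) concrete, the sketch below builds the three quarterly dummies with `rep()`, with the fourth quarter absorbed by the intercept. It simulates a hypothetical transformed series from a cubic trend plus quarterly effects, using coefficients of roughly the magnitude reported for LM1, and refits it with `lm()`. The simulated data and the noise level are assumptions; only the model structure follows the text.

```r
set.seed(4)
n <- 76                          # 19 years x 4 quarters
x <- seq_len(n)                  # time index
D1 <- rep(c(1, 0, 0, 0), n / 4)  # first-quarter dummy
D2 <- rep(c(0, 1, 0, 0), n / 4)  # second-quarter dummy
D3 <- rep(c(0, 0, 1, 0), n / 4)  # third-quarter dummy (Q4 is the baseline)

# Hypothetical "true" coefficients used only for the simulation
beta <- c(6, 0.09, -0.0015, 0.000015)
alpha <- c(-0.5, -0.38, -0.34)
y <- beta[1] + beta[2] * x + beta[3] * x^2 + beta[4] * x^3 +
  alpha[1] * D1 + alpha[2] * D2 + alpha[3] * D3 + rnorm(n, sd = 0.02)

# Cubic trend plus seasonal dummies, as in equations (1)-(3)
fit <- lm(y ~ x + I(x^2) + I(x^3) + D1 + D2 + D3)
print(round(coef(fit), 4))
```

Dropping the fourth-quarter dummy avoids perfect collinearity with the intercept, so each dummy coefficient reads as the average difference between that quarter and the fourth quarter.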
3.11 Data Transformation
Figure 1 shows strongly varying seasonality: the seasonal swings grow with the level of the series. Because this would harm the fit of the additive model, we transform GDP so that the seasonal amplitude becomes roughly constant. The Box-Cox transformation belongs to the family of power transformations, which transform the data with a power function while preserving their rank order. We therefore apply a Box-Cox transformation to GDP.
Figure 3 shows the Box-Cox plot for GDP, in which λ is near 0.2. We therefore use $\mathrm{GDP}^{0.2}$ as the new response variable, denoted $Y_t$. Figure 4 shows GDP after the transformation; the seasonal amplitude is now almost constant.
Figure 3. Box-Cox transformation (profile log-likelihood with 95% interval)
Figure 4. Quarterly GDP of China after transformation
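The Box-Cox step can be reproduced with `MASS::boxcox`, which profiles the log-likelihood of a linear model over a grid of λ values. In the hypothetical example below, the series is simulated so that its logarithm is additive in trend and season. This makes the seasonal swings grow with the level, as in Figure 1, so λ close to 0 (the log transform) should be preferred. The simulated data and the λ grid are assumptions.

```r
library(MASS)  # boxcox()

set.seed(5)
n <- 76
df <- data.frame(time_idx = seq_len(n), q = factor(rep(1:4, n / 4)))
# Multiplicative seasonality: log(y) = trend + quarter effect + noise
season <- c(-0.25, -0.10, -0.05, 0.20)
df$y <- exp(8 + 0.03 * df$time_idx + season[as.integer(df$q)] + rnorm(n, sd = 0.02))

# Keep y and qr in the fit so boxcox() can profile lambda directly
fit <- lm(y ~ time_idx + q, data = df, y = TRUE, qr = TRUE)
bc <- boxcox(fit, lambda = seq(-1, 1, by = 0.01), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)
```

In practice a convenient nearby value, such as 0.2 here, is usually chosen from inside the 95% interval rather than the exact maximizer.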
3.12 Build Model
After the transformation, the decomposition model is $Y_t = T_t + S_t + \varepsilon_t$ with $Y_t = \mathrm{GDP}_t^{0.2}$. We used the R statistical software to analyze the time series and fit the model, and obtained the following full model:

LM1: $\hat Y_t = 5.976 + 0.09219x - 0.001532x^2 + 0.00001472x^3 - 0.5072D_1 - 0.3763D_2 - 0.3433D_3$

Term      Intercept   x        x^2        x^3        D1        D2        D3
t value   110.322     16.798   -9.281     10.442     -15.340   -11.405   -10.418
p value   <2e-16      <2e-16   9.22e-14   7.62e-16   <2e-16    <2e-16    8.40e-16

Multiple R-squared: 0.9934, Adjusted R-squared: 0.9928
F-statistic: 1724, p-value: < 2.2e-16
In LM1 the t values of all coefficients are large in magnitude and the p-values of the t tests are all very small, so each explanatory variable is significant. The F-statistic is 1724 with a very small p-value, indicating that the regression as a whole is highly significant. The adjusted R-squared is 0.9928, meaning that about 99.28% of the variation in transformed GDP is explained by the model, so the model fits the data very well. Nevertheless, we still need to check the residuals.
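As a quick consistency check on the LM1 output, the adjusted R-squared and the F-statistic can be recomputed from the multiple R-squared. This assumes n = 76 quarterly observations (1992 to 2010) and k = 6 explanatory variables (x, x², x³, D1, D2, D3):

```latex
\bar R^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1} = 1 - 0.0066 \times \frac{75}{69} \approx 0.9928,
\qquad
F = \frac{R^2/k}{(1 - R^2)/(n-k-1)} = \frac{0.9934/6}{0.0066/69} \approx 1731.
```

Both values agree with the reported 0.9928 and 1724. The small gap in F comes from R² being rounded to four decimals.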
3.13 Residuals Analysis
Figure 5 plots the residuals against the fitted values and shows a clear cyclical pattern. Figure 6, the ACF of the residuals, gives strong evidence of the same cyclical behavior. We suppose this is because the previous model did not fully capture the seasonality of the data, even though it fits well overall; in other words, the pattern is still caused by seasonality. Hence we need to deal with the cyclical residuals.
Figure 5. Residuals vs. fitted
Figure 6. ACF of the residuals
3.14 Residual ARIMA Model
Since a cyclical pattern has been observed in the residuals, we use an ARIMA model to fit them. As argued above, the cyclical pattern is caused by seasonality, so we first take a seasonal difference of the residuals at lag s = 4 quarters.
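The seasonal difference used here, $(1-B^4)\hat\varepsilon_t = \hat\varepsilon_t - \hat\varepsilon_{t-4}$, is what `diff(res, lag = 4)` computes in the appendix. The toy example below uses hypothetical numbers. It shows that a pattern repeating every four quarters is removed entirely, and that the differenced series is four observations shorter.

```r
# A pure period-4 pattern plus a slow linear drift (hypothetical numbers)
pattern <- rep(c(2, -1, 0.5, -1.5), 5)  # repeats every 4 quarters
drift <- 0.1 * seq_along(pattern)
r <- pattern + drift

r4 <- diff(r, lag = 4)  # r_t - r_{t-4}
print(r4)               # only the drift remains: 4 quarters x 0.1 = 0.4 each
```

Any component that repeats exactly every four quarters cancels under this operation, which is why it suits residual cycles attributed to seasonality.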
Figure 7. ACF and PACF of the residuals after seasonal differencing (series res1)
Figure 7 shows the ACF and PACF of the residuals after seasonal differencing. The ACF looks more reasonable than before differencing, and most of its values lie within the bounds. Inspecting the ACF and PACF, the ACF appears to cut off at lag 3 and the PACF to tail off after lag 1. To choose a better model, we compare several plausible ARMA models, such as AR(1), MA(3), ARMA(1,1), ARMA(1,2) and ARMA(1,3).
Table 2. AICc values of ARMA(p,q) models, 0 ≤ p ≤ 1, 0 ≤ q ≤ 3

MA order q   AR order p = 0   AR order p = 1
0            n/a              -5.404351
1            -4.701461        -5.4024
2            -5.211436        -5.42886
3            -5.274476        -5.39809

Table 3. BIC values of ARMA(p,q) models, 0 ≤ p ≤ 1, 0 ≤ q ≤ 3

MA order q   AR order p = 0   AR order p = 1
0            n/a              -6.37379
1            -5.6709          -6.343662
2            -6.1524          -6.342783
3            -6.188399        -6.285715

Under the AICc and BIC criteria, we should choose the model with the smallest AICc or BIC value. According to Table 2 and Table 3, the ARMA(0,1) model has the smallest AICc value and the smallest BIC value as well. We therefore choose ARMA(0,1), i.e. MA(1), as the final model for the differenced residuals, denoted $R_t = \nabla_4 \varepsilon_t = (1 - B^4)\varepsilon_t$. The model is as follows:

LM2, MA(1): $R_t = v_t + 0.6582\,v_{t-1}$
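The comparison in Tables 2 and 3 can be sketched with base R's `arima()`. The code loops over 0 ≤ p ≤ 1 and 0 ≤ q ≤ 3 and computes AIC, AICc and BIC from the log-likelihood. This illustration rests on several assumptions:

- The series is simulated: a hypothetical MA(1) with coefficient 0.65 and 72 observations, the length left after a lag-4 difference of 76 quarters.
- k counts the ARMA coefficients, the mean and the noise variance.
- The criteria are on the usual total scale. The astsa `sarima()` function used in the appendix may report them divided by the sample size, so these numbers are not directly comparable with the tables.

```r
set.seed(6)
r <- as.numeric(arima.sim(list(ma = 0.65), n = 72))  # hypothetical differenced residuals

results <- data.frame()
for (p in 0:1) {
  for (q in 0:3) {
    if (p == 0 && q == 0) next  # white noise is not among the candidates
    fit <- arima(r, order = c(p, 0, q), method = "ML")
    n <- fit$nobs
    k <- length(fit$coef) + 1   # ARMA coefficients and mean, plus sigma^2
    aic <- -2 * fit$loglik + 2 * k
    results <- rbind(results, data.frame(
      p = p, q = q,
      AIC = aic,
      AICc = aic + 2 * k * (k + 1) / (n - k - 1),  # small-sample correction
      BIC = -2 * fit$loglik + log(n) * k))
  }
}
print(results[order(results$AICc), ], row.names = FALSE)
```

AICc and BIC penalize extra parameters more heavily than AIC, so with short quarterly samples they tend to favor the smaller models.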
Figure 8. Diagnostics for the MA(1) fit on the differenced residuals (panels: standardized residuals; ACF of residuals; normal Q-Q plot of standardized residuals; p-values for the Ljung-Box statistic)
The diagnostics for the model are displayed in Figure 8. A few outliers are visible in the plot of the standardized residuals and in their normal Q-Q plot, and the p-values of the Ljung-Box statistic indicate that some autocorrelation remains. Otherwise, the model fits well. We therefore use the full model to predict China's future GDP. The full model is:

$Y_t = 5.976 + 0.09219x - 0.001532x^2 + 0.00001472x^3 - 0.5072D_1 - 0.3763D_2 - 0.3433D_3 + \varepsilon_t$,
$\nabla_4 \varepsilon_t = (1 - B^4)\varepsilon_t = v_t + 0.6582\,v_{t-1}$,

where $Y_t = \mathrm{GDP}_t^{0.2}$, so GDP forecasts are obtained as $\hat Y_t^{5}$.

4. Prediction
Forecasts from the full fitted model for the next 8 quarters are shown in Table 4. We forecast only 8 quarters because time series models are best suited to short-term forecasting. Compared with the actual GDP in 2011, the predicted values are somewhat higher, especially for the fourth quarter of 2011. Nevertheless, the predictions are reasonable: although the 95% intervals are wide, all the actual values fall within them. We also report the predicted quarterly GDP for 2012.
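Below is a minimal sketch of one way the two fitted parts can be combined into forecasts. It rests on stated assumptions:

- The data are simulated on the transformed scale and are not the GDP series.
- The residual model is fitted with base R `arima()` using a seasonal difference of order 1 at period 4, so that `predict()` undoes the differencing automatically. This is equivalent in form to an MA(1) on $(1-B^4)\varepsilon_t$, but not necessarily the route taken in the appendix, which uses astsa's `sarima.for`.
- Forecasts are mapped back to the GDP scale by raising them to the fifth power.

```r
# Hypothetical data on the transformed scale (Y1 = GDP^0.2), 76 quarters
set.seed(42)
n <- 76
x <- 1:n
D1 <- rep(c(1, 0, 0, 0), n / 4)
D2 <- rep(c(0, 1, 0, 0), n / 4)
D3 <- rep(c(0, 0, 1, 0), n / 4)
v <- as.numeric(arima.sim(list(ma = 0.65), n = n, sd = 0.02))
eps <- as.numeric(stats::filter(v, c(0, 0, 0, 1), method = "recursive"))  # eps_t = eps_{t-4} + v_t
Y1 <- 6 + 0.09 * x - 0.0015 * x^2 + 0.000015 * x^3 -
  0.5 * D1 - 0.38 * D2 - 0.34 * D3 + eps

# Stage 1: deterministic cubic trend + seasonal dummies
reg <- lm(Y1 ~ x + I(x^2) + I(x^3) + D1 + D2 + D3)

# Stage 2: MA(1) for the lag-4 differenced residuals, fitted on the undifferenced scale
res_fit <- arima(residuals(reg), order = c(0, 0, 1),
                 seasonal = list(order = c(0, 1, 0), period = 4), method = "ML")

# Forecast both parts 8 quarters ahead and add them
new <- data.frame(x = (n + 1):(n + 8),
                  D1 = rep(c(1, 0, 0, 0), 2), D2 = rep(c(0, 1, 0, 0), 2),
                  D3 = rep(c(0, 0, 1, 0), 2))
y1_hat <- predict(reg, newdata = new) + predict(res_fit, n.ahead = 8)$pred
gdp_hat <- y1_hat^5  # undo the 0.2 power transform
print(round(gdp_hat))
```

The fifth power is monotone increasing, so applying it to the endpoints of a prediction interval for $Y_t$ gives an interval for GDP with the same coverage. The back-transformed point forecast, however, estimates the median rather than the mean of GDP.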
Table 4. Forecast values (Unit: 100 million CNY)

Quarter   Prediction   Lower (95%)   Upper (95%)   Actual
2011.01   101488.4     90394.34      1135.8        97101.2
2011.02   115587.3     103065        129297.7      108674.2
2011.03   123014.8     109607.4      137704.1      1143.7
2011.04   161356.3     144318.5      179966.9      150344.6
2012.01   129376.7     114598.3      1452.5
2012.02   147532.4     130673.5      166088.6
2012.03   157712.4     139420.3      177876.8
2012.04   205702.2     182337.1      231403.2

5. Conclusion
The data show clear seasonality, a long-term upward trend and some periodicity. These features are evident both in the plots we have drawn and in the model we have fitted, and the long-term trend is nonlinear. We used the method of factor decomposition defined in the first part of the article, which has the advantage of being intuitive and easy to understand. Its weak point is that it does not make full use of the information in the residuals, which is why we fit an ARMA model to the residuals to exploit most of the remaining information.
China is now one of the world's largest economies. In light of our GDP forecasts, sound macroeconomic regulation and policy-making remain critical for sustaining its rapid growth. Time series methods are suited to short-term prediction, and we have predicted GDP for the 8 quarters after 2010. The predictions indicate that China's economy is currently growing at a very high speed, which is in line with reality.
Appendix
library(MASS)   # boxcox()
library(astsa)  # sarima(), sarima.for()
data <- read.table("st5209new.txt", header = TRUE)  # assumed columns: time (index 1..76), GDP
Y <- data$GDP
x <- data$time
x2 <- x^2
x3 <- x^3
plot(Y, type = "l")
acf(Y)
# Seasonal dummies for Q1-Q3 (Q4 is the baseline), 19 years of quarterly data
D1 <- rep(c(1, 0, 0, 0), 19)
D2 <- rep(c(0, 1, 0, 0), 19)
D3 <- rep(c(0, 0, 1, 0), 19)
lm1 <- lm(Y ~ x + x2 + x3 + D1 + D2 + D3)
boxcox(lm1)   # Box-Cox profile log-likelihood (Figure 3)
Y1 <- Y^0.2   # lambda is close to 0.2
plot(Y1, type = "l")
lm2 <- lm(Y1 ~ x + x2 + x3 + D1 + D2 + D3)
summary(lm2)
res <- lm2$residuals
plot(res, xlab = "data")
abline(h = 0)
acf(res, 100)
pacf(res, 100)
# Lag-4 seasonal difference of the residuals, then candidate ARMA fits
res1 <- diff(res, lag = 4)
acf(res1, 100)
pacf(res1, 100)
sarima(res1, 1, 0, 0)
sarima(res1, 0, 0, 1)
sarima(res1, 0, 0, 2)
sarima(res1, 0, 0, 3)
sarima(res1, 1, 0, 1)
sarima(res1, 1, 0, 2)
sarima(res1, 1, 0, 3)
sarima.for(res1, n.ahead = 8, 0, 0, 1)
# Regressors for the next 8 quarters (time index 77-84)
xn <- 77:84
new <- data.frame(x = xn, x2 = xn^2, x3 = xn^3,
                  D1 = c(1, 0, 0, 0, 1, 0, 0, 0), D2 = c(0, 1, 0, 0, 0, 1, 0, 0),
                  D3 = c(0, 0, 1, 0, 0, 0, 1, 0))
predict(lm2, new, interval = "prediction")  # on the GDP^0.2 scale