In this report, we quantitatively model and forecast turnover for New South Wales' pharmaceutical and toiletry retail industry using two methods: exponential smoothing (ETS) and the Auto-Regressive Integrated Moving Average (ARIMA). In the first half, we wrangle the data and briefly discuss visualisations and time series characteristics of the turnover series. We then transform the data with a log transformation and apply the appropriate differencing. Next, we select each model's parameters and evaluate the fitted models against a test set covering the tail end of the COVID-19 period. We conduct residual diagnostics on the models to draw attention to the effects of COVID-19 on forecasting performance. In the second half, we retrain our best ETS and ARIMA models on the full turnover data (up to December 2022) to produce out-of-sample forecasts (up to December 2024) and compare our predictions against actual turnover.
We conclude that COVID-19 had a substantial impact on both forecasting methods. In the period immediately after COVID-19 (from December 2021), both methods severely underestimated turnover: the rebound in demand was not fully captured. While the models trained on data up to December 2022 performed better, they may now overcorrect turnover estimates. Given all this, we suggest a cautious approach to relying on these forecasts, as the full effects of COVID-19 have yet to play out.
Plotting the data, the y-axis represents turnover and the x-axis represents the time period in years. The turnover figures are for New South Wales' pharmaceutical, cosmetic and toiletry goods retailing.
Looking at the plot generated from the data set, we can observe a general upward trend throughout the period. Looking more closely, there are multiple periods where the trend slows before picking up again in the following periods, hinting at cyclic behaviors within the data. Interestingly, during and after the onset of the COVID-19 pandemic in 2020, turnover increases more sharply year on year. An explanation could be the hoarding of pharmaceutical and toiletry goods during the initial shock of the pandemic, followed by pent-up demand for cosmetic goods once pandemic measures were lifted.
In addition, we can observe seasonality from year to year. Early in the series the seasonal swings are modest, but by the end we see higher peaks and deeper troughs. We can use a seasonal plot to analyse the seasonality further.
From the seasonal plot, we can observe that the seasonality within the data intensifies as time goes on. At the start the seasonality is barely visible: the earliest (orange) lines are fairly flat apart from an upward tick in December. The seasonal behavior grows as the series progresses, and from the 1990s to the 2000s we observe strong seasonal patterns. Turnover tends to be lower in the first half of the year, rises in the second half, and peaks in December.
From the sub-series plot, we can observe the turnover of each month throughout the time series in separate panels. We can see an increasing trend in every month, meaning turnover has risen consistently across all months over the years. The seasonal pattern aligns with what we observed in the previous seasonal plot: sales are higher in the second half of the year and typically peak in December.
Overall, we observe a time series with a trend, multiplicative seasonality, and multiplicative errors. Turnover across the pharmaceutical, cosmetics, and toiletry industry in New South Wales has steadily increased over time, at a roughly constant rate. We can observe some unusual noise beginning in 2020, after which the trend appears to increase at a higher rate than in previous periods. This may result from the mass consumption of items during the initial panic of the pandemic and its lingering effects on demand, and thus may not be a long-term change in the turnover trend. Furthermore, the turnover is seasonal, which could be attributed to increased demand in certain periods, such as for Christmas in December. The general trend also shows some cyclic behaviors that could be linked to the economic and business cycle of Australia.
With a mathematical transformation of the data, we seek to reduce the variation between time periods and get as close to a constant, homoskedastic variance across the levels of the series as possible, in order to make modelling and decomposition easier. For this data set, the variation is dependent on the level. A Box-Cox transformation with parameter \(\lambda = 0\) (equivalent to a log transformation) deals with the variance heterogeneity between the start and end of the period: the variance of the transformed data appears far more constant and homoskedastic.
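As a language-agnostic sketch of why the log transform helps (the report's transformation was done in R; the series below is invented for illustration): with multiplicative seasonality, the yearly peak-to-trough range grows with the level on the raw scale but stays constant on the log scale.

```python
import math

# Invented monthly series with multiplicative seasonality: the seasonal swing
# grows with the level, mimicking the turnover plot described above.
seasonal = [1.0, 0.95, 1.05, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 1.05, 1.1, 1.3]
series = [100 * 1.08 ** year * seasonal[m] for year in range(5) for m in range(12)]

def yearly_range(xs, year):
    """Peak-to-trough range within one 12-month window."""
    window = xs[year * 12:(year + 1) * 12]
    return max(window) - min(window)

# Box-Cox with lambda = 0 is the natural log transform.
logged = [math.log(y) for y in series]

# Raw scale: the range grows with the level (40.0 in year 0, ~54.4 in year 4).
print(round(yearly_range(series, 0), 1), round(yearly_range(series, 4), 1))
# Log scale: the range is constant across years (log(1.3) - log(0.9) ~ 0.3677).
print(round(yearly_range(logged, 0), 4), round(yearly_range(logged, 4), 4))
```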
In order to fit an Auto-Regressive Integrated Moving Average (ARIMA) model, our time series needs to satisfy the conditions of stationarity. A stationary time series has no trend, no seasonality, a constant mean, and homoskedastic variance; it is a series with no predictable patterns. We can visually check for stationarity by looking at the ACF and PACF graphs.
As seen from the graphs, our time series is non-stationary. There is a definite trend, the presence of seasonality, and inconsistent variation between time periods. Additionally, the ACF graph shows statistically significant correlations across lags that decay only slowly. These visuals suggest that the turnover of one period is correlated with previous periods.
| Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Statistic | P value | Seasonal Differences Needed |
|---|---|---|
| 7.735 | 0.01 | 1 |
We can perform a unit root test for stationarity using the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, with the null hypothesis that our time series is stationary against the alternative that it is non-stationary. With a p-value of 0.01, we reject the null hypothesis in favour of the alternative. To deal with this, we will perform a first-order seasonal difference.
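The seasonal difference itself is simple: subtract the observation from the same month one year earlier. A minimal pure-Python sketch on an invented series, whose linear trend and repeating monthly pattern the lag-12 difference collapses to a constant:

```python
# Lag-12 (first-order seasonal) difference: y_t - y_{t-12}.
def seasonal_difference(xs, m=12):
    return [xs[t] - xs[t - m] for t in range(m, len(xs))]

# Invented monthly series: linear trend plus a repeating 12-month pattern.
season = [5, 3, 4, 2, 6, 7, 3, 5, 8, 6, 9, 12]
series = [2.0 * t + season[t % 12] for t in range(48)]

diffed = seasonal_difference(series)
# Both the trend and the seasonal pattern collapse to the constant 2 * 12 = 24,
# i.e. the differenced series has no predictable pattern left.
print(sorted(set(round(d, 6) for d in diffed)))
```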
After the process of seasonal differencing, we obtain the following plots.
Visually, we see more evidence of stationarity than before differencing. We again perform the KPSS test to check for stationarity.
| Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Statistic | P value | Seasonal Differences Needed |
|---|---|---|
| 0.162 | 0.1 | 0 |
Based on the test results, we observe that our time series is now stationary (p-value > 0.05), making it suitable for ARIMA modelling.
From our transformed time series, we can estimate a number of ARIMA, AR, and MA models by observing the ACF and PACF plots individually. As shown above, we can observe similar patterns in both plots.
First, we can estimate an autoregressive model by looking at the last significant lag in the PACF plot, which is lag 2. The first lag is highly significant, so p must be at least 1; the significance at the second lag could be due to chance, so we may select p equal to 1 or 2. We also see highly significant lags at two seasonal periods (lags 12 and 24), so we should consider setting the seasonal parameter P equal to 2.
Additionally, we can estimate a moving average model by following the same process on the ACF graph. For our series, the ACF plot follows a sinusoidal pattern; many lags are highly significant, so a plausible model might need a large q. Lags 12 and 24 are also significant, which makes sense given the series' yearly (12-month) seasonality, so we may consider models with a seasonal parameter Q equal to 1 or 2. However, we should be cautious about a pure MA approach for this data, as it does not seem to follow an MA process.
For both the AR and MA components, our differencing parameters d and D are 0 and 1 respectively, as we only performed a seasonal difference.
To estimate the parameters of one component of the ARIMA model (AR or MA) visually, we must set the other component's parameter to 0. However, we can also guess reasonable combinations of both within reason. The following are the forecasts and accuracy of some pure AR, pure MA, and mixed ARIMA models.
| Model | AICc | Root Mean Squared Error (Test) | Root Mean Squared Scaled Error (Test) |
|---|---|---|---|
| ARIMA(1,0,1)(2,1,1) | -1422.26 | 34.395 | 1.540 |
| ARIMA(1,1,1)(1,1,1) | -1409.77 | 43.743 | 1.958 |
| ARIMA(2,0,0)(2,1,0) | -1329.51 | 44.511 | 1.993 |
| ARIMA(1,0,0)(2,1,0) | -1300.51 | 48.360 | 2.165 |
| ARIMA(1,0,0)(1,1,0) | -1254.78 | 47.751 | 2.138 |
| ARIMA(0,0,1)(0,1,1) | -1016.61 | 55.022 | 2.463 |
We can use the Akaike Information Criterion (AIC) to compare ARIMA models with the same order of differencing. Based on AICc (the sample-size-adjusted AIC), ARIMA(1,0,1)(2,1,1) is the best of our chosen ARIMA models, as it minimises AICc. This model's parameters were automatically chosen by fable's ARIMA function.
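For reference, AICc adds a small-sample penalty to AIC: AICc = AIC + 2k(k+1)/(n − k − 1), which vanishes as n grows. A Python sketch of the selection rule, with invented log-likelihoods and parameter counts (illustrative only, not the fitted models' actual values):

```python
def aic(loglik, k):
    # AIC = 2k - 2 ln L, where k is the number of estimated parameters.
    return 2 * k - 2 * loglik

def aicc(loglik, k, n):
    # Small-sample correction: AICc = AIC + 2k(k+1) / (n - k - 1).
    return aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical (log-likelihood, parameter count) pairs for three candidates,
# fitted to a hypothetical n = 429 monthly observations.
candidates = {
    "ARIMA(1,0,1)(2,1,1)": (718.0, 6),
    "ARIMA(2,0,0)(2,1,0)": (670.0, 5),
    "ARIMA(0,0,1)(0,1,1)": (512.0, 3),
}
n = 429

# Select the model that minimises AICc.
best = min(candidates, key=lambda m: aicc(*candidates[m], n))
print(best, round(aicc(*candidates[best], n), 1))  # -> ARIMA(1,0,1)(2,1,1) -1423.8
```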
We can select our exponential smoothing parameters by looking at the characteristics of our time series. From the plot of turnover in section 1, the series contains a trend, so our model should contain an additive trend parameter. Similarly, the presence of multiplicative seasonality and errors should inform our error and season parameters. Thus, a possible model should include multiplicative errors, an additive trend, and multiplicative seasonality.
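To illustrate what such a model implies for point forecasts: with an additive trend and multiplicative seasonality, the h-step-ahead point forecast takes the form (level + h × slope) × seasonal factor. The state values below are invented for illustration (imagined as at a December year-end):

```python
# Hypothetical smoothed states: level, slope, and Jan..Dec seasonal factors.
level, slope = 500.0, 1.5
seasonal = [0.93, 0.90, 1.00, 0.95, 1.02, 0.99,
            1.04, 1.05, 1.01, 1.03, 1.06, 1.25]

# Point forecasts for the next 12 months:
# y-hat_{T+h} = (level + h * slope) * seasonal factor for that month.
forecasts = [(level + h * slope) * seasonal[(h - 1) % 12] for h in range(1, 13)]

# December carries the largest seasonal factor, so it produces the peak:
# (500 + 12 * 1.5) * 1.25 = 647.5 at horizon h = 12.
print(round(max(forecasts), 2), forecasts.index(max(forecasts)) + 1)  # -> 647.5 12
```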
The following graphs are the forecast and accuracy of some ETS models with different parameters.
| Model | AICc | Root Mean Squared Error (Test) | Root Mean Squared Scaled Error (Test) |
|---|---|---|---|
| ETS(M,A,M) | 4827.12 | 67.054 | 3.002 |
| ETS(M,Ad,M) | 4833.30 | 64.519 | 2.889 |
| ETS(M,N,M) | 4926.05 | 74.937 | 3.355 |
| ETS(A,A,A) | 5168.66 | 57.661 | 2.582 |
| ETS(A,Ad,A) | 5175.69 | 51.981 | 2.327 |
| ETS(M,A,N) | 5482.22 | 61.784 | 2.766 |
We can also use AIC to compare ETS models. Based on AICc, ETS(M,A,M) appears to be the best of the ETS models. However, the model with the lowest AICc has the second-highest test-set errors. Since our test set is small and its period contains extra variability and unusual noise due to the pandemic, we place more trust in AICc for model selection, as it balances training accuracy against model complexity. Thus, ETS(M,A,M) is chosen.
We cannot use AIC to compare ARIMA with ETS models, since the order of differencing differs between them. To compare ETS and ARIMA, we instead forecast over our test set and calculate the prediction errors for each. However, as we will see in the next section, both ARIMA and ETS under-predicted turnover post-2020 relative to the test set. This can be attributed to our models being unable to anticipate and capture the after-effects of COVID-19 on demand. On average, our ARIMA models have lower errors than our ETS models, suggesting an ARIMA approach might be advantageous for this time series.
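The two accuracy measures used throughout this report can be computed directly. RMSE is the root of the mean squared test error; RMSSE scales that error by the in-sample mean squared error of a benchmark one-step forecast (commonly the seasonal naive forecast for seasonal data), so values above 1 mean the model does worse out-of-sample than that benchmark did in-sample. A sketch with made-up numbers:

```python
import math

def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def rmsse(actual, forecast, train, m=12):
    # Scale test-set squared errors by the in-sample seasonal-naive MSE.
    scale = sum((train[t] - train[t - m]) ** 2 for t in range(m, len(train))) / (len(train) - m)
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse / scale)

# Invented training series (trend plus a December bump) and a 3-month test set.
train = [100 + t + (10 if t % 12 == 11 else 0) for t in range(48)]
actual = [150.0, 152.0, 161.0]
forecast = [148.0, 155.0, 158.0]

print(round(rmse(actual, forecast), 3))            # -> 2.708
print(round(rmsse(actual, forecast, train), 3))    # -> 0.226 (beats seasonal naive)
```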
Moving forward, we will use ARIMA(1,0,1)(2,1,1)[12] and ETS(M,A,M) as our two chosen forecasting models.
***
| Model | Month | Point forecast | Actual | 80% Prediction Interval |
|---|---|---|---|---|
| ARIMA(1,0,1)(2,1,1)[12] | 2021 Jan | 419.4624 | 415.2 | [418.7562, 419.1670] |
| ARIMA(1,0,1)(2,1,1)[12] | 2021 Feb | 411.3172 | 418.4 | [410.4132, 410.8833] |
| ARIMA(1,0,1)(2,1,1)[12] | 2021 Mar | 466.2647 | 481.2 | [465.0194, 465.6149] |
| ARIMA(1,0,1)(2,1,1)[12] | 2021 Apr | 434.7759 | 467.4 | [433.4243, 434.0292] |
| ARIMA(1,0,1)(2,1,1)[12] | 2021 May | 475.4573 | 489.7 | [473.7856, 474.4944] |
| ARIMA(1,0,1)(2,1,1)[12] | 2021 Jun | 462.8893 | 458.3 | [461.0863, 461.8168] |
| ARIMA(1,0,1)(2,1,1)[12] | 2021 Jul | 492.7985 | 488.6 | [490.7045, 491.5206] |
| ARIMA(1,0,1)(2,1,1)[12] | 2021 Aug | 500.2087 | 484.0 | [497.9178, 498.7811] |
| ARIMA(1,0,1)(2,1,1)[12] | 2021 Sep | 490.7999 | 478.9 | [488.4004, 489.2784] |
| ARIMA(1,0,1)(2,1,1)[12] | 2021 Oct | 499.1490 | 499.8 | [496.5643, 497.4858] |
| ARIMA(1,0,1)(2,1,1)[12] | 2021 Nov | 510.9257 | 533.5 | [508.1416, 509.1115] |
| ARIMA(1,0,1)(2,1,1)[12] | 2021 Dec | 594.6677 | 655.6 | [591.2765, 592.4337] |
| ARIMA(1,0,1)(2,1,1)[12] | 2022 Jan | 453.1236 | 500.3 | [450.3336, 451.2529] |
| ARIMA(1,0,1)(2,1,1)[12] | 2022 Feb | 441.4650 | 474.7 | [438.5965, 439.5185] |
| ARIMA(1,0,1)(2,1,1)[12] | 2022 Mar | 491.0015 | 538.8 | [487.6545, 488.7069] |
| ARIMA(1,0,1)(2,1,1)[12] | 2022 Apr | 477.1096 | 525.3 | [473.7147, 474.7613] |
| ARIMA(1,0,1)(2,1,1)[12] | 2022 May | 513.9221 | 562.0 | [510.1215, 511.2725] |
| ARIMA(1,0,1)(2,1,1)[12] | 2022 Jun | 499.8533 | 555.5 | [496.0255, 497.1662] |
| ARIMA(1,0,1)(2,1,1)[12] | 2022 Jul | 520.7287 | 554.7 | [516.6129, 517.8216] |
| ARIMA(1,0,1)(2,1,1)[12] | 2022 Aug | 537.2086 | 570.0 | [532.8386, 534.1049] |
| ARIMA(1,0,1)(2,1,1)[12] | 2022 Sep | 521.6736 | 565.4 | [517.3171, 518.5641] |
| ARIMA(1,0,1)(2,1,1)[12] | 2022 Oct | 530.9212 | 548.5 | [526.3797, 527.6653] |
| ARIMA(1,0,1)(2,1,1)[12] | 2022 Nov | 546.2844 | 589.1 | [541.5075, 542.8459] |
| ARIMA(1,0,1)(2,1,1)[12] | 2022 Dec | 629.4053 | 679.8 | [623.7890, 625.3478] |
| Model | Month | Point forecast | Actual | 80% Prediction Interval |
|---|---|---|---|---|
| ETS(M,A,M) | 2021 Jan | 418.8856 | 415.2 | [418.6798, 419.0913] |
| ETS(M,A,M) | 2021 Feb | 404.2159 | 418.4 | [403.9855, 404.4464] |
| ETS(M,A,M) | 2021 Mar | 445.9284 | 481.2 | [445.6433, 446.2135] |
| ETS(M,A,M) | 2021 Apr | 429.0561 | 467.4 | [428.7551, 429.3571] |
| ETS(M,A,M) | 2021 May | 461.9487 | 489.7 | [461.5982, 462.2991] |
| ETS(M,A,M) | 2021 Jun | 448.6839 | 458.3 | [448.3198, 449.0481] |
| ETS(M,A,M) | 2021 Jul | 469.6524 | 488.6 | [469.2479, 470.0568] |
| ETS(M,A,M) | 2021 Aug | 474.9113 | 484.0 | [474.4801, 475.3425] |
| ETS(M,A,M) | 2021 Sep | 463.6992 | 478.9 | [463.2575, 464.1408] |
| ETS(M,A,M) | 2021 Oct | 471.8386 | 499.8 | [471.3692, 472.3080] |
| ETS(M,A,M) | 2021 Nov | 479.3804 | 533.5 | [478.8840, 479.8767] |
| ETS(M,A,M) | 2021 Dec | 560.1997 | 655.6 | [559.5978, 560.8015] |
| ETS(M,A,M) | 2022 Jan | 426.9162 | 500.3 | [426.4389, 427.3935] |
| ETS(M,A,M) | 2022 Feb | 411.9531 | 474.7 | [411.4776, 412.4285] |
| ETS(M,A,M) | 2022 Mar | 454.4504 | 538.8 | [453.9101, 454.9908] |
| ETS(M,A,M) | 2022 Apr | 437.2427 | 525.3 | [436.7081, 437.7774] |
| ETS(M,A,M) | 2022 May | 470.7491 | 562.0 | [470.1579, 471.3402] |
| ETS(M,A,M) | 2022 Jun | 457.2182 | 555.5 | [456.6294, 457.8069] |
| ETS(M,A,M) | 2022 Jul | 478.5714 | 554.7 | [477.9402, 479.2025] |
| ETS(M,A,M) | 2022 Aug | 483.9161 | 570.0 | [483.2631, 484.5690] |
| ETS(M,A,M) | 2022 Sep | 472.4776 | 565.4 | [471.8261, 473.1291] |
| ETS(M,A,M) | 2022 Oct | 480.7571 | 548.5 | [480.0803, 481.4340] |
| ETS(M,A,M) | 2022 Nov | 488.4273 | 589.1 | [487.7258, 489.1287] |
| ETS(M,A,M) | 2022 Dec | 570.7553 | 679.8 | [569.9197, 571.5909] |
| Model | Root Mean Squared Error (Test) | Root Mean Squared Scaled Error (Test) |
|---|---|---|
| ARIMA(1,0,1)(2,1,1)[12] | 34.395 | 1.540 |
| ETS(M,A,M) | 67.054 | 3.002 |
As mentioned in the section above, actual turnover post-2020 is generally higher than our forecasts. In other words, our models captured the fall in overall demand during the pandemic period and projected that same level of demand forward, without accounting for the post-pandemic rebound.
The ARIMA model performs better than the ETS model based on the test-set errors. It also has a narrower prediction interval, i.e. it is less uncertain in its forecasts.
A good, unbiased forecasting method should produce innovation residuals that are independent and identically distributed (i.i.d.), normally distributed with mean zero, and homoskedastic in variance.
To check for independence, we can look at the ACF graph, which shows any statistically significant correlations between the residuals and past lags. If the residuals are not correlated with any lags, i.e. they are white noise, we can say our model has sufficiently captured the patterns in the data.
In addition, identically distributed residuals can be identified from the residual plot: we want the residuals to fluctuate around a mean of zero with constant variance.
Below are the residual graphs of the ETS and ARIMA model respectively:
As we can see from the two graphs, the innovation residuals of both models do not appear to satisfy the conditions of i.i.d. normality with mean zero and constant variance. While the residuals seem normally distributed, we observe some autocorrelation between residuals and past lags. Since the residuals are correlated, our models have not captured all the available patterns in the data. However, this does not mean they will forecast poorly: the significant autocorrelations are generally around ±0.1, suggesting the uncaptured information should not substantially affect our point forecasts or prediction intervals. While a theoretically better model might exist, pursuing it risks overfitting; with our current models, the residuals appear well behaved enough. These factors suggest both models captured most of the patterns in our data.
We can use the Ljung-Box test to check whether our residuals are white noise, with the null hypothesis that the residuals are white noise against the alternative that they are not.
| Model | Ljung-Box Stat | P value |
|---|---|---|
| ARIMA(1,0,1)(2,1,1)[12] | 36.37 | 0.01 |
| ETS(M,A,M) | 40.87 | 0.00 |
Following the Ljung-Box test, the p-values for the ARIMA and ETS innovation residuals are both < 0.05, so we reject the null hypothesis that the residuals are white noise. The test agrees with our visual assessment.
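The Ljung-Box statistic can be reproduced from first principles: Q = n(n+2) Σₖ rₖ²/(n−k) over lags k = 1..h, where rₖ is the lag-k residual autocorrelation; a large Q (small p-value) indicates the residuals are not white noise. A pure-Python sketch on invented residual series (not the report's actual residuals):

```python
import math
import random

def acf(xs, k):
    """Lag-k sample autocorrelation."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    num = sum((xs[t] - mean) * (xs[t - k] - mean) for t in range(k, n))
    return num / denom

def ljung_box(residuals, h):
    # Q = n(n+2) * sum_{k=1}^{h} r_k^2 / (n - k).
    n = len(residuals)
    return n * (n + 2) * sum(acf(residuals, k) ** 2 / (n - k) for k in range(1, h + 1))

# Residuals with a leftover 12-month pattern score far higher than
# patternless noise, mirroring the rejection reported in the table above.
patterned = [math.sin(2 * math.pi * t / 12) for t in range(120)]
random.seed(42)
patternless = [random.gauss(0, 1) for _ in range(120)]

print(ljung_box(patterned, 24) > ljung_box(patternless, 24))  # -> True
```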
As mentioned above, to compare our chosen ARIMA and ETS models we can only use the forecast errors (such as RMSE or RMSSE), not AIC. On both RMSE and RMSSE, our ARIMA model outperformed our ETS model. However, both models consistently predicted lower turnover than the actual figures post-COVID-19, from 2021 onwards.
| Model | Root Mean Squared Error (Test) | Root Mean Squared Scaled Error (Test) |
|---|---|---|
| ARIMA(1,0,1)(2,1,1)[12] | 34.395 | 1.540 |
| ETS(M,A,M) | 67.054 | 3.002 |
In this section, we retrain our models with the previously chosen ARIMA and ETS parameters on the full retail dataset (up to December 2022). We then use the two models to forecast the next two years of turnover, up to December 2024.
| Model | Month | Point forecast | 80% Prediction Interval |
|---|---|---|---|
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Jan | 518.7945 | [517.9455, 518.4452] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Feb | 503.6097 | [502.5293, 503.0969] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Mar | 569.5498 | [568.0621, 568.7804] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Apr | 536.0035 | [534.3722, 535.1092] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 May | 582.5829 | [580.5769, 581.4353] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Jun | 567.0400 | [564.8768, 565.7614] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Jul | 594.2715 | [591.7987, 592.7715] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Aug | 603.5723 | [600.8661, 601.8957] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Sep | 590.7658 | [587.9392, 588.9836] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Oct | 596.3532 | [593.3324, 594.4202] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Nov | 615.0113 | [611.7346, 612.8878] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Dec | 710.3686 | [706.4098, 707.7749] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Jan | 540.6773 | [537.4327, 538.5144] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Feb | 529.4142 | [526.0658, 527.1555] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Mar | 599.7561 | [595.7813, 597.0474] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Apr | 563.7212 | [559.8258, 561.0430] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 May | 611.1722 | [606.7873, 608.1340] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Jun | 590.2732 | [585.8921, 587.2167] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Jul | 624.7937 | [620.0119, 621.4372] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Aug | 631.2661 | [626.2981, 627.7600] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Sep | 617.4960 | [612.5113, 613.9609] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Oct | 630.0983 | [624.8924, 626.3900] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Nov | 648.7219 | [643.2471, 644.8066] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Dec | 755.7702 | [749.2663, 751.1022] |
| Model | Month | Point forecast | 80% Prediction Interval |
|---|---|---|---|
| ETS(M,A,M) | 2023 Jan | 516.8625 | [516.6087, 517.1163] |
| ETS(M,A,M) | 2023 Feb | 500.9847 | [500.7106, 501.2587] |
| ETS(M,A,M) | 2023 Mar | 558.1727 | [557.8390, 558.5064] |
| ETS(M,A,M) | 2023 Apr | 531.9929 | [531.6499, 532.3360] |
| ETS(M,A,M) | 2023 May | 572.0192 | [571.6252, 572.4133] |
| ETS(M,A,M) | 2023 Jun | 554.4645 | [554.0596, 554.8695] |
| ETS(M,A,M) | 2023 Jul | 580.1266 | [579.6800, 580.5732] |
| ETS(M,A,M) | 2023 Aug | 586.4207 | [585.9472, 586.8942] |
| ETS(M,A,M) | 2023 Sep | 573.3155 | [572.8318, 573.7991] |
| ETS(M,A,M) | 2023 Oct | 582.9704 | [582.4583, 583.4824] |
| ETS(M,A,M) | 2023 Nov | 598.8134 | [598.2673, 599.3595] |
| ETS(M,A,M) | 2023 Dec | 699.0308 | [698.3706, 699.6911] |
| ETS(M,A,M) | 2024 Jan | 531.5448 | [531.0214, 532.0682] |
| ETS(M,A,M) | 2024 Feb | 515.1825 | [514.6595, 515.7055] |
| ETS(M,A,M) | 2024 Mar | 573.9542 | [573.3545, 574.5539] |
| ETS(M,A,M) | 2024 Apr | 546.9991 | [546.4117, 547.5864] |
| ETS(M,A,M) | 2024 May | 588.1168 | [587.4686, 588.7650] |
| ETS(M,A,M) | 2024 Jun | 570.0318 | [569.3878, 570.6758] |
| ETS(M,A,M) | 2024 Jul | 596.3765 | [595.6866, 597.0665] |
| ETS(M,A,M) | 2024 Aug | 602.8089 | [602.0954, 603.5224] |
| ETS(M,A,M) | 2024 Sep | 589.3004 | [588.5876, 590.0133] |
| ETS(M,A,M) | 2024 Oct | 599.1870 | [598.4468, 599.9273] |
| ETS(M,A,M) | 2024 Nov | 615.4325 | [614.6567, 616.2083] |
| ETS(M,A,M) | 2024 Dec | 718.3868 | [717.4634, 719.3103] |
| Model | Month | Point forecast | Actual | 80% Prediction Interval |
|---|---|---|---|---|
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Jan | 518.7945 | 506.7 | [517.9455, 518.4452] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Feb | 503.6097 | 496.7 | [502.5293, 503.0969] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Mar | 569.5498 | 559.8 | [568.0621, 568.7804] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Apr | 536.0035 | 534.5 | [534.3722, 535.1092] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 May | 582.5829 | 598.0 | [580.5769, 581.4353] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Jun | 567.0400 | 559.0 | [564.8768, 565.7614] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Jul | 594.2715 | 574.0 | [591.7987, 592.7715] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Aug | 603.5723 | 590.9 | [600.8661, 601.8957] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Sep | 590.7658 | 610.1 | [587.9392, 588.9836] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Oct | 596.3532 | 599.8 | [593.3324, 594.4202] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Nov | 615.0113 | 631.8 | [611.7346, 612.8878] |
| ARIMA(1,0,1)(2,1,1)[12] | 2023 Dec | 710.3686 | 698.3 | [706.4098, 707.7749] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Jan | 540.6773 | 553.3 | [537.4327, 538.5144] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Feb | 529.4142 | 537.2 | [526.0658, 527.1555] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Mar | 599.7561 | 574.6 | [595.7813, 597.0474] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Apr | 563.7212 | NA | [559.8258, 561.0430] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 May | 611.1722 | NA | [606.7873, 608.1340] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Jun | 590.2732 | NA | [585.8921, 587.2167] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Jul | 624.7937 | NA | [620.0119, 621.4372] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Aug | 631.2661 | NA | [626.2981, 627.7600] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Sep | 617.4960 | NA | [612.5113, 613.9609] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Oct | 630.0983 | NA | [624.8924, 626.3900] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Nov | 648.7219 | NA | [643.2471, 644.8066] |
| ARIMA(1,0,1)(2,1,1)[12] | 2024 Dec | 755.7702 | NA | [749.2663, 751.1022] |
| Model | Month | Point forecast | Actual | 80% Prediction Interval |
|---|---|---|---|---|
| ETS(M,A,M) | 2023 Jan | 516.8625 | 506.7 | [516.6087, 517.1163] |
| ETS(M,A,M) | 2023 Feb | 500.9847 | 496.7 | [500.7106, 501.2587] |
| ETS(M,A,M) | 2023 Mar | 558.1727 | 559.8 | [557.8390, 558.5064] |
| ETS(M,A,M) | 2023 Apr | 531.9929 | 534.5 | [531.6499, 532.3360] |
| ETS(M,A,M) | 2023 May | 572.0192 | 598.0 | [571.6252, 572.4133] |
| ETS(M,A,M) | 2023 Jun | 554.4645 | 559.0 | [554.0596, 554.8695] |
| ETS(M,A,M) | 2023 Jul | 580.1266 | 574.0 | [579.6800, 580.5732] |
| ETS(M,A,M) | 2023 Aug | 586.4207 | 590.9 | [585.9472, 586.8942] |
| ETS(M,A,M) | 2023 Sep | 573.3155 | 610.1 | [572.8318, 573.7991] |
| ETS(M,A,M) | 2023 Oct | 582.9704 | 599.8 | [582.4583, 583.4824] |
| ETS(M,A,M) | 2023 Nov | 598.8134 | 631.8 | [598.2673, 599.3595] |
| ETS(M,A,M) | 2023 Dec | 699.0308 | 698.3 | [698.3706, 699.6911] |
| ETS(M,A,M) | 2024 Jan | 531.5448 | 553.3 | [531.0214, 532.0682] |
| ETS(M,A,M) | 2024 Feb | 515.1825 | 537.2 | [514.6595, 515.7055] |
| ETS(M,A,M) | 2024 Mar | 573.9542 | 574.6 | [573.3545, 574.5539] |
| ETS(M,A,M) | 2024 Apr | 546.9991 | NA | [546.4117, 547.5864] |
| ETS(M,A,M) | 2024 May | 588.1168 | NA | [587.4686, 588.7650] |
| ETS(M,A,M) | 2024 Jun | 570.0318 | NA | [569.3878, 570.6758] |
| ETS(M,A,M) | 2024 Jul | 596.3765 | NA | [595.6866, 597.0665] |
| ETS(M,A,M) | 2024 Aug | 602.8089 | NA | [602.0954, 603.5224] |
| ETS(M,A,M) | 2024 Sep | 589.3004 | NA | [588.5876, 590.0133] |
| ETS(M,A,M) | 2024 Oct | 599.1870 | NA | [598.4468, 599.9273] |
| ETS(M,A,M) | 2024 Nov | 615.4325 | NA | [614.6567, 616.2083] |
| ETS(M,A,M) | 2024 Dec | 718.3868 | NA | [717.4634, 719.3103] |
| Model | Root Mean Squared Error (Test) | Root Mean Squared Scaled Error (Test) |
|---|---|---|
| ARIMA(1,0,1)(2,1,1)[12] | 13.738 | 0.523 |
| ETS(M,A,M) | 17.449 | 0.665 |
Overall, both our models did well at forecasting actual turnover in New South Wales (compared up to March 2024). Our chosen ARIMA model has slightly lower errors than our ETS model, but overall they are not too different. Additionally, by training on the full data set, which includes realised turnover figures after the pandemic, the models were better able to predict the post-COVID increase in turnover. From our forecasts, we should expect a steady increase in turnover from the current period into future periods. However, we should remain cautious when using the models, as demand might decay back to pre-pandemic levels; in that scenario our models would over-predict turnover in future periods.
We can check whether we picked the right models from our short-list by evaluating the accuracy of all models against the latest data from the ABS.
| Model | Root Mean Squared Error (Test) | Root Mean Squared Scaled Error (Test) |
|---|---|---|
| ARIMA(1,0,1)(2,1,1) | 13.738 | 0.523 |
| ARIMA(1,1,1)(1,1,1) | 14.845 | 0.565 |
| ARIMA(1,0,0)(1,1,0) | 17.693 | 0.674 |
| ARIMA(0,0,1)(0,1,1) | 19.855 | 0.756 |
| ARIMA(1,0,0)(2,1,0) | 21.105 | 0.804 |
| ARIMA(2,0,0)(2,1,0) | 22.639 | 0.862 |
| Model | Root Mean Squared Error (Test) | Root Mean Squared Scaled Error (Test) |
|---|---|---|
| ETS(M,A,M) | 17.412 | 0.663 |
| ETS(M,Ad,M) | 17.412 | 0.663 |
| ETS(A,A,A) | 19.262 | 0.734 |
| ETS(A,Ad,A) | 20.996 | 0.800 |
| ETS(M,N,M) | 28.969 | 1.103 |
| ETS(M,A,N) | 65.084 | 2.479 |
Based on the errors, our two chosen ARIMA and ETS models indeed performed the best of all the short-listed models in their respective classes.
In this report, we chose two forecasting methods: ARIMA and ETS. The benefit of both is that they are dynamic models that can adapt as new information arrives in future periods. As such, they are more flexible than simple benchmark methods such as naive or drift. For our time series, the two models performed well and both were robust enough to capture the changing dynamics resulting from COVID-19. Furthermore, they are relatively straightforward to use and less computationally intensive than methods such as dynamic regression. Additionally, ARIMA models can handle cyclical time series where other models may not. This is particularly useful for forecasting economic time series, and since our series has cyclic behaviors tied to the business cycle, ARIMA worked quite well.
The main drawback of these two models, however, is their inability to incorporate exogenous factors into the forecast. In this respect, time series linear and dynamic regression models might be better, as they allow the inclusion of dummy variables. As seen in this report, COVID-19 had an undeniable impact on our predictions and forecasts; with dummy variables we could better incorporate this dimension into our calculations. Furthermore, the models' outputs are harder to interpret: for instance, the monthly seasonality in the data would be more interpretable via seasonal dummies in a linear model. That said, dynamic regression in particular is much more computationally intensive than ARIMA or ETS on larger time series.
Lastly, our current models might over-predict turnover in future periods, as the increase in demand used to train them may reflect a backlog of demand that could not be met during the pandemic. It is reasonable to assume that demand may drop back to its equilibrium level in future periods. Thus, regardless of our models' performance, we should approach future periods with a healthy amount of caution and not rely entirely on the predictions.