# Time Series Forecasting using ARIMA model

ARIMA model: Introduction

ARIMA, short for AutoRegressive Integrated Moving Average, is a widely used class of statistical models for analyzing and forecasting time series data. It captures a suite of standard temporal structures in such data. To read more about time series analysis and its basic techniques, please read this post.

As such, it offers a simple yet powerful technique for making skillful time series forecasts. The standard notation is ARIMA(p,d,q), where the parameters are replaced with integer values to specify the exact ARIMA model being used.

Parameters of the model are described as follows:

- p: the number of lag observations included in the model, also called the lag order.
- d: the number of times the raw observations are differenced, also called the degree of differencing.
- q: the size of the moving average window, also called the order of the moving average.
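For readers who prefer a formula, one common way to write an ARIMA(p,d,q) model is shown here. The symbols are assumptions introduced for illustration and are not used elsewhere in this post: $y_t$ is the observation at time $t$, $B$ is the backshift operator ($B y_t = y_{t-1}$), $c$ is a constant, $\phi_i$ and $\theta_j$ are the autoregressive and moving average coefficients, and $\varepsilon_t$ is white noise.

$$
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right) (1 - B)^d \, y_t = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right) \varepsilon_t
$$

The factor $(1 - B)^d$ is the "integrated" part: it differences the series $d$ times before the autoregressive and moving average terms are applied.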

Autoregressive Integrated Moving Average and Stationarity

In an autoregressive integrated moving average model, the data is differenced to make it stationary. A stationary series has statistical properties, such as its mean and variance, that stay constant over time. Most financial and market data show trends, so the purpose of differencing is to remove any trends or seasonal structures.

Seasonality, where data show regular and predictable patterns that recur every calendar year, can distort the regression model. If a trend is present and the data is not stationary, many of the calculations in the procedure cannot be carried out reliably.
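As a minimal sketch of what differencing does, the snippet below applies NumPy's `np.diff` to a small made-up series with an accelerating trend. The values are illustrative only and are not taken from any real dataset.

```python
import numpy as np

# Made-up series with an accelerating upward trend (illustrative values only)
series = np.array([10, 12, 15, 19, 24, 30])

# First-order differencing (d=1): each value minus the previous one
first_diff = np.diff(series, n=1)

# Second-order differencing (d=2): difference the differenced series again
second_diff = np.diff(series, n=2)

print(first_diff)   # [2 3 4 5 6]
print(second_diff)  # [1 1 1 1]
```

After one round of differencing the values still climb, while after two rounds they are constant, which suggests d=2 would be a reasonable choice for this toy series.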

Configuring an ARIMA Model

The classical approach to fitting an ARIMA model is to follow the Box-Jenkins method. This method uses time series analysis and diagnostics to discover good parameters for the ARIMA model.

In summary, the steps of this process are as follows:

1. Model Identification. Use plots and summary statistics to identify trends, seasonality, and autoregression elements to get an idea of the amount of differencing and the size of the lag that will be required.
2. Parameter Estimation. Use a fitting procedure to find the coefficients of the regression model.
3. Model Checking. Use plots and statistical tests of the residual errors to determine the amount and type of temporal structure not captured by the model.

The process is repeated until a desirable level of fit is achieved on the in-sample or out-of-sample observations (e.g. training or test datasets).

Parameter Selection for the ARIMA Time Series Model

When fitting time series data with a seasonal ARIMA model, the initial objective is to find the values of ARIMA(p,d,q)(P,D,Q)s that optimize a metric of interest. Here (P,D,Q) are the seasonal counterparts of (p,d,q), and s is the length of the seasonal period.

One can use a "grid search" to iteratively explore different combinations of parameters. For each combination, one fits a new seasonal ARIMA model with the `SARIMAX()` function from the `statsmodels` package and measures its overall quality. Once the whole landscape of parameters has been explored, the optimal set is the one that gives the best performance on the chosen criterion.

ARIMA with Python

The `statsmodels` library provides the capability to fit an ARIMA model. The procedure is as follows:

1. Define the model by calling `ARIMA()` and passing in the p, d, and q parameters.
2. Fit the model on the training data by calling `fit()`.
3. Generate forecasts by calling `predict()` and specifying the index of the time or times to be forecast.

Example Usage and Code with Python:

See the official `statsmodels` documentation for full usage details. The example below sets the parameters p, d, and q to 5, 1, and 0 respectively.

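```python
# In recent statsmodels releases ARIMA lives in statsmodels.tsa.arima.model;
# the older statsmodels.tsa.arima_model module has been removed
from statsmodels.tsa.arima.model import ARIMA

# `series` is assumed to be an already loaded pandas Series of observations,
# ideally indexed by date

# fit an ARIMA(5,1,0) model
model = ARIMA(series, order=(5, 1, 0))
model_fit = model.fit()

# print coefficients and fit statistics
print(model_fit.summary())
```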

Reference:

Statsmodels: Python documentation