Time Series Forecasting using Python
Introduction:
Time series are among the most common kinds of data encountered in daily life. Financial prices, weather, home energy usage, and even body weight are all examples of data that can be collected at regular intervals. Because the data points are collected at a constant time interval, the values depend on time. A time series often shows sudden increases or decreases over certain intervals (month, day, hour, ...), which implies non-stationarity when modelling the data. These patterns are known as trends, seasonality, and so on.
Data Analysis in Time series:
In Python, the Pandas library is well suited to handling time series objects, particularly through the datetime64[ns] dtype, which stores time information and allows us to perform some operations very quickly. In the example below we are going to use the AirPassengers.csv data set. First, we have to load the necessary libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Load the AirPassengers data set into a Pandas data frame:
data = pd.read_csv('AirPassengers.csv')
data.head()
Looking at the output, the data contains a particular month and the number of passengers travelling in that month. The data type of the Month column here is object. Let’s convert it into a time series object and use the Month column as our index.
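# pd.datetime and the date_parser argument are no longer available in recent pandas,
# so parse the Month column explicitly with pd.to_datetime
data = pd.read_csv('/home/ubuntu/Downloads/AirPassengers.csv')
data['Month'] = pd.to_datetime(data['Month'], format='%Y-%m')
data = data.set_index('Month')
data.index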
You can see that the data type is now ‘datetime64[ns]’. Now let’s turn the data frame into a time series (a Pandas Series) rather than keeping it as a data frame:
# Convert the data frame column to a time series
ts = data['#Passengers']
We can also perform various time-based index operations on the series.
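As a rough illustration of time-based index operations, here is a minimal sketch. It does not read AirPassengers.csv; instead it builds a small hypothetical monthly series (the values 100 to 123 are made up) so that it runs on its own. The same `.loc` calls should work on the `ts` series as long as its index is a DatetimeIndex.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the passenger series: 24 made-up monthly values
# starting in January 1949, indexed by a DatetimeIndex.
index = pd.date_range(start="1949-01-01", periods=24, freq="MS")
ts = pd.Series(np.arange(100, 124), index=index, name="#Passengers")

# Select a single observation by its timestamp.
first_march = ts.loc[pd.Timestamp("1949-03-01")]

# Partial string indexing: every observation in the year 1949.
year_1949 = ts.loc["1949"]

# Slice between two dates; both endpoints are included.
first_quarter_1950 = ts.loc["1950-01-01":"1950-03-01"]

# Group by calendar year and take the mean of each year.
yearly_mean = ts.groupby(ts.index.year).mean()

print(first_march)
print(len(year_1949))
print(first_quarter_1950)
print(yearly_mean)
```

Note that date slices on a DatetimeIndex include the end point, unlike ordinary positional slices.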
Stationarity:
This is a very important concept in time series analysis. In order to apply a time series model, it is important for the time series to be stationary; in other words, all its statistical properties (mean, variance) remain constant over time. This matters because if the series shows a certain behaviour over time, we need that behaviour to stay the same in the future in order to forecast it. Statistical theory is also much better developed for stationary series than for non-stationary ones.
Checks for Stationarity:
There are many methods to check whether a time series (direct observations, residuals, otherwise) is stationary or non-stationary.
Look at Plots: You can review a time series plot of your data and visually check if there are any obvious trends or seasonality.
Summary Statistics: You can review the summary statistics for your data for seasons or random partitions and check for obvious or significant differences.
Statistical Tests: You can use statistical tests to check if the expectations of stationarity are met or have been violated (Augmented Dickey-Fuller test).
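The summary-statistics check can be sketched as follows. This is only an illustration under assumptions: the series is synthetic (a made-up upward trend plus noise, not the AirPassengers data), and splitting into two halves is just one simple choice of partition.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in series: an upward trend plus noise, so the mean
# is expected to drift over time (a sign of non-stationarity).
rng = np.random.default_rng(0)
index = pd.date_range(start="1949-01-01", periods=144, freq="MS")
values = np.linspace(100, 500, 144) + rng.normal(0, 10, 144)
ts = pd.Series(values, index=index)

# Split the series into two halves and compare their summary statistics.
split = len(ts) // 2
first_half, second_half = ts.iloc[:split], ts.iloc[split:]

print("mean:     %.1f vs %.1f" % (first_half.mean(), second_half.mean()))
print("variance: %.1f vs %.1f" % (first_half.var(), second_half.var()))
```

A large gap between the two means, as this trending series should show, hints that the series is not stationary. Similar values on both halves are consistent with stationarity but do not prove it.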
The best way to get a first impression of stationarity in a time series is to plot it:
It’s clear from the plot that there is an overall increasing trend, with some seasonality in it.
Plotting Rolling Statistics: We can plot the moving mean or moving standard deviation and see whether it varies with time. This is still a visual method.
NOTE: moving mean and moving standard deviation: at any instant ‘t’, we take the mean/std of the last year, which in this case is the last 12 months.
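To make the 12-month window concrete, here is a minimal sketch using the `Series.rolling` API from current pandas. The series is a hypothetical ramp of 36 made-up monthly values, chosen only so the numbers are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series with made-up values 0, 1, ..., 35.
index = pd.date_range(start="1949-01-01", periods=36, freq="MS")
ts = pd.Series(np.arange(36, dtype=float), index=index)

# At each instant t, use the 12 most recent observations (t included).
rolmean = ts.rolling(window=12).mean()
rolstd = ts.rolling(window=12).std()

# The first 11 entries are NaN because a full 12-month window
# is not yet available.
print(rolmean.head(13))
print(rolstd.head(13))
```

For a stationary series, both rolling curves would be expected to stay roughly flat. Here the rolling mean climbs steadily because the underlying values do.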
Augmented Dickey-Fuller test:
Statistical tests make strong assumptions about your data. They can only be used to inform the degree to which a null hypothesis can be rejected or fail to be rejected. The result must be interpreted for a given problem to be meaningful. Nevertheless, they can provide a quick check and confirmatory evidence that your time series is stationary or non-stationary.
The Augmented Dickey-Fuller test is a type of statistical test called a unit root test.
The intuition behind a unit root test is that it determines how strongly a time series is defined by a trend.
There are a number of unit root tests and the Augmented Dickey-Fuller may be one of the more widely used. It uses an autoregressive model and optimizes an information criterion across multiple different lag values.
The null hypothesis of the test is that the time series can be represented by a unit root, that it is not stationary (has some time-dependent structure). The alternate hypothesis (rejecting the null hypothesis) is that the time series is stationary.
Null Hypothesis (H0): If it fails to be rejected, it suggests the time series has a unit root, meaning it is non-stationary. It has some time-dependent structure.
Alternate Hypothesis (H1): If the null hypothesis is rejected, it suggests the time series does not have a unit root, meaning it is stationary. It does not have time-dependent structure.
We interpret this result using the p-value from the test. A p-value below a threshold (such as 5% or 1%) suggests we reject the null hypothesis (stationary); a p-value above the threshold suggests we fail to reject the null hypothesis (non-stationary).
p-value > 0.05: Fail to reject the null hypothesis (H0), the data has a unit root and is non-stationary.
p-value <= 0.05: Reject the null hypothesis (H0), the data does not have a unit root and is stationary.
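The decision rule above can be sketched in code. This example uses `adfuller` from statsmodels on two synthetic series rather than the passenger data; the helper names `adf_pvalue` and `interpret` and the 0.05 default threshold are illustrative assumptions, not part of any library.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(42)
noise = rng.normal(size=300)             # white noise: stationary
walk = np.cumsum(rng.normal(size=300))   # random walk: has a unit root


def adf_pvalue(series):
    """Return the p-value of the Augmented Dickey-Fuller test (hypothetical helper)."""
    # adfuller returns (test statistic, p-value, lags used, n observations, ...)
    return adfuller(series, autolag="AIC")[1]


def interpret(pvalue, alpha=0.05):
    """Apply the p-value rule described above (hypothetical helper)."""
    if pvalue <= alpha:
        return "reject H0: likely stationary"
    return "fail to reject H0: likely non-stationary"


p_noise = adf_pvalue(noise)
p_walk = adf_pvalue(walk)
print("white noise:", p_noise, interpret(p_noise))
print("random walk:", p_walk, interpret(p_walk))
```

The white-noise series should give a very small p-value. A random walk will usually, though not always, give a p-value above the threshold, since the test is probabilistic.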
The function below computes the rolling mean and rolling standard deviation, plots them together with the original series, and prints the results of the Dickey-Fuller test. It is a useful way to get quick insight into the series.
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    # Determine rolling statistics over a 12-month window
    # (pd.rolling_mean and pd.rolling_std were removed from pandas)
    rolmean = timeseries.rolling(window=12).mean()
    rolstd = timeseries.rolling(window=12).std()

    # Plot rolling statistics
    plt.plot(timeseries, color='blue', label='Original')
    plt.plot(rolmean, color='red', label='Rolling Mean')
    plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)

    # Perform Dickey-Fuller test
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)
Now let’s run the function on our time series:
test_stationarity(ts)
This will be continued in the next tutorial.