Before we continue, it is important to clarify a couple of prediction model concepts that will allow us to test our models.
We will break the data down into two sets. The first set, the training set, is used to train our model and estimate its parameters. It will include the first 80% of the entries. Think of this as teaching the model. The second set, the test set, checks how well the model performs on data it has not seen. It will consist of the last 20% of the entries.
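A minimal Python sketch of this split, assuming the entries are already in time order in a list (or another sequence that supports slicing). The function name `chronological_split` and the sample values are hypothetical. The data is deliberately not shuffled, so the test set really is the last 20% of the entries.

```python
def chronological_split(data, train_fraction=0.8):
    """Split an ordered sequence into (train, test) without shuffling.

    The first `train_fraction` of the entries go to training and the
    remaining entries go to testing, preserving their original order.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cutoff = int(len(data) * train_fraction)  # index where the test set starts
    return data[:cutoff], data[cutoff:]


# Usage with ten hypothetical observations, oldest first
values = [12, 15, 14, 18, 20, 22, 21, 25, 27, 30]
train, test = chronological_split(values)
print(train)  # [12, 15, 14, 18, 20, 22, 21, 25]
print(test)   # [27, 30]
```

Keeping the order matters for data collected over time: the model is fit on earlier entries and judged on later ones, which mirrors how it would be used to predict the future.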
The performance of the models will be compared using Mean Absolute Percentage Error (MAPE). MAPE is calculated by dividing each absolute error by the corresponding actual value, taking the mean of these ratios, and multiplying by 100 to turn it into a percentage.
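Written out, MAPE = (100 / n) × Σ |actual - forecast| / |actual|, summed over the n entries in the test set. The short Python function below is a sketch of that formula; the name `mape` and the sample numbers are illustrative assumptions. It rejects actual values of zero, because the percentage error is undefined there.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, returned as a percentage.

    Each absolute error is divided by the magnitude of the actual value,
    the ratios are averaged, and the mean is multiplied by 100.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one observation")
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is 0")
        total += abs(a - f) / abs(a)  # error relative to the actual value
    return 100.0 * total / len(actual)


# Hypothetical test-set values
actual = [100, 200, 50, 400]
forecast = [125, 150, 50, 400]
print(mape(actual, forecast))  # 12.5
```

The first two forecasts are each off by 25% and the last two are exact, so on average the forecast is off by 12.5%.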
Because it is expressed as a percentage, MAPE gives us a standardized measurement with which to compare models, and we will calculate it on the test data that the model has not seen. The percentage also makes the statistic intuitive: as Minitab’s online resource explains, a MAPE of 5 means that the forecast is off by 5% on average. Other options for measuring model error would have been R-Squared, Mean Absolute Deviation (MAD), or Mean Squared Deviation (MSD). I won’t go into the details of these except to explain that R-Squared is the percentage of the variation in the response variable that is explained by the model, on a scale of 0%-100%. In other words, it is one minus the ratio of two deviation measures: the deviation between the model and the actual values, divided by the deviation between the mean and the actual values.
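For comparison, here is a rough Python sketch of the alternatives mentioned above, assuming plain lists of actual and forecast values. The function names are hypothetical; MAD and MSD follow the Minitab naming used in the text (other sources call them MAE and MSE). R-Squared is computed as 1 - SS_res / SS_tot, where SS_res sums squared deviations between the actual values and the model and SS_tot sums squared deviations between the actual values and their mean.

```python
def mad(actual, forecast):
    """Mean Absolute Deviation: average absolute error, in the data's units."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def msd(actual, forecast):
    """Mean Squared Deviation: average squared error, in squared units."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def r_squared(actual, forecast):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    # Deviation between the actual values and the model
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    # Deviation between the actual values and their mean
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-Squared is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical values
actual = [2, 4, 6, 8]
forecast = [4, 4, 5, 8]
print(mad(actual, forecast))        # 0.75
print(msd(actual, forecast))        # 1.25
print(r_squared(actual, forecast))  # 0.75
```

Unlike MAPE, MAD and MSD depend on the units of the data, so they are harder to compare across datasets. The 0%-100% range for R-Squared holds for a least-squares fit evaluated on its own training data; on new test data this formula can drop below zero when the model does worse than simply predicting the mean.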
In summary, in order to improve something we have to measure it, so we want a system in place to measure and compare our models consistently.