- Varun Rao
Forecasting Retail Sales with Machine Learning
Updated: Sep 27, 2020
Executive Summary
Prediction of sales is an important application of machine learning in the retail space. Given accurate predictions, retailers can manage dynamic pricing, staff rostering and inventory so as to maximise profit and improve the customer experience.
This article details work done by Deep Blue AI to demonstrate the utility of this approach. We used a state-of-the-art machine learning model to forecast weekly sales for a large chain of department stores using an open-source dataset. In general, there is good agreement between the true values and model predictions. The median absolute error was $494.8 per week and 82% of all errors were below 20%. The mean error for weekly sales amounts over $10,000 - which account for over 83% of revenue - was 7.7%.
The most important features that affect sales were found to be environmental conditions and macroeconomic indicators: temperature, fuel price, CPI, week of the year, and unemployment rate. A brief discussion of how these results could be used is provided in a following section. The concluding section details some additional work that could improve model accuracy based on our previous publications.
Introduction
Forecasting sales is an important machine learning application in the customer retail environment. Accurate predictions of aggregate sales allow retailers to roster staff and manage supply chains efficiently, as well as determine optimum item prices [Liu et al. (2013)].
An accurate forecast of sales allows retail outlets to answer questions such as:
Can we use dynamic pricing to maximise our profit?
Do we have enough stock to satisfy demand without being overstocked?
Have we rostered on the appropriate number of staff?
What are the most important factors that affect sales, and how can we optimise them?
Do we have the right trade-off between store size and rental expense?
Dynamic pricing is a typical use case for sales forecasting models, and is commonly used in the airline, hotel, apparel, electronics and telecommunications industries [Narahari et al. (2005), Sahay (2007)]. In essence, companies using this strategy adjust prices for their products based on predicted demand, resulting in increased profit margins [Shakya et al. (2012)]. Given that retail prices can be one of the inputs into a machine learning model, it would then be possible to predict sales volumes based on each input price.
Ultimately, prices could be adjusted dynamically to ensure the maximum profit for the retailer, such as the electronics supplier who gained $25 million by adjusting their prices faster than the competition [Baker et al. (2001)]. Overall, it is estimated that dynamic pricing has the potential to increase profits by 7% [Zhao & Zheng (2000)].
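As a minimal sketch of this idea, assume a forecasting model is available that maps a candidate price to predicted unit sales. The hypothetical predicted_units function below stands in for such a model (it is a made-up linear demand curve, not anything fitted to the data in this article), and the search simply evaluates the predicted profit at each candidate price.

```python
def predicted_units(price):
    """Hypothetical stand-in for a trained forecasting model.

    Returns predicted units sold at a given price, using a made-up
    linear demand curve purely for illustration.
    """
    return max(0.0, 1000.0 - 40.0 * price)


def best_price(candidate_prices, unit_cost, demand_model):
    """Return the (price, profit) pair with the highest predicted profit."""
    best = None
    for candidate in candidate_prices:
        candidate_profit = (candidate - unit_cost) * demand_model(candidate)
        if best is None or candidate_profit > best[1]:
            best = (candidate, candidate_profit)
    return best


# Candidate prices from $5.00 to $25.00 in 50-cent steps (illustrative values).
candidates = [5.0 + 0.5 * i for i in range(41)]
price, profit = best_price(candidates, unit_cost=5.0, demand_model=predicted_units)
print(f"best price: ${price:.2f}, predicted profit: ${profit:.2f}")
```

In practice the demand model would be the trained forecaster with price as one of its inputs, and the candidate grid would be limited to prices the retailer is actually willing to charge.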
Optimisation of the number of rostered staff is another useful application of sales forecasting models. When businesses are understaffed, wait times for services increase, resulting in diminished customer satisfaction [Henderson & Mason (1998)] and possibly lower sales. In the reverse case of overstaffing, profit margins are reduced. As far back as 1991, it was estimated that an optimisation strategy would save the US airline industry $20 million annually [Anbil et al. (1991)], while a similar approach applied to ground crew was estimated to save over $8 million [Brusco et al. (1995)].

The figure above illustrates the utility of both dynamic pricing and rostering optimisation. The data, taken from the source described in a following section, shows weekly sales for one department in a store over a period of about 2.5 years. The dominant characteristic is clearly the large spike in sales around December 2011, representing an increase in sales of over 60%. Also noteworthy are the fluctuations of ~$250 per week in magnitude occurring throughout 2011.
Accurate forecasting of these sales values would allow the store to adjust prices and/or staff to account for these variations. In particular, if the large demand around December 2011 is not predicted, undersupply of items is likely to be an issue. It would be possible, but outside the scope of the current work, to determine the cause of this unusual increase in sales demand - Christmas sales are the obvious candidate, but the absence of this effect for December 2010 would require explanation.
Another important aspect not considered in the present work is the use of this data by the original suppliers of the goods, such as fast moving consumer goods (FMCG) manufacturers or the food and beverage industry. These companies could use the insights developed by the sales forecasting engine to uncover trends, identify issues and target niche products.
Methodology
The aim of this machine learning task was to predict the weekly sales in dollars for a given store, department and week. In addition to the data provided by the retailer, macroeconomic data is a crucial input into the sales forecasting process [Krishna et al. (2018), Cadavid et al. (2018)]. Of particular relevance in the current environment, the COVID-19 pandemic has upended a great deal of conventional wisdom with regard to sales forecasts. Given the new normal, it is clear that previous assumptions based on limited domain knowledge will not hold. In this case, it is crucial that external macroeconomic factors be included in the inputs to the model, as they are likely to provide some of the general knowledge that affects today's retail environment.
Data
Public-domain data was obtained from Kaggle, consisting of 282k samples of anonymised historical sales data for 45 Walmart stores between February 2010 and October 2012. A description of the inputs available is given below.
Store details
Store number
Department number
Store size
Store type
General details
Date
Holiday Weeks
Average temperature
Macroeconomic indicators
Fuel price
Unemployment rate
Consumer price index (CPI)
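The inputs listed above are recorded at different levels of detail (per store, per store and week, per store and department), so they need to be joined into one row per sample before modelling. The sketch below shows one way this could be done, and also derives the week of the year from the date. The field names and all values in the toy records are illustrative assumptions, not the actual schema or contents of the Kaggle files.

```python
from datetime import date

# Toy records; field names and values are hypothetical.
stores = {1: {"store_type": "A", "store_size": 150000}}
weekly_conditions = {
    (1, date(2010, 2, 5)): {
        "is_holiday": False,
        "temperature": 40.0,
        "fuel_price": 2.50,
        "cpi": 210.0,
        "unemployment": 8.0,
    },
}
sales = [
    {"store": 1, "dept": 1, "date": date(2010, 2, 5), "weekly_sales": 20000.0},
    {"store": 1, "dept": 2, "date": date(2010, 2, 5), "weekly_sales": 35000.0},
]


def build_samples(sales, stores, weekly_conditions):
    """Join store details and weekly conditions onto each sales record."""
    samples = []
    for row in sales:
        sample = dict(row)
        sample.update(stores[row["store"]])
        sample.update(weekly_conditions[(row["store"], row["date"])])
        # ISO week number (1-53), used here as the "week of the year" feature.
        sample["week_of_year"] = row["date"].isocalendar()[1]
        samples.append(sample)
    return samples


samples = build_samples(sales, stores, weekly_conditions)
print(samples[0])
```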
The figure below shows the distribution of weekly sales in USD. This is clearly a right-skewed distribution: most values are clustered at low dollar amounts, with a long tail of larger values extending to the right of the peak.

The figure below shows the cumulative value of weekly sales. As also seen in the previous figure, most sales have relatively low magnitude values - 50% of sales are below $7665 per week. Although not considered in this work, it is likely that a retailer would wish to apply different weights to the training data based on the dollar value of the sale. In other words, the tolerance for error on small dollar values might be large so long as predictions on the large dollar amounts are accurate.
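One simple way to express such a tolerance, sketched here as an assumption rather than something used in this work, is to weight each training sample in proportion to its dollar value, so that errors on large weekly sales count for more during training. Many regression libraries accept per-sample weights of this kind. The sales figures below are made up for illustration.

```python
def value_weights(weekly_sales, floor=1.0):
    """Weight each sample by its dollar value, normalised to a mean weight of 1.

    `floor` is a hypothetical lower bound so that zero or negative values
    still receive a small positive weight.
    """
    raw = [max(value, floor) for value in weekly_sales]
    mean_raw = sum(raw) / len(raw)
    return [value / mean_raw for value in raw]


weekly_sales = [500.0, 1500.0, 8000.0, 30000.0]
weights = value_weights(weekly_sales)
for value, weight in zip(weekly_sales, weights):
    print(f"${value:>9,.2f} -> weight {weight:.2f}")
```

Other weighting schemes, such as a step change at a dollar threshold, would follow the same pattern; the right choice depends on how the retailer values accuracy at each scale.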

Results
DBAI developed a sales forecasting engine based on a state-of-the-art machine learning model to demonstrate a proof-of-concept use case for the retail industry. Note that the results presented below were obtained on examples kept separate from the training data, so they are genuinely unseen data and best represent the use case.
Predictions
The figures below show a comparison of actual weekly sales versus model predictions in USD/week; each figure represents a selected store and department. The blue line represents true values, while model predictions are indicated by the orange line.
The figure below clearly shows a large seasonal trend around April-June of each of the three years. This trend is also captured in the model predictions. In this case, the requirements for dynamic pricing, inventory management and staff rostering are likely to be significantly different for these few months than for the rest of the year. Without further information it is not possible to state the reason for this seasonal trend, but it is likely that it is related to items generally used in summer, such as swimming costumes or sunscreen. As described in a following section, the ambient temperature plays a large role in determining weekly sales.

The figure below shows another store/department combination for the same time period. In this case, the figure shows spikes in demand during the months of August 2010, December 2010, December 2011 and August 2012. Clearly this is a more complex trend than the previous figure - it is also possible that the trends for August and December are unrelated. The utility of the machine learning forecast is clear - the model correctly predicts the significant periods of low sales, while also capturing the peaks in demand. As in the previous discussion, the number of staff and amount of stock required are likely to vary significantly as a function of the fluctuating weekly sales figures.

Prediction Error
The figure below shows the cumulative distribution of error percentages. The median absolute error was $494.8 per week. 82% of errors were below 20%, and the small number of error values above this point are related to very low weekly sales amounts; the mean weekly sales amount for data points with errors above 20% is only $591, meaning that even small errors can result in large error percentages.
As mentioned earlier, it is likely that the company's tolerance for error would vary as a function of the dollar amount; minor inaccuracies at low dollar amounts would be acceptable if predictions for large dollar amounts were accurate. In this case, the mean error for weekly sales amounts over $10k, which account for only 42% of samples but over 83% of revenue, was 7.7%.
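The kinds of summary statistics quoted in this section can be computed from paired actual and predicted values as sketched below. The function name, the toy numbers, and the choice to define percentage error relative to the actual value are assumptions for illustration; the figures reported in this article come from the real test set, not from this example.

```python
from statistics import mean, median


def error_summary(actual, predicted, pct_threshold=20.0, large_sale=10000.0):
    """Summarise prediction errors overall and for large weekly sales amounts.

    Assumes no actual value is zero and that at least one exceeds `large_sale`.
    """
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    pct_err = [100.0 * e / abs(a) for a, e in zip(actual, abs_err)]
    large = [i for i, a in enumerate(actual) if a > large_sale]
    return {
        "median_abs_error": median(abs_err),
        "share_below_threshold": sum(e < pct_threshold for e in pct_err) / len(pct_err),
        "mean_pct_error_large": mean(pct_err[i] for i in large),
        "large_share_of_samples": len(large) / len(actual),
        "large_share_of_revenue": sum(actual[i] for i in large) / sum(actual),
    }


# Made-up values: one small sale with a large percentage error, three accurate ones.
actual = [400.0, 2000.0, 20000.0, 40000.0]
predicted = [500.0, 2100.0, 19000.0, 42000.0]
summary = error_summary(actual, predicted)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In the toy data, the $400 sale is off by only $100 yet has a 25% error, which mirrors the point above that small dollar amounts can produce large error percentages.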

Feature Importance
Feature importance is a useful additional output that can be provided by ML models. The figure below shows the features that are most important to predicting weekly sales, excluding store and department identification. In this case, it is interesting that the dominant factors - temperature, fuel price, CPI, week of the year, and unemployment rate - are not within the control of the company. From the figure, it is clear that environmental conditions and macroeconomic indicators are the most important features that affect weekly sales. Of the store-related values, the store size is the most important feature.

The figure above provides an illuminating insight into the wide range of applications of the sales forecasting engine. For example, were this company to consider prospective locations for new stores, the ideal location would be a city with amenable temperatures and inexpensive fuel, even if high land values meant that the store size was restricted. An alternative candidate city that offered cheap land (thus allowing for a larger store), but suffered from intemperate weather and high fuel prices would be a sub-optimal choice. The strong temporal effect is clear as well, indicating that any rostering policy must be flexible to account for seasonally varying demand. Supply chain management must likewise take this seasonal factor into account to maintain the desired level of inventory.
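One model-agnostic way to estimate feature importance is permutation importance: shuffle the values of one input across the samples and measure how much the prediction error grows. The sketch below illustrates the idea on made-up data with a hypothetical stand-in model; it is not necessarily the method behind the figure above, and the feature ranges and coefficients are arbitrary.

```python
import random
from statistics import mean


def toy_model(row):
    """Hypothetical stand-in for a trained model; it ignores `store_size`."""
    return 100.0 * row["temperature"] + 10.0 * row["fuel_price"]


def mean_abs_error(model, rows, targets):
    return mean(abs(model(r) - t) for r, t in zip(rows, targets))


def permutation_importance(model, rows, targets, feature, seed=0):
    """Increase in mean absolute error after shuffling one feature column."""
    shuffle_rng = random.Random(seed)
    baseline = mean_abs_error(model, rows, targets)
    shuffled_values = [r[feature] for r in rows]
    shuffle_rng.shuffle(shuffled_values)
    shuffled_rows = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled_values)]
    return mean_abs_error(model, shuffled_rows, targets) - baseline


# Made-up samples with three features drawn from arbitrary ranges.
data_rng = random.Random(42)
rows = [
    {
        "temperature": data_rng.uniform(0, 30),
        "fuel_price": data_rng.uniform(2, 4),
        "store_size": data_rng.uniform(50, 200),
    }
    for _ in range(200)
]
targets = [toy_model(r) for r in rows]  # noise-free targets keep the example simple

importances = {
    feature: permutation_importance(toy_model, rows, targets, feature)
    for feature in ("temperature", "fuel_price", "store_size")
}
for feature, score in sorted(importances.items(), key=lambda item: -item[1]):
    print(f"{feature}: {score:.2f}")
```

Because the stand-in model never reads store_size, shuffling that column leaves its predictions unchanged and the importance is zero; features the model leans on heavily produce a large increase in error.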
Further work
Clustering is a technique that has previously been applied for retail sales forecasting [Chen & Lu (2016), Thomassey & Fiordaliso (2006), Cheriyan et al. (2018)]. In a previous article and paper [Rao & Rao (2020a)], we explored the use of Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to identify Geographic Clusters of Road Traffic Accidents in Victoria. It is likely that the DBSCAN algorithm could be applied to the present case study, and this will be the target of a future publication.
Feature engineering is a powerful tool that can add information to a machine learning model. Typically, this is done using some level of domain expertise to generate features relevant to the specific problem. In the case of retail sales, some features that have previously been used include mean sales for a preceding time period and distance to competitor stores [Pavlyshenko (2019)]. Feature engineering was also discussed in our recent article and paper [Rao & Rao (2020b)]. It is likely that some degree of feature engineering would improve the accuracy of these predictions.
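As an illustration of the first of these features, the sketch below computes the mean of the preceding weeks' sales for each week in a series, using only values that would have been known at prediction time. The function name, window length and sales values are arbitrary assumptions.

```python
def trailing_mean(values, window=4):
    """For each position, the mean of up to `window` preceding values.

    The current value is excluded so the feature never leaks the target.
    Returns None where there is no history yet.
    """
    features = []
    for i in range(len(values)):
        history = values[max(0, i - window):i]
        features.append(sum(history) / len(history) if history else None)
    return features


weekly_sales = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
lagged = trailing_mean(weekly_sales, window=4)
print(lagged)
```

Such a feature would be computed separately for each store and department series before being added to the model inputs.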
References
R. Anbil, E. Gelman, B. Patty, and R. Tanga. Recent advances in crew–pairing optimization at American Airlines. Interfaces, 21:62–74, 1991.
M J Brusco, L W Jacobs, R J Bongiorno, D V Lyons, and B X Tang. Improving personnel scheduling at airline stations. Operations Research, 43:741–751, 1995.
W. Baker, M.V. Marn, C. Zawada, Price smarter on the net, Harvard Business Review 79 (2) (2001).
Chen, I.-F., & Lu, C.-J. (2016). Sales forecasting by combining clustering and machine-learning techniques for computer retailing. Neural Computing and Applications, 28(9), 2633–2647.
Cheriyan, S., Ibrahim, S., Mohanan, S., & Treesa, S. (2018). Intelligent Sales Prediction Using Machine Learning Techniques. 2018 International Conference on Computing, Electronics & Communications Engineering (iCCECE).
Henderson, S. G., & Mason, A. J. (1998). Rostering by iterating integer programming and simulation. 1998 Winter Simulation Conference Proceedings.
Juan Pablo Usuga Cadavid, Samir Lamouri, Bernard Grabot. Trends in Machine Learning Applied to Demand & Sales Forecasting: A Review. International Conference on Information Systems, Logistics and Supply Chain, Jul 2018, Lyon, France
Krishna, A., V, A., Aich, A., & Hegde, C. (2018). Sales-forecasting of Retail Stores using Machine Learning Techniques. 2018 3rd International Conference on Computational Systems and Information Technology for Sustainable Solutions (CSITSS).
Liu, N., Ren, S., Choi, T.-M., Hui, C.-L., & Ng, S.-F. (2013). Sales Forecasting for Fashion Retailing Service Industry: A Review. Mathematical Problems in Engineering, 2013, 1–9.
Narahari, Y., Raju, C. V. L., Ravikumar, K., & Shah, S. (2005). Dynamic pricing models for electronic business. Sadhana, 30(2-3), 231–256.
Pavlyshenko, B.M. Machine-Learning Models for Sales Time Series Forecasting. Data 2019, 4, 15.
Rao, V. & Rao, R. Using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to identify Geographic Clusters of Road Traffic Accidents in Victoria, DBAI Technical Report, 2020a.
Rao, R. & Rao, V. Energy efficiency in buildings - the case for AI, DBAI Technical Report, 2020b.
Shakya, S., Kern, M., Owusu, G., & Chin, C. M. (2012). Neural network demand models and evolutionary optimisers for dynamic pricing. Knowledge-Based Systems, 29, 44–53.
A. Sahay, How to reap higher profits with dynamic pricing, MIT Sloan Management Review 48 (4) (2007) 53–60.
Thomassey, S., & Fiordaliso, A. (2006). A hybrid sales forecasting system based on clustering and decision trees. Decision Support Systems, 42(1), 408–421.
Zhao, W., & Zheng, Y.-S. (2000). Optimal Dynamic Pricing for Perishable Assets with Nonhomogeneous Demand. Management Science, 46(3), 375–388.