
5.2 Results

5.2.2 Model Evaluation

This section concerns the evaluation of different forecasting methods on the RDF problem. For this experiment, the forecasting methods run on a sample of the historical sales database. The sample's sales records cover (a) ten groups of products (the same groups that were used in section 5.2.1), (b) three stores of the retail industry, and (c) the time period 2015-2016. The sample's records are split into ten datasets, according to product group, which became the training sets for the subsequent algorithms. Specifically, the records for the last seven days were held out from the training sets, in order to form the respective test sets.
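As a sketch of this holdout scheme (in Python; the column names StoreId, ItemCode, Date and Qty are assumptions for illustration, not necessarily the database's actual schema):

```python
import pandas as pd

# Hypothetical sample of daily sales records; the column names are
# assumptions for illustration.
sales = pd.DataFrame({
    "StoreId": 1,
    "ItemCode": "A1",
    "Date": pd.date_range("2016-12-01", periods=31, freq="D"),
    "Qty": range(31),
})

# Hold out the last seven days of records to form the test set.
cutoff = sales["Date"].max() - pd.Timedelta(days=7)
train = sales[sales["Date"] <= cutoff]
test = sales[sales["Date"] > cutoff]
```

In practice this split is performed once per product-group dataset, yielding ten (train, test) pairs.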

The set of methods features algorithms from both the Statistical Time Series Forecasting and Machine Learning fields. The models were trained to respond to forecasting queries (StoreId, ItemCode, Date). These methods are:

• The naive, average and drift methods, to be used as benchmarks.

• The ARIMA method, implemented in the forecast package.

• The Random Forest algorithm, implemented in the randomForest package.

• The XGBoost algorithm, implemented in the xgboost package.
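The three benchmark methods are simple enough to state in a few lines; a minimal sketch in Python, following the standard textbook definitions:

```python
import numpy as np

def naive_forecast(y, h):
    """Repeat the last observation for h steps ahead."""
    return np.repeat(y[-1], h)

def average_forecast(y, h):
    """Forecast the mean of all observations for h steps ahead."""
    return np.repeat(np.mean(y), h)

def drift_forecast(y, h):
    """Extrapolate the line through the first and last observations."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + slope * np.arange(1, h + 1)

y = np.array([10.0, 12.0, 11.0, 14.0])
print(naive_forecast(y, 2))    # [14. 14.]
print(average_forecast(y, 2))  # [11.75 11.75]
```

Any trained model that cannot beat these three baselines is not extracting useful signal from the data, which is why they serve as benchmarks here.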

The MAE and MASE metrics (section 2.4.9) were estimated for all products' daily sales, aggregated by store. The product group MAE and MASE metrics are then derived by averaging over their respective products' metrics. It should be noted that, while averaging MASE values is considered statistically sound, the same does not hold for MAE values of different scales. However, MAE values were only averaged over products of the same group, which always have similar scale.
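For reference, the MASE metric of section 2.4.9 scales the forecast MAE by the in-sample MAE of the (seasonal) naive forecast, which is what makes it comparable across series of different scales; a minimal sketch:

```python
import numpy as np

def mae(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    return np.mean(np.abs(np.asarray(actual) - np.asarray(forecast)))

def mase(train, actual, forecast, m=1):
    """MAE scaled by the in-sample MAE of the m-step naive forecast;
    values below 1 mean the model beats the naive method."""
    train = np.asarray(train)
    scale = np.mean(np.abs(train[m:] - train[:-m]))
    return mae(actual, forecast) / scale
```

For example, with training series [1, 2, 3, 4], actuals [5, 6] and forecasts [5, 5], the naive scale is 1 and the MASE is 0.5.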

FIGURE 5.3: The groups' average MAE estimations for the six forecasting methods.

Figures 5.3 and 5.4 illustrate the aforementioned group metrics for all (group, method) pairs. The average method had the worst performance in most groups, while the naive and drift methods have very similar performance on all groups, being the worst in three groups. The ARIMA method's performance was slightly better, losing only twice to a benchmark method.

It should be noted that these methods do not learn from any other feature beyond the number of sales (Qty).

The Random Forest's error metrics were significantly lower than the ARIMA's, in all groups. This is due to the data features (section 4.2.1) that were used

FIGURE 5.4: The groups' average MASE estimations for the six forecasting methods.

for training the Random Forest models. Specifically, these features explain a portion of the variance in sales (for instance, due to marketing campaign features) that was regarded as noise by the previous methods. The categorical features were one-hot encoded prior to training.
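One-hot encoding replaces each categorical feature by one binary column per level. In Python this is what pandas' get_dummies does (shown here with a made-up Store column; the thesis' actual encoding was done in R):

```python
import pandas as pd

# Made-up categorical feature; each level becomes its own binary column.
df = pd.DataFrame({"Store": ["S1", "S2", "S1"], "Qty": [3, 5, 2]})
encoded = pd.get_dummies(df, columns=["Store"])
print(list(encoded.columns))  # ['Qty', 'Store_S1', 'Store_S2']
```

This representation lets tree-based learners split on individual category levels.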

The most sophisticated method, XGBoost, was the best performing algorithm in all groups. XGBoost used only the subset of features from section 4.2.1 that had a numerical interpretation. Nonetheless, XGBoost's MASE was at least 10% lower than Random Forest's in all groups, even though the Random Forest models were given more features.

Group MASE statistics (min, mean, median, max and standard deviation) were computed for each forecasting method. These statistics are illustrated in figure 5.5. Using the mean statistic, the methods are ranked from best to worst as 1) XGBoost, 2) Random Forest, 3) ARIMA, 4) naive method, 5) drift method and 6) average method. Finally, the average MASE estimations are graphically illustrated in figure 5.6.

FIGURE 5.5: Group MASE statistics for the six forecasting methods.

FIGURE 5.6: Average MASE plot for all groups and algorithms.

Bibliography

[1] Garcia-Molina H., Ullman J., Widom J. Database Systems: The Complete Book. Pearson, 2008.

[2] Kuhn M. Building Predictive Models in R Using the caret Package. Journal of Statistical Software, Vol. 28, 2008.

[3] Wickham H., Grolemund G. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media, 2017.

[4] Liaw A., Wiener M. Classification and Regression by randomForest. R News, Vol. 2, No. 3, 18-22, 2002.

[5] Louppe G., et al. Understanding Variable Importances in Forests of Randomized Trees. Advances in Neural Information Processing Systems, 2013.

[6] Arlot S., Celisse A. A Survey of Cross-Validation Procedures for Model Selection. Statistics Surveys, Vol. 4, 40-79, 2010.

Chapter 6

Conclusion

Solving a real-world RDF (Retail Demand Forecasting) problem requires an understanding of the general forecasting problem, as well as of the challenges that appear in practice. Accordingly, the first part of this thesis was concerned with the theory of predictive modeling, borrowing from the areas of Time Series Analysis and Machine Learning.

The next part provided descriptive analysis for the data that are usually found in historical sales databases, also suggesting transformations for the engineering of new data features. Then, the task was to design the architecture for an RDF software solution, under the restriction of certain requirements. The workings of and necessity for each module of the architecture were thoroughly explained.

The last part of this thesis regarded the evaluation of a) the engineered data features and b) a set of forecasting methods. For the purpose of this evaluation, sales records for ten large groups of products were taken as a sample from an RI database. The estimated %IncMSE metric (that is, the increase in MSE after shuffling the values of a feature) was positive for most features in all groups. The models were ranked according to their performance on all groups via the MASE metric. The resulting ranking was XGBoost, Random Forest and ARIMA, followed by the benchmarks: naive method, drift method and average method. Finally, a few points are given as directions for future work:
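The %IncMSE idea, shuffle one feature's values and measure how much the model's MSE increases, can be sketched generically. The function below is an illustration of the principle, not randomForest's implementation:

```python
import numpy as np

def increase_in_mse(predict, X, y, feature_idx, rng):
    """Increase in MSE after permuting one feature's column;
    important features yield a large positive increase."""
    base_mse = np.mean((y - predict(X)) ** 2)
    X_perm = X.copy()
    rng.shuffle(X_perm[:, feature_idx])  # in-place shuffle of one column
    return np.mean((y - predict(X_perm)) ** 2) - base_mse

# Toy "model" that uses only feature 0, so feature 1 is irrelevant.
X = np.arange(20, dtype=float).reshape(10, 2)
y = X[:, 0].copy()
rng = np.random.default_rng(0)
print(increase_in_mse(lambda A: A[:, 0], X, y, 1, rng))  # 0.0: unused feature
```

Shuffling the unused feature leaves the error unchanged, while shuffling the feature the model relies on inflates it, which is exactly the behavior the positive %IncMSE values in the experiment reflect.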

• Implement the architecture over the Spark platform in order to make more modelling decisions feasible, such as the training of a global model.

• Generate hierarchical features from sales data via the application of neural networks.

• Decide how many forecasting models are necessary by clustering the sales records.

Bibliography

[1] Zaharia, Matei, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. HotCloud 10, no. 10-10 (2010): 95.

[2] Zaharia, Matei, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng et al. Apache Spark: a unified engine for big data processing. Communications of the ACM 59, no. 11 (2016): 56-65.

[3] Thomassey, Sébastien, and Antonio Fiordaliso. A hybrid sales forecasting system based on clustering and decision trees. Decision Support Systems 42, no. 1 (2006): 408-421.

[4] Thomassey, Sébastien, and Michel Happiette. A neural clustering and classification system for sales forecasting of new apparel items. Applied Soft Computing 7, no. 4 (2007): 1177-1187.

Appendix A

R Libraries

1. assertthat 2. backports 3. base64enc 4. BH

5. bigmemory 6. bigmemory.sri 7. bindr

8. bindrcpp 9. bit

10. bit64 11. bitops 12. blob 13. broom 14. caret 15. caret.ts 16. caTools 17. cli 18. clue

19. colorspace 20. crayon 21. curl

22. CVST 23. cyphr 24. data.table 25. DBI 26. dbplyr 27. ddalpha 28. DEoptimR 29. DescTools 30. devtools 31. dichromat 32. digest 33. dimRed 34. doParallel 35. dplyr 36. DRR 37. DT 38. dtplyr 39. dtw 40. dtwclust 41. dygraphs 42. evaluate

43. expm 44. expsmooth 45. flexclust 46. fma 47. foreach 48. forecast 49. formatR 50. fpp2 51. fracdiff 52. gclus 53. getPass 54. GGally 55. ggplot2 56. git2r 57. glue 58. gower 59. gtable 60. highr 61. hms 62. htmltools 63. htmlwidgets 64. httpuv

65. httr 66. imputeTS 67. ipred 68. iterators 69. jsonlite 70. kernlab 71. knitr 72. labeling 73. lava 74. lazyeval 75. lmtest 76. lubridate 77. magrittr 78. manipulate 79. markdown 80. memoise 81. Metrics 82. mime 83. mnormt 84. ModelMetrics 85. modeltools 86. munsell 87. mvtnorm 88. NLP 89. numDeriv 90. odbc 91. openssl

92. packrat 93. pillar 94. pkgconfig 95. pkgmaker 96. PKI

97. plogr 98. plyr 99. praise 100. prettyunits 101. prodlim 102. progress 103. proxy 104. psych 105. purrr 106. quadprog 107. quantmod 108. R6

109. randomForest 110. RColorBrewer 111. Rcpp

112. RcppArmadillo 113. RcppEigen 114. RcppRoll 115. RCurl 116. recipes 117. registry 118. remotes

119. reshape 120. reshape2 121. RJSONIO 122. rlang

123. rmarkdown 124. rngtools 125. robustbase 126. RODBC 127. rprojroot 128. rsconnect 129. RSpectra 130. rstudioapi 131. sandwich 132. scales 133. secure 134. sfsmisc 135. shiny 136. skmeans 137. slam 138. sodium 139. sourcetools 140. stinepack 141. stringi 142. stringr 143. strucchange 144. testthat 145. tibble

146. tidyr 147. tidyselect 148. timeDate 149. tm

150. tseries 151. TTR

152. urca 153. utf8 154. vars 155. viridisLite 156. whisker 157. withr

158. xtable 159. xts 160. yaml 161. zoo

Appendix B

Retail Demand Forecasting

Solutions From Kaggle

B.1 Walmart Recruiting I - Store Sales Forecasting

Challenge [1]

• 45 Walmart stores located in different regions

• Each store contains many departments

• Walmart runs 5 kinds of promotions (M1-M5)

• Data ranges from 2010-02-05 to 2012-11-01.

• The competition was evaluated on the weighted mean absolute error (WMAE)

• Errors for non-holiday weeks are weighted by 1

• Errors for holiday weeks are weighted by 5

• Target: predict weekly sales for the following weeks

Features

• Store ID

• Department ID

• Date (specifies the week; always a Friday)

• Whether the week is a special holiday week

• Weekly sales
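The WMAE evaluation metric listed above weights each week's absolute error by 5 for holiday weeks and 1 otherwise; a minimal sketch:

```python
import numpy as np

def wmae(actual, forecast, is_holiday):
    """Weighted MAE: holiday weeks get weight 5, other weeks weight 1."""
    w = np.where(is_holiday, 5.0, 1.0)
    errors = np.abs(np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float))
    return np.sum(w * errors) / np.sum(w)

# One holiday week off by 2 units, one ordinary week forecast exactly.
print(wmae([10, 10], [8, 10], [True, False]))  # 10/6 ≈ 1.667
```

The 5x weight means a single bad holiday forecast costs as much as five equally bad ordinary weeks, which pushes competitors toward modeling holiday effects explicitly.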

Additional Information for stores

• Type

[1] https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting

• Size

Additional Information for pairs (StoreID, Date)

• Average temperature in the region

• Cost of fuel in the region

• M1 – M5

• Consumer price index

• Unemployment Rate

• Whether the week is a special holiday week

First Place Entry Solution (user David Thaler) [2]

• Extracted a week-of-the-year feature (1-52)

• 5 time series models (TSM1 – TSM5)

– three used STL decomposition (stlf() from R)
– two used ARIMA (auto.arima() from R)

• 3 simple models (SM1 – SM3)

– a linear regression model with seasonal (weekly) dummy variables
– a seasonal naive model
– a product model

The combined model was

Average(TSM1, ..., TSM5, Average(SM1, SM2, SM3))

TSM1: SVD + stlf/ets - this model applied SVD to the training data for preprocessing, and then forecast each series with stlf(), using an exponential smoothing model (ets) for the non-seasonal forecast. (Best performing single model)

TSM2: SVD + stlf/arima - the same, but with an ARIMA model for the non-seasonal forecast.

TSM3: Standard scaling + stlf/ets + averaging - like TSM1, but SVD was not used. Instead, the data were standard scaled, and a correlation matrix was computed. Then forecasts were made and several of the closely correlated series were averaged together, before restoring the original scale.

[2] https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/discussion/8125

TSM4: SVD + seasonal ARIMA - this used auto.arima() from the forecast package.

TSM5: non-seasonal ARIMA with Fourier series terms as regressors - this also used auto.arima(), but as a non-seasonal ARIMA model, with the seasonality captured in the regressors.
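The combination rule, average the three simple models first, then average that result with the five time series models, can be sketched with placeholder forecasts:

```python
import numpy as np

# Placeholder two-step-ahead forecasts; the real entry used the actual
# outputs of TSM1..TSM5 and SM1..SM3.
tsm = [np.array([100.0, 110.0]) for _ in range(5)]  # TSM1..TSM5
sm = [np.array([90.0, 95.0]),                        # SM1
      np.array([100.0, 105.0]),                      # SM2
      np.array([110.0, 115.0])]                      # SM3

sm_avg = np.mean(sm, axis=0)                # inner average of SM1..SM3
combined = np.mean(tsm + [sm_avg], axis=0)  # outer average with TSM1..TSM5
```

The nested average gives the three simple models the combined weight of one time series model, so each TSM contributes 1/6 of the final forecast while each SM contributes only 1/18.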
