
Item attribute weight in fashion assortment planning


Academic year: 2021

Share "Item attribute weight in fashion assortment planning"

Copied!
66
0
0

Texto

(1)

Item attribute weight in fashion assortment planning

Ana Rita Amorim de Araújo Novo

Master’s Dissertation

FEUP Supervisor: Prof. Paulo Osswald

Integrated Master’s in Industrial Engineering and Management


Abstract

The fashion industry is a market in constant change, which makes next season's sales highly unpredictable. Future collections have to be defined and ordered at least one season in advance, making the planning of future collections a very sensitive issue for a company's success. That success depends heavily on the company's ability to forecast, among other things, an assortment of products whose attributes answer customer demand. In this context, assigning weights to those attributes yields a set of criteria that helps guide retailers' selection of product variety for the following collection.

This project proposes a methodology to improve one of the steps of the assortment planning process, describing how that step is performed in one of the largest retail database and IT solutions applications. The goal is to provide a set of product attribute weights that defines how products should be selected for the next collection while achieving a given managerial goal (in this case, maximization of sales).

A visual analysis was performed on historical data, aiming to find relationships, when crossing different inputs, that could be reflected in sales. Afterwards, three different algorithms (Random Forest, Artificial Neural Network and Gradient Boosting) were built and tested in order to identify the best-fitting methodology. In all cases, the goal was to extract feature importances so that a priority order could be assigned to the attributes.

The methodologies were applied to a large dataset from a multibrand Mexican fashion company, and three different scenarios were defined, differing in the output variable to predict. The scenarios predicted, respectively, sales units, weekly sales units and a purpose-built ratio.

It was observed that the Neural Network algorithm would not converge, probably because all input features were categorical. Gradient Boosting performed better than Random Forest. Considering the order of magnitude of the output variables in the three scenarios, the predictive error was considered high. Percentages were estimated for the different feature weights when building the models, and an example was provided combining their priority order with the knowledge extracted from the visual analysis. The presented methodology is flexible regarding both the product aggregation level most convenient for the user and the choice of scenario to use.


Ponderação de atributos de produto no planeamento de composição

da coleção na indústria da moda

Resumo

A indústria da moda é um mercado em constante mudança, o que leva a uma elevada imprevisibilidade das vendas de coleções futuras. A definição da próxima coleção deve ser efetuada e encomendada com pelo menos uma estação de antecedência, o que leva a que o bom desempenho da empresa se torne fortemente suscetível ao planeamento de sucessivas coleções. Tal sucesso mostra-se muito dependente da capacidade de previsão da empresa relativamente, entre outras coisas, à seleção de produto com atributos relevantes para satisfazer a procura do cliente. É neste contexto que surge a atribuição de pesos aos atributos de produto como forma de criar critérios de escolha que guiam os retalhistas na composição das coleções.

Este projeto propõe uma metodologia para melhorar um dos passos que constituem o processo de planeamento de composição de produto, descrevendo como decorre numa das aplicações de grandes bases de dados e soluções de tecnologias de informação. O objetivo é providenciar pesos de atributos das peças de vestuário que definam a forma como deviam ser escolhidos para próxima coleção, otimizando um determinado objetivo de gestão, neste caso, a maximização de vendas.

Para esse fim, foi realizada uma análise visual com dados históricos, procurando encontrar correlações entre atributos que poderiam ser refletidas nas vendas. Foram posteriormente construídos e testados três algoritmos diferentes (Random Forest, Artificial Neural Networks e Gradient Boost) para identificar a metodologia mais adequada. Em todos os casos, o objetivo era extrair a importância das variáveis preditivas, de tal forma que permitisse atribuir-lhes um escalonamento de prioridades.

As metodologias foram aplicadas a um grande conjunto de dados de uma empresa de roupa multimarca mexicana, onde foram definidos três cenários diferentes para análise, distinguindo-se pela variável de saída. Os cenários previam, respetivamente, vendas unitárias, vendas unitárias semanais e um rácio criado.

Observou-se que as redes neuronais não convergiram, provavelmente devido ao caráter categórico de todas as variáveis preditivas. Entre os restantes algoritmos, o Gradient Boost apresentou um melhor desempenho. Considerando a ordem de grandeza das variáveis de saída nos três cenários, o erro associado à previsão foi considerado elevado. Foram estimadas percentagens para os pesos das variáveis preditivas aquando da construção dos modelos, sendo posteriormente apresentado um exemplo que aliava a sua ordem de prioridade com as conclusões retiradas da análise gráfica. A metodologia apresentada é flexível quanto ao nível de agregação de produto mais conveniente para o utilizador, bem como quanto à escolha do cenário a utilizar.


Acknowledgements

I would like to thank, first of all, the people from Retail Consult – KSR, S.A. who gave me the opportunity to collaborate with them, sharing their experience and providing a very friendly work environment. A special thank you goes to the planning team, especially to those who were always available to clarify any doubts: Carlos Lima, Paulo Marques and Nuno Santos. Within the company and the University as well, I would like to thank Ana Andrade and Maria João Robles, who were not only great teammates but also shared their knowledge and opinions when needed, and became great company during the past few months.

A big thanks also goes to Professor Vera Miguéis who, despite having a busy agenda, was tireless in guiding me and giving me her best advice. I would like to thank Professor Paulo Osswald, who always tried to be as fast and efficient as possible in orienting my dissertation.

I want to thank all my friends who supported me when work was harder or seemed to have no solution, people who made me believe I could take the project to the end and never gave up on my capacity to face this challenge. A big thanks to Daniela Pinho who, besides her friendship, shared her wise and brilliant ideas to help me reach a more efficient way of solving the proposed problem, from an innovative perspective. A big thanks to Sofia Casanova and Veniamin Dyachenko, whose presence, words, and capacity to listen were of great importance in a phase of stress and hard work.

Last but not least, I would like to thank my brother and sister, whose presence always cheered me up and made me feel at home no matter where I was. A big thanks to my mother for being so supportive and always taking the best care of me. And finally, to the one person who allowed me to get here, making the biggest of sacrifices to guarantee my success and happiness: my beloved father. Thank you.


Contents

1 Introduction ... 1

1.1 Retail Consult – KSR, S.A. - Overview ... 1

1.2 Motivation ... 1

1.3 Objectives ... 2

1.4 Methodology ... 2

1.5 Dissertation structure ... 3

2 Review of forecasting techniques applied to fashion industry ... 4

2.1 Fashion industry ... 4

2.2 Current methodologies ... 4

2.2.1 Statistical and heuristic methods for prediction ... 5

2.2.2 Application of prediction methods on real cases ... 5

2.3 Data mining algorithms... 7

2.3.1 Random Forest ... 7

2.3.2 Gradient Boosting ... 9

2.3.3 Artificial Neural Networks ... 11

2.3.4 Performance evaluation metrics for regression ... 13

3 AS-IS Situation – Context and company’s solution ... 15

3.1 Assortment planning ... 15

3.2 Merchandise hierarchy ... 15

3.3 Oracle Retail tools for assortment planning... 16

3.3.1 Integrated fashion planning and optimization overview ... 16

3.3.2 Assortment planning step by step ... 17

3.4 Problem definition ... 19

4 Proposed solution ... 20

4.1 Data preprocessing ... 20

4.1.1 Variables description ... 20

4.1.2 Raw data ... 20

4.1.3 Lack of information ... 22

4.1.4 Data aggregation ... 22

4.1.5 Outliers analysis ... 24

4.1.6 Variables categorization ... 26

4.1.7 Output variable ... 26

4.1.8 Preprocessing set of tasks ... 27

4.2 Statistical analysis over criteria selection – Tableau/SPSS ... 28

4.2.1 Predictive variables and Sales Units ... 28

4.2.2 Crossed analysis on sales and predictive variables ... 29

4.3 Predictive models – construction and results ... 36

4.3.1 Data splitting methods ... 36

4.3.2 Parameter tuning configuration ... 36

4.4 Models predictive performance ... 37

4.5 Importance of predictive variables ... 40

5 Conclusions and future work ... 46

References ... 47

APPENDIX A: Attributes vs Sales units boxplots ... 49

APPENDIX B: Random Forest features importance results ... 52


Acronyms and Symbols

AI – Artificial Intelligence

ANN – Artificial Neural Networks

ARIMA – Autoregressive integrated moving average

CB – CatBoost

ELM – Extreme learning machine

IT – Information Technology

MAE – Mean absolute error

MAPE – Mean absolute percentage error

MSE – Mean squared error

PoC – Point of Commerce

RF – Random Forest

RMSE – Root mean squared error

SARIMA – Seasonal autoregressive integrated moving average

SKU – Stock keeping unit


List of Figures

Figure 1 – Retail Consult’s services (Consult 2019) ... 1

Figure 2 – Methodology taken for model comparison in (Loureiro, Miguéis, and da Silva 2018) ... 6

Figure 3 – Schematic representation of decision tree structure based on (Han and Kamber 2001) ... 8

Figure 4 – Gradient boosting learning progress representation based on (KDnuggets 2019) ... 10

Figure 5 – Friedman’s Gradient Boost algorithm in (Natekin and Knoll 2013) ... 11

Figure 6 – Neural Network structure in (Han and Kamber 2001) ... 12

Figure 7 – Activation function scheme in (Han and Kamber 2001) ... 13

Figure 8 – Merchandise hierarchy ... 16

Figure 9 - Integrated fashion planning and optimization in (Goodman 2016) ... 17

Figure 10 – Product life cycle histogram ... 23

Figure 11 – Total sales units by brand and target consumer ... 24

Figure 12 – Regular sales units across gender ... 25

Figure 13 – Promotional sales units across gender ... 25

Figure 14 – Clearance sales units across gender ... 26

Figure 15 – Preprocessing flowchart ... 28

Figure 16 – Sales distribution across gender and product type ... 30

Figure 17 – Sales distribution across gender and brand ... 30

Figure 18 – Sales distribution across price range, fashionability and sale type ... 31

Figure 19 – Sales distribution across fashionability and brand ... 32

Figure 20 – Brand A sales distribution across product family ... 32

Figure 21 – Brands B, C and D’s sales units across product family ... 33

Figure 22 – Tops sales units across fashionability and product family ... 33

Figure 23 - Bottoms sales units across fashionability and product family ... 33

Figure 24 - Winter sales units across fashionability and product family ... 33

Figure 25 – Accessories, beachwear and sets’ sales units across fashionability and product family ... 34

Figure 26 – Tops sales units across color and product family ... 34

Figure 27 – Bottoms items sales units across color and product family ... 34

Figure 28 – Pants_2 sales units across color and product family ... 35

Figure 29 – Winter sales units across color and product family ... 35

Figure 30 – Sets sales units across color and product family ... 35

Figure 31 – Accessories sales units across color and product family ... 35

Figure 32 – Features importance results for tops ... 40

Figure 33 – Sales across gender and product type highlighting tops ... 41


Figure 35 – Sales across price range, fashionability and sale type ... 43

Figure 36 – Tops sales across fashionability levels ... 43

Figure 37 – Sales across brand A and product ... 44

Figure 38 – Sales across brand and product ... 44


List of Tables

Table 1 – Assortment Planning business process tasks and responsibilities in (Goodman 2016) ... 17

Table 2 – Variables’ description ... 20

Table 3 – Order of magnitude of output variables across scenarios ... 38

Table 4 – Models’ performance results ... 39


1 Introduction

1.1 Retail Consult – KSR, S.A. - Overview

Retail Consult provides IT consultancy services to companies in the retail industry. The company is a gold partner of Oracle, a company that provides software for several retail business areas in order to support strategic and operational decisions (Consult 2015b).

Retail Consult’s specialists take care of the implementation, customization and support of the functionalities of Oracle software. Their main goal is “Retail transformation, covering Merchandising, Planning & Optimization, Supply Chain, Store and Warehouse Operations, Integration and Architecture” (Consult 2015a). The company has several teams allocated to clients’ projects, covering the different areas shown in Figure 1.

The company was founded in 2011 by 6 entrepreneurs. In 8 years, it has grown exponentially, reaching 300 collaborators, and spread to 6 countries – Portugal, Germany, U.S.A., Mexico, Chile and Brazil. Currently, it has 20 customers whose businesses fit into one of the following industries: grocery, fashion, pharmacy, do-it-yourself, electronics and telecommunications (Consult 2019).

Figure 1 – Retail Consult’s services (Consult 2019)

1.2 Motivation

Assortment planning is a complex, stepwise process that fashion companies carry out while preparing upcoming collections, in which they assign items to certain Points of Commerce (PoC) for specific periods of time. A Point of Commerce is a distribution channel, such as a physical store or an e-commerce site. Currently, empirical methods are used for the assortment: decisions rest on users’ expertise and choices, on their experience, and on their analysis of the historical sales of different items. This makes decisions highly biased by the user’s experience. Thus, a new challenge arises: finding other ways to support this kind of decision on firmer ground.

The motivation of this project is to explore new opportunities to answer this challenge, which comes up in a process transversal to all fashion retailers. Thus, this project searched for and tested a solution that adds data-based analytical tools to improve decisions, so that they no longer depend only on the final user’s intuition. This study was proposed within one of KSR's Planning and Optimization team’s internal projects, and its findings could be applied widely in several software applications currently on the market.

1.3 Objectives

The goal of the project is to overcome the challenge of depending only on experience to identify the key attributes that drive the assortment decision, by defining a model or methodology that determines the best combination of item attributes for fashion retailers to achieve a defined managerial goal, such as sales volume maximization. The study of item attribute weights is performed in order to improve this empirical decision system, so that decisions have some data foundation and no longer depend only on specialists’ subjective opinions.

The question answered by this dissertation is how to identify one of the many factors that can impact fashion retail and understand its relative weight in concrete cases. The purpose of the project thus rests on learning more about this market’s opportunities, helping companies perform better in a very competitive environment.

1.4 Methodology

The methodology adopted to meet these objectives rests on three major steps: data preprocessing, visual analysis of historical data, and machine learning model building. After these three phases, the knowledge extracted from the second and third steps is combined for criteria selection.

Data from a fashion company was provided in order to explore the proposed method. First, this data required preprocessing, a step-wise process consisting of:

▬ Building a database to structure information;

▬ Detecting and removing irrelevant information;

▬ Defining the aggregation level for time, space, product and consumer;

▬ Analyzing outliers;

▬ Transforming features;

▬ Defining the output variables to predict.
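The preprocessing steps above can be sketched in pandas. This is only a minimal illustration: the dataset, column names and the IQR outlier rule below are hypothetical stand-ins, not the company's actual schema or the thesis's exact procedure.

```python
import pandas as pd

# Hypothetical raw sales records; column names are illustrative only.
raw = pd.DataFrame({
    "sku":         ["A1", "A1", "B2", "B2", "C3"],
    "week":        [1, 2, 1, 2, 1],
    "brand":       ["X", "X", "Y", "Y", None],
    "color":       ["red", "red", "blue", "blue", "red"],
    "sales_units": [10, 12, 300, 5, 7],
})

# Remove records lacking attribute information.
clean = raw.dropna(subset=["brand"])

# Aggregate to the chosen product/time level (here: SKU over the season).
agg = clean.groupby(["sku", "brand", "color"], as_index=False)["sales_units"].sum()

# Flag outliers with a simple IQR rule (one of several possible criteria).
q1, q3 = agg["sales_units"].quantile([0.25, 0.75])
iqr = q3 - q1
agg["outlier"] = (agg["sales_units"] < q1 - 1.5 * iqr) | (agg["sales_units"] > q3 + 1.5 * iqr)

# Encode categorical attributes so models can consume them.
encoded = pd.get_dummies(agg, columns=["brand", "color"])
```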

For the visual analysis, an overview was first made of how each individual variable would impact sales; afterwards, two and sometimes three variables were crossed to look for relationships between features. Tableau was used to facilitate the graphic representation. This approach was chosen because all variables except the target were categorical, which made it harder to look for correlations in a non-visual way. All graphs show the distribution of sales in relation to other variables, rather than their sum or average, so that a better understanding could be gained of how the data was spread.
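Outside Tableau, the same distribution-oriented view (quartiles per attribute level, rather than sums or averages) can be approximated with grouped summaries; the data below is invented for illustration.

```python
import pandas as pd

# Illustrative data; attribute values are hypothetical.
sales = pd.DataFrame({
    "gender":         ["F", "F", "F", "M", "M", "M"],
    "fashionability": ["basic", "fashion", "basic", "basic", "fashion", "fashion"],
    "sales_units":    [40, 15, 55, 30, 10, 20],
})

# Distribution (not just mean or sum) of sales per attribute level,
# mirroring the boxplot view used in the visual analysis.
dist = sales.groupby("gender")["sales_units"].describe()[["min", "25%", "50%", "75%", "max"]]

# Crossing two attributes, as in the two-variable analyses.
crossed = sales.groupby(["gender", "fashionability"])["sales_units"].median()
```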

To build the models, research was first done to find the R and Python packages that could provide not only predictions but also feature importances from training the models. Afterwards, for each model, parameters for tuning were selected and the models were built in three phases: parameter tuning, training the model and testing its performance. After this phase, it was possible to extract attribute importances from each model.
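As a rough illustration of the three phases (tuning, training, testing) plus importance extraction, here is a scikit-learn sketch on synthetic data; the packages and parameter grids actually used in the thesis may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for an encoded attribute matrix and a sales target.
X = rng.integers(0, 4, size=(200, 3)).astype(float)   # 3 label-encoded attributes
y = 5.0 * X[:, 0] + rng.normal(0, 0.5, 200)           # sales driven mainly by attribute 0

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Phase 1: parameter tuning via cross-validated grid search.
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      {"n_estimators": [50, 100], "max_depth": [3, None]}, cv=3)
# Phase 2: training with the best parameters found.
search.fit(X_train, y_train)
model = search.best_estimator_

# Phase 3: testing predictive performance (R^2 on held-out data).
r2 = model.score(X_test, y_test)

# Finally, extract the relative importance of each attribute.
importances = model.feature_importances_
```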

Lastly, an example is given of combining the information from the last two steps, illustrating the proposed solution in a real case.

1.5 Dissertation structure

This document is organized in five chapters. Chapter 2 reviews the literature, focusing mainly on forecasting techniques applied to fashion, and provides an overview of the fashion industry, the predictive methods explored in the area, and an explanation of the chosen algorithms along with the performance evaluation measures. Chapter 3 defines the problem, giving an overview of the assortment planning process and a concrete example of a widely used application within assortment planning software tools. Chapter 4 explains the methodology followed to reach the proposed solution, divided into preprocessing, visual analysis and data mining model construction; it also presents the models’ results and explores their application in a concrete example. Chapter 5 draws conclusions about this work.


2 Review of forecasting techniques applied to fashion industry

Considering the proposed problem, the review was directed at finding a better way to understand customers’ shopping behavior and the characteristics of the fashion industry, so that attributes can be weighted according to demand and effective sales. Exploring forecasting techniques was seen as the best way to extract this kind of knowledge, given the non-linearity and variability that make fashion sales behavior so particular compared to other retail areas.

2.1 Fashion industry

The fashion industry currently operates in a very competitive environment, facing demanding challenges such as globalization and uncertain demand (Thomassey and Happiette 2007). Companies are therefore required to have a well-defined and controlled supply chain, so that product lead times are taken into consideration in the planning process (Thomassey and Fiordaliso 2006). In order to differentiate themselves from the competition and improve customer satisfaction, retailers must “provide right products at the right time while maintaining a good stock” (Xia et al. 2012, 253).

Many sales forecasting methods have been studied in order to help predict the best possible assortment for retailers to answer customers’ wants and needs. However, demand for apparel items is influenced by many factors and is not easily predictable. First of all, new fashion tendencies are defined every season, and therefore new items are produced; thus, there is no directly relevant historical data to rely on, as products are always changing (Thomassey and Fiordaliso 2006). Furthermore, customers’ tastes and choice priorities, such as an item’s durability, comfort, style, quality and even ethical issues, are factors that impact clients’ decisions too (Loureiro, Miguéis, and da Silva 2018; Jegethesan, Sneddon, and Soutar 2007).

A good sales forecast can help retailers obtain more accurate information about the products to order. Companies can then take better advantage of their resources, optimizing them more effectively and reducing costs. A good prediction also minimizes lost sales and opens the possibility of higher profits (Thomassey 2010).

Both item and customer characteristics influence demand, but weather conditions, promotions, holidays and economic factors play an important role as well (Loureiro, Miguéis, and da Silva 2018). Most retailers offer a high variety of Stock Keeping Units (SKUs) that go through a short life cycle at the respective PoC (Loureiro, Miguéis, and da Silva 2018). This particular feature needs to be considered, since products are not present during the whole season, but only for a few weeks: as some products arrive to complete the collection, others leave. This dynamism is essential to attract customers, because it lets them discover new items every time they visit the PoC; however, it increases the complexity of the forecasting process.

For all these reasons, demand forecasting plays a prominent role in the fashion field. However, few solutions have been found that help retailers understand which apparel item attributes (for instance, color or fabric) will meet customers’ preferences and thus translate into sales.

2.2 Current methodologies

This section presents the mathematical and heuristic methods currently used for sales forecasting, and reviews examples where these methods and their variations were applied, along with the results obtained.


2.2.1 Statistical and heuristic methods for prediction

Considering all the conditions mentioned about the fashion industry, several sales forecasting models have been studied over the years. These methods can be classified into statistical or mathematical methods and modern heuristic methods (Xia et al. 2012). Among the former, which mainly deal with time series, the most commonly used are ARIMA, SARIMA, exponential smoothing, regression models, Box–Jenkins, Holt–Winters (Loureiro, Miguéis, and da Silva 2018; Thomassey and Happiette 2007), trend extrapolation and seasonal indices (Ni and Fan 2011).
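For intuition, the simplest member of the statistical family listed above, simple exponential smoothing, follows the recursion s_t = α·x_t + (1−α)·s_{t−1}; a minimal sketch with invented weekly sales numbers:

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha*x_t + (1-alpha)*s_{t-1}.
    The last smoothed value serves as the one-step-ahead forecast."""
    s = series[0]                 # initialize with the first observation
    for x in series[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

# Weekly sales of one category (illustrative numbers only).
history = [120, 130, 125, 140, 135]
forecast_next_week = exponential_smoothing(history, alpha=0.3)  # ≈ 130.464
```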

Despite the high variety of methods available, and their satisfactory performance in many other fields (Thomassey 2010), these methods have several drawbacks when applied to fashion retail, due to the assumptions they place on the data. To be applied, they need a dataset large enough for patterns, trends and cyclical behavior to be found (Choi et al. 2012).

Another relevant aspect is the assumption of linearity, which strongly restricts some of these models (Xia et al. 2012). Fashion item sales are heavily influenced by uncontrollable factors such as volatile fashion tendencies, weather events and market price changes, so it is very unlikely for linear trends and repetitive cycles to be found (Wong and Guo 2010). Besides that, items are replaced by new ones every season, so very little continuous historical data exists at product level (Thomassey and Happiette 2007). Because of this, the analyst often has to aggregate data at a higher level of customer segment, product classification or time, since the PoC/SKU level does not provide enough information for a good analysis. Along with the need to transform qualitative into quantitative data, these issues are disadvantages of this kind of forecasting technique (Loureiro, Miguéis, and da Silva 2018). The uncertainty and volatility intrinsic to the market also increase the complexity of optimizing the models’ parameters (Thomassey 2010).

On the other hand, heuristic methods are being increasingly adopted for forecasting in fashion. Several soft computing techniques have been applied to improve decision support systems, namely in the assortment planning process (Ibrahim 2016). Textile industry data is fluctuating, disturbed and incomplete (Thomassey 2010), so data mining techniques try to tackle the unsolved problems of time series models through the use of Artificial Intelligence. Fuzzy methods, neural networks, evolutionary algorithms (Thomassey 2010) and expert systems (Wong and Guo 2010) are the main AI trends, and they have shown very positive results when it comes to identifying irregular patterns. These robust and imprecision-tolerant techniques (Ibrahim 2016) are more flexible and can therefore adapt better to the peculiarities of the data.

However, depending on the chosen model, some handicaps may become evident, such as hard interpretability, long running times, and overfitting to the data. Keeping this in mind, studies have focused on exploring hybrid models that combine the advantages of time series and data mining techniques.

2.2.2 Application of prediction methods on real cases

Xia et al. (2012) applied a hybrid method combining an extreme learning machine (ELM) and adaptive metrics to forecast sales, obtaining better accuracy than artificial neural networks (ANN), auto-regression and the extreme learning machine alone. The main difference between an ELM and an ANN lies in the hidden-layer weights and biases, which in an ELM are randomly assigned rather than trained, with the output weights solved directly; this leads to higher learning speed and better generalization performance. For predicting future color trends, Choi et al. (2012) compared several models, including an ANN, a Grey Model, a hybrid of the two and a Markov switching approach. They concluded that the hybrid model had the best accuracy but ran slowly, even with a small dataset. The second-best model was the ANN, even though it was the slowest of all.
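The ELM training shortcut mentioned above (random, untrained hidden weights; output weights obtained in closed form) can be sketched with NumPy on toy regression data; every value and dimension below is illustrative, not from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data standing in for sales features and targets.
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=100)

# 1) Hidden-layer weights and biases are drawn at random and never trained.
n_hidden = 50
W = rng.normal(size=(4, n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)                      # hidden-layer activations

# 2) Output weights come from a single least-squares solve — no backprop.
beta, *_ = np.linalg.lstsq(H, y, rcond=None)

predictions = H @ beta
```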

Another method proposed for sales forecasting combined a Harmony Search approach with an ELM and applied it to aggregated data. The aggregation was done by city and category, at different time scales: month, quarter and year. The model was generally good, but its accuracy decreased as the aggregation level increased (Wong and Guo 2010).

One more comparison of models was performed, this time between an ANN, a fuzzy logic approach and a hybrid method combining both. The ANN showed a strong dependence on the data, and the results showed that the hybrid method was no better than the individual methods (Thomassey 2010).

Despite all the existing literature on sales forecasting in the fashion industry, little research has tried to measure the impact of item attributes on final sales. There are some exceptions, such as Loureiro, Miguéis, and da Silva (2018), who compared shallow data mining techniques with deep learning neural networks and then performed a sensitivity analysis to understand the relative importance of the explanatory variables used in the model. The deep learning method provided results that were hard to interpret and proved to be a computationally expensive strategy, even though it could also be applied to small datasets. On the other hand, methods like Random Forest and ANN performed well too (Loureiro, Miguéis, and da Silva 2018). Figure 2 shows the steps taken in the model comparison approach.

Figure 2 – Methodology taken for model comparison in (Loureiro, Miguéis, and da Silva 2018)

Two other studies combine clustering with classification methods. In both, sales data was provided and several product profiles were created through clustering, grouping items according to their sales curves. In the first paper, Thomassey and Fiordaliso (2006) constructed a decision tree to understand how lifespan, price and the starting time of sales influenced each of these product profiles. Later, Thomassey and Happiette (2007) chose an ANN as the classification method. These classification techniques are useful because they take information about new items and link each item to the sales pattern of one of the profiles. The same explanatory variables were used in both studies. A disadvantage of the decision tree method was the computational time required for the iterative process (Thomassey and Happiette 2007; Thomassey and Fiordaliso 2006).
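The cluster-then-classify idea behind these two studies can be sketched as follows; the sales curves, the single "launch week" descriptor, and the use of KMeans are illustrative stand-ins, not the authors' exact procedures.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)

# Toy weekly sales curves for 60 historical items (10 weeks each):
# half sell strongly early and fade, half build up late in the season.
early = np.maximum(0, 100 - 10 * np.arange(10)) + rng.normal(0, 5, (30, 10))
late = np.maximum(0, 10 * np.arange(10)) + rng.normal(0, 5, (30, 10))
curves = np.vstack([early, late])

# 1) Cluster items into sales-profile prototypes.
profiles = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(curves)

# 2) Learn to map an item descriptor (here a hypothetical launch-week flag:
#    0 = season start, 1 = mid-season) to a profile.
launch_week = np.array([0] * 30 + [1] * 30).reshape(-1, 1)
classifier = DecisionTreeClassifier(random_state=0).fit(launch_week, profiles)

# A new item's descriptor is then enough to assign it an expected sales curve.
predicted_profile = classifier.predict([[1]])[0]
```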

Furthermore, some research has sought to include human judgement in forecasting models and to understand its impact on prediction. Baecke, De Baets, and Vanderheyden (2017) studied two different ways of doing this: a restrictive judgement model and an integrative judgement model. The first constrains the forecast according to an expert’s opinion; the second includes that kind of information in the model as a predictive variable. Human judgement can add great value to the forecast, since experts can identify exceptional events that algorithms could not be trained to identify and predict (Baecke, De Baets, and Vanderheyden 2017). In Loureiro, Miguéis, and da Silva’s (2018) paper, domain experts’ opinion was included in the predictive models as a variable and turned out to be the most relevant variable for explaining sales, followed by the products’ physical attributes.

It is important to understand that, depending on the kind of study the analyst wishes to perform, data is handled differently. Some studies look only at sales over time, aggregating PoCs or product categories at a given time scale (Ni and Fan 2011; Shoham 2008). In other situations, item behavior is evaluated according to its life cycle, distinguishing, for instance, basic items from fashion items and bestselling items (Thomassey 2010). In this distinction, basic items sell no matter the time of year and are permanent in the assortment; fashion items have short life cycles and are produced only once; bestselling items are slightly changed and reappear every year. Wong and Guo (2010) decided to study medium-term sales, aggregating data at category and city level for several time periods. A distinction between high-end, medium and basic fashion items has also been made (Xia et al. 2012). Few studies work at SKU level, in the sense that products’ physical attributes are usually not taken into consideration. For the current study, the category aggregation, time scale and PoC group will also be taken into consideration, according to the data and the goal to achieve.

2.3 Data mining algorithms

This section is dedicated to the explanation of the theoretical concepts behind the data mining techniques explored in the project.

Data analysis is a science that makes use of distinctive approaches and analyzes different types of data by applying supervised or unsupervised learning methods. Unsupervised learning methods are used when the data has no target attribute or specific output; clustering falls into this group. Supervised learning, in contrast, looks for patterns that lead to a given target variable. It is usually performed in one of two distinct ways: classification and prediction, also called regression. Both share the common goal of obtaining information from a dataset through the production of models, differing only in the output they provide: classification models produce categorical labels, while prediction (regression) models provide continuous-valued functions (Han and Kamber 2001). This project applies regression models in order to predict sales, and further analyzes their sensitivity to the item attributes.

2.3.1 Random Forest

For review of the RF method, the book Data Mining: Concepts and Techniques from Han and Kamber (2001) was used as reference.

Random Forest cannot be explained without understanding the way decision trees proceed. For that reason, a brief foundation on decision trees is provided below.

“A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions.” (Han and Kamber 2001, 284) An example of the structure described is presented in a schematic way in Figure 3.

Figure 3 – Schematic representation of decision tree structure based on (Han and Kamber 2001)

The decision tree predicts a classification by choosing a path through the tree structure: at every node, a test regarding an attribute is answered, defining the branch that leads to the next node. When a leaf node is reached, the observation’s outcome is predicted. A training sample is used for creating the decision tree model: attributes are selected following a heuristic that finds the ones which separate the data into distinct subsets most clearly, according to an attribute selection measure. When the data in a subset all belong to a common class, the internal node (the attribute associated with it) becomes a leaf node instead, so there is no need to apply the heuristic any further. For every distinct value an attribute can take, a branch is formed, and the samples are partitioned along it. This process stops when it reaches one of three conditions: the whole group of observations in the subset belongs to a common class; all attributes have already been used, in which case the most frequent class appearing in the subset labels the leaf node; or no observations have a given attribute value, in which case a leaf node is created with the majority class of that subset.

The attribute selection measure is usually information gain, i.e. entropy reduction. The information gain measure aims to minimize the amount of information required to classify observations into the resulting partitions, making them as little random as possible; the required information is described by Equation (2.1), and the entropy by Equation (2.2).

I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i \qquad (2.1)

Where:

I, is the information needed to classify a training set S
s_i, is the number of samples that belong to the class of object i
p_i, is the probability of a sample belonging to the class of object i


E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \ldots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj}) \qquad (2.2)

Where:

E(A), is the entropy/expected information based on partitioning into subsets by A
A, is the attribute tested for splitting (with j = 1 to v attribute values)
s_{ij}, is the number of samples of class i that belong to subset j (the subset where A takes its j-th value)
s, is the total number of samples

The analyst seeks the minimum possible entropy in all partitions, since it is a measure of impurity. Equation (2.3) shows the entropy reduction associated with knowing the attribute A; the goal is to obtain the highest possible information gain.

Gain(A) = I(s_1, \ldots, s_m) - E(A) \qquad (2.3)

Where:

I, is the information needed to classify a training set S

E(A), is the entropy/expected information based on partitioning into subsets by A
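To make the attribute selection measure concrete, the short sketch below computes Equations (2.1)–(2.3) for a toy labeled sample. It is an illustration only; the function names, the `samples` data and the attribute values are invented for the example and do not come from any specific library or from the project's dataset.

```python
import math
from collections import Counter

def info(labels):
    """I(s1,...,sm): expected information (entropy) of a labeled sample, Eq. (2.1)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def info_gain(rows, labels, attribute):
    """Gain(A) = I(S) - E(A), Eqs. (2.2)-(2.3); `rows` maps each sample to attribute values."""
    total = len(labels)
    # Partition the class labels by the value the attribute takes in each sample
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # E(A): entropy of each partition weighted by the partition's relative size
    expected = sum(len(part) / total * info(part) for part in partitions.values())
    return info(labels) - expected

# Toy example: does "fashionability" separate high sellers from low sellers?
samples = [{"fashionability": "basic"}, {"fashionability": "basic"},
           {"fashionability": "fashion"}, {"fashionability": "fashion"}]
classes = ["high", "high", "low", "low"]
print(info_gain(samples, classes, "fashionability"))  # perfect split -> gain = 1.0
```

In this toy case the attribute separates the classes perfectly, so the expected information E(A) is zero and the gain equals the entropy of the full sample.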

A common problem in several data mining techniques happens when the model obtained includes outlying and irregular patterns from the training observations which are not representative of the population. This problem is known as overfitting, and in order to handle it in decision trees, tree pruning is usually performed. Tree pruning aims to remove the least reliable branches, and it can be done during the creation of the decision tree or after it is formed. The first case is named prepruning, and it implies that some branches are transformed into leaf nodes instead of being further partitioned; it usually works with a chosen threshold, so there is a required minimum number of partitions. The second case is postpruning, and it works by calculating the error of branches after having a fully-grown tree and cutting them off according to the adopted postpruning approach. The decision can be taken by minimizing error or computational cost, according to the analysis conditions and requirements.

The big advantages of decision trees lie in their very user-friendly, intuitive interpretability and their capacity to deal with non-linear patterns. However, some disadvantages are inherent to their nature, such as their difficulty in handling outliers correctly, which leads to overfitted models. Random forests were put into practice with the purpose of mitigating these problems while still making use of the advantages.

A random forest is an ensemble method which aggregates the results from several randomly chosen decision trees, each of them grown using a subset of randomly selected independent variables. As we are dealing with a regression problem, the final result is computed as the mean of the outputs provided by the decision trees (Gong et al. 2018; Loureiro, Miguéis, and da Silva 2018). The algorithm uses an ensemble meta-algorithm to create the random choice of trees, named bagging (also known as bootstrap aggregating) (Tech 2016). This kind of selection is what differentiates the random forest’s robustness from that of decision trees. These models have higher accuracy and deal well with big datasets, as they are not as sensitive to small variations (Loureiro, Miguéis, and da Silva 2018).
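As a sketch of this idea, the snippet below fits a random forest regressor to synthetic data shaped like label-encoded item attributes and reads off the per-attribute importances. It assumes scikit-learn is available; the attribute names and the data-generating rule are invented for the example and are not the project's actual data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Synthetic stand-in for the project's data: three encoded item attributes,
# where sales depend mostly on the first one (coefficients are invented)
X = rng.integers(0, 5, size=(500, 3)).astype(float)
y = 10 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 1, 500)

# Each tree is grown on a bootstrap sample using a random subset of features
model = RandomForestRegressor(n_estimators=200, max_features="sqrt", random_state=0)
model.fit(X, y)

# Averaging the per-tree importances yields a ranking of the attributes
for name, imp in zip(["brand", "color", "gender"], model.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

Because the synthetic target weighs the first attribute most heavily, its importance dominates the ranking, which is exactly the kind of signal the project extracts to prioritize item attributes.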

2.3.2 Gradient Boosting

Gradient Boosting is an algorithm specially characterized by the way it learns. It makes use of a sequential model construction that builds new models which learn from the shortcomings of the previous ones. To define its configuration, four inputs are required from the analyst: the training data, a vector containing the input variables and the target variable; a stopping criterion, such as the number of iterations or a minimal error improvement; a loss function; and a base-learner.


Regarding loss functions, several are available for selection, and they are constrained by the data type of the output variable. In the case of continuous target variables, squared or absolute error are the usually selected functions. The loss function measures the model’s predictive error, which the algorithm aims to minimize. Base-learner models can be linear or smooth models, decision trees or other functions; decision trees are the most frequently adopted. Figure 4 shows the sequential learning process associated with tree creation (Natekin and Knoll 2013).

Figure 4 – Gradient boosting learning progress representation based on (KDnuggets 2019)

In order to reach the best possible prediction values, the loss function is minimized using gradient descent optimization. Gradient descent defines the direction in which the function decreases at the highest rate. Thus, the negative gradient of the loss function is computed so it can reach its local minimum in the fastest way. This approach makes use of a stepwise procedure, where the step-size can be defined differently for every leaf of the trees. Figure 5 shows pseudo-code describing Friedman’s Gradient Boosting algorithm.


Figure 5 – Friedman’s Gradient Boost algorithm in (Natekin and Knoll 2013)

According to Figure 5, the user starts by defining the number of iterations, the loss function and the base-learner model, and assigns a constant value for the initial estimate, usually the mean or median of the observed values. Then an iterative process begins, where the loss function gradient is computed and a new base-learner function is fitted to it. The gradient descent step-size is then calculated, and finally the current estimate is updated by the new one. This process is repeated for M iterations (Natekin and Knoll 2013).
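The iterative process just described can be sketched for the common case of squared-error loss with decision-tree base-learners, for which the negative gradient is simply the residual. This is a simplified illustration of the algorithm, not Friedman's full version: it assumes scikit-learn for the trees and uses a single constant learning rate instead of a per-leaf step-size, and all names and data are invented for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_iter=100, learning_rate=0.1, max_depth=2):
    """Minimal gradient boosting for squared-error loss.
    With L = (y - F)^2 / 2, the negative gradient is just the residual y - F."""
    # Step 1: initialise with a constant (the mean minimises squared error)
    f0 = float(np.mean(y))
    pred = np.full(len(y), f0)
    learners = []
    for _ in range(n_iter):
        residuals = y - pred                     # negative gradient of the loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                   # base-learner fits the gradient
        pred += learning_rate * tree.predict(X)  # gradient-descent step
        learners.append(tree)
    return f0, learners

def gb_predict(X, f0, learners, learning_rate=0.1):
    return f0 + learning_rate * sum(t.predict(X) for t in learners)

# Toy usage: each new tree corrects the shortcomings of the current ensemble
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X.ravel()) + 0.1 * np.random.default_rng(0).normal(size=200)
f0, trees = gradient_boost(X, y)
print(np.mean((gb_predict(X, f0, trees) - y) ** 2))  # training MSE shrinks with n_iter
```

Each pass through the loop is one iteration of the pseudo-code in Figure 5: compute the gradient, fit a base-learner to it, and take a step that updates the running estimate.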

2.3.3 Artificial Neural Networks

Artificial neural networks are currently one of the most explored data mining techniques, due to their good accuracy and capacity to identify and predict uncommon patterns. Despite that, long running times and little intuitive interpretability are some of their shortcomings. The algorithm is called a neural network since it is composed of a set of units that serve as inputs and/or outputs and are connected to each other. The typical structure of an ANN is composed of 3 or more layers: the input and output layers, with the hidden layers in the middle. The number of layers, along with the number of neurons, defines the network’s topology and is chosen by the analyst, usually through an iterative process. It can become a very complex and time-consuming process to choose the “right” values for these parameters. Figure 6 shows the structure of an ANN. The first layer receives the inputs, and they are transferred to the following layer until the last one provides the final outputs (Han and Kamber 2001). The review of ANN was based on Data Mining: Concepts and Techniques from Han and Kamber (2001).

Figure 6 – Neural Network structure in (Han and Kamber 2001)

A set of weighted connections relates every neuron to all those in the next layer. A bias is also associated with every neuron, acting as a threshold on the weighted sum of its incoming connections from the previous layer. To create this model, training data is provided to the network, so it can adjust its weights and biases to best represent the data. Usually, this kind of learning technique benefits from receiving large amounts of data; however, this may lead to overfitting, where instead of representing the population, the model represents only the sample’s behavior. As the outputs only go to the following layers, this is considered a feedforward mechanism.

The learning process, however, may follow different algorithms. The most commonly used is backpropagation, which “learns by iteratively processing a set of training samples, comparing the network’s prediction for each sample with the actual known class label” (Han and Kamber 2001, 305). For each training sample, the weights are modified to minimize the squared error between the network’s prediction and the actual target; Equation (2.5) gives the error term for an output unit j. The computation of this error proceeds from the last layer to the first (backwards). The weights are expected to converge at a given moment. Equation (2.4) represents the input for unit j as the weighted sum of the outputs of the i connected units in the previous layer plus the bias; the result is passed through the activation function. This is schematically represented in Figure 7.

I_j = \sum_i w_{ij} O_i + \theta_j \qquad (2.4)

Where:

I_j, is the input for unit j
w_{ij}, is the weight of the connection with unit i from the previous layer
O_i, is the output of unit i
\theta_j, is the bias of unit j


Figure 7 – Activation function scheme in (Han and Kamber 2001)

Err_j = O_j (1 - O_j)(T_j - O_j) \qquad (2.5)

Where:

Err_j, is the error of output unit j
T_j, is the true output value
O_j, is the output of unit j

As the error is propagated, the weights and biases are also updated, using a learning rate that multiplies the error. The learning rate is a value between 0 and 1 that helps find the global minimum, in smaller or larger steps according to its value. The update functions are shown in Equations (2.6) and (2.7).

w_{ij} = w_{ij} + l \cdot Err_j \cdot O_i \qquad (2.6)

Where:

w_{ij}, is the weight of the connection between neurons i and j
l, is the learning rate
Err_j, is the error of unit j
O_i, is the output of unit i

\theta_j = \theta_j + l \cdot Err_j \qquad (2.7)

Where:

\theta_j, is the bias of unit j
Err_j, is the error of unit j
l, is the learning rate

These equations ((2.6) and (2.7)) are applied after the presentation of each observation; thus, this is called case updating. Alternatively, the analyst may decide to accumulate the increments in separate variables and update only after all samples have been presented; that technique is called epoch updating.

Finally, the algorithm stops when it gets below the defined threshold for change in weights (the second member of Equation (2.7)), when the frequency of wrong outputs obtained in the epoch is lower than a predefined threshold, or when a given number of epochs has been completed (Han and Kamber 2001).
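A minimal sketch of this mechanism, assuming NumPy and a toy target (the logical AND of two inputs), illustrates Equations (2.4)–(2.7) with case updating. The network size, learning rate and number of epochs are arbitrary choices for the example, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny 2-3-1 feedforward network trained on the AND function with case updating
W1 = rng.normal(0, 1, (2, 3)); b1 = np.zeros(3)   # input -> hidden weights and biases
W2 = rng.normal(0, 1, (3, 1)); b2 = np.zeros(1)   # hidden -> output
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([0, 0, 0, 1], float)
l = 0.5                                            # learning rate

for epoch in range(10000):
    for x, t in zip(X, T):
        # Forward pass: I_j = sum_i w_ij O_i + theta_j  (Equation (2.4))
        h = sigmoid(x @ W1 + b1)
        o = sigmoid(h @ W2 + b2)[0]
        # Output error: Err_j = O_j (1 - O_j)(T_j - O_j)  (Equation (2.5))
        err_o = o * (1 - o) * (t - o)
        # Error propagated backwards to the hidden layer
        err_h = h * (1 - h) * (W2[:, 0] * err_o)
        # Case updating of weights and biases  (Equations (2.6) and (2.7))
        W2[:, 0] += l * err_o * h
        b2 += l * err_o
        W1 += l * np.outer(x, err_h)
        b1 += l * err_h

preds = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel()
print(np.round(preds, 2))
```

After training, the rounded outputs reproduce the AND truth table, showing the weights and biases converging as the error is propagated backwards sample by sample.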

2.3.4 Performance evaluation metrics for regression

After applying the previously mentioned algorithms, or any other machine learning model, a performance evaluation needs to be done to measure how well the model predicts. For regression problems, some of the commonly used error measures can be seen in Equations (2.8), (2.9), (2.10) and (2.11). All of these measures compare the predicted values with the real ones, commonly mentioned as observed values. MAE and MAPE work with the absolute difference; the other two measures square it. This difference is called the residual, since it is the error between each observation and its estimated value. MSE and RMSE are more sensitive to outliers, since squaring gives larger weight to large residuals. A good model aims to reach the minimum values of these measures, as that indicates its predictions are closer to the real values (Swalin n.d.).

Awareness of the order of magnitude and range of the target variable is crucial for interpreting results: a given error value can be considered high if its baseline is much smaller, while the same absolute value can represent a much smaller error when the order of magnitude of the baseline takes bigger values.

The presented measures do not have an absolute scale, such as R-squared has, as they are measured in the units of the target variable. Thus, a better model can be chosen by comparing it to other models or to the observed values’ range and average, for instance.

MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| \qquad (2.8)

Where:

MAE, is the mean absolute error
y_i, is the observed value
\hat{y}_i, is the predicted value
N, is the number of observations

MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \qquad (2.9)

Where:

MAPE, is the mean absolute percentage error
y_i, is the observed value
\hat{y}_i, is the predicted value
N, is the number of observations

MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \qquad (2.10)

Where:

MSE, is the mean squared error
y_i, is the observed value
\hat{y}_i, is the predicted value
N, is the number of observations

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \qquad (2.11)

Where:

RMSE, is the root mean squared error
y_i, is the observed value
\hat{y}_i, is the predicted value
N, is the number of observations
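For illustration, Equations (2.8)–(2.11) can be computed in a few lines of Python. This is a sketch assuming NumPy; the function name and example values are invented, and the MAPE line assumes no observed value is zero.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, MAPE, MSE and RMSE as in Equations (2.8)-(2.11)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    mape = np.mean(np.abs(residuals / y_true))  # requires observed values != 0
    mse = np.mean(residuals ** 2)
    rmse = np.sqrt(mse)
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse}

# Squaring weights large residuals more heavily, so the single large
# residual (-30) inflates MSE/RMSE faster than it inflates MAE
print(regression_metrics([100, 200, 300], [110, 190, 330]))
```

On this toy input the residuals are -10, 10 and -30, so MAE is about 16.7 while RMSE is about 19.1, illustrating the outlier sensitivity mentioned above.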


3 AS-IS Situation – Context and company’s solution

This chapter aims to explain the assortment planning process and provide a concrete example from a widely used application for this purpose, framing the item attribute weight challenge within an Oracle Retail solution and showing what is currently done. Other software solutions could have served as the example, since assortment planning is transversal to all fashion retailers.

3.1 Assortment planning

According to Oracle Retail user guide, “Assortment Planning is a business function that merchants and planners perform to determine the appropriate mix of products that maximizes organizational goals: sales, profits, inventory turn, and so on” (Goodman 2016, 1-7).

Several planning tasks are required for a retailer to launch a new collection. Activities require supervision along the full supply chain, making it important to have an organized platform where managers can decide, communicate and save information regarding all processes, from the moment an item is ordered from the supplier until it is sold to the final customer. The definition of profit goals, assortment selection, allocation of products to specific channels, keeping up with the latest fashion trends and many other activities require time, dedication and supervision, and increased efficiency can be gained through the use of a good decision support system. It is in this context that assortment planning is framed, being one of the many required steps to increase a company’s likelihood of being successful.

The example of Oracle Retail is explored in greater detail afterwards, in order to gain a more practical sense of how these systems are used, organized and applied nowadays.

3.2 Merchandise hierarchy

Typically, during the planning process, decisions about the products need to be taken at different levels of aggregation. For instance, in an early stage, the planner wants to define that the women’s collection for the Spring/Summer season is going to be composed of X T-shirts and Y pairs of trousers. At a later moment, it becomes necessary to define which models and colors will be chosen from the trousers category to have in store Z. In the next phase, one must decide which sizes of those specific trousers will be present in that store. This leads to the common practice of creating a hierarchy to distinguish the level of aggregation one is dealing with. Several terminologies can be used; Figure 8 shows an example of designations that may be adopted for this purpose. Some of this nomenclature is present in this project, in the datasets provided for analysis.


Figure 8 – Merchandise hierarchy

Regarding the concepts present in the analyzed dataset, it is important to be familiar with the definitions of SKU, stylecolor, look group and look. A SKU – stock keeping unit – represents the lowest aggregation level at which a product can be described, containing all the information describing the product type, characteristics, color, size and sometimes the season. Stylecolor is the level that comes right after SKU, identifying the second lowest product aggregation level, with the only difference of not discriminating the size. The concept of look group usually refers to a campaign period in the calendar, for instance having the Spring/Summer collection from February to August. Looks are a way of making shorter time divisions contained in look groups, identifying not only the timeframe but also the products available for sale at that time; a look group is composed of several looks. These concepts were created to make it simpler to manage the season.

3.3 Oracle Retail tools for assortment planning

3.3.1 Integrated fashion planning and optimization overview

Oracle Retail’s integrated fashion planning and optimization software is composed of several smaller modules, including Assortment Planning. These modules can be seen in Figure 9. The first one aims to create a Merchandise Financial Plan, where strategies are defined, focusing mainly on the capital perspective of the business. Afterwards, retailers must plan the assortment of the product offering, along with all the managerial decisions it implies. Item Planning works at a lower decision-making level, followed by the definition of all promotional events that will happen in the period considered. Clearance Optimization is accomplished next, followed by Size Profile Optimization (Goodman 2016).


Figure 9 - Integrated fashion planning and optimization in (Goodman 2016)

This project’s motivation is to fill a gap regarding the assortment planning process. For that reason, a better understanding of this concept and associated software is necessary.

3.3.2 Assortment planning step by step

Assortment Planning is a module directed at several users with different management responsibilities. These users can be divided into four groups: the buyers (senior and regular buyers), store informants (including the visual merchant and the sales planner), merchandise and allocation. Table 1 shows all steps included in the assortment planning business process, along with the role responsible for each of them.

Table 1- Assortment Planning business process tasks and responsible in (Goodman 2016)

▬ Plan Setup



In this phase, all of the program’s configurations must be done, so it is prepared to receive the information users will provide as input. The information is mostly related to store attributes and their values, targets to achieve, packs, available capacities and so on.

Looks and look groups are also defined in this phase. When planning the setup, strategies and item sets are assigned to looks and look groups. Besides that, goals are set for each look, through the definition of the percentage of styles that are part of the assortment, by attribute and color. For example, a look can be defined as having denim products, composed of 50% shirts and 50% jeans. Suppose this look group is assigned to a cluster of 5 stores, where each will receive 100 items of this look: each store would then receive 50 denim shirts and 50 pairs of jeans to expose in the store for that period of time. The decisions regarding how many items to assign to this look and the assortment that composes it will be reflected in the products the client finds on the shelves. If they are correct, the selected composition of products will correspond to demand, with no over- or understock; this would be the ideal for any company to achieve, as it would represent a maximization of sales and no sunk costs. However, if they are wrong, sales opportunities can be lost and products can become overstock, as they will not satisfy customer demand.

This is a very important step, as these percentages correspond to the attribute weights being calculated in this project. Currently, these weights are chosen by the user, based on his experience, his perspective of the market and the conclusions he draws from going through data of items with similar behavior in past seasons, for instance. The objective of this dissertation is to find those weights based on data-driven predictive techniques.

▬ Selling Maintenance Curve

In this step, the sales planner can manually set, or load from external systems, the product lifecycle curves for new items’ selling patterns, based on similar products that existed in the past.

▬ Cluster Maintenance

Clusters of stores are defined in this step, according either to store performance (related to the strategy chosen in the looks phase, e.g. sales growth) and/or to store attributes, such as store size or local temperature.

▬ Create the Shopping List

The shopping list is where the senior buyer makes strategic decisions, working at a highly aggregated level of the assortment. First, historical data is analyzed. Afterwards, the assortment size (number of styles) is set and the shopping list’s goals are defined, without disregarding the need to guarantee the financial objectives and to respect the constraints defined before. It is also important to maintain the attribute and customer segment goals previously defined when planning the setup; however, they can be changed upon reconsideration. Finally, the list is created and then approved. During this process, relevant decisions are taken, including the definition of the number of colors each style will carry and its sales potential, the quantities to buy from each item category and the selling goals to achieve.

▬ Fill/Build the Wedge

A wedge is a concept frequently used in retail, due to the picture created when showing the dimension of the assortment grouped by clusters, which forms the shape of a wedge. This step is meant to plan the assortment in greater detail, refining the Shopping List created before. Adjustments are made to the previously defined strategies, in order to accommodate the lower-level conditions. Besides that, the stylecolors are chosen here.

▬ Buy Planning

The buying plan defines the projection of sales over time and the receipt and inventory plan. It considers the initially defined targets, along with the assortment decisions made during the process, and tries to allocate timings and quantities and predict sales and costs associated with the different PoC. The receipt and inventory plan includes inputs from the Build the Wedge step, to which lead and lag times are assigned.

▬ Size/Pack Allocation

This is where optimization of product orders is made, taking into account packaging and product dimension restrictions.

▬ In-Season

At this point, the season has already begun and products are already spread around the different PoC. Here, the performance of every item, store, look and so on is reviewed and evaluated. If considered necessary, changes are made (Goodman 2016).

3.4 Problem definition

In this specific context, the item attribute weights are defined based on the managers’ experience, who provide a manual input for each of the weights. The objective of this project is to provide a data-based suggestion that fashion companies can use, and that tools such as the one from Oracle Retail may integrate into their solutions. The method proposed to explore this question, including visual analysis, model creation and feature importance prioritization, can contribute to the purpose of better predicting fashion sales and understanding whether the items’ physical characteristics have a high impact on them.


4 Proposed solution

This chapter describes the methodology adopted to reach the proposed solution. Two years of data from a Mexican multibrand fashion company, named here Company X, were used. This company works with several brands simultaneously and sells clothes, accessories, shoes and lingerie for all ages, with stores spread all around the country. A better description of the company’s available dataset is given in subchapter 4.1.

The methodology proposed involves three main steps. The first is to understand, clean, organize and filter the data, making use of a MySQL infrastructure to handle the database. The second step involves a visual data analysis using Tableau to look for correlations between predictive variables. The third phase involves testing the performance of three machine learning models, using Random Forest, Artificial Neural Network and CatBoost algorithms, to identify relevant predictive variables according to their influence on sales. For that, RStudio and Spyder were used, exploring the packages that best suited the purpose. Finally, a proposal of criteria selection is made through a combination of the last two steps.
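As a sketch of the third phase, the snippet below illustrates one way to extract a ranking of attribute importances from a fitted model. It uses scikit-learn's permutation importance, which applies to any fitted regressor, on synthetic, invented data; the attribute names and the data-generating rule are assumptions for the example, not the project's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

# Synthetic stand-in: label-encoded item attributes and a sales target
attributes = ["brand", "color", "fashionability", "gender"]
X = rng.integers(0, 4, size=(600, len(attributes))).astype(float)
y = 5 * X[:, 2] + 2 * X[:, 0] + rng.normal(0, 1, 600)  # fashionability drives sales here

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Permutation importance: shuffle one attribute at a time and measure how
# much the model's score degrades; it works uniformly for RF, GB or ANN
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranking = sorted(zip(attributes, result.importances_mean), key=lambda p: -p[1])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The resulting ordered list is the kind of output the project seeks: a priority order over item attributes according to their influence on the sales prediction.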

4.1 Data preprocessing

4.1.1 Variables description

Table 2 shows the final data obtained after the preprocessing phase, its type and a brief description. It also identifies the class each variable belongs to, for better comprehension of the dataset. All predictive variables belong to the categorical type. However, three different scenarios regarding the target variable definition were tested on the models. Therefore, the table presents three output variables, in the Final Sales class, all of them numerical. Further explanation of how they were computed is given in section 4.1.7.

Table 2 – Variables’ description

Class | Name | Type | Description
Time | Year | Ordinal | Year when the sale happened
Product attribute | Brand | Categorical | Identifies the brand the item belongs to
Product attribute | Color | Categorical | Identifies the color of the product
Product attribute | Fashionability | Categorical | Describes the product according to how it follows recent trends
Product attribute | Gender | Categorical | Describes the target consumer gender
Product attribute | Price label | Categorical | Identifies the price range of the product
Product attribute | Product family | Categorical | Identifies the description of the product’s use
Product attribute | Product type | Categorical | Groups product family according to product function
Sales condition | Sale type | Categorical | Distinguishes the sale context in which the product was sold
Final Sales | Sales Units | Numerical | Total units of product sold in Year X, under Sale type Y
Final Sales | Sales Units per Week | Numerical | Total units of product sold in Year X, under Sale type Y, per week
Final Sales | Ratio | Numerical | Percentage of time with sales + Percentage of sales over average + Percentage of full price sales

4.1.2 Raw data

Data was extracted from the records saved on Oracle Retail’s application and was registered into four main fields: hierarchies, attributes, sales and assortment.


Hierarchy data describes all possible aggregation levels with respect to calendar, location and product. This kind of table contains information from the most detailed level of each of these topics - day, store and stylecolor, respectively - up to the most generalized. Their structure can be seen below.

• CALENDAR: Day | Week | Week of the year | Month | Quarter | Semester | Year | Holiday | Event

• PRODUCT: Stylecolor | Style | Subclass | Class | Family | Department | Group | Company | Clas | Gtot | Tsub | Itg1 | Itg2 | Itg3

• LOCATION: Store | Ccpt | Region | Country | Channel | Corp | Tstore | Tcon | Bstore | Oreg

Attributes information was provided on product and store with the following variables, respectively:

• PRODUCT: Stylecolor | Color | Company cost | Initial Price | Final Price | Consumer type

• STORE: Store | Channel | Region | Store Type | Weather

These store attributes were disregarded due to their incoherence with the location hierarchy.

Sales information was registered at two different product aggregation levels: SKU and stylecolor. The information regarding stylecolor sales was recorded distinguishing three different kinds of sales: Regular, Promotional and Clearance. Regular sales happen when a product is sold at its standard price, without any sort of campaign or change in price appealing for it to be bought. Promotional sales, on the other hand, are the exact opposite: they refer to products that have been subject to strategies such as changes in price, advertising or a special location in the stores, to catch customers’ attention and foster sales. Clearance sales happen when the product is identified as a sunk cost to the company, for lack of sales, thus becoming unsold stock; other special situations may also lead to selling products in clearance. On these occasions, the price is lowered until the product is sold, since the company no longer expects to make a good profit on those products.

The data provided at SKU sales level did not record the different types of sales, so the decision was taken to consider the stylecolor as the lowest aggregation level for the product.

When registering sales, it is important to record the conditions in which the sale was made, regarding not only the product being sold but also where, when, at what price and in what sale circumstance. For that reason, the sales table is structured as follows:

• SALES: Stylecolor | Week | Store | Regular Units | Regular Retail | Promotional Units | Promotional Retail | Clearance Units | Clearance Retail | Cost Retail

Six of the variables in this table have a first name referring to the sale type performed (Regular, Promotional or Clearance) and a second name identifying whether it refers to the number of units sold - Units - or the value of sales in monetary units (Mexican pesos in this case) - Retail. Each observation was recorded by store, week and stylecolor; zero-valued sales were not registered. Cost Retail refers to the cost of the product for Company X.
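As an illustration of how such a table can be aggregated to total units per stylecolor and sale type (the shape later used for the Sales Units target), the sketch below uses pandas with invented column names and toy rows that mirror the schema above; it is an assumption-laden example, not the project's actual preprocessing code.

```python
import pandas as pd

# Toy rows shaped like the SALES table described above (column names assumed)
sales = pd.DataFrame({
    "Stylecolor": ["SC1", "SC1", "SC2", "SC2"],
    "Week": [1, 2, 1, 3],
    "Store": ["S01", "S02", "S01", "S03"],
    "RegularUnits": [5, 3, 0, 2],
    "PromotionalUnits": [0, 1, 4, 0],
    "ClearanceUnits": [0, 0, 0, 6],
})

# Unpivot the three sale-type columns into one 'SaleType' column, then total
# the units sold per stylecolor and sale type (zero-valued rows are dropped,
# mirroring the source data, which did not register zero sales)
long = sales.melt(id_vars=["Stylecolor", "Week", "Store"],
                  value_vars=["RegularUnits", "PromotionalUnits", "ClearanceUnits"],
                  var_name="SaleType", value_name="Units")
long = long[long["Units"] > 0]
totals = long.groupby(["Stylecolor", "SaleType"], as_index=False)["Units"].sum()
print(totals)
```

With the toy rows above, SC1 totals 8 regular units across its two store-weeks, and SC2 accumulates regular, promotional and clearance totals separately, one row per stylecolor and sale type.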

The assortment information aims to inform about the stylecolors available for sale in a specific group of stores and period of time. This information could not be used, as it did not cover all the stylecolor variety present in the sales records.

• ASSORTMENT: Look | Look group | Store cluster | Stylecolor

Another problem in the assortment data relates to the fact that, most often, these looks do not correspond to the real time products spend in stores. Even though there is an initial assortment plan, the most likely situation is for products to stay in stores for longer or shorter periods, as a consequence of managers’ decisions upon verifying sales behavior in real time. In this way, many changes to the initially defined assortments are made; thus, they do not describe the true behavior of an article in a store.

An important detail regarding the assortment is the big variety of existing look groups, due to the different seasons considered for each of Company X’s brands, the target consumer and the store cluster. For instance, a possible look group can be “Autumn/Winter Man Brand 1 Cluster A”, so a very high number of possible combinations exists.

4.1.3 Lack of information

Despite the high number of variables present in the raw data, many of them were eliminated from the dataset due to missing values. For that reason, a rule was created defining a minimum threshold of 70% non-NA values for a variable to be kept in the final dataset. This led to a large reduction of the dataset.
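A minimal sketch of this filtering rule, assuming the data is held in a pandas DataFrame (the column names and values are illustrative only):

```python
import pandas as pd

# Keep only columns with at least 70% non-NA values.
THRESHOLD = 0.70

df = pd.DataFrame({
    "color":  ["red", "blue", None, "green", "red"],  # 80% filled -> kept
    "fabric": [None, None, None, "wool", None],       # 20% filled -> dropped
    "price":  [10.0, 12.5, 9.9, 11.0, 10.5],          # 100% filled -> kept
})

keep = [c for c in df.columns if df[c].notna().mean() >= THRESHOLD]
df_filtered = df[keep]
print(keep)  # ['color', 'price']
```

The same result can be obtained with `df.dropna(axis="columns", thresh=int(THRESHOLD * len(df)))`, which drops columns that do not reach the required count of non-NA entries.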

Another criterion for excluding an input was its relevance. A variable was considered relevant if it added value to describing the product's characteristics or the conditions under which sales were performed. Keeping in mind the final goal, understanding which product criteria could be used to suggest an assortment that maximizes sales, was therefore the core filter for choosing which variables to keep. For example, color contributes to the product description, so it was kept. Company cost, however, has no direct impact on sales or on the consumer's interest in buying a certain item, so it was eliminated.

4.1.4 Data aggregation

This project aims to implement a solution to support high-level decisions on assortment planning. Thus, it becomes necessary to define at which timeframe, geographic division, product level and target consumer decisions are made.

Starting with the timeframe, the most intuitive option is to predict these criteria per season (Spring/Summer, Autumn/Winter), and this option was considered first. However, the data showed that this seasonal division does not make sense for Company X, due to the timespans products remain in stores and the high variety of weather conditions found across the regions of Mexico. Regarding the product, the data shows that some products cross seasons, so this separation would probably not make sense. In this dataset, the product attribute fashionability comes from the Subclass classification of items in the raw data, and the categories it considers are commonly used to describe the lifecycle of a product: a basic item usually stays in stores for long periods, while fashion items have short life cycles. Figure 10 shows that, despite products having different fashionability classifications, most of them sell for a small number of weeks, while the remaining products can go up to 52 weeks from the first to the last registered sale. Fashionability therefore does not relate only to product lifecycle.
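The lifecycle measure behind the histogram in Figure 10 can be sketched as the inclusive number of weeks between the first and the last registered sale of each stylecolor (the records below are hypothetical):

```python
import pandas as pd

# Hypothetical weekly sales records: lifecycle is measured as the inclusive
# span of weeks between the first and the last registered sale per stylecolor.
sales = pd.DataFrame({
    "Stylecolor": ["SC001", "SC001", "SC001", "SC002", "SC002"],
    "Week":       [3, 7, 15, 20, 22],
})

weeks = sales.groupby("Stylecolor")["Week"]
lifecycle = weeks.max() - weeks.min() + 1
print(lifecycle.to_dict())  # {'SC001': 13, 'SC002': 3}
```

A histogram like the one in Figure 10 would then follow from something such as `lifecycle.plot.hist()`.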


Figure 10 – Product life cycle histogram

The second reason for the chosen time aggregation level is that the stores are spread all around Mexico, which means the weather can vary considerably from one store to another, since temperature and rainfall conditions differ greatly from region to region (Barbezat 2018). Therefore, the decision was to predict for a whole year. Company X defines a year as starting in February and ending in February of the following year, thereby including the two seasons that complete a year. The time aggregation level defined for this project is therefore a year, from February to February.
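A minimal sketch of mapping calendar dates to this February-to-February year, assuming January sales belong to the year that started the previous February (the function name and dates are illustrative):

```python
import pandas as pd

def fiscal_year(date: pd.Timestamp) -> int:
    """Company year label: February of year Y through January of year Y+1."""
    # January still belongs to the year that started the previous February.
    return date.year if date.month >= 2 else date.year - 1

dates = pd.to_datetime(["2018-02-01", "2018-12-15", "2019-01-20", "2019-02-03"])
years = [fiscal_year(d) for d in dates]
print(years)  # [2018, 2018, 2018, 2019]
```

Grouping sales by this label then aggregates each observation into the correct February-to-February year.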

Concerning geographic space, sales could be evaluated by store, by region or all together. Considering the kind of decision being made, all stores were considered together, since the assortment planning module has a later store assignment and clustering section that takes care of store allocation.

Regarding the product and its target consumer, the company has a classification that distinguishes brands from A to E, with three variations of brand A, which alone represents almost half of total sales. Company X also considers seven target consumer groups, including Women, Man, Babies and Kids. Within these groups there is a further classification into Accessories, Shoes and Underwear, which, despite being classified under the target consumer groups, include all types of consumer and differ only in the kind of product being sold. In order to choose a more homogeneous target consumer and products with more similar behavior, only clothes were considered for analysis; accessories, underwear and shoes were discarded. Sales for the different target consumers, distinguished by brand, are presented in the graph of Figure 11. The values are merely representative of the real sales.
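This filtering step can be sketched as excluding the cross-consumer product groups from a product master table (the stylecolors and group labels below are hypothetical):

```python
import pandas as pd

# Hypothetical product master: keep clothing only, discarding the groups
# that cut across all consumer types (Accessories, Shoes, Underwear).
products = pd.DataFrame({
    "Stylecolor":  ["SC001", "SC002", "SC003", "SC004"],
    "TargetGroup": ["Women", "Accessories", "Man", "Shoes"],
})

EXCLUDED = {"Accessories", "Shoes", "Underwear"}
clothes = products[~products["TargetGroup"].isin(EXCLUDED)]
print(clothes["Stylecolor"].tolist())  # ['SC001', 'SC003']
```

The remaining rows cover the clothing assortment analyzed in the rest of the project.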
