
4.2 Data Exploration

4.2.1 Data Features

FIGURE 4.1: Trigonometric representation of the day of the week

numeric. SaleYear is a numeric (integer) variable that represents the year, e.g. 2017. The features described in this paragraph and in the previous one depend only on the SaleDate portion of the key.
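The trigonometric representation named in Figure 4.1 can be illustrated with a sine/cosine pair. The following is a minimal sketch, assuming the common encoding in which a cyclic value is mapped to a point on the unit circle; the function name `cyclical` and the convention Monday = 0 are assumptions made for illustration and are not taken from the source.

```python
import math

def cyclical(value, period):
    """Map a cyclic value (e.g. day of week 0..6) to a (sin, cos) pair.

    Assumed encoding: angle = 2*pi*value/period. The source does not
    state the exact formula it uses.
    """
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def circle_distance(a, b, period):
    """Euclidean distance between two encoded values on the unit circle."""
    sa, ca = cyclical(a, period)
    sb, cb = cyclical(b, period)
    return math.hypot(sa - sb, ca - cb)

# Assumed convention: Monday = 0, ..., Sunday = 6.
# Sunday -> Monday is as close as Monday -> Tuesday in this encoding.
print(circle_distance(6, 0, 7), circle_distance(0, 1, 7))
```

With this kind of encoding, the last and the first day of the week end up adjacent, which a plain integer 0..6 would not express.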

The variables CloseDaysBefore and CloseDaysAfter indicate whether the store (part of the record key) was closed in the recent past or will be closed in the near future. Specifically, CloseDaysBefore represents how many days ago the store was last closed. For instance, if the data record is on a Tuesday and the related store is closed on Sundays, the value of CloseDaysBefore will be 2.

Similarly, CloseDaysAfter represents the number of days until the store will next be closed. To illustrate, suppose that the current day is a Friday and the store is closed on Sundays. In this scenario, the features have the values CloseDaysBefore = 5 and CloseDaysAfter = 2. These features capture not only standard closing days, e.g. Sundays, but also major holidays like Christmas and Easter (in combination with the trigonometric month features). It should also be mentioned that some stores are open every day.

CloseDaysBefore and CloseDaysAfter depend on the (StoreID, SaleDate) portion of the key. For both variables, values over 7 are encoded as 0.
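The definition above can be sketched in a few lines. This is an illustrative sketch only: the function name `close_days` and the representation of closures as a set of dates are assumptions, since the source does not describe how closing days are stored.

```python
from datetime import date, timedelta

def close_days(sale_date, closed_dates, horizon=7):
    """Return (CloseDaysBefore, CloseDaysAfter) for one store and date.

    closed_dates: set of datetime.date on which the store was closed
    (hypothetical input format). Distances over `horizon` days are
    encoded as 0, following the rule stated in the text.
    """
    before = after = 0
    for k in range(1, horizon + 1):
        # Keep only the nearest closing day in each direction.
        if before == 0 and sale_date - timedelta(days=k) in closed_dates:
            before = k
        if after == 0 and sale_date + timedelta(days=k) in closed_dates:
            after = k
    return before, after

# A store closed on every Sunday of January 2017.
sundays = {date(2017, 1, d) for d in (1, 8, 15, 22, 29)}
print(close_days(date(2017, 1, 13), sundays))  # Friday  -> (5, 2)
print(close_days(date(2017, 1, 10), sundays))  # Tuesday -> (2, 5)
print(close_days(date(2017, 1, 13), set()))    # always open -> (0, 0)
```

The Friday and Tuesday cases reproduce the two examples given in the text; a store that is open every day receives 0 for both features.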

Latitude and Longitude are numeric variables that encode the location of the store on the map. This pair of variables was considered a better encoding scheme than the categorical StoreID variable, since they capture the geometric distance between stores. All other things being equal, stores that are very close to each other will probably achieve similar sales on a given day. Latitude and Longitude depend only on the StoreID portion of the key.
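To make the notion of distance between stores concrete, the sketch below computes the great-circle distance between two coordinate pairs with the haversine formula. The haversine formula is a choice made here for illustration; the source does not say which distance measure, if any, is computed explicitly, and the coordinates in the example are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Two hypothetical stores one degree of longitude apart on the equator.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
```

A model that receives Latitude and Longitude directly can learn such proximity implicitly, whereas a categorical StoreID carries no information about which stores are neighbours.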

MarketingCampaigns is a set of 15 boolean features, each one representing a different product promotional activity. Specifically, if a promotional activity took place for the record's (Store, Date, Product), the related boolean feature is set to TRUE; otherwise it is set to FALSE. Some of these promotional activities are sponsored by the retail business (in-store activities), while others are sponsored by the product suppliers (external activities). Each promotional activity depends on different parts of the key. For instance, a promotion may target one or many products, in one or more stores. Among these boolean features are:

• A television advertisement took place for the given product and date.

• A radio advertisement took place for the given product and date.

• The product was promoted in a retail booklet at the given date.

• The product was promoted in a wholesale booklet at the given date.

• The product is offered at a discounted price at the given store and date.

Each item belongs to certain product groups, which form a multi-level hierarchy. For example, a certain product may belong to the olive oils group, which is a subset of the oils group, which in turn is a subset of the groceries group. The olive oils group is considered a low-level group, while the oils and groceries groups are higher-level groups. Thus, GroupDescription is a set of nominal features that encode the different nested groups in which the product resides. Because these features group together related and often antagonist products, they offer valuable information to forecasting models.

The next chapter will elaborate on how this information is considered during the modelling phase. Each GroupDescription feature is connected to a numeric GroupShare feature. The latter represents the total amount of income (in Euros) that came from sales in this group, at the given store and day. While GroupDescription depends only on the ItemCode portion of the key, GroupShare depends on the whole key (StoreID, ItemCode, SaleDate).
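As an illustration of how a GroupShare value could be derived, the sketch below sums income per (store, day, group). The row layout, the sample values, and the function name `group_share` are hypothetical; the source only states what the feature represents, not how it is computed.

```python
from collections import defaultdict

# Hypothetical sales rows:
# (StoreID, SaleDate, ItemCode, GroupDescription, income in EUR)
rows = [
    (1, "2017-01-13", "A", "olive oils", 30.0),
    (1, "2017-01-13", "B", "olive oils", 20.0),
    (1, "2017-01-13", "C", "vinegars", 5.0),
    (2, "2017-01-13", "A", "olive oils", 12.5),
]

def group_share(rows):
    """Total income per (StoreID, SaleDate, GroupDescription)."""
    totals = defaultdict(float)
    for store, day, _item, group, income in rows:
        totals[(store, day, group)] += income
    return dict(totals)

shares = group_share(rows)
# Items A and B in store 1 share the same group total on that day.
print(shares[(1, "2017-01-13", "olive oils")])  # 50.0
```

Each record would then receive the total of the group its ItemCode belongs to, which is why the feature depends on the whole key even though the total itself is shared by all items of the group.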

BrandName is a categorical (nominal) feature that holds the brand (supplier company) of the product. Considering that popular brands achieve better sales than less known brands, this is an important feature for sales forecasting. In particular, when a popular brand is supported by a marketing campaign, it may often dominate the sales of its group.

Products of different groups may have the same brand in cases where the supplier company creates multiple products. Some of these brands are private labels, which means that they belong to the retailer itself. In this case, isPrivateLabel is set to TRUE; otherwise it is set to FALSE. Since products that belong to a private label brand are promoted by special marketing campaigns, they are of special interest. BrandName and isPrivateLabel depend only on the ItemCode portion of the key.

The feature VolQty represents the physical volume of the product, measured in liters. The same product may appear with a different set of values for VolQty because of the different promotional codes that exist. For instance, assume a standard product code (non-promoted) that has a volume of 0.75 L. A 1+1 promotion code of the same product will appear with VolQty = 1.5 L. In this case, the promotion has the multiplicative effect of doubling the product's volume. The numerical feature PromoQty is this exact multiplicative factor, such that in the aforementioned scenario, PromoQty = 2. The following equation makes explicit the relationship between VolQty and PromoQty. Both features depend on the ItemCode portion of the key.

$$\mathit{VolQty}_{\mathit{Promoted}} = \mathit{VolQty}_{\mathit{Standard}} \times \mathit{PromoQty}$$
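The relationship can be checked numerically with the 1+1 example from the text. The helper names below are hypothetical and serve only to restate the equation in both directions.

```python
def promoted_volume(vol_qty_standard, promo_qty):
    """VolQty of a promotion code, given the standard volume and PromoQty."""
    return vol_qty_standard * promo_qty

def standard_volume(vol_qty_promoted, promo_qty):
    """Recover the standard (non-promoted) volume from a promotion code."""
    return vol_qty_promoted / promo_qty

# 1+1 promotion on a 0.75 L product, as in the text.
print(promoted_volume(0.75, 2))  # 1.5
print(standard_volume(1.5, 2))   # 0.75
```

Dividing VolQty by PromoQty in this way maps every promotion code of a product back to the same standard volume.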

The variable Quantity (Qty) represents the amount of daily sales. For some products, this quantity is expressed as units sold, while for other products it is expressed as weight sold. In either case, Quantity is a numeric variable, and it is also the target variable in this problem. Therefore, every forecasting model mentioned hereafter will attempt to predict future sales, that is, future values of the feature Quantity.

Given a record $r$ with a key $(r_{Date}, r_{Product}, r_{Store})$ from the historical database, Quantity represents the total quantity (units or weight) of product $r_{Product}$ sold in the store $r_{Store}$ on the date $r_{Date}$. Therefore, Quantity depends on the whole key (StoreID, ItemCode, SaleDate).

This paragraph introduces a set of more complex features, which aim to capture more information that can aid the forecasting process. In a record with key (StoreID, ItemCode, SaleDate), the feature DailyBasketNr represents the total number of baskets (transactions) that took place at the given store and date. Similarly, the feature CountItemBaskets is the number of baskets that contain the related product, at the given store and date. In the context of a product group, GroupShareCount is the number of distinct items of that group that were sold (at least once) at the given date and store. Therefore, GroupShareCount is the number of antagonist products that exist in the given group.

An obvious problem is that the above three features will not be known for the target date. However, estimating them is considered easier than predicting Qty.
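The three basket-based features can be sketched as simple counts over transaction lines. The input layout (one tuple per item in a basket), the sample data, and the function name `basket_features` are assumptions made for illustration; the source does not describe the underlying transaction tables.

```python
from collections import defaultdict

# Hypothetical transaction lines:
# (StoreID, SaleDate, basket id, ItemCode, GroupDescription)
lines = [
    (1, "2017-01-13", "b1", "A", "olive oils"),
    (1, "2017-01-13", "b1", "B", "olive oils"),
    (1, "2017-01-13", "b2", "A", "olive oils"),
    (1, "2017-01-13", "b3", "C", "vinegars"),
]

def basket_features(lines):
    """Return (DailyBasketNr, CountItemBaskets, GroupShareCount) lookups."""
    baskets = defaultdict(set)       # (store, day)        -> basket ids
    item_baskets = defaultdict(set)  # (store, day, item)  -> basket ids
    group_items = defaultdict(set)   # (store, day, group) -> item codes
    for store, day, basket, item, group in lines:
        baskets[(store, day)].add(basket)
        item_baskets[(store, day, item)].add(basket)
        group_items[(store, day, group)].add(item)
    daily_basket_nr = {k: len(v) for k, v in baskets.items()}
    count_item_baskets = {k: len(v) for k, v in item_baskets.items()}
    group_share_count = {k: len(v) for k, v in group_items.items()}
    return daily_basket_nr, count_item_baskets, group_share_count

daily, per_item, per_group = basket_features(lines)
print(daily[(1, "2017-01-13")])                    # 3 baskets in total
print(per_item[(1, "2017-01-13", "A")])            # item A appears in 2 baskets
print(per_group[(1, "2017-01-13", "olive oils")])  # 2 distinct items sold
```

Because these counts come from the same day's transactions as the target, for a future date they would have to be replaced by estimates, as noted above.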
